We decided not to use Docker because of network performance in production flows, not because of deployment. Virtualizing the network brings roughly a 50% decrease in performance. This may change with Calico, because it abstracts the network with routing instead of virtualizing it the way Flannel does.

On 12 Apr 2016 at 2:22 PM, "John Omernik" <j...@omernik.com> wrote:
> On 2, I had some thoughts there. How "expensive" would it be for Zeppelin to run a timer of sorts that can be accessed via a specific URL? Basically, this URL would return the idle time. The thing that knows best whether Zeppelin has activity is Zeppelin itself, so any action within Zeppelin would reset this timer: changing notebooks, opening, closing, or moving notes around, running notes, adding new notes, changing interpreter settings. Any request handled by Zeppelin in the UI would reset the timer. A request to the "timer" URL obviously would NOT reset it; basically, if nothing user-actionable was run (we'd have to separate user-actionable items from automated API requests), the timer would not get reset. This would let those of us using Zeppelin in a multi-user/multi-tenant environment monitor for idle instances and take action when they occur. (Ideally, we could issue a "save" of all notebooks through an authenticated API before taking said action...)
>
> So, to summarize:
>
> An API that reports the seconds since the last human action.
>
> We monitor that API, and when the seconds since the last human action exceed an enterprise threshold, we issue a "safe save all" to Zeppelin, which goes ahead and does a save. (Additional point: the timer API could return both the seconds since last human use and a boolean "all saved" flag. If normal Zeppelin processes have already saved all human interaction, the API could indicate that; then, when the timer check hits the API, it knows the idle time has passed the threshold and Zeppelin reports all saved, so it can issue a termination. If not all saved, it can issue the "save all" command and wait for everything to be saved. If something is keeping Zeppelin from being in a safe condition for shutdown, the API would reflect this and prevent a shutdown.)
>
> Then, once the idle seconds exceed the enterprise threshold, we can safely shut down the instance of Zeppelin, returning resources to the cluster.
>
> Would love discussion here...
>
> On Tue, Apr 12, 2016 at 1:57 AM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>
>> 1. I am using Ansible to deploy Zeppelin on all slaves and to launch a Zeppelin instance for one user. So if the Zeppelin binaries are already deployed, the launch through Marathon is very quick (1 or 2 seconds). Looking for a velocity solution (based on JFrog) on Mesos to manage binaries and artifacts with versioning, rights, etc. No use of Docker, because of network performance constraints.
>>
>> 2. Same answer as John: still running. I will test dynamic resource allocation for the Spark interpreter, but the Zeppelin daemon will still be up and taking 4 GB.
>>
>> 3. I have a service discovery layer that authenticates the user and routes him to his instance (and only his instance). Right now it is based on a simple shell script polling Marathon through its API and updating an Apache configuration file every 15 s. The username is in the Marathon task. We will replace this with a fully industrialized solution (Consul? HAProxy? ...).
>>
>> On 2016-04-12 at 02:37 GMT+02:00, Johnny W. <jzw.ser...@gmail.com> wrote:
>>
>>> Thanks John for your insights.
>>>
>>> For 2., one solution we have experimented with is Spark dynamic resource allocation. We could define a timer to scale down. Hope that helps.
>>>
>>> J.
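For concreteness, here is a minimal sketch of the dynamic-allocation settings Johnny is referring to, expressed as a PySpark SparkConf (in Zeppelin these would normally be set in the Spark interpreter settings instead; the executor count and timeout below are illustrative values, not ones given in the thread):

    from pyspark import SparkConf

    # Sketch only: on Mesos, dynamic allocation also needs the external
    # shuffle service running on each agent. Values are illustrative.
    conf = (
        SparkConf()
        .setAppName("zeppelin-per-user")
        .set("spark.dynamicAllocation.enabled", "true")
        .set("spark.shuffle.service.enabled", "true")
        .set("spark.dynamicAllocation.minExecutors", "0")
        .set("spark.dynamicAllocation.executorIdleTimeout", "300s")
    )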
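And a minimal sketch of the idle reaper John proposes at the top of this message. The /api/idle and /api/save-all endpoints are hypothetical stand-ins for the API being discussed (Zeppelin does not expose them today), while the scale-down call uses Marathon's real /v2/apps endpoint:

    import time
    import requests

    ZEPPELIN = "http://zeppelin-alice.marathon.mesos:8080"  # hypothetical per-user instance
    MARATHON = "http://marathon.mesos:8080"
    APP_ID = "/zeppelin-alice"        # hypothetical Marathon app id for this instance
    IDLE_THRESHOLD = 3600             # enterprise idle threshold, in seconds

    def check_and_reap():
        # Hypothetical endpoint returning {"idle_seconds": ..., "all_saved": ...}
        status = requests.get(ZEPPELIN + "/api/idle").json()
        if status["idle_seconds"] < IDLE_THRESHOLD:
            return                    # a human was active recently
        if not status["all_saved"]:
            # Hypothetical "safe save all"; re-check on the next pass.
            requests.post(ZEPPELIN + "/api/save-all")
            return
        # Idle past the threshold and everything saved: scale the app to zero.
        requests.put(MARATHON + "/v2/apps" + APP_ID, json={"instances": 0})

    while True:
        check_and_reap()
        time.sleep(60)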
>>> On Mon, Apr 11, 2016 at 4:24 PM, John Omernik <j...@omernik.com> wrote:
>>>
>>>> 1. Things launch pretty fast for me; however, it depends on whether the Docker container I am running Zeppelin in is cached on the node Mesos wants to run it on. If not, it pulls from a local Docker registry, so worst case it takes up to a minute to get things running if the image isn't cached.
>>>>
>>>> 2. No, if the user logs out it stays running. Ideally I would want to set up some sort of timer that could scale down an instance if left unused. I have some ideas here but haven't put them into practice yet. I wanted to play with Nginx to see if I could do something there (lack of activity causes Nginx to shut down Zeppelin, for example). With Spark resources, one thing I wanted to play with is fine-grained scaling with Mesos, to only use resources when queries are actually running. Lots of tools could fit the bill here; we just need to identify the right ones.
>>>>
>>>> 3. DNS resolution is handled for me with Mesos-DNS. Each instance has its own ID, and the DNS name auto-updates in Mesos-DNS based on Mesos tasks, so I always know where Zeppelin is running.
>>>>
>>>> On Monday, April 11, 2016, Johnny W. <jzw.ser...@gmail.com> wrote:
>>>>
>>>>> John & Vincent, I am interested in the per-instance-per-user approach. I have some questions about it:
>>>>> 1. How long will it take to launch a Zeppelin instance (and initialize the SparkContext) when a user logs in?
>>>>> 2. Will the instance be destroyed when the user logs out? If not, how do you deal with the resources assigned to Zeppelin/the SparkContext?
>>>>> 3. For auto failover through Marathon, how do you deal with DNS resolution for clients?
>>>>>
>>>>> Thanks!
>>>>> J.
>>>>>
>>>>> On Fri, Apr 8, 2016 at 10:09 AM, John Omernik <j...@omernik.com> wrote:
>>>>>
>>>>>> So for us, we are doing something similar to Vincent; however, instead of Gluster we are using MapR-FS and its NFS mount. Basically, this gives us a shared filesystem running on all nodes, with strong security (filesystem ACEs for fine-grained permissions), built-in auditing, POSIX compliance, true random read/write (as opposed to HDFS), snapshots, and cluster-to-cluster replication. There are also some neat things we are doing with volumes and volume placement. That provides our storage layer. Then we have Docker for actually running Zeppelin, and since it's an instance per user, that helps organize who has access to what (still hashing out the details on that). Marathon on Mesos is how we ensure that Zeppelin is actually available, and when it comes to Spark, we are just submitting to Mesos, which is right there. Since everything is on one cluster, each user has a home directory (on a volume) where I store all the configs for each instance of Zeppelin, and they can also put ad hoc data in their home directory. Spark and Apache Drill can both query anything in MapR-FS, making it a pretty powerful combination.
>>>>>>
>>>>>> On Fri, Apr 8, 2016 at 6:33 AM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>>>>>>
>>>>>>> Using it for 3 months without any incident.
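For the routing piece, here is a rough Python equivalent of the discovery script vincent describes earlier in the thread (poll Marathon, map each user to their instance, rewrite the routing configuration every 15 s). The /zeppelin-<user> app-id convention and the output file are assumptions made for illustration; vincent's actual script writes an Apache configuration and reads the username from the Marathon task:

    import time
    import requests

    MARATHON = "http://marathon.mesos:8080"
    ROUTES_FILE = "/etc/zeppelin-proxy/routes.map"   # assumed output path

    def render_routes():
        # GET /v2/tasks lists every running Marathon task as JSON.
        resp = requests.get(MARATHON + "/v2/tasks",
                            headers={"Accept": "application/json"})
        lines = []
        for task in resp.json()["tasks"]:
            if not task["appId"].startswith("/zeppelin-"):
                continue  # not a per-user Zeppelin instance
            # Assumption: the app id encodes the user, e.g. /zeppelin-alice -> alice.
            user = task["appId"].rsplit("-", 1)[-1]
            lines.append("%s %s:%d" % (user, task["host"], task["ports"][0]))
        return "\n".join(lines) + "\n"

    while True:
        with open(ROUTES_FILE, "w") as f:
            f.write(render_routes())
        time.sleep(15)  # same cadence as vincent's shell script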
>>>>>>> On 8 Apr 2016 at 9:09 AM, "ashish rawat" <dceash...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Sounds great. How long have you been using GlusterFS in prod, and have you encountered any challenges? The only difficulty for me in using it would be a lack of expertise to fix broken things, so I hope its stability isn't something to be concerned about.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Ashish
>>>>>>>>
>>>>>>>> On Fri, Apr 8, 2016 at 12:20 PM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> We use the FUSE interface. The Gluster volume is directly accessible as local storage on all nodes, but performance is only 200 Mb/s. More than enough for notebooks. For data, prefer Tachyon/Alluxio on top of Gluster...
>>>>>>>>>
>>>>>>>>> On 8 Apr 2016 at 6:35 AM, "ashish rawat" <dceash...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Eran and Vincent.
>>>>>>>>>> Eran, I would definitely like to try it out, since it won't add to the complexity of my deployment. I will look at the S3 implementation to figure out how complex it would be.
>>>>>>>>>>
>>>>>>>>>> Vincent, I haven't explored GlusterFS at all. Would it also require writing an implementation of the storage interface, or can Zeppelin work with it out of the box?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Ashish
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 6, 2016 at 12:53 PM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> For 1, Marathon on Mesos restarts the Zeppelin daemon in case of failure.
>>>>>>>>>>> For 2, a GlusterFS FUSE mount allows sharing notebooks across all Mesos nodes.
>>>>>>>>>>> For 3, it is not available right now in our design, but a manual restart in the Zeppelin config page is acceptable for us.
>>>>>>>>>>>
>>>>>>>>>>> On 6 Apr 2016 at 8:18 AM, "Eran Witkon" <eranwit...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Yes, this is correct.
>>>>>>>>>>>> Regarding HA disk: if you don't have HA storage and no access to S3, then AFAIK you don't have another option at the moment.
>>>>>>>>>>>> If you would like to save notebooks to Elastic, then I suggest you look at the storage interface and the implementations for Git and S3, and implement that yourself. It does sound like an interesting feature.
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Eran
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, 6 Apr 2016 at 08:57, ashish rawat <dceash...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks Eran. So 3 seems to be something external to Zeppelin, and hopefully 1 only means running "zeppelin-daemon.sh start" on a slave machine when the master becomes inaccessible. Is that correct?
>>>>>>>>>>>>>
>>>>>>>>>>>>> My main concern still remains on the storage front, and I don't really have high-availability disks or even HDFS in my setup. I have been using an Elasticsearch cluster for data high availability, but was hoping that Zeppelin could save notebooks to Elasticsearch (like Kibana) or maybe to a document store.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any idea if anything is planned in that direction? I don't want to fall back to rsync-like options.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Ashish
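The notebook storage interface Eran mentions is a Java API inside Zeppelin (the Git and S3 back ends implement it), so a real Elasticsearch back end would be written there. Purely to illustrate the shape of what ashish is asking for — a note's JSON indexed by its note id over Elasticsearch's REST API — a hypothetical sketch (the endpoint and index names are assumptions):

    import json
    import requests

    ES = "http://es-node1:9200"      # hypothetical Elasticsearch endpoint
    INDEX = "zeppelin-notebooks"     # hypothetical index name

    def save_note(note_id, note_json):
        # Index the note document under its Zeppelin note id.
        resp = requests.put("%s/%s/notebook/%s" % (ES, INDEX, note_id),
                            data=json.dumps(note_json),
                            headers={"Content-Type": "application/json"})
        resp.raise_for_status()

    def load_note(note_id):
        resp = requests.get("%s/%s/notebook/%s" % (ES, INDEX, note_id))
        resp.raise_for_status()
        return resp.json()["_source"]   # Elasticsearch wraps the document in metadata

A real back end would implement the same save/get/remove operations behind Zeppelin's Java interface rather than as a standalone script.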
>>>>>>>>>>>>> On Tue, Apr 5, 2016 at 11:17 PM, Eran Witkon <eranwit...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> For 1, you need to have both the Zeppelin web application and the Zeppelin daemon HA.
>>>>>>>>>>>>>> For 2, I guess you could use HDFS if you implement the storage interface for HDFS, but I am not sure.
>>>>>>>>>>>>>> For 3, I mean that if you connect to an external cluster, for example a Spark cluster, you need to make sure your Spark cluster is HA. Otherwise you will have Zeppelin running, but your notebooks will fail because no Spark cluster is available.
>>>>>>>>>>>>>> HTH,
>>>>>>>>>>>>>> Eran
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, 5 Apr 2016 at 20:20, ashish rawat <dceash...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks Eran for your reply.
>>>>>>>>>>>>>>> For 1), I am assuming it would be similar to HA for any other web application, i.e. running multiple instances and switching to the backup server when the master is down. Is that not the case?
>>>>>>>>>>>>>>> For 2), is it also possible to save them on HDFS?
>>>>>>>>>>>>>>> Can you please explain 3? Are you referring to interpreter config? If I am using the Spark interpreter and submitting jobs to it, and the Zeppelin master node goes down, then what could be the problem with a slave node pointing to the same cluster and submitting jobs?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Ashish
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Apr 5, 2016 at 10:08 PM, Eran Witkon <eranwit...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I would say you need to account for these things:
>>>>>>>>>>>>>>>> 1) availability of the Zeppelin daemon
>>>>>>>>>>>>>>>> 2) availability of the notebook files
>>>>>>>>>>>>>>>> 3) availability of the interpreters used
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For 1, I don't know of an out-of-the-box solution.
>>>>>>>>>>>>>>>> For 2, any HA storage will do: S3 or any HA external mounted disk.
>>>>>>>>>>>>>>>> For 3, it is up to the interpreter and your big data HA solution.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, 5 Apr 2016 at 19:29, ashish rawat <dceash...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Is there a suggested architecture for running Zeppelin in high-availability mode? The only option I could find was saving notebooks to S3. Are there any options if one is not using AWS?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>> Ashish
>>>>
>>>> --
>>>> Sent from my iThing