It's a global decision on our SMACK stack platform, but maybe we will go with Docker for applications only, for devops (clients of Spark). For Zeppelin I don't see the need (no devops).
On Apr 13, 2016 4:05 PM, "John Omernik" <j...@omernik.com> wrote:
> Is this a specific Docker decision or a Zeppelin-on-Docker decision? I am curious about the amount of network traffic Zeppelin actually generates. I could be wrong, but I made the assumption that most of the network traffic with Zeppelin is results coming back from the various endpoints (Spark, JDBC, Elasticsearch, etc.) and not heavy-lifting-type activities.
>
> John
> On Apr 12, 2016 5:03 PM, "vincent gromakowski" <vincent.gromakow...@gmail.com> wrote:
>
>> We decided not to use Docker because of network performance in production flows, not for deployment. Virtualisation of the network brings a 50% decrease in performance. It may change with Calico, because it abstracts the network with routing instead of virtualizing it like Flannel does.
>> On Apr 12, 2016 2:22 PM, "John Omernik" <j...@omernik.com> wrote:
>>
>>> On 2, I had some thoughts there. How "expensive" would it be for Zeppelin to run a timer of sorts that can be accessed via a specific URL? Basically, this URL would return the idle time. The thing that knows best whether Zeppelin has activity is Zeppelin itself. So any action within Zeppelin would reset this timer: changing notebooks, opening, closing, moving notes around, running notes, adding new notes, changing interpreter settings. Any request that is handled by Zeppelin in the UI would reset said timer. A request to the "timer" URL obviously would NOT reset the timer; basically, if nothing user-actionable was run (we'd have to separate user-actionable items from automated API requests), the timer would not get reset. This would allow those of us using Zeppelin in a multi-user/multi-tenant environment to monitor for idle instances and take action when they occur. (Ideally, we could, through an authenticated API, issue a "save" of all notebooks before taking said action...)
>>>
>>> So, to summarize:
>>>
>>> An API that provides seconds since the last human action...
>>>
>>> Monitor that API; when seconds since the last human action exceed the enterprise threshold, the monitor can issue the "safe save all" to Zeppelin, which will go ahead and do a save. (Additional point: the timer API could return both seconds since last human use and a bool value of "all saved" or not. Basically, if normal Zeppelin processes have saved all human interaction, the API could indicate that. Then, when the timer check hits the API, it knows: the seconds are past the threshold and Zeppelin reports all saved, so we can issue a termination; or, if it's not all saved, it can issue the "save all" command and wait for it to be safe. If something is keeping Zeppelin from being in a safe condition for shutdown, the API would reflect this and prevent a shutdown.)
>>>
>>> Then, once the API's seconds exceed the enterprise threshold, we can safely shut down the instance of Zeppelin, returning resources to the cluster.
>>>
>>> Would love discussion here...
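[To make the proposal concrete, here is a minimal sketch of such a monitor, assuming a hypothetical /api/idle endpoint (returning secondsIdle and an allSaved flag) and a hypothetical /api/notebook/save-all action; neither exists in Zeppelin today, which is exactly what is being proposed. The instance URL and Marathon app id are placeholders.]

import time

import requests  # third-party HTTP client (pip install requests)

ZEPPELIN = "http://zeppelin-alice.marathon.mesos:8080"  # hypothetical per-user instance
MARATHON = "http://marathon.mesos:8080"                 # Marathon REST endpoint
APP_ID = "/zeppelin-alice"                              # hypothetical Marathon app id
IDLE_THRESHOLD = 4 * 3600                               # enterprise idle threshold, seconds

def check_and_reap():
    # Hypothetical endpoint from the proposal: seconds since the last human
    # action, plus a flag saying whether every notebook change is persisted.
    status = requests.get(ZEPPELIN + "/api/idle").json()
    if status["secondsIdle"] < IDLE_THRESHOLD:
        return  # a human used this instance recently; leave it alone
    if not status["allSaved"]:
        # Also hypothetical: ask Zeppelin to persist everything, then
        # re-check on the next pass rather than killing it mid-save.
        requests.post(ZEPPELIN + "/api/notebook/save-all")
        return
    # Past the threshold and everything saved: reclaim the resources by
    # scaling the Marathon app that runs this instance down to zero.
    requests.put(MARATHON + "/v2/apps" + APP_ID, json={"instances": 0})

while True:
    check_and_reap()
    time.sleep(60)  # polling /api/idle must itself NOT reset the timer

[Scaling the Marathon app to zero instances frees the resources while keeping the app definition around for a quick relaunch.]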
>>> On Tue, Apr 12, 2016 at 1:57 AM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>>>
>>>> 1. I am using Ansible to deploy Zeppelin on all slaves and to launch a Zeppelin instance for one user. So if the Zeppelin binaries are already deployed, the launch is very quick through Marathon (1 or 2 sec). We are looking for a velocity solution (based on JFrog) on Mesos to manage binaries and artifacts with versioning, rights... No use of Docker, because of network performance constraints.
>>>>
>>>> 2. Same answer as John: it stays running. I will test dynamic resource allocation for the Spark interpreter, but the Zeppelin daemon will still be up and taking 4 GB.
>>>>
>>>> 3. I have a service discovery that authenticates the user and routes him to his instance (and only his instance). It's based right now on a simple shell script polling Marathon through its API and updating an Apache configuration file every 15 s. The username is in the Marathon task. We will replace this with a fully industrialized solution (Consul? HAProxy?...)
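[A rough Python equivalent of the shell script described in 3, assuming the username is encoded in per-user Marathon app ids like /zeppelin/alice and that Apache does the routing; the config path, stanza format, and reload command are illustrative, not Vincent's actual setup.]

import subprocess
import time

import requests

MARATHON = "http://marathon.mesos:8080"
CONF = "/etc/httpd/conf.d/zeppelin-users.conf"  # illustrative Apache config path

def render_proxy_config():
    # GET /v2/tasks lists every running task with its appId, host and ports.
    tasks = requests.get(MARATHON + "/v2/tasks",
                         headers={"Accept": "application/json"}).json()["tasks"]
    stanzas = []
    for t in tasks:
        if not t["appId"].startswith("/zeppelin/"):
            continue  # only route the per-user Zeppelin apps
        user = t["appId"].rsplit("/", 1)[-1]  # username carried in the app id
        # One reverse-proxy rule per user: /user/<name>/ -> that user's instance.
        stanzas.append("ProxyPass /user/%s/ http://%s:%d/"
                       % (user, t["host"], t["ports"][0]))
    return "\n".join(stanzas) + "\n"

previous = None
while True:
    config = render_proxy_config()
    if config != previous:  # rewrite and reload only when an instance moved
        with open(CONF, "w") as f:
            f.write(config)
        subprocess.call(["apachectl", "graceful"])  # reload without dropping clients
        previous = config
    time.sleep(15)  # same 15 s cadence as the original shell script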
>>>> 2016-04-12 2:37 GMT+02:00 Johnny W. <jzw.ser...@gmail.com>:
>>>>
>>>>> Thanks John for your insights.
>>>>>
>>>>> For 2, one solution we have experimented with is Spark dynamic resource allocation. We could define a timer to scale down. Hope that helps.
>>>>>
>>>>> J.
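[For reference, these are the Spark settings behind that suggestion, shown here through pyspark's SparkConf; the same keys can go in spark-defaults.conf or Zeppelin's Spark interpreter settings. Note that dynamic allocation requires the external shuffle service on every node, and its support on Mesos was still maturing at this time.]

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("zeppelin-user-session")
        # Grow and shrink the executor pool with the notebook's workload.
        .set("spark.dynamicAllocation.enabled", "true")
        # Dynamic allocation needs the external shuffle service so shuffle
        # data survives executors being released.
        .set("spark.shuffle.service.enabled", "true")
        # Scale all the way down while the notebook sits idle...
        .set("spark.dynamicAllocation.minExecutors", "0")
        # ...once executors have been idle for this long.
        .set("spark.dynamicAllocation.executorIdleTimeout", "60s"))

sc = SparkContext(conf=conf)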
>>>>> On Mon, Apr 11, 2016 at 4:24 PM, John Omernik <j...@omernik.com> wrote:
>>>>>
>>>>>> 1. Things launch pretty fast for me; however, it depends on whether the Docker container I am running Zeppelin in is cached on the node Mesos wants to run it on. If not, it pulls from a local Docker registry, so worst case it's up to a minute to get things running if the image isn't cached.
>>>>>>
>>>>>> 2. No, if the user logs out it stays running. Ideally I would want to set up some sort of timer that could scale down an instance if left unused. I have some ideas here, but haven't put them into practice yet. I wanted to play with Nginx to see if I could do something there (lack of activity causes Nginx to shut down Zeppelin, for example). With Spark resources, one thing I wanted to play with is fine-grained scaling with Mesos, to only use resources when queries are actually running. Lots of tools to fit the bill here; we just need to identify the right ones.
>>>>>>
>>>>>> 3. DNS resolution is handled for me with mesos-dns. Each instance has its own ID, and the DNS name auto-updates in mesos-dns based on Mesos tasks, so I always know where Zeppelin is running.
>>>>>>
>>>>>> On Monday, April 11, 2016, Johnny W. <jzw.ser...@gmail.com> wrote:
>>>>>>
>>>>>>> John & Vincent, I am interested in the per-instance-per-user approach. I have some questions about this approach:
>>>>>>> --
>>>>>>> 1. How long will it take to launch a Zeppelin instance (and initialize the SparkContext) when a user logs in?
>>>>>>> 2. Will the instance be destroyed when the user logs out? If not, how do you deal with the resources assigned to Zeppelin/SparkContext?
>>>>>>> 3. For auto-failover through Marathon, how do you deal with DNS resolution for clients?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> J.
>>>>>>>
>>>>>>> On Fri, Apr 8, 2016 at 10:09 AM, John Omernik <j...@omernik.com> wrote:
>>>>>>>
>>>>>>>> So for us, we are doing something similar to Vincent; however, instead of Gluster, we are using MapR-FS and the NFS mount. Basically, this gives us a shared filesystem that is running on all nodes, with strong security (filesystem ACEs for fine-grained permissions), built-in auditing, POSIX compliance, true random read/write (as opposed to HDFS), snapshots, and cluster-to-cluster replication. There are also some neat things with volumes and volume placement we are doing. That provides our storage layer. Then we have Docker for actually running Zeppelin, and since it's an instance per user, that helps organize who has access to what (still hashing out the details on that). Marathon on Mesos is how we ensure that Zeppelin is actually available, and then when it comes to Spark, we are just submitting to Mesos, which is right there. Since everything is on one cluster, the user has a home directory (on a volume) where I store all configs for each instance of Zeppelin, and they can also put ad-hoc data in their home directory. Spark and Apache Drill can both query anything in MapR-FS, making it a pretty powerful combination.
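[A sketch of what such a per-user Marathon app might look like, posted to Marathon's /v2/apps endpoint; the image name, volume paths, ports, and memory figure are assumptions, not John's actual setup. A nice side effect of one app per user is that mesos-dns gives each app a stable name (an app id of /zeppelin-alice resolves as zeppelin-alice.marathon.mesos), which is the DNS behaviour John mentions above.]

import requests

user = "alice"  # hypothetical user
app = {
    "id": "/zeppelin-%s" % user,  # resolvable via mesos-dns as zeppelin-alice.marathon.mesos
    "cpus": 1.0,
    "mem": 4096,        # per-instance daemon footprint (see the 4 GB figure above)
    "instances": 1,     # Marathon restarts it elsewhere if the node dies
    "container": {
        "type": "DOCKER",
        "docker": {
            "image": "registry.local:5000/zeppelin:latest",  # pulled from a local registry
            "network": "BRIDGE",
            "portMappings": [{"containerPort": 8080, "hostPort": 0}]
        },
        "volumes": [{
            # Shared filesystem (MapR-FS over NFS, or a GlusterFS FUSE mount)
            # holding the user's configs and notebooks, visible on every node.
            "hostPath": "/mapr/cluster/home/%s/zeppelin" % user,
            "containerPath": "/zeppelin/conf",
            "mode": "RW"
        }]
    },
    # Point Zeppelin's notebook storage at the shared mount.
    "env": {"ZEPPELIN_NOTEBOOK_DIR": "/zeppelin/conf/notebook"}
}

# POST creates the app; Marathon keeps one instance running from then on.
requests.post("http://marathon.mesos:8080/v2/apps", json=app)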
>>>>>>>> On Fri, Apr 8, 2016 at 6:33 AM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> We have been using it for 3 months without any incident.
>>>>>>>>> On Apr 8, 2016 9:09 AM, "ashish rawat" <dceash...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Sounds great. How long have you been using GlusterFS in prod, and have you encountered any challenges? The only difficulty for me in using it would be a lack of expertise to fix broken things, so I hope its stability isn't something to be concerned about.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Ashish
>>>>>>>>>>
>>>>>>>>>> On Fri, Apr 8, 2016 at 12:20 PM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Use the FUSE interface. The Gluster volume is directly accessible as local storage on all nodes, but performance is only 200 Mb/s. More than enough for notebooks. For data, prefer Tachyon/Alluxio on top of Gluster...
>>>>>>>>>>> On Apr 8, 2016 6:35 AM, "ashish rawat" <dceash...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks Eran and Vincent.
>>>>>>>>>>>> Eran, I would definitely like to try it out, since it won't add to the complexity of my deployment. I would look at the S3 implementation to figure out how complex it would be.
>>>>>>>>>>>>
>>>>>>>>>>>> Vincent,
>>>>>>>>>>>> I haven't explored GlusterFS at all. Would it also require writing an implementation of the storage interface, or can Zeppelin work with it out of the box?
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Ashish
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Apr 6, 2016 at 12:53 PM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> For 1, Marathon on Mesos restarts the Zeppelin daemon in case of failure.
>>>>>>>>>>>>> For 2, a GlusterFS FUSE mount allows sharing notebooks on all Mesos nodes.
>>>>>>>>>>>>> For 3, it is not available right now in our design, but a manual restart in the Zeppelin config page is acceptable for us.
>>>>>>>>>>>>> On Apr 6, 2016 8:18 AM, "Eran Witkon" <eranwit...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, this is correct.
>>>>>>>>>>>>>> For HA disk: if you don't have HA storage and no access to S3, then AFAIK you don't have any other option at the moment.
>>>>>>>>>>>>>> If you would like to save notebooks to Elastic, then I suggest you look at the storage interface and the implementations for Git and S3, and implement that yourself. It does sound like an interesting feature.
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Eran
>>>>>>>>>>>>>> On Wed, 6 Apr 2016 at 08:57 ashish rawat <dceash...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks Eran. So 3 seems to be something external to Zeppelin, and hopefully 1 only means running "zeppelin-daemon.sh start" on a slave machine when the master becomes inaccessible. Is that correct?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My main concern still remains on the storage front, and I don't really have high-availability disks or even HDFS in my setup. I have been using an Elasticsearch cluster for data high availability, but was hoping that Zeppelin could save notebooks to Elasticsearch (like Kibana does) or maybe a document store.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any idea if anything is planned in that direction? I don't want to fall back to rsync-like options.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Ashish
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Apr 5, 2016 at 11:17 PM, Eran Witkon <eranwit...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For 1, you need to have both Zeppelin web HA and Zeppelin daemon HA.
>>>>>>>>>>>>>>>> For 2, I guess you can use HDFS if you implement the storage interface for HDFS, but I am not sure.
>>>>>>>>>>>>>>>> For 3, I mean that if you connect to an external cluster, for example a Spark cluster, you need to make sure your Spark cluster is HA. Otherwise you will have Zeppelin running, but your notebook will fail as no Spark cluster is available.
>>>>>>>>>>>>>>>> HTH,
>>>>>>>>>>>>>>>> Eran
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, 5 Apr 2016 at 20:20 ashish rawat <dceash...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks Eran for your reply.
>>>>>>>>>>>>>>>>> For 1) I am assuming that it would be similar to HA of any other web application, i.e. running multiple instances and switching to the backup server when the master is down. Is that not the case?
>>>>>>>>>>>>>>>>> For 2) is it also possible to save it on HDFS?
>>>>>>>>>>>>>>>>> Can you please explain 3? Are you referring to interpreter config? If I am using the Spark interpreter and submitting jobs to it, and the Zeppelin master node goes down, then what could be the problem in the slave node pointing to the same cluster and submitting jobs?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>> Ashish
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Apr 5, 2016 at 10:08 PM, Eran Witkon <eranwit...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I would say you need to account for these things:
>>>>>>>>>>>>>>>>>> 1) availability of the Zeppelin daemon
>>>>>>>>>>>>>>>>>> 2) availability of the notebook files
>>>>>>>>>>>>>>>>>> 3) availability of the interpreters used
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> For 1, I don't know of an out-of-the-box solution.
>>>>>>>>>>>>>>>>>> For 2, any HA storage will do: S3 or any HA externally mounted disk.
>>>>>>>>>>>>>>>>>> For 3, it is up to the interpreter and your big data HA solution.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, 5 Apr 2016 at 19:29 ashish rawat <dceash...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Is there a suggested architecture to run Zeppelin in high-availability mode? The only option I could find was saving notebooks to S3. Are there any options if one is not using AWS?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>> Ashish
>>>>>>
>>>>>> --
>>>>>> Sent from my iThing