It's a global decision on our SMACK stack platform, but maybe we will go with Docker for applications only, for devops (clients of Spark). For Zeppelin I don't see the need (no devops).
On Apr 13, 2016 4:05 PM, "John Omernik" <j...@omernik.com> wrote:
> Is this a specific Docker decision or a Zeppelin-on-Docker decision? I am curious about the amount of network traffic Zeppelin actually generates. I could be wrong, but I made the assumption that most of the network traffic with Zeppelin is results coming back from the various endpoints (Spark, JDBC, Elasticsearch, etc.) and not heavy-lifting-type activities.
>
> John
> On Apr 12, 2016 5:03 PM, "vincent gromakowski" <vincent.gromakow...@gmail.com> wrote:
>
>> We decided not to use Docker because of network performance in production flows, not for deployment. Virtualisation of the network brings a 50% decrease in performance. It may change with Calico, because it abstracts the network with routing instead of virtualizing it like Flannel does.
>> On Apr 12, 2016 2:22 PM, "John Omernik" <j...@omernik.com> wrote:
>>
>>> On 2, I had some thoughts there. How "expensive" would it be for Zeppelin to run a timer of sorts that can be accessed via a specific URL? Basically, this URL would return the idle time. The thing that knows best whether Zeppelin has activity is Zeppelin itself. So any action within Zeppelin would reset this timer: changing notebooks, opening, closing, moving notes around, running notes, adding new notes, changing interpreter settings. Any request that is handled by Zeppelin in the UI would reset said timer. A request to the "timer" URL obviously would NOT reset the timer; basically, if nothing user-actionable was run (we'd have to separate user-actionable items from automated API requests), the timer would not get reset. This would allow those of us using Zeppelin in a multi-user/multi-tenant environment to monitor for idle instances and take action when they occur. (Ideally, we could, through an authenticated API, issue a "save" of all notebooks before taking said action...)
>>>
>>> So, to summarize:
>>>
>>> An API that provides seconds since the last human action...
>>>
>>> Monitor that API; when seconds since the last human action exceed the enterprise threshold, the monitor can issue the "safe save all" to Zeppelin, which will go ahead and do a save. (Additional point: the timer API could return both seconds since last human use and a bool value of "all saved" or not. Basically, if normal Zeppelin processes have saved all human interaction, the API could indicate that. Then, when the timer check hits the API, it knows: the seconds are past the threshold and Zeppelin reports all saved, so we can issue a termination; or, if it's not all saved, it can issue the "save all" command and wait for it to be safe. If something is keeping Zeppelin from being in a safe condition for shutdown, the API would reflect this and prevent a shutdown.)
>>>
>>> Then, once the API's seconds exceed the enterprise threshold, we can safely shut down the instance of Zeppelin, returning resources to the cluster.
>>>
>>> Would love discussion here...
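[To make the proposal concrete, here is a minimal sketch of such a monitor, assuming a hypothetical /api/idle endpoint (returning secondsIdle and an allSaved flag) and a hypothetical /api/notebook/save-all action; neither exists in Zeppelin today, which is exactly what is being proposed. The instance URL and Marathon app id are placeholders.]

import time

import requests  # third-party HTTP client (pip install requests)

ZEPPELIN = "http://zeppelin-alice.marathon.mesos:8080"  # hypothetical per-user instance
MARATHON = "http://marathon.mesos:8080"                 # Marathon REST endpoint
APP_ID = "/zeppelin-alice"                              # hypothetical Marathon app id
IDLE_THRESHOLD = 4 * 3600                               # enterprise idle threshold, seconds

def check_and_reap():
    # Hypothetical endpoint from the proposal: seconds since the last human
    # action, plus a flag saying whether every notebook change is persisted.
    status = requests.get(ZEPPELIN + "/api/idle").json()
    if status["secondsIdle"] < IDLE_THRESHOLD:
        return  # a human used this instance recently; leave it alone
    if not status["allSaved"]:
        # Also hypothetical: ask Zeppelin to persist everything, then
        # re-check on the next pass rather than killing it mid-save.
        requests.post(ZEPPELIN + "/api/notebook/save-all")
        return
    # Past the threshold and everything saved: reclaim the resources by
    # scaling the Marathon app that runs this instance down to zero.
    requests.put(MARATHON + "/v2/apps" + APP_ID, json={"instances": 0})

while True:
    check_and_reap()
    time.sleep(60)  # polling /api/idle must itself NOT reset the timer

[Scaling the Marathon app to zero instances frees the resources while keeping the app definition around for a quick relaunch.]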
>>> On Tue, Apr 12, 2016 at 1:57 AM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>>>
>>>> 1. I am using Ansible to deploy Zeppelin on all slaves and to launch a Zeppelin instance for one user. So if the Zeppelin binaries are already deployed, the launch is very quick through Marathon (1 or 2 sec). We are looking for a velocity solution (based on JFrog) on Mesos to manage binaries and artifacts with versioning, rights... No use of Docker, because of network performance constraints.
>>>>
>>>> 2. Same answer as John: it stays running. I will test dynamic resource allocation for the Spark interpreter, but the Zeppelin daemon will still be up and taking 4 GB.
>>>>
>>>> 3. I have a service discovery that authenticates the user and routes him to his instance (and only his instance). It's based right now on a simple shell script polling Marathon through its API and updating an Apache configuration file every 15 s. The username is in the Marathon task. We will replace this with a fully industrialized solution (Consul? HAProxy?...)
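[A rough Python equivalent of the shell script described in 3, assuming the username is encoded in per-user Marathon app ids like /zeppelin/alice and that Apache does the routing; the config path, stanza format, and reload command are illustrative, not Vincent's actual setup.]

import subprocess
import time

import requests

MARATHON = "http://marathon.mesos:8080"
CONF = "/etc/httpd/conf.d/zeppelin-users.conf"  # illustrative Apache config path

def render_proxy_config():
    # GET /v2/tasks lists every running task with its appId, host and ports.
    tasks = requests.get(MARATHON + "/v2/tasks",
                         headers={"Accept": "application/json"}).json()["tasks"]
    stanzas = []
    for t in tasks:
        if not t["appId"].startswith("/zeppelin/"):
            continue  # only route the per-user Zeppelin apps
        user = t["appId"].rsplit("/", 1)[-1]  # username carried in the app id
        # One reverse-proxy rule per user: /user/<name>/ -> that user's instance.
        stanzas.append("ProxyPass /user/%s/ http://%s:%d/"
                       % (user, t["host"], t["ports"][0]))
    return "\n".join(stanzas) + "\n"

previous = None
while True:
    config = render_proxy_config()
    if config != previous:  # rewrite and reload only when an instance moved
        with open(CONF, "w") as f:
            f.write(config)
        subprocess.call(["apachectl", "graceful"])  # reload without dropping clients
        previous = config
    time.sleep(15)  # same 15 s cadence as the original shell script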
>>>> 2016-04-12 2:37 GMT+02:00 Johnny W. <jzw.ser...@gmail.com>:
>>>>
>>>>> Thanks John for your insights.
>>>>>
>>>>> For 2, one solution we have experimented with is Spark dynamic resource allocation. We could define a timer to scale down. Hope that helps.
>>>>>
>>>>> J.
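[For reference, these are the Spark settings behind that suggestion, shown here through pyspark's SparkConf; the same keys can go in spark-defaults.conf or Zeppelin's Spark interpreter settings. Note that dynamic allocation requires the external shuffle service on every node, and its support on Mesos was still maturing at this time.]

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("zeppelin-user-session")
        # Grow and shrink the executor pool with the notebook's workload.
        .set("spark.dynamicAllocation.enabled", "true")
        # Dynamic allocation needs the external shuffle service so shuffle
        # data survives executors being released.
        .set("spark.shuffle.service.enabled", "true")
        # Scale all the way down while the notebook sits idle...
        .set("spark.dynamicAllocation.minExecutors", "0")
        # ...once executors have been idle for this long.
        .set("spark.dynamicAllocation.executorIdleTimeout", "60s"))

sc = SparkContext(conf=conf)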
>>>>> On Mon, Apr 11, 2016 at 4:24 PM, John Omernik <j...@omernik.com> wrote:
>>>>>
>>>>>> 1. Things launch pretty fast for me; however, it depends on whether the Docker container I am running Zeppelin in is cached on the node Mesos wants to run it on. If not, it pulls from a local Docker registry, so worst case it's up to a minute to get things running if the image isn't cached.
>>>>>>
>>>>>> 2. No, if the user logs out it stays running. Ideally I would want to set up some sort of timer that could scale down an instance if left unused. I have some ideas here, but haven't put them into practice yet. I wanted to play with Nginx to see if I could do something there (lack of activity causes Nginx to shut down Zeppelin, for example). With Spark resources, one thing I wanted to play with is fine-grained scaling with Mesos, to only use resources when queries are actually running. Lots of tools to fit the bill here; we just need to identify the right ones.
>>>>>>
>>>>>> 3. DNS resolution is handled for me with mesos-dns. Each instance has its own ID, and the DNS name auto-updates in mesos-dns based on Mesos tasks, so I always know where Zeppelin is running.
>>>>>>
>>>>>> On Monday, April 11, 2016, Johnny W. <jzw.ser...@gmail.com> wrote:
>>>>>>
>>>>>>> John & Vincent, I am interested in the per-instance-per-user approach. I have some questions about this approach:
>>>>>>> --
>>>>>>> 1. How long will it take to launch a Zeppelin instance (and initialize the SparkContext) when a user logs in?
>>>>>>> 2. Will the instance be destroyed when the user logs out? If not, how do you deal with the resources assigned to Zeppelin/SparkContext?
>>>>>>> 3. For auto-failover through Marathon, how do you deal with DNS resolution for clients?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> J.
>>>>>>>
>>>>>>> On Fri, Apr 8, 2016 at 10:09 AM, John Omernik <j...@omernik.com> wrote:
>>>>>>>
>>>>>>>> So for us, we are doing something similar to Vincent; however, instead of Gluster, we are using MapR-FS and the NFS mount. Basically, this gives us a shared filesystem that is running on all nodes, with strong security (filesystem ACEs for fine-grained permissions), built-in auditing, POSIX compliance, true random read/write (as opposed to HDFS), snapshots, and cluster-to-cluster replication. There are also some neat things with volumes and volume placement we are doing. That provides our storage layer. Then we have Docker for actually running Zeppelin, and since it's an instance per user, that helps organize who has access to what (still hashing out the details on that). Marathon on Mesos is how we ensure that Zeppelin is actually available, and then when it comes to Spark, we are just submitting to Mesos, which is right there. Since everything is on one cluster, the user has a home directory (on a volume) where I store all configs for each instance of Zeppelin, and they can also put ad-hoc data in their home directory. Spark and Apache Drill can both query anything in MapR-FS, making it a pretty powerful combination.
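[A sketch of what such a per-user Marathon app might look like, posted to Marathon's /v2/apps endpoint; the image name, volume paths, ports, and memory figure are assumptions, not John's actual setup. A nice side effect of one app per user is that mesos-dns gives each app a stable name (an app id of /zeppelin-alice resolves as zeppelin-alice.marathon.mesos), which is the DNS behaviour John mentions above.]

import requests

user = "alice"  # hypothetical user
app = {
    "id": "/zeppelin-%s" % user,  # resolvable via mesos-dns as zeppelin-alice.marathon.mesos
    "cpus": 1.0,
    "mem": 4096,        # per-instance daemon footprint (see the 4 GB figure above)
    "instances": 1,     # Marathon restarts it elsewhere if the node dies
    "container": {
        "type": "DOCKER",
        "docker": {
            "image": "registry.local:5000/zeppelin:latest",  # pulled from a local registry
            "network": "BRIDGE",
            "portMappings": [{"containerPort": 8080, "hostPort": 0}]
        },
        "volumes": [{
            # Shared filesystem (MapR-FS over NFS, or a GlusterFS FUSE mount)
            # holding the user's configs and notebooks, visible on every node.
            "hostPath": "/mapr/cluster/home/%s/zeppelin" % user,
            "containerPath": "/zeppelin/conf",
            "mode": "RW"
        }]
    },
    # Point Zeppelin's notebook storage at the shared mount.
    "env": {"ZEPPELIN_NOTEBOOK_DIR": "/zeppelin/conf/notebook"}
}

# POST creates the app; Marathon keeps one instance running from then on.
requests.post("http://marathon.mesos:8080/v2/apps", json=app)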
>>>>>>>> On Fri, Apr 8, 2016 at 6:33 AM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> We have been using it for 3 months without any incident.
>>>>>>>>> On Apr 8, 2016 9:09 AM, "ashish rawat" <dceash...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Sounds great. How long have you been using GlusterFS in prod, and have you encountered any challenges? The only difficulty for me in using it would be a lack of expertise to fix broken things, so I hope its stability isn't something to be concerned about.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Ashish
>>>>>>>>>>
>>>>>>>>>> On Fri, Apr 8, 2016 at 12:20 PM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Use the FUSE interface. The Gluster volume is directly accessible as local storage on all nodes, but performance is only 200 Mb/s. More than enough for notebooks. For data, prefer Tachyon/Alluxio on top of Gluster...
>>>>>>>>>>> On Apr 8, 2016 6:35 AM, "ashish rawat" <dceash...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks Eran and Vincent.
>>>>>>>>>>>> Eran, I would definitely like to try it out, since it won't add to the complexity of my deployment. I would look at the S3 implementation to figure out how complex it would be.
>>>>>>>>>>>>
>>>>>>>>>>>> Vincent,
>>>>>>>>>>>> I haven't explored GlusterFS at all. Would it also require writing an implementation of the storage interface, or can Zeppelin work with it out of the box?
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Ashish
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Apr 6, 2016 at 12:53 PM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> For 1, Marathon on Mesos restarts the Zeppelin daemon in case of failure.
>>>>>>>>>>>>> For 2, a GlusterFS FUSE mount allows sharing notebooks on all Mesos nodes.
>>>>>>>>>>>>> For 3, it is not available right now in our design, but a manual restart in the Zeppelin config page is acceptable for us.
>>>>>>>>>>>>> On Apr 6, 2016 8:18 AM, "Eran Witkon" <eranwit...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, this is correct.
>>>>>>>>>>>>>> For HA disk: if you don't have HA storage and no access to S3, then AFAIK you don't have any other option at the moment.
>>>>>>>>>>>>>> If you would like to save notebooks to Elastic, then I suggest you look at the storage interface and the implementations for Git and S3, and implement that yourself. It does sound like an interesting feature.
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Eran
>>>>>>>>>>>>>> On Wed, 6 Apr 2016 at 08:57 ashish rawat <dceash...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks Eran. So 3 seems to be something external to Zeppelin, and hopefully 1 only means running "zeppelin-daemon.sh start" on a slave machine when the master becomes inaccessible. Is that correct?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My main concern still remains on the storage front, and I don't really have high-availability disks or even HDFS in my setup. I have been using an Elasticsearch cluster for data high availability, but was hoping that Zeppelin could save notebooks to Elasticsearch (like Kibana does) or maybe a document store.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any idea if anything is planned in that direction? I don't want to fall back to rsync-like options.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Ashish
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Apr 5, 2016 at 11:17 PM, Eran Witkon <eranwit...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For 1, you need to have both Zeppelin web HA and Zeppelin daemon HA.
>>>>>>>>>>>>>>>> For 2, I guess you can use HDFS if you implement the storage interface for HDFS, but I am not sure.
>>>>>>>>>>>>>>>> For 3, I mean that if you connect to an external cluster, for example a Spark cluster, you need to make sure your Spark cluster is HA. Otherwise you will have Zeppelin running, but your notebook will fail as no Spark cluster is available.
>>>>>>>>>>>>>>>> HTH,
>>>>>>>>>>>>>>>> Eran
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, 5 Apr 2016 at 20:20 ashish rawat <dceash...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks Eran for your reply.
>>>>>>>>>>>>>>>>> For 1) I am assuming that it would be similar to HA of any other web application, i.e. running multiple instances and switching to the backup server when the master is down. Is that not the case?
>>>>>>>>>>>>>>>>> For 2) is it also possible to save it on HDFS?
>>>>>>>>>>>>>>>>> Can you please explain 3? Are you referring to interpreter config? If I am using the Spark interpreter and submitting jobs to it, and the Zeppelin master node goes down, then what could be the problem in the slave node pointing to the same cluster and submitting jobs?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>> Ashish
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Apr 5, 2016 at 10:08 PM, Eran Witkon <eranwit...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I would say you need to account for these things:
>>>>>>>>>>>>>>>>>> 1) availability of the Zeppelin daemon
>>>>>>>>>>>>>>>>>> 2) availability of the notebook files
>>>>>>>>>>>>>>>>>> 3) availability of the interpreters used
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> For 1, I don't know of an out-of-the-box solution.
>>>>>>>>>>>>>>>>>> For 2, any HA storage will do: S3 or any HA externally mounted disk.
>>>>>>>>>>>>>>>>>> For 3, it is up to the interpreter and your big data HA solution.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, 5 Apr 2016 at 19:29 ashish rawat <dceash...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Is there a suggested architecture to run Zeppelin in high-availability mode? The only option I could find was saving notebooks to S3. Are there any options if one is not using AWS?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>> Ashish
>>>>>>
>>>>>> --
>>>>>> Sent from my iThing