We decided not to use Docker because of network performance in production flows, not because of deployment. Virtualizing the network brings roughly a 50% decrease in performance. This may change with Calico, because it abstracts the network with routing instead of virtualizing it the way Flannel does.

On 12 Apr 2016 at 2:22 PM, "John Omernik" <j...@omernik.com> wrote:
> On 2, I had some thoughts there. How "expensive" would it be for Zeppelin to run a timer of sorts that can be accessed via a specific URL? Basically, this URL would return the idle time. The thing that knows best whether Zeppelin has activity is Zeppelin itself, so any action within Zeppelin would reset this timer: changing notebooks, opening, closing, or moving notes around, running notes, adding new notes, changing interpreter settings. Any request handled by Zeppelin in the UI would reset the timer. A request to the "timer" URL obviously would NOT reset it; basically, if nothing user-actionable was run (we'd have to separate user-actionable items from automated API requests), the timer would not get reset. This would let those of us using Zeppelin in a multi-user/multi-tenant environment monitor for idle instances and take action when they occur. (Ideally, we could issue a "save" of all notebooks through an authenticated API before taking said action...)
>
> So, to summarize:
>
> An API that reports the seconds since the last human action.
>
> We monitor that API, and when the seconds since the last human action exceed an enterprise threshold, we issue a "safe save all" to Zeppelin, which goes ahead and does a save. (Additional point: the timer API could return both the seconds since last human use and a boolean "all saved" flag. If normal Zeppelin processes have already saved all human interaction, the API could indicate that; then, when the timer check hits the API, it knows the idle time has passed the threshold and Zeppelin reports all saved, so it can issue a termination. If not all saved, it can issue the "save all" command and wait for everything to be saved. If something is keeping Zeppelin from being in a safe condition for shutdown, the API would reflect this and prevent a shutdown.)
>
> Then, once the idle seconds exceed the enterprise threshold, we can safely shut down the instance of Zeppelin, returning resources to the cluster.
>
> Would love discussion here...
>
> On Tue, Apr 12, 2016 at 1:57 AM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>
>> 1. I am using Ansible to deploy Zeppelin on all slaves and to launch a Zeppelin instance for one user. So if the Zeppelin binaries are already deployed, the launch through Marathon is very quick (1 or 2 seconds). Looking for a velocity solution (based on JFrog) on Mesos to manage binaries and artifacts with versioning, rights, etc. No use of Docker, because of network performance constraints.
>>
>> 2. Same answer as John: still running. I will test dynamic resource allocation for the Spark interpreter, but the Zeppelin daemon will still be up and taking 4 GB.
>>
>> 3. I have a service discovery layer that authenticates the user and routes him to his instance (and only his instance). Right now it is based on a simple shell script polling Marathon through its API and updating an Apache configuration file every 15 s. The username is in the Marathon task. We will replace this with a fully industrialized solution (Consul? HAProxy? ...).
>>
>> On 2016-04-12 at 02:37 GMT+02:00, Johnny W. <jzw.ser...@gmail.com> wrote:
>>
>>> Thanks John for your insights.
>>>
>>> For 2., one solution we have experimented with is Spark dynamic resource allocation. We could define a timer to scale down. Hope that helps.
>>>
>>> J.
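For concreteness, here is a minimal sketch of the dynamic-allocation settings Johnny is referring to, expressed as a PySpark SparkConf (in Zeppelin these would normally be set in the Spark interpreter settings instead; the executor count and timeout below are illustrative values, not ones given in the thread):

    from pyspark import SparkConf

    # Sketch only: on Mesos, dynamic allocation also needs the external
    # shuffle service running on each agent. Values are illustrative.
    conf = (
        SparkConf()
        .setAppName("zeppelin-per-user")
        .set("spark.dynamicAllocation.enabled", "true")
        .set("spark.shuffle.service.enabled", "true")
        .set("spark.dynamicAllocation.minExecutors", "0")
        .set("spark.dynamicAllocation.executorIdleTimeout", "300s")
    )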
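And a minimal sketch of the idle reaper John proposes at the top of this message. The /api/idle and /api/save-all endpoints are hypothetical stand-ins for the API being discussed (Zeppelin does not expose them today), while the scale-down call uses Marathon's real /v2/apps endpoint:

    import time
    import requests

    ZEPPELIN = "http://zeppelin-alice.marathon.mesos:8080"  # hypothetical per-user instance
    MARATHON = "http://marathon.mesos:8080"
    APP_ID = "/zeppelin-alice"        # hypothetical Marathon app id for this instance
    IDLE_THRESHOLD = 3600             # enterprise idle threshold, in seconds

    def check_and_reap():
        # Hypothetical endpoint returning {"idle_seconds": ..., "all_saved": ...}
        status = requests.get(ZEPPELIN + "/api/idle").json()
        if status["idle_seconds"] < IDLE_THRESHOLD:
            return                    # a human was active recently
        if not status["all_saved"]:
            # Hypothetical "safe save all"; re-check on the next pass.
            requests.post(ZEPPELIN + "/api/save-all")
            return
        # Idle past the threshold and everything saved: scale the app to zero.
        requests.put(MARATHON + "/v2/apps" + APP_ID, json={"instances": 0})

    while True:
        check_and_reap()
        time.sleep(60)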
>>> On Mon, Apr 11, 2016 at 4:24 PM, John Omernik <j...@omernik.com> wrote:
>>>
>>>> 1. Things launch pretty fast for me; however, it depends on whether the Docker container I am running Zeppelin in is cached on the node Mesos wants to run it on. If not, it pulls from a local Docker registry, so worst case it takes up to a minute to get things running if the image isn't cached.
>>>>
>>>> 2. No, if the user logs out it stays running. Ideally I would want to set up some sort of timer that could scale down an instance if left unused. I have some ideas here but haven't put them into practice yet. I wanted to play with Nginx to see if I could do something there (lack of activity causes Nginx to shut down Zeppelin, for example). With Spark resources, one thing I wanted to play with is fine-grained scaling with Mesos, to only use resources when queries are actually running. Lots of tools could fit the bill here; we just need to identify the right ones.
>>>>
>>>> 3. DNS resolution is handled for me with Mesos-DNS. Each instance has its own ID, and the DNS name auto-updates in Mesos-DNS based on Mesos tasks, so I always know where Zeppelin is running.
>>>>
>>>> On Monday, April 11, 2016, Johnny W. <jzw.ser...@gmail.com> wrote:
>>>>
>>>>> John & Vincent, I am interested in the per-instance-per-user approach. I have some questions about it:
>>>>> 1. How long will it take to launch a Zeppelin instance (and initialize the SparkContext) when a user logs in?
>>>>> 2. Will the instance be destroyed when the user logs out? If not, how do you deal with the resources assigned to Zeppelin/the SparkContext?
>>>>> 3. For auto failover through Marathon, how do you deal with DNS resolution for clients?
>>>>>
>>>>> Thanks!
>>>>> J.
>>>>>
>>>>> On Fri, Apr 8, 2016 at 10:09 AM, John Omernik <j...@omernik.com> wrote:
>>>>>
>>>>>> So for us, we are doing something similar to Vincent; however, instead of Gluster we are using MapR-FS and its NFS mount. Basically, this gives us a shared filesystem running on all nodes, with strong security (filesystem ACEs for fine-grained permissions), built-in auditing, POSIX compliance, true random read/write (as opposed to HDFS), snapshots, and cluster-to-cluster replication. There are also some neat things we are doing with volumes and volume placement. That provides our storage layer. Then we have Docker for actually running Zeppelin, and since it's an instance per user, that helps organize who has access to what (still hashing out the details on that). Marathon on Mesos is how we ensure that Zeppelin is actually available, and when it comes to Spark, we are just submitting to Mesos, which is right there. Since everything is on one cluster, each user has a home directory (on a volume) where I store all the configs for each instance of Zeppelin, and they can also put ad hoc data in their home directory. Spark and Apache Drill can both query anything in MapR-FS, making it a pretty powerful combination.
>>>>>>
>>>>>> On Fri, Apr 8, 2016 at 6:33 AM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>>>>>>
>>>>>>> Using it for 3 months without any incident.
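For the routing piece, here is a rough Python equivalent of the discovery script vincent describes earlier in the thread (poll Marathon, map each user to their instance, rewrite the routing configuration every 15 s). The /zeppelin-<user> app-id convention and the output file are assumptions made for illustration; vincent's actual script writes an Apache configuration and reads the username from the Marathon task:

    import time
    import requests

    MARATHON = "http://marathon.mesos:8080"
    ROUTES_FILE = "/etc/zeppelin-proxy/routes.map"   # assumed output path

    def render_routes():
        # GET /v2/tasks lists every running Marathon task as JSON.
        resp = requests.get(MARATHON + "/v2/tasks",
                            headers={"Accept": "application/json"})
        lines = []
        for task in resp.json()["tasks"]:
            if not task["appId"].startswith("/zeppelin-"):
                continue  # not a per-user Zeppelin instance
            # Assumption: the app id encodes the user, e.g. /zeppelin-alice -> alice.
            user = task["appId"].rsplit("-", 1)[-1]
            lines.append("%s %s:%d" % (user, task["host"], task["ports"][0]))
        return "\n".join(lines) + "\n"

    while True:
        with open(ROUTES_FILE, "w") as f:
            f.write(render_routes())
        time.sleep(15)  # same cadence as vincent's shell script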
>>>>>>> On 8 Apr 2016 at 9:09 AM, "ashish rawat" <dceash...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Sounds great. How long have you been using GlusterFS in prod, and have you encountered any challenges? The only difficulty for me in using it would be a lack of expertise to fix broken things, so I hope its stability isn't something to be concerned about.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Ashish
>>>>>>>>
>>>>>>>> On Fri, Apr 8, 2016 at 12:20 PM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> We use the FUSE interface. The Gluster volume is directly accessible as local storage on all nodes, but performance is only 200 Mb/s. More than enough for notebooks. For data, prefer Tachyon/Alluxio on top of Gluster...
>>>>>>>>>
>>>>>>>>> On 8 Apr 2016 at 6:35 AM, "ashish rawat" <dceash...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Eran and Vincent.
>>>>>>>>>> Eran, I would definitely like to try it out, since it won't add to the complexity of my deployment. I will look at the S3 implementation to figure out how complex it would be.
>>>>>>>>>>
>>>>>>>>>> Vincent, I haven't explored GlusterFS at all. Would it also require writing an implementation of the storage interface, or can Zeppelin work with it out of the box?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Ashish
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 6, 2016 at 12:53 PM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> For 1, Marathon on Mesos restarts the Zeppelin daemon in case of failure.
>>>>>>>>>>> For 2, a GlusterFS FUSE mount allows sharing notebooks across all Mesos nodes.
>>>>>>>>>>> For 3, it is not available right now in our design, but a manual restart in the Zeppelin config page is acceptable for us.
>>>>>>>>>>>
>>>>>>>>>>> On 6 Apr 2016 at 8:18 AM, "Eran Witkon" <eranwit...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Yes, this is correct.
>>>>>>>>>>>> Regarding HA disk: if you don't have HA storage and no access to S3, then AFAIK you don't have another option at the moment.
>>>>>>>>>>>> If you would like to save notebooks to Elastic, then I suggest you look at the storage interface and the implementations for Git and S3, and implement that yourself. It does sound like an interesting feature.
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Eran
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, 6 Apr 2016 at 08:57, ashish rawat <dceash...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks Eran. So 3 seems to be something external to Zeppelin, and hopefully 1 only means running "zeppelin-daemon.sh start" on a slave machine when the master becomes inaccessible. Is that correct?
>>>>>>>>>>>>>
>>>>>>>>>>>>> My main concern still remains on the storage front, and I don't really have high-availability disks or even HDFS in my setup. I have been using an Elasticsearch cluster for data high availability, but was hoping that Zeppelin could save notebooks to Elasticsearch (like Kibana) or maybe to a document store.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any idea if anything is planned in that direction? I don't want to fall back to rsync-like options.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Ashish
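The notebook storage interface Eran mentions is a Java API inside Zeppelin (the Git and S3 back ends implement it), so a real Elasticsearch back end would be written there. Purely to illustrate the shape of what ashish is asking for — a note's JSON indexed by its note id over Elasticsearch's REST API — a hypothetical sketch (the endpoint and index names are assumptions):

    import json
    import requests

    ES = "http://es-node1:9200"      # hypothetical Elasticsearch endpoint
    INDEX = "zeppelin-notebooks"     # hypothetical index name

    def save_note(note_id, note_json):
        # Index the note document under its Zeppelin note id.
        resp = requests.put("%s/%s/notebook/%s" % (ES, INDEX, note_id),
                            data=json.dumps(note_json),
                            headers={"Content-Type": "application/json"})
        resp.raise_for_status()

    def load_note(note_id):
        resp = requests.get("%s/%s/notebook/%s" % (ES, INDEX, note_id))
        resp.raise_for_status()
        return resp.json()["_source"]   # Elasticsearch wraps the document in metadata

A real back end would implement the same save/get/remove operations behind Zeppelin's Java interface rather than as a standalone script.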
>>>>>>>>>>>>> On Tue, Apr 5, 2016 at 11:17 PM, Eran Witkon <eranwit...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> For 1, you need to have both the Zeppelin web application and the Zeppelin daemon HA.
>>>>>>>>>>>>>> For 2, I guess you could use HDFS if you implement the storage interface for HDFS, but I am not sure.
>>>>>>>>>>>>>> For 3, I mean that if you connect to an external cluster, for example a Spark cluster, you need to make sure your Spark cluster is HA. Otherwise you will have Zeppelin running, but your notebooks will fail because no Spark cluster is available.
>>>>>>>>>>>>>> HTH,
>>>>>>>>>>>>>> Eran
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, 5 Apr 2016 at 20:20, ashish rawat <dceash...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks Eran for your reply.
>>>>>>>>>>>>>>> For 1), I am assuming it would be similar to HA for any other web application, i.e. running multiple instances and switching to the backup server when the master is down. Is that not the case?
>>>>>>>>>>>>>>> For 2), is it also possible to save them on HDFS?
>>>>>>>>>>>>>>> Can you please explain 3? Are you referring to interpreter config? If I am using the Spark interpreter and submitting jobs to it, and the Zeppelin master node goes down, then what could be the problem with a slave node pointing to the same cluster and submitting jobs?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Ashish
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Apr 5, 2016 at 10:08 PM, Eran Witkon <eranwit...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I would say you need to account for these things:
>>>>>>>>>>>>>>>> 1) availability of the Zeppelin daemon
>>>>>>>>>>>>>>>> 2) availability of the notebook files
>>>>>>>>>>>>>>>> 3) availability of the interpreters used
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For 1, I don't know of an out-of-the-box solution.
>>>>>>>>>>>>>>>> For 2, any HA storage will do: S3 or any HA external mounted disk.
>>>>>>>>>>>>>>>> For 3, it is up to the interpreter and your big data HA solution.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, 5 Apr 2016 at 19:29, ashish rawat <dceash...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Is there a suggested architecture for running Zeppelin in high-availability mode? The only option I could find was saving notebooks to S3. Are there any options if one is not using AWS?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>> Ashish
>>>>
>>>> --
>>>> Sent from my iThing