Thanks, Vincent and John, for providing these viable options.

On Fri, Apr 8, 2016 at 10:39 PM, John Omernik <j...@omernik.com> wrote:
> So for us, we are doing something similar to Vincent; however, instead of Gluster, we are using MapR-FS and the NFS mount. Basically, this gives us a shared filesystem running on all nodes, with strong security (filesystem ACEs for fine-grained permissions), built-in auditing, POSIX compliance, true random read/write (as opposed to HDFS), snapshots, and cluster-to-cluster replication. There are also some neat things we are doing with volumes and volume placement. That provides our storage layer.
>
> Then we have Docker for actually running Zeppelin, and since it's an instance per user, that helps organize who has access to what (still hashing out the details on that). Marathon on Mesos is how we ensure that Zeppelin is actually available, and when it comes to Spark, we are just submitting to Mesos, which is right there. Since everything is on one cluster, each user has a home directory (on a volume) where I store all the configs for each instance of Zeppelin, and they can also put ad hoc data in their home directory. Spark and Apache Drill can both query anything in MapR-FS, making it a pretty powerful combination.
>
> On Fri, Apr 8, 2016 at 6:33 AM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>
>> Using it for 3 months without any incident.
>>
>> On 8 Apr 2016 at 9:09 AM, "ashish rawat" <dceash...@gmail.com> wrote:
>>
>>> Sounds great. How long have you been using GlusterFS in prod, and have you encountered any challenges? The only difficulty for me in using it would be a lack of expertise to fix broken things, so I hope its stability isn't something to be concerned about.
>>>
>>> Regards,
>>> Ashish
>>>
>>> On Fri, Apr 8, 2016 at 12:20 PM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>>>
>>>> Use the FUSE interface. The Gluster volume is directly accessible as local storage on all nodes, but performance is only 200 Mb/s. More than enough for notebooks.
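The FUSE setup Vincent describes might be sketched as follows. This is a config sketch, not a tested recipe: the server name `gluster1`, volume name `notebooks_vol`, and mount point are placeholders, and it assumes the GlusterFS FUSE client is installed on each node.

```shell
# Mount the Gluster volume via the FUSE client (repeat on every node).
# "gluster1" and "notebooks_vol" are hypothetical names for this sketch.
sudo mkdir -p /mnt/gluster/notebooks
sudo mount -t glusterfs gluster1:/notebooks_vol /mnt/gluster/notebooks

# Point Zeppelin's notebook storage at the shared mount, e.g. in
# conf/zeppelin-env.sh, so every node sees the same notebook files:
export ZEPPELIN_NOTEBOOK_DIR=/mnt/gluster/notebooks
```

With the notebook directory on the shared mount, a Zeppelin daemon started on any node picks up the same notebooks, which is what makes the failover discussed below possible.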
>>>> For data, prefer Tachyon/Alluxio on top of Gluster...
>>>>
>>>> On 8 Apr 2016 at 6:35 AM, "ashish rawat" <dceash...@gmail.com> wrote:
>>>>
>>>>> Thanks, Eran and Vincent.
>>>>>
>>>>> Eran, I would definitely like to try it out, since it won't add to the complexity of my deployment. I will look at the S3 implementation to figure out how complex it would be.
>>>>>
>>>>> Vincent, I haven't explored GlusterFS at all. Would it also require writing an implementation of the storage interface, or can Zeppelin work with it out of the box?
>>>>>
>>>>> Regards,
>>>>> Ashish
>>>>>
>>>>> On Wed, Apr 6, 2016 at 12:53 PM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>>>>>
>>>>>> For 1, Marathon on Mesos restarts the Zeppelin daemon in case of failure.
>>>>>> For 2, a GlusterFS FUSE mount allows sharing notebooks across all Mesos nodes.
>>>>>> For 3, it is not available right now in our design, but a manual restart in the Zeppelin config page is acceptable for us.
>>>>>>
>>>>>> On 6 Apr 2016 at 8:18 AM, "Eran Witkon" <eranwit...@gmail.com> wrote:
>>>>>>
>>>>>>> Yes, this is correct.
>>>>>>>
>>>>>>> For an HA disk: if you don't have HA storage and no access to S3, then AFAIK you don't have another option at the moment. If you would like to save notebooks to Elastic, then I suggest you look at the storage interface and the implementations for git and S3, and implement that yourself. It does sound like an interesting feature.
>>>>>>>
>>>>>>> Best,
>>>>>>> Eran
>>>>>>>
>>>>>>> On Wed, 6 Apr 2016 at 08:57 ashish rawat <dceash...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks, Eran. So 3 seems to be something external to Zeppelin, and hopefully 1 only means running "zeppelin-daemon.sh start" on a slave machine when the master becomes inaccessible. Is that correct?
>>>>>>>>
>>>>>>>> My main concern still remains on the storage front, and I don't really have high-availability disks or even HDFS in my setup.
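The "Marathon on Mesos restarts the Zeppelin daemon" approach Vincent and John use comes from Marathon's app supervision: Marathon keeps the requested number of task instances running and relaunches a task that dies. A minimal per-user app definition might look like the sketch below; the app id, Docker image name, volume paths, and resource sizes are all hypothetical, and field names should be checked against your Marathon version.

```json
{
  "id": "/zeppelin-ashish",
  "container": {
    "type": "DOCKER",
    "docker": { "image": "mycorp/zeppelin:0.5.6", "network": "HOST" },
    "volumes": [
      {
        "containerPath": "/zeppelin/notebook",
        "hostPath": "/mnt/gluster/notebooks",
        "mode": "RW"
      }
    ]
  },
  "cpus": 1,
  "mem": 2048,
  "instances": 1
}
```

POSTing a definition like this to Marathon's /v2/apps endpoint keeps one Zeppelin instance running per user; if the node or the task fails, Marathon reschedules it on another Mesos node, where the shared notebook mount makes the same notebooks available.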
>>>>>>>> I have been using an Elasticsearch cluster for data high availability, but I was hoping that Zeppelin could save notebooks to Elasticsearch (like Kibana does) or maybe to a document store.
>>>>>>>>
>>>>>>>> Any idea if anything is planned in that direction? I don't want to fall back to 'rsync'-like options.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Ashish
>>>>>>>>
>>>>>>>> On Tue, Apr 5, 2016 at 11:17 PM, Eran Witkon <eranwit...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> For 1, you need to have both Zeppelin web HA and Zeppelin daemon HA.
>>>>>>>>> For 2, I guess you could use HDFS if you implement the storage interface for HDFS, but I am not sure.
>>>>>>>>> For 3, I mean that if you connect to an external cluster, for example a Spark cluster, you need to make sure your Spark cluster is HA. Otherwise you will have Zeppelin running, but your notebook will fail as no Spark cluster is available.
>>>>>>>>>
>>>>>>>>> HTH,
>>>>>>>>> Eran
>>>>>>>>>
>>>>>>>>> On Tue, 5 Apr 2016 at 20:20 ashish rawat <dceash...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks, Eran, for your reply.
>>>>>>>>>> For 1), I am assuming that it would be similar to HA of any other web application, i.e. running multiple instances and switching to the backup server when the master is down. Is that not the case?
>>>>>>>>>> For 2), is it also possible to save it on HDFS?
>>>>>>>>>> Can you please explain 3? Are you referring to interpreter config? If I am using the Spark interpreter and submitting jobs to it, and the Zeppelin master node goes down, then what would prevent the slave node from pointing to the same cluster and submitting jobs?
>>>>>>>>>> Regards,
>>>>>>>>>> Ashish
>>>>>>>>>>
>>>>>>>>>> On Tue, Apr 5, 2016 at 10:08 PM, Eran Witkon <eranwit...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I would say you need to account for these things:
>>>>>>>>>>> 1) availability of the Zeppelin daemon
>>>>>>>>>>> 2) availability of the notebook files
>>>>>>>>>>> 3) availability of the interpreters used
>>>>>>>>>>>
>>>>>>>>>>> For 1, I don't know of an out-of-the-box solution.
>>>>>>>>>>> For 2, any HA storage will do: S3 or any HA external mounted disk.
>>>>>>>>>>> For 3, it is up to the interpreter and your big-data HA solution.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, 5 Apr 2016 at 19:29 ashish rawat <dceash...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Is there a suggested architecture for running Zeppelin in high-availability mode? The only option I could find was saving notebooks to S3. Are there any options if one is not using AWS?
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Ashish
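For reference, the S3 option mentioned in the original question is configured through Zeppelin's notebook-storage properties rather than code. A sketch of the relevant conf/zeppelin-site.xml entries follows; the bucket and user values are placeholders, and the exact property names should be verified against the documentation for your Zeppelin version.

```xml
<!-- Config sketch: S3-backed notebook storage in conf/zeppelin-site.xml. -->
<!-- Bucket/user values are hypothetical placeholders.                    -->
<property>
  <name>zeppelin.notebook.storage</name>
  <value>org.apache.zeppelin.notebook.repo.S3NotebookRepo</value>
</property>
<property>
  <name>zeppelin.notebook.s3.bucket</name>
  <value>my-zeppelin-bucket</value>
</property>
<property>
  <name>zeppelin.notebook.s3.user</name>
  <value>ashish</value>
</property>
```

AWS credentials are picked up from the environment in the usual ways; with this in place, any Zeppelin instance configured against the same bucket sees the same notebooks, which is what makes S3 the simplest HA notebook store when AWS is available.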