Thanks, Vincent and John, for providing these viable options.

On Fri, Apr 8, 2016 at 10:39 PM, John Omernik <j...@omernik.com> wrote:
> So for us, we are doing something similar to Vincent; however, instead of Gluster, we are using MapR-FS and the NFS mount. Basically, this gives us a shared filesystem running on all nodes, with strong security (filesystem ACEs for fine-grained permissions), built-in auditing, POSIX compliance, true random read/write (as opposed to HDFS), snapshots, and cluster-to-cluster replication. There are also some neat things we are doing with volumes and volume placement. That provides our storage layer.
>
> Then we have Docker for actually running Zeppelin, and since it's an instance per user, that helps organize who has access to what (still hashing out the details on that). Marathon on Mesos is how we ensure that Zeppelin is actually available, and when it comes to Spark, we are just submitting to Mesos, which is right there. Since everything is on one cluster, each user has a home directory (on a volume) where I store all the configs for each instance of Zeppelin, and they can also put ad hoc data in their home directory. Spark and Apache Drill can both query anything in MapR-FS, making it a pretty powerful combination.
>
> On Fri, Apr 8, 2016 at 6:33 AM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>
>> Using it for 3 months without any incident.
>>
>> On 8 Apr 2016 at 9:09 AM, "ashish rawat" <dceash...@gmail.com> wrote:
>>
>>> Sounds great. How long have you been using GlusterFS in prod, and have you encountered any challenges? The only difficulty for me in using it would be a lack of expertise to fix broken things, so I hope its stability isn't something to be concerned about.
>>>
>>> Regards,
>>> Ashish
>>>
>>> On Fri, Apr 8, 2016 at 12:20 PM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>>>
>>>> Use the FUSE interface. The Gluster volume is directly accessible as local storage on all nodes, but performance is only 200 Mb/s. More than enough for notebooks.
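The FUSE setup Vincent describes might be sketched as follows. This is a config sketch, not a tested recipe: the server name `gluster1`, volume name `notebooks_vol`, and mount point are placeholders, and it assumes the GlusterFS FUSE client is installed on each node.

```shell
# Mount the Gluster volume via the FUSE client (repeat on every node).
# "gluster1" and "notebooks_vol" are hypothetical names for this sketch.
sudo mkdir -p /mnt/gluster/notebooks
sudo mount -t glusterfs gluster1:/notebooks_vol /mnt/gluster/notebooks

# Point Zeppelin's notebook storage at the shared mount, e.g. in
# conf/zeppelin-env.sh, so every node sees the same notebook files:
export ZEPPELIN_NOTEBOOK_DIR=/mnt/gluster/notebooks
```

With the notebook directory on the shared mount, a Zeppelin daemon started on any node picks up the same notebooks, which is what makes the failover discussed below possible.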
>>>> For data, prefer Tachyon/Alluxio on top of Gluster...
>>>>
>>>> On 8 Apr 2016 at 6:35 AM, "ashish rawat" <dceash...@gmail.com> wrote:
>>>>
>>>>> Thanks, Eran and Vincent.
>>>>>
>>>>> Eran, I would definitely like to try it out, since it won't add to the complexity of my deployment. I will look at the S3 implementation to figure out how complex it would be.
>>>>>
>>>>> Vincent, I haven't explored GlusterFS at all. Would it also require writing an implementation of the storage interface, or can Zeppelin work with it out of the box?
>>>>>
>>>>> Regards,
>>>>> Ashish
>>>>>
>>>>> On Wed, Apr 6, 2016 at 12:53 PM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>>>>>
>>>>>> For 1, Marathon on Mesos restarts the Zeppelin daemon in case of failure.
>>>>>> For 2, a GlusterFS FUSE mount allows sharing notebooks across all Mesos nodes.
>>>>>> For 3, it is not available right now in our design, but a manual restart in the Zeppelin config page is acceptable for us.
>>>>>>
>>>>>> On 6 Apr 2016 at 8:18 AM, "Eran Witkon" <eranwit...@gmail.com> wrote:
>>>>>>
>>>>>>> Yes, this is correct.
>>>>>>>
>>>>>>> For an HA disk: if you don't have HA storage and no access to S3, then AFAIK you don't have another option at the moment. If you would like to save notebooks to Elastic, then I suggest you look at the storage interface and the implementations for git and S3, and implement that yourself. It does sound like an interesting feature.
>>>>>>>
>>>>>>> Best,
>>>>>>> Eran
>>>>>>>
>>>>>>> On Wed, 6 Apr 2016 at 08:57 ashish rawat <dceash...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks, Eran. So 3 seems to be something external to Zeppelin, and hopefully 1 only means running "zeppelin-daemon.sh start" on a slave machine when the master becomes inaccessible. Is that correct?
>>>>>>>>
>>>>>>>> My main concern still remains on the storage front, and I don't really have high-availability disks or even HDFS in my setup.
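The "Marathon on Mesos restarts the Zeppelin daemon" approach Vincent and John use comes from Marathon's app supervision: Marathon keeps the requested number of task instances running and relaunches a task that dies. A minimal per-user app definition might look like the sketch below; the app id, Docker image name, volume paths, and resource sizes are all hypothetical, and field names should be checked against your Marathon version.

```json
{
  "id": "/zeppelin-ashish",
  "container": {
    "type": "DOCKER",
    "docker": { "image": "mycorp/zeppelin:0.5.6", "network": "HOST" },
    "volumes": [
      {
        "containerPath": "/zeppelin/notebook",
        "hostPath": "/mnt/gluster/notebooks",
        "mode": "RW"
      }
    ]
  },
  "cpus": 1,
  "mem": 2048,
  "instances": 1
}
```

POSTing a definition like this to Marathon's /v2/apps endpoint keeps one Zeppelin instance running per user; if the node or the task fails, Marathon reschedules it on another Mesos node, where the shared notebook mount makes the same notebooks available.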
>>>>>>>> I have been using an Elasticsearch cluster for data high availability, but I was hoping that Zeppelin could save notebooks to Elasticsearch (like Kibana does) or maybe to a document store.
>>>>>>>>
>>>>>>>> Any idea if anything is planned in that direction? I don't want to fall back to 'rsync'-like options.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Ashish
>>>>>>>>
>>>>>>>> On Tue, Apr 5, 2016 at 11:17 PM, Eran Witkon <eranwit...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> For 1, you need to have both Zeppelin web HA and Zeppelin daemon HA.
>>>>>>>>> For 2, I guess you could use HDFS if you implement the storage interface for HDFS, but I am not sure.
>>>>>>>>> For 3, I mean that if you connect to an external cluster, for example a Spark cluster, you need to make sure your Spark cluster is HA. Otherwise you will have Zeppelin running, but your notebook will fail as no Spark cluster is available.
>>>>>>>>>
>>>>>>>>> HTH,
>>>>>>>>> Eran
>>>>>>>>>
>>>>>>>>> On Tue, 5 Apr 2016 at 20:20 ashish rawat <dceash...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks, Eran, for your reply.
>>>>>>>>>> For 1), I am assuming that it would be similar to HA of any other web application, i.e. running multiple instances and switching to the backup server when the master is down. Is that not the case?
>>>>>>>>>> For 2), is it also possible to save it on HDFS?
>>>>>>>>>> Can you please explain 3? Are you referring to interpreter config? If I am using the Spark interpreter and submitting jobs to it, and the Zeppelin master node goes down, then what would prevent the slave node from pointing to the same cluster and submitting jobs?
>>>>>>>>>> Regards,
>>>>>>>>>> Ashish
>>>>>>>>>>
>>>>>>>>>> On Tue, Apr 5, 2016 at 10:08 PM, Eran Witkon <eranwit...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I would say you need to account for these things:
>>>>>>>>>>> 1) availability of the Zeppelin daemon
>>>>>>>>>>> 2) availability of the notebook files
>>>>>>>>>>> 3) availability of the interpreters used
>>>>>>>>>>>
>>>>>>>>>>> For 1, I don't know of an out-of-the-box solution.
>>>>>>>>>>> For 2, any HA storage will do: S3 or any HA external mounted disk.
>>>>>>>>>>> For 3, it is up to the interpreter and your big-data HA solution.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, 5 Apr 2016 at 19:29 ashish rawat <dceash...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Is there a suggested architecture for running Zeppelin in high-availability mode? The only option I could find was saving notebooks to S3. Are there any options if one is not using AWS?
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Ashish
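For reference, the S3 option mentioned in the original question is configured through Zeppelin's notebook-storage properties rather than code. A sketch of the relevant conf/zeppelin-site.xml entries follows; the bucket and user values are placeholders, and the exact property names should be verified against the documentation for your Zeppelin version.

```xml
<!-- Config sketch: S3-backed notebook storage in conf/zeppelin-site.xml. -->
<!-- Bucket/user values are hypothetical placeholders.                    -->
<property>
  <name>zeppelin.notebook.storage</name>
  <value>org.apache.zeppelin.notebook.repo.S3NotebookRepo</value>
</property>
<property>
  <name>zeppelin.notebook.s3.bucket</name>
  <value>my-zeppelin-bucket</value>
</property>
<property>
  <name>zeppelin.notebook.s3.user</name>
  <value>ashish</value>
</property>
```

AWS credentials are picked up from the environment in the usual ways; with this in place, any Zeppelin instance configured against the same bucket sees the same notebooks, which is what makes S3 the simplest HA notebook store when AWS is available.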