Personally, I find this discussion very interesting, as those are exactly the
issues we (like many others) have encountered in our company.

Actually, my main issues were:
1. Notebook persistence and sharing across clusters.
2. User management, and especially sharing the notebooks that we want to
share (and not sharing those we don't want to...).

I would love to have tree-like notebook management with user management on
top of that. Also, I currently find the note.json file a bit too sensitive to
format changes. I have actually encountered a few cases where a corrupted
notebook file prevented Zeppelin from starting.
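
To illustrate the corrupted-file problem, here is a minimal sketch of a check
that could be run before starting Zeppelin. It assumes the layout described
further down in this thread (one folder per notebook under the notebook
directory, each containing a note.json). The function name
find_corrupted_notes is hypothetical and not part of Zeppelin, and the check
only verifies that each file parses as a JSON object, not that Zeppelin
would accept it.

```python
import json
import os


def find_corrupted_notes(notebook_dir):
    """Return the note.json paths under notebook_dir that fail to parse.

    Assumption: each notebook lives in its own sub-folder holding a
    note.json file, as described elsewhere in this thread.
    """
    corrupted = []
    for entry in sorted(os.listdir(notebook_dir)):
        note_path = os.path.join(notebook_dir, entry, "note.json")
        if not os.path.isfile(note_path):
            continue  # not a notebook folder, nothing to check
        try:
            with open(note_path, encoding="utf-8") as handle:
                note = json.load(handle)
        except (OSError, ValueError):
            # Unreadable file, invalid encoding, or invalid JSON.
            corrupted.append(note_path)
            continue
        if not isinstance(note, dict):
            # Valid JSON, but not an object at the top level.
            corrupted.append(note_path)
    return corrupted
```

Calling find_corrupted_notes("/mnt/zeppelin_notebooks") and moving any
reported folders aside before startup would be one way to use it.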

My two cents regarding persistence: I'm using a GitHub repo to store my
notebooks. Each new cluster fetches the base branch and then creates a new
branch for itself. It pushes updates and tags its branch daily.
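
The branch-per-cluster workflow above could be scripted roughly as follows.
This is only a sketch: the remote name "origin", the "cluster-<id>" branch
naming, and the "<branch>-<date>" tag naming are assumptions made for
illustration, as are the function names. The helpers only build the git
command lines; run_all executes them in the notebook repository.

```python
import datetime
import subprocess


def bootstrap_commands(base_branch, cluster_id):
    """Commands a new cluster runs once: fetch the base branch, branch off it."""
    branch = "cluster-" + cluster_id  # assumed naming scheme
    return [
        ["git", "fetch", "origin", base_branch],
        ["git", "checkout", "-b", branch, "origin/" + base_branch],
        ["git", "push", "--set-upstream", "origin", branch],
    ]


def daily_commands(cluster_id, day):
    """Commands run once a day: commit, push, and tag the cluster's branch."""
    branch = "cluster-" + cluster_id
    tag = "{}-{}".format(branch, day.isoformat())  # assumed tag scheme
    return [
        ["git", "add", "--all"],
        # --allow-empty keeps the daily job from failing when nothing changed.
        ["git", "commit", "--allow-empty", "-m",
         "Notebook snapshot " + day.isoformat()],
        ["git", "push", "origin", branch],
        ["git", "tag", tag],
        ["git", "push", "origin", tag],
    ]


def run_all(commands, repo_dir):
    """Run each command inside repo_dir, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, cwd=repo_dir, check=True)
```

For example, run_all(daily_commands("a1", datetime.date.today()),
"/mnt/zeppelin_notebooks") could be called from a daily cron job, assuming
the notebook directory is itself the git working tree.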

In my company, our next challenge is user management and finding a way to
provide data sharing between users.


On Sun, Jun 14, 2015 at 3:38 AM, Corneau Damien <[email protected]>
wrote:

> So far, I know a lot of people using multiple instances of Zeppelin to
> restrict notebook access (for teams or individuals).
>
> It's a great way to avoid messing with each other's notebooks and resources.
>
> For the filesystem file structure, I think it will be a natural evolution
> from the current flat structure. There have already been some discussions
> about it, although there would probably be a lot of related work to do on
> the UI side (renaming, creating folders, moving notebooks, etc.).
> On Jun 14, 2015 1:53 AM, "John Omernik" <[email protected]> wrote:
>
>> Moon -
>>
>> Thank you, those seem to be on the right track. I am not as worried about
>> a notebook persistence option as I am about a way to specify the root
>> folder and then use a tree-like navigation that knows the difference
>> between notebook folders and regular folders. I think as people use it,
>> they would want to logically group certain notebooks together.  This could
>> be the initial "system" to manage notebooks, but I could also see a way to
>> add fields to notebooks, including a description field and an "indexable"
>> option on either whole notebooks or items in the notebooks, so that down
>> the line we could add a search in addition to the tree view.
>>
>> Those are all "future" items, but in the short term, a way to get away
>> from a "flat" structure for folder naming would, I think, help with
>> organization for many people.  Consider someone who has multiple projects,
>> or multiple users using them: that list could get long and chaotic very
>> quickly.
>>
>> On the subject of authentication, one thing I'd ask the group and devs is
>> what the long-term goal of Zeppelin is. Do we want to make a notebook server
>> that can support 1 user? 10 users? 100? 1000?  How do we scale that?  If we
>> add authentication we should consider the usage in an enterprise...
>> authentication is nice, don't get me wrong, it's needed, but I am curious
>> about the roadmap/strategy on that subject.  I was looking into individual
>> Docker containers per user. That way each Zeppelin instance can be granted
>> more resources depending on the user's requirements.  But I am not familiar
>> enough with the Zeppelin structures to know if this method has pitfalls.
>>
>> My eventual goal would be to set up scripts for provisioning in a way that
>> takes a "skeleton" Docker image and fills in certain items (each user gets a
>> pair of ports, each user has defaults for memory, each user has their own
>> data environments set up). Those could all be auto-provisioned and scripted.
>> Then the Docker container is run on an Apache Mesos cluster in such a way
>> that the username is actually in the Marathon app name. This would allow me
>> to, after auto-provisioning, provide a user with a username and port that,
>> using Mesos DNS, allows them to connect regardless of where the
>> container is run on the cluster.
>>
>> I know not everyone who uses Zeppelin would use that approach, so I guess
>> the reason for putting this all here is to see what the strategy is for
>> Zeppelin. Can or should it support such methods? Are there huge problems
>> with the approach I am laying out? Can I contribute some of the ideas (if
>> people who know the project don't have any huge reasons for not having many
>> Zeppelin instances running)?
>>
>> This is a great conversation, and I think speaks to the usefulness of
>> this project.
>>
>> John
>>
>>
>> On Sat, Jun 13, 2015 at 11:31 AM, moon soo Lee <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> Here are some related pull requests you might be interested in.
>>>
>>> notebook storage options
>>> https://github.com/apache/incubator-zeppelin/pull/44
>>>
>>> authentication
>>> https://github.com/apache/incubator-zeppelin/pull/53
>>>
>>> Thanks,
>>> moon
>>>
>>> On Fri, Jun 12, 2015 at 11:08 PM Corneau Damien <[email protected]>
>>> wrote:
>>>
>>>> So, except for that notebook naming, what you would like is to have a
>>>> folder tree structure for notebooks instead of a flat structure. That way
>>>> you could navigate in those folders just like a normal filesystem.
>>>>
>>>> One problem with the ACL restriction you would like to have, though, is
>>>> the 'user'. The Zeppelin web interface is just the Zeppelin instance and
>>>> doesn't have knowledge of which user is using it.
>>>> On Jun 12, 2015 11:32 PM, "John Omernik" <[email protected]> wrote:
>>>>
>>>>> Hey all, are there any notebook storage options that are
>>>>> configurable?  Let me explain what I have observed and go from there with
>>>>> my specific questions
>>>>>
>>>>> I set an NFS share location to be my notebook location:
>>>>>
>>>>> export ZEPPELIN_NOTEBOOK_DIR=/mnt/zeppelin_notebooks
>>>>>
>>>>> My idea was that I could have a directory per user in that folder (with
>>>>> permissions set to only that user) and then a shared directory which
>>>>> would be usable by a group of users based off.  (This is me not knowing
>>>>> anything about how notebooks are stored).
>>>>>
>>>>> When I implemented it, it APPEARS that Zeppelin uses the base
>>>>> NOTEBOOK_DIR and just creates a folder with a random name per notebook. In
>>>>> that folder there is a file named note.json.  It appears that in the file,
>>>>> there is a "name" JSON item that holds the value you can rename notebooks
>>>>> to.
>>>>>
>>>>> That is how it "appears" to work. What I am asking with "can we change
>>>>> this, or is it configurable" is: can we set a root directory that we can
>>>>> navigate through as a tree, and then click through that tree? This would
>>>>> allow better organization for individual users and groups of users. It
>>>>> would also allow some sense of security as users navigate the tree.
>>>>>
>>>>> This then comes back to the "directory" per notebook. Is that
>>>>> required? Are there, at times, other files other than note.json stored in
>>>>> these directories?  If so, perhaps we could do a prefix that is ignored by
>>>>> the Tree in the GUI.  For example, if the user "johndoe" has a folder
>>>>> johndoe, it would show up as a folder, but a folder that starts with ZNB-
>>>>> like ZNB-2ATDB8F8R, in the gui would show up as a notebook (and it would
>>>>> check the note.json file for the name of the notebook). This would allow
>>>>> much more intuitive storage and management for a team of users.
>>>>>
>>>>> I would "prefer" that the name actually be the directory name, rather
>>>>> than the identifier that Zeppelin creates (it would allow easier
>>>>> management of the notebooks outside of Zeppelin); however, I don't know
>>>>> the reasoning behind it, so it's open to discussion for me.
>>>>>
>>>>> So for example
>>>>>
>>>>> /mnt/zeppelin_notebooks
>>>>>
>>>>> In here I may have these folders:
>>>>> johndoe
>>>>> janesmith
>>>>> shared
>>>>> ZNB-2ATDB8F8R * -> note.json "name" field is "How to use company xyz
>>>>> notebooks"
>>>>>
>>>>> In the gui, it would start at the /mnt/zeppelin_notebooks
>>>>>
>>>>> it would list with folder icons:
>>>>> johndoe
>>>>> janesmith
>>>>> shared
>>>>>
>>>>> it would list with a notebook icon:
>>>>> "How to use company xyz notebooks"
>>>>>
>>>>> If user johndoe clicked on his folder, it would show the notebooks and
>>>>> other directories as well as a parent (..) link that takes the user back
>>>>> up a directory.
>>>>>
>>>>> If johndoe tries to click on janesmith, it would give an access denied
>>>>> (because the Zeppelin binary would try to cwd into that directory, but get
>>>>> a filesystem access denied because it's running as johndoe)
>>>>>
>>>>> I am just curious on any other sort of discussion we can have here
>>>>> that would make this easier for groups of users to use?
>>>>>
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>>
>>
