I was thinking that, for an enterprise using an "instance" per user (with Docker etc.), an nginx frontend could allow any user to connect to johndoe-notebook.zeppelin.marathon.mesos if they present a client SSL cert, which would "authenticate" them to the nginx server. From there, nginx would keep a user-to-container access list that it would update; basically, if johndoe (based on the cert) has access to johndoe-notebook, it would proxy the web and websocket ports to the proper container.
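A rough sketch of that lookup, written in Python rather than as nginx configuration. The names `ACCESS`, `DOMAIN`, and `upstream_for` are made up for illustration, and it assumes the frontend has already verified the client cert and extracted the user name from it:

```python
# Hypothetical access list: cert user name -> notebook containers they may reach.
ACCESS = {
    "johndoe": {"johndoe-notebook"},
    "janesmith": {"janesmith-notebook", "johndoe-notebook"},
}

# DNS suffix taken from the example hostname above.
DOMAIN = "zeppelin.marathon.mesos"


def upstream_for(cert_user, requested_host):
    """Return the backend host to proxy to, or None if access is denied."""
    suffix = "." + DOMAIN
    if not requested_host.endswith(suffix):
        return None
    # Strip the suffix to get the container name, e.g. "johndoe-notebook".
    container = requested_host[: -len(suffix)]
    if container in ACCESS.get(cert_user, set()):
        return requested_host
    return None
```

With this table, johndoe reaches only his own notebook, while janesmith has also been mapped to johndoe-notebook; a denied lookup would turn into an HTTP 403 at the proxy.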
In some perfect world where I am a hot Java developer, I would set up a widget in Zeppelin that could show a user's "containers" and the provisioned SSL certs; then, if I am an owner of container x, I could map other users to access it. Perhaps. Seems complicated but doable if well thought out.

On Sun, Jun 14, 2015 at 4:01 AM, Ophir Cohen <[email protected]> wrote:

> Personally I found this discussion very interesting, as those are exactly
> the issues we encountered (as have many others) in our company.
>
> Actually, my main issues were:
> 1. Notebook persistence and sharing across clusters.
> 2. User management, and especially sharing the notebooks that we want
> (and not sharing those we don't want...).
>
> I would love to have tree-like notebook management with user management
> on top of that. Also, currently I find the notebook.json file a bit too
> sensitive to format changes. Actually, I encountered a few times where a
> corrupted notebook format prevented Zeppelin from starting.
>
> My two cents regarding persistence: I'm using a GitHub repo to store my
> notebooks. Each new cluster fetches the base branch and then creates a new
> branch for itself. It pushes updates and tags its branch daily.
>
> In my company our next challenge is user management and the way to
> provide data sharing between users.
>
>
> On Sun, Jun 14, 2015 at 3:38 AM, Corneau Damien <[email protected]>
> wrote:
>
>> So far, I know a lot of people using multiple instances of Zeppelin to
>> restrict notebook access (for teams or people).
>>
>> It's a great way to not mess with each other's notebooks and resources.
>>
>> For the filesystem file structure, I think it will be a natural evolution
>> from the current flat structure. There have already been some discussions
>> about it, although there would probably be a lot of work related to that
>> feature to do on the UI side (renaming, creating folders, moving notebooks
>> etc...)
>> On Jun 14, 2015 1:53 AM, "John Omernik" <[email protected]> wrote:
>>
>>> Moon -
>>>
>>> Thank you, those seem to be on the right track. I am not too worried
>>> about a notebook persistence option as much as a way that we can specify
>>> the root folder and then use a tree-like navigation that knows the
>>> difference between notebook folders and regular folders. I think as
>>> people use it, they would want to logically group certain notebooks
>>> together. This could be the initial "system" to manage notebooks, but I
>>> could also see a way to add fields to notebooks, including a description
>>> field and an "indexable" option on either whole notebooks or items in the
>>> notebooks; that way, down the line, we could add a search in addition to
>>> the tree view.
>>>
>>> Those are all "future" items, but in the short term, a way to get away
>>> from a "flat" structure for folder naming would, I think, help with
>>> organization for many people. Consider someone who has multiple projects,
>>> or multiple users using them: that list could get long and chaotic very
>>> quickly.
>>>
>>> On the subject of authentication, one thing I'd ask the group and devs
>>> is: what is the long-term goal of Zeppelin? Do we want to make a notebook
>>> server that can support 1 user? 10 users? 100? 1000? How do we scale
>>> that? If we add authentication we should consider the usage in an
>>> enterprise... authentication is nice, don't get me wrong, it's needed,
>>> but I am curious about the roadmap/strategy on that subject. I was
>>> looking into individual Docker containers per user. That way each
>>> Zeppelin instance can be granted more resources depending on the user's
>>> requirements. But I am not familiar enough with the Zeppelin structures
>>> to understand if this method has pitfalls.
>>>
>>> My eventual goal would be to set up scripts for provisioning in a way
>>> that takes a "skeleton" Docker image and fills in certain items (each
>>> user gets a pair of ports, each user has defaults for memory, each user
>>> has their own data environments set up). Those could all be auto
>>> provisioned and scripted. Then the Docker container is run on an Apache
>>> Mesos cluster in such a way that the username is actually in the Marathon
>>> app name. This would allow me to, after auto provisioning, provide a user
>>> with a username and port that, using Mesos DNS, allows them to connect
>>> regardless of where the container is run on the cluster.
>>>
>>> I know not everyone who uses Zeppelin would use that approach, so I
>>> guess the reason for putting this all here is to see what the strategy is
>>> for Zeppelin: can or should it support such methods? Are there huge
>>> problems with the approach I am laying out? Can I contribute some of the
>>> ideas (if people who know the project don't have any huge reasons for not
>>> having many Zeppelin instances running)?
>>>
>>> This is a great conversation, and I think it speaks to the usefulness of
>>> this project.
>>>
>>> John
>>>
>>>
>>> On Sat, Jun 13, 2015 at 11:31 AM, moon soo Lee <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> Here are some related pull requests you might be interested in.
>>>>
>>>> notebook storage options
>>>> https://github.com/apache/incubator-zeppelin/pull/44
>>>>
>>>> authentication
>>>> https://github.com/apache/incubator-zeppelin/pull/53
>>>>
>>>> Thanks,
>>>> moon
>>>>
>>>> On Fri, Jun 12, 2015 at 11:08 PM Corneau Damien <[email protected]>
>>>> wrote:
>>>>
>>>>> So, except for that notebook naming, what you would like is to have a
>>>>> folder tree structure for notebooks instead of a flat structure. That
>>>>> way you could navigate in those folders just like a normal filesystem.
>>>>>
>>>>> One problem with the ACL restriction you would like to do, though, is
>>>>> the 'user'.
>>>>> The Zeppelin web interface is just the Zeppelin instance and doesn't
>>>>> have knowledge of which user is using it.
>>>>> On Jun 12, 2015 11:32 PM, "John Omernik" <[email protected]> wrote:
>>>>>
>>>>>> Hey all, are there any notebook storage options that are
>>>>>> configurable? Let me explain what I have observed and go from there
>>>>>> with my specific questions.
>>>>>>
>>>>>> I set an NFS share location to be my notebook location:
>>>>>>
>>>>>> export ZEPPELIN_NOTEBOOK_DIR=/mnt/zeppelin_notebooks
>>>>>>
>>>>>> My idea was that I could have a directory per user in that folder
>>>>>> (with permissions set to only that user) and then a shared directory
>>>>>> which would be usable by a group of users based off. (This is me not
>>>>>> knowing anything about how notebooks are stored.)
>>>>>>
>>>>>> When I implemented it, it APPEARS that Zeppelin uses the base
>>>>>> NOTEBOOK_DIR and just creates a folder with a random name per
>>>>>> notebook. In that folder there is a file named note.json. It appears
>>>>>> that in the file there is a "Name" JSON item that holds the value you
>>>>>> can rename notebooks to.
>>>>>>
>>>>>> That is how it "appears" to work. What I am asking by "can we change
>>>>>> this, or is it configurable" is: can we set a root directory that we
>>>>>> can navigate through as a tree, and then click through that tree?
>>>>>> This would allow better organization for individual users and groups
>>>>>> of users. It would also allow some sense of security as users
>>>>>> navigate the tree.
>>>>>>
>>>>>> This then comes back to the "directory" per notebook. Is that
>>>>>> required? Are there, at times, files other than note.json stored in
>>>>>> these directories? If so, perhaps we could use a prefix that is
>>>>>> ignored by the tree in the GUI.
>>>>>> For example, if the user "johndoe" has a folder johndoe, it would
>>>>>> show up as a folder, but a folder that starts with ZNB-, like
>>>>>> ZNB-2ATDB8F8R, would show up in the GUI as a notebook (and it would
>>>>>> check the note.json file for the name of the notebook). This would
>>>>>> allow much more intuitive storage and management for a team of users.
>>>>>>
>>>>>> I would "prefer" that the name actually be the directory name, rather
>>>>>> than the identifier that Zeppelin creates (it would allow easier
>>>>>> management of the notebooks outside of Zeppelin); however, I don't
>>>>>> know the reasoning behind it, so it's open to discussion for me.
>>>>>>
>>>>>> So for example
>>>>>>
>>>>>> /mnt/zeppelin_notebooks
>>>>>>
>>>>>> In here I may have these folders:
>>>>>> johndoe
>>>>>> janesmith
>>>>>> shared
>>>>>> ZNB-2ATDB8F8R * -> note.json "name" field is "How to use company xyz
>>>>>> notebooks"
>>>>>>
>>>>>> In the GUI, it would start at /mnt/zeppelin_notebooks
>>>>>>
>>>>>> It would list with folder icons:
>>>>>> johndoe
>>>>>> janesmith
>>>>>> shared
>>>>>>
>>>>>> It would list with a notebook icon:
>>>>>> "How to use company xyz notebooks"
>>>>>>
>>>>>> If user johndoe clicked on his folder, it would show the notebooks
>>>>>> and other directories as well as a parent (..) link that pulls the
>>>>>> user back up a directory.
>>>>>>
>>>>>> If johndoe tries to click on janesmith, it would give an access
>>>>>> denied (because the Zeppelin binary would try to cwd into that
>>>>>> directory, but get a filesystem access denied because it's running
>>>>>> as johndoe).
>>>>>>
>>>>>> I am just curious about any other sort of discussion we can have here
>>>>>> that would make this easier for groups of users to use.
>>>>>>
>>>>>>
>>>>>> John
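
The folder-versus-notebook listing proposed in the original message could be sketched roughly as below. This is only an illustration in Python, not how Zeppelin works: the `ZNB-` prefix is the hypothetical convention from the message, `list_entries` is a made-up name, and the sketch assumes note.json holds a JSON object with a "name" field.

```python
import json
import os

# Hypothetical prefix marking a directory as a notebook rather than a folder.
NOTEBOOK_PREFIX = "ZNB-"


def list_entries(directory):
    """Return (folders, notebooks) for one level of the notebook tree.

    Folders are listed by directory name; notebooks are listed by the
    "name" field of their note.json, falling back to the directory name.
    """
    folders, notebooks = [], []
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        if not os.path.isdir(path):
            continue  # ignore stray files at this level
        if entry.startswith(NOTEBOOK_PREFIX):
            name = entry
            try:
                with open(os.path.join(path, "note.json")) as fh:
                    data = json.load(fh)
                if isinstance(data, dict):
                    name = data.get("name", entry)
            except (OSError, ValueError):
                pass  # missing or corrupted note.json: keep the directory name
            notebooks.append(name)
        else:
            folders.append(entry)
    return folders, notebooks
```

Calling `list_entries("/mnt/zeppelin_notebooks")` on the example layout would give the three user folders plus the one notebook title. If the process runs as johndoe, calling it on janesmith's directory would raise `PermissionError` from `os.listdir`, which is the filesystem-level access denial described above.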
