I also agree that we should keep noteId: many Zeppelin APIs use the noteId to identify a note, so removing it would require a lot of changes.

andreas.we...@gmail.com <andreas.we...@gmail.com> wrote on Tue, Sep 4, 2018 at 3:03 AM:

> Somehow the subject was deleted in my earlier mail, so here again, to add my
> thoughts to the proper thread:
>
> Sure, there might exist some kind of naming policy (or let's better call
> it a naming convention) in Zeppelin multiuser environments. But as long as
> there is no way to technically enforce a naming convention and so turn
> it into a naming policy (which would be a nice idea BTW), it's IMHO a vague
> assumption that such policies exist and that users follow them. I'm
> thinking here of problems that might come up when changing the existing
> implementation and then dealing with migration, because assumptions do not
> match reality.
>
> My real-life scenario here is that Zeppelin can be configured to make
> notebooks visible only to the owner (and invisible to any other user) by
> default: ZEPPELIN_NOTEBOOK_PUBLIC=false, which is IMHO a good idea when
> setting up Zeppelin as a multiuser environment in larger scenarios. In this
> case note owners can use any or no naming convention they like when
> creating and using a note for personal purposes only, because only the
> owner will see it, even if naming policies exist at a global organisation
> level. A global naming convention must only be followed when users start
> sharing notes (that is, granting at least reader permissions to any other
> user).
>
> So I think noteId is a must-have in the filename.
>
> Andreas
>
> On 2018/08/31 01:46:38, Jongyoul Lee <jongy...@gmail.com> wrote:
> > Hi,
> >
> > I have slightly different thoughts about name conflicts for a newly
> > created note. In a multiuser environment, AFAIK, most teams and companies
> > generally use a prefix as an internal group policy. In my case,
> > user/{user_id}/{notebook_name_they_want}.zpln. In this case, naming
> > conflicts rarely happen. And it will be stored under a specific folder.
> > If someone needed two notes with the same name in the same directory, it
> > might not be appropriate. WDYT?
> >
> > JL
> >
> > On Fri, Aug 31, 2018 at 4:44 AM, andreas.we...@gmail.com <
> > andreas.we...@gmail.com> wrote:
> >
> > > Another reason for keeping noteId is uniqueness in multi-user
> > > environments. In that case users have separate Zeppelin workspaces,
> > > which is something we are using in production: see
> > > ZEPPELIN_NOTEBOOK_PUBLIC=false in the doc [1]. In that case users might
> > > be very confused when they cannot create notebooks with a name that
> > > already exists but that they most likely don't see (yet).
> > >
> > > So I like the proposal {note_name}_{note_id}.zpln, where note_name could
> > > contain folders, e.g. folder_1/mynote_abcd.zpln. Even though I like
> > > {note_name}.{note_id}.zpln (dot in between note_name and note_id) even
> > > better :-)
> > >
> > > Regards
> > > Andreas
> > >
> > > [1] http://zeppelin.apache.org/docs/0.8.0/setup/security/notebook_authorization.html#separate-notebook-workspaces-public-vs-private
> > >
> > > On 2018/08/18 08:42:44, Jeff Zhang <zjf...@gmail.com> wrote:
> > > > BTW, I would also prefer to use the note name as the identifier of a
> > > > note, if the issue I mentioned before is acceptable to most users.
> > > >
> > > > Jeff Zhang <zjf...@gmail.com> wrote on Sat, Aug 18, 2018 at 4:40 PM:
> > > >
> > > > > I am afraid we cannot remove noteId, as noteId is the unique
> > > > > identifier of a note and is immutable, and it is used in a lot of
> > > > > places, such as paragraph sharing and the REST API.
> > > > > If we used the note name as the note id, it might break users' apps
> > > > > when a note name is changed.
> > > > >
> > > > > Jongyoul Lee <jongy...@gmail.com> wrote on Sat, Aug 18, 2018 at 2:33 PM:
> > > > >
> > > > >> Hi, thanks for this kind of discussion.
> > > > >>
> > > > >> About noteId: how about changing the note id to the note name?
> > > > >> AFAIK, the note id is just an identifier and we can set any value
> > > > >> for it.
> > > > >>
> > > > >> There are two potential problems. First, we should be more careful
> > > > >> in handling the note id, as it could contain a wide variety of
> > > > >> characters. Second, in the case where someone changes a note name,
> > > > >> those who are viewing and updating the same note could no longer
> > > > >> access it. We could handle that by using websockets.
> > > > >>
> > > > >> WDYT?
> > > > >>
> > > > >> On Tue, 14 Aug 2018 at 6:14 PM Jeff Zhang <zjf...@gmail.com> wrote:
> > > > >>
> > > > >>> >>> But I’m still not comfortable with note ids in the name of the
> > > > >>> notebook itself. Those names would look ugly if you shared your
> > > > >>> notebooks on github for example. You don’t see Jupyter notebooks
> > > > >>> with names like that. If you have to keep the note ids with the
> > > > >>> notebooks could you not simply put the note id at the top of the
> > > > >>> notebook as Ruslan suggested? Then you’d only have to read the
> > > > >>> first line of each notebook.
> > > > >>>
> > > > >>> I know putting the note_id in the note file name is not so elegant,
> > > > >>> but this is the compromise we have to make to keep compatibility,
> > > > >>> as we use noteId to uniquely identify a note right now. And I don't
> > > > >>> think putting the noteId in the first line of the note would help
> > > > >>> much. We would still have to read the note files, which takes much
> > > > >>> more time than just reading the file names from the file system.
> > > > >>>
> > > > >>> Regarding the readability of the note file name, I think it won't
> > > > >>> be affected much. For example, the notebook file name would look
> > > > >>> like: *My Project/My Spark Tutorial Note_2A94M5J1Z.zpln*
> > > > >>> What the user sees in the notebook menu is still *My Project/My
> > > > >>> Spark Tutorial Note*, which is no different from what we see now.
> > > > >>>
> > > > >>> And thanks again for the feedback and comments. I am so glad to see
> > > > >>> so much discussion in the community.
> > > > >>>
> > > > >>> Partridge, Lucas (GE Aviation) <lucas.partri...@ge.com> wrote on
> > > > >>> Tue, Aug 14, 2018 at 4:29 PM:
> > > > >>>
> > > > >>>> I agree you’re inviting consistency issues if you maintained a
> > > > >>>> separate note id-to-note name mapping file.
> > > > >>>>
> > > > >>>> But I’m still not comfortable with note ids in the name of the
> > > > >>>> notebook itself. Those names would look ugly if you shared your
> > > > >>>> notebooks on github for example. You don’t see Jupyter notebooks
> > > > >>>> with names like that. If you have to keep the note ids with the
> > > > >>>> notebooks could you not simply put the note id at the top of the
> > > > >>>> notebook as Ruslan suggested? Then you’d only have to read the
> > > > >>>> first line of each notebook.
> > > > >>>>
> > > > >>>> Presumably if you copied the notebooks to another Zeppelin server
> > > > >>>> they would be restored with the same note ids there too? And
> > > > >>>> hopefully there would be no id clash with notebooks already on
> > > > >>>> that server…
> > > > >>>>
> > > > >>>> *From:* Jeff Zhang <zjf...@gmail.com>
> > > > >>>> *Sent:* 14 August 2018 03:49
> > > > >>>> *To:* users@zeppelin.apache.org
> > > > >>>> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in
> > > > >>>> [Title].zpln instead of [NOTEID]/note.json
> > > > >>>>
> > > > >>>> Thanks for the discussion.
> > > > >>>>
> > > > >>>> >>> I'm afraid about non-latin symbols in folder and note name.
> > > > >>>> And what about hieroglyphs?
> > > > >>>>
> > > > >>>> AFAIK, Linux allows any character in a file name except `\0` and
> > > > >>>> '/'.
> > > > >>>> I can create a file name with Chinese characters on Linux; I guess
> > > > >>>> you can use Russian as well.
> > > > >>>>
> > > > >>>> >>> If I understand correctly, this is being done solely to speed
> > > > >>>> up loading list of notebooks? What if a list of notebook names,
> > > > >>>> their ids, folder structure, etc can be *cached* in a separate
> > > > >>>> small json file? Or perhaps in a small embedded key-value store,
> > > > >>>> like www.mapdb.org would do? Just thinking out loud. This would
> > > > >>>> require a way to lazily re-sync the cache.
> > > > >>>>
> > > > >>>> This is not only to speed up loading but also to make the system
> > > > >>>> architecture easier to maintain. For now we have to build the
> > > > >>>> folder structure of notes in memory, and a lot of code in Zeppelin
> > > > >>>> does this. (Personally I don't think we would need any code for
> > > > >>>> this if we could get the folder structure from the note file
> > > > >>>> storage system.) Using another storage to keep the mapping of note
> > > > >>>> name to note id would bring in another classic problem of
> > > > >>>> distributed systems: consistency. How do we ensure consistency
> > > > >>>> between the real note file and this mapping component? If we
> > > > >>>> create/rename/remove a note, we have to update both the notebook
> > > > >>>> repo and the mapping storage. In my experience, any bug in that
> > > > >>>> code would cause inconsistency.
> > > > >>>>
> > > > >>>> Ruslan Dautkhanov <dautkha...@gmail.com> wrote on Tue, Aug 14,
> > > > >>>> 2018 at 3:58 AM:
> > > > >>>>
> > > > >>>> Thanks for bringing this up for discussion. My 2 cents below.
> > > > >>>>
> > > > >>>> I am with Maksim and Felix on concerns about special characters
> > > > >>>> now being allowed in notebook names, and also about different
> > > > >>>> charsets. The Russian language, for example, most commonly uses
> > > > >>>> the iso-8859-5, koi-8r/u and windows-1251 charsets. This looks
> > > > >>>> like it will bring a whole new set of localization issues.
> > > > >>>>
> > > > >>>> If I understand correctly, this is being done solely to speed up
> > > > >>>> loading the list of notebooks? What if a list of notebook names,
> > > > >>>> their ids, folder structure, etc. could be *cached* in a separate
> > > > >>>> small json file? Or perhaps in a small embedded key-value store,
> > > > >>>> like www.mapdb.org? Just thinking out loud. This would require a
> > > > >>>> way to lazily re-sync the cache.
> > > > >>>>
> > > > >>>> Another way to speed up json reads is to somehow force the "name"
> > > > >>>> attribute to be at the top of the json document that's written to
> > > > >>>> disk, then re-implement the json file reader to read just the
> > > > >>>> header of the file and do a partial json parse (or, for lack of
> > > > >>>> other options, grab the "name" attribute from the json file header
> > > > >>>> with a regex, for example).
> > > > >>>>
> > > > >>>> Back to filenames and charsets: I think the issue may be more
> > > > >>>> complicated if you store notebooks on a remote filesystem (nfs,
> > > > >>>> samba etc.). What if the remote server and the local nfs client
> > > > >>>> differ in their default fs charsets?
> > > > >>>>
> > > > >>>> Ideally all filesystems would use UTF-8, for example, but I am not
> > > > >>>> certain that's a good assumption to make.
> > > > >>>> Also, exposing notebook names can bring some other issues; for
> > > > >>>> example, I know some users occasionally add trailing/leading
> > > > >>>> spaces etc.
> > > > >>>>
> > > > >>>> On Mon, Aug 13, 2018 at 10:38 AM Belousov Maksim Eduardovich <
> > > > >>>> m.belou...@tinkoff.ru> wrote:
> > > > >>>>
> > > > >>>> The use of Russian and other specific letters in the note name is
> > > > >>>> a big advantage of Zeppelin. I would not like to give up this
> > > > >>>> functionality.
> > > > >>>>
> > > > >>>> I support the idea of the `zpln` file extension.
> > > > >>>>
> > > > >>>> The folder structure also sounds good.
> > > > >>>>
> > > > >>>> I'm worried about non-Latin symbols in folder and note names. And
> > > > >>>> what about hieroglyphs?
> > > > >>>>
> > > > >>>> Apache Zeppelin may be the first to use Russian letters in the
> > > > >>>> file system in our company.
> > > > >>>>
> > > > >>>> I see a lot of risks in using non-Latin symbols and a lot of
> > > > >>>> issues in making the new folder structure stable.
> > > > >>>>
> > > > >>>> ------------------------------
> > > > >>>>
> > > > >>>> *From:* Jeff Zhang <zjf...@gmail.com>
> > > > >>>> *Sent:* 13 August 2018 12:50
> > > > >>>> *To:* users@zeppelin.apache.org
> > > > >>>> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
> > > > >>>> instead of [NOTEID]/note.json
> > > > >>>>
> > > > >>>> >>> Do we need the note id in the file name at all? What’s wrong
> > > > >>>> with just note_name.zpln?
> > > > >>>>
> > > > >>>> The reason I keep the note id is that we currently use noteId to
> > > > >>>> identify a note, e.g. we use the note id in both the websocket API
> > > > >>>> and the REST API.
> > > > >>>> It is almost impossible to remove noteId in the current
> > > > >>>> architecture. If we put the note id into the file content of
> > > > >>>> note_name.zpln, then we have to read the note file every time, and
> > > > >>>> we run into the issues I mentioned above again.
> > > > >>>>
> > > > >>>> >>> If the file content is json then why not use note_name.json
> > > > >>>> instead of .zpln? That would make it easier for editors to know
> > > > >>>> how to load/highlight the file contents.
> > > > >>>>
> > > > >>>> I am not strongly attached to *.zpln. But I think one purpose is
> > > > >>>> to help third parties identify a Zeppelin note properly, e.g.
> > > > >>>> github can identify a Jupyter notebook (*.ipynb) and render it
> > > > >>>> properly.
> > > > >>>>
> > > > >>>> >>> Is there any reason for not using *real* folders or
> > > > >>>> directories for organising the notebooks rather than embedding the
> > > > >>>> folder hierarchy in the names of the notebooks? If someone wants
> > > > >>>> to ‘move’ the notebooks to another folder they’d have to manually
> > > > >>>> rename all the files/notebooks at present. That’s not very
> > > > >>>> user-friendly.
> > > > >>>>
> > > > >>>> Actually my proposal is to use real folders. What the user sees in
> > > > >>>> the Zeppelin note menu is the actual folder structure of the
> > > > >>>> notes. If they want to move notebooks to another folder, they can
> > > > >>>> change the folder name just as they would in the file system.
> > > > >>>>
> > > > >>>> Partridge, Lucas (GE Aviation) <lucas.partri...@ge.com> wrote on
> > > > >>>> Mon, Aug 13, 2018 at 4:43 PM:
> > > > >>>>
> > > > >>>> Hi Jeff,
> > > > >>>>
> > > > >>>> I have some questions about this proposal (I can’t edit the design
> > > > >>>> doc):
> > > > >>>>
> > > > >>>> 1. Do we need the note id in the file name at all? What’s wrong
> > > > >>>>    with just note_name.zpln?
> > > > >>>> 2. If the file content is json then why not use note_name.json
> > > > >>>>    instead of .zpln? That would make it easier for editors to know
> > > > >>>>    how to load/highlight the file contents.
> > > > >>>> 3. Is there any reason for not using *real* folders or directories
> > > > >>>>    for organising the notebooks rather than embedding the folder
> > > > >>>>    hierarchy in the names of the notebooks? If someone wants to
> > > > >>>>    ‘move’ the notebooks to another folder they’d have to manually
> > > > >>>>    rename all the files/notebooks at present. That’s not very
> > > > >>>>    user-friendly.
> > > > >>>>
> > > > >>>> Thanks, Lucas.
> > > > >>>>
> > > > >>>> *From:* Jeff Zhang <zjf...@gmail.com>
> > > > >>>> *Sent:* 13 August 2018 09:06
> > > > >>>> *To:* users@zeppelin.apache.org
> > > > >>>> *Cc:* dev <d...@zeppelin.apache.org>
> > > > >>>> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in
> > > > >>>> [Title].zpln instead of [NOTEID]/note.json
> > > > >>>>
> > > > >>>> In that case, Zeppelin should fail to create the note.
> > > > >>>>
> > > > >>>> Felix Cheung <felixcheun...@hotmail.com> wrote on Mon, Aug 13,
> > > > >>>> 2018 at 3:47 PM:
> > > > >>>>
> > > > >>>> Perhaps one concern is users having characters in the note name
> > > > >>>> that are invalid for a file name/file path?
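
Inline note on the invalid-character case discussed above: a minimal Python sketch of "fail to create the note" when the name cannot be used as a relative file path. The function name `validate_note_name` and the exact rule set are hypothetical, not Zeppelin's implementation. The rules are taken only from points raised in this thread: `\0` is not allowed in a Linux file name, '/' acts as the folder separator, and leading/trailing spaces were mentioned as a source of trouble.

```python
def validate_note_name(note_name):
    """Return note_name unchanged, or raise ValueError if it cannot be
    mapped to a relative file path. Hypothetical rules, for illustration."""
    if not note_name:
        raise ValueError("note name is empty")
    if "\0" in note_name:
        raise ValueError("note name contains a NUL character")
    # '/' separates folders, so every path segment is checked on its own.
    for segment in note_name.split("/"):
        if segment in ("", ".", ".."):
            raise ValueError(f"invalid path segment: {segment!r}")
        if segment != segment.strip():
            raise ValueError(f"leading/trailing whitespace in: {segment!r}")
    return note_name
```

Non-Latin names such as "Проект/заметка" pass this check; whether the underlying filesystem and its charset handle them correctly is a separate question that this sketch does not address.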
> > > > >>>>
> > > > >>>> ------------------------------
> > > > >>>>
> > > > >>>> *From:* Mohit Jaggi <mohitja...@gmail.com>
> > > > >>>> *Sent:* Sunday, August 12, 2018 6:02 PM
> > > > >>>> *To:* users@zeppelin.apache.org
> > > > >>>> *Cc:* dev
> > > > >>>> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
> > > > >>>> instead of [NOTEID]/note.json
> > > > >>>>
> > > > >>>> sounds like a good idea!
> > > > >>>>
> > > > >>>> On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <zjf...@gmail.com>
> > > > >>>> wrote:
> > > > >>>>
> > > > >>>> Motivation
> > > > >>>>
> > > > >>>> The motivation of ZEPPELIN-2619 is to change the note storage
> > > > >>>> structure. Previously we stored notes as {noteId}/note.json; we’d
> > > > >>>> like to change that to {note_name}_{note_id}.zpln. There are
> > > > >>>> several reasons for this change.
> > > > >>>>
> > > > >>>> 1. {noteId}/note.json is not scalable. We put all notes in one
> > > > >>>>    root folder in a flat structure. And when the Zeppelin server
> > > > >>>>    starts, we need to read every note.json to get the note file
> > > > >>>>    name and build the note folder structure (because we need the
> > > > >>>>    note name, which is stored in note.json, to build the notebook
> > > > >>>>    menu). This would be a nightmare when you have a large number
> > > > >>>>    of notes.
> > > > >>>> 2. {noteId}/note.json is not maintainable. It is difficult for a
> > > > >>>>    developer/administrator to find a note file based on the note
> > > > >>>>    name.
> > > > >>>> 3. {noteId}/note.json has no folder structure. Currently Zeppelin
> > > > >>>>    has to build the folder structure internally in memory from the
> > > > >>>>    note names, which is a big overhead.
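
Inline note on point 1 of the Motivation list above: a rough Python sketch of what listing notes costs under the {noteId}/note.json layout. The helper name `list_notes_old_layout` is hypothetical, and the sketch assumes each note.json carries a top-level "name" attribute (the attribute mentioned elsewhere in this thread). It is not Zeppelin's actual loading code.

```python
import json
from pathlib import Path


def list_notes_old_layout(root):
    """Return {note_id: note_name} for a {noteId}/note.json layout.

    Every note file has to be opened and parsed in full just to learn
    its name, so the cost grows with the total size of all notes.
    """
    notes = {}
    for note_json in Path(root).glob("*/note.json"):
        with note_json.open(encoding="utf-8") as f:
            note = json.load(f)  # parses the whole document, paragraphs included
        # The directory name is the note id; the name comes from the content.
        notes[note_json.parent.name] = note["name"]
    return notes
```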
> > > > >>>>
> > > > >>>> New Approach
> > > > >>>>
> > > > >>>> As I mentioned above, I propose to change the note storage
> > > > >>>> structure to {note_name}_{note_id}.zpln. note_name could contain
> > > > >>>> folders, e.g. folder_1/mynote_abcd.zpln
> > > > >>>>
> > > > >>>> This kind of note storage structure could bring several benefits.
> > > > >>>>
> > > > >>>> 1. We don’t need to load all notes when Zeppelin starts. We just
> > > > >>>>    need to list each folder to get the note name and note_id.
> > > > >>>> 2. It is much more maintainable, since it is easy to find the note
> > > > >>>>    file based on the note name.
> > > > >>>> 3. It already has a folder structure, which can be mapped to the
> > > > >>>>    note folder structure.
> > > > >>>>
> > > > >>>> Side Effect
> > > > >>>>
> > > > >>>> This approach only works for file system storage, which means we
> > > > >>>> have to drop support for MongoNotebookRepo. I think that is ok
> > > > >>>> because I haven't seen any users talk about it in the community,
> > > > >>>> so I assume no one is using it.
> > > > >>>>
> > > > >>>> This is the overall design; any comments and feedback are welcome.
> > > > >>>> Thanks.
> > > > >>>>
> > > > >>>> Here's the Google doc; you can also comment there.
> > > > >>>>
> > > > >>>> https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing
> > > > >>>>
> > > > >> --
> > > > >> 이종열, Jongyoul Lee, 李宗烈
> > > > >> http://madeng.net
> >
> > --
> > 이종열, Jongyoul Lee, 李宗烈
> > http://madeng.net
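
For reference, a minimal Python sketch of the listing step in the proposal quoted above: recover the note name and note id from {note_name}_{note_id}.zpln paths without opening any file. The helper names `parse_note_path` and `list_notes` are hypothetical, and the sketch assumes the note id itself contains no underscore, so the last '_' in the file name separates name from id. It is an illustration, not the ZEPPELIN-2619 implementation.

```python
from pathlib import Path

NOTE_EXT = ".zpln"


def parse_note_path(rel_path):
    """Split 'folder_1/mynote_abcd.zpln' into ('folder_1/mynote', 'abcd')."""
    p = Path(rel_path)
    if p.suffix != NOTE_EXT:
        raise ValueError(f"not a note file: {rel_path}")
    # Assumption: the id has no '_', so the last '_' is the separator.
    name, sep, note_id = p.stem.rpartition("_")
    if not sep or not name or not note_id:
        raise ValueError(f"missing note name or note id: {rel_path}")
    folder = p.parent.as_posix()
    full_name = name if folder == "." else f"{folder}/{name}"
    return full_name, note_id


def list_notes(root):
    """Return {note_id: note_name} using directory listings only."""
    root = Path(root)
    notes = {}
    for path in root.rglob("*" + NOTE_EXT):
        name, note_id = parse_note_path(path.relative_to(root).as_posix())
        notes[note_id] = name
    return notes
```

With the example file name from the thread, parse_note_path("My Project/My Spark Tutorial Note_2A94M5J1Z.zpln") yields the menu name "My Project/My Spark Tutorial Note" and the id "2A94M5J1Z". Because the id is keyed separately, renaming or moving the file changes the name but leaves the id stable.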