I'm afraid we cannot remove noteId. It is the unique, immutable identifier of a note and is used in many places, such as paragraph sharing and the REST API. If we used the note name as the note id, changing the note name could break users' applications.
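To illustrate the point, here is a minimal Python sketch (not Zeppelin's actual code) of the `{note_name}_{note_id}.zpln` naming discussed in this thread. The helper names `split_note_path` and `rename_note` are hypothetical, and the sketch assumes the id is whatever follows the last underscore in the file name. It shows that a rename changes the file name while the id that clients hold stays the same.

```python
from pathlib import PurePosixPath

NOTE_EXT = ".zpln"  # extension proposed in this thread


def split_note_path(note_file: str) -> tuple[str, str]:
    """Split '<folders>/<note_name>_<note_id>.zpln' into (note path, note id).

    Assumption: the id is everything after the LAST underscore, so note
    names may themselves contain underscores.
    """
    path = PurePosixPath(note_file)
    if path.suffix != NOTE_EXT:
        raise ValueError(f"not a note file: {note_file}")
    name, sep, note_id = path.stem.rpartition("_")
    if not sep or not name or not note_id:
        raise ValueError(f"no note id in file name: {note_file}")
    return str(path.with_name(name)), note_id


def rename_note(note_file: str, new_name: str) -> str:
    """Return the file path after a rename; the note id is carried over."""
    _, note_id = split_note_path(note_file)
    new_file_name = f"{new_name}_{note_id}{NOTE_EXT}"
    return str(PurePosixPath(note_file).with_name(new_file_name))
```

With this scheme, a REST or websocket client that stored the id before the rename can still resolve the note afterwards, which would not hold if the name itself were the id.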
Jongyoul Lee <jongy...@gmail.com> wrote on Sat, 18 Aug 2018 at 2:33 PM:

> Hi, thanks for this kind of discussion.
>
> About noteId: how about changing the note id to the note name? AFAIK, the note id is just an identifier and we can set any value to it.
>
> There are two potential problems. First, we should be more careful in handling the note id, as it could contain a wide variety of characters. Second, if someone changes a note name, those who are viewing and updating the same note would no longer be able to access it. We could handle that by using websockets.
>
> WDYT?
>
> On Tue, 14 Aug 2018 at 6:14 PM Jeff Zhang <zjf...@gmail.com> wrote:
>
>> >>> But I’m still not comfortable with note ids in the name of the notebook itself. Those names would look ugly if you shared your notebooks on github for example. You don’t see Jupyter notebooks with names like that. If you have to keep the note ids with the notebooks could you not simply put the note id at the top of the notebook as Ruslan suggested? Then you’d only have to read the first line of each notebook.
>>
>> I know putting note_id in the note file name is not so elegant, but it is the compromise we have to make to keep compatibility, since we currently use noteId to uniquely identify a note. And I don't think putting noteId on the first line of the note would help much. We would still have to read the note files, which takes much more time than just reading the file names via the file system.
>>
>> Regarding the readability of the note file name, I don't think it will matter much. E.g. the note file name would look like: *My Project/My Spark Tutorial Note_2A94M5J1Z.zpln*
>> What the user sees in the notebook menu is still *My Project/My Spark Tutorial Note*, which is no different from what we see now.
>>
>> And thanks again for the feedback and comments. I am glad to see so much discussion in the community.
>>
>> Partridge, Lucas (GE Aviation) <lucas.partri...@ge.com> wrote on Tue, 14 Aug 2018 at 4:29 PM:
>>
>>> I agree you’re inviting consistency issues if you maintained a separate note id-to-note name mapping file.
>>>
>>> But I’m still not comfortable with note ids in the name of the notebook itself. Those names would look ugly if you shared your notebooks on github for example. You don’t see Jupyter notebooks with names like that. If you have to keep the note ids with the notebooks could you not simply put the note id at the top of the notebook as Ruslan suggested? Then you’d only have to read the first line of each notebook.
>>>
>>> Presumably if you copied the notebooks to another Zeppelin server they would be restored with the same note ids there too? And hopefully there would be no id clash with notebooks already on that server…
>>>
>>> *From:* Jeff Zhang <zjf...@gmail.com>
>>> *Sent:* 14 August 2018 03:49
>>> *To:* users@zeppelin.apache.org
>>> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json
>>>
>>> Thanks for the discussion.
>>>
>>> >>> I'm afraid about non-latin symbols in folder and note name. And what about hieroglyphs?
>>>
>>> AFAIK, Linux allows every character in a file name except `\0` and '/'. I can create a file name with Chinese characters on Linux, and I guess you can use Russian as well.
>>>
>>> >>> If I understand correctly, this is being done solely to speed up loading list of notebooks? What if a list of notebook names, their ids, folder structure, etc can be *cached* in a separate small json file? Or perhaps in a small embedded key-value store, like www.mapdb.org would do? Just thinking out loud. This would require a way to lazily re-sync the cache.
>>>
>>> This is not only to speed up loading but also to make the system architecture easier to maintain.
>>> For now we have to build the folder structure of notes in memory, and a lot of code in Zeppelin does this. (Personally I don't think we would need any code for this if we could get the folder structure from the note file storage system.) Using another store to keep the mapping between note name and note id would bring in another classic distributed-systems problem: consistency. How do we ensure consistency between the real note file and this mapping component? If we create/rename/remove a note, we have to update both the notebook repo and the mapping storage. In my experience, any bug in that code would cause inconsistency.
>>>
>>> Ruslan Dautkhanov <dautkha...@gmail.com> wrote on Tue, 14 Aug 2018 at 3:58 AM:
>>>
>>> Thanks for bringing this up for discussion. My 2 cents below.
>>>
>>> I am with Maksim and Felix on concerns about special characters now being allowed in notebook names, and also concerns about different charsets. Russian, for example, most commonly uses the iso-8859-5, koi-8r/u and windows-1251 charsets. This seems likely to bring a whole new set of localization issues.
>>>
>>> If I understand correctly, this is being done solely to speed up loading the list of notebooks? What if the list of notebook names, their ids, folder structure, etc. were *cached* in a separate small json file? Or perhaps in a small embedded key-value store, like www.mapdb.org? Just thinking out loud. This would require a way to lazily re-sync the cache.
>>>
>>> Another way to speed up json reads is to somehow force the "name" attribute to be at the top of the json document that's written to disk. Then re-implement the json file reader to read just the header of the file and do a partial json parse (or, for lack of other options, grab the "name" attribute from the json file header with a regex, for example).
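The header-read idea in the paragraph above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the function name `read_note_name` and the 4096-byte header size are made up here, and it only works if the writer really does put the top-level "name" attribute before any nested object that has its own "name" key.

```python
import json
import re

# Matches the first  "name": "<JSON string>"  pair, allowing escaped characters.
_NAME_RE = re.compile(r'"name"\s*:\s*("(?:[^"\\]|\\.)*")')


def read_note_name(path, header_bytes=4096):
    """Return the note name found in the first `header_bytes` of a file, or None."""
    with open(path, "rb") as f:
        # errors="ignore" drops a multi-byte character cut off at the boundary.
        header = f.read(header_bytes).decode("utf-8", errors="ignore")
    match = _NAME_RE.search(header)
    # json.loads decodes escapes such as \" or \u0442 in the captured string.
    return json.loads(match.group(1)) if match else None
```

This still opens every note file once, so it reduces the cost per file but not the number of files touched at startup.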
>>>
>>> Back to filenames and charsets: I think the issue may be more complicated if you store notebooks on a remote filesystem (nfs/samba etc.). What if the remote server and the local nfs client differ in their default fs charsets?
>>>
>>> Ideally all filesystems would use UTF-8, for example, but I am not certain that's a good assumption to make. Exposing notebook names can also bring other issues; I know some users occasionally add trailing/leading spaces, for instance.
>>>
>>> On Mon, Aug 13, 2018 at 10:38 AM Belousov Maksim Eduardovich <m.belou...@tinkoff.ru> wrote:
>>>
>>> The use of Russian and other specific letters in the note name is a big advantage of Zeppelin. I would not like to give up this functionality.
>>>
>>> I support the idea of the `zpln` file extension.
>>>
>>> The folder structure also sounds good.
>>>
>>> I'm afraid about non-latin symbols in folder and note name. And what about hieroglyphs?
>>>
>>> Apache Zeppelin may be the first to use Russian letters in the file system in our company.
>>>
>>> I see a lot of risks in using non-latin symbols and a lot of issues in making the new folder structure stable.
>>>
>>> ------------------------------
>>>
>>> *From:* Jeff Zhang <zjf...@gmail.com>
>>> *Sent:* 13 August 2018 12:50
>>> *To:* users@zeppelin.apache.org
>>> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json
>>>
>>> >>> Do we need the note id in the file name at all? What’s wrong with just note_name.zpln?
>>>
>>> The reason I keep the note id is that we currently use noteId to identify a note, e.g. in both the websocket API and the REST API. It is almost impossible to remove noteId in the current architecture.
>>> If we put the note id into the file content of note_name.zpln, then we have to read the note file every time, and we hit the issues I mentioned above again.
>>>
>>> >>> If the file content is json then why not use note_name.json instead of .zpln? That would make it easier for editors to know how to load/highlight the file contents.
>>>
>>> I am not strongly attached to *.zpln. But I think one purpose is to help third parties identify a Zeppelin note properly, e.g. github can identify a Jupyter notebook (*.ipynb) and render it properly.
>>>
>>> >>> Is there any reason for not using *real* folders or directories for organising the notebooks rather than embedding the folder hierarchy in the names of the notebooks? If someone wants to ‘move’ the notebooks to another folder they’d have to manually rename all the files/notebooks at present. That’s not very user-friendly.
>>>
>>> Actually my proposal is to use real folders. What the user sees in the Zeppelin note menu is the actual folder structure of the notes. If they want to move notebooks to another folder, they can change the folder name just as they would in a file system.
>>>
>>> Partridge, Lucas (GE Aviation) <lucas.partri...@ge.com> wrote on Mon, 13 Aug 2018 at 4:43 PM:
>>>
>>> Hi Jeff,
>>>
>>> I have some questions about this proposal (I can’t edit the design doc):
>>>
>>> 1. Do we need the note id in the file name at all? What’s wrong with just note_name.zpln?
>>> 2. If the file content is json then why not use note_name.json instead of .zpln? That would make it easier for editors to know how to load/highlight the file contents.
>>> 3. Is there any reason for not using *real* folders or directories for organising the notebooks rather than embedding the folder hierarchy in the names of the notebooks?
>>>    If someone wants to ‘move’ the notebooks to another folder they’d have to manually rename all the files/notebooks at present. That’s not very user-friendly.
>>>
>>> Thanks, Lucas.
>>>
>>> *From:* Jeff Zhang <zjf...@gmail.com>
>>> *Sent:* 13 August 2018 09:06
>>> *To:* users@zeppelin.apache.org
>>> *Cc:* dev <d...@zeppelin.apache.org>
>>> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json
>>>
>>> In that case, Zeppelin should fail to create the note.
>>>
>>> Felix Cheung <felixcheun...@hotmail.com> wrote on Mon, 13 Aug 2018 at 3:47 PM:
>>>
>>> Perhaps one concern is users having characters in the note name that are invalid for a file name/file path?
>>>
>>> ------------------------------
>>>
>>> *From:* Mohit Jaggi <mohitja...@gmail.com>
>>> *Sent:* Sunday, August 12, 2018 6:02 PM
>>> *To:* users@zeppelin.apache.org
>>> *Cc:* dev
>>> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json
>>>
>>> sounds like a good idea!
>>>
>>> On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>>
>>> Motivation
>>>
>>> The motivation of ZEPPELIN-2619 is to change the notes storage structure. Previously we stored notes as {noteId}/note.json; we’d like to change that to {note_name}_{note_id}.zpln. There are several reasons for this change.
>>>
>>> 1. {noteId}/note.json is not scalable. We put all notes in one root folder in a flat structure. When the Zeppelin server starts, we need to read every note.json to get the note name and build the note folder structure (because the note name, which is needed to build the notebook menu, is stored in note.json). This would be a nightmare when you have a large number of notes.
>>> 2. {noteId}/note.json is not maintainable. It is difficult for a developer/administrator to find a note file based on the note name.
>>> 3. {noteId}/note.json has no folder structure. Currently Zeppelin has to build the folder structure internally in memory according to the note name, which is a big overhead.
>>>
>>> New Approach
>>>
>>> As I mentioned above, I propose to change the note storage structure to {note_name}_{note_id}.zpln. note_name can contain folders, e.g. folder_1/mynote_abcd.zpln
>>>
>>> This kind of note storage structure could bring several benefits.
>>>
>>> 1. We don’t need to load all notes when Zeppelin starts. We just need to list each folder to get the note name and note_id.
>>> 2. It is much more maintainable, since it is easy to find the note file based on the note name.
>>> 3. It has the folder structure already, which can be mapped to the note folder structure.
>>>
>>> Side Effect
>>>
>>> This approach only works for file system storage, so that means we have to drop support for MongoNotebookRepo. I think that is ok because I haven't seen any users talk about it in the community, so I assume no one is using it.
>>>
>>> This is the overall design; any comments and feedback are welcome. Thanks.
>>>
>>> Here's the google doc; you can also comment there.
>>>
>>> https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing
>>>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>
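The "list each folder to get the note name and note_id" step from the original proposal can be sketched in Python as below. This is an illustration only, not Zeppelin's implementation: the function name `list_notes` is hypothetical, and the sketch assumes the id is whatever follows the last underscore in each `.zpln` file name. No file is opened; the index is built from directory listings alone.

```python
import os


def list_notes(root):
    """Map note id -> note path (folders + name) using file names only."""
    notes = {}
    for folder, _dirs, files in os.walk(root):
        for file_name in files:
            stem, ext = os.path.splitext(file_name)
            name, sep, note_id = stem.rpartition("_")
            if ext != ".zpln" or not sep or not name or not note_id:
                continue  # not a note file in the proposed naming scheme
            rel_folder = os.path.relpath(folder, root)
            note_path = name if rel_folder == "." else f"{rel_folder}/{name}"
            # Normalise to '/' so the menu path is the same on every OS.
            notes[note_id] = note_path.replace(os.sep, "/")
    return notes
```

In this sketch a duplicate id silently overwrites the earlier entry; a real implementation would need to decide how to report such a clash.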