Ok, thanks Jeff – that all makes sense! Yes, rendering and diffing notebooks in GitHub would be very nice.
From: Jeff Zhang <zjf...@gmail.com>
Sent: 13 August 2018 10:50
To: users@zeppelin.apache.org
Subject: EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

>>> Do we need the note id in the file name at all? What’s wrong with just
>>> note_name.zpln?

The reason I keep the note id is that we currently use it to identify a note; e.g. we use the note id in both the websocket API and the REST API. It is almost impossible to remove the note id in the current architecture. If we put the note id into the file content of note_name.zpln, then we would have to read every note file again, and we would be back to the issues I described in the proposal.

>>> If the file content is json then why not use note_name.json instead of
>>> .zpln? That would make it easier for editors to know how to load/highlight
>>> the file contents.

I am not strongly attached to *.zpln. But I think one purpose is to help third parties identify Zeppelin notes properly; e.g. GitHub can identify Jupyter notebooks (*.ipynb) and render them properly.

>>> Is there any reason for not using real folders or directories for
>>> organising the notebooks rather than embedding the folder hierarchy in the
>>> names of the notebooks? If someone wants to ‘move’ the notebooks to
>>> another folder they’d have to manually rename all the files/notebooks at
>>> present. That’s not very user-friendly.

Actually, my proposal is to use real folders. What users see in the Zeppelin note menu is the actual folder structure of the note files. If they want to move notebooks to another folder, they can rename the folder just as they would in a file system.

Partridge, Lucas (GE Aviation) <lucas.partri...@ge.com> wrote on Monday, 13 August 2018 at 4:43 PM:

Hi Jeff,

I have some questions about this proposal (I can’t edit the design doc):

1. Do we need the note id in the file name at all? What’s wrong with just note_name.zpln?
2. If the file content is json then why not use note_name.json instead of .zpln?
   That would make it easier for editors to know how to load/highlight the file contents.
3. Is there any reason for not using real folders or directories for organising the notebooks rather than embedding the folder hierarchy in the names of the notebooks? If someone wants to ‘move’ the notebooks to another folder they’d have to manually rename all the files/notebooks at present. That’s not very user-friendly.

Thanks,
Lucas.

From: Jeff Zhang <zjf...@gmail.com>
Sent: 13 August 2018 09:06
To: users@zeppelin.apache.org
Cc: dev <d...@zeppelin.apache.org>
Subject: EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

In that case, Zeppelin should fail to create the note.

Felix Cheung <felixcheun...@hotmail.com> wrote on Monday, 13 August 2018 at 3:47 PM:

Perhaps one concern is users having characters in note names that are invalid for a file name/file path?

From: Mohit Jaggi <mohitja...@gmail.com>
Sent: Sunday, August 12, 2018 6:02 PM
To: users@zeppelin.apache.org
Cc: dev
Subject: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

sounds like a good idea!

On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <zjf...@gmail.com> wrote:

Motivation

The motivation of ZEPPELIN-2619 is to change the note storage structure. Currently we store each note as {noteId}/note.json; we’d like to change this to {note_name}_{note_id}.zpln. There are several reasons for this change.

1. {noteId}/note.json is not scalable. We put all notes in one root folder in a flat structure, and when the Zeppelin server starts, we need to read every note.json to build the note folder structure (because the note name, which is needed to build the notebook menu, is stored inside note.json).
   This would be a nightmare when you have a large number of notes.
2. {noteId}/note.json is not maintainable. It is difficult for a developer/administrator to find a note file based on the note name.
3. {noteId}/note.json has no folder structure. Currently Zeppelin has to build the folder structure in memory from the note names, which is a big overhead.

New Approach

As I mentioned above, I propose to change the note storage structure to {note_name}_{note_id}.zpln. note_name could contain folders, e.g. folder_1/mynote_abcd.zpln

This kind of note storage structure could bring several benefits.

1. We don’t need to load all notes when Zeppelin starts. We just need to list each folder to get the note names and note ids.
2. It is much more maintainable, so it is easy to find the note file based on the note name.
3. It already has a folder structure, which can be mapped directly to the note folder structure.

Side Effect

This approach only works for file system storage, which means we have to drop support for MongoNotebookRepo. I think that is OK because I haven’t seen any users talk about it in the community, so I assume no one is using it.

This is the overall design; any comments and feedback are welcome. Thanks.

Here's the Google doc; you can also comment on it here.
https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing
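To make the proposed {note_name}_{note_id}.zpln layout concrete, here is a minimal Python sketch of how a server could rebuild the note list and folder tree by listing directories and parsing file names, without opening any note file. It also rejects a name that is not a usable path at creation time, as suggested in the reply to Felix. The helper names (`note_file_path`, `parse_note_file`, `list_notes`), the set of rejected characters, and the rule that a note id never contains an underscore are all assumptions for illustration, not Zeppelin's actual API or id format.

```python
from __future__ import annotations

from pathlib import Path, PurePosixPath

NOTE_EXT = ".zpln"
# Assumption: characters rejected in note names. The thread only says that
# Zeppelin should fail to create a note whose name is not a valid file path.
INVALID_CHARS = set('\\:*?"<>|')


def note_file_path(note_name: str, note_id: str) -> str:
    """Map a note name such as 'folder_1/mynote' and its id to a relative file path."""
    if not note_id or "_" in note_id:
        # Assumption: ids never contain '_', so the last '_' separates name and id.
        raise ValueError(f"unsupported note id: {note_id!r}")
    parts = note_name.strip("/").split("/")
    if any(p in ("", ".", "..") for p in parts) or any(c in INVALID_CHARS for c in note_name):
        # Refuse to create the note instead of writing an unusable path.
        raise ValueError(f"invalid note name: {note_name!r}")
    return "/".join(parts) + "_" + note_id + NOTE_EXT


def parse_note_file(rel_path: str) -> tuple[str, str]:
    """Recover (note_name, note_id) from a relative path without opening the file."""
    path = PurePosixPath(rel_path)
    if path.suffix != NOTE_EXT:
        raise ValueError(f"not a {NOTE_EXT} file: {rel_path!r}")
    # Split on the LAST underscore so note names may themselves contain '_'.
    name, sep, note_id = path.stem.rpartition("_")
    if not sep or not name or not note_id:
        raise ValueError(f"missing note id in: {rel_path!r}")
    folder = path.parent.as_posix()
    return (name if folder == "." else f"{folder}/{name}"), note_id


def list_notes(root: Path) -> dict[str, str]:
    """Build {note_id: note_name} for every note under root from file names only."""
    notes: dict[str, str] = {}
    for f in sorted(root.rglob("*" + NOTE_EXT)):
        note_name, note_id = parse_note_file(f.relative_to(root).as_posix())
        notes[note_id] = note_name
    return notes
```

For example, `note_file_path("folder_1/mynote", "abcd")` gives `folder_1/mynote_abcd.zpln`, and parsing that path gives back `("folder_1/mynote", "abcd")`. Under this sketch, renaming a directory on disk (the answer to Lucas's third question) changes the names that `list_notes` returns, while the ids that the REST and websocket APIs rely on stay the same.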