I am afraid we cannot remove noteId. It is the immutable, unique identifier
of a note and is used in a lot of places, such as paragraph sharing and the
REST API.
If we used the note name as the note id, changing the note name could break
users' apps.


Jongyoul Lee <jongy...@gmail.com> wrote on Sat, Aug 18, 2018 at 2:33 PM:

> Hi, thanks for this kind of discussion.
>
> About noteId, how about changing the note id to the note name? AFAIK, the
> note id is just an identifier and we can set it to any value.
>
> There are two potential problems. First, we would have to handle the note
> id more carefully, as it could contain a wide variety of characters.
> Second, if someone changes a note name, others who are viewing and updating
> the same note would lose access to it. We could handle that by using
> websockets.
>
> WDYT?
>
> On Tue, 14 Aug 2018 at 6:14 PM Jeff Zhang <zjf...@gmail.com> wrote:
>
>> >>> But I’m still not comfortable with note ids in the name of the
>> notebook itself.  Those names would look ugly if you shared your notebooks
>> on github for example.  You don’t see Jupyter notebooks with names like
>> that. If you have to keep the note ids with the notebooks could you not
>> simply put the note id at the top of the notebook as Ruslan suggested? Then
>> you’d only have to read the first line of each notebook.
>>
>> I know putting the note id in the note file name is not so elegant, but
>> this is the compromise we have to make to keep compatibility, as we use
>> noteId to uniquely identify a note right now. And I don't think putting
>> the noteId on the first line of the note would help much. We would still
>> have to read the note files, which takes much more time than just reading
>> the file names via the file system.
>>
>> Regarding the readability of the note file name, I think it won't matter
>> much. E.g. the note file name would look like: *My Project/My Spark
>> Tutorial Note_2A94M5J1Z.zpln*
>> What the user sees in the notebook menu is still *My Project/My Spark
>> Tutorial Note*, which is no different from what we see now.
>>
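A minimal Python sketch of how a file name in the format above could be split back into the menu path and the note id. The function name `split_note_path` is hypothetical, and the sketch assumes that note ids never contain an underscore, so the last underscore in the file name always separates name from id.

```python
from pathlib import PurePosixPath
from typing import Tuple


def split_note_path(path: str) -> Tuple[str, str]:
    """Split '<folders>/<note name>_<note id>.zpln' into (menu path, note id).

    Assumption (not from the proposal): note ids contain no underscore.
    """
    p = PurePosixPath(path)
    if p.suffix != ".zpln":
        raise ValueError(f"not a .zpln file: {path}")
    # Split on the LAST underscore so note names may contain underscores.
    name, sep, note_id = p.stem.rpartition("_")
    if not sep or not name or not note_id:
        raise ValueError(f"file name has no '<name>_<id>' part: {path}")
    return str(p.parent / name), note_id
```

With this, `split_note_path("My Project/My Spark Tutorial Note_2A94M5J1Z.zpln")` yields the menu entry `My Project/My Spark Tutorial Note` and the id `2A94M5J1Z`, without opening the file.
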
>> And thanks again for the feedback and comments. I am glad to see so much
>> discussion in the community.
>>
>>
>>
>> Partridge, Lucas (GE Aviation) <lucas.partri...@ge.com> wrote on Tue,
>> Aug 14, 2018 at 4:29 PM:
>>
>>> I agree you’re inviting consistency issues if you maintain a separate
>>> note id-to-note name mapping file.
>>>
>>>
>>>
>>> But I’m still not comfortable with note ids in the name of the notebook
>>> itself.  Those names would look ugly if you shared your notebooks on github
>>> for example.  You don’t see Jupyter notebooks with names like that.  If you
>>> have to keep the note ids with the notebooks could you not simply put the
>>> note id at the top of the notebook as Ruslan suggested? Then you’d only
>>> have to read the first line of each notebook.
>>>
>>>
>>>
>>> Presumably if you copied the notebooks to another Zeppelin server they
>>> would be restored with the same note ids there too? And hopefully there
>>> would be no id clash with notebooks already on that server…
>>>
>>>
>>>
>>> *From:* Jeff Zhang <zjf...@gmail.com>
>>> *Sent:* 14 August 2018 03:49
>>> *To:* users@zeppelin.apache.org
>>>
>>>
>>> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
>>> instead of [NOTEID]/note.json
>>>
>>>
>>>
>>>
>>>
>>> Thanks for the discussion.
>>>
>>> >>> I'm worried about non-Latin symbols in folder and note names. And
>>> what about hieroglyphs?
>>>
>>> AFAIK, Linux allows any character in a file name except `\0` and '/'.
>>> I can create file names with Chinese characters on Linux, so I guess you
>>> can use Russian as well.
>>>
>>>
>>>
>>> >>> If I understand correctly, this is being done solely to speed up
>>> loading the list of notebooks? What if the list of notebook names, their
>>> ids, folder structure, etc. could be *cached* in a separate small JSON
>>> file? Or perhaps in a small embedded key-value store, like www.mapdb.org?
>>> Just thinking out loud. This would require a way to lazily re-sync the
>>> cache.
>>>
>>>
>>>
>>> This is not only to speed up loading but also to make the system
>>> architecture easier to maintain. For now we have to build the folder
>>> structure of notes in memory, and a lot of code in Zeppelin does this
>>> (personally I don't think we would need any code for this if we could
>>> get the folder structure from the note file storage system). Using another
>>> store to keep the mapping between note name and note id would bring in a
>>> classic distributed-systems problem: consistency. How do we keep the real
>>> note files and this mapping component consistent? Whenever we
>>> create/rename/remove a note, we would have to update both the notebook repo
>>> and the mapping store. In my experience, any bug in that code would lead to
>>> inconsistency.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Ruslan Dautkhanov <dautkha...@gmail.com> wrote on Tue, Aug 14, 2018 at 3:58 AM:
>>>
>>> Thanks for bringing this up for discussion. My 2 cents below.
>>>
>>>
>>>
>>> I am with Maksim and Felix on the concerns about special characters now
>>> being allowed in notebook names, and also about different charsets.
>>> Russian, for example, most commonly uses the ISO-8859-5, KOI8-R/U and
>>> Windows-1251 charsets. This seems like it will bring a whole new set of
>>> localization issues.
>>>
>>>
>>>
>>> If I understand correctly, this is being done solely to speed up loading
>>> the list of notebooks? What if the list of notebook names, their ids,
>>> folder structure, etc. could be *cached* in a separate small JSON file? Or
>>> perhaps in a small embedded key-value store, like www.mapdb.org? Just
>>> thinking out loud. This would require a way to lazily re-sync the cache.
>>>
>>>
>>>
>>> Another way to speed up JSON reads is to somehow force the "name"
>>> attribute to be at the top of the JSON document that's written to disk,
>>> then re-implement the JSON file reader to read just the header of the file
>>> and do a partial JSON parse (or, for lack of better options, grab the
>>> "name" attribute from the file header with a regex, for example).
>>>
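The regex fallback described above could look roughly like the following Python sketch. It is only an illustration: `peek_note_name` and the 4096-byte header size are made-up choices, and it assumes the writer really does emit "name" as the first attribute, so that no earlier "name" key inside a paragraph can match first.

```python
import json
import re
from typing import Optional

# Matches "name": "<JSON string>", allowing escaped characters inside the string.
NAME_RE = re.compile(r'"name"\s*:\s*("(?:[^"\\]|\\.)*")')


def peek_note_name(path: str, header_bytes: int = 4096) -> Optional[str]:
    """Return the note name by reading only the first bytes of a note file.

    Assumes the "name" attribute is written at the top of the document.
    Returns None if no complete "name" string is found in the header.
    """
    with open(path, "rb") as f:
        # errors="ignore" drops a multi-byte character cut off at the boundary.
        head = f.read(header_bytes).decode("utf-8", errors="ignore")
    match = NAME_RE.search(head)
    # json.loads decodes the quoted string, including \" and \uXXXX escapes.
    return json.loads(match.group(1)) if match else None
```

This still opens every note file once, so it reduces the amount of data read per note rather than the number of files touched.
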
>>>
>>>
>>> Back to filenames and charsets: I think the issue may be more complicated
>>> if you store notebooks on a remote filesystem (NFS, Samba, etc.). What if
>>> the remote server and the local NFS client differ in their default
>>> filesystem charsets?
>>>
>>>
>>>
>>> Ideally all filesystems would use UTF-8, for example, but I am not
>>> certain that's a good assumption to make. Exposing notebook names in file
>>> names can also bring other issues; for example, I know some users
>>> occasionally add trailing/leading spaces.
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Aug 13, 2018 at 10:38 AM Belousov Maksim Eduardovich <
>>> m.belou...@tinkoff.ru> wrote:
>>>
>>> The use of Russian and other language-specific letters in note names is a
>>> big advantage of Zeppelin. I would not like to give up this functionality.
>>>
>>>
>>>
>>> I support the idea about `zpln` file extension.
>>>
>>> The folder structure also sounds good.
>>>
>>>
>>>
>>> I'm worried about non-Latin symbols in folder and note names. And what
>>> about hieroglyphs?
>>>
>>>
>>>
>>> Apache Zeppelin may be the first application in our company to use
>>> Russian letters in the file system.
>>>
>>> I see a lot of risks in using non-Latin symbols and a lot of issues in
>>> making the new folder structure stable.
>>>
>>>
>>>
>>>
>>>
>>>
>>> ------------------------------
>>>
>>> *From:* Jeff Zhang <zjf...@gmail.com>
>>> *Sent:* 13 August 2018 12:50
>>> *To:* users@zeppelin.apache.org
>>> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead
>>> of [NOTEID]/note.json
>>>
>>>
>>>
>>> >>> Do we need the note id in the file name at all? What’s wrong with
>>> just note_name.zpln?
>>>
>>> The reason I keep the note id is that we currently use noteId to identify
>>> a note, e.g. in both the websocket API and the REST API. It is almost
>>> impossible to remove noteId in the current architecture. If we put the
>>> note id into the file content of note_name.zpln, then we would have to read
>>> the note file every time, and we would run into the issues I mentioned
>>> above again.
>>>
>>>
>>>
>>> >>> If the file content is json then why not use note_name.json instead
>>> of .zpln? That would make it easier for editors to know how to
>>> load/highlight the file contents.
>>>
>>> I don't feel strongly about *.zpln, but I think one purpose is to help
>>> third parties identify Zeppelin notes properly, e.g. GitHub can identify a
>>> Jupyter notebook (*.ipynb) and render it properly.
>>>
>>>
>>>
>>> >>> Is there any reason for not using *real* folders or directories for
>>> organising the notebooks rather than embedding the folder hierarchy in the
>>> names of the notebooks?  If someone wants to ‘move’ the notebooks to
>>> another folder they’d have to manually rename all the files/notebooks at
>>> present.  That’s not very user-friendly.
>>>
>>>
>>>
>>> Actually my proposal is to use real folders. What users see in the Zeppelin
>>> note menu is the actual folder structure of the notes. If they want to move
>>> notebooks to another folder, they can change the folder name just as they
>>> would in the file system.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Partridge, Lucas (GE Aviation) <lucas.partri...@ge.com> wrote on Mon,
>>> Aug 13, 2018 at 4:43 PM:
>>>
>>> Hi Jeff,
>>>
>>> I have some questions about this proposal (I can’t edit the design doc):
>>>
>>>
>>>
>>>    1. Do we need the note id in the file name at all? What’s wrong with
>>>    just note_name.zpln?
>>>    2. If the file content is json then why not use note_name.json
>>>    instead of .zpln? That would make it easier for editors to know how to
>>>    load/highlight the file contents.
>>>    3. Is there any reason for not using *real* folders or directories
>>>    for organising the notebooks rather than embedding the folder hierarchy
>>>    in the names of the notebooks?  If someone wants to ‘move’ the notebooks
>>>    to another folder they’d have to manually rename all the files/notebooks
>>>    at present.  That’s not very user-friendly.
>>>
>>>
>>>
>>> Thanks, Lucas.
>>>
>>> *From:* Jeff Zhang <zjf...@gmail.com>
>>> *Sent:* 13 August 2018 09:06
>>> *To:* users@zeppelin.apache.org
>>> *Cc:* dev <d...@zeppelin.apache.org>
>>> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
>>> instead of [NOTEID]/note.json
>>>
>>>
>>>
>>> In that case, zeppelin should fail to create note.
>>>
>>>
>>>
>>> Felix Cheung <felixcheun...@hotmail.com> wrote on Mon, Aug 13, 2018 at 3:47 PM:
>>>
>>> Perhaps one concern is users having characters in note name that are
>>> invalid for file name/file path?
>>>
>>>
>>>
>>>
>>> ------------------------------
>>>
>>> *From:* Mohit Jaggi <mohitja...@gmail.com>
>>> *Sent:* Sunday, August 12, 2018 6:02 PM
>>> *To:* users@zeppelin.apache.org
>>> *Cc:* dev
>>> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
>>> instead of [NOTEID]/note.json
>>>
>>>
>>>
>>> sounds like a good idea!
>>>
>>>
>>>
>>> On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>>
>>> Motivation
>>>
>>>    The motivation of ZEPPELIN-2619 is to change the note storage
>>> structure. Currently we store notes as {noteId}/note.json; we’d like to
>>> change that to {note_name}_{note_id}.zpln. There are several reasons for
>>> this change.
>>>
>>>
>>>
>>>    1. {noteId}/note.json is not scalable. We put all notes in one root
>>>    folder in a flat structure. When the Zeppelin server starts, we need to
>>>    read every note.json to get the note name and build the note folder
>>>    structure (because the note name, which we need to build the notebook
>>>    menu, is stored in note.json). This becomes a nightmare when you have
>>>    a large number of notes.
>>>    2. {noteId}/note.json is not maintainable. It is difficult for a
>>>    developer/administrator to find a note file based on the note name.
>>>    3. {noteId}/note.json has no folder structure. Currently Zeppelin
>>>    has to build the folder structure internally in memory from the note
>>>    names, which is a big overhead.
>>>
>>>
>>> New Approach
>>>
>>>    As I mentioned above, I propose to change the note storage structure
>>> to {note_name}_{note_id}.zpln.  note_name can contain folders, e.g.
>>> folder_1/mynote_abcd.zpln
>>>
>>> This kind of note storage structure could bring several benefits.
>>>
>>>    1. We don’t need to load all notes when Zeppelin starts. We just
>>>    need to list each folder to get the note name and note_id.
>>>    2. It is much more maintainable, since it is easy to find the note file
>>>    based on the note name.
>>>    3. It already has a folder structure, which can be mapped to the
>>>    note folder structure.
>>>
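To illustrate the first benefit, here is a rough Python sketch that builds a note id to note path index purely from directory listings, never opening a note file. The function name `index_notes` is hypothetical, and the sketch assumes note ids contain no underscore. It walks the whole tree eagerly for brevity, whereas a real implementation could list each folder lazily as the proposal suggests.

```python
import os
from typing import Dict


def index_notes(root: str) -> Dict[str, str]:
    """Map note id -> note path (folders + note name) using file names only.

    Assumes files are named '<note_name>_<note_id>.zpln' and that note ids
    contain no underscore. Files that do not fit the pattern are skipped.
    """
    index: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        for filename in filenames:
            stem, ext = os.path.splitext(filename)
            # Split on the last underscore so names may contain underscores.
            name, sep, note_id = stem.rpartition("_")
            if ext != ".zpln" or not sep or not name or not note_id:
                continue
            if rel_dir == ".":
                note_path = name
            else:
                # Normalise to '/' so the menu path is the same on every OS.
                note_path = os.path.join(rel_dir, name).replace(os.sep, "/")
            index[note_id] = note_path
    return index
```

For a repo containing `folder_1/mynote_abcd.zpln`, the result would contain `{"abcd": "folder_1/mynote"}`. Renaming or moving the file changes the path but not the key, which is why the id stays in the file name.
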
>>>
>>> Side Effect
>>>
>>> This approach only works for file system storage, which means we have
>>> to drop support for MongoNotebookRepo. I think that is OK because I haven't
>>> seen any users talk about it in the community, so I assume no one is using it.
>>>
>>>
>>>
>>> This is the overall design; any comments and feedback are welcome. Thanks.
>>>
>>>
>>>
>>> Here's the Google Doc; you can also comment there.
>>>
>>>
>>> https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing
>>>
>>>
>>>
>>>
>>>
>>> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>
