I also agree with keeping noteId. Many Zeppelin APIs use noteId to identify a
note, so removing it would require a lot of changes.
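
To make the point concrete, here is a minimal sketch (Python, not Zeppelin code) of how a note id could be recovered from a file name of the form {note_name}_{note_id}.zpln without opening the file. The function names `parse_note_path` and `list_notes` are made up for illustration, and splitting on the last underscore is only an assumption about how the id would be separated from the name.

```python
import os


def parse_note_path(rel_path):
    """Split 'folder/My Note_2A94M5J1Z.zpln' into ('folder/My Note', '2A94M5J1Z')."""
    stem, ext = os.path.splitext(rel_path)
    if ext != ".zpln" or "_" not in os.path.basename(stem):
        raise ValueError(f"not a note file: {rel_path!r}")
    # Split on the LAST underscore so note names may contain underscores.
    note_name, note_id = stem.rsplit("_", 1)
    return note_name, note_id


def list_notes(root):
    """Map note_id -> note_name using only directory listings (no file reads)."""
    notes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            rel = os.path.relpath(os.path.join(dirpath, filename), root)
            try:
                name, note_id = parse_note_path(rel.replace(os.sep, "/"))
            except ValueError:
                continue  # skip files that do not follow the naming scheme
            notes[note_id] = name
    return notes
```

Because the id stays in the file name, a rename only changes the part before the last underscore, so anything that refers to the note by id would keep working.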


andreas.we...@gmail.com <andreas.we...@gmail.com> wrote on Tue, Sep 4, 2018 at 3:03 AM:

> Somehow subject was deleted in earlier mail. So here again for adding my
> thoughts to the proper thread:
>
> Sure, there might exist some kind of naming policy (or let's better call
> it naming convention) in zeppelin multiuser environments. But as long as
> there is no way to technically enforce a certain naming convention to turn
> it into a naming policy (which would be a nice idea BTW), it's IMHO a vague
> assumption, that there exist policies and users are following these. I'm
> thinking here of problems that might come up when changing the existing
> implementation and then deal with migration, because assumptions do not
> match reality.
>
> My real-life scenario here is that Zeppelin can be configured to make
> notebooks visible only to the owner (and invisible to any other user) by
> default: ZEPPELIN_NOTEBOOK_PUBLIC=false, which is IMHO a good idea when
> setting up Zeppelin as a multiuser environment in larger scenarios. In this
> case note owners can use any or no naming convention they like when
> creating and using a note for personal purposes only, because only the
> owner will see it, even if certain naming policies exist at a global
> organisation level. A global naming convention only has to be followed
> once users start sharing notes (that is, granting at least reader
> permissions to another user).
>
> So I think noteId is a must have in the filename.
>
> Andreas
>
> On 2018/08/31 01:46:38, Jongyoul Lee <jongy...@gmail.com> wrote:
> > Hi,
> >
> > I have somewhat different thoughts about name conflicts for newly created
> > notes. In a multiuser environment, AFAIK, most teams and companies
> > generally use a prefix as an internal group policy. In my case it is
> > user/{user_id}/{notebook_name_they_want}.zpln. With such a prefix, naming
> > conflicts rarely happen, and each note is stored under a specific folder.
> > If someone needed two notes with the same name in the same directory, it
> > might not be appropriate. WDYT?
> >
> > JL
> >
> > On Fri, Aug 31, 2018 at 4:44 AM, andreas.we...@gmail.com <
> > andreas.we...@gmail.com> wrote:
> >
> > > Another reason for keeping noteId is uniqueness in multi-user
> > > environments. There, users have separate Zeppelin workspaces, which is
> > > something we are using in production: see ZEPPELIN_NOTEBOOK_PUBLIC=false
> > > in the doc [1]. In that case users might be very confused when they
> > > cannot create a notebook with a name that already exists but that they
> > > most likely cannot see (yet).
> > >
> > > So I like the proposal {note_name}_{note_id}.zpln, where note_name could
> > > contain folders, e.g. folder_1/mynote_abcd.zpln. Even though I like
> > > {note_name}.{note_id}.zpln (a dot between note_name and note_id) even
> > > better :-)
> > >
> > > Regards
> > > Andreas
> > >
> > >
> > > [1] http://zeppelin.apache.org/docs/0.8.0/setup/security/notebook_authorization.html#separate-notebook-workspaces-public-vs-private
> > >
> > > On 2018/08/18 08:42:44, Jeff Zhang <zjf...@gmail.com> wrote:
> > > > BTW, I would also prefer to use the note name as the identifier of a
> > > > note, if the issue I mentioned before is acceptable to most users.
> > > >
> > > >
> > > >
> > > > Jeff Zhang <zjf...@gmail.com> wrote on Sat, Aug 18, 2018 at 4:40 PM:
> > > >
> > > > >
> > > > > I am afraid we cannot remove noteId: it is the unique, immutable
> > > > > identifier of a note and is used in a lot of places, such as
> > > > > paragraph sharing and the REST API.
> > > > > If we used the note name as the note id, a user's app might break
> > > > > whenever the note name is changed.
> > > > >
> > > > >
> > > > > Jongyoul Lee <jongy...@gmail.com> wrote on Sat, Aug 18, 2018 at 2:33 PM:
> > > > >
> > > > >> Hi, thanks for this kind of discussion.
> > > > >>
> > > > >> About noteId: how about changing the note id to the note name?
> > > > >> AFAIK, the note id is just an identifier and we can set any value
> > > > >> for it.
> > > > >>
> > > > >> There are two potential problems. First, we would have to handle the
> > > > >> note id more carefully, as it could contain a wide variety of
> > > > >> characters. Second, when someone changes a note name, others who are
> > > > >> viewing and updating the same note would lose access to it. We could
> > > > >> handle that by using websockets.
> > > > >>
> > > > >> WDYT?
> > > > >>
> > > > >> On Tue, 14 Aug 2018 at 6:14 PM Jeff Zhang <zjf...@gmail.com> wrote:
> > > > >>
> > > > >>> >>> But I’m still not comfortable with note ids in the name of
> > > > >>> the notebook itself.  Those names would look ugly if you shared
> > > > >>> your notebooks on github for example.  You don’t see Jupyter
> > > > >>> notebooks with names like that. If you have to keep the note ids
> > > > >>> with the notebooks could you not simply put the note id at the top
> > > > >>> of the notebook as Ruslan suggested? Then you’d only have to read
> > > > >>> the first line of each notebook.
> > > > >>>
> > > > >>> I know putting note_id in the note file name is not so elegant, but
> > > > >>> this is the compromise we have to make to keep compatibility, as we
> > > > >>> use noteId to uniquely identify a note right now. And I don't think
> > > > >>> putting noteId in the first line of the note would help much. We
> > > > >>> would still have to read the note files, which takes much more time
> > > > >>> than just reading the file names via the file system.
> > > > >>>
> > > > >>> Regarding the readability of the note file name, I don't think it
> > > > >>> will be affected much. E.g. the note file name would look like:
> > > > >>> *My Project/My Spark Tutorial Note_2A94M5J1Z.zpln*
> > > > >>> What the user sees in the notebook menu is still
> > > > >>> *My Project/My Spark Tutorial Note*, which is no different from
> > > > >>> what we see now.
> > > > >>>
> > > > >>> And thanks again for the feedback and comments. I am so glad to see
> > > > >>> so much discussion in the community.
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> Partridge, Lucas (GE Aviation) <lucas.partri...@ge.com> wrote on
> > > > >>> Tue, Aug 14, 2018 at 4:29 PM:
> > > > >>>
> > > > >>>> I agree you’re inviting consistency issues if you maintain a
> > > > >>>> separate note id-to-note name mapping file.
> > > > >>>>
> > > > >>>> But I’m still not comfortable with note ids in the name of the
> > > > >>>> notebook itself.  Those names would look ugly if you shared your
> > > > >>>> notebooks on github for example.  You don’t see Jupyter notebooks
> > > > >>>> with names like that.  If you have to keep the note ids with the
> > > > >>>> notebooks could you not simply put the note id at the top of the
> > > > >>>> notebook as Ruslan suggested? Then you’d only have to read the
> > > > >>>> first line of each notebook.
> > > > >>>>
> > > > >>>> Presumably if you copied the notebooks to another Zeppelin server
> > > > >>>> they would be restored with the same note ids there too? And
> > > > >>>> hopefully there would be no id clash with notebooks already on
> > > > >>>> that server…
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> *From:* Jeff Zhang <zjf...@gmail.com>
> > > > >>>> *Sent:* 14 August 2018 03:49
> > > > >>>> *To:* users@zeppelin.apache.org
> > > > >>>>
> > > > >>>>
> > > > >>>> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in
> > > > >>>> [Title].zpln instead of [NOTEID]/note.json
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> Thanks for the discussion.
> > > > >>>>
> > > > >>>> >>> I'm afraid about non-latin symbols in folder and note name.
> > > > >>>> And what about hieroglyphs?
> > > > >>>>
> > > > >>>> AFAIK, Linux allows every character in a file name except `\0` and
> > > > >>>> '/'. I can create a file name with Chinese characters on Linux, so
> > > > >>>> I guess you can use Russian as well.
> > > > >>>>
> > > > >>>> >>> If I understand correctly, this is being done solely to speed
> > > > >>>> up loading list of notebooks? What if a list of notebook names,
> > > > >>>> their ids, folder structure, etc can be *cached* in a separate
> > > > >>>> small json file? Or perhaps in a small embedded key-value store,
> > > > >>>> like www.mapdb.org would do? Just thinking out loud. This would
> > > > >>>> require a way to lazily re-sync the cache.
> > > > >>>>
> > > > >>>> This is not only to speed up loading but also to make the system
> > > > >>>> architecture easier to maintain. For now we have to build the
> > > > >>>> folder structure of notes in memory, and a lot of code in Zeppelin
> > > > >>>> does this. (Personally I don't think we would need any code for
> > > > >>>> this if we could get the folder structure from the note file
> > > > >>>> storage system.) Using another store to keep the mapping between
> > > > >>>> note name and note id brings in another classic problem of
> > > > >>>> distributed systems: consistency. How do we keep the real note
> > > > >>>> files and this mapping component consistent? Whenever we
> > > > >>>> create/rename/remove a note, we have to update both the notebook
> > > > >>>> repo and the mapping storage. In my experience, any bug in that
> > > > >>>> code would lead to inconsistency.
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> Ruslan Dautkhanov <dautkha...@gmail.com> wrote on Tue, Aug 14, 2018 at 3:58 AM:
> > > > >>>>
> > > > >>>> Thanks for bringing this up for discussion. My 2 cents below.
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> I am with Maksim and Felix on the concerns about special
> > > > >>>> characters now being allowed in notebook names, and also about
> > > > >>>> different charsets. Russian, for example, most commonly uses the
> > > > >>>> iso-8859-5, koi-8r/u and windows-1251 charsets. This seems likely
> > > > >>>> to bring a whole new set of localization issues.
> > > > >>>>
> > > > >>>> If I understand correctly, this is being done solely to speed up
> > > > >>>> loading the list of notebooks? What if the list of notebook names,
> > > > >>>> their ids, folder structure, etc. were *cached* in a separate
> > > > >>>> small json file? Or perhaps in a small embedded key-value store,
> > > > >>>> such as www.mapdb.org? Just thinking out loud. This would require
> > > > >>>> a way to lazily re-sync the cache.
> > > > >>>>
> > > > >>>> Another way to speed up json reads is to somehow force the "name"
> > > > >>>> attribute to be at the top of the json document that's written to
> > > > >>>> disk, then re-implement the json file reader to read just the
> > > > >>>> header of the file and do a partial json parse (or, for lack of
> > > > >>>> other options, grab the "name" attribute from the json file header
> > > > >>>> with a regex, for example).
> > > > >>>>
> > > > >>>> Back to filenames and charsets: I think the issue may be more
> > > > >>>> complicated if you store notebooks on a remote filesystem
> > > > >>>> (nfs/samba etc.). What if the remote server and the local nfs
> > > > >>>> client differ in their default fs charsets?
> > > > >>>>
> > > > >>>> Ideally all filesystems would use UTF-8, for example, but I am not
> > > > >>>> certain that's a good assumption to make. Exposing notebook names
> > > > >>>> can also bring other issues; I know some users occasionally add
> > > > >>>> trailing/leading spaces, etc.
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> On Mon, Aug 13, 2018 at 10:38 AM Belousov Maksim Eduardovich <
> > > > >>>> m.belou...@tinkoff.ru> wrote:
> > > > >>>>
> > > > >>>> The ability to use Russian and other language-specific letters in
> > > > >>>> the note name is a big advantage of Zeppelin. I would not like to
> > > > >>>> give up this functionality.
> > > > >>>>
> > > > >>>> I support the idea of the `zpln` file extension.
> > > > >>>>
> > > > >>>> The folder structure also sounds good.
> > > > >>>>
> > > > >>>> I'm worried about non-Latin symbols in folder and note names. And
> > > > >>>> what about hieroglyphs?
> > > > >>>>
> > > > >>>> Apache Zeppelin may be the first application in our company to use
> > > > >>>> Russian letters in the file system.
> > > > >>>>
> > > > >>>> I see a lot of risks in using non-Latin symbols and a lot of
> > > > >>>> issues in making the new folder structure stable.
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> ------------------------------
> > > > >>>>
> > > > >>>> *From:* Jeff Zhang <zjf...@gmail.com>
> > > > >>>> *Sent:* 13 August 2018 12:50
> > > > >>>> *To:* users@zeppelin.apache.org
> > > > >>>> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
> > > > >>>> instead of [NOTEID]/note.json
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> >>> Do we need the note id in the file name at all? What’s wrong
> > > > >>>> with just note_name.zpln?
> > > > >>>>
> > > > >>>> The reason I keep the note id is that we currently use noteId to
> > > > >>>> identify a note, e.g. in both the websocket API and the REST API.
> > > > >>>> It is almost impossible to remove noteId in the current
> > > > >>>> architecture. If we put the note id into the file content of
> > > > >>>> note_name.zpln, we have to read the note file every time, and then
> > > > >>>> we run into the issues I mentioned above again.
> > > > >>>>
> > > > >>>> >>> If the file content is json then why not use note_name.json
> > > > >>>> instead of .zpln? That would make it easier for editors to know
> > > > >>>> how to load/highlight the file contents.
> > > > >>>>
> > > > >>>> I don't have a strong preference for *.zpln. But I think one
> > > > >>>> purpose is to help third parties identify a Zeppelin note
> > > > >>>> properly, e.g. github can identify a jupyter notebook (*.ipynb)
> > > > >>>> and render it properly.
> > > > >>>>
> > > > >>>> >>> Is there any reason for not using *real* folders or
> > > > >>>> directories for organising the notebooks rather than embedding the
> > > > >>>> folder hierarchy in the names of the notebooks?  If someone wants
> > > > >>>> to ‘move’ the notebooks to another folder they’d have to manually
> > > > >>>> rename all the files/notebooks at present.  That’s not very
> > > > >>>> user-friendly.
> > > > >>>>
> > > > >>>> Actually my proposal is to use real folders. What users see in the
> > > > >>>> Zeppelin note menu is the actual folder structure of the notes. If
> > > > >>>> they want to move notebooks to another folder, they can change the
> > > > >>>> folder name just as they would in a file system.
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> Partridge, Lucas (GE Aviation) <lucas.partri...@ge.com> wrote on
> > > > >>>> Mon, Aug 13, 2018 at 4:43 PM:
> > > > >>>>
> > > > >>>> Hi Jeff,
> > > > >>>>
> > > > >>>> I have some questions about this proposal (I can’t edit the design
> > > > >>>> doc):
> > > > >>>>
> > > > >>>>    1. Do we need the note id in the file name at all? What’s wrong
> > > > >>>>    with just note_name.zpln?
> > > > >>>>    2. If the file content is json then why not use note_name.json
> > > > >>>>    instead of .zpln? That would make it easier for editors to know
> > > > >>>>    how to load/highlight the file contents.
> > > > >>>>    3. Is there any reason for not using *real* folders or
> > > > >>>>    directories for organising the notebooks rather than embedding
> > > > >>>>    the folder hierarchy in the names of the notebooks?  If someone
> > > > >>>>    wants to ‘move’ the notebooks to another folder they’d have to
> > > > >>>>    manually rename all the files/notebooks at present.  That’s not
> > > > >>>>    very user-friendly.
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> Thanks, Lucas.
> > > > >>>>
> > > > >>>> *From:* Jeff Zhang <zjf...@gmail.com>
> > > > >>>> *Sent:* 13 August 2018 09:06
> > > > >>>> *To:* users@zeppelin.apache.org
> > > > >>>> *Cc:* dev <d...@zeppelin.apache.org>
> > > > >>>> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in
> > > > >>>> [Title].zpln instead of [NOTEID]/note.json
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> In that case, zeppelin should fail to create note.
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> Felix Cheung <felixcheun...@hotmail.com> wrote on Mon, Aug 13, 2018 at 3:47 PM:
> > > > >>>>
> > > > >>>> Perhaps one concern is users having characters in the note name
> > > > >>>> that are invalid for a file name/file path?
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> ------------------------------
> > > > >>>>
> > > > >>>> *From:* Mohit Jaggi <mohitja...@gmail.com>
> > > > >>>> *Sent:* Sunday, August 12, 2018 6:02 PM
> > > > >>>> *To:* users@zeppelin.apache.org
> > > > >>>> *Cc:* dev
> > > > >>>> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
> > > > >>>> instead of [NOTEID]/note.json
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> sounds like a good idea!
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <zjf...@gmail.com> wrote:
> > > > >>>>
> > > > >>>> Motivation
> > > > >>>>
> > > > >>>>    The motivation of ZEPPELIN-2619 is to change the note storage
> > > > >>>> structure. Previously we stored notes as {noteId}/note.json; we’d
> > > > >>>> like to change that to {note_name}_{note_id}.zpln. There are
> > > > >>>> several reasons for this change.
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>    1. {noteId}/note.json is not scalable. We put all notes in one
> > > > >>>>    root folder in a flat structure. When the Zeppelin server
> > > > >>>>    starts, we need to read every note.json to get the note name
> > > > >>>>    and build the note folder structure (because the note name,
> > > > >>>>    which we need to build the notebook menu, is stored in
> > > > >>>>    note.json). This becomes a nightmare when you have a large
> > > > >>>>    number of notes.
> > > > >>>>    2. {noteId}/note.json is not maintainable. It is difficult for
> > > > >>>>    a developer/administrator to find a note file based on the note
> > > > >>>>    name.
> > > > >>>>    3. {noteId}/note.json has no folder structure. Currently
> > > > >>>>    Zeppelin has to build the folder structure internally in memory
> > > > >>>>    according to the note name, which is a big overhead.
> > > > >>>>
> > > > >>>>
> > > > >>>> New Approach
> > > > >>>>
> > > > >>>>    As I mentioned above, I propose to change the note storage
> > > > >>>> structure to {note_name}_{note_id}.zpln. note_name could contain
> > > > >>>> folders, e.g. folder_1/mynote_abcd.zpln
> > > > >>>>
> > > > >>>> This kind of note storage structure could bring several benefits.
> > > > >>>>
> > > > >>>>    1. We don’t need to load all notes when Zeppelin starts. We
> > > > >>>>    just need to list each folder to get the note name and note_id.
> > > > >>>>    2. It is much more maintainable: it is easy to find the note
> > > > >>>>    file based on the note name.
> > > > >>>>    3. It already has a folder structure, which can be mapped to
> > > > >>>>    the note folder structure.
> > > > >>>>
> > > > >>>>
> > > > >>>> Side Effect
> > > > >>>>
> > > > >>>> This approach only works for file system storage, which means we
> > > > >>>> have to drop support for MongoNotebookRepo. I think that is OK
> > > > >>>> because I haven't seen any users talk about it in the community,
> > > > >>>> so I assume no one is using it.
> > > > >>>>
> > > > >>>> This is the overall design; any comments and feedback are welcome.
> > > > >>>> Thanks.
> > > > >>>>
> > > > >>>> Here is the Google doc; you can also comment there.
> > > > >>>>
> > > > >>>> https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> --
> > > > >> 이종열, Jongyoul Lee, 李宗烈
> > > > >> http://madeng.net
> > > > >>
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > 이종열, Jongyoul Lee, 李宗烈
> > http://madeng.net
> >
>
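
On the file-name concerns raised in the quoted thread (characters that are invalid in a path, stray leading or trailing spaces), a check could run before a note is created or renamed. Below is a minimal sketch in Python, not Zeppelin code. The function `validate_note_name` and its rules are hypothetical; they are based only on the remarks above that Linux rejects `\0` and '/' in a file name and that users occasionally add leading/trailing spaces. Other file systems may need stricter rules.

```python
def validate_note_name(note_name):
    """Return a list of problems; an empty list means the name looks usable."""
    if not note_name:
        return ["name is empty"]
    problems = []
    if "\0" in note_name:
        problems.append("contains a NUL character")
    # '/' separates folders, so every segment between slashes must be a
    # usable file or folder name on its own.
    for segment in note_name.split("/"):
        if segment in ("", ".", ".."):
            problems.append(f"invalid path segment {segment!r}")
        elif segment != segment.strip():
            problems.append(f"leading or trailing whitespace in {segment!r}")
    return problems
```

Non-Latin names pass this check unchanged; whether they survive a given file system or charset setup is a separate question that this sketch does not address.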
