@Jim Klo suggested a nice solution. A document centric database implies embedding what you need inside a document. However it still depends from your actual environment/scope/specific
On Fri, Feb 22, 2013 at 8:04 PM, Jim Klo <[email protected]> wrote: > > interesting discussion! > > jim, yes, in my case nesting won't be very deep so i've been thinking > about your suggestion: > > 1 : orig message > 1-2 : reply to 1 > 1-2-5 : reply to 2 > 1-3 : reply to 1 > 1-3-4 : reply to > > you say "1-3-4" is an ID, right? > > > Yes > > so tell me, if the message with the ID "1-3-4" wants to find out who his > parent is, then we'd have to parse the ID, remove one digit and query the > database for "1-3" to get the parent. do you think this is optimal for the > couchdb? > > > You could do it this way; I've only outlined a strategy for ID's that > sorts in a hierarchal fashion, not a way to determine ancestry. I'd > probably suggest just storing the parent id in the child - depends upon > your actual use case. > > Another alternative, but would require the use of a view, is to store the > parent ancestry as an array in each document instead of using a composite > id ( I believe others have suggested this too): > > { id:1, ancestry: []} > { id:2, ancestry: [1]} > { id:5, ancestry: [1, 2]} > { id:3, ancestry: [1]} > { id:4, ancestry: [1, 3]} > { id:6, ancestry: [1, 2]} > { id:7, ancestry: [1, 3]} > > then a map function: > > function(doc) { > emit(doc.ancestry.concat([doc.id]), null); > } > > and just use the built-in _count reduce if you need to count the number of > nodes in a tree (or sub tree) > > This will create a view that sorts in the same way as the id, but using an > array instead of a string. This view will look something like, noting that > the key is what's sorted: > > { > rows: [ > { id: 1, key: [ 1 ], value= null}, > { id: 2, key: [ 1, 2 ], value= null}, > { id: 5, key: [ 1, 2, 5 ], value= null}, > { id: 6, key: [ 1, 2, 6 ], value= null}, > { id: 3, key: [ 1, 3 ], value= null}, > { id: 4, key: [ 1, 3, 4 ], value= null}, > { id: 7, key: [ 1, 3, 7 ], value= null}, > ] > } > > you can then query an entire tree with startkey=[1]&endkey=[1,{}] which > will give you all the descendants for the one tree with one query. > fetching a sub tree works similarly: startkey=[1,2]&endkey=[1,2,{}] > > and you will have equivalent functionality to using a composite id with > possibly a simpler to comprehend mechanism for building queries and > creating ids at the expense of creating a view... > > you'll probably also want a view that just lists root messages too… that > would be: > > function(doc) { > if (doc.ancestry.length == 0) > emit(doc.id, null); > } > > - Jim > > > * > Jim Klo > Senior Software Engineer > Center for Software Engineering > SRI International > * > * > t. @nsomnac > * > > > On 22/02/13 20:57, Jim Klo wrote: > > On Feb 21, 2013, at 11:31 PM, "Elisiano Petrini" <[email protected]> > wrote: > > I understand the concept, but on the same level there could be more than > one child. Using this as an ID would work only if you put something > unique to the node in composing the ID. > > I don't recall multiple nodes being a constraint in the problem, even then > it's solvable using sharded ID's - the reality is there's always finite > number of nodes. The only thing that changes is that tree won't sort > naturally in the order they arrive. Remember, the parent node has to exist > everywhere for a collision to occur. > > Generating a sequential ID (per tree level) depending on the order of > arrival might lead to collisions (if messages arrive at the same time). > > Sure, but only if you have a badly designed sequence generators that are > not designed to operate in a sharded manner. A simple way is to > pre-allocate a pool of ID's to a process (say 500 ID's per process). No > other process should be able to use another process' allocation. Once > exhausted, the process fetches another pool of IDs. This may not be ideal > in a mobile scenario, but would probably still work, you just might have to > devise a smarter allocation algorithm based upon usage and connectivity > frequencies. (And now I've really digressed) This is a very common solution > done in the RDBMS world to combat ID conflicts. > > > Anyhow, I feel we're digressing here :) > > It is worth discussion as not everyone is experienced in massively > parallel systems. :) > > If the originator has only a single node - they shouldn't have any > conflict issues with the simplest solution. > > > > On Fri, 2013-02-22 at 16:23 +0900, svilen wrote: > > that's the path to root stored as id > so answers to 1-2 will be say 1-2-5 1-2-6 1-2-33 > > Hi Kim, > I don't think using this kind of documents' ids is a viable > solution: what if there is more that one message in response to > another (ie:several people answering to the same message)? > > On Fri, 2013-02-22 at 07:00 +0000, Jim Klo wrote: > > In theory you mention infinite depth, but is that realistic? > > A simple way is to make ID's a composite serial and chain them > together. > > 1 : orig message > 1-2 : reply to 1 > 1-2-5 : reply to 2 > 1-3 : reply to 1 > 1-3-4 : reply to > > The keys will then sort in order, such that can build the tree DFS > and use range queries to get just segments. > > You can obviously do some creative things in compression - using > different number bases to handle deep trees. > > > > On Feb 21, 2013, at 10:10 PM, "Michael Heuberger" > <[email protected]> wrote: > > andrey, svilen, many thanks again. > > agreed, storing immediate children in a document is a bad idea. i > think i go for storing only an immediate parent id to get an > infinite tree depth. > > thanks for your inspirations guys > > cheers > michael > > On 22/02/13 18:49, Andrey Kuprianov wrote: > > PS - storing only an immediate parent id would be the solution > for an infinite tree depth. > > > On Fri, Feb 22, 2013 at 1:48 PM, Andrey Kuprianov < > [email protected]> wrote: > > Storing immediate children in a parent is a bad idea in a sense > that you will never be able to traverse your tree back to the > root, given a child node. > > Storing all in one doc - you will run out of space before you > know it. Documents have a finite length and the larger the > document, the slower it will get to serialize and unserialize > it. > > Storing just an immediate parent in children might result in a > costly multiple read operations, if you want to get root of the > tree given an arbitrary child. > > Therefore the only feasible solution is to store references to > the parent chain - i.e. ancestors - starting from an immediate > parent and ending with a tree root. > > > Here's a simple calculation for ancestors approach: > > Suppose the maximum document size is 1MB. A uuid is 32 > characters in length + taking into account commas and brackets, > then the depth of your tree (I am not taking content into > account) could be roughly: > > ((32 + 1)*n + 2 - 1) = 1024 * 1024 ==> n = 31775 levels > > However, max document size is configurable from the settings > (as of CouchDB 1.2) and could easily be up to 4GB, giving you > over 130mil of levels. Should be enough... :) > > > > On Fri, Feb 22, 2013 at 1:14 PM, Michael Heuberger < > [email protected]> wrote: > > thanks andrey and svil > > depth is infinite. in theory a message can have million > replies forth and back. similar like a very long email thread. > > i think storing ancestors ą la andrey would be too much hassle > but works well for few levels. > > svil, what did you mean with dicts and with keeping all > children in one doc? can you tell me more? > > thanks, > michael > > > On 22/02/13 17:27, svilen wrote: > > my rough guess: > > if it's finite depth, then all in one doc: > - list of (item or (list of ...)) > - or same with dicts > > else, one doc per message keeping just link-to-parent, > or multiple links-to-grand...grand-parents and/or root. > similar to the strategies in SQL - nested etc. > keeping all chidlren of same node in one doc is also > possible.. > > in any case either traversal or storing or both will be > difficult. > > ciao > svil > > On Fri, 22 Feb 2013 17:13:01 +1300 > Michael Heuberger > <michael.heuberger@**binarykitchen.com<[email protected] > >> > wrote: > > Hello guys > > I'd like to store messages in a tree like structure. > Whenever you reply to a message, then the original message > becomes the parent message. > > How would you implement something like this in CouchDB? Just > curious and need a little guidance + advice here. > > Thanks, > Michael > > -- > > Binary Kitchen > Michael Heuberger > 4c Dunbar Road > Mt Eden > Auckland 1024 > (New Zealand) > > Mobile (text only) ... +64 21 261 89 81 > Email ................ [email protected] > Website .............. http://www.binarykitchen.com > > -- > > Binary Kitchen > Michael Heuberger > 4c Dunbar Road > Mt Eden > Auckland 1024 > (New Zealand) > > Mobile (text only) ... +64 21 261 89 81 > Email ................ [email protected] > Website .............. http://www.binarykitchen.com > > -- > > Binary Kitchen > Michael Heuberger > 4c Dunbar Road > Mt Eden > Auckland 1024 > (New Zealand) > > Mobile (text only) ... +64 21 261 89 81 > Email ................ [email protected] > Website .............. http://www.binarykitchen.com > > > > -- > > Binary Kitchen > Michael Heuberger > 4c Dunbar Road > Mt Eden > Auckland 1024 > (New Zealand) > > Mobile (text only) ... +64 21 261 89 81 > Email ................ [email protected] > Website .............. http://www.binarykitchen.com > > >
