Re: Tree like structures in CouchDB

Jim Klo Fri, 22 Feb 2013 14:43:11 -0800


> I wanted to give my feedback about what I've learned in this area.
> 
> First, I don't use the doc _id at all for sorting docs. It solves a single
> use case, but fails if you have others, so instead I do this:
> 
> Every doc, whether parent or child, has identifying information. So a
> child might contain the parent id, thread id, etc. The parent doesn't need
> to know about its children, since those can be pulled in a single view
> query.
> 
> Say I want to do something as originally stated: I'd create a view where I
> emit([parent_id, next_level_id, next_level_id], null), with the nested
> levels defaulting to 0. When I query the view, I get back a result set
> that looks like the following.
> [
>   {"id":"0f1e244b14452a884f3dfa5b4086f793","key":[1, 0, 0],"value":null},  <- parent
>   {"id":"27f4c6bb9bcaad331e68f80629bffa6e","key":[1, 1, 0],"value":null},  <- first level
>   {"id":"46c17a23254c2dcce0860b4c398e0009","key":[1, 1, 1],"value":null},  <- first item in first level
>   {"id":"95903e4c2e2cbb5e2dfbc934adf6095f","key":[1, 1, 2],"value":null}   <- second item in first level
> ]
>


You would still need to track ancestry in most cases; the second solution
makes that possible. Also, your example only works for a single 'giant' tree
(unless I'm missing something), not a forest. And I don't see how you would
get all the nodes without executing a query for every node in the tree, which
is pretty inefficient IMHO.
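To make the ancestry point concrete, here is a minimal sketch of one common way to track it, the "materialized path" pattern: each doc stores its full ancestor path and the view emits that path as the key. This is an assumption for illustration and not necessarily the second solution referred to above. The docs, the `path` field and the in-memory index are hypothetical, and `collate` is a simplified stand-in for CouchDB's view collation so the sketch runs under Node without a server.

```javascript
// Hypothetical docs: each node stores its ancestor path, root first.
// Two roots ("A" and "X") make this a forest rather than one giant tree.
const docs = [
  { _id: "A", path: ["A"] },
  { _id: "B", path: ["A", "B"] },
  { _id: "C", path: ["A", "B", "C"] },
  { _id: "E", path: ["A", "E"] },
  { _id: "X", path: ["X"] },
  { _id: "Y", path: ["X", "Y"] },
];

// Map function as it might appear in a design doc.
function map(doc) {
  if (Array.isArray(doc.path)) emit(doc.path, null);
}

// Simplified stand-in for CouchDB collation: numbers < strings < arrays < objects.
// Real CouchDB also sorts null and booleans first and compares strings with ICU.
function rank(v) {
  if (typeof v === "number") return 0;
  if (typeof v === "string") return 1;
  if (Array.isArray(v)) return 2;
  return 3; // plain objects, e.g. the {} used as a high sentinel
}

function collate(a, b) {
  const ra = rank(a);
  const rb = rank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 0) return a - b;
  if (ra === 1) return a < b ? -1 : a > b ? 1 : 0;
  if (ra === 2) {
    // Arrays compare element by element; a prefix sorts first.
    for (let i = 0; i < Math.min(a.length, b.length); i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length;
  }
  return 0;
}

// Build the "index" by running the map function over every doc.
const rows = [];
let currentDoc = null;
function emit(key, value) {
  rows.push({ id: currentDoc._id, key, value });
}
for (const doc of docs) {
  currentDoc = doc;
  map(doc);
}
rows.sort((r1, r2) => collate(r1.key, r2.key));

// Inclusive startkey/endkey range scan over the sorted index.
function query(startkey, endkey) {
  return rows.filter((r) => collate(r.key, startkey) >= 0 && collate(r.key, endkey) <= 0);
}

// One query returns a whole tree, depth first; another returns just a subtree.
console.log(query(["A"], ["A", {}]).map((r) => r.id).join(",")); // A,B,C,E
console.log(query(["A", "B"], ["A", "B", {}]).map((r) => r.id).join(",")); // B,C
```

Because every key starts with the root id, each tree in the forest is its own key range in the same view, and any subtree is the range under its own path. The usual cost of this design is that moving a node means rewriting the path of every descendant.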

Also, as others have noted, keeping track of an independent serial just for
the sake of ordering the tree would be a real challenge under concurrency,
which is why I use serial IDs.
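The concurrency problem with a shared counter can be sketched as follows. This is a minimal in-memory model, assuming a hypothetical `thread-counter` doc and integer revisions (real CouchDB revisions are strings); it mimics only CouchDB's rule that a write quoting a stale `_rev` is rejected with 409 Conflict. Across replicas it is worse: both writes can be accepted on different nodes and show up later as a conflicted document.

```javascript
// In-memory stand-in for one CouchDB node. Revisions are simplified to
// integers; a write that quotes a stale _rev is rejected, as CouchDB does
// with 409 Conflict.
const store = new Map([["thread-counter", { _id: "thread-counter", _rev: 1, next: 1 }]]);

function get(id) {
  return { ...store.get(id) }; // each client works on its own copy
}

function put(doc) {
  const current = store.get(doc._id);
  if (current._rev !== doc._rev) return { ok: false, status: 409 };
  store.set(doc._id, { ...doc, _rev: doc._rev + 1 });
  return { ok: true, status: 201 };
}

// Two clients read the counter concurrently and both hand out the same serial.
const a = get("thread-counter");
const b = get("thread-counter");
const serialA = a.next++;
const serialB = b.next++;
const resA = put(a); // accepted
const resB = put(b); // rejected: b's _rev is stale
console.log(serialA, serialB, resA.status, resB.status); // 1 1 201 409

// The loser has to re-read and retry until its write lands.
function claimSerial(id) {
  for (;;) {
    const doc = get(id);
    const serial = doc.next++;
    if (put(doc).ok) return serial;
  }
}
const retried = claimSerial("thread-counter");
console.log(retried); // 2
```

Every client that hands out a serial therefore needs a read-retry loop, and the counter doc becomes a hot spot that all writers contend on.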


> To pull the entire thread for a parent, the query is simply
> startkey=[1,0,0]&endkey=[1,{}]
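For reference, the keys in that range query are JSON values, so they have to be serialised and URL-encoded when sent to the view. A minimal sketch; the host, database (`forum`), design doc (`threads`) and view (`by_thread`) names are hypothetical, and `include_docs=true` is added only because the view emits null values. In CouchDB's collation `{}` sorts after numbers, strings and arrays, which is why `[1, {}]` closes the range.

```javascript
// Hypothetical database ("forum"), design doc ("threads") and view ("by_thread").
const VIEW = "http://localhost:5984/forum/_design/threads/_view/by_thread";

// startkey/endkey are JSON values, so serialise them before URL-encoding.
function threadRangeUrl(rootId) {
  const params = new URLSearchParams({
    startkey: JSON.stringify([rootId, 0, 0]),
    endkey: JSON.stringify([rootId, {}]), // {} collates after any number
    include_docs: "true", // return the docs too, since the rows carry null values
  });
  return `${VIEW}?${params}`;
}

console.log(threadRangeUrl(1));
```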

Then is your parent_id really a root_id? If so, I'm really confused about how
you would use this with trees at all. I'm not sure how you'd model it, because
I'd get duplicate keys that I could never use to reconstruct the tree:

- A                    A root              [A, 0, 0]
    - B                1st child of A      [A, 1, 1]
        - C            1st child of B      [A, 2, 1] ???
        - D            2nd child of B      [A, 2, 2] ???
    - E                2nd child of A      [A, 1, 2]
        - F            1st child of E      [A, 2, 1] ???
            - G        1st child of F      [A, 3, 1]
        - H            2nd child of E      [A, 2, 2] ???
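The collisions marked "???" in the table can be checked mechanically. A minimal sketch, assuming the key is [root, depth, position among siblings], which is how the table reads the proposed scheme; `parentOf` is a hypothetical encoding of the tree above.

```javascript
// The tree from the table, as hypothetical child -> parent links (null = root).
// Siblings are listed in the order they were added.
const parentOf = { A: null, B: "A", C: "B", D: "B", E: "A", F: "E", G: "F", H: "E" };

// Walk up to the root, counting levels.
function rootAndDepth(id) {
  let depth = 0;
  while (parentOf[id] !== null) {
    id = parentOf[id];
    depth += 1;
  }
  return [id, depth];
}

// 1-based position among siblings; 0 for a root, as in the table.
function siblingIndex(id) {
  const parent = parentOf[id];
  if (parent === null) return 0;
  return Object.keys(parentOf).filter((k) => parentOf[k] === parent).indexOf(id) + 1;
}

// Compute [root, depth, index] for every node and group nodes by key.
const byKey = new Map();
for (const id of Object.keys(parentOf)) {
  const [root, depth] = rootAndDepth(id);
  const key = JSON.stringify([root, depth, siblingIndex(id)]);
  byKey.set(key, [...(byKey.get(key) || []), id]);
}

// Any key shared by two nodes cannot tell you which parent each belongs to.
const collisions = [...byKey].filter(([, ids]) => ids.length > 1);
for (const [key, ids] of collisions) console.log(key, ids.join(" and "));
// ["A",2,1] C and F
// ["A",2,2] D and H
```

Since C and F (and D and H) end up with identical keys, the view rows alone no longer say which of B or E is their parent.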
        

> 
> The advantage of this approach is simply that say I want to display a list of 
> all posts by user for a specific thread, I can create a view where I 
> emit([parent_id, user_id, comment_id], null)

You could do this with either approach; it's not really an advantage.
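A secondary view like the one quoted reads only fields stored on the comment doc itself, so it can sit alongside either tree encoding. A minimal sketch of that map function with an `emit()` stub so it runs outside CouchDB; the `type` field and the sample docs are hypothetical, while `parent_id`, `user_id` and `comment_id` follow the quoted post.

```javascript
// Map function from the quoted post; the "type" field and sample docs are hypothetical.
function map(doc) {
  if (doc.type === "comment") {
    emit([doc.parent_id, doc.user_id, doc.comment_id], null);
  }
}

// Minimal emit() stub so the function runs outside CouchDB.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

const docs = [
  { type: "comment", parent_id: 1, user_id: "jim", comment_id: "c2" },
  { type: "comment", parent_id: 1, user_id: "ann", comment_id: "c1" },
  { type: "thread", parent_id: 1 },
];
docs.forEach((doc) => map(doc));
console.log(rows.map((r) => JSON.stringify(r.key)));
```

Against the real view, startkey=[1,"jim"]&endkey=[1,"jim",{}] would then return one user's comments in thread 1.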

> 
> This gives the ability to pull a specific comment for a user based on user_id 
> and thread_id, or an entire list of comments based on user_id. These sorts of 
> indexes are very cheap and flexible. You never have to mess with creating 
> your own custom id system. Of course, the tradeoff is that you have to do 
> your own conflict resolution for async operations with thread ids if you want 
> them to increment. A better solution here is to use both a timestamp and the
> user_id for the actual comment, to ensure it is unique and still sorts well.
> 

Again, serial IDs solve that. (I don't mean the default UUIDs CouchDB issues;
AFAIK they are not incremental, though I could be wrong.) Is there a reason you
want to avoid having a smart ID?
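As an illustration of a "smart" ID along the lines the quoted post suggests (timestamp plus user_id), here is a minimal sketch. The format is a hypothetical choice: a zero-padded millisecond timestamp so that plain string order matches time order, followed by the user id as a tiebreaker. It still collides if the same user posts twice in the same millisecond, and it trusts the writer's clock.

```javascript
// Hypothetical id scheme: zero-padded millisecond timestamp, then user id.
// Fixed-width digits make string order match time order.
function commentId(userId, date = new Date()) {
  const ts = String(date.getTime()).padStart(15, "0");
  return `${ts}-${userId}`;
}

const ids = [
  commentId("jim", new Date(Date.UTC(2013, 1, 22, 22, 43, 11))),
  commentId("ann", new Date(Date.UTC(2013, 1, 22, 22, 43, 11))),
  commentId("bob", new Date(Date.UTC(2013, 1, 22, 9, 0, 0))),
];

// Sorting the ids as strings orders them by time, then by user.
const sorted = [...ids].sort();
console.log(sorted.map((id) => id.split("-")[1])); // bob, ann, jim
```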

FWIW, this all feels like déjà vu:
http://markmail.org/search/list:org.apache.couchdb.user+modelling+a+tree+in+couchdb+date:201112-201201+


