Michael, It's hard to gauge what's best - as it depends on your use case... Possibly provide more details?
Q: What are your navigation requirements? Drill up/down, jump to other trees, etc. Q: Are you building or in need of a dynamic tree control? Do you want to lazy load the elements? Q: Do you need to find specific subtrees? I've built some tree visualizations with hundreds of thousands of nodes - the reality is you can't actually physically display anything human readable with even a few thousand tree nodes. If you can start defining how you need to use the data - the best model for storage and retrieval will weed itself out. FWIW, I'd like to understand Wendall's solution, as I don't understand how to recreate the tree from his description. Sorry Wendall! :(. I swear I've seen you describe this on list before, but couldn't find it in the archives. I think both potentially are reasonable approaches, and both have pros and cons for usage. - Jim On Feb 22, 2013, at 4:29 PM, "Michael Heuberger" <[email protected]> wrote: > Sorry guys, but now you left me a bit confused. > > Again, what's the recommendation to implement tree structures for messages in > CouchDB? > > Cheers > Michael > > On 23/02/13 11:42, Jim Klo wrote: >> >>> I wanted to give my feedback about what I've learned in this area. >>> >>> First, I don't use the doc _id at all for sorting docs. It solves one >>> single use-case, but fails if you have others, so instead, I do this: >>> >>> Every doc, whether the parent or child has identifying information. So a >>> child might contain the parent id, thread id, etc. Parent doesn't need to >>> know about it's children so it doesn't matter, as those can be pulled in a >>> single view query. >>> >>> Say I want to do something as originally stated, I'd create a view where I >>> emit([parent_id, next_level_id, next_level_id], null) with default values >>> for the latter nested levels being 0 by default. When I query the view, I >>> get back a result set that would look like the following. >>> [ >>> {"id":"0f1e244b14452a884f3dfa5b4086f793","key":[1, 0, 0],"value":null}, <- >>> parent >>> {"id":"27f4c6bb9bcaad331e68f80629bffa6e","key":[1, 1, 0],"value":null}, <- >>> first level >>> {"id":"46c17a23254c2dcce0860b4c398e0009","key":[1, 1, 1],"value":null}, <- >>> first item in first level >>> {"id":"95903e4c2e2cbb5e2dfbc934adf6095f","key":[1, 1, 2],"value":null} <- >>> second item in first level >>> ] >> you would still need to track ancestry in most cases,… the second solution >> makes that possible… also your example only works for a single 'giant' tree, >> unless I'm missing something… and not a forest. I'm also not seeing how you >> would get all the nodes without having to execute a query for every node on >> the tree - which is pretty inefficient IMHO >> >> also as others have noted - keeping track of an independent serial, for the >> sake of just ordering the tree, with concurrency would be a real challenge; >> which is why I use serial ID's. >> >> >>> To pull the entire thread based on the parent query is simply >>> startkey=[1,0,0]&endkey=[1,{}] >> then is your parent_id, really a root_id? Then I'm really confused how you >> would use this with trees at all… I'm not sure how you model as I'd get >> duplicates from which I could never use to reconstruct the tree: >> >> - A A root [A, 0, 0] >> - B 1st child of A [A, 1, 1] >> - C 1st child of B [A, 2, 1] ??? >> - D 2nd child of B [A, 2, 2] ??? >> - E 2nd child of A [A, 1, 2] >> - F 1st child of E [A, 2, 1] ??? >> -G 1st child of F [A, 3, 1] >> - H 2nd child of E [A, 2, 2] ??? >> >> >>> The advantage of this approach is simply that say I want to display a list >>> of all posts by user for a specific thread, I can create a view where I >>> emit([parent_id, user_id, comment_id], null) >> you could do this with either approach, it's not really an advantage. >> >>> This gives the ability to pull a specific comment for a user based on >>> user_id and thread_id, or an entire list of comments based on user_id. >>> These sorts of indexes are very cheap and flexible. You never have to mess >>> with creating your own custom id system. Of course, the tradeoff is that >>> you have to do your own conflict resolution for async operations with >>> thread ids if you want them to increment. Better solution here is to use >>> both timestamp and user_id for the actual comment to ensure it is unique >>> and still sorts well. >> again serial id's solve that (not the default UUID's couchdb issues, AFAIK >> they are not incremental, however I could be wrong), is there a reason you >> want to avoid having a smart id? >> >> FWIW: this all seems like deja vu >> http://markmail.org/search/list:org.apache.couchdb.user+modelling+a+tree+in+couchdb+date:201112-201201+ > > -- > > Binary Kitchen > Michael Heuberger > 4c Dunbar Road > Mt Eden > Auckland 1024 > (New Zealand) > > Mobile (text only) ... +64 21 261 89 81 > Email ................ [email protected] > Website .............. http://www.binarykitchen.com >
