I'd love to see the various key/parent ID structure/query strategy proposals that have been mentioned in this thread summarized on a wiki, with a chart of which requirements mentioned below each fulfills. That would be awesome.
Tim On 02/22/2013 11:02 PM, Michael Heuberger wrote: > Thanks Jim. > > Ok, here are more details / requirements for my needs: > > * Very important is to get the parents and their parents above, up to > about 200 levels I'd say. > * On the other hand, for parents it's not that important to know their > children. > * Yes, lazy loading please. There is no need for dynamic tree control. > * I don't need to find specific subtrees. > > That's all. Any suggestions? > > Michael > > On 23/02/13 17:42, Jim Klo wrote: >> Michael, >> >> It's hard to gauge what's best - as it depends on your use case... >> Possibly provide more details? >> >> Q: What are your navigation requirements? Drill up/down, jump to >> other trees, etc. >> >> Q: Are you building or in need of a dynamic tree control? Do you want >> to lazy load the elements? >> >> Q: Do you need to find specific subtrees? I've built some tree >> visualizations with hundreds of thousands of nodes - the reality is >> you can't actually physically display anything human readable with >> even a few thousand tree nodes. >> >> If you can start defining how you need to use the data - the best >> model for storage and retrieval will weed itself out. >> >> FWIW, I'd like to understand Wendall's solution, as I don't >> understand how to recreate the tree from his description. Sorry >> Wendall! :(. I swear I've seen you describe this on list before, but >> couldn't find it in the archives. >> >> I think both potentially are reasonable approaches, and both have >> pros and cons for usage. >> >> - Jim >> >> On Feb 22, 2013, at 4:29 PM, "Michael Heuberger" >> <[email protected]> wrote: >> >>> Sorry guys, but now you left me a bit confused. >>> >>> Again, what's the recommendation to implement tree structures for >>> messages in CouchDB? >>> >>> Cheers >>> Michael >>> >>> On 23/02/13 11:42, Jim Klo wrote: >>>>> I wanted to give my feedback about what I've learned in this area. >>>>> >>>>> First, I don't use the doc _id at all for sorting docs. It solves >>>>> one single use-case, but fails if you have others, so instead, I >>>>> do this: >>>>> >>>>> Every doc, whether the parent or child has identifying >>>>> information. So a child might contain the parent id, thread id, >>>>> etc. Parent doesn't need to know about it's children so it doesn't >>>>> matter, as those can be pulled in a single view query. >>>>> >>>>> Say I want to do something as originally stated, I'd create a view >>>>> where I emit([parent_id, next_level_id, next_level_id], null) with >>>>> default values for the latter nested levels being 0 by default. >>>>> When I query the view, I get back a result set that would look >>>>> like the following. >>>>> [ >>>>> {"id":"0f1e244b14452a884f3dfa5b4086f793","key":[1, 0, >>>>> 0],"value":null}, <- parent >>>>> {"id":"27f4c6bb9bcaad331e68f80629bffa6e","key":[1, 1, >>>>> 0],"value":null}, <- first level >>>>> {"id":"46c17a23254c2dcce0860b4c398e0009","key":[1, 1, >>>>> 1],"value":null}, <- first item in first level >>>>> {"id":"95903e4c2e2cbb5e2dfbc934adf6095f","key":[1, 1, >>>>> 2],"value":null} <- second item in first level >>>>> ] >>>> you would still need to track ancestry in most cases,… the second >>>> solution makes that possible… also your example only works for a >>>> single 'giant' tree, unless I'm missing something… and not a >>>> forest. I'm also not seeing how you would get all the nodes >>>> without having to execute a query for every node on the tree - >>>> which is pretty inefficient IMHO >>>> >>>> also as others have noted - keeping track of an independent serial, >>>> for the sake of just ordering the tree, with concurrency would be a >>>> real challenge; which is why I use serial ID's. >>>> >>>> >>>>> To pull the entire thread based on the parent query is simply >>>>> startkey=[1,0,0]&endkey=[1,{}] >>>> then is your parent_id, really a root_id? Then I'm really confused >>>> how you would use this with trees at all… I'm not sure how you >>>> model as I'd get duplicates from which I could never use to >>>> reconstruct the tree: >>>> >>>> - A A root [A, 0, 0] >>>> - B 1st child of A [A, 1, 1] >>>> - C 1st child of B [A, 2, 1] ??? >>>> - D 2nd child of B [A, 2, 2] ??? >>>> - E 2nd child of A [A, 1, 2] >>>> - F 1st child of E [A, 2, 1] ??? >>>> -G 1st child of F [A, 3, 1] >>>> - H 2nd child of E [A, 2, 2] ??? >>>> >>>>> The advantage of this approach is simply that say I want to >>>>> display a list of all posts by user for a specific thread, I can >>>>> create a view where I emit([parent_id, user_id, comment_id], null) >>>> you could do this with either approach, it's not really an advantage. >>>> >>>>> This gives the ability to pull a specific comment for a user based >>>>> on user_id and thread_id, or an entire list of comments based on >>>>> user_id. These sorts of indexes are very cheap and flexible. You >>>>> never have to mess with creating your own custom id system. Of >>>>> course, the tradeoff is that you have to do your own conflict >>>>> resolution for async operations with thread ids if you want them >>>>> to increment. Better solution here is to use both timestamp and >>>>> user_id for the actual comment to ensure it is unique and still >>>>> sorts well. >>>> again serial id's solve that (not the default UUID's couchdb >>>> issues, AFAIK they are not incremental, however I could be wrong), >>>> is there a reason you want to avoid having a smart id? >>>> >>>> FWIW: this all seems like deja vu >>>> http://markmail.org/search/list:org.apache.couchdb.user+modelling+a+tree+in+couchdb+date:201112-201201+ >>> -- >>> >>> Binary Kitchen >>> Michael Heuberger >>> 4c Dunbar Road >>> Mt Eden >>> Auckland 1024 >>> (New Zealand) >>> >>> Mobile (text only) ... +64 21 261 89 81 >>> Email ................ [email protected] >>> Website .............. http://www.binarykitchen.com >>> >
