Re: Tree like structures in CouchDB

Jim Klo Fri, 22 Feb 2013 20:42:47 -0800

Michael, 

It's hard to gauge what's best - as it depends on your use case... Possibly 
provide more details?


Q: What are your navigation requirements? Drill up/down, jump to other trees, 
etc.

Q: Are you building or in need of a dynamic tree control? Do you want to lazy 
load the elements?

Q: Do you need to find specific subtrees? I've built some tree visualizations 
with hundreds of thousands of nodes - the reality is you can't actually 
physically display anything human readable with even a few thousand tree nodes. 

If you can start defining how you need to use the data - the best model for 
storage and retrieval will weed itself out. 

FWIW, I'd like to understand Wendall's solution, as I don't understand how to 
recreate the tree from his description. Sorry Wendall! :(. I swear I've seen 
you describe this on list before, but couldn't find it in the archives.

I think both potentially are reasonable approaches, and both have pros and cons 
for usage. 

- Jim

On Feb 22, 2013, at 4:29 PM, "Michael Heuberger" 
<[email protected]> wrote:

> Sorry guys, but now you left me a bit confused.
> 
> Again, what's the recommendation to implement tree structures for messages in 
> CouchDB?
> 
> Cheers
> Michael
> 
> On 23/02/13 11:42, Jim Klo wrote:
>> 
>>> I wanted to give my feedback about what I've learned in this area.
>>> 
>>> First, I don't use the doc _id at all for sorting docs. It solves one 
>>> single use-case, but fails if you have others, so instead, I do this:
>>> 
>>> Every doc, whether the parent or child has identifying information. So a 
>>> child might contain the parent id, thread id, etc. Parent doesn't need to 
>>> know about it's children so it doesn't matter, as those can be pulled in a 
>>> single view query.
>>> 
>>> Say I want to do something as originally stated, I'd create a view where I 
>>> emit([parent_id, next_level_id, next_level_id], null) with default values 
>>> for the latter nested levels being 0 by default. When I query the view, I 
>>> get back a result set that would look like the following.
>>> [
>>> {"id":"0f1e244b14452a884f3dfa5b4086f793","key":[1, 0, 0],"value":null}, <- 
>>> parent
>>> {"id":"27f4c6bb9bcaad331e68f80629bffa6e","key":[1, 1, 0],"value":null}, <- 
>>> first level
>>> {"id":"46c17a23254c2dcce0860b4c398e0009","key":[1, 1, 1],"value":null}, <- 
>>> first item in first level
>>> {"id":"95903e4c2e2cbb5e2dfbc934adf6095f","key":[1, 1, 2],"value":null} <- 
>>> second item in first level
>>> ]
>> you would still need to track ancestry in most cases,… the second solution 
>> makes that possible… also your example only works for a single 'giant' tree, 
>> unless I'm missing something… and not a forest. I'm also not seeing  how you 
>> would get all the nodes without having to execute a query for every node on 
>> the tree - which is pretty inefficient IMHO
>> 
>> also as others have noted - keeping track of an independent serial, for the 
>> sake of just ordering the tree, with concurrency would be a real challenge; 
>> which is why I use serial ID's.
>> 
>> 
>>> To pull the entire thread based on the parent query is simply 
>>> startkey=[1,0,0]&endkey=[1,{}]
>> then is your parent_id, really a root_id?  Then I'm really confused how you 
>> would use this with trees at all…  I'm not sure how you model as I'd get 
>> duplicates from which I could never use to reconstruct the tree:
>> 
>> - A                    A root                [A, 0, 0]    
>>    - B                1st child of A            [A, 1, 1]    
>>        - C            1st child of B            [A, 2, 1] ???
>>        - D            2nd child of B        [A, 2, 2] ???
>>    - E                2nd child of A        [A, 1, 2]
>>        - F            1st child of E            [A, 2, 1] ???
>>            -G        1st child of F            [A, 3, 1]
>>        - H            2nd child of E        [A, 2, 2] ???
>>    
>> 
>>> The advantage of this approach is simply that say I want to display a list 
>>> of all posts by user for a specific thread, I can create a view where I 
>>> emit([parent_id, user_id, comment_id], null)
>> you could do this with either approach, it's not really an advantage.
>> 
>>> This gives the ability to pull a specific comment for a user based on 
>>> user_id and thread_id, or an entire list of comments based on user_id. 
>>> These sorts of indexes are very cheap and flexible. You never have to mess 
>>> with creating your own custom id system. Of course, the tradeoff is that 
>>> you have to do your own conflict resolution for async operations with 
>>> thread ids if you want them to increment. Better solution here is to use 
>>> both timestamp and user_id for the actual comment to ensure it is unique 
>>> and still sorts well.
>> again serial id's solve that (not the default UUID's couchdb issues, AFAIK 
>> they are not incremental, however I could be wrong), is there a reason you 
>> want to avoid having a smart id?
>> 
>> FWIW: this all seems like deja vu 
>> http://markmail.org/search/list:org.apache.couchdb.user+modelling+a+tree+in+couchdb+date:201112-201201+
> 
> -- 
> 
> Binary Kitchen
> Michael Heuberger
> 4c Dunbar Road
> Mt Eden
> Auckland 1024
> (New Zealand)
> 
> Mobile (text only) ...  +64 21 261 89 81
> Email ................  [email protected]
> Website ..............  http://www.binarykitchen.com
>

Re: Tree like structures in CouchDB

Reply via email to