> I wanted to give my feedback about what I've learned in this area.
>
> First, I don't use the doc _id at all for sorting docs. It solves one single
> use-case, but fails if you have others, so instead, I do this:
>
> Every doc, whether the parent or child has identifying information. So a
> child might contain the parent id, thread id, etc. Parent doesn't need to
> know about it's children so it doesn't matter, as those can be pulled in a
> single view query.
>
> Say I want to do something as originally stated, I'd create a view where I
> emit([parent_id, next_level_id, next_level_id], null) with default values for
> the latter nested levels being 0 by default. When I query the view, I get
> back a result set that would look like the following.
> [
> {"id":"0f1e244b14452a884f3dfa5b4086f793","key":[1, 0, 0],"value":null}, <-
> parent
> {"id":"27f4c6bb9bcaad331e68f80629bffa6e","key":[1, 1, 0],"value":null}, <-
> first level
> {"id":"46c17a23254c2dcce0860b4c398e0009","key":[1, 1, 1],"value":null}, <-
> first item in first level
> {"id":"95903e4c2e2cbb5e2dfbc934adf6095f","key":[1, 1, 2],"value":null} <-
> second item in first level
> ]
>
you would still need to track ancestry in most cases,… the second solution
makes that possible… also your example only works for a single 'giant' tree,
unless I'm missing something… and not a forest. I'm also not seeing how you
would get all the nodes without having to execute a query for every node on the
tree - which is pretty inefficient IMHO
also as others have noted - keeping track of an independent serial, for the
sake of just ordering the tree, with concurrency would be a real challenge;
which is why I use serial ID's.
> To pull the entire thread based on the parent query is simply
> startkey=[1,0,0]&endkey=[1,{}]
then is your parent_id, really a root_id? Then I'm really confused how you
would use this with trees at all… I'm not sure how you model as I'd get
duplicates from which I could never use to reconstruct the tree:
- A A root [A, 0,
0]
- B 1st child of A [A, 1,
1]
- C 1st child of B [A, 2,
1] ???
- D 2nd child of B [A, 2, 2] ???
- E 2nd child of A [A, 1, 2]
- F 1st child of E [A, 2,
1] ???
-G 1st child of F [A, 3,
1]
- H 2nd child of E [A, 2, 2] ???
>
> The advantage of this approach is simply that say I want to display a list of
> all posts by user for a specific thread, I can create a view where I
> emit([parent_id, user_id, comment_id], null)
you could do this with either approach, it's not really an advantage.
>
> This gives the ability to pull a specific comment for a user based on user_id
> and thread_id, or an entire list of comments based on user_id. These sorts of
> indexes are very cheap and flexible. You never have to mess with creating
> your own custom id system. Of course, the tradeoff is that you have to do
> your own conflict resolution for async operations with thread ids if you want
> them to increment. Better solution here is to use both timestamp and user_id
> for the actual comment to ensure it is unique and still sorts well.
>
again serial id's solve that (not the default UUID's couchdb issues, AFAIK they
are not incremental, however I could be wrong), is there a reason you want to
avoid having a smart id?
FWIW: this all seems like deja vu
http://markmail.org/search/list:org.apache.couchdb.user+modelling+a+tree+in+couchdb+date:201112-201201+
smime.p7s
Description: S/MIME cryptographic signature
