+1

On 24/02/13 07:44, Tim Black wrote:
I'd love to see the various key/parent ID structure/query strategy
proposals that have been mentioned in this thread summarized on a wiki,
with a chart of which requirements mentioned below each fulfills.  That
would be awesome.

Tim

On 02/22/2013 11:02 PM, Michael Heuberger wrote:
Thanks Jim.

Ok, here are more details / requirements for my needs:

  * Very important is to get the parents and their parents above, up to
    about 200 levels I'd say.
  * On the other hand, for parents it's not that important to know their
    children.
  * Yes, lazy loading please. There is no need for dynamic tree control.
  * I don't need to find specific subtrees.

That's all. Any suggestions?

Michael

On 23/02/13 17:42, Jim Klo wrote:
Michael,

It's hard to gauge what's best - as it depends on your use case...
Possibly provide more details?

Q: What are your navigation requirements? Drill up/down, jump to
other trees, etc.

Q: Are you building or in need of a dynamic tree control? Do you want
to lazy load the elements?

Q: Do you need to find specific subtrees? I've built some tree
visualizations with hundreds of thousands of nodes - the reality is
you can't actually physically display anything human readable with
even a few thousand tree nodes.

If you can start defining how you need to use the data - the best
model for storage and retrieval will weed itself out.

FWIW, I'd like to understand Wendall's solution, as I don't
understand how to recreate the tree from his description. Sorry
Wendall! :(. I swear I've seen you describe this on list before, but
couldn't find it in the archives.

I think both potentially are reasonable approaches, and both have
pros and cons for usage.

- Jim

On Feb 22, 2013, at 4:29 PM, "Michael Heuberger"
<[email protected]> wrote:

Sorry guys, but now you left me a bit confused.

Again, what's the recommendation to implement tree structures for
messages in CouchDB?

Cheers
Michael

On 23/02/13 11:42, Jim Klo wrote:
I wanted to give my feedback about what I've learned in this area.

First, I don't use the doc _id at all for sorting docs. It solves
one single use-case, but fails if you have others, so instead, I
do this:

Every doc, whether the parent or child has identifying
information. So a child might contain the parent id, thread id,
etc. Parent doesn't need to know about it's children so it doesn't
matter, as those can be pulled in a single view query.

Say I want to do something as originally stated, I'd create a view
where I emit([parent_id, next_level_id, next_level_id], null) with
default values for the latter nested levels being 0 by default.
When I query the view, I get back a result set that would look
like the following.
[
{"id":"0f1e244b14452a884f3dfa5b4086f793","key":[1, 0,
0],"value":null}, <- parent
{"id":"27f4c6bb9bcaad331e68f80629bffa6e","key":[1, 1,
0],"value":null}, <- first level
{"id":"46c17a23254c2dcce0860b4c398e0009","key":[1, 1,
1],"value":null}, <- first item in first level
{"id":"95903e4c2e2cbb5e2dfbc934adf6095f","key":[1, 1,
2],"value":null} <- second item in first level
]
you would still need to track ancestry in most cases,… the second
solution makes that possible… also your example only works for a
single 'giant' tree, unless I'm missing something… and not a
forest. I'm also not seeing  how you would get all the nodes
without having to execute a query for every node on the tree -
which is pretty inefficient IMHO

also as others have noted - keeping track of an independent serial,
for the sake of just ordering the tree, with concurrency would be a
real challenge; which is why I use serial ID's.


To pull the entire thread based on the parent query is simply
startkey=[1,0,0]&endkey=[1,{}]
then is your parent_id, really a root_id?  Then I'm really confused
how you would use this with trees at all…  I'm not sure how you
model as I'd get duplicates from which I could never use to
reconstruct the tree:

- A                    A root                [A, 0, 0]
     - B                1st child of A            [A, 1, 1]
         - C            1st child of B            [A, 2, 1] ???
         - D            2nd child of B        [A, 2, 2] ???
     - E                2nd child of A        [A, 1, 2]
         - F            1st child of E            [A, 2, 1] ???
             -G        1st child of F            [A, 3, 1]
         - H            2nd child of E        [A, 2, 2] ???
The advantage of this approach is simply that say I want to
display a list of all posts by user for a specific thread, I can
create a view where I emit([parent_id, user_id, comment_id], null)
you could do this with either approach, it's not really an advantage.

This gives the ability to pull a specific comment for a user based
on user_id and thread_id, or an entire list of comments based on
user_id. These sorts of indexes are very cheap and flexible. You
never have to mess with creating your own custom id system. Of
course, the tradeoff is that you have to do your own conflict
resolution for async operations with thread ids if you want them
to increment. Better solution here is to use both timestamp and
user_id for the actual comment to ensure it is unique and still
sorts well.
again serial id's solve that (not the default UUID's couchdb
issues, AFAIK they are not incremental, however I could be wrong),
is there a reason you want to avoid having a smart id?

FWIW: this all seems like deja vu
http://markmail.org/search/list:org.apache.couchdb.user+modelling+a+tree+in+couchdb+date:201112-201201+
--

Binary Kitchen
Michael Heuberger
4c Dunbar Road
Mt Eden
Auckland 1024
(New Zealand)

Mobile (text only) ...  +64 21 261 89 81
Email ................  [email protected]
Website ..............  http://www.binarykitchen.com



--

Binary Kitchen
Michael Heuberger
4c Dunbar Road
Mt Eden
Auckland 1024
(New Zealand)

Mobile (text only) ...  +64 21 261 89 81
Email ................  [email protected]
Website ..............  http://www.binarykitchen.com

Reply via email to