Re: Best practice for storing large dynamic tree in CouchDB?

CGS Wed, 04 Jan 2012 01:24:22 -0800

Hi,

1. Not a good approach for your case, because it is slow at processing a request. Nevertheless, it's a solution.

2. It seems a good start, but there is still work to be done. In that example you have a shallow multilevel tree (posts and comments, most of the time a 2-level tree), while in your case you have to think of it as a dynamic, deep multilevel tree (the worst case). The problems you have to think about are:

a) deleting a node requires deletion of the whole tree branch;

b) renaming a node requires an update of all the documents within that tree branch.

My suggestion (at least something to start from) to avoid such problems would be to design your document as follows (adding a few more fields than in that example):


{
   _id: <first given name or an encoded name>,
   _rev: <whatever; not your direct concern>,
   status: <"active", "deleted" or "modified">,
   name: <modified name or name in human readable way>,
   parent: <parent ID>,
   permissions: <OS permissions for this node>,
   others: <other information>
}

Note: I prefer an encoded name because, at retrieval, some characters allowed by OSes may not be usable (e.g., "+" in the _id will return garbage if you use cURL).
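
To illustrate the note above, here is a minimal sketch of one way to encode a name into a URL-safe _id. It assumes Node.js (for Buffer) and uses hex encoding, which is only one possible choice; the helper names encodeNodeId and decodeNodeId are made up for this example.

```javascript
// Hypothetical helpers (not part of CouchDB): turn a node name into an _id
// that contains only [0-9a-f], so no character needs escaping in a URL.
function encodeNodeId(name) {
  return Buffer.from(name, "utf8").toString("hex");
}

// Reverse the encoding to get the human-readable name back.
function decodeNodeId(id) {
  return Buffer.from(id, "hex").toString("utf8");
}

console.log(encodeNodeId("a+b"));               // 612b62
console.log(decodeNodeId(encodeNodeId("a+b"))); // a+b
```

The readable name still lives in the name field, so the _id never has to be shown to a user.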

That means that every time you change a node (by deleting it or modifying its name), you don't need to change the whole branch, only the status and the name of that node. E.g., when a node has been deleted and you later search for a sub-node, you can always check the status of the nodes above it; if one of them is flagged as deleted, your sub-node is deleted as well. This also makes it easier to "recover" your erased nodes. As for searching for a node which was renamed, you can simply put an if (doc.name == new_name) emit(doc._id, null) in the map function.
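
A rough sketch of the two lookups described above, run against an in-memory Map instead of a real database. The sample documents, the function names and the choice to key the view by doc.name (so a renamed node can be queried by its new name) are my assumptions, not something CouchDB prescribes. Inside CouchDB, emit is provided by the view server; here it is passed in only so the sketch runs on its own.

```javascript
// In-memory stand-in for the database (hypothetical data): _id -> document.
const docs = new Map([
  ["root",   { _id: "root",   status: "active",   name: "root",       parent: null }],
  ["photos", { _id: "photos", status: "deleted",  name: "photos",     parent: "root" }],
  ["2011",   { _id: "2011",   status: "active",   name: "2011",       parent: "photos" }],
  ["notes",  { _id: "notes",  status: "modified", name: "notes-2012", parent: "root" }],
]);

// Walk up the parent chain: a node counts as deleted if it, or any
// ancestor, is flagged "deleted". One lookup per tree level.
function isEffectivelyDeleted(id, lookup) {
  const seen = new Set(); // guards against accidental parent cycles
  let current = lookup(id);
  while (current && !seen.has(current._id)) {
    if (current.status === "deleted") return true;
    seen.add(current._id);
    current = current.parent ? lookup(current.parent) : undefined;
  }
  return false;
}

// Map function in the style of a CouchDB view: rows are keyed by the
// current (possibly modified) name of every node not flagged as deleted.
function mapByName(doc, emit) {
  if (doc.name && doc.status !== "deleted") emit(doc.name, null);
}

const lookup = (id) => docs.get(id);

// Simulate building the view; CouchDB would attach the document id itself.
const rows = [];
for (const doc of docs.values()) {
  mapByName(doc, (key, value) => rows.push({ id: doc._id, key, value }));
}

console.log(isEffectivelyDeleted("2011", lookup));  // true (parent "photos" is deleted)
console.log(isEffectivelyDeleted("notes", lookup)); // false
console.log(rows.filter((r) => r.key === "notes-2012").map((r) => r.id));
```

Recovering the "photos" branch in this model means flipping one status field back to "active"; none of the sub-nodes has to be touched.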

This approach will be slower with a high number of levels, as you can easily see, but pretty fast for the usual OS operations. A faster search approach would be to build a dictionary, but that would slow down insertion/deletion/modification (at least 2 documents to be modified instead of one, though that can be sped up by keeping the dictionary in another database), and it will also require a smart way to insert into the dictionary (with thousands of files and directories, you may need to split your dictionary document into several pieces).
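
To make the trade-off concrete, here is a small in-memory sketch of the dictionary idea, assuming the dictionary maps a full path to a node _id. The names nodes, pathIndex, insertNode and findByPath are hypothetical, and in CouchDB each Map would be one or more documents, possibly in a separate database.

```javascript
// Hypothetical in-memory model of the two stores.
const nodes = new Map();     // _id -> node document
const pathIndex = new Map(); // full path -> _id (the "dictionary")

// Inserting a node costs two writes: the node itself and its dictionary entry.
function insertNode(id, name, parentId, parentPath) {
  nodes.set(id, { _id: id, status: "active", name: name, parent: parentId }); // write 1
  const path = parentPath === "/" ? "/" + name : parentPath + "/" + name;
  pathIndex.set(path, id);                                                   // write 2
  return path;
}

// Searching is a single dictionary lookup, regardless of tree depth.
function findByPath(path) {
  const id = pathIndex.get(path);
  return id === undefined ? undefined : nodes.get(id);
}

const docsPath = insertNode("n1", "docs", null, "/");
insertNode("n2", "report.txt", "n1", docsPath);

console.log(docsPath);                            // /docs
console.log(findByPath("/docs/report.txt").name); // report.txt
```

With paths as keys, renaming or moving a directory also means rewriting the dictionary entries below it, which is part of why the dictionary slows down modifications.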


I hope this will give you at least an idea how to solve your problem.

CGS




On 01/04/2012 08:59 AM, Nicolas Raoul wrote:

Hello,

I want to store a tree in CouchDB.
My app is a large filesystem in which folders/files can be moved/added/deleted.

What is the best practice for this use case?
Below are the approaches I have found on the Internet:

1) Wiki howto
http://wiki.apache.org/couchdb/How_to_store_hierarchical_data
Is this page really a howto? The redundancy is quite astonishing.
Even worse, the author himself says in paragraph "Moving a node to
another parent" that moving nodes is unreliable, and that he is "not
sure of the best approach to avoid such a problem".

2) Link to parent
Approach #2 at http://www.cmlenz.net/archives/2007/10/couchdb-joins
Each node contains a reference to its parent.
It seems good enough for the author's use case, but I am not sure it
is scalable to mine.

Both of these articles have been written by people who admittedly
"have been playing with CouchDB lately".
Could anybody provide some feedback on those approaches?

Or is there another approach that could be described as a "best
practice" for storing large dynamic tree in CouchDB?

Thanks a lot!
Nicolas Raoul
http://nicolas-raoul.blogspot.com
