Hi,

1. This is not a good approach for your case because it is slow at processing a request. Nevertheless, it is a solution.

2. It seems like a good start, but there is still work to be done. That example has a shallow tree (posts and comments, a 2-level tree most of the time), while in your case you have to treat it as a dynamic, deep multilevel tree (the worst case). The problems you have to think about are:
a) deleting a node requires deleting the whole tree branch;
b) renaming a node requires updating all the documents within that tree branch.

My suggestion (at least something to start from) to avoid such problems would be to design your document as follows, adding a few more fields than in that example:

{
   _id: <first given name or an encoded name>,
   _rev: <whatever; not your direct concern>,
   status: <"active", "deleted" or "modified">,
   name: <modified name or name in human readable way>,
   parent: <parent ID>,
   permissions: <OS permissions for this node>,
   others: <other information>
}

Note: I prefer an encoded name because, at retrieval, some characters allowed by the OS may not be usable as-is (e.g., a "+" in the _id will return garbage if you use cURL).
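
As a minimal sketch of the encoded-name idea: the note above does not say which encoding is meant, so this assumes a hex encoding of the UTF-8 bytes, which leaves only the characters 0-9 and a-f in the _id. The function names are made up for illustration.

```javascript
// Hypothetical helper: turn a file or directory name into an _id that
// contains only [0-9a-f], so it needs no escaping in a URL.
// Hex encoding is an assumption; any reversible encoding would do.
function encodeNodeId(name) {
  const bytes = new TextEncoder().encode(name); // UTF-8 bytes of the name
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// Reverse of encodeNodeId, to recover the original name from an _id.
function decodeNodeId(id) {
  const pairs = id.match(/../g) || []; // split into two-character hex pairs
  const bytes = Uint8Array.from(pairs, (h) => parseInt(h, 16));
  return new TextDecoder().decode(bytes);
}

console.log(encodeNodeId("a+b")); // 612b62
console.log(decodeNodeId("612b62")); // a+b
```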

That means that every time you change a node (by deleting it or modifying its name), you don't need to change the whole branch, only the status and the name of that node. E.g., when you search for a sub-node, you can always check the status of its parent nodes, and if one of them is flagged as deleted, your sub-node is deleted as well. This also makes it easier to "recover" your erased nodes. As for searching for a node which was renamed, you can simply put an if (doc.name == new_name) emit(doc._id, null) in the map function of a view.
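
A rough sketch of those two lookups, run against an in-memory object instead of a real database. The sample documents, the helper names and the runView stand-in are all made up for illustration; in CouchDB itself emit is a global provided to the map function, not a parameter. The sketch also assumes the parent links never form a cycle.

```javascript
// In-memory stand-in for the database: _id -> document (made-up sample data).
const docs = {
  n0: { _id: "n0", status: "active", name: "/", parent: null },
  n1: { _id: "n1", status: "deleted", name: "home", parent: "n0" },
  n2: { _id: "n2", status: "modified", name: "notes", parent: "n1" },
  n3: { _id: "n3", status: "active", name: "etc", parent: "n0" },
};

// A node counts as deleted if it, or any node above it, is flagged "deleted".
// Only the flagged node itself was ever updated; the branch is untouched.
function isDeleted(docs, id) {
  for (let cur = docs[id]; cur; cur = docs[cur.parent]) {
    if (cur.status === "deleted") return true;
  }
  return false;
}

// Stand-in for a view: runs a map function over every document and
// collects the rows it emits.
function runView(docs, map) {
  const rows = [];
  for (const doc of Object.values(docs)) {
    map(doc, (key, value) => rows.push({ key, value }));
  }
  return rows;
}

// Find a renamed node by its current (human-readable) name.
const renamed = runView(docs, (doc, emit) => {
  if (doc.name == "notes") emit(doc._id, null);
});

console.log(isDeleted(docs, "n2")); // true (its parent n1 is deleted)
console.log(isDeleted(docs, "n3")); // false
console.log(renamed[0].key); // n2
```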

This approach gets slower as the number of levels grows, as you can easily see, but it is pretty fast for typical OS operations. A faster search approach would be to build a dictionary, but that would slow down insertion/deletion/modification (at least 2 documents to modify instead of one, though that can be sped up by keeping the dictionary in another database), and it would also require a smart way to store the dictionary (with thousands of files and directories, you may need to split your dictionary document into several pieces).
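
One possible reading of the dictionary idea, as a sketch only: a single document that maps each full path to the _id of its node. The layout, the sample paths and the function names are assumptions, not something the text above prescribes. It shows why lookups become a single step while a rename has to rewrite every entry under the old path.

```javascript
// Hypothetical dictionary document: full path -> node _id (made-up data).
const dictionary = {
  "/": "n0",
  "/home": "n1",
  "/home/notes": "n2",
  "/home/notes/todo.txt": "n3",
  "/etc": "n4",
};

// Lookup is a single key access, however deep the path is.
function lookup(dict, path) {
  return Object.prototype.hasOwnProperty.call(dict, path) ? dict[path] : null;
}

// Renaming a directory rewrites every key under the old path. This is the
// extra write on top of updating the node document itself.
function renamePath(dict, oldPath, newPath) {
  const updated = {};
  for (const [path, id] of Object.entries(dict)) {
    const inBranch = path === oldPath || path.startsWith(oldPath + "/");
    updated[inBranch ? newPath + path.slice(oldPath.length) : path] = id;
  }
  return updated;
}

const afterRename = renamePath(dictionary, "/home", "/users");
console.log(lookup(afterRename, "/users/notes/todo.txt")); // n3
console.log(lookup(afterRename, "/home/notes")); // null
```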

I hope this will give you at least an idea of how to solve your problem.

CGS




On 01/04/2012 08:59 AM, Nicolas Raoul wrote:
Hello,

I want to store a tree in CouchDB.
My app is a large filesystem in which folders/files can be moved/added/deleted.

What is the best practice for this use case?
Below are the approaches I have found on the Internet:

1) Wiki howto
http://wiki.apache.org/couchdb/How_to_store_hierarchical_data
Is this page really a howto? The redundancy is quite astonishing.
Even worse, the author himself says in paragraph "Moving a node to
another parent" that moving nodes is unreliable, and that he is "not
sure of the best approach to avoid such a problem".

2) Link to parent
Approach #2 at http://www.cmlenz.net/archives/2007/10/couchdb-joins
Each node contains a reference to its parent.
It seems good enough for the author's use case, but I am not sure it
is scalable to mine.

Both of these articles have been written by people who admittedly
"have been playing with CouchDB lately".
Could anybody provide some feedback on those approaches?

Or is there another approach that could be described as a "best
practice" for storing a large dynamic tree in CouchDB?

Thanks a lot!
Nicolas Raoul
http://nicolas-raoul.blogspot.com
