Hi,
1. Not a good approach for your case because it is slow at processing a
request. Nevertheless, it's a solution.
2. It seems a good start, but there is still work to be done. That
example has a shallow tree (posts and comments, most of the time a
2-level tree), while in your case you have to think of it as a dynamic,
deep multilevel tree (the worst case). The problems you have to think
about are:
a) deleting a node requires deleting the whole tree branch;
b) renaming a node requires updating all the documents within that
tree branch.
My suggestion (at least something to start from) to avoid such problems
would be to design your document as follows (adding a few more fields
than in that example):
{
_id: <first given name or an encoded name>,
_rev: <whatever; not your direct concern>,
status: <"active", "deleted" or "modified">,
name: <modified name or name in human readable way>,
parent: <parent ID>,
permissions: <OS permissions for this node>,
others: <other information>
}
Note: I prefer an encoded name because, at retrieval, some characters
allowed by the OS may not come through intact (e.g., a "+" in the _id
will return garbage if you use cURL).
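
To make the document layout above a bit more concrete, here is a minimal
sketch of building such a node. The helper name makeNode, the sample
values and the choice of encodeURIComponent as the encoding are my own
assumptions for illustration; any encoding that maps OS-legal characters
to safe ones would do.

```javascript
// Hypothetical helper: builds a node document in the shape suggested above.
// encodeURIComponent is standard JavaScript; all sample values are made up.
function makeNode(name, parentId, permissions) {
  return {
    _id: encodeURIComponent(name), // first given name, encoded ("+" -> "%2B")
    status: "active",              // "active", "deleted" or "modified"
    name: name,                    // human-readable name, may change later
    parent: parentId,              // _id of the parent node, null for the root
    permissions: permissions       // OS permissions for this node
  };
}

const node = makeNode("c++ notes", "root", "rwxr-xr-x");
console.log(node._id); // c%2B%2B%20notes
```

The _id never changes after creation; only status and name do, which is
what keeps renames and deletions down to a single document.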
That means that every time you change a node (by deleting it or
modifying its name), you don't need to change the whole branch, only
the status and the name of that node. E.g., when a node is deleted and
you search for one of its sub-nodes, you check the status of each
ancestor node; if any of them is flagged as deleted, your sub-node is
deleted as well. This also makes it easier to "recover" your erased
nodes. As for searching for a node which was renamed, you can easily
write a view whose map function does emit(doc.name, null) and query it
with the new name as the key.
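
A rough sketch of the status check and the rename, using a plain object
as a stand-in for the database. The function names, the sample nodes and
the emit stand-in are assumptions of mine (in CouchDB itself, emit is
provided by the view server and the documents would be fetched by _id).

```javascript
// In-memory stand-in for the database: _id -> document (made-up sample data).
const docs = {
  root:  { _id: "root",  status: "active",  name: "/",     parent: null },
  home:  { _id: "home",  status: "deleted", name: "home",  parent: "root" },
  notes: { _id: "notes", status: "active",  name: "notes", parent: "home" },
  etc:   { _id: "etc",   status: "active",  name: "etc",   parent: "root" }
};

// A node counts as deleted if it, or any of its ancestors, is flagged "deleted".
function isEffectivelyDeleted(docsById, id) {
  let current = docsById[id];
  while (current) {
    if (current.status === "deleted") return true;
    current = current.parent === null ? undefined : docsById[current.parent];
  }
  return false;
}

// Builds the readable path from the current names of the node and its ancestors.
function pathOf(docsById, id) {
  const parts = [];
  for (let d = docsById[id]; d && d.parent !== null; d = docsById[d.parent]) {
    parts.unshift(d.name);
  }
  return "/" + parts.join("/");
}

// CouchDB-style map function for a "by name" view; emit is stubbed here
// only so that the sketch runs outside CouchDB.
const rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }
function mapByName(doc) {
  if (doc.name) emit(doc.name, null);
}

console.log(isEffectivelyDeleted(docs, "notes")); // true (parent "home" is deleted)
console.log(pathOf(docs, "notes"));               // /home/notes
docs.home.name = "users";                         // rename touches one document only
console.log(pathOf(docs, "notes"));               // /users/notes
mapByName(docs.home);
console.log(rows[0].key);                         // users
```

Note that each step up the tree is one more document lookup, so the cost
of both functions grows with the depth of the node.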
This approach gets slower as the number of levels grows, as you can
easily see, but it is pretty fast for the usual OS operations. A faster
search approach would be to keep a dictionary, but that would slow down
insertion/deletion/modification (at least 2 documents to modify instead
of one, though that can be sped up by keeping the dictionary in another
database), and it would also require a smart way to store the
dictionary (with thousands of files and directories, you may need to
split your dictionary document into several pieces).
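
To illustrate the trade-off, here is one possible shape for such a
dictionary, assuming it maps full paths to node IDs. The document
layout, the field name paths and both function names are hypothetical;
this is a sketch of the idea, not a recommended schema.

```javascript
// Hypothetical dictionary document: full path -> node _id (sample data).
const dictionary = {
  _id: "dictionary",
  paths: {
    "/home": "home",
    "/home/notes": "notes",
    "/etc": "etc"
  }
};

// Lookup is a single property access instead of a walk through the tree.
function lookup(dict, path) {
  return Object.prototype.hasOwnProperty.call(dict.paths, path)
    ? dict.paths[path]
    : null;
}

// Renaming a directory rewrites every dictionary entry under it, in
// addition to the one node document whose name changes.
function renamePrefix(dict, oldPrefix, newPrefix) {
  const updated = {};
  for (const [path, id] of Object.entries(dict.paths)) {
    const inside = path === oldPrefix || path.startsWith(oldPrefix + "/");
    updated[inside ? newPrefix + path.slice(oldPrefix.length) : path] = id;
  }
  return { _id: dict._id, paths: updated };
}

const renamed = renamePrefix(dictionary, "/home", "/users");
console.log(lookup(renamed, "/users/notes")); // notes
console.log(lookup(renamed, "/home/notes"));  // null
```

Searching becomes cheap, but every rename now has to save the
dictionary as well, and a single dictionary document grows with the
number of files, hence the need to split it at some point.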
I hope this will give you at least an idea of how to solve your problem.
CGS
On 01/04/2012 08:59 AM, Nicolas Raoul wrote:
Hello,
I want to store a tree in CouchDB.
My app is a large filesystem in which folders/files can be moved/added/deleted.
What is the best practice for this use case?
Below are the approaches I have found on the Internet:
1) Wiki howto
http://wiki.apache.org/couchdb/How_to_store_hierarchical_data
Is this page really a howto? The redundancy is quite astonishing.
Even worse, the author himself says in paragraph "Moving a node to
another parent" that moving nodes is unreliable, and that he is "not
sure of the best approach to avoid such a problem".
2) Link to parent
Approach #2 at http://www.cmlenz.net/archives/2007/10/couchdb-joins
Each node contains a reference to its parent.
It seems good enough for the author's use case, but I am not sure it
is scalable to mine.
Both of these articles have been written by people who admittedly
"have been playing with CouchDB lately".
Could anybody provide some feedback on those approaches?
Or is there another approach that could be described as a "best
practice" for storing a large dynamic tree in CouchDB?
Thanks a lot!
Nicolas Raoul
http://nicolas-raoul.blogspot.com