Greetings,

I recently started exploring the capabilities of CouchDB, and although I find it really interesting and flexible, I am experiencing some difficulties:

Is there any recommended way to store hierarchical data? Consider, for example, a file system with multiple directories. I can think of several possible approaches, each with different capabilities and limitations:

   * Each file and each folder is represented by a single document, with each folder document containing a "contents" list that holds the ids of the documents directly under that folder (the usual tree structure). In this case, deleting a file requires updating more than one document (the file being deleted and the "contents" attribute of its parent folder), which seems dangerous given the absence of transactional operations (and what about deleting a whole folder?). Moreover, accessing the file "foo/bar/cow" requires conventional pathname translation, which adds overhead (split the pathname into segments, request the "foo" folder, retrieve the ids of its contents, find which one corresponds to the "bar" folder, etc.).
   * Each file and each folder is represented by a single document, with each one having a "parent id" attribute that contains the id of its parent folder (reverse tree structure). In this case, deleting a file requires only one operation and seems more robust. However, pathname translation gets fuzzy and seems to add a lot of overhead (retrieve the id of the folder, find the documents having this "parent id" attribute, find the one you want among them...).
   * Each file is represented by a single document that has a "path" attribute indicating the directory in which it is stored. This has the advantage of avoiding conventional pathname translation and retrieving the correct document immediately. However, operations such as renaming a folder require updating many documents and should be avoided.
   * Keep the whole file system in a single document. Ouch!
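To make the trade-off between the second and third approaches concrete, here is a minimal, self-contained sketch in plain Python (no CouchDB client involved). It counts how many lookups resolving "/foo/bar/cow" takes with "parent id" links versus a materialized "path" key, and how many documents a rename of "foo" would touch under the path approach. The document ids, the field names `parent_id` and `name`, and the dictionary/sorted-list stand-ins for views are all hypothetical; in CouchDB itself the subtree scan would roughly correspond to a view query with `startkey`/`endkey` bounds, but this code does not talk to a server.

```python
from bisect import bisect_left, bisect_right

# Hypothetical documents for foo/bar/{cow,pig}; ids and field names are assumptions.
DOCS = {
    "root": {"name": "", "parent_id": None},
    "d1": {"name": "foo", "parent_id": "root"},
    "d2": {"name": "bar", "parent_id": "d1"},
    "f1": {"name": "cow", "parent_id": "d2"},
    "f2": {"name": "pig", "parent_id": "d2"},
}


def parent_view(docs):
    """Stand-in for a view keyed on [parent_id, name] (reverse tree)."""
    return {(d["parent_id"], d["name"]): doc_id for doc_id, d in docs.items()}


def resolve_by_parent(view, path):
    """Reverse tree: one keyed lookup per path segment. Returns (id, lookups)."""
    current, lookups = "root", 0
    for segment in path.strip("/").split("/"):
        lookups += 1
        current = view.get((current, segment))
        if current is None:
            return None, lookups
    return current, lookups


def full_path(docs, doc_id):
    """Walks parent_id links upward to build the materialized path."""
    parts = []
    while docs[doc_id]["parent_id"] is not None:
        parts.append(docs[doc_id]["name"])
        doc_id = docs[doc_id]["parent_id"]
    return "/" + "/".join(reversed(parts))


def path_view(docs):
    """Stand-in for a view keyed on the path attribute: sorted (path, id) rows."""
    return sorted((full_path(docs, doc_id), doc_id) for doc_id in docs)


def resolve_by_path(rows, path):
    """Materialized path: a single keyed lookup. Returns (id, lookups)."""
    keys = [k for k, _ in rows]
    i = bisect_left(keys, path)
    if i < len(keys) and keys[i] == path:
        return rows[i][1], 1
    return None, 1


def subtree(rows, folder):
    """Range scan from folder + '/' up to folder + '/' plus a high sentinel
    character, i.e. every row whose path lies strictly below the folder."""
    keys = [k for k, _ in rows]
    lo = bisect_left(keys, folder + "/")
    hi = bisect_right(keys, folder + "/\ufff0")
    return [doc_id for _, doc_id in rows[lo:hi]]


if __name__ == "__main__":
    view = parent_view(DOCS)
    rows = path_view(DOCS)
    print(resolve_by_parent(view, "/foo/bar/cow"))  # ('f1', 3)
    print(resolve_by_path(rows, "/foo/bar/cow"))    # ('f1', 1)
    # Renaming "foo" rewrites the folder itself plus every document below it.
    print(1 + len(subtree(rows, "/foo")))           # 4
```

The same range scan that makes the path approach cheap for lookups and listings is what exposes its rename cost: every row the scan returns is a document that must be rewritten.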

I am aware of the bulk update technique with the "all or nothing" attribute, but it is my understanding that it should be avoided, especially when dealing with clustering and replication. In addition, things seem to get more obscure when considering the possibility of sharing files between users of the file system.

I would be glad if you could provide me with some pointers on how to circumvent the disadvantages of each of the methods above.

In general, do you think that, since dealing with documents is so flexible and given the absence of transactional operations, one should try to organize one's data to be as decoupled as possible?

Thank you for your time,

Andreas
