Folks, I have a somewhat unusual use case in which I could end up with a large number of very small docs (half a dozen to a dozen fields or attributes), each anywhere from a couple hundred to a couple thousand (human-readable) bytes in size.
Each one of these docs will have an id (some have natural short ids, like YouTube's 11-char video ids, but many will not), a title, maybe a thumbnail, a short description and a url. So for visualization purposes, let's pretend that they are typical RSS news headlines. These also have an author, publisher, original date published and date published on the site.

In addition to those attributes, end users could end up classifying each document in one or multiple ways, and there could be half a dozen to a dozen different classification schemes: geographic (world, country, etc.), subject (custom schemes resembling Dewey Decimal and/or Library of Congress, etc.) as well as other sorts of classification schemes. As in these two examples, all of these schemes are at least somewhat hierarchical in nature, and all would work in much the same manner.

From the design point of view, I need to be able to present all of the material (: well :) in all possible ways, sorted by either date published (original or on the web), within any of the classified categories. In addition, I need to be able to keep track of who did what to any of the documents, including simply reading it, in addition to posting, classifying, etc. For this, a doc with user, docid, date/time and action would just about do the trick.

However, all of the different ways of categorizing each doc create a situation where the disk usage of the docs themselves will be dwarfed by the resources taken up by view indexes. So all of this is starting to play tricks on my mind. It has me trying to come up with shorter doc _id values, and trying to figure out how to create views so that a document placed in a child category does not also need to be put into its parent categories, for cases where child docs need to be shown as if they were in their parent categories.
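To make the parent/child question concrete, here is a minimal sketch of one idea I have been toying with: store each classification as a path from the root, and emit a single view row per scheme whose key is the scheme, then the path, then the date. A parent category would then be a key-prefix range (in CouchDB terms, something like startkey=[scheme, ...prefix] and endkey=[scheme, ...prefix, {}]). This is only a Python simulation of the emitted keys, not a real map function (those would normally be JavaScript), and the field names "categories" and "published" as well as the helpers below are my own assumptions for illustration.

```python
def map_doc(doc):
    """Yield one (key, value) row per classification scheme of a doc.

    Assumed (hypothetical) doc layout:
      "categories": {scheme_name: [root, ..., leaf]}
      "published":  ISO date string
    """
    for scheme, path in doc.get("categories", {}).items():
        yield [scheme, *path, doc["published"]], doc["_id"]


def build_view(docs):
    """Collect all emitted rows and sort them by key, as a view index would."""
    rows = [row for doc in docs for row in map_doc(doc)]
    return sorted(rows, key=lambda row: row[0])


def query_subtree(view, scheme, prefix):
    """Return ids of docs filed at or below the category given by prefix."""
    start = [scheme, *prefix]
    return [doc_id for key, doc_id in view if key[:len(start)] == start]


docs = [
    {"_id": "a1", "published": "2009-05-01",
     "categories": {"geo": ["world", "europe", "croatia"]}},
    {"_id": "b2", "published": "2009-04-15",
     "categories": {"geo": ["world", "europe"]}},
    {"_id": "c3", "published": "2009-03-20",
     "categories": {"geo": ["world", "asia", "japan"]}},
]

view = build_view(docs)
europe = query_subtree(view, "geo", ["world", "europe"])
print(europe)
```

The catch, as far as I can tell, is that rows inside a parent category come back ordered by path first and date second, so a strict by-date listing across a whole subtree would still need a re-sort on the client or extra rows emitted for each ancestor, which brings back the index growth I am trying to avoid.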
So the whole thing has me scratching my head and questioning whether CouchDB is the right tool for what on the surface appears to be quite a simple requirement.

P.S. Like most of what we do, if what I am doing were to get traction, I could end up with tens of thousands of categories and tens of millions of docs of half a dozen to a dozen different types (likely each in its own db). Based on a small proof of concept, with just 3 views on each doc, my db ends up being roughly 10 times the size of the docs without the views. If so, then my views:docs ratio could go as high as 10:1, 20:1 or even 50:1, and this scares me.

I would really appreciate your comments and/or suggestions.

Regards,
Zdravko
