On 11 January 2012 04:07, Mahesh Paolini-Subramanya <[email protected]> wrote:
> With a (somewhat. kinda. sorta. maybe.) similar requirement, I ended up
> doing this as follows
>    (1) created a 'daily' database, that data got dumped into in very
> small increments - approximately 5 docs/second
>    (2) uni-directionally replicated the documents out of this database
> into a 'reporting' database that I could suck data out of
>    (3) sucked data out of the reporting database at 15 minute intervals,
> processed them somewhat, and dumped all of *those* into one single (highly
> sharded) bigcouch db
>
> The advantages here were
>    - My data was captured in the format best suited for the data
> generating events (minimum processing of the event data) thanx to (1)
>    - The processing of this data did not impact the writing of the data
> thanx to (2) allowing for maximum throughput
>    - I could compact and archive the 'daily' database every day, thus
> significantly minimizing disk space thanx to (1). Also, we only retain the
> 'daily' data for 3 months, since anything beyond that is stale (for our
> purposes. YMMV)
>    - The collated data that ends up in bigcouch per (3) is much *much*
> smaller. But, if we ended up needing a different collation (and yes, that
> happens every now and then), I can just rerun the reporting process (up to
> the last 3 months of course). In fact, I can have multiple collations
> running in parallel...
>
> Hope this helps. If you need more info, just ping me...
>
> Cheers
>
> Mahesh Paolini-Subramanya
> That Tall Bald Indian Guy...
> Google+ | Blog | Twitter
>
> On Jan 11, 2012, at 4:13 AM, Martin Hewitt wrote:
>
>> Hi all,
>>
>> I'm currently scoping a project which will measure a variety of indicators
>> over a long period, and I'm trying to work out where to strike the balance
>> of document number vs document size.
>>
>> I could have one document per metric, leading to a small number of
>> documents, but with each document containing ticks for every 5-second
>> interval of any given day, these documents would quickly become huge.
>>
>> Clearly, I could decompose these huge per-metric documents down into smaller
>> documents, and I'm in the fortunate position that, because I'm dealing with
>> time, I can decompose by year, month, day, hour, minute or even second.
>>
>> Going all the way to second-level would clearly create a huge number of
>> documents, but all of very small size, so that's the other extreme.
>>
>> I'm aware the usual response to this is "somewhere in the middle", which is
>> my working hypothesis (decomposing to a "day" level), but I was wondering if
>> there was a) anything in CouchDB's architecture that would make one side of
>> the "middle" more suited, or b) if someone has experience architecting
>> something like this.
>>
>> Any help gratefully appreciated.
>>
>> Martin
Simon & Mahesh,

These examples would be a great addition to the wiki :-))

A+
Dave
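
P.S. As a starting point for such a wiki page, here is a rough, untested
Python sketch of the pipeline Mahesh describes. The database names, URLs
and document shapes are all invented for illustration; CouchDB is just
driven over plain HTTP with the requests library.

    import datetime
    import requests

    COUCH = "http://localhost:5984"      # capture node (assumption)
    BIGCOUCH = "http://bigcouch:5984"    # sharded cluster for collated data (assumption)

    def daily_db_name(day=None):
        # One throwaway database per day, e.g. "events_2012_01_11", so it can
        # be compacted, archived and eventually dropped wholesale.
        day = day or datetime.date.today()
        return "events_" + day.strftime("%Y_%m_%d")

    def write_event(event):
        # Step (1): dump raw events into today's database with minimal processing.
        db = daily_db_name()
        requests.put("%s/%s" % (COUCH, db))             # idempotent; 412 if it already exists
        requests.post("%s/%s" % (COUCH, db), json=event)

    def replicate_to_reporting():
        # Step (2): one-way replication into a 'reporting' database, so the
        # reporting reads never compete with the capture writes.
        requests.post("%s/_replicate" % COUCH, json={
            "source": daily_db_name(),
            "target": "reporting",
            "continuous": True,
            "create_target": True,
        })

    def collate_window(rows):
        # Step (3), run from cron every 15 minutes: summarise whatever was
        # pulled out of 'reporting' and push the much smaller rollup into bigcouch.
        rollup = {
            "type": "15min_rollup",
            "window_end": datetime.datetime.utcnow().isoformat() + "Z",
            "count": len(rows),
        }
        requests.post("%s/rollups" % BIGCOUCH, json=rollup)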
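
And for Martin's document-size question, one possible "day-level" layout:
a document per metric per day, with ticks keyed by seconds-since-midnight
(at most 86400 / 5 = 17,280 entries), plus a map/reduce view to roll the
ticks back up. The _id scheme and field names below are just guesses, and
the view code (normal CouchDB JavaScript, embedded as a string) is untested.

    day_doc = {
        "_id": "cpu_load:2012-01-11",    # metric + day makes a natural key
        "metric": "cpu_load",
        "date": "2012-01-11",
        "ticks": {
            "0": 0.42,
            "5": 0.40,
            # ... one entry per 5-second tick
        },
    }

    design_doc = {
        "_id": "_design/metrics",
        "views": {
            "per_day": {
                "map": """function(doc) {
                    for (var t in doc.ticks) {
                        emit([doc.metric, doc.date], doc.ticks[t]);
                    }
                }""",
                "reduce": "_stats",
            }
        },
    }

One trade-off worth noting on the wiki, as far as I can tell: with day-level
documents each 5-second tick is an update to an existing document (a fetch
and re-save, or a _bulk_docs batch) rather than a fresh insert, which costs a
little on the write path but keeps the document count and the view index much
smaller than the per-second extreme.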
