On 11 January 2012 04:07, Mahesh Paolini-Subramanya <[email protected]> wrote:
> With a (somewhat.  kinda.  sorta.  maybe.) similar requirement, I ended up 
> doing this as follows:
>        (1) created a 'daily' database that data got dumped into in very 
> small increments (approximately 5 docs/second)
>        (2) uni-directionally replicated the documents out of this database 
> into a 'reporting' database that I could suck data out of
>        (3) sucked documents out of the reporting database at 15-minute 
> intervals, processed them somewhat, and dumped all of *those* into one single 
> (highly sharded) bigcouch db (a rough sketch of (2) and (3) follows below)
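>
> A minimal sketch of (2) and (3), assuming a stock CouchDB at
> http://localhost:5984 and Python's requests library; the database and
> variable names are just illustrative:
>
>     import requests
>
>     COUCH = "http://localhost:5984"
>
>     # (2) Kick off continuous, one-way replication from 'daily' into
>     # 'reporting' via CouchDB's /_replicate endpoint.
>     requests.post(
>         f"{COUCH}/_replicate",
>         json={"source": "daily", "target": "reporting", "continuous": True},
>     ).raise_for_status()
>
>     # (3) Every 15 minutes, pull whatever is new out of 'reporting' via the
>     # _changes feed, collate it, and bulk-write the result into bigcouch.
>     # last_seq would be persisted between runs; it is kept in memory here.
>     last_seq = 0
>     resp = requests.get(
>         f"{COUCH}/reporting/_changes",
>         params={"since": last_seq, "include_docs": "true"},
>     )
>     resp.raise_for_status()
>     changes = resp.json()
>     docs = [row["doc"] for row in changes["results"] if "doc" in row]
>     last_seq = changes["last_seq"]
>     # ... collate docs into summaries, then bulk-write them, e.g.
>     # requests.post(BIGCOUCH_URL + "/collated/_bulk_docs",
>     #               json={"docs": summaries})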
>
> The advantages here were
>        - My data was captured in the format best suited to the 
> data-generating events (minimal processing of the event data), thanx to (1)
>        - The processing of this data did not impact the writing of the data, 
> thanx to (2), allowing for maximum throughput
>        - I could compact and archive the 'daily' database every day, thus 
> significantly minimizing disk space, thanx to (1) (the compaction call is 
> sketched after this list). Also, we only retain the 'daily' data for 3 
> months, since anything beyond that is stale (for our purposes. YMMV)
>        - The collated data that ends up in bigcouch per (3) is much *much* 
> smaller. But if we end up needing a different collation (and yes, that 
> happens every now and then), I can just rerun the reporting process (up to 
> the last 3 months, of course).  In fact, I can have multiple collations 
> running in parallel...
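>
> The compact-and-archive step in the third bullet boils down to a call like
> this (again a minimal sketch against a stock CouchDB; the archive/retention
> part is site-specific, so it is only indicated in comments):
>
>     import requests
>
>     COUCH = "http://localhost:5984"
>
>     # Ask CouchDB to compact the 'daily' database. The call returns
>     # immediately; compaction runs in the background.
>     requests.post(
>         f"{COUCH}/daily/_compact",
>         headers={"Content-Type": "application/json"},
>     ).raise_for_status()
>     # ... then archive the compacted database (e.g. copy the .couch file or
>     # replicate it to cold storage) and drop anything older than 3 months.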
>
> Hope this helps. If you need more info, just ping me...
>
> Cheers
>
> Mahesh Paolini-Subramanya
> That Tall Bald Indian Guy...
> Google+  | Blog   | Twitter
>
> On Jan 11, 2012, at 4:13 AM, Martin Hewitt wrote:
>
>> Hi all,
>>
>> I'm currently scoping a project which will measure a variety of indicators 
>> over a long period, and I'm trying to work out where to strike the balance 
>> of document number vs document size.
>>
>> I could have one document per metric, leading to a small number of 
>> documents, but with each document containing ticks for every 5-second 
>> interval of any given day, these documents would quickly become huge.
>>
>> Clearly, I could decompose these huge per-metric documents down into smaller 
>> documents, and I'm in the fortunate position that, because I'm dealing with 
>> time, I can decompose by year, month, day, hour, minute or even second.
>>
>> Going all the way to second-level would clearly create a huge number of 
>> documents, but all of very small size, so that's the other extreme.
>>
>> I'm aware the usual response to this is "somewhere in the middle", which is 
>> my working hypothesis (decomposing to a "day" level), but I was wondering 
>> a) if there is anything in CouchDB's architecture that would make one side 
>> of the "middle" more suited, or b) if someone has experience architecting 
>> something like this.
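>>
>> To make the "day" hypothesis concrete, I'm imagining something like one
>> document per metric per day, with the 5-second ticks keyed inside it (a
>> rough sketch in Python; the ID scheme and field names are just
>> illustrative):
>>
>>     import requests
>>
>>     COUCH = "http://localhost:5984"
>>
>>     # One document per metric per day: _id is "<metric>:<YYYY-MM-DD>", and
>>     # "ticks" holds up to 17280 five-second samples keyed by seconds since
>>     # midnight.
>>     doc_id = "cpu_load:2012-01-11"
>>     doc = {
>>         "metric": "cpu_load",
>>         "date": "2012-01-11",
>>         "ticks": {"00000": 0.42, "00005": 0.40},
>>     }
>>     requests.put(f"{COUCH}/metrics/{doc_id}", json=doc).raise_for_status()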
>>
>> Any help gratefully appreciated.
>>
>> Martin
>

Simon & Mahesh,

These examples would be a great addition to the wiki :-))

A+
Dave
