On Jul 26, 2010, at 10:41 AM, Simon Metson wrote:

> Hi,
>       We've done things at this scale with CouchDB. The key thing is to do 
> bulk inserts and to trigger view indexing as you go. For instance, our code 
> by default will bulk insert 5000 records, hit a view, then do the next 5000 
> and hit the view again, and so on (a rough sketch of this pattern follows 
> below). Of course the batch size is something you'd want to tune, since it 
> will depend on your documents and views. It's much quicker to build the view 
> index incrementally than to hit all N million records at once. You might also 
> want to trigger view and db compaction occasionally, especially if you're 
> also doing bulk deletes.
> Cheers
> Simon
> 
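A minimal sketch of the batched-load pattern Simon describes (not his actual
code): insert via _bulk_docs, query a view after each batch so the index is
built incrementally, and compact now and then. The database name "wex", the
view path _design/dates/_view/by_date, the batch size, and the use of the
third-party requests library are all assumptions to adjust.

import json
import requests  # third-party HTTP client (pip install requests)

COUCH = "http://127.0.0.1:5984"
DB = "wex"            # assumed database name
BATCH_SIZE = 5000     # tune to your documents and views
HEADERS = {"Content-Type": "application/json"}

def bulk_insert(docs):
    """POST one batch of documents to _bulk_docs."""
    r = requests.post("%s/%s/_bulk_docs" % (COUCH, DB),
                      data=json.dumps({"docs": docs}), headers=HEADERS)
    r.raise_for_status()

def refresh_view():
    """Query the view with limit=0 so CouchDB updates the index incrementally."""
    r = requests.get("%s/%s/_design/dates/_view/by_date" % (COUCH, DB),
                     params={"limit": 0})
    r.raise_for_status()

def compact():
    """Trigger database and view compaction (worth doing after bulk deletes)."""
    requests.post("%s/%s/_compact" % (COUCH, DB), headers=HEADERS)
    requests.post("%s/%s/_compact/dates" % (COUCH, DB), headers=HEADERS)

def load(all_docs):
    """Feed an iterable of documents to CouchDB, one batch at a time."""
    batch = []
    for doc in all_docs:
        batch.append(doc)
        if len(batch) >= BATCH_SIZE:
            bulk_insert(batch)
            refresh_view()
            batch = []
    if batch:
        bulk_insert(batch)
        refresh_view()

Compaction can be triggered occasionally during the load (or afterwards), as
Simon notes.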

Also, 1.0 should be significantly faster for your use case.

Chris

> On 26 Jul 2010, at 18:00, Norman Barker wrote:
> 
>> Hi,
>> 
>> I have sampled the Wikipedia TSV collection from freebase
>> (http://wiki.freebase.com/wiki/WEX/Documentation#articles). I ran this
>> through awk to drop the xml field, then did a simple conversion to
>> JSON, and then called _bulk_docs 150 docs at a time into couch 0.11.
>> 
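For the conversion step, a hedged sketch of turning one WEX articles row into a
JSON document: the column positions below are assumptions, so check the WEX
documentation linked above and adjust; the xml column is simply dropped, as in
the awk step.

# Column positions are assumptions about the WEX articles TSV layout; verify
# against http://wiki.freebase.com/wiki/WEX/Documentation#articles.
ID_COL, TITLE_COL, DATE_COL = 0, 1, 2

def tsv_to_docs(path):
    """Yield one JSON-ready dict per article row, skipping the xml field."""
    with open(path) as tsv:
        for line in tsv:
            fields = line.rstrip("\n").split("\t")
            if len(fields) <= DATE_COL:
                continue  # skip malformed rows
            yield {"_id": fields[ID_COL],
                   "title": fields[TITLE_COL],
                   "updated": fields[DATE_COL]}

The resulting dicts can be posted to _bulk_docs 150 at a time as described, or
in the larger batches Simon suggests above.
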
>> I wrote a simple view in Erlang that emits the date as a key (I am
>> actually using this to test free-text search with couchdb-clucene); the
>> views are fast once computed.
>> 
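For reference, a hedged sketch of what such a date-emitting view could look
like using CouchDB's native Erlang query server, pushed from Python. The field
name "updated" and the design document and database names are assumptions
rather than Norman's actual code, and the native server must already be
registered in local.ini under [native_query_servers]
(erlang = {couch_native_process, start_link, []}).

import json
import requests  # third-party HTTP client

COUCH = "http://127.0.0.1:5984"
DB = "wex"  # assumed database name

# Map function for the native Erlang query server: emit the (assumed)
# "updated" field as the view key.
ERLANG_MAP = """
fun({Doc}) ->
    Date = proplists:get_value(<<"updated">>, Doc, null),
    Emit(Date, null)
end.
"""

design_doc = {
    "_id": "_design/dates",
    "language": "erlang",
    "views": {"by_date": {"map": ERLANG_MAP}},
}

resp = requests.put("%s/%s/_design/dates" % (COUCH, DB),
                    data=json.dumps(design_doc),
                    headers={"Content-Type": "application/json"})
resp.raise_for_status()

Once saved, GET /wex/_design/dates/_view/by_date queries the index by date.
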
>> The amount of disk storage used by CouchDB is an issue, and the write
>> times are slow. I changed my view, and the view computation over 2.3
>> million documents is still running!
>> 
>>       "request_time": {
>>           "description": "length of a request inside CouchDB without MochiWeb",
>>           "current": 2253451.122,
>>           "sum": 2253451.122,
>>           "mean": 501.212,
>>           "stddev": 12275.385,
>>           "min": 0.5,
>>           "max": 798124.0
>>       },
>> 
>> For my use case, once the system is up there are only a few updates per
>> hour, but doing the initial harvest takes a long time.
>> 
>> Does 1.0 make substantial gains on this, and if so, how? Are there any
>> other areas that I should be looking at to improve this? I am happy
>> writing Erlang code.
>> 
>> thanks,
>> 
>> Norman
> 
