Hi,
I sampled the Wikipedia TSV collection from Freebase
(http://wiki.freebase.com/wiki/WEX/Documentation#articles), ran it
through awk to drop the XML field, then did a simple conversion to
JSON. I then called _bulk_docs 150 docs at a time into CouchDB 0.11.
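In case it helps, the load pipeline is roughly the following sketch (the database name `wex`, the local server URL, and the two-column awk program are illustrative assumptions, not my exact script — the real WEX rows have many more fields):

```shell
#!/bin/sh
# Turn a tiny two-column TSV sample (id, title) into a _bulk_docs
# payload of the shape {"docs":[{...},{...}]}.
printf 'a1\tAnarchism\na2\tAlbedo\n' |
awk -F'\t' '
  BEGIN { print "{\"docs\":[" }
  { printf "%s{\"_id\":\"%s\",\"title\":\"%s\"}\n", (NR > 1 ? "," : ""), $1, $2 }
  END { print "]}" }
' > batch.json

cat batch.json
# Then POST each batch of ~150 docs (server URL is an assumption):
# curl -X POST http://localhost:5984/wex/_bulk_docs \
#      -H 'Content-Type: application/json' -d @batch.json
```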
I wrote a simple view in Erlang that emits the date as a key (I am
actually using this to test free text search with couchdb-clucene);
the views are fast once computed.
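For reference, the view is roughly the design document below (the design doc name, view name, and `date` field are from memory and may differ from my actual code). Native Erlang views also need the Erlang query server enabled in local.ini, e.g. `[native_query_servers]` / `erlang = {couch_native_process, start_link, []}`:

```
{
  "_id": "_design/dates",
  "language": "erlang",
  "views": {
    "by_date": {
      "map": "fun({Doc}) -> case proplists:get_value(<<\"date\">>, Doc) of undefined -> ok; Date -> Emit(Date, null) end end."
    }
  }
}
```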
The amount of disk storage used by CouchDB is an issue, and the write
times are slow. I changed my view, and the recomputation over 2.3
million docs is still running!
"request_time": {
    "description": "length of a request inside CouchDB without MochiWeb",
    "current": 2253451.122,
    "sum": 2253451.122,
    "mean": 501.212,
    "stddev": 12275.385,
    "min": 0.5,
    "max": 798124.0
},
For my use case there are only a few updates per hour once the system
is up, but the initial harvest takes a long time.
Does 1.0 make substantial gains on this, and if so how? Are there any
other areas I should be looking at to improve this? I am happy
writing Erlang code.
thanks,
Norman