Thanks for all the replies. I'll upgrade CouchDB and Erlang to the latest
versions and retest. Yes, this is a one-time import, but 70 million records
at 50-60 writes a second doesn't mean a day; it means two weeks or
more. I don't mind throwing extra hardware at the problem, but I
want to make sure I'm throwing it in the right place and
using the existing hardware as well as I can. If writes to all DBs are
serialized in a single thread, then if I partition the data into two DBs
and fire up two copies of couch, I should be able to make use of another
processor on the same machine, right? I'll test this tomorrow along with the
newer versions.
Thanks,
Josh
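
For anyone checking the arithmetic behind "two weeks or more", here is a rough sketch in Python. The function name and the `writers` parameter are made up for illustration, and the assumption that two couch instances would double throughput is exactly the thing still to be tested, not a measured result.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def import_days(records, writes_per_sec, writers=1):
    """Estimated import time in days.

    Assumes every writer sustains the same rate and that throughput
    scales linearly with the number of writers (an untested assumption).
    """
    return records / (writes_per_sec * writers) / SECONDS_PER_DAY


if __name__ == "__main__":
    for rate in (50, 60):
        single = import_days(70_000_000, rate)
        double = import_days(70_000_000, rate, writers=2)
        print(f"{rate} writes/s: {single:.1f} days alone, "
              f"{double:.1f} days with two writers")
```

At 50-60 writes a second this comes out to roughly 13.5 to 16 days for a single writer, so even perfect scaling across two instances would still leave about a week.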
Paul Davis wrote:
Erlang 5.5.5 is borked. 5.6.x should be ok.
Also, yes, writes to the database are serialized in a single thread.
For reference, when storing data, are you using the _bulk_docs
interface?
Also, in trunk the fsync calls are turned off by default now so you
should notice more speedup there.
Also, if these are archived records, wouldn't this be a single time
cost? Faster is always better, but if it takes a day, is that a big
deal?
HTH
Paul
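
To make the _bulk_docs question concrete, here is a minimal sketch of a batched import in Python. It assumes the endpoint accepts a POST of a JSON object of the form {"docs": [...]} at /<db>/_bulk_docs; the base URL, the database name and the batch size of 1000 are placeholders, not recommendations, and the right batch size would need to be found by benchmarking.

```python
import json
import urllib.request
from itertools import islice


def batches(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_docs_request(base_url, db, docs):
    """Build (but do not send) a POST request for the _bulk_docs endpoint."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{db}/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bulk_import(base_url, db, docs, batch_size=1000):
    """Post documents in batches; returns the number of documents sent.

    batch_size=1000 is an arbitrary starting point (assumption).
    """
    sent = 0
    for chunk in batches(docs, batch_size):
        request = bulk_docs_request(base_url, db, chunk)
        with urllib.request.urlopen(request) as response:
            response.read()  # per-document results are ignored in this sketch
        sent += len(chunk)
    return sent
```

A call would look something like bulk_import("http://localhost:5984", "archive", records), where "archive" is a hypothetical database name. The point is that each HTTP round trip carries many documents instead of one, which is usually where single-document inserts lose most of their time.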
On Wed, Jan 7, 2009 at 2:55 PM, Josh Bryan wrote:
Chris Anderson wrote:
On Wed, Jan 7, 2009 at 4:37 PM, Josh Bryan wrote:
Hi,
I am looking into CouchDB as a solution to store a bunch (approx 70
million) archived documents. While planning for the import process, I
did some benchmarking to figure out how long the import will take. I
get about 50-70 inserts per second on average. However, when I looked
for the bottleneck, I couldn't figure it out. I am connected to the
database via a fast LAN and can verify that the network is not
saturated. I can also verify that disk I/O is not saturated. The only
clue is that of the 4 CPUs on the server, only one seems to be
getting fully loaded. Also, of the 5 Erlang processes I can see
running, only one seems to be getting most of the CPU time. I
know that Erlang is built with SMP enabled, so if it is CPU bound, why
can't it make use of the other 3 processors?
I thought that perhaps there was some internal write lock issue per
database that allowed only one thread to write to a db at a time, so I
tried running the benchmarks while hitting multiple databases, but still
got the same write rate across the databases. Is there some globally
shared resource in CouchDB that limits all writes to a single thread?
Thanks,
Josh
Before we can help you diagnose the performance you're seeing, could
you tell us the version of CouchDB and the version of Erlang that you
are using? It wouldn't hurt to describe the hardware in more detail
either.
I am seeing similar results on two systems.
System 1:
Quad core Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
2 GB RAM
Linux 2.6.18-4 -- Debian Lenny
Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [smp:4]
[async-threads:0] [kernel-poll:false]
couchdb - Apache CouchDB 0.8.0-incubating
System 2:
Intel(R) Pentium(R) D CPU 3.00GHz
3 GB RAM
Erlang (BEAM) emulator version 5.5.5 [source] [async-threads:0]
[kernel-poll:false]
couchdb - Apache CouchDB 0.9.0a724455-incubating
Thanks