On 8 Jan 2009, at 03:12, Paul Davis wrote:

On Wed, Jan 7, 2009 at 3:47 PM, Josh Bryan <[email protected]> wrote:
Thanks for all the replies. I'll upgrade couch and Erlang to the latest
and retest. Yes, this is a one-time import, but 70 million records at
50-60 writes a second doesn't mean a day; it means 2 weeks or more. I
don't mind throwing extra hardware at the problem, but I want to make
sure I'm throwing it in the right place and using the existing hardware
as best I can. If writes to all DBs are serialized in a single thread,
then if I partition the data into two DBs and fire up two copies of
couch, I should be able to make use of another processor on the same
machine?
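
For reference, the "2 weeks or more" figure can be checked with a few lines of Python. This is only a back-of-the-envelope sketch; the function name import_days is made up for illustration, and it assumes the write rate stays constant for the whole import.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def import_days(total_docs: int, writes_per_second: float) -> float:
    """Days needed to insert total_docs at a constant write rate."""
    return total_docs / writes_per_second / SECONDS_PER_DAY

# 70 million records at the observed 50-60 writes per second.
for rate in (50, 60):
    print(f"{rate} writes/s -> {import_days(70_000_000, rate):.1f} days")
# prints:
# 50 writes/s -> 16.2 days
# 60 writes/s -> 13.5 days
```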

Each DB should get its own updater process, I believe, so yes, this
should lead to a speedup.
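
One possible way to split the data across two DBs is to hash each document ID and use the result to pick a database. The sketch below is an illustration only, not something from CouchDB itself: the database names archive_a and archive_b, the helper names, and the choice of CRC32 as the hash are all assumptions.

```python
import zlib
from collections import defaultdict

def pick_database(doc_id: str, db_names: list[str]) -> str:
    """Map a document ID to one database name, stably across runs."""
    # zlib.crc32 is deterministic, unlike Python's built-in hash() for str.
    index = zlib.crc32(doc_id.encode("utf-8")) % len(db_names)
    return db_names[index]

def partition(docs: list[dict], db_names: list[str]) -> dict[str, list[dict]]:
    """Group documents by target database, keyed on each doc's "_id"."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for doc in docs:
        groups[pick_database(doc["_id"], db_names)].append(doc)
    return dict(groups)

# Hypothetical database names and document IDs.
dbs = ["archive_a", "archive_b"]
docs = [{"_id": f"record-{n}", "value": n} for n in range(10)]
for name, group in partition(docs, dbs).items():
    print(name, len(group))
```

Each group can then be written by its own importer process, so the writes to the two DBs do not wait on each other on the client side.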

Depending on how smartly Erlang distributes the DB writer processes over
CPUs you might not even need to run two instances.

Cheers
Jan

I'll test this tomorrow along with the newer versions.

Thanks,
Josh

Paul Davis wrote:

Erlang 5.5.5 is borked. 5.6.x should be ok.

Also, yes, writes to the database are serialized in a single thread.
For reference, when storing data, are you using the _bulk_docs
interface?
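
For anyone not familiar with it, _bulk_docs takes many documents in one POST instead of one request per document. A rough sketch of batching an import that way follows. The base URL, the database name "archive", the batch size, and the helper names are assumptions for illustration, and the {"docs": [...]} body shape should be checked against the CouchDB version in use.

```python
import json
import urllib.request
from itertools import islice

def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def bulk_request(base_url, db, docs):
    """Build (but do not send) a POST to the database's _bulk_docs URL."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{db}/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical server address, database name and documents.
records = ({"_id": f"record-{n}", "value": n} for n in range(2500))
for chunk in batches(records, 1000):
    req = bulk_request("http://localhost:5984", "archive", chunk)
    print(req.get_method(), req.full_url, len(chunk))
    # To actually send it: urllib.request.urlopen(req).read()
```

The best batch size depends on document size and available memory, so it is worth trying a few values during benchmarking.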

Also, in trunk the fsync calls are turned off by default now so you
should notice more speedup there.

Also, if these are archived records, wouldn't this be a single time
cost? Faster is always better, but if it takes a day, is that a big
deal?

HTH
Paul

On Wed, Jan 7, 2009 at 2:55 PM, Josh Bryan <[email protected]> wrote:


Chris Anderson wrote:


On Wed, Jan 7, 2009 at 4:37 PM, Josh Bryan <[email protected]> wrote:



Hi,

I am looking into CouchDB as a solution to store a bunch (approx. 70
million) of archived documents. While planning for the import process,
I did some benchmarking to figure out how long the import will take. I
get about 50-70 inserts per second on average. However, when I looked
for the bottleneck, I couldn't figure it out. I am connected to the
database via a fast LAN and can verify that the network is not
saturated. I can also verify that disk IO is not saturated. The only
clue is that of the 4 CPUs on the server, only one seems to be getting
fully loaded. Also, of the 5 Erlang processes I can see running, only
one of them seems to be getting most of the CPU time. I know that
Erlang is built with SMP enabled, so if it is CPU bound, why can't it
make use of the other 3 processors?

I thought that perhaps there was some internal write lock issue per
database that allowed only one thread to write to a db at a time, so I
tried running the benchmarks while hitting multiple databases, but
still got the same write rate across the databases. Is there some
globally shared resource in couchdb that limits all writes to a single
thread?

Thanks,
Josh




Before we can help you diagnose the performance you're seeing, could
you tell us the version of CouchDB and the version of Erlang that you
are using? It wouldn't hurt to describe the hardware in more detail
either.




I am seeing similar results on two systems.

System 1:
Quad core Intel(R) Xeon(R) CPU 5160  @ 3.00GHz
2 GB ram
Linux 2.6.18-4  -- Debian Lenny
Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [smp:4]
[async-threads:0] [kernel-poll:false]
couchdb - Apache CouchDB 0.8.0-incubating

System 2:
Intel(R) Pentium(R) D CPU 3.00GHz
3 GB ram
Erlang (BEAM) emulator version 5.5.5 [source] [async-threads:0]
[kernel-poll:false]
couchdb - Apache CouchDB 0.9.0a724455-incubating

Thanks






