On 8 Jan 2009, at 03:12, Paul Davis wrote:
On Wed, Jan 7, 2009 at 3:47 PM, Josh Bryan <[email protected]> wrote:
Thanks for all the replies. I'll upgrade couch and erlang to the latest
and retest. Yes, this is a one-time import, but 70 million records at
50-60 writes a second doesn't mean a day, it means 2 weeks or more. I
don't mind throwing extra hardware at the problem, but I just want to
make sure I'm throwing extra hardware in the right place and using
existing hardware as best I can. If writes to all DBs are serialized in
a single thread, then if I partition the data into two DBs and fire up
two copies of couch, I should be able to make use of another processor
on the same machine?
Each DB should get its own updater process, I believe, so yes, this
should lead to a speedup.
Depending on how smartly Erlang distributes the DB writer processes over
CPUs you might not even need to run two instances.
Cheers
Jan
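The partitioning idea discussed above could be sketched roughly as follows. This is only an illustration, not something posted in the thread: the database names and the choice of a CRC32 hash over the document ID are assumptions, and whether the split actually yields a speedup depends on the per-database updater behaviour described above.

```python
import zlib
from collections import defaultdict


def partition_for(doc_id, db_names):
    """Pick a database for a document ID.

    crc32 is used because it is stable across runs, unlike the built-in
    hash() on strings.
    """
    index = zlib.crc32(doc_id.encode("utf-8")) % len(db_names)
    return db_names[index]


def split_by_database(docs, db_names):
    """Group documents by target database so each group can be written
    by its own importer process."""
    groups = defaultdict(list)
    for doc in docs:
        groups[partition_for(doc["_id"], db_names)].append(doc)
    return dict(groups)
```

Each group would then be fed to a separate importer, one per database, so that the writes are not all funnelled through a single updater.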
--
I'll test this tomorrow along with the newer versions.
Thanks,
Josh
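The "2 weeks or more" estimate can be checked with a few lines of arithmetic. The sketch below assumes a constant write rate for the whole import, which is a simplification; the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def import_days(total_docs, writes_per_second):
    """Days needed to write total_docs at a constant write rate."""
    return total_docs / writes_per_second / SECONDS_PER_DAY


for rate in (50, 60):
    print(f"{rate} writes/s: {import_days(70_000_000, rate):.1f} days")
# 50 writes/s: 16.2 days
# 60 writes/s: 13.5 days
```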
Paul Davis wrote:
Erlang 5.5.5 is borked. 5.6.x should be ok.
Also, yes, writes to the database are serialized in a single thread.
For reference, when storing data, are you using the _bulk_docs interface?
Also, in trunk the fsync calls are turned off by default now so you
should notice more speedup there.
Also, if these are archived records, wouldn't this be a single time
cost? Faster is always better, but if it takes a day, is that a big
deal?
HTH
Paul
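For readers unfamiliar with the _bulk_docs interface mentioned above: it accepts many documents in one POST, as a JSON body of the form {"docs": [...]}. A minimal sketch of a batching importer follows. The base URL, database name, and batch size are placeholders, not values from the thread, and error handling is left out.

```python
import json
import urllib.request
from itertools import islice


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while batch := list(islice(it, size)):
        yield batch


def bulk_docs_request(base_url, db, docs):
    """Build (but do not send) a POST to /<db>/_bulk_docs."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{db}/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def import_all(base_url, db, docs, batch_size=1000):
    """Send all documents in batches; batch_size is an arbitrary guess."""
    for batch in batches(docs, batch_size):
        with urllib.request.urlopen(bulk_docs_request(base_url, db, batch)) as resp:
            resp.read()  # response body lists the ids and revs that were stored
```

A call such as import_all("http://localhost:5984", "archive", docs) would then replace one HTTP request per document with one request per batch.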
On Wed, Jan 7, 2009 at 2:55 PM, Josh Bryan <[email protected]> wrote:
Chris Anderson wrote:
On Wed, Jan 7, 2009 at 4:37 PM, Josh Bryan <[email protected]> wrote:
Hi,
I am looking into CouchDB as a solution to store a bunch (approx 70
million) archived documents. While planning for the import process, I
did some benchmarking to figure out how long the import will take. I
get about 50-70 inserts per second on average. However, when I looked
for the bottleneck, I couldn't figure it out. I am connected to the
database via a fast lan and can verify that the network is not
saturated. I can also verify that disk IO is not saturated. The only
clue is that of the 4 cpus on the server, it seems that only one is
getting fully loaded. Also, of the 5 erlang processes I can see
running, only one of them seems to be getting most of the cpu time. I
know that erlang is built with smp enabled, so if it is cpu bound, why
can't it make use of the other 3 processors?
I thought that perhaps there was some internal write lock issue per
database that allowed only one thread to write to a db at a time, so I
tried running the benchmarks while hitting multiple databases, but
still got the same write rate across the databases. Is there some
globally shared resource in couchdb that limits all writes to a single
thread?
Thanks,
Josh
Before we can help you diagnose the performance you're seeing, could
you tell us the version of CouchDB and the version of Erlang that you
are using? It wouldn't hurt to describe the hardware in more detail
either.
I am seeing similar results on two systems.
System 1:
Quad core Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
2 GB ram
Linux 2.6.18-4 -- Debian Lenny
Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [smp:4] [async-threads:0] [kernel-poll:false]
couchdb - Apache CouchDB 0.8.0-incubating
System 2:
Intel(R) Pentium(R) D CPU 3.00GHz
3 GB ram
Erlang (BEAM) emulator version 5.5.5 [source] [async-threads:0] [kernel-poll:false]
couchdb - Apache CouchDB 0.9.0a724455-incubating
Thanks