On Wed, Jan 7, 2009 at 3:47 PM, Josh Bryan <[email protected]> wrote:
> Thanks for all the replies, I'll upgrade couch and erlang to the latest and
> retest. Yes, this is a single time import, but 70 million records at 50-60
> writes a second doesn't mean a day, it means 2 weeks or more. I don't
> mind throwing extra hardware at the problem, but I just want to make sure
> I'm throwing extra hardware in the right place and using existing hardware
> as best as I can. If writes to all DBs are serialized in a single thread,
> then if I partition the data into two DBs and fire up two copies of couch, I
> should be able to make use of another processor on the same machine?

Each DB should get its own updater process, I believe, so yes, this should
lead to a speedup.

> I'll
> test this tomorrow along with the newer versions.
>
> Thanks,
> Josh
>
> Paul Davis wrote:
>>
>> Erlang 5.5.5 is borked. 5.6.x should be ok.
>>
>> Also, yes, writes to the database are serialized in a single thread.
>> For reference, when storing data, are you using the _bulk_docs
>> interface?
>>
>> Also, in trunk the fsync calls are turned off by default now so you
>> should notice more speedup there.
>>
>> Also, if these are archived records, wouldn't this be a single time
>> cost? Faster is always better, but if it takes a day, is that a big
>> deal?
>>
>> HTH
>> Paul
>>
>> On Wed, Jan 7, 2009 at 2:55 PM, Josh Bryan <[email protected]> wrote:
>>
>>> Chris Anderson wrote:
>>>
>>>> On Wed, Jan 7, 2009 at 4:37 PM, Josh Bryan <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am looking into CouchDB as a solution to store a bunch (approx 70
>>>>> million) archived documents. While planning for the import process, I
>>>>> did some benchmarking to figure out how long the import will take. I
>>>>> get about 50-70 inserts per second on average. However, when I looked
>>>>> for the bottleneck, I couldn't figure it out. I am connected to the
>>>>> database via a fast lan and can verify that the network is not
>>>>> saturated. I can also verify that disk IO is not saturated. The only
>>>>> clue is that of the 4 cpus on the server, it seems that only one is
>>>>> getting fully loaded. Also, of the 5 erlang processes I can see
>>>>> running, only one of them seems to be getting most of the cpu time. I
>>>>> know that erlang is built with smp enabled, so if it is cpu bound, why
>>>>> can't it make use of the other 3 processors?
>>>>>
>>>>> I thought that perhaps there was some internal write lock issue per
>>>>> database that allowed only one thread to write to a db at a time, so I
>>>>> tried running the benchmarks while hitting multiple databases, but still
>>>>> got the same write rate across the databases. Is there some globally
>>>>> shared resource in couchdb that limits all writes to a single thread?
>>>>>
>>>>> Thanks,
>>>>> Josh
>>>>>
>>>>
>>>> Before we can help you diagnose the performance you're seeing, could
>>>> you tell us the version of CouchDB and the version of Erlang that you
>>>> are using? It wouldn't hurt to describe the hardware in more detail
>>>> either.
>>>>
>>>
>>> I am seeing similar results on two systems.
>>>
>>> System 1:
>>> Quad core Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
>>> 2 GB ram
>>> Linux 2.6.18-4 -- Debian Lenny
>>> Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [smp:4]
>>> [async-threads:0] [kernel-poll:false]
>>> couchdb - Apache CouchDB 0.8.0-incubating
>>>
>>> System 2:
>>> Intel(R) Pentium(R) D CPU 3.00GHz
>>> 3 GB ram
>>> Erlang (BEAM) emulator version 5.5.5 [source] [async-threads:0]
>>> [kernel-poll:false]
>>> couchdb - Apache CouchDB 0.9.0a724455-incubating
>>>
>>> Thanks
>>>
>
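
For anyone following the thread, here is a rough sketch of the two ideas discussed above: batching writes through _bulk_docs, and splitting documents across more than one database. It is not taken from anyone's actual import script. The helper names (estimate_days, chunked, partition_for, bulk_docs_body, post_batch) are made up for illustration, as is the choice of CRC32 for picking a partition, and the request body shape should be checked against the CouchDB version in use.

```python
import json
import urllib.request
import zlib
from itertools import islice


def estimate_days(total_docs, docs_per_sec):
    """Days needed to write total_docs at a steady docs_per_sec."""
    return total_docs / docs_per_sec / 86400


def chunked(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def partition_for(doc_id, n_dbs):
    """Pick a database index for a document id (hypothetical scheme).

    CRC32 is used only because it is stable across runs; any
    deterministic hash would do.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % n_dbs


def bulk_docs_body(batch):
    """Encode one batch as a _bulk_docs request body: {"docs": [...]}."""
    return json.dumps({"docs": batch}).encode("utf-8")


def post_batch(base_url, db_name, batch):
    """POST one batch to <base_url>/<db_name>/_bulk_docs; return HTTP status.

    base_url and db_name are whatever your setup uses,
    e.g. "http://localhost:5984" and a database you created beforehand.
    """
    req = urllib.request.Request(
        f"{base_url}/{db_name}/_bulk_docs",
        data=bulk_docs_body(batch),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

With the numbers from the thread, estimate_days(70_000_000, 60) comes out at roughly 13.5 days and estimate_days(70_000_000, 50) at roughly 16.2, which matches the "2 weeks or more" figure. A loop over chunked(docs, 1000) calling post_batch would be the obvious way to drive it. The batch size of 1000 is a guess to tune, not a recommendation.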
