Hi,
https://issues.apache.org/jira/browse/COUCHDB-722

Thanks,
Fredrik

-----Original Message-----
From: Adam Kocoloski [mailto:[email protected]]
Sent: 19 April 2010 16:05
To: [email protected]
Subject: Re: CouchDB and Hadoop_

Hi Fredrik, thanks for the details. The CPU utilization does not sound normal
at all. I have had a node replicating 30-75 updates/sec (unique documents,
diurnal fluctuations) for several months now, and it almost never uses more
than 50% of one core of a virtualized e5410 box with 1.7G of RAM. I would
definitely look into the crashes first and see if that resolves the giant
fluctuations in CPU. Is there a JIRA ticket I can follow? (I'm one of the
developers of the replicator.)

Best,
Adam

On Apr 19, 2010, at 4:07 AM, Fredrik Widlund wrote:

> Hi,
>
> The case I've tested so far is using couch in the following setup (which is a
> small part of what would be a production level setup for us)
> - two bidirectionally synced nodes
> - <50 writes/s to node A, each updating a unique doc
> - <50 writes/s to node B, each updating a unique doc
> - <50 reads/s from each node
> - regular compaction of the database containing the docs
>
> The two nodes run on quad-core (e5520) CPUs with 16G RAM. CPU usage ramps
> down and up to 400% (i.e. full load on all cores) every few seconds. Couch
> 0.11.0 crashes regularly, which has been reported and is being worked on from
> what I understand. Also, the replication tasks break and have to be restarted
> very often, probably due to the problem above.
>
> Now, I've received a temporary patch as a possible work-around for the
> crashes. I haven't tested this case with the work-around yet, but I would
> assume it sorts out the crashes, though not the CPU load.
>
> Kind regards,
> Fredrik Widlund
>
> -----Original Message-----
> From: Randall Leeds [mailto:[email protected]]
> Sent: 16 April 2010 21:06
> To: [email protected]
> Subject: Re: CouchDB and Hadoop_
>
> Hey Fredrik,
>
> I'm one of the couchdb-lounge developers.
> I'd like to understand better what your performance concerns are. Why are
> you concerned about replicating a large number of changes? A distributed
> file system would be doing the same thing but at a lower level. If such a
> system were to work you'd be saving only HTTP and JSON overhead vs
> replication. If the replicator is too slow, that is something that can
> possibly be improved. If you're concerned about the runtime impact of
> replication, this is tunable via the [replicator] configuration section.
>
> couchdb-lounge already uses nginx for distributing simple GET and PUT
> operations to documents and a python-twisted daemon to handle views.
> The twisted daemon has configurable caching (with the one caveat that
> the cache is currently unbounded, so the daemon needs to be restarted
> periodically... I should really fix this :-P). It should be possible
> to chain any standard nginx caching modules in front of the lounge
> proxy module.
>
> If you have other concerns or would like to investigate more, ping me
> on irc (tilgovi) or join us over on
> http://groups.google.com/group/couchdb-lounge
>
> -Randall
>
> On Fri, Apr 16, 2010 at 09:54, Fredrik Widlund <[email protected]> wrote:
>>
>> Thanks, I will! We will actually use nginx for "dumb" caching, but add an
>> api layer in between the cache and the couch. Also we actually need to
>> mirror data to provide HA, and the performance issues we're having are more
>> about constantly replicating a large number of changes than accelerating the
>> reads. I'm not sure if couchdb-lounge would address this.
>>
>> We did stumble upon a bug that's being addressed and we were also provided
>> with a temporary work-around, and it could be due to that, but with a quite
>> modest load we periodically kept hitting the roof of an e5520 quad-core so
>> I'm a bit worried about the performance aspect.
>>
>> Kind regards,
>> Fredrik Widlund
>>
>> -----Original Message-----
>> From: David Coallier [mailto:[email protected]]
>> Sent: 16 April 2010 18:06
>> To: [email protected]
>> Subject: Re: CouchDB and Hadoop_
>>
>> On 16 April 2010 16:22, Fredrik Widlund <[email protected]> wrote:
>>>
>>> Well, we're building a solution on Couch and replication on a relatively
>>> large scale and saying "it just works" doesn't really describe it for us. I
>>> really like the Couch design but it's a bit of a challenge making it work,
>>> for us. I can describe the case if you like.
>>>
>>> Also we already have a decentralized distributed file system layer (which
>>> often is a natural part of a cloud solution I suppose) so if we could run
>>> it on top of that it would lessen the complexity of the overall solution.
>>>
>>> In any case it was a quick comment to the Hadoop question, and maybe it
>>> just wouldn't work that way. You could in general discuss atomic
>>> operations/locking and performance implications of moving synchronization
>>> to a lower abstraction layer I guess.
>>>
>> <snip>
>>
>> You should look into couchdb-lounge. It should resolve most of your
>> "sharding" replication issues :)
>>
>> --
>> David Coallier
