Thanks Fredrik. I think I have a pretty good handle on what's happening and have replied in detail in JIRA.

Best,
Adam

On Apr 19, 2010, at 10:22 AM, Fredrik Widlund wrote:

> Hi,
>
> https://issues.apache.org/jira/browse/COUCHDB-722
>
> Thanks,
> Fredrik
>
> -----Original Message-----
> From: Adam Kocoloski [mailto:[email protected]]
> Sent: 19 April 2010 16:05
> To: [email protected]
> Subject: Re: CouchDB and Hadoop_
>
> Hi Fredrik, thanks for the details. The CPU utilization does not sound
> normal at all. I have had a node replicating 30-75 updates/sec (unique
> documents, diurnal fluctuations) for several months now and it almost never
> uses more than 50% of one core of a virtualized e5410 box with 1.7G of RAM.
>
> I would definitely look into the crashes first and see if that resolves the
> giant fluctuations in CPU. Is there a JIRA ticket I can follow? (I'm one of
> the developers of the replicator.)
>
> Best,
> Adam
>
> On Apr 19, 2010, at 4:07 AM, Fredrik Widlund wrote:
>
>> Hi,
>>
>> The case I've tested so far is using couch in the following setup (which is
>> a small part of what would be a production-level setup for us):
>> - two bidirectionally synced nodes
>> - <50 writes/s to node A, each updating a unique doc
>> - <50 writes/s to node B, each updating a unique doc
>> - <50 reads/s from each node
>> - regular compaction of the database containing the docs
>>
>> The two nodes run on quad-core (e5520) CPUs with 16G RAM. CPU usage ramps
>> down and up to 400% (i.e. full load on all cores) every few seconds. Couch
>> 0.11.0 crashes regularly, which has been reported and is being worked on
>> from what I understand. Also, the replication tasks break and have to be
>> restarted very often, probably due to the problem above.
>>
>> Now, I've received a temporary patch as a possible work-around for the
>> crashes. I haven't tested this case with the work-around yet, but I would
>> assume it sorts out the crashes, but not the CPU load.
>>
>> Kind regards,
>> Fredrik Widlund
>>
>> -----Original Message-----
>> From: Randall Leeds [mailto:[email protected]]
>> Sent: 16 April 2010 21:06
>> To: [email protected]
>> Subject: Re: CouchDB and Hadoop_
>>
>> Hey Fredrik,
>>
>> I'm one of the couchdb-lounge developers. I'd like to understand
>> better what your performance concerns are. Why are you concerned about
>> replicating a large number of changes? A distributed file system would
>> be doing the same thing but at a lower level. If such a system were to
>> work you'd be saving only HTTP and JSON overhead vs replication. If
>> the replicator is too slow, that is something that can possibly be
>> improved. If you're concerned about the runtime impact of replication,
>> this is tunable via the [replicator] configuration section.
>>
>> couchdb-lounge already uses nginx for distributing simple GET and PUT
>> operations to documents and a python-twisted daemon to handle views.
>> The twisted daemon has configurable caching (with the one caveat that
>> the cache is currently unbounded, so the daemon needs to be restarted
>> periodically... I should really fix this :-P). It should be possible
>> to chain any standard nginx caching modules in front of the lounge
>> proxy module.
>>
>> If you have other concerns or would like to investigate more, ping me
>> on irc (tilgovi) or join us over on
>> http://groups.google.com/group/couchdb-lounge
>>
>> -Randall
>>
>> On Fri, Apr 16, 2010 at 09:54, Fredrik Widlund
>> <[email protected]> wrote:
>>>
>>> Thanks, I will! We will actually use nginx for "dumb" caching, but add an
>>> API layer in between the cache and the couch. Also we actually need to
>>> mirror data to provide HA, and the performance issues we're having are more
>>> about constantly replicating a large number of changes than accelerating
>>> the reads. I'm not sure if couchdb-lounge would address this.
>>>
>>> We did stumble upon a bug that's being addressed and we were also provided
>>> with a temporary work-around, and it could be due to that, but with a quite
>>> modest load we periodically kept hitting the roof of an e5520 quad-core, so
>>> I'm a bit worried about the performance aspect.
>>>
>>> Kind regards,
>>> Fredrik Widlund
>>>
>>> -----Original Message-----
>>> From: David Coallier [mailto:[email protected]]
>>> Sent: 16 April 2010 18:06
>>> To: [email protected]
>>> Subject: Re: CouchDB and Hadoop_
>>>
>>> On 16 April 2010 16:22, Fredrik Widlund <[email protected]> wrote:
>>>>
>>>> Well, we're building a solution on Couch and replication on a relatively
>>>> large scale and saying "it just works" doesn't really describe it for us.
>>>> I really like the Couch design but it's a bit of a challenge making it
>>>> work, for us. I can describe the case if you like.
>>>>
>>>> Also we already have a decentralized distributed file system layer (which
>>>> often is a natural part of a cloud solution I suppose) so if we could run
>>>> it on top of that it would lessen the complexity of the overall solution.
>>>>
>>>> In any case it was a quick comment to the Hadoop question, and maybe it
>>>> just wouldn't work that way. You could in general discuss atomic
>>>> operations/locking and performance implications of moving synchronization
>>>> to a lower abstraction layer I guess.
>>>>
>>> <snip>
>>>
>>> You should look into couchdb-lounge. It should resolve most of your
>>> "sharding" replication issues :)
>>>
>>> --
>>> David Coallier
