Re: CouchDB and Hadoop_

Adam Kocoloski Mon, 19 Apr 2010 07:39:34 -0700

Thanks Fredrik.  I think I have a pretty good handle on what's happening and 
have replied in detail in JIRA.  Best,


Adam

On Apr 19, 2010, at 10:22 AM, Fredrik Widlund wrote:

> 
> 
> Hi,
> 
> https://issues.apache.org/jira/browse/COUCHDB-722
> 
> Thanks,
> Fredrik
> 
> -----Original Message-----
> From: Adam Kocoloski [mailto:[email protected]]
> Sent: 19 April 2010 16:05
> To: [email protected]
> Subject: Re: CouchDB and Hadoop_
> 
> Hi Fredrik, thanks for the details.  The CPU utilization does not sound 
> normal at all.  I have a node replicating 30-75 updates/sec (unique 
> documents, diurnal fluctuations) for several months now and it almost never 
> uses more than 50% of one core of a virtualized e5410 box with 1.7G of RAM.
> 
> I would definitely look into the crashes first and see if that resolves the 
> giant fluctuations in CPU.  Is there a JIRA ticket I can follow? (I'm one of 
> the developers of the replicator).  Best,
> 
> Adam
> 
> On Apr 19, 2010, at 4:07 AM, Fredrik Widlund wrote:
> 
>> 
>> 
>> Hi,
>> 
>> The case I've tested so far is using couch in the following setup (which is 
>> a small part of what would be a production level setup for us)
>> - two bidirectionally synced nodes
>> - <50 writes/s to node A, each updating a unique doc
>> - <50 writes/s to node B, each updating a unique doc
>> - <50 reads/s from each node
>> - regular compaction of the database containing the docs
>> 
>> The two nodes run on quad-core (e5520) CPUs with 16G RAM. CPU usage ramps down
>> and up to 400% (i.e. full load on all cores) every few seconds. Couch 0.11.0
>> crashes regularly, which has been reported and is being worked on from what I
>> understand. Also, the replication tasks break and have to be restarted very
>> often, probably due to the problem above.
>> 
>> Now, I've received a temporary patch as a possible work-around for the 
>> crashes. I haven't tested this case with the work-around yet, but I would 
>> expect it to sort out the crashes, not the CPU load.
>> 
>> Kind regards,
>> Fredrik Widlund
>> 
>> -----Original Message-----
>> From: Randall Leeds [mailto:[email protected]]
>> Sent: 16 April 2010 21:06
>> To: [email protected]
>> Subject: Re: CouchDB and Hadoop_
>> 
>> Hey Fredrik,
>> 
>> I'm one of the couchdb-lounge developers. I'd like to understand
>> better what your performance concerns are. Why are you concerned about
>> replicating a large number of changes? A distributed file system would
>> be doing the same thing but at a lower level. If such a system were to
>> work you'd be saving only HTTP and JSON overhead vs replication. If
>> the replicator is too slow, that is something that can possibly be
>> improved. If you're concerned about the runtime impact of replication
>> this is tunable via the [replicator] configuration section.
>> 
>> couchdb-lounge already uses nginx for distributing simple GET and PUT
>> operations to documents and a python-twisted daemon to handle views.
>> The twisted daemon has configurable caching (with the one caveat that
>> the cache is currently unbounded, so the daemon needs to be restarted
>> periodically.... I should really fix this :-P). It should be possible
>> to chain any standard nginx caching modules in front of the lounge
>> proxy module.
>> 
>> If you have other concerns or would like to investigate more, ping me
>> on irc (tilgovi) or join us over on
>> http://groups.google.com/group/couchdb-lounge
>> 
>> -Randall
>> 
>> On Fri, Apr 16, 2010 at 09:54, Fredrik Widlund
>> <[email protected]> wrote:
>>> 
>>> 
>>> Thanks, I will! We will actually use nginx for "dumb" caching, but add an 
>>> api layer in between the cache and the couch. Also we actually need to 
>>> mirror data to provide HA, and the performance issues we're having are more 
>>> about constantly replicating a large number of changes than accelerating 
>>> the reads. I'm not sure if couchdb-lounge would address this.
>>> 
>>> We did stumble upon a bug that's being addressed, and we were also provided 
>>> with a temporary work-around, and it could be due to that, but with a quite 
>>> modest load we periodically kept hitting the roof of an e5520 quad-core, so 
>>> I'm a bit worried about the performance aspect.
>>> 
>>> Kind regards,
>>> Fredrik Widlund
>>> 
>>> -----Original Message-----
>>> From: David Coallier [mailto:[email protected]]
>>> Sent: 16 April 2010 18:06
>>> To: [email protected]
>>> Subject: Re: CouchDB and Hadoop_
>>> 
>>> On 16 April 2010 16:22, Fredrik Widlund <[email protected]> wrote:
>>>> 
>>>> 
>>>> Well, we're building a solution on Couch and replication on a relatively 
>>>> large scale and saying "it just works" doesn't really describe it for us. 
>>>> I really like the Couch design but it's a bit of a challenge making it 
>>>> work, for us. I can describe the case if you like.
>>>> 
>>>> Also we already have a decentralized distributed file system layer (which 
>>>> often is a natural part of a cloud solution I suppose) so if we could run 
>>>> it on top of that it would lessen the complexity of the overall solution.
>>>> 
>>>> In any case it was a quick comment on the Hadoop question, and maybe it 
>>>> just wouldn't work that way. You could in general discuss atomic 
>>>> operations/locking and the performance implications of moving 
>>>> synchronization to a lower abstraction layer, I guess.
>>>> 
>>> <snip>
>>> 
>>> You should look into couchdb-lounge. It should resolve most of your
>>> "sharding" replication issues :)
>>> 
>>> --
>>> David Coallier
>>> 
>>> 
>> 
> 
>
