RE: CouchDB and Hadoop_

Fredrik Widlund Mon, 19 Apr 2010 07:21:28 -0700

Hi,


https://issues.apache.org/jira/browse/COUCHDB-722

Thanks,
Fredrik

-----Original Message-----
From: Adam Kocoloski [mailto:[email protected]]
Sent: 19 April 2010 16:05
To: [email protected]
Subject: Re: CouchDB and Hadoop_

Hi Fredrik, thanks for the details.  The CPU utilization does not sound normal 
at all.  I have a node replicating 30-75 updates/sec (unique documents, diurnal 
fluctuations) for several months now and it almost never uses more than 50% of 
one core of a virtualized e5410 box with 1.7G of RAM.

I would definitely look into the crashes first and see if that resolves the 
giant fluctuations in CPU.  Is there a JIRA ticket I can follow? (I'm one of 
the developers of the replicator).  Best,

Adam

On Apr 19, 2010, at 4:07 AM, Fredrik Widlund wrote:

>
>
> Hi,
>
> The case I've tested so far uses Couch in the following setup (a small
> part of what would be a production-level setup for us):
> - two bidirectionally synced nodes
> - <50 writes/s to node A, each updating a unique doc
> - <50 writes/s to node B, each updating a unique doc
> - <50 reads/s from each node
> - regular compaction of the database containing the docs
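The two-node setup above can be sketched as two continuous replications, one in each direction, plus a periodic compaction request. This is a minimal sketch, assuming CouchDB's `POST /_replicate` endpoint with `"continuous": true` and `POST /{db}/_compact`; the node URLs and database name are hypothetical, and requests are only sent if you call `send()`.

```python
import json
import urllib.request

# Hypothetical node URLs and database name (not from the thread).
NODE_A = "http://node-a:5984"
NODE_B = "http://node-b:5984"
DB = "docs"


def replication_requests(node_a, node_b, db):
    """Build the two continuous replications that keep db in sync both ways.

    Each request is POSTed to the *target* node's /_replicate endpoint,
    so that node pulls changes from the other one.
    """
    return [
        (node_b + "/_replicate",
         {"source": node_a + "/" + db, "target": db, "continuous": True}),
        (node_a + "/_replicate",
         {"source": node_b + "/" + db, "target": db, "continuous": True}),
    ]


def compaction_request(node, db):
    """Build the request that starts compaction of db on one node (no body)."""
    return (node + "/" + db + "/_compact", None)


def send(url, body):
    """POST a JSON body (or an empty JSON request) and return the decoded reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else b""
    req = urllib.request.Request(
        url, data=data, method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Print the requests instead of sending them.
    for url, body in replication_requests(NODE_A, NODE_B, DB):
        print("POST", url, json.dumps(body))
    compact_url, _ = compaction_request(NODE_A, DB)
    print("POST", compact_url)
```

In CouchDB releases of this era, continuous replications started through `/_replicate` are not persisted across a server restart, so a script along these lines would have to be re-run whenever a node restarts or a replication task dies.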
>
> The two nodes run on quad-core (E5520) CPUs with 16G RAM. CPU usage ramps
> up and down to 400% (i.e. full load on all cores) every few seconds. Couch
> 0.11.0 crashes regularly, which has been reported and, as I understand it,
> is being worked on. The replication tasks also break and have to be
> restarted very often, probably due to the problem above.
>
> Now, I've received a temporary patch as a possible work-around for the
> crashes. I haven't tested this case with the work-around yet, but I expect
> it will hopefully sort out the crashes, though not the CPU load.
>
> Kind regards,
> Fredrik Widlund
>
> -----Original Message-----
> From: Randall Leeds [mailto:[email protected]]
> Sent: 16 April 2010 21:06
> To: [email protected]
> Subject: Re: CouchDB and Hadoop_
>
> Hey Fredrik,
>
> I'm one of the couchdb-lounge developers. I'd like to understand
> better what your performance concerns are. Why are you concerned about
> replicating a large number of changes? A distributed file system would
> be doing the same thing but at a lower level. If such a system were to
> work, you'd save only the HTTP and JSON overhead compared with replication.
> If the replicator is too slow, that is something that can possibly be
> improved. If you're concerned about the runtime impact of replication,
> this is tunable via the [replicator] configuration section.
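Options in the [replicator] section can be changed at runtime through CouchDB's `/_config/{section}/{key}` HTTP API, which takes each value as a JSON string. A minimal sketch, assuming admin access and assuming the option names `max_http_sessions` and `max_http_pipeline_size` from the 0.11/1.0-era `default.ini` (names differ in later releases; check your own `default.ini`). The node URL and values are hypothetical examples, not recommendations.

```python
import json
import urllib.request

# Hypothetical node URL. The option names are assumptions based on the
# [replicator] section of CouchDB 0.11/1.0-era default.ini; the values are
# illustrative only.
NODE = "http://node-a:5984"
SETTINGS = {
    "max_http_sessions": "10",
    "max_http_pipeline_size": "10",
}


def config_requests(node, section, settings):
    """Build one PUT /_config/{section}/{key} request per setting.

    The _config API expects each value as a JSON-encoded string.
    """
    return [
        (node + "/_config/" + section + "/" + key, json.dumps(value))
        for key, value in sorted(settings.items())
    ]


def put(url, body):
    """PUT a JSON body and return the decoded reply (the previous value)."""
    req = urllib.request.Request(
        url, data=body.encode("utf-8"), method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Print the requests instead of sending them.
    for url, body in config_requests(NODE, "replicator", SETTINGS):
        print("PUT", url, body)
```

Changes made this way are written back to the local ini file by the server, so they persist across restarts.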
>
> couchdb-lounge already uses nginx for distributing simple GET and PUT
> operations to documents and a python-twisted daemon to handle views.
> The twisted daemon has configurable caching (with the one caveat that
> the cache is currently unbounded, so the daemon needs to be restarted
> periodically.... I should really fix this :-P). It should be possible
> to chain any standard nginx caching modules in front of the lounge
> proxy module.
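The unbounded-cache caveat is typically avoided with a size-bounded LRU (least recently used) cache. The sketch below is not couchdb-lounge's actual code and uses plain Python rather than Twisted; the class name `BoundedCache` and the request-path keys are hypothetical.

```python
from collections import OrderedDict


class BoundedCache:
    """Hypothetical size-bounded LRU cache for view responses.

    Once more than max_entries keys are stored, the least recently used
    entry is evicted, so memory stays bounded without restarting the process.
    """

    def __init__(self, max_entries):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict least recently used

    def __len__(self):
        return len(self._data)


if __name__ == "__main__":
    cache = BoundedCache(max_entries=2)
    cache.put("/db/_design/d/_view/a", b"...")
    cache.put("/db/_design/d/_view/b", b"...")
    cache.get("/db/_design/d/_view/a")          # "a" becomes most recent
    cache.put("/db/_design/d/_view/c", b"...")  # evicts "b"
    print(len(cache), cache.get("/db/_design/d/_view/b"))  # 2 None
```

Bounding by entry count is the simplest policy; bounding by total bytes would need each entry's size tracked as well.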
>
> If you have other concerns or would like to investigate more, ping me
> on irc (tilgovi) or join us over on
> http://groups.google.com/group/couchdb-lounge
>
> -Randall
>
> On Fri, Apr 16, 2010 at 09:54, Fredrik Widlund
> <[email protected]> wrote:
>>
>>
>> Thanks, I will! We will actually use nginx for "dumb" caching, but add an
>> API layer between the cache and Couch. Also, we actually need to
>> mirror data to provide HA, and the performance issues we're having are more 
>> about constantly replicating a large number of changes than accelerating the 
>> reads. I'm not sure if couchdb-lounge would address this.
>>
>> We did stumble upon a bug that's being addressed, and we were also provided
>> with a temporary work-around, so it could be due to that. But with quite a
>> modest load we periodically kept hitting the ceiling of an E5520 quad-core,
>> so I'm a bit worried about the performance aspect.
>>
>> Kind regards,
>> Fredrik Widlund
>>
>> -----Original Message-----
>> From: David Coallier [mailto:[email protected]]
>> Sent: 16 April 2010 18:06
>> To: [email protected]
>> Subject: Re: CouchDB and Hadoop_
>>
>> On 16 April 2010 16:22, Fredrik Widlund <[email protected]> wrote:
>>>
>>>
>>> Well, we're building a solution on Couch and replication on a relatively
>>> large scale, and saying "it just works" doesn't really describe it for us. I
>>> really like the Couch design, but it's a bit of a challenge for us to make
>>> it work. I can describe the case if you like.
>>>
>>> Also we already have a decentralized distributed file system layer (which 
>>> often is a natural part of a cloud solution I suppose) so if we could run 
>>> it on top of that it would lessen the complexity of the overall solution.
>>>
>>> In any case, it was a quick comment on the Hadoop question, and maybe it
>>> just wouldn't work that way. More generally, I guess you could discuss the
>>> implications for atomic operations/locking and performance of moving
>>> synchronization to a lower abstraction layer.
>>>
>> <snip>
>>
>> You should look into couchdb-lounge. It should resolve most of your
>> "sharding" replication issues :)
>>
>> --
>> David Coallier
>>
>>
>
