I don't have any immediate thoughts, no. I'm not "recite details from memory" familiar with this part of the code base; AFAIK it could be anything from a networking blip to a pathological log formatting issue.
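That said, if the pain really is the replicator formatting a half-million-line error report into couch.log, two knobs in local.ini might be worth a try. This is a guess on my part rather than something I've verified against this code path: drop the log level so routine replicator chatter isn't formatted at all, and lower the retry count that the 1.1 docs quoted below describe (the value 3 is just an example):

    [log]
    ; only error-level messages get written, which cuts down on the
    ; per-document "failed to replicate" chatter from the replicator
    level = error

    [replicator]
    ; per the docs Scott quotes below: how many times CouchDB retries a
    ; replication started via the _replicator database (default is 10)
    max_replication_retry_count = 3

I don't know offhand whether the final termination report gets emitted regardless of level, so treat this as a shot in the dark. I've also put a rough sketch of the "watcher" approach kowsik describes at the bottom of this mail, below the quoted thread.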
On Thu, Sep 1, 2011 at 9:11 PM, kowsik <[email protected]> wrote:
> Wow, I'm shocked by the eerie silence on this. So I take it there are
> no clues in my prior emails to figure out why the replicator is
> backing up and then dumping a 500,000 line stack trace?
>
> Dunno if it helps, but here's what we see. The number of documents
> between the two clusters will start to differ (meaning things are not
> replicating fast enough) and then we'll see 100% CPU utilization on one
> of them while at the same time watching the memory utilization grow.
> Could it be the geo-latency that's causing the problem?
>
> Just to see if it makes a difference, we are moving our CouchDB
> cluster to an m2.2xlarge instance (big honking instance with fast IO)
> as well as using instance storage instead of EBS. Will report back on
> what we see. But we definitely could use some help here.
>
> Thanks,
>
> K.
> ---
> http://blitz.io
> @pcapr
>
> On Thu, Sep 1, 2011 at 7:29 AM, kowsik <[email protected]> wrote:
>> One more observation. It seems the memory goes up dramatically while
>> the replicator task is writing all the failed-to-replicate docs to the
>> log (it ends with this):
>>
>> ** Reason for termination ==
>> ** {http_request_failed,<<"failed to replicate http://host/db">>}
>>
>> Is there a way to disable logging for the replicator? Interestingly
>> enough, as soon as we restart, the replicator simply catches up and
>> pretends there were no problems.
>>
>> K.
>> ---
>> http://blog.mudynamics.com
>> http://blitz.io
>> @pcapr
>>
>> On Thu, Sep 1, 2011 at 7:18 AM, kowsik <[email protected]> wrote:
>>> Right before I sent this email we restarted CouchDB and now it's at
>>> 14% memory usage and climbing. Is there anything we can look at
>>> stats-wise to see where the pressure in the system is? I realize task
>>> stats are being added to trunk, but on 1.1, anything?
>>>
>>> Thanks,
>>>
>>> K.
>>> ---
>>> http://blog.mudynamics.com
>>> http://blitz.io
>>> @pcapr
>>>
>>> On Thu, Sep 1, 2011 at 6:35 AM, Scott Feinberg <[email protected]>
>>> wrote:
>>>> I haven't had that issue, though I'm not using 1.1 in a
>>>> production environment, just using it to replicate like crazy (millions of
>>>> docs in each of my 20+ databases). I was running a server with 1 GB of
>>>> memory and didn't have an issue; it handled it fine.
>>>>
>>>> However... from http://docs.couchbase.org/couchdb-release-1.1/index.html
>>>>
>>>> When you PUT/POST a document to the _replicator database, CouchDB will
>>>> attempt to start the replication up to 10 times (configurable under
>>>> [replicator], parameter max_replication_retry_count).
>>>>
>>>> Not sure if that helps.
>>>>
>>>> --Scott
>>>>
>>>> On Thu, Sep 1, 2011 at 9:28 AM, kowsik <[email protected]> wrote:
>>>>
>>>>> Ran into this twice so far in production CouchDB in the last two days.
>>>>> We are running CouchDB 1.1 on an EC2 AMI with multi-master replication
>>>>> across two regions. I notice that every now and then CouchDB will
>>>>> simply suck up 100% CPU and 50% of the total memory and not respond at
>>>>> all. So far the logs only show sporadic replication errors. One of the
>>>>> stack traces (failed to replicate after 10 times) is about 500,000
>>>>> lines long. We are using the _replicator database.
>>>>>
>>>>> Anyone else running into this? Since 1.1 doesn't have the
>>>>> try-until-infinity-and-beyond mode, we have a worker task that watches
>>>>> the _replication_state and kicks the replicator as soon as it errors
>>>>> out.
>>>>> Are there any settings in terms of replicator memory usage, etc.
>>>>> that could help us?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> K.
>>>>> ---
>>>>> http://blog.mudynamics.com
>>>>> http://blitz.io
>>>>> @pcapr
>>>>>
>>>>
>>>
>>
>
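As promised above, here's a rough sketch of the kind of watcher task kowsik describes: poll the _replicator database and re-kick any replication doc whose _replication_state has gone to "error". This is purely my own illustration, not code from anyone's setup; the server URL, poll interval, and the delete-and-recreate approach are assumptions, and it leaves out authentication entirely.

    import json
    import time
    import urllib.request

    # Placeholders; adjust to your own setup.
    COUCH = "http://localhost:5984"
    REPL_DB = "_replicator"
    POLL_SECS = 30


    def call(method, path, body=None):
        # Minimal JSON-over-HTTP helper against the CouchDB REST API.
        data = json.dumps(body).encode() if body is not None else None
        req = urllib.request.Request(COUCH + path, data=data, method=method)
        req.add_header("Content-Type", "application/json")
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())


    def kick_errored_replications():
        rows = call("GET", "/%s/_all_docs?include_docs=true" % REPL_DB)["rows"]
        for row in rows:
            doc = row["doc"]
            if doc["_id"].startswith("_design/"):
                continue
            if doc.get("_replication_state") != "error":
                continue
            # Delete the errored doc, then re-create it without _rev and
            # without the server-managed _replication_* fields, so the
            # replicator picks it up as a fresh job.
            call("DELETE", "/%s/%s?rev=%s" % (REPL_DB, doc["_id"], doc["_rev"]))
            fresh = {k: v for k, v in doc.items()
                     if k != "_rev" and not k.startswith("_replication_")}
            call("PUT", "/%s/%s" % (REPL_DB, doc["_id"]), fresh)
            print("re-kicked", doc["_id"])


    if __name__ == "__main__":
        while True:
            kick_errored_replications()
            time.sleep(POLL_SECS)

Deleting and re-creating the doc (minus the server-managed _replication_* fields) is the simplest way I know of to make the replicator treat it as a brand-new job, but again, this is a sketch, not something I've run against 1.1 in production.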
