After conferring with our sysadmins, I found out that there was indeed a backup
task running nightly at approximately the time of the crashes. They have turned
it off now. I'll let you know after the weekend how this affects the
replication setup. Keeping my fingers crossed until then. Thanks!
/ Peter
On 5 Mar 2010, at 18:24, Adam Kocoloski wrote:
> That would be my guess, too.
>
> On Mar 5, 2010, at 12:22 PM, Randall Leeds wrote:
>
>> Could there be a cron job that's causing a lot of disk contention at the
>> same time every night?
>>
>> On Mar 5, 2010 7:24 AM, "Peter Bengtson" <[email protected]> wrote:
>>
>> Adam, that's interesting. These crashes occur every night with alarming
>> regularity, but the staging system on which this runs is under no load to
>> speak of. And there are only two DBs in the system at this point, both of
>> which were opened at least 12 hours earlier. I'll ask our sysadmins to
>> double-check the load, but I'd like to know one thing:
>>
>> Why do these crashes occur system-wide? On three nodes and six servers? And
>> at the same time? Somehow, we didn't expect CouchDB to go
>> quite so far as to replicate the crashes... ;-)
>>
>> / Peter
>>
>>
>> On 5 Mar 2010, at 15:57, Adam Kocoloski wrote:
>>
>>
>>> From that log we can tell that CouchDB crashed completely on node0-couch2
>> (because of the "Apache...
>