Re: runaway compaction

Damien Katz Mon, 22 Dec 2008 17:29:36 -0800

It's a known issue that compaction may not be able to complete under heavy write load. At some point we should probably implement a mechanism to throttle writes if the compaction isn't making enough progress during updates.
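A minimal sketch of what such a throttle might look like. It assumes the compactor can report how many updates it copied in each pass, which is a hypothetical hook, not an existing CouchDB API. The progress heuristic is also an assumption: compaction is treated as stalled when a catch-up pass copies more than half as many updates as the pass before it, and writers are then delayed. `should_throttle`, `throttled_write`, `min_shrink` and `delay_s` are all illustrative names.

```python
import time
from typing import Callable, Sequence


def should_throttle(pass_sizes: Sequence[int], min_shrink: float = 0.5) -> bool:
    """Return True if the latest compaction pass did not shrink the backlog enough.

    pass_sizes: updates copied in each completed pass (hypothetical metric).
    min_shrink: hypothetical threshold; a converging compaction should copy at
    most this fraction of the previous pass's updates.
    """
    if len(pass_sizes) < 2:
        return False  # not enough history to judge progress yet
    return pass_sizes[-1] > min_shrink * pass_sizes[-2]


def throttled_write(write_fn: Callable[[object], object], doc: object,
                    throttle: bool, delay_s: float = 0.02) -> object:
    """Delay a single write when compaction is falling behind (illustrative only)."""
    if throttle:
        time.sleep(delay_s)  # back-pressure gives the compactor room to catch up
    return write_fn(doc)


# Pass sizes for passes 1-4 from the quoted report below.
passes = [27870955, 1473387, 617008, 450607]
for n in range(2, len(passes) + 1):
    print(f"after pass {n}: throttle={should_throttle(passes[:n])}")
```

With these numbers the heuristic stays quiet through pass 3 and fires after pass 4, the point where the copy rate fell to about the write rate.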


-Damien



On Dec 22, 2008, at 7:32 PM, Adam Kocoloski wrote:

Hi, I ran into an odd failure mode last week and I thought I'd ask around here to see if anyone has seen something similar. I have a CouchDB server (recent trunk) on a large EC2 instance with a DB that sees a constant update rate of ~50 Hz. I triggered a compaction when the DB had reached ~27M update sequences (80 GB in total). The first pass finished after 7h40m, but of course another 1.4M updates had been written to the original DB in the meantime. So far, so good.
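The catch-up behavior can be modeled as a simple recurrence, assuming (unrealistically, as the stats below show) constant copy and write rates: each pass copies the current backlog, and the updates that arrive while it runs become the next pass's backlog. The backlog then shrinks by a factor of write_hz / copy_hz per pass, so compaction converges only when the compactor outpaces the writers. `simulate_passes` and its parameters are illustrative, not part of CouchDB.

```python
def simulate_passes(initial_docs: float, copy_hz: float, write_hz: float,
                    max_passes: int = 50, done_below: float = 1.0):
    """Model compaction catch-up passes under constant copy and write rates.

    Returns a list of (docs_copied, seconds) tuples, one per pass. Stops when
    fewer than `done_below` updates remain or after `max_passes` passes.
    """
    passes = []
    backlog = float(initial_docs)
    for _ in range(max_passes):
        seconds = backlog / copy_hz            # time to copy the current backlog
        passes.append((backlog, seconds))
        backlog = write_hz * seconds           # updates written while this pass ran
        if backlog < done_below:
            break
    return passes


# Rates roughly matching the first pass: ~27.9M docs at ~1010 Hz, ~50 Hz writes.
for docs, secs in simulate_passes(27870955, copy_hz=1010, write_hz=50):
    print(f"{docs:12.0f} docs  {secs:10.1f} s")
```

With these rates the model predicts about 1.38M updates for the second pass, close to the 1.4M observed, and completion within six passes. If copy_hz drops to write_hz or below, the backlog never shrinks and the loop only stops at max_passes.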
Unfortunately, the subsequent iterations of copy_compact() ran much slower than that original pass. After a few passes, the compactor rate had dropped to roughly the new write rate, so it effectively entered a runaway mode. The stats looked like this:
Pass 1:  7h40m    27870955 docs   1010 Hz
Pass 2:  3h44m     1473387 docs    110 Hz
Pass 3:  2h58m      617008 docs     58 Hz
Pass 4:  2h44m      450607 docs     46 Hz
.....
Pass 23: 4h08m      719541 docs     48 Hz
Pass 24: 1h04m      436105 docs    113 Hz
Pass 25: 21 seconds -- done.
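The Hz column is docs copied divided by pass duration, so it can be checked directly against the ~50 Hz write load: a pass whose average copy rate does not exceed the write rate cannot shrink the backlog. A short check over the listed passes follows; the helper names are illustrative, and pass 24 is left out because the write load was stopped partway through it.

```python
WRITE_HZ = 50  # approximate write rate reported in the post

# (pass, hours, minutes, docs copied) for the passes listed above
PASSES = [
    (1, 7, 40, 27870955),
    (2, 3, 44, 1473387),
    (3, 2, 58, 617008),
    (4, 2, 44, 450607),
    (23, 4, 8, 719541),
]


def rate_hz(hours: int, minutes: int, docs: int) -> float:
    """Average copy rate in docs per second over one pass."""
    return docs / (hours * 3600 + minutes * 60)


def outpaces_writes(passes, write_hz=WRITE_HZ):
    """Return (pass, rate, faster_than_writes) for each pass."""
    result = []
    for n, h, m, docs in passes:
        rate = rate_hz(h, m, docs)
        result.append((n, rate, rate > write_hz))
    return result


for n, rate, ok in outpaces_writes(PASSES):
    print(f"pass {n:2d}: {rate:7.1f} Hz  {'shrinking' if ok else 'not shrinking'}")
```

Passes 4 and 23 average just under 50 Hz, consistent with pass 23 copying more updates than pass 4 did.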
We stopped the new write load sometime after the end of Pass 23, and the compaction finished soon after that.
We turned the write load back on and have been compacting the DB once a day ever since. We haven't seen this runaway mode again. I've reviewed the compaction code a couple of times, but I can't figure out what would cause such a dramatic slowdown. Our system monitoring didn't turn up any red flags either; in particular, all the latency, throughput, and IOPS stats for the disk hosting the database were pretty much constant throughout the lifetime of the compaction.
Best, Adam
