On Dec 22, 2008, at 7:32 PM, Adam Kocoloski wrote:

Hi, I ran into an odd failure mode last week and I thought I'd ask around here to see if anyone has seen something similar. I have a CouchDB server (recent trunk) on a large EC2 instance with a DB that sees a constant update rate of ~50 Hz. I triggered a compaction when the DB had reached ~27M update sequences (80 GB in total). The first pass finished after 7h40m, but of course another 1.4M updates had been written to the original DB. So far, so good.

Unfortunately, the subsequent iterations of copy_compact() ran much slower than that original pass. After a few passes, the compaction rate had dropped to roughly the incoming write rate, so the compactor effectively entered a runaway mode. The stats looked like this:

Pass 1:  7h40m    27870955 docs   1010 Hz
Pass 2:  3h44m     1473387 docs    110 Hz
Pass 3:  2h58m      617008 docs     58 Hz
Pass 4:  2h44m      450607 docs     46 Hz
.....
Pass 23: 4h08m      719541 docs     48 Hz
Pass 24: 1h04m      436105 docs    113 Hz
Pass 25: 21 seconds -- done.
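
A rough way to see why the retries stop converging, as a sketch under a deliberately simple model: each retry pass must copy every update that arrived during the previous pass, so the backlog shrinks only while the compaction rate exceeds the write rate. The function names and the `done_below` threshold are hypothetical and are not part of CouchDB; the rates are the ones from the table above.

```python
def pass_rate_hz(docs, hours, minutes):
    """Average compaction rate for one pass, in docs per second."""
    return docs / (hours * 3600 + minutes * 60)


def next_backlog(backlog, write_hz, compact_hz):
    """Updates that arrive while the compactor works through `backlog`."""
    pass_seconds = backlog / compact_hz
    return pass_seconds * write_hz


def passes_to_drain(backlog, write_hz, compact_hz,
                    done_below=1000, max_passes=100):
    """Count retry passes until the backlog falls below `done_below`.

    Assumes constant rates (a simplification). Returns None if the
    backlog has not drained within `max_passes`.
    """
    for n in range(1, max_passes + 1):
        backlog = next_backlog(backlog, write_hz, compact_hz)
        if backlog < done_below:
            return n
    return None


if __name__ == "__main__":
    # Pass 2 from the table: 1473387 docs in 3h44m.
    print(round(pass_rate_hz(1473387, 3, 44)))
    # Retry rate well above the ~50 Hz write rate: the backlog drains.
    print(passes_to_drain(1473387, write_hz=50, compact_hz=110))
    # Retry rate below the write rate: the backlog never drains.
    print(passes_to_drain(1473387, write_hz=50, compact_hz=48))
```

In this model the backlog is multiplied by write_hz / compact_hz on every pass, so a retry rate of 46-58 Hz against ~50 Hz of writes leaves the compactor essentially treading water, which matches the behaviour in passes 3 through 23.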


Some slowdown is expected during the retry passes: the compactor has to update previous values rather than just copy docs, which means 2 extra btree operations. Even so, I must say I'm surprised at the magnitude of the slowdown. Maybe there is a bug, or a simple optimization that can be performed.

-Damien
