On 3/8/11 1:09 AM, Ian Hobson wrote:
On 02/03/2011 19:33, Wayne Conrad wrote:
We run a compaction script that compacts every database every night.
Compaction of our biggest (0.6 TB) database took about 10 hours today.
Granted, the hardware has poor I/O bandwidth, but even if we improve
the hardware, a change in strategy could be good. Along with splitting
that database into more manageable pieces, I hope to write a
compaction script that only compacts a database sometimes (a la
PostgreSQL's autovacuum). To do that, I want some way to estimate
whether there's anything to gain from compacting any given database.

I thought I could use the doc_del_count returned by GET
/<database-name> as a gauge of whether to compact or not, but in my
tests doc_del_count remained the same after compaction. Are there any
statistics, however imperfect, that could help my code guess when
compaction ought to be done?

Just a thought.

After compacting, the database will have a known size on disk. Would it
be possible to track that, and compact again once it has grown by (say) 15%?

It's not perfect - but it might be better than a purely time-based schedule.

Wayne,

You say that your database size is 0.6 TB. What is the change in size during the day? What is the change in size after the compaction? If your database is not increasing appreciably in size during the day and if the compacted database size is not appreciably smaller than the pre-compaction size, I don't think you are gaining much by compacting once per day. In fact, you are taking a significant performance hit if your compaction is running for 10 hours every day.

Perhaps a simple change in compaction schedule from once per day to once per N days will help in the short term. Similarly to Robert's suggestion, I keep track of the initial size of the database as well as the initial sizes of each of the views and compact them whenever they double in size.
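
For what it's worth, something along these lines is easy to script. This is only a rough sketch in Python, assuming your CouchDB version reports disk_size in GET /{db} and accepts a plain POST to /{db}/_compact; the baseline file path and growth factor are made up for illustration:

#!/usr/bin/env python
# Sketch: compact a database only when its on-disk size has roughly doubled
# since the recorded baseline. Adjust field names for your CouchDB version.
import json
import urllib2

COUCH = "http://localhost:5984"
BASELINE_FILE = "/var/tmp/couch_compact_baselines.json"  # hypothetical path
GROWTH_FACTOR = 2.0  # compact once the file has doubled in size

def db_info(name):
    # GET /{db} returns the database info document, including disk_size.
    return json.load(urllib2.urlopen("%s/%s" % (COUCH, name)))

def compact(name):
    # POST /{db}/_compact (empty JSON body) kicks off compaction.
    req = urllib2.Request("%s/%s/_compact" % (COUCH, name), data="",
                          headers={"Content-Type": "application/json"})
    urllib2.urlopen(req)

def maybe_compact(name, baselines):
    size = db_info(name)["disk_size"]
    baseline = baselines.setdefault(name, size)
    if size >= GROWTH_FACTOR * baseline:
        compact(name)
        # _compact runs asynchronously; ideally re-read disk_size once it
        # finishes and store that as the new baseline for this database.

if __name__ == "__main__":
    try:
        baselines = json.load(open(BASELINE_FILE))
    except IOError:
        baselines = {}
    for db in json.load(urllib2.urlopen(COUCH + "/_all_dbs")):
        maybe_compact(db, baselines)
    json.dump(baselines, open(BASELINE_FILE, "w"))

The same idea extends to views, if your version exposes the index size through GET /{db}/_design/{ddoc}/_info, with POST /{db}/_compact/{ddoc} as the trigger.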

Of course, you will need to tailor the trigger point so that you never find yourself without enough free disk space to complete the compaction.
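
A crude way to enforce that is a pre-flight check of free space on the database volume before triggering compaction. A minimal sketch, assuming (conservatively, not a measured figure) that the compacted copy may need up to one full copy of the current file while the old and new files coexist:

import os

def enough_space_to_compact(db_path, db_disk_size):
    # Conservative assumption: require at least the current file size in
    # free space on the volume holding the database file.
    stat = os.statvfs(os.path.dirname(db_path))
    free_bytes = stat.f_bavail * stat.f_frsize
    return free_bytes > db_disk_size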

Until Bob Dionne's patch is released, I think that is the best you can achieve.

Unless compaction performance is significantly improved, you also need to consider that once your database grows large enough it will be compacting constantly: each compaction will take so long to complete that the next one is due immediately after it finishes.

/bc
