I can't remember whether it was Selena or Josh who covered Postgres' vacuum system in some depth in a talk at CouchCamp. My knowledge is far from deep, but from what I gathered Postgres has a much more complicated vacuum system. It tries to reclaim space within the DB file and has to deal with long-running transactional UPDATE commands and whatnot.
Contrast this with Couch, which has no bulk transactional semantics and has a dead-simple compaction system that writes out the entire database to a fresh file. While there will be some racing to flush new writes if the database file is concurrently being updated, compaction always seems to finish, even though it may take a few (progressively shorter) passes.

As for when to compact: if you know roughly your insert/update ratio, you can do some calculations based on the update sequence and doc count to decide when to run compaction to reclaim space (a rough sketch of that check follows the quoted message below). In my personal experience it has often been easiest to just run compaction every night, and I've heard of production environments that run continuous compaction.

CouchDB compaction could probably be made faster even without switching from the rewrite-db-and-swap method, but in its current form I've found it's best to provision production servers under the assumption that compaction is always running, since it may take quite a while to finish and relying on compacting during "off-peak" times may not be possible.

Randall

On Wed, Dec 1, 2010 at 10:56, Brad King <[email protected]> wrote:
> Hi, I've been away from couchdb for probably 2 years. The project has
> made great strides, it appears. My question is around MVCC and
> compaction. We currently run a 4-node PostgreSQL farm. We often
> experience problems trying to garbage collect dead tuples (vacuuming)
> due to MVCC. Keeping up with this is a big problem in Postgres. I know
> this is an entirely different product, but what is the recommended setup
> for a similar deployment on CouchDB? With MVCC, it seems for heavy
> updates or deletes we could have the same problems. The docs indicate
> it's possible to get behind on this and consume all disk. Is there any
> guidance on the recommended hardware configuration, max database size,
> compaction schedules, etc. for a full production deployment that is
> update/delete heavy? Thanks.
>
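For the curious, here is roughly what I mean by the update-seq vs. doc-count check, as a Python sketch. It only uses the standard database info document (GET /<db>, which includes doc_count and update_seq) and the compaction trigger (POST /<db>/_compact); the server URL, database name, and churn threshold are placeholders you would tune for your own workload, not anything CouchDB prescribes.

    # Rough sketch, not production code. COUCH, DB and CHURN_THRESHOLD
    # are assumptions to tune for your own insert/update ratio.
    import requests

    COUCH = "http://localhost:5984"   # assumed server URL
    DB = "mydb"                       # assumed database name
    CHURN_THRESHOLD = 3.0             # assumed: compact once total writes
                                      # outnumber live docs roughly 3:1

    def maybe_compact():
        # GET /<db> returns the database info doc with doc_count
        # and update_seq (add auth if your server requires it).
        info = requests.get("%s/%s" % (COUCH, DB)).json()
        doc_count = max(info["doc_count"], 1)
        # update_seq is an integer in CouchDB 1.x; newer releases use
        # an opaque "N-..." string, so take the leading number.
        seq = int(str(info["update_seq"]).split("-")[0])
        if float(seq) / doc_count >= CHURN_THRESHOLD:
            # POST /<db>/_compact kicks off compaction in the
            # background and returns immediately.
            requests.post("%s/%s/_compact" % (COUCH, DB),
                          headers={"Content-Type": "application/json"})

    if __name__ == "__main__":
        maybe_compact()

Running something like this from cron is basically the "compact every night" approach; the ratio check just skips the work on nights when there has been little churn.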
