Deleting doesn't take much time; it's the compaction process that's slow, right? If you have a different DB for each day, then you could compact previous days' databases without affecting writes to the current day. Also, once you've deleted all the records from a previous day's set of logs, you could just delete that day's database instead of compacting it.
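
A rough sketch of what I mean, in Python against the plain HTTP API. The "txlogs" prefix, the one-DB-per-day naming scheme, and the localhost URL without auth are all made up for illustration. The only CouchDB pieces it assumes are GET /{db} (which reports doc_count), DELETE /{db}, and POST /{db}/_compact.

```python
import json
import urllib.request
from datetime import date

COUCH_URL = "http://localhost:5984"  # assumption: local CouchDB, no auth


def db_name_for(day: date, prefix: str = "txlogs") -> str:
    """Hypothetical naming scheme: one database per calendar day."""
    return f"{prefix}_{day:%Y_%m_%d}"


def plan_for(info: dict, is_current: bool) -> str:
    """Decide what to do with a daily DB, given its GET /{db} info document."""
    if is_current:
        return "leave"    # today's DB is still being written to
    if info["doc_count"] == 0:
        return "delete"   # everything purged: drop the DB, skip compaction
    return "compact"      # older day that still holds records we want to keep


def _request(method: str, path: str) -> dict:
    """Send one request to CouchDB and decode the JSON response."""
    req = urllib.request.Request(
        f"{COUCH_URL}/{path}",
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def maintain(day: date, today: date) -> str:
    """Apply the plan to one day's database and return the action taken."""
    name = db_name_for(day)
    action = plan_for(_request("GET", name), is_current=(day == today))
    if action == "delete":
        _request("DELETE", name)
    elif action == "compact":
        _request("POST", f"{name}/_compact")
    return action
```

You'd call maintain() for each past day from a cron job or similar. Days you want to keep longer simply stay in the "compact" branch, and that compaction never touches the DB currently taking writes.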

On Thu, Jun 14, 2012 at 9:30 AM, Nicolas Peeters <[email protected]> wrote:
> A few more hints, after investigation with the team.
> 1. We can't really have rotating DBs as sometimes we want to keep older
> transaction records in the DB for a longer time.
> 2. We never replicate nor update the statements (so the _rev_limit won't
> really change much (or will it for the compaction??))
>
> On Thu, Jun 14, 2012 at 3:14 PM, Nicolas Peeters <[email protected]> wrote:
>
>> Actually we never modify those records. Just query them up in certain
>> cases.
>>
>> Regarding Robert's suggestion, I was indeed confused because he was
>> suggesting to delete them one by one.
>>
>> I need to read about the "lower_revs_limit". We never replicate this data.
>>
>>
>> On Thu, Jun 14, 2012 at 3:08 PM, Tim Tisdall <[email protected]> wrote:
>>
>>> I think he's suggesting avoiding compaction completely. Just delete
>>> the old DB when you've finished deleting all the records.
>>>
>>> On Thu, Jun 14, 2012 at 9:05 AM, Nicolas Peeters <[email protected]> wrote:
>>> > Interesting suggestion. However, this would perhaps have the same effect
>>> > (deleting/compacting the old DB is what makes the system slower)...?
>>> >
>>> > On Thu, Jun 14, 2012 at 2:54 PM, Robert Newson <[email protected]> wrote:
>>> >
>>> >> Do you eventually delete every document you add?
>>> >>
>>> >> If so, consider using a rolling database scheme instead. At some
>>> >> point, perhaps daily, start a new database and write new transaction
>>> >> logs there. Continue deleting old logs from the previous database(s)
>>> >> until they're empty (doc_count:0) and then delete the database.
>>> >>
>>> >> B.
>>> >>
>>> >> On 14 June 2012 13:44, Nicolas Peeters <[email protected]> wrote:
>>> >> > I'd like some advice from the community regarding compaction.
>>> >> >
>>> >> > *Scenario:*
>>> >> >
>>> >> > We have a large-ish CouchDB database that is being used for
>>> >> > transactional logs (very write heavy). Once in a while, we delete
>>> >> > some of the records in large batches and we have scheduled
>>> >> > compaction (not automatic (yet)) every 12hours.
>>> >> >
>>> >> > From what I can see, the DB is being hammered significantly every 12
>>> >> > hours and the compaction is taking 4 hours (with a size of 50-100GB
>>> >> > of log data).
>>> >> >
>>> >> > *The problem:*
>>> >> >
>>> >> > The problem is that compaction takes a very long time and reduces the
>>> >> > performance of the stack. It seems that it's hard for the compaction
>>> >> > process to "keep up" with the insertions, hence why it takes so long.
>>> >> > Also, what I'm not sure is how "incremental" the compaction is...
>>> >> >
>>> >> > 1. In this case, would it make sense to run the compaction more
>>> >> >    often (every 10 minutes); since we're write-heavy.
>>> >> >    1. Should we just run more often? (so hopefully it doesn't do
>>> >> >       unnecessary work too often). Actually, in our case, we should
>>> >> >       probably never have automatic compaction if there has been no
>>> >> >       "termination".
>>> >> >    2. Or actually only once in a while? (bigger batch, but less
>>> >> >       "useless" overhead)
>>> >> >    3. Or should we just wait that a given size (which is the problem
>>> >> >       really) is hit and use the auto compaction (in CouchDB 1.2.0)
>>> >> >       for this?
>>> >> > 2. In CouchDB 1.2.0 there's a new feature: auto compaction
>>> >> >    <http://wiki.apache.org/couchdb/Compaction#Automatic_Compaction>
>>> >> >    which may be useful for us. There's the "strict_window" feature to
>>> >> >    give a max amount of time to compact and cancel the compaction
>>> >> >    after that (in order not to have it running for 4h+…). I'm
>>> >> >    wondering what the impact of that is on the long run. What if the
>>> >> >    compaction cannot be completed in that window?
>>> >> >
>>> >> > Thanks a lot!
>>> >> >
>>> >> > Nicolas
