Final note: CouchDB is a database. Databases often make poor transaction logs (though they often have their own transaction logs, in a highly optimized format designed for that purpose), especially ones like CouchDB, which preserves a tombstone of every document it has ever seen, forever. My suggestion above is really a coping mechanism for using the wrong tool.
B.

On 14 June 2012 15:47, Robert Newson <[email protected]> wrote:
> The scheme I suggest avoids compaction entirely, which I thought was
> your main struggle.
>
> You still need to delete the documents in the old database so that you
> can detect when it's safe to delete it. When it's empty, -X DELETE it.
> A database delete is a simple 'rm' of the file, taking very little
> time.
>
> You can ignore the revs_limit suggestions since you don't update the
> documents. And you should ignore them even if you do; there's almost no
> legitimate case for altering that setting.
>
> B.
>
> On 14 June 2012 15:21, Tim Tisdall <[email protected]> wrote:
>> The deleting doesn't take too much time; it's the compaction process,
>> right? If you have a different DB for each day, then you could
>> compact previous days without affecting writes to the current day.
>> Also, once you've completely deleted all the records from a previous
>> day's set of logs, you could then just delete that day's database
>> instead of compacting it.
>>
>> On Thu, Jun 14, 2012 at 9:30 AM, Nicolas Peeters <[email protected]> wrote:
>>> A few more hints, after investigation with the team.
>>> 1. We can't really have rotating DBs, as sometimes we want to keep
>>> older transaction records in the DB for a longer time.
>>> 2. We never replicate nor update the statements (so the _revs_limit
>>> won't really change much (or will it for the compaction?)).
>>>
>>> On Thu, Jun 14, 2012 at 3:14 PM, Nicolas Peeters <[email protected]> wrote:
>>>> Actually, we never modify those records; we just query them in
>>>> certain cases.
>>>>
>>>> Regarding Robert's suggestion, I was indeed confused because he was
>>>> suggesting deleting them one by one.
>>>>
>>>> I need to read about "_revs_limit". We never replicate this data.
>>>>
>>>> On Thu, Jun 14, 2012 at 3:08 PM, Tim Tisdall <[email protected]> wrote:
>>>>> I think he's suggesting avoiding compaction completely. Just delete
>>>>> the old DB when you've finished deleting all the records.
>>>>>
>>>>> On Thu, Jun 14, 2012 at 9:05 AM, Nicolas Peeters <[email protected]> wrote:
>>>>>> Interesting suggestion. However, this would perhaps have the same
>>>>>> effect (deleting/compacting the old DB is what makes the system
>>>>>> slower)...?
>>>>>>
>>>>>> On Thu, Jun 14, 2012 at 2:54 PM, Robert Newson <[email protected]> wrote:
>>>>>>> Do you eventually delete every document you add?
>>>>>>>
>>>>>>> If so, consider using a rolling database scheme instead. At some
>>>>>>> point, perhaps daily, start a new database and write new
>>>>>>> transaction logs there. Continue deleting old logs from the
>>>>>>> previous database(s) until they're empty (doc_count: 0) and then
>>>>>>> delete the database.
>>>>>>>
>>>>>>> B.
>>>>>>>
>>>>>>> On 14 June 2012 13:44, Nicolas Peeters <[email protected]> wrote:
>>>>>>>> I'd like some advice from the community regarding compaction.
>>>>>>>>
>>>>>>>> *Scenario:*
>>>>>>>>
>>>>>>>> We have a large-ish CouchDB database that is being used for
>>>>>>>> transactional logs (very write-heavy). Once in a while, we delete
>>>>>>>> some of the records in large batches, and we have scheduled
>>>>>>>> compaction (not automatic (yet)) every 12 hours.
>>>>>>>>
>>>>>>>> From what I can see, the DB is being hammered significantly every
>>>>>>>> 12 hours, and the compaction is taking 4 hours (with a size of
>>>>>>>> 50-100GB of log data).
>>>>>>>>
>>>>>>>> *The problem:*
>>>>>>>>
>>>>>>>> The problem is that compaction takes a very long time and reduces
>>>>>>>> the performance of the stack. It seems to be hard for the
>>>>>>>> compaction process to "keep up" with the insertions, which is why
>>>>>>>> it takes so long. Also, I'm not sure how "incremental" the
>>>>>>>> compaction is...
>>>>>>>>
>>>>>>>> 1. In this case, would it make sense to run the compaction more
>>>>>>>> often (every 10 minutes), since we're write-heavy?
>>>>>>>>    1. Should we just run it more often? (so hopefully it doesn't
>>>>>>>> do unnecessary work too often). Actually, in our case, we should
>>>>>>>> probably never have automatic compaction if there has been no
>>>>>>>> "termination".
>>>>>>>>    2. Or actually only once in a while? (bigger batches, but less
>>>>>>>> "useless" overhead)
>>>>>>>>    3. Or should we just wait until a given size (which is the
>>>>>>>> real problem) is hit and use the auto compaction (in CouchDB
>>>>>>>> 1.2.0) for this?
>>>>>>>> 2. In CouchDB 1.2.0 there's a new feature, auto compaction
>>>>>>>> <http://wiki.apache.org/couchdb/Compaction#Automatic_Compaction>,
>>>>>>>> which may be useful for us. There's the "strict_window" option to
>>>>>>>> set a maximum amount of time to compact and cancel the compaction
>>>>>>>> after that (in order not to have it running for 4h+...). I'm
>>>>>>>> wondering what the impact of that is in the long run. What if the
>>>>>>>> compaction cannot be completed in that window?
>>>>>>>>
>>>>>>>> Thanks a lot!
>>>>>>>>
>>>>>>>> Nicolas
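For reference, a minimal sketch of the rolling-database scheme Robert describes: poll the database info object for doc_count and issue the DELETE once it reaches zero. This assumes an unauthenticated CouchDB node on localhost:5984, and the per-day database names are purely illustrative.

    import json
    from urllib.request import Request, urlopen

    COUCH = "http://localhost:5984"  # assumed local, unauthenticated node

    def doc_count(db):
        """Read doc_count from the database info object (GET /{db})."""
        with urlopen(f"{COUCH}/{db}") as resp:
            return json.load(resp)["doc_count"]

    def drop_if_empty(db):
        """Delete the database once every log in it has been deleted.

        DELETE /{db} is cheap: CouchDB just removes the underlying file,
        so the old database never needs to be compacted at all.
        """
        if doc_count(db) == 0:
            urlopen(Request(f"{COUCH}/{db}", method="DELETE"))
            return True
        return False

    # Writers use today's database; once all documents in yesterday's
    # database have been deleted, drop the whole database.
    if __name__ == "__main__":
        drop_if_empty("txlog_2012_06_13")  # hypothetical per-day DB name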

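Similarly, a sketch of enabling the CouchDB 1.2.0 automatic compaction discussed above, set via the runtime _config API rather than by editing local.ini. The rule syntax follows the wiki page linked in the thread; the fragmentation thresholds and the compaction window here are placeholders, not recommendations.

    import json
    from urllib.request import Request, urlopen

    COUCH = "http://localhost:5984"  # assumed local, unauthenticated 1.2.x node

    # Compaction-daemon rule, per the Automatic_Compaction wiki page.
    # 70%/60% fragmentation and the 01:00-05:00 window are examples only.
    # With strict_window true, a run that overflows the window is
    # cancelled rather than left running for hours.
    rule = ('[{db_fragmentation, "70%"}, {view_fragmentation, "60%"}, '
            '{from, "01:00"}, {to, "05:00"}, {strict_window, true}]')

    # Config values are JSON strings, hence json.dumps around the rule.
    req = Request(
        f"{COUCH}/_config/compactions/_default",
        data=json.dumps(rule).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    urlopen(req)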