Right now we run one compaction every 12h, and the compaction itself takes 4h. There's no quiet period, unfortunately. Your trigger suggestion is a great idea; I need to see if it's possible with 1.2.0.
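From what I can tell from the wiki page linked further down, the compaction daemon can trigger on exactly that ratio ((disk_size - data_size) / disk_size). Something along these lines in local.ini is what I'd test first; the option names come from the wiki, but the thresholds and the from/to + strict_window part are just guesses for our setup (and the window is the bit I'm unsure about, given we have no quiet period):

    [compaction_daemon]
    ; how often (seconds) each database is checked, and the minimum file
    ; size before the daemon considers compacting at all
    check_interval = 300
    min_file_size = 131072

    [compactions]
    ; compact once ~30% of the file is reclaimable, only between 01:00 and
    ; 05:00, and cancel the run if it is still going at the end of the window
    _default = [{db_fragmentation, "30%"}, {view_fragmentation, "30%"}, {from, "01:00"}, {to, "05:00"}, {strict_window, true}]

Two more rough sketches (the ratio check done by hand, and the rolling-database cleanup discussed further down) are below the quoted thread.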
On Thu, Jun 14, 2012 at 5:25 PM, Robert Newson <[email protected]> wrote:
> If there's a quiet period in your day/night cycle (there often isn't), I'd definitely schedule one then. However, it sounds like you can't go that long between them, so I'd try once an hour and see how it goes.
>
> You can now compare the disk_size and data_size of your database to get an accurate measure of how much disk space you'll recover by compacting, so perhaps trigger on that instead. I think the auto-compactor can trigger on that basis, but I haven't used it (on Cloudant we've had this automated for a long time, so it's not something I've ever needed to look for).
>
> B.
>
> On 14 June 2012 16:17, Nicolas Peeters <[email protected]> wrote:
>> Totally agree that this is not the best use case for CouchDB. We're looking at other options for the very near future. However, for now we still have this issue that we need to cope with.
>>
>> So, if you don't mind, back to my original question: if I wanted to use compaction or auto-compaction (as in 1.2.0), what would be the best schedule? Trigger it often, or as rarely as possible (while still making sure I have enough disk)? And what if I use the strict_window?
>>
>> On Thu, Jun 14, 2012 at 4:49 PM, Robert Newson <[email protected]> wrote:
>>> Final note: couchdb is a database. Databases often make poor transaction logs (though they often have their own transaction logs, in a highly optimized format designed for that purpose), especially ones like couchdb which preserve a tombstone of every document ever seen forever. My suggestion above is really a coping mechanism for using the wrong tool.
>>>
>>> B.
>>>
>>> On 14 June 2012 15:47, Robert Newson <[email protected]> wrote:
>>>> The scheme I suggest avoids compaction entirely, which I thought was your main struggle.
>>>>
>>>> You still need to delete the documents in the old database so that you can detect when it's safe to delete it. When it's empty, -X DELETE it. A database delete is a simple 'rm' of the file, taking very little time.
>>>>
>>>> You can ignore the revs_limit suggestions since you don't update the documents. And you should ignore them even if you do; there's almost no legitimate case for altering that setting.
>>>>
>>>> B.
>>>>
>>>> On 14 June 2012 15:21, Tim Tisdall <[email protected]> wrote:
>>>>> The deleting doesn't take too much time; it's the compaction process, right? If you have a different DB for each day, then you could compact previous days without affecting writing to the current day. Also, once you've completely deleted all the records from a previous day's set of logs, you could then proceed to just delete that day's database instead of compacting it.
>>>>>
>>>>> On Thu, Jun 14, 2012 at 9:30 AM, Nicolas Peeters <[email protected]> wrote:
>>>>>> A few more hints, after investigation with the team.
>>>>>> 1. We can't really have rotating DBs, as sometimes we want to keep older transaction records in the DB for a longer time.
>>>>>> 2. We never replicate nor update the statements (so the _revs_limit won't really change much (or will it for the compaction??)).
>>>>>>
>>>>>> On Thu, Jun 14, 2012 at 3:14 PM, Nicolas Peeters <[email protected]> wrote:
>>>>>>> Actually, we never modify those records; we just query them in certain cases.
>>>>>>>
>>>>>>> Regarding Robert's suggestion, I was indeed confused because he was suggesting to delete them one by one.
>>>>>>>
>>>>>>> I need to read about the "lower_revs_limit". We never replicate this data.
>>>>>>>
>>>>>>> On Thu, Jun 14, 2012 at 3:08 PM, Tim Tisdall <[email protected]> wrote:
>>>>>>>> I think he's suggesting avoiding compaction completely. Just delete the old DB when you've finished deleting all the records.
>>>>>>>>
>>>>>>>> On Thu, Jun 14, 2012 at 9:05 AM, Nicolas Peeters <[email protected]> wrote:
>>>>>>>>> Interesting suggestion. However, this would perhaps have the same effect (deleting/compacting the old DB is what makes the system slower)...?
>>>>>>>>>
>>>>>>>>> On Thu, Jun 14, 2012 at 2:54 PM, Robert Newson <[email protected]> wrote:
>>>>>>>>>> Do you eventually delete every document you add?
>>>>>>>>>>
>>>>>>>>>> If so, consider using a rolling database scheme instead. At some point, perhaps daily, start a new database and write new transaction logs there. Continue deleting old logs from the previous database(s) until they're empty (doc_count:0) and then delete the database.
>>>>>>>>>>
>>>>>>>>>> B.
>>>>>>>>>>
>>>>>>>>>> On 14 June 2012 13:44, Nicolas Peeters <[email protected]> wrote:
>>>>>>>>>>> I'd like some advice from the community regarding compaction.
>>>>>>>>>>>
>>>>>>>>>>> *Scenario:*
>>>>>>>>>>>
>>>>>>>>>>> We have a large-ish CouchDB database that is being used for transactional logs (very write heavy). Once in a while, we delete some of the records in large batches, and we have scheduled compaction (not automatic (yet)) every 12 hours.
>>>>>>>>>>>
>>>>>>>>>>> From what I can see, the DB is being hammered significantly every 12 hours and the compaction is taking 4 hours (with a size of 50-100GB of log data).
>>>>>>>>>>>
>>>>>>>>>>> *The problem:*
>>>>>>>>>>>
>>>>>>>>>>> The problem is that compaction takes a very long time and reduces the performance of the stack. It seems that it's hard for the compaction process to "keep up" with the insertions, which is why it takes so long. Also, what I'm not sure about is how "incremental" the compaction is...
>>>>>>>>>>>
>>>>>>>>>>> 1. In this case, would it make sense to run the compaction more often (every 10 minutes), since we're write-heavy?
>>>>>>>>>>>    1. Should we just run it more often (so that each run hopefully doesn't have too much unnecessary work to do)? Actually, in our case, we should probably never have automatic compaction run if there has been no "termination".
>>>>>>>>>>>    2. Or actually only once in a while (bigger batches, but less "useless" overhead)?
>>>>>>>>>>>    3. Or should we just wait until a given size (which is the real problem) is hit and use the auto compaction (in CouchDB 1.2.0) for this?
>>>>>>>>>>> 2. In CouchDB 1.2.0 there's a new feature: auto compaction <http://wiki.apache.org/couchdb/Compaction#Automatic_Compaction>, which may be useful for us. There's the "strict_window" option to give a max amount of time to compact and cancel the compaction after that (in order not to have it running for 4h+…). I'm wondering what the impact of that is in the long run. What if the compaction cannot be completed in that window?
>>>>>>>>>>>
>>>>>>>>>>> Thanks a lot!
>>>>>>>>>>>
>>>>>>>>>>> Nicolas
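For the archives, here's the by-hand version of the ratio trigger Robert describes above, roughly what I'd run from cron until the daemon config is sorted out. It assumes CouchDB 1.2.0 on localhost, a database called "transaction_logs" (placeholder name), and that no admin credentials are needed on this node; untested:

    #!/usr/bin/env python
    # Compact only once a threshold of reclaimable space is reached, using
    # the disk_size/data_size fields returned by GET /{db} in 1.2.0.
    import json
    import urllib2

    COUCH = "http://localhost:5984"
    DB = "transaction_logs"   # placeholder name
    THRESHOLD = 0.30          # compact once ~30% of the file is reclaimable

    info = json.load(urllib2.urlopen("%s/%s" % (COUCH, DB)))
    disk_size = info["disk_size"]
    data_size = info["data_size"]
    fragmentation = (disk_size - data_size) / float(disk_size)
    print "disk_size=%d data_size=%d reclaimable=%.0f%%" % (
        disk_size, data_size, fragmentation * 100)

    if fragmentation >= THRESHOLD:
        # POST /{db}/_compact starts compaction in the background and
        # returns immediately.
        req = urllib2.Request("%s/%s/_compact" % (COUCH, DB), data="",
                              headers={"Content-Type": "application/json"})
        print json.load(urllib2.urlopen(req))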

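And since the rolling-database scheme came up several times above (it doesn't fit our retention needs, but it's trivial to automate for anyone it does fit), here is a sketch of the cleanup half, assuming day-stamped names like txlog_20120614, which is just a naming convention invented for the example:

    #!/usr/bin/env python
    # Cleanup half of the rolling-database scheme: once an old day's
    # database has had all its logs deleted (doc_count == 0), delete the
    # whole database instead of ever compacting it.
    import datetime
    import json
    import urllib2

    COUCH = "http://localhost:5984"
    PREFIX = "txlog_"   # assumed naming convention, e.g. txlog_20120614
    today = PREFIX + datetime.date.today().strftime("%Y%m%d")

    for db in json.load(urllib2.urlopen("%s/_all_dbs" % COUCH)):
        if not db.startswith(PREFIX) or db == today:
            continue
        info = json.load(urllib2.urlopen("%s/%s" % (COUCH, db)))
        if info["doc_count"] == 0:
            # same as `curl -X DELETE`; this just removes the file, so it
            # costs almost nothing compared to a compaction pass
            req = urllib2.Request("%s/%s" % (COUCH, db))
            req.get_method = lambda: "DELETE"
            print db, json.load(urllib2.urlopen(req))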