Right now we run one compaction every 12h, and the compaction itself takes 4h. There's no quiet period, unfortunately. Your trigger suggestion is a great idea; I need to see if it's possible with 1.2.0.
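From what I can tell from the wiki page linked further down, the compaction daemon can trigger on exactly that ratio ((disk_size - data_size) / disk_size). Something along these lines in local.ini is what I'd test first; the option names come from the wiki, but the thresholds and the from/to + strict_window part are just guesses for our setup (and the window is the bit I'm unsure about, given we have no quiet period):

    [compaction_daemon]
    ; how often (seconds) each database is checked, and the minimum file
    ; size before the daemon considers compacting at all
    check_interval = 300
    min_file_size = 131072

    [compactions]
    ; compact once ~30% of the file is reclaimable, only between 01:00 and
    ; 05:00, and cancel the run if it is still going at the end of the window
    _default = [{db_fragmentation, "30%"}, {view_fragmentation, "30%"}, {from, "01:00"}, {to, "05:00"}, {strict_window, true}]

Two more rough sketches (the ratio check done by hand, and the rolling-database cleanup discussed further down) are below the quoted thread.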
On Thu, Jun 14, 2012 at 5:25 PM, Robert Newson <[email protected]> wrote:
> If there's a quiet period in your day/night cycle (there often isn't), I'd definitely schedule one then. However, it sounds like you can't go that long between them, so I'd try once an hour and see how it goes.
>
> You can now compare the disk_size and data_size of your database to get an accurate measure of how much disk space you'll recover by compacting, so perhaps trigger on that instead. I think the auto-compactor can trigger on that basis, but I haven't used it (on Cloudant we've had this automated for a long time, so it's not something I've ever needed to look for).
>
> B.
>
> On 14 June 2012 16:17, Nicolas Peeters <[email protected]> wrote:
>> Totally agree that this is not the best use case for CouchDB. We're looking at other options for the very near future. However, for now we still have this issue that we need to cope with.
>>
>> So, if you don't mind, back to my original question: if I wanted to use compaction or auto-compaction (as in 1.2.0), what would be the best schedule? Trigger it often, or as rarely as possible (while still making sure I have enough disk)? And what if I use the strict_window?
>>
>> On Thu, Jun 14, 2012 at 4:49 PM, Robert Newson <[email protected]> wrote:
>>> Final note: couchdb is a database. Databases often make poor transaction logs (though they often have their own transaction logs, in a highly optimized format designed for that purpose), especially ones like couchdb which preserve a tombstone of every document ever seen forever. My suggestion above is really a coping mechanism for using the wrong tool.
>>>
>>> B.
>>>
>>> On 14 June 2012 15:47, Robert Newson <[email protected]> wrote:
>>>> The scheme I suggest avoids compaction entirely, which I thought was your main struggle.
>>>>
>>>> You still need to delete the documents in the old database so that you can detect when it's safe to delete it. When it's empty, -X DELETE it. A database delete is a simple 'rm' of the file, taking very little time.
>>>>
>>>> You can ignore the revs_limit suggestions since you don't update the documents. And you should ignore them even if you do; there's almost no legitimate case for altering that setting.
>>>>
>>>> B.
>>>>
>>>> On 14 June 2012 15:21, Tim Tisdall <[email protected]> wrote:
>>>>> The deleting doesn't take too much time; it's the compaction process, right? If you have a different DB for each day, then you could compact previous days without affecting writing to the current day. Also, once you've completely deleted all the records from a previous day's set of logs, you could then proceed to just delete that day's database instead of compacting it.
>>>>>
>>>>> On Thu, Jun 14, 2012 at 9:30 AM, Nicolas Peeters <[email protected]> wrote:
>>>>>> A few more hints, after investigation with the team.
>>>>>> 1. We can't really have rotating DBs, as sometimes we want to keep older transaction records in the DB for a longer time.
>>>>>> 2. We never replicate nor update the statements (so the _revs_limit won't really change much (or will it for the compaction??)).
>>>>>>
>>>>>> On Thu, Jun 14, 2012 at 3:14 PM, Nicolas Peeters <[email protected]> wrote:
>>>>>>> Actually, we never modify those records; we just query them in certain cases.
>>>>>>>
>>>>>>> Regarding Robert's suggestion, I was indeed confused because he was suggesting to delete them one by one.
>>>>>>>
>>>>>>> I need to read about the "lower_revs_limit". We never replicate this data.
>>>>>>>
>>>>>>> On Thu, Jun 14, 2012 at 3:08 PM, Tim Tisdall <[email protected]> wrote:
>>>>>>>> I think he's suggesting avoiding compaction completely. Just delete the old DB when you've finished deleting all the records.
>>>>>>>>
>>>>>>>> On Thu, Jun 14, 2012 at 9:05 AM, Nicolas Peeters <[email protected]> wrote:
>>>>>>>>> Interesting suggestion. However, this would perhaps have the same effect (deleting/compacting the old DB is what makes the system slower)...?
>>>>>>>>>
>>>>>>>>> On Thu, Jun 14, 2012 at 2:54 PM, Robert Newson <[email protected]> wrote:
>>>>>>>>>> Do you eventually delete every document you add?
>>>>>>>>>>
>>>>>>>>>> If so, consider using a rolling database scheme instead. At some point, perhaps daily, start a new database and write new transaction logs there. Continue deleting old logs from the previous database(s) until they're empty (doc_count:0) and then delete the database.
>>>>>>>>>>
>>>>>>>>>> B.
>>>>>>>>>>
>>>>>>>>>> On 14 June 2012 13:44, Nicolas Peeters <[email protected]> wrote:
>>>>>>>>>>> I'd like some advice from the community regarding compaction.
>>>>>>>>>>>
>>>>>>>>>>> *Scenario:*
>>>>>>>>>>>
>>>>>>>>>>> We have a large-ish CouchDB database that is being used for transactional logs (very write heavy). Once in a while, we delete some of the records in large batches, and we have scheduled compaction (not automatic (yet)) every 12 hours.
>>>>>>>>>>>
>>>>>>>>>>> From what I can see, the DB is being hammered significantly every 12 hours and the compaction is taking 4 hours (with a size of 50-100GB of log data).
>>>>>>>>>>>
>>>>>>>>>>> *The problem:*
>>>>>>>>>>>
>>>>>>>>>>> The problem is that compaction takes a very long time and reduces the performance of the stack. It seems that it's hard for the compaction process to "keep up" with the insertions, which is why it takes so long. Also, what I'm not sure about is how "incremental" the compaction is...
>>>>>>>>>>>
>>>>>>>>>>> 1. In this case, would it make sense to run the compaction more often (every 10 minutes), since we're write-heavy?
>>>>>>>>>>>    1. Should we just run it more often (so that each run hopefully doesn't have too much unnecessary work to do)? Actually, in our case, we should probably never have automatic compaction run if there has been no "termination".
>>>>>>>>>>>    2. Or actually only once in a while (bigger batches, but less "useless" overhead)?
>>>>>>>>>>>    3. Or should we just wait until a given size (which is the real problem) is hit and use the auto compaction (in CouchDB 1.2.0) for this?
>>>>>>>>>>> 2. In CouchDB 1.2.0 there's a new feature: auto compaction <http://wiki.apache.org/couchdb/Compaction#Automatic_Compaction>, which may be useful for us. There's the "strict_window" option to give a max amount of time to compact and cancel the compaction after that (in order not to have it running for 4h+…). I'm wondering what the impact of that is in the long run. What if the compaction cannot be completed in that window?
>>>>>>>>>>>
>>>>>>>>>>> Thanks a lot!
>>>>>>>>>>>
>>>>>>>>>>> Nicolas
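For the archives, here's the by-hand version of the ratio trigger Robert describes above, roughly what I'd run from cron until the daemon config is sorted out. It assumes CouchDB 1.2.0 on localhost, a database called "transaction_logs" (placeholder name), and that no admin credentials are needed on this node; untested:

    #!/usr/bin/env python
    # Compact only once a threshold of reclaimable space is reached, using
    # the disk_size/data_size fields returned by GET /{db} in 1.2.0.
    import json
    import urllib2

    COUCH = "http://localhost:5984"
    DB = "transaction_logs"   # placeholder name
    THRESHOLD = 0.30          # compact once ~30% of the file is reclaimable

    info = json.load(urllib2.urlopen("%s/%s" % (COUCH, DB)))
    disk_size = info["disk_size"]
    data_size = info["data_size"]
    fragmentation = (disk_size - data_size) / float(disk_size)
    print "disk_size=%d data_size=%d reclaimable=%.0f%%" % (
        disk_size, data_size, fragmentation * 100)

    if fragmentation >= THRESHOLD:
        # POST /{db}/_compact starts compaction in the background and
        # returns immediately.
        req = urllib2.Request("%s/%s/_compact" % (COUCH, DB), data="",
                              headers={"Content-Type": "application/json"})
        print json.load(urllib2.urlopen(req))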

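And since the rolling-database scheme came up several times above (it doesn't fit our retention needs, but it's trivial to automate for anyone it does fit), here is a sketch of the cleanup half, assuming day-stamped names like txlog_20120614, which is just a naming convention invented for the example:

    #!/usr/bin/env python
    # Cleanup half of the rolling-database scheme: once an old day's
    # database has had all its logs deleted (doc_count == 0), delete the
    # whole database instead of ever compacting it.
    import datetime
    import json
    import urllib2

    COUCH = "http://localhost:5984"
    PREFIX = "txlog_"   # assumed naming convention, e.g. txlog_20120614
    today = PREFIX + datetime.date.today().strftime("%Y%m%d")

    for db in json.load(urllib2.urlopen("%s/_all_dbs" % COUCH)):
        if not db.startswith(PREFIX) or db == today:
            continue
        info = json.load(urllib2.urlopen("%s/%s" % (COUCH, db)))
        if info["doc_count"] == 0:
            # same as `curl -X DELETE`; this just removes the file, so it
            # costs almost nothing compared to a compaction pass
            req = urllib2.Request("%s/%s" % (COUCH, db))
            req.get_method = lambda: "DELETE"
            print db, json.load(urllib2.urlopen(req))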