Mark Kirkwood wrote:
> On 06/05/14 16:28, Amit Kapila wrote:

> >On Mon, May 5, 2014 at 11:57 AM, Mark Kirkwood
> ><> wrote:

> >I could think of 2 ways to change this:
> >
> >a. if user has specified cost_limit value for table, then it just uses it
> >     rather than rebalancing based on value of system-wide guc variable
> >     autovacuum_vacuum_cost_limit
> >b. another could be to restrict setting per-table value to be lesser than
> >     system-wide value?
> >
> >The former is used for auto vacuum parameters like scale_factor and
> >later is used for parameters like freeze_max_age.
> >
> >Thoughts?
> >
> >Alvaro, do you think above options makes sense to solve this problem?

I've been giving some thought to this.  Really, there is no way to
handle this sensibly while at the same time keeping the documented
behavior -- or in other words, what we have documented is not useful
behavior.  Your option (b) above is an easy solution to the problem,
however it means that the user will have serious trouble configuring the
system in scenarios such as volatile tables, as Mark says -- essentially
that will foreclose the option of using autovacuum for them.

I'm not sure I like your (a) proposal much better.  One problem there is
that if you set the values for a table to be exactly the same values as
in postgresql.conf, it will behave completely different because it will
not participate in balancing.  To me this seems to violate POLA.

I checked Haribabu's latest patch in this thread, and didn't like it
much.  If you set up a table to have cost_delay=1000, it runs at that
speed when vacuumed alone; but if there are two workers, it goes at half
the speed even if the other one is configured with a very small
cost_delay (in essence, wasting the allocated I/O bandwidth).  With
three workers, it goes at a third of the speed -- again, even if the other
tables are configured to go much slower than the volatile one.  This
seems too simplistic.  It might be okay when you have only one or two
very large or high-churn tables, and small numbers of workers, but it's
not unreasonable to think that you might have lots more workers if your
DB has many high-churn tables.

So my proposal is a bit more complicated.  First we introduce the notion
of a single number, to enable sorting and computations: the "delay
equivalent", which is the cost_limit divided by cost_delay.  The higher
the value is for any table, the faster it is vacuumed.  (This makes sense
in physical terms: a higher cost_limit makes it faster, because vacuum
sleeps less often; and a higher cost_delay makes it go slower, because
vacuum sleeps for longer.)  Now, the critical issue is to notice that
not all tables are equal; they can be split into two groups: those that go
faster than the global delay equivalent
(i.e. the effective values of GUC variables
autovacuum_vacuum_cost_limit/autovacuum_vacuum_cost_delay), and those
that go equal or slower.  For the latter group, the rebalancing
algorithm "distributes" the allocated I/O by the global vars, in a
pro-rated manner.  For the former group (tables vacuumed faster than
global delay equiv), to rebalance we don't consider the global delay
equiv but the delay equiv of the fastest table currently being vacuumed.
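To make the proposal concrete, here is a rough sketch of the rebalancing
in Python (the function, the table representation, and the defaults
cost_limit=200/cost_delay=20 are my own illustration, not patch code):

```python
def rebalance(tables, global_limit=200, global_delay=20):
    """Sketch of the proposed rebalancing.

    tables maps table name -> (cost_limit, cost_delay) for the tables
    currently being vacuumed.  Returns the delay equivalent
    (cost_limit / cost_delay) assigned to each one.
    """
    global_equiv = global_limit / global_delay
    equiv = {n: limit / delay for n, (limit, delay) in tables.items()}

    # Split the tables in two groups relative to the global delay equiv.
    slow = {n: e for n, e in equiv.items() if e <= global_equiv}
    fast = {n: e for n, e in equiv.items() if e > global_equiv}

    out = {}
    handed_out = 0.0
    if slow:
        # Slow group: pro-rate the global delay equivalent among them.
        slow_total = sum(slow.values())
        for n, e in slow.items():
            out[n] = global_equiv * e / slow_total
            handed_out += out[n]
    if fast:
        # Fast group: pro-rate what remains of the fastest table's
        # delay equivalent after the slow group's allocation.
        budget = max(equiv.values()) - handed_out
        fast_total = sum(fast.values())
        for n, e in fast.items():
            out[n] = budget * e / fast_total
    return out
```

With two tables at the default delay_equiv=10 this hands out
delay_equiv=5 to each, matching the first example below.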

Suppose we have two tables, delay_equiv=10 each (which is the default
value).  If they are both vacuumed in parallel, then we distribute a
delay_equiv of 5 to each (so set cost_limit=100, cost_delay=20).  As
soon as one of them finishes, the remaining one is allowed to upgrade to
delay_equiv=10 (cost_limit=200, cost_delay=20).

Now add a third table, delay_equiv=500 (cost_limit=10000, cost_delay=20;
this is Mark's volatile table).  If it's being vacuumed on its own, just
assign cost_limit=10000, cost_delay=20, as normal.  If one of the other
two tables is being vacuumed, that one will use delay_equiv=10, as per
above.  To balance the volatile table, we take the delay_equiv of this
one and subtract the already handed-out delay_equiv of 10; so we set the
volatile table to delay_equiv=490 (cost_limit=9800, cost_delay=20).

If we do it this way, the whole system runs at the full speed allowed by
the fastest table for which we have set per-table options, but things
are also scaled so that the slow tables go slow and the fast tables go
fast.

As a more elaborate example, add a fourth table with delay_equiv=50
(cost_limit=1000, cost_delay=20).  This is also faster than the global
vars, so we put it in the first group.  If all four tables are being
vacuumed in parallel, we have the two slow tables going at delay_equiv=5
each (cost_limit=100, cost_delay=20); then there are delay_equiv=490 to
distribute among the remaining ones; pro-rating this we have
delay_equiv=445 (cost_limit=8900, cost_delay=20) for the volatile table
and delay_equiv=45 (cost_limit=900, cost_delay=20) for the other one.

If one of the slow tables finishes vacuuming, the other one will
speed up to delay_equiv=10, and the two fastest ones will go on
unchanged.  If both finish and the fast tables keep going, the faster
one will go at delay_equiv=454 and the other one at delay_equiv=45.
Note that the volatile table will go a bit faster while the other one is
barely affected.
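Spelling out the arithmetic for that four-table scenario (these are just
the numbers from the example above, nothing new):

```python
# Global delay equivalent with the default GUCs: 200 / 20 = 10.
global_equiv = 200 / 20

# The two slow tables split the global budget pro-rata (equal shares
# here, since they have identical settings).
slow_share = global_equiv / 2          # 5 each
handed_out = 2 * slow_share            # 10

# Fast tables: volatile at delay_equiv=500, the fourth at delay_equiv=50.
budget = 500 - handed_out              # 490 left to distribute
vol = budget * 500 / (500 + 50)        # ~445
other = budget * 50 / (500 + 50)       # ~45

# Once both slow tables finish, the fastest table's full delay
# equivalent is available again:
vol_after = 500 * 500 / (500 + 50)     # ~454
other_after = 500 * 50 / (500 + 50)    # ~45
```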

Essentially, if you configure a table with a delay-equiv that's greater
than the system configured values, you're giving permission for vacuum
to use more I/O, but each table has its own limit on how fast it can go.

> The (ahem) sensible way that one would expect (perhaps even need)
> autovacuum throttling to work is:
> - set sensible defaults for all the usual (well behaved) tables
> - set a few really aggressive overrides for a handful of the naughty ones

Does my proposal above satisfy your concerns?

> Runaway free space bloat is one of the things that can really mangle
> a postgres system (I've been called in to rescue a few in my
> time)... there needs to be a way to control those few badly behaved
> tables ... without removing the usefulness of throttling the others.


Álvaro Herrera      
PostgreSQL Development, 24x7 Support, Training & Services
