On 2/19/19 8:22 PM, Andres Freund wrote:
> On 2019-02-19 20:02:32 +0100, Tomas Vondra wrote:
>> Let's do a short example. Assume the default vacuum costing parameters
>>
>>     vacuum_cost_limit = 200
>>     vacuum_cost_delay = 20ms
>>     vacuum_cost_page_dirty = 20
>>
>> and for simplicity we only do writes (dirtied pages). With a budget of
>> 200 and a cost of 20 per dirty page, vacuum does 10 pages per 20ms
>> round, i.e. ~4MB/s of writes assuming 8kB pages.
>>
>> Now, let's also throttle based on WAL - once in a while, after producing
>> some amount of WAL we sleep for a while. Again, for simplicity let's
>> assume the sleeps perfectly interleave and are also 20ms. So we have
>> something like:
> 
>>     sleep(20ms); -- vacuum
>>     sleep(20ms); -- WAL
>>     sleep(20ms); -- vacuum
>>     sleep(20ms); -- WAL
>>     sleep(20ms); -- vacuum
>>     sleep(20ms); -- WAL
>>     sleep(20ms); -- vacuum
>>     sleep(20ms); -- WAL
>>
>> Suddenly, we only reach ~2MB/s of writes from vacuum. But we also reach
>> only 1/2 the WAL throughput, because it's affected exactly the same way
>> by the sleeps from vacuum throttling.
>>
>> We've not reached either of the limits. How exactly is this "lower limit
>> takes effect"?
> 
> Because I said upthread that that's not how I think a sane
> implementation of WAL throttling would work. I think the whole cost
> budgeting approach is BAD, and it'd be a serious mistake to copy it for
> a WAL rate limit (it disregards the time taken to execute IO, CPU costs
> etc., and in this case the cost of other bandwidth limitations).  What
> I'm saying is that we ought to instead specify a WAL rate in bytes/sec
> and *only* sleep once we've exceeded it for a time period (with some
> optimizations, so we don't gettimeofday() after every XLogInsert(), but
> instead compute after how many bytes we need to re-determine the time
> to see if we're still in the same 'granule').
> 

OK, I agree with that. That's mostly what I described in response to
Robert a while ago, I think. (If you've described that earlier in the
thread, I missed it.)

> Now, a non-toy implementation would probably want to have a
> sliding window to avoid being overly bursty, and reduce the number of
> gettimeofday() calls as mentioned above, but for explanation's sake
> basically imagine that the "main loop" of a bulk xlog emitting command
> would invoke a helper with a computation in pseudocode like:
> 
>     current_time = gettimeofday();
>     if (same_second(current_time, last_time))
>     {
>         wal_written_in_second += new_wal_written;
>         if (wal_written_in_second >= wal_write_limit_per_second)
>         {
>             /* sleep long enough to pay back the excess at the limit rate */
>             double too_much = (wal_written_in_second - wal_write_limit_per_second);
> 
>             sleep_fractional_seconds(too_much / wal_write_limit_per_second);
> 
>             last_time = current_time;
>             wal_written_in_second = 0;   /* budget consumed, start over */
>         }
>     }
>     else
>     {
>         /* a new second started: restart the counter */
>         last_time = current_time;
>         wal_written_in_second = new_wal_written;
>     }
> 
> which'd mean that, in contrast to your example, we'd not continually
> sleep for WAL; we'd only do so if we actually exceeded (or, in a
> smarter implementation, are projected to exceed) the specified WAL
> write rate. As the 20ms sleeps from vacuum effectively reduce the WAL
> write rate, we'd correspondingly sleep less.
> 

Yes, that makes sense.
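
To check my understanding, here's how I'd picture that helper as
self-contained C. A minimal sketch only - report_wal_written() and the
16MB/s limit are names/values I made up for illustration, and a real
patch would want a monotonic clock plus the sliding window you mention:

    #include <stdint.h>
    #include <time.h>

    static uint64_t wal_write_limit_per_second = 16 * 1024 * 1024;
    static uint64_t wal_written_in_second = 0;
    static time_t   granule_start = 0;

    /* called from the main loop of a bulk xlog-emitting command */
    static void
    report_wal_written(uint64_t new_wal_written)
    {
        time_t now = time(NULL);

        if (now != granule_start)
        {
            /* new one-second granule: restart the counter */
            granule_start = now;
            wal_written_in_second = 0;
        }

        wal_written_in_second += new_wal_written;

        if (wal_written_in_second >= wal_write_limit_per_second)
        {
            /* sleep long enough to pay back the excess at the limit rate */
            uint64_t excess = wal_written_in_second - wal_write_limit_per_second;
            double   sleep_s = (double) excess / wal_write_limit_per_second;
            struct timespec ts;

            ts.tv_sec = (time_t) sleep_s;
            ts.tv_nsec = (long) ((sleep_s - ts.tv_sec) * 1e9);
            nanosleep(&ts, NULL);

            granule_start = time(NULL);
            wal_written_in_second = 0;
        }
    }

The point being: time vacuum spends in its own 20ms sleeps produces no
WAL, so the counter rarely hits the limit and this helper sleeps less,
exactly as you describe.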

> 
> And my main point is that even if you implement a proper bytes/sec limit
> ONLY for WAL, the behaviour of VACUUM rate limiting doesn't get
> meaningfully more confusing than it is right now.
> 

So, why not modify autovacuum to also use this approach? I wonder if
the situation there is more complicated because of multiple workers
sharing the same budget ...
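
In case it's useful, a naive version of the shared part might look like
this (purely hypothetical - C11 atomics standing in for proper shared
memory plumbing, and SharedWalBudget / charge_shared_budget are names I
invented):

    #include <stdatomic.h>
    #include <stdint.h>

    typedef struct SharedWalBudget
    {
        _Atomic uint64_t written_in_granule; /* bytes from all workers */
        _Atomic int64_t  granule_start;      /* epoch second of granule */
    } SharedWalBudget;

    /*
     * Charge bytes against the shared per-second budget; returns how many
     * seconds the calling worker should sleep (0 if under the limit).
     */
    static double
    charge_shared_budget(SharedWalBudget *budget, uint64_t bytes,
                         int64_t now, uint64_t limit_per_second)
    {
        int64_t start = atomic_load(&budget->granule_start);

        /* first worker to see the new second resets the shared counter */
        if (start != now &&
            atomic_compare_exchange_strong(&budget->granule_start,
                                           &start, now))
            atomic_store(&budget->written_in_granule, 0);

        uint64_t total =
            atomic_fetch_add(&budget->written_in_granule, bytes) + bytes;

        if (total >= limit_per_second)
            return (double) (total - limit_per_second) / limit_per_second;
        return 0.0;
    }

Whether the sleeping itself should then be per-worker or somehow
coordinated is exactly the part I'm unsure about.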

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
