>> 2. the leader being interrupted while waiting is also already happening on 
>> master
>> due to the pgstat_progress_parallel_incr_param() calls in
>> parallel_vacuum_process_one_index() (that have been added in
>> 46ebdfe164). It has been the case "only" 36 times during my test case.

46ebdfe164 will interrupt the leaders sleep every time a parallel workers 
reports
progress, and we currently don't handle interrupts by restarting the sleep with
the remaining time. nanosleep does provide the ability to restart with the 
remaining
time [1], but I don't think it's worth the effort to ensure more accurate
vacuum delays for the leader process. 


> 1. Having a time based only approach to throttle 

I do agree with a time based approach overall.


> 1.1) the more parallel workers is used, the less the impact of the leader on
> the vacuum index phase duration/workload is (because the repartition is done
> on more processes).

Did you mean " because the vacuum is done on more processes"? 

When a leader is operating on a large index(s) during the entirety
of the vacuum operation, wouldn't more parallel workers end up
interrupting the leader more often? This is why I think reporting even more
often than 1 second (more below) will be better.

> 3. A 1 second reporting "throttling" looks a reasonable threshold as:

> 3.1 the idea is to have a significant impact when the leader could have been
> interrupted say hundred/thousand times per second.

> 3.2 it does not make that much sense for any tools to sample 
> pg_stat_progress_vacuum
> multiple times per second (so a one second reporting granularity seems ok).

I feel 1 second may still be too frequent. 
What about 10 seconds ( or 30 seconds )? 
I think this metric in particular will be mainly useful for vacuum runs that 
are 
running for minutes or more, making reporting every 10 or 30 seconds 
still useful.

It just occurred to me also that pgstat_progress_parallel_incr_param 
should have a code comment that it will interrupt a leader process and
cause activity such as a sleep to end early.



Regards,

Sami Imseih
Amazon Web Services (AWS)


[1] https://man7.org/linux/man-pages/man2/nanosleep.2.html


Reply via email to