On Sun, Jan 05, 2020 at 08:54:15AM +0900, Masahiko Sawada wrote:
On Thu, Jan 2, 2020 at 9:09 PM Amit Kapila <amit.kapil...@gmail.com>
wrote:
Hi,
I am starting a new thread for some of the decisions for a parallel
vacuum in the hope to get feedback from more people. There are
mainly two points for which we need some feedback.
1. Tomas Vondra has pointed out on the main thread [1] that by
default the parallel vacuum should be enabled similar to what we do
for Create Index. As proposed, the patch enables it only when the
user specifies it (ex. Vacuum (Parallel 2) <tbl_name>;). One of the
arguments in favor of enabling it by default as mentioned by Tomas is
"It's pretty much the same thing we did with vacuum throttling - it's
disabled for explicit vacuum by default, but you can enable it. If
you're worried about VACUUM causing issues, you should set cost
delay.". Some of the arguments against enabling it are that it will
lead to use of more resources (like CPU, I/O) which users might or
might like.
I'm a bit wary of making parallel vacuum enabled by default. Single
process vacuum does sequential reads/writes on most of indexes but
parallel vacuum does random access random reads/writes. I've tested
parallel vacuum on HDD and confirmed the performance is good but I'm
concerned that it might be cause of more disk I/O than user expected.
I understand the concern, but it's not clear to me why to apply this
defensive approach just to vacuum and not to all commands. Especially
when we do have a way to throttle vacuum (unlike pretty much any other
command) if I/O really is a scarce resource.
As the vacuum workers are separate processes, each generating requests
with a sequential pattern, so I'd expect readahead to kick in and keep
the efficiency of sequential access pattern.
Now, if we want to enable it by default, we need a way to disable it
as well and along with that, we need a way for users to specify a
parallel degree. I have mentioned a few reasons why we need a
parallel degree for this operation in the email [2] on the main
thread.
If parallel vacuum is *not* enabled by default, then I think the
current way to enable is fine which is as follows: Vacuum (Parallel
2) <tbl_name>;
Here, if the user doesn't specify parallel_degree, then we internally
decide based on number of indexes that support a parallel vacuum with
a maximum of max_parallel_maintenance_workers.
If the parallel vacuum is enabled by default, then I could think of
the following ways:
(a) Vacuum (disable_parallel) <tbl_name>; Vacuum (Parallel
<parallel_degree>) <tbl_name>;
(b) Vacuum (Parallel <parallel_degree>) <tbl_name>; If user
specifies parallel_degree as 0, then disable parallelism.
(c) ... Any better ideas?
If parallel vacuum is enabled by default, I would prefer (b) but I
don't think it's a good idea to accept 0 as parallel degree. If we want
to disable parallel vacuum we should max_parallel_maintenance_workers
to 0 instead.
IMO that just makes the interaction between vacuum options and the GUC
even more complicated/confusing.
If we want to have a vacuum option to determine parallel degree, we
should probably have a vacuum option to disable parallelism using just a
vacuum option. I don't think 0 is too bad, and disable_parallel seems a
bit awkward. Maybe we could use NOPARALLEL (in addition to PARALLEL n).
That's what Oracle does, so it's not entirely without a precedent.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services