On Sun, Jan 5, 2020 at 7:26 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Sun, Jan 5, 2020 at 6:40 AM Tomas Vondra > <tomas.von...@2ndquadrant.com> wrote: > > > > On Sun, Jan 05, 2020 at 08:54:15AM +0900, Masahiko Sawada wrote: > > >On Thu, Jan 2, 2020 at 9:09 PM Amit Kapila <amit.kapil...@gmail.com> > > >wrote: > > >> > > >> Hi, > > >> > > >> I am starting a new thread for some of the decisions for a parallel > > >> vacuum in the hope to get feedback from more people. There are > > >> mainly two points for which we need some feedback. > > >> > > >> 1. Tomas Vondra has pointed out on the main thread [1] that by > > >> default the parallel vacuum should be enabled similar to what we do > > >> for Create Index. As proposed, the patch enables it only when the > > >> user specifies it (ex. Vacuum (Parallel 2) <tbl_name>;). One of the > > >> arguments in favor of enabling it by default as mentioned by Tomas is > > >> "It's pretty much the same thing we did with vacuum throttling - it's > > >> disabled for explicit vacuum by default, but you can enable it. If > > >> you're worried about VACUUM causing issues, you should set cost > > >> delay.". Some of the arguments against enabling it are that it will > > >> lead to use of more resources (like CPU, I/O) which users might or > > >> might like. > > >> > > > > > >I'm a bit wary of making parallel vacuum enabled by default. Single > > >process vacuum does sequential reads/writes on most of indexes but > > >parallel vacuum does random access random reads/writes. I've tested > > >parallel vacuum on HDD and confirmed the performance is good but I'm > > >concerned that it might be cause of more disk I/O than user expected. > > > > > > > I understand the concern, but it's not clear to me why to apply this > > defensive approach just to vacuum and not to all commands. Especially > > when we do have a way to throttle vacuum (unlike pretty much any other > > command) if I/O really is a scarce resource. > > > > As the vacuum workers are separate processes, each generating requests > > with a sequential pattern, so I'd expect readahead to kick in and keep > > the efficiency of sequential access pattern. > > > > Right, I also think so.
Okay I understand. > > > >> Now, if we want to enable it by default, we need a way to disable it > > >> as well and along with that, we need a way for users to specify a > > >> parallel degree. I have mentioned a few reasons why we need a > > >> parallel degree for this operation in the email [2] on the main > > >> thread. > > >> > > >> If parallel vacuum is *not* enabled by default, then I think the > > >> current way to enable is fine which is as follows: Vacuum (Parallel > > >> 2) <tbl_name>; > > >> > > >> Here, if the user doesn't specify parallel_degree, then we internally > > >> decide based on number of indexes that support a parallel vacuum with > > >> a maximum of max_parallel_maintenance_workers. > > >> > > >> If the parallel vacuum is enabled by default, then I could think of > > >> the following ways: > > >> > > >> (a) Vacuum (disable_parallel) <tbl_name>; Vacuum (Parallel > > >> <parallel_degree>) <tbl_name>; > > >> > > >> (b) Vacuum (Parallel <parallel_degree>) <tbl_name>; If user > > >> specifies parallel_degree as 0, then disable parallelism. > > >> > > >> (c) ... Any better ideas? > > >> > > > > > >If parallel vacuum is enabled by default, I would prefer (b) but I > > >don't think it's a good idea to accept 0 as parallel degree. If we want > > >to disable parallel vacuum we should max_parallel_maintenance_workers > > >to 0 instead. > > > > > > > IMO that just makes the interaction between vacuum options and the GUC > > even more complicated/confusing. > > > > Yeah, I am also not sure if that will be a good idea. > > > If we want to have a vacuum option to determine parallel degree, we > > should probably have a vacuum option to disable parallelism using just a > > vacuum option. I don't think 0 is too bad, and disable_parallel seems a > > bit awkward. Maybe we could use NOPARALLEL (in addition to PARALLEL n). > > That's what Oracle does, so it's not entirely without a precedent. > > > > We can go either way (using 0 for parallel to indicate disable > parallelism or by introducing a new option like NOPARALLEL). I think > initially we can avoid introducing more options and just go with > 'Parallel 0' and if we find a lot of people find it inconvenient, then > we can always introduce a new option later. Hmm I'm confused. Specifying NOPARALLEL or PARALLEL 0 is the same as setting max_parallel_maintenance_workers to 0, right? We normally set max_parallel_workers_per_gather to 0 to disable parallel queries on a query. So I think that disabling parallel vacuum by setting max_parallel_maintenance_workers to 0 is the same concept. Regarding proposed two options we already have storage parameter parallel_workers and it accepts 0 but PARALLEL 0 looks like contradicted at a glance. And NOPARALLEL is inconsistent with existing DISABLE_XXX options and it's a bit awkward to specify like (NOPARALLEL off). Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services