Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-09 Thread Peter Geoghegan
On Wed, Mar 8, 2017 at 5:55 PM, Robert Haas wrote: > I like to err on the side of the approach that requires fewer changes. > That is, if the question is "does pg_restore need to treat this issue > specially?" and the answer is unclear, I like to assume it probably >

Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-08 Thread Robert Haas
On Wed, Mar 8, 2017 at 8:45 PM, Peter Geoghegan wrote: >> This part I'm not sure about. I think people care quite a lot about >> pg_restore speed, because they are often down when they're running it. >> And they may have oodles more CPUs that parallel restore can use >> without

Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-08 Thread Peter Geoghegan
On Wed, Mar 8, 2017 at 5:33 PM, Robert Haas wrote: >> pg_restore will avoid parallelism (that will happen by setting >> "max_parallel_workers_maintenance = 0" when it runs), not because it >> cannot trust the cost model, but because it prefers to parallelize >> things its

Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-08 Thread Robert Haas
On Sun, Mar 5, 2017 at 7:14 PM, Peter Geoghegan wrote: > On Sat, Mar 4, 2017 at 2:15 PM, Peter Geoghegan wrote: >> So, I agree with Robert that we should actually use heap size for the >> main, initial determination of # of workers to use, but we still need >> to

Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-05 Thread Peter Geoghegan
On Sat, Mar 4, 2017 at 2:15 PM, Peter Geoghegan wrote: > So, I agree with Robert that we should actually use heap size for the > main, initial determination of # of workers to use, but we still need > to estimate the size of the final index [1], to let the cost model cap > the

Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-04 Thread Peter Geoghegan
On Sat, Mar 4, 2017 at 6:00 AM, Stephen Frost wrote: >> It is, but I was using that with index size, not table size. I can >> change it to be table size, based on what you said. But the workMem >> related cap, which probably won't end up being applied all that often >> in

Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-04 Thread Stephen Frost
Peter, * Peter Geoghegan (p...@bowt.ie) wrote: > On Sat, Mar 4, 2017 at 12:50 AM, Robert Haas wrote: > > If the result of > > compute_parallel_workers() based on min_parallel_table_scan_size is > > smaller, then use that value instead. I must be confused, because I > >
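
For readers following the thread, here is a rough sketch of the kind of logarithmic worker scaling being discussed: pick a worker count from the heap size relative to min_parallel_table_scan_size, then cap it. It is an illustration only, not the patch's code. The function name, the scaling factor of 3, and the default of 1024 pages are assumptions made for the example; the cap stands in for max_parallel_workers_maintenance as proposed in the thread.

```python
def sketch_parallel_workers(heap_pages, min_scan_pages=1024, max_workers=2, scale=3):
    """Hypothetical sketch: one extra worker each time the heap grows by `scale`x.

    heap_pages     -- size of the table (heap) in pages
    min_scan_pages -- assumed stand-in for min_parallel_table_scan_size, in pages
    max_workers    -- assumed stand-in for the max_parallel_workers_maintenance cap
    scale          -- assumed growth factor between worker steps
    """
    # Tables below the threshold are not worth parallelizing at all.
    if heap_pages < min_scan_pages:
        return 0

    workers = 1
    threshold = max(min_scan_pages, 1)
    # Logarithmic scaling: each multiple of `scale` in heap size adds a worker.
    while heap_pages >= threshold * scale:
        workers += 1
        threshold *= scale

    # The cap always wins; a cap of 0 disables parallelism entirely.
    return min(workers, max_workers)


if __name__ == "__main__":
    for pages in (1000, 1024, 3072, 9216, 100000):
        print(pages, sketch_parallel_workers(pages, max_workers=8))
```

With the cap set to 0, the sketch returns 0 workers for any table size, which mirrors how pg_restore is described elsewhere in the thread as avoiding parallel index builds.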

Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-04 Thread Peter Geoghegan
On Sat, Mar 4, 2017 at 12:50 AM, Robert Haas wrote: > If you think parallelism isn't worthwhile unless the sort was going to > be external anyway, I don't -- that's just when it starts to look like a safe bet that parallelism is worthwhile. There are quite a few cases

Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-04 Thread Robert Haas
On Sat, Mar 4, 2017 at 2:17 PM, Peter Geoghegan wrote: > On Sat, Mar 4, 2017 at 12:43 AM, Robert Haas wrote: >> Oh. But then I don't see why you need min_parallel_anything. That's >> just based on an estimate of the amount of data per worker vs. >>

Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-04 Thread Peter Geoghegan
On Sat, Mar 4, 2017 at 12:43 AM, Robert Haas wrote: > Oh. But then I don't see why you need min_parallel_anything. That's > just based on an estimate of the amount of data per worker vs. > maintenance_work_mem, isn't it? Yes -- and it's generally a pretty good estimate.

Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-04 Thread Robert Haas
On Sat, Mar 4, 2017 at 2:01 PM, Peter Geoghegan wrote: > On Sat, Mar 4, 2017 at 12:23 AM, Robert Haas wrote: >>> I guess that the workMem scaling threshold thing could be >>> min_parallel_index_scan_size, rather than min_parallel_relation_size >>> (which we

Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-04 Thread Peter Geoghegan
On Sat, Mar 4, 2017 at 12:23 AM, Robert Haas wrote: >> I guess that the workMem scaling threshold thing could be >> min_parallel_index_scan_size, rather than min_parallel_relation_size >> (which we now call min_parallel_table_scan_size)? > > No, it should be based on

Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-04 Thread Robert Haas
On Thu, Mar 2, 2017 at 10:38 PM, Peter Geoghegan wrote: > I'm glad. This justifies the lack of much of any "veto" on the > logarithmic scaling. The only thing that can do that is > max_parallel_workers_maintenance, the storage parameter > parallel_workers (maybe this isn't a storage

Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-02 Thread Peter Geoghegan
On Thu, Mar 2, 2017 at 5:50 AM, Robert Haas wrote: > On Wed, Mar 1, 2017 at 12:58 AM, Peter Geoghegan wrote: >> * This scales based on output size (projected index size), not input >> size (heap scan input). Apparently, that's what we always do right >> now.

Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-02 Thread Robert Haas
On Wed, Mar 1, 2017 at 12:58 AM, Peter Geoghegan wrote: > * This scales based on output size (projected index size), not input > size (heap scan input). Apparently, that's what we always do right > now. Actually, I'm not aware of any precedent for that. I'd just pass the heap size

[HACKERS] Cost model for parallel CREATE INDEX

2017-02-28 Thread Peter Geoghegan
There are a couple of open items for the parallel CREATE INDEX patch that at this point represent blockers to commit, IMV. The first is around a deficiency in the shared refcount mechanism, which is well understood and doesn't need to be rehashed on this thread. The second is the cost model, which