Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-09 Thread Peter Geoghegan
On Wed, Mar 8, 2017 at 5:55 PM, Robert Haas  wrote:
> I like to err on the side of the approach that requires fewer changes.
> That is, if the question is "does pg_restore need to treat this issue
> specially?" and the answer is unclear, I like to assume it probably
> doesn't until some contrary evidence emerges.
>
> I mean, sometimes it is clear that you are going to need special
> handling someplace, and then you have to do it.  But I don't see that
> this is one of those cases, necessarily.

That's what I'll do, then.


-- 
Peter Geoghegan




Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-08 Thread Robert Haas
On Wed, Mar 8, 2017 at 8:45 PM, Peter Geoghegan  wrote:
>> This part I'm not sure about.  I think people care quite a lot about
>> pg_restore speed, because their systems are often down while they're
>> running it.  And they may have oodles more CPUs that parallel restore
>> can use without help from parallel query.  I would be inclined to
>> leave pg_restore alone and let the chips fall where they may.
>
> I thought that we might want to err on the side of preserving the
> existing behavior, but arguably that's actually what I failed to do.
> That is, since we don't currently have a pg_restore flag that controls
> the maintenance_work_mem used by pg_restore, "let the chips fall where
> they may" is arguably the standard that I didn't uphold.
>
> It might still make sense to take a leaf out of the parallel query
> book on this question. That is, add an open item along the lines of
> "review behavior of pg_restore with parallel CREATE INDEX" that we
> plan to deal with close to the release of Postgres 10.0, when feedback
> from beta testing is in. There are a number of options, none of which
> are difficult to write code for. The hard part is determining what
> makes most sense for users on balance.

I like to err on the side of the approach that requires fewer changes.
That is, if the question is "does pg_restore need to treat this issue
specially?" and the answer is unclear, I like to assume it probably
doesn't until some contrary evidence emerges.

I mean, sometimes it is clear that you are going to need special
handling someplace, and then you have to do it.  But I don't see that
this is one of those cases, necessarily.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-08 Thread Peter Geoghegan
On Wed, Mar 8, 2017 at 5:33 PM, Robert Haas  wrote:
>> pg_restore will avoid parallelism (that will happen by setting
>> "max_parallel_workers_maintenance  = 0" when it runs), not because it
>> cannot trust the cost model, but because it prefers to parallelize
>> things its own way (with multiple restore jobs), and because execution
>> speed may not be the top priority for pg_restore, unlike a live
>> production system.
>
> This part I'm not sure about.  I think people care quite a lot about
> pg_restore speed, because their systems are often down while they're
> running it.  And they may have oodles more CPUs that parallel restore
> can use without help from parallel query.  I would be inclined to
> leave pg_restore alone and let the chips fall where they may.

I thought that we might want to err on the side of preserving the
existing behavior, but arguably that's actually what I failed to do.
That is, since we don't currently have a pg_restore flag that controls
the maintenance_work_mem used by pg_restore, "let the chips fall where
they may" is arguably the standard that I didn't uphold.

It might still make sense to take a leaf out of the parallel query
book on this question. That is, add an open item along the lines of
"review behavior of pg_restore with parallel CREATE INDEX" that we
plan to deal with close to the release of Postgres 10.0, when feedback
from beta testing is in. There are a number of options, none of which
are difficult to write code for. The hard part is determining what
makes most sense for users on balance.

-- 
Peter Geoghegan




Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-08 Thread Robert Haas
On Sun, Mar 5, 2017 at 7:14 PM, Peter Geoghegan  wrote:
> On Sat, Mar 4, 2017 at 2:15 PM, Peter Geoghegan  wrote:
>> So, I agree with Robert that we should actually use heap size for the
>> main, initial determination of # of workers to use, but we still need
>> to estimate the size of the final index [1], to let the cost model cap
>> the initial determination when maintenance_work_mem is just too low.
>> (This cap will rarely be applied in practice, as I said.)
>>
>> [1] 
>> https://wiki.postgresql.org/wiki/Parallel_External_Sort#bt_estimated_nblocks.28.29_function_in_pageinspect
>
> Having looked at it some more, this no longer seems worthwhile. In the
> next revision, I will add a backstop that limits the use of
> parallelism based on a lack of maintenance_work_mem in a simpler
> manner. Namely, the worker will have to be left with a
> maintenance_work_mem/nworkers share of no less than 32MB in order for
> parallel CREATE INDEX to proceed. There doesn't seem to be any great
> reason to bring the volume of data to be sorted into it.

+1.

> I expect the cost model to be significantly simplified in the next
> revision in other ways, too. There will be no new index storage
> parameter, nor a disable_parallelddl GUC. compute_parallel_worker()
> will be called in a fairly straightforward way within
> plan_create_index_workers(), using heap blocks, as agreed to already.

+1.

> pg_restore will avoid parallelism (that will happen by setting
> "max_parallel_workers_maintenance  = 0" when it runs), not because it
> cannot trust the cost model, but because it prefers to parallelize
> things its own way (with multiple restore jobs), and because execution
> speed may not be the top priority for pg_restore, unlike a live
> production system.

This part I'm not sure about.  I think people care quite a lot about
pg_restore speed, because their systems are often down while they're
running it.  And they may have oodles more CPUs that parallel restore
can use without help from parallel query.  I would be inclined to
leave pg_restore alone and let the chips fall where they may.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-05 Thread Peter Geoghegan
On Sat, Mar 4, 2017 at 2:15 PM, Peter Geoghegan  wrote:
> So, I agree with Robert that we should actually use heap size for the
> main, initial determination of # of workers to use, but we still need
> to estimate the size of the final index [1], to let the cost model cap
> the initial determination when maintenance_work_mem is just too low.
> (This cap will rarely be applied in practice, as I said.)
>
> [1] 
> https://wiki.postgresql.org/wiki/Parallel_External_Sort#bt_estimated_nblocks.28.29_function_in_pageinspect

Having looked at it some more, this no longer seems worthwhile. In the
next revision, I will add a backstop that limits the use of
parallelism based on a lack of maintenance_work_mem in a simpler
manner. Namely, the worker will have to be left with a
maintenance_work_mem/nworkers share of no less than 32MB in order for
parallel CREATE INDEX to proceed. There doesn't seem to be any great
reason to bring the volume of data to be sorted into it.
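
Concretely, the backstop amounts to something like the following (just
a sketch of the intended check, not the actual patch; the names are
made up):

/*
 * Sketch only: scale back the requested number of workers until each
 * one's maintenance_work_mem share is at least 32MB, or give up on
 * parallelism entirely.  maint_work_mem_kb corresponds to the
 * maintenance_work_mem GUC, which is tracked in KB.
 */
#define MIN_PARALLEL_SORT_MEM_KB    (32 * 1024)

static int
cap_workers_for_workmem(int nworkers, int maint_work_mem_kb)
{
    while (nworkers > 0 &&
           maint_work_mem_kb / nworkers < MIN_PARALLEL_SORT_MEM_KB)
        nworkers--;

    return nworkers;        /* 0 means: no parallel CREATE INDEX */
}

So with the default maintenance_work_mem of 64MB, at most two
participants would be allowed, while 256MB would accommodate up to
eight.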

I expect the cost model to be significantly simplified in the next
revision in other ways, too. There will be no new index storage
parameter, nor a disable_parallelddl GUC. compute_parallel_worker()
will be called in a fairly straightforward way within
plan_create_index_workers(), using heap blocks, as agreed to already.
pg_restore will avoid parallelism (that will happen by setting
"max_parallel_workers_maintenance  = 0" when it runs), not because it
cannot trust the cost model, but because it prefers to parallelize
things its own way (with multiple restore jobs), and because execution
speed may not be the top priority for pg_restore, unlike a live
production system.
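
For what it's worth, in code terms that could be as small as adding one
more SET to the fixed ones pg_restore already issues at the start of a
restore session -- a sketch only, assuming it would sit alongside the
other SETs in pg_backup_archiver.c's _doSetFixedOutputState(), and using
the GUC name proposed here:

ahprintf(AH, "SET max_parallel_workers_maintenance = 0;\n");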

-- 
Peter Geoghegan




Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-04 Thread Peter Geoghegan
On Sat, Mar 4, 2017 at 6:00 AM, Stephen Frost  wrote:
>> It is, but I was using that with index size, not table size. I can
>> change it to be table size, based on what you said. But the workMem
>> related cap, which probably won't end up being applied all that often
>> in practice, *should* still do something with projected index size,
>> since that really is what we're sorting, which could be very different
>> (e.g. with partial indexes).
>
> Isn't that always going to be very different, unless you're creating a
> single index across every column in the table..?  Or perhaps I've
> misunderstood what you're comparing as being 'very different' in your
> last sentence.

I mean: though a primary key index or similar may be smaller than the
table by maybe 5X, the two are still generally within an order of
magnitude of each other. Given that the number of workers is determined
at logarithmic intervals, it may not actually matter that much whether
the scaling is based on heap size (input size) or index size (output
size), at a very high level, despite a 5X difference. I'm referring to
the initial determination of the number of workers to be used, based
on the scan the parallel CREATE INDEX has to do. So, I'm happy to go
along with Robert's suggestion for V9, and have this number determined
based on heap input size rather than index output size. It's good to
be consistent with what we do for parallel seq scan (care about input
size), and it probably won't change things by much anyway. This is
generally the number that the cost model will end up going with, in
practice.
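
(For reference, the logarithmic scaling works roughly like this -- a
paraphrase from memory of compute_parallel_worker()'s heap-based logic,
so see allpaths.c for the authoritative version:)

/*
 * One worker once the relation reaches the threshold, plus one more
 * each time its size triples beyond that.  Paraphrase only; the real
 * code also guards against overflow.
 */
static int
log_scale_workers(BlockNumber rel_pages, BlockNumber threshold_pages)
{
    int         workers;
    BlockNumber threshold = Max(threshold_pages, 1);

    if (rel_pages < threshold)
        return 0;

    workers = 1;
    while (rel_pages >= threshold * 3)
    {
        workers++;
        threshold *= 3;
    }

    return workers;
}

With the default min_parallel_table_scan_size of 8MB, that means one
worker at 8MB, two at 24MB, three at 72MB, and so on -- which is why a
5X gap between heap size and index size typically only moves the answer
by a step or two.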

However, we then need to consider that since maintenance_work_mem is
doled out as maintenance_work_mem/nworkers slices for parallel CREATE
INDEX, there is a sensitivity to how much memory is left per worker as
workers are added. This clearly needs to be based on projected/estimated
index size (output size), since that is what is being sorted, and
because partial indexes imply that the size of the index could be
*vastly* less than heap input size with still-sensible use of the
feature. This will be applied as a cap on the first number.

So, I agree with Robert that we should actually use heap size for the
main, initial determination of # of workers to use, but we still need
to estimate the size of the final index [1], to let the cost model cap
the initial determination when maintenance_work_mem is just too low.
(This cap will rarely be applied in practice, as I said.)

[1] 
https://wiki.postgresql.org/wiki/Parallel_External_Sort#bt_estimated_nblocks.28.29_function_in_pageinspect
-- 
Peter Geoghegan




Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-04 Thread Stephen Frost
Peter,

* Peter Geoghegan (p...@bowt.ie) wrote:
> On Sat, Mar 4, 2017 at 12:50 AM, Robert Haas  wrote:
> > If the result of
> > compute_parallel_workers() based on min_parallel_table_scan_size is
> > smaller, then use that value instead.  I must be confused, because I
> > actually though that was the exact algorithm you were describing, and
> > it sounded good to me.
> 
> It is, but I was using that with index size, not table size. I can
> change it to be table size, based on what you said. But the workMem
> related cap, which probably won't end up being applied all that often
> in practice, *should* still do something with projected index size,
> since that really is what we're sorting, which could be very different
> (e.g. with partial indexes).

Isn't that always going to be very different, unless you're creating a
single index across every column in the table..?  Or perhaps I've
misunderstood what you're comparing as being 'very different' in your
last sentence.

Thanks!

Stephen




Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-04 Thread Peter Geoghegan
On Sat, Mar 4, 2017 at 12:50 AM, Robert Haas  wrote:
> If you think parallelism isn't worthwhile unless the sort was going to
> be external anyway,

I don't -- that's just when it starts to look like a safe bet that
parallelism is worthwhile. There are quite a few cases where an
external sort is faster than an internal sort these days, actually.

> then it seems like the obvious thing to do is
> divide the projected size of the sort by maintenance_work_mem, round
> down, and cap the number of workers to the result.

I'm sorry, I don't follow.

> If the result of
> compute_parallel_workers() based on min_parallel_table_scan_size is
> smaller, then use that value instead.  I must be confused, because I
> actually thought that was the exact algorithm you were describing, and
> it sounded good to me.

It is, but I was using that with index size, not table size. I can
change it to be table size, based on what you said. But the workMem
related cap, which probably won't end up being applied all that often
in practice, *should* still do something with projected index size,
since that really is what we're sorting, which could be very different
(e.g. with partial indexes).

-- 
Peter Geoghegan




Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-04 Thread Robert Haas
On Sat, Mar 4, 2017 at 2:17 PM, Peter Geoghegan  wrote:
> On Sat, Mar 4, 2017 at 12:43 AM, Robert Haas  wrote:
>> Oh.  But then I don't see why you need min_parallel_anything.  That's
>> just based on an estimate of the amount of data per worker vs.
>> maintenance_work_mem, isn't it?
>
> Yes -- and it's generally a pretty good estimate.
>
> I don't really know what minimum amount of memory to insist workers
> have, which is why I provisionally chose one of those GUCs as the
> threshold.
>
> Any better ideas?

I don't understand how min_parallel_anything is telling you anything
about memory.  It has, in general, nothing to do with that.

If you think parallelism isn't worthwhile unless the sort was going to
be external anyway, then it seems like the obvious thing to do is
divide the projected size of the sort by maintenance_work_mem, round
down, and cap the number of workers to the result.  If the result of
compute_parallel_workers() based on min_parallel_table_scan_size is
smaller, then use that value instead.  I must be confused, because I
actually thought that was the exact algorithm you were describing, and
it sounded good to me.
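
In rough pseudocode, the idea is just this (a sketch of what I'm
describing, not tested code, and the helper parameter names are made
up):

/*
 * Sketch: parallelism only pays off if the sort would have been
 * external anyway, so cap the worker count by the number of
 * maintenance_work_mem-sized chunks in the projected sort, as well as
 * by the usual scan-size-based computation.
 */
static int
choose_workers(int scan_size_workers, double projected_sort_bytes,
               Size maint_work_mem_bytes)
{
    int         mem_workers;

    mem_workers = (int) (projected_sort_bytes / maint_work_mem_bytes);

    return Min(scan_size_workers, mem_workers);
}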

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-04 Thread Peter Geoghegan
On Sat, Mar 4, 2017 at 12:43 AM, Robert Haas  wrote:
> Oh.  But then I don't see why you need min_parallel_anything.  That's
> just based on an estimate of the amount of data per worker vs.
> maintenance_work_mem, isn't it?

Yes -- and it's generally a pretty good estimate.

I don't really know what minimum amount of memory to insist workers
have, which is why I provisionally chose one of those GUCs as the
threshold.

Any better ideas?

-- 
Peter Geoghegan




Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-04 Thread Robert Haas
On Sat, Mar 4, 2017 at 2:01 PM, Peter Geoghegan  wrote:
> On Sat, Mar 4, 2017 at 12:23 AM, Robert Haas  wrote:
>>> I guess that the workMem scaling threshold thing could be
>>> min_parallel_index_scan_size, rather than min_parallel_relation_size
>>> (which we now call min_parallel_table_scan_size)?
>>
>> No, it should be based on min_parallel_table_scan_size, because that
>> is the size of the parallel heap scan that will be done as input to
>> the sort.
>
> I'm talking about the extra thing we do to prevent parallelism from
> being used when per-worker workMem is excessively low. That has much
> more to do with projected index size than current heap size.

Oh.  But then I don't see why you need min_parallel_anything.  That's
just based on an estimate of the amount of data per worker vs.
maintenance_work_mem, isn't it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-04 Thread Peter Geoghegan
On Sat, Mar 4, 2017 at 12:23 AM, Robert Haas  wrote:
>> I guess that the workMem scaling threshold thing could be
>> min_parallel_index_scan_size, rather than min_parallel_relation_size
>> (which we now call min_parallel_table_scan_size)?
>
> No, it should be based on min_parallel_table_scan_size, because that
> is the size of the parallel heap scan that will be done as input to
> the sort.

I'm talking about the extra thing we do to prevent parallelism from
being used when per-worker workMem is excessively low. That has much
more to do with projected index size than current heap size.

I agree with everything else you've said, I think.

-- 
Peter Geoghegan




Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-04 Thread Robert Haas
On Thu, Mar 2, 2017 at 10:38 PM, Peter Geoghegan  wrote:
> I'm glad. This justifies the lack of much of any "veto" on the
> logarithmic scaling. The only thing that can do that is
> max_parallel_workers_maintenance, the storage parameter
> parallel_workers (maybe this isn't a storage parameter in V9), and
> insufficient maintenance_work_mem per worker (as judged by
> min_parallel_relation_size being greater than workMem per worker).
>
> I guess that the workMem scaling threshold thing could be
> min_parallel_index_scan_size, rather than min_parallel_relation_size
> (which we now call min_parallel_table_scan_size)?

No, it should be based on min_parallel_table_scan_size, because that
is the size of the parallel heap scan that will be done as input to
the sort.

>> I think it's totally counter-intuitive that any hypothetical index
>> storage parameter would affect the degree of parallelism involved in
>> creating the index and also the degree of parallelism involved in
>> scanning it.  Whether or not other systems do such crazy things seems
>> to me to beside the point.  I think if CREATE INDEX allows an explicit
>> specification of the degree of parallelism (a decision I would favor)
>> it should have a syntactically separate place for unsaved build
>> options vs. persistent storage parameters.
>
> I can see both sides of it.
>
> On the one hand, it's weird that you might have query performance
> adversely affected by what you thought was a storage parameter that
> only affected the index build. On the other hand, it's useful that you
> retain that as a parameter, because you may want to periodically
> REINDEX, or have a way of ensuring that pg_restore does go on to use
> parallelism, since it generally won't otherwise. (As mentioned
> already, pg_restore does not trust the cost model due to issues with
> the availability of statistics).

If you make the changes I'm proposing above, this parenthetical issue
goes away, because the only statistic you need is the table size,
which is what it is.  As to the rest, I think a bare REINDEX should
just use the cost model as if it were CREATE INDEX, and if you want to
override that behavior, you can do that by explicit syntax.  I see
very little utility for a setting that fixes the number of workers to
be used for future reindexes: there won't be many of them, and it's
kinda confusing.  But even if we decide to have that, I see no
justification at all for conflating it with the number of workers to
be used for a scan, which is something else altogether.

> To be clear, I don't have any strong feelings on all this. I just
> think it's worth pointing out that there are reasons to not do what
> you suggest, that you might want to consider if you haven't already.

I have considered them.  I also acknowledge that other people may view
the situation differently than I do.  I'm just telling you my opinion
on the topic.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-02 Thread Peter Geoghegan
On Thu, Mar 2, 2017 at 5:50 AM, Robert Haas  wrote:
> On Wed, Mar 1, 2017 at 12:58 AM, Peter Geoghegan  wrote:
>> * This scales based on output size (projected index size), not input
>> size (heap scan input). Apparently, that's what we always do right
>> now.
>
> Actually, I'm not aware of any precedent for that. I'd just pass the
> heap size to compute_parallel_workers(), leaving the index size as 0,
> and call it good.  What you're doing now seems exactly backwards from
> parallel query generally.

Sorry, that's what I meant.

>> So, the main factor that
>> discourages parallel sequential scans doesn't really exist for
>> parallel CREATE INDEX.
>
> Agreed.

I'm glad. This justifies the lack of much of any "veto" on the
logarithmic scaling. The only thing that can do that is
max_parallel_workers_maintenance, the storage parameter
parallel_workers (maybe this isn't a storage parameter in V9), and
insufficient maintenance_work_mem per worker (as judged by
min_parallel_relation_size being greater than workMem per worker).

I guess that the workMem scaling threshold thing could be
min_parallel_index_scan_size, rather than min_parallel_relation_size
(which we now call min_parallel_table_scan_size)?

In general, I would expect this to leave most CREATE INDEX statements
with a parallel plan in the real world, using exactly the number of
workers indicated by the logarithmic scaling. (pg_restore would also
not use parallelism, because it's specially disabled there -- unless
the storage param has been set at some point.)

>> We could always defer the cost model to another release, and only
>> support the storage parameter for now, though that has disadvantages,
>> some less obvious [4].
>
> I think it's totally counter-intuitive that any hypothetical index
> storage parameter would affect the degree of parallelism involved in
> creating the index and also the degree of parallelism involved in
> scanning it.  Whether or not other systems do such crazy things seems
> to me to be beside the point.  I think if CREATE INDEX allows an explicit
> specification of the degree of parallelism (a decision I would favor)
> it should have a syntactically separate place for unsaved build
> options vs. persistent storage parameters.

I can see both sides of it.

On the one hand, it's weird that you might have query performance
adversely affected by what you thought was a storage parameter that
only affected the index build. On the other hand, it's useful that you
retain that as a parameter, because you may want to periodically
REINDEX, or have a way of ensuring that pg_restore does go on to use
parallelism, since it generally won't otherwise. (As mentioned
already, pg_restore does not trust the cost model due to issues with
the availability of statistics).

There are reports on Google of users of these other systems being
confused by all this, and I don't think that it's any different there
(those other systems don't treat a parallel_workers-style storage
parameter much differently for the purposes of index scans, or anything
like that). I agree that that isn't very user-friendly.

In theory, having two index storage parameters solves our problem. I
don't like that either, though, since it creates a whole new problem.

To be clear, I don't have any strong feelings on all this. I just
think it's worth pointing out that there are reasons to not do what
you suggest, that you might want to consider if you haven't already.

-- 
Peter Geoghegan




Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-02 Thread Robert Haas
On Wed, Mar 1, 2017 at 12:58 AM, Peter Geoghegan  wrote:
> * This scales based on output size (projected index size), not input
> size (heap scan input). Apparently, that's what we always do right
> now.

Actually, I'm not aware of any precedent for that. I'd just pass the
heap size to compute_parallel_workers(), leaving the index size as 0,
and call it good.  What you're doing now seems exactly backwards from
parallel query generally.

> So, the main factor that
> discourages parallel sequential scans doesn't really exist for
> parallel CREATE INDEX.

Agreed.

> We could always defer the cost model to another release, and only
> support the storage parameter for now, though that has disadvantages,
> some less obvious [4].

I think it's totally counter-intuitive that any hypothetical index
storage parameter would affect the degree of parallelism involved in
creating the index and also the degree of parallelism involved in
scanning it.  Whether or not other systems do such crazy things seems
to me to be beside the point.  I think if CREATE INDEX allows an explicit
specification of the degree of parallelism (a decision I would favor)
it should have a syntactically separate place for unsaved build
options vs. persistent storage parameters.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Cost model for parallel CREATE INDEX

2017-02-28 Thread Peter Geoghegan
There are a couple of open items for the parallel CREATE INDEX patch
that at this point represent blockers to commit, IMV. The first is
around a deficiency in the shared refcount mechanism, which is well
understood and doesn't need to be rehashed on this thread. The second
is the cost model, which is what I want to talk about here.

Currently, the cost model scales the number of workers at logarithmic
intervals, in the style of compute_parallel_worker(), but without
considering heap pages (that's actually broken out, for now). I'm
calling a new function that lives in planner.c, right next to
plan_cluster_use_sort(). ISTM that we ought to be considering both
heap pages and index pages, which makes the new signature of
compute_parallel_worker() (which now anticipates the needs of parallel
index scans) interesting to me, so that's something that I need to
address.

Right now, as of V8, we:

0. See if the parallel_workers *index* storage parameter is set (I've
added this new storage parameter, but I think that Amit has or will
add the same new index storage parameter for parallel index scan [1]).

1. Estimate/project the size of the finished index, using a new
nbtpage.c function. Note that this does the right thing with partial
indexes and things like that. It uses pg_statistic stats, where
available.

(My testing patch 0002-* has had an SQL-callable function that lets
reviewers easily determine what the projected size of some existing
index is, which might be a good idea to polish up and include as a
general purpose tool, apropos of nothing -- it is typically very
accurate [2]).

2. Input that size into the compute_parallel_worker()-style
logarithmic scaling of number of workers.

3. Calculate how many workers will have at least a full
maintenance_work_mem share doled out, while still having at least
min_parallel_relation_size of workMem in tuplesort.c.

(I guess I should say min_parallel_table_scan_size or even
min_parallel_index_scan_size now, but whatever).

4. Return the smaller of the worker counts from steps 2 and 3.

So, a low maintenance_work_mem may cap our original suggested number
of workers. This cap isn't particularly likely to be applied, though.
Note also that the max_parallel_workers_maintenance GUC is given the
opportunity to cap things off. This is the utility statement
equivalent of max_parallel_workers_per_gather.
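
Putting steps 0 through 4 together, the flow is roughly the following
(an outline only -- every helper name here is invented for
illustration, and isn't what the patch actually calls things):

/* Outline of the V8 flow; all helpers are invented for illustration. */
static int
plan_create_index_workers_v8(Relation heap, IndexInfo *indexInfo)
{
    BlockNumber index_pages;
    int         scaled;
    int         mem_capped;

    /* 0. An explicit parallel_workers index storage parameter wins. */
    if (index_param_workers_set(indexInfo))
        return index_param_workers(indexInfo);

    /* 1. Project the size of the finished index (new nbtpage.c code). */
    index_pages = estimate_finished_index_pages(heap, indexInfo);

    /* 2. compute_parallel_worker()-style logarithmic scaling on that. */
    scaled = log_scale_workers_on(index_pages);

    /*
     * 3. How many workers are still left a full maintenance_work_mem
     * share, with at least min_parallel_relation_size of workMem each?
     */
    mem_capped = workers_with_adequate_workmem(maintenance_work_mem);

    /* 4. Take the smaller of 2 and 3, then let the GUC cap things off. */
    return Min(max_parallel_workers_maintenance,
               Min(scaled, mem_capped));
}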

Issues with this:

* This scales based on output size (projected index size), not input
size (heap scan input). Apparently, that's what we always do right
now.

* This is dissimilar to how we cost parallel seq scans. There, the
only cost associated with going with a parallel access path is fixed
startup overheads, and IPC costs (paid in parallel_tuple_cost units).
So, we're not doing a comparison between a serial and a parallel plan,
even though we might want to, especially because parallel CREATE INDEX
always uses temp files, unlike serial CREATE INDEX. cost_sort() is
never involved in any of this, and in any case isn't prepared to cost
parallel sorts right now.

* OTOH, there is less sense in charging for IPC overhead the way
something like a Gather node is costed during planning, because to
some degree the IPC involved here is inherently
necessary. If you were going to get an external sort anyway, well,
that still involves temp files that are written to and read from.
Whether or not it's the same backend that does the writing as the
reading in all cases may matter very little. So, the main factor that
discourages parallel sequential scans doesn't really exist for
parallel CREATE INDEX.

I am tempted to move to something closer to what you see elsewhere,
where a serial path and a partial path are both created. This would make
some sense for parallel CREATE INDEX, because you happen to have the
issue of parallelism effectively forcing an external sort. But
otherwise, it wouldn't make that much sense, because parallelism is
always going to help up to the point that all cores are in use, or at
least not hurt. Testing shows this to be the case. It's not as if
there are obvious IPC costs that push things against parallelism. This
is generally good, because the danger of using too many workers is
much less pronounced -- it's demonstrably very small, especially if
you assume a baseline of a serial external sort based CREATE INDEX.

What direction does the cost model need to go in?

I still lean towards the approach V8 takes, though the scaling should
possibly use heap pages and index pages with
compute_parallel_worker(), while not worrying about the input/output
distinction. It's not very appealing to have to teach cost_sort()
about any of this, since the things it considers currently are hard
to map onto parallelism. Besides, it's not as if there is a world of
difference between a serial internal sort CREATE INDEX, and a parallel
external sort CREATE INDEX with lots of memory. It's not a potentially
very large difference, as we see with the sort