Re: [HACKERS] Cost model for parallel CREATE INDEX
On Wed, Mar 8, 2017 at 5:55 PM, Robert Haas wrote:
> I like to err on the side of the approach that requires fewer changes.
> That is, if the question is "does pg_restore need to treat this issue
> specially?" and the answer is unclear, I like to assume it probably
> doesn't until some contrary evidence emerges.
>
> I mean, sometimes it is clear that you are going to need special
> handling someplace, and then you have to do it. But I don't see that
> this is one of those cases, necessarily.

That's what I'll do, then.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cost model for parallel CREATE INDEX
On Wed, Mar 8, 2017 at 8:45 PM, Peter Geoghegan wrote:
>> This part I'm not sure about. I think people care quite a lot about
>> pg_restore speed, because they are often down when they're running it.
>> And they may have oodles more CPUs that parallel restore can use
>> without help from parallel query. I would be inclined to leave
>> pg_restore alone and let the chips fall where they may.
>
> I thought that we might want to err on the side of preserving the
> existing behavior, but arguably that's actually what I failed to do.
> That is, since we don't currently have a pg_restore flag that controls
> the maintenance_work_mem used by pg_restore, "let the chips fall where
> they may" is arguably the standard that I didn't uphold.
>
> It might still make sense to take a leaf out of the parallel query
> book on this question. That is, add an open item along the lines of
> "review behavior of pg_restore with parallel CREATE INDEX" that we
> plan to deal with close to the release of Postgres 10.0, when feedback
> from beta testing is in. There are a number of options, none of which
> are difficult to write code for. The hard part is determining what
> makes most sense for users on balance.

I like to err on the side of the approach that requires fewer changes. That is, if the question is "does pg_restore need to treat this issue specially?" and the answer is unclear, I like to assume it probably doesn't until some contrary evidence emerges.

I mean, sometimes it is clear that you are going to need special handling someplace, and then you have to do it. But I don't see that this is one of those cases, necessarily.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Cost model for parallel CREATE INDEX
On Wed, Mar 8, 2017 at 5:33 PM, Robert Haas wrote:
>> pg_restore will avoid parallelism (that will happen by setting
>> "max_parallel_workers_maintenance = 0" when it runs), not because it
>> cannot trust the cost model, but because it prefers to parallelize
>> things its own way (with multiple restore jobs), and because execution
>> speed may not be the top priority for pg_restore, unlike a live
>> production system.
>
> This part I'm not sure about. I think people care quite a lot about
> pg_restore speed, because they are often down when they're running it.
> And they may have oodles more CPUs that parallel restore can use
> without help from parallel query. I would be inclined to leave
> pg_restore alone and let the chips fall where they may.

I thought that we might want to err on the side of preserving the existing behavior, but arguably that's actually what I failed to do. That is, since we don't currently have a pg_restore flag that controls the maintenance_work_mem used by pg_restore, "let the chips fall where they may" is arguably the standard that I didn't uphold.

It might still make sense to take a leaf out of the parallel query book on this question. That is, add an open item along the lines of "review behavior of pg_restore with parallel CREATE INDEX" that we plan to deal with close to the release of Postgres 10.0, when feedback from beta testing is in. There are a number of options, none of which are difficult to write code for. The hard part is determining what makes most sense for users on balance.

--
Peter Geoghegan
Re: [HACKERS] Cost model for parallel CREATE INDEX
On Sun, Mar 5, 2017 at 7:14 PM, Peter Geoghegan wrote:
> On Sat, Mar 4, 2017 at 2:15 PM, Peter Geoghegan wrote:
>> So, I agree with Robert that we should actually use heap size for the
>> main, initial determination of # of workers to use, but we still need
>> to estimate the size of the final index [1], to let the cost model cap
>> the initial determination when maintenance_work_mem is just too low.
>> (This cap will rarely be applied in practice, as I said.)
>>
>> [1]
>> https://wiki.postgresql.org/wiki/Parallel_External_Sort#bt_estimated_nblocks.28.29_function_in_pageinspect
>
> Having looked at it some more, this no longer seems worthwhile. In the
> next revision, I will add a backstop that limits the use of
> parallelism based on a lack of maintenance_work_mem in a simpler
> manner. Namely, the worker will have to be left with a
> maintenance_work_mem/nworkers share of no less than 32MB in order for
> parallel CREATE INDEX to proceed. There doesn't seem to be any great
> reason to bring the volume of data to be sorted into it.

+1.

> I expect the cost model to be significantly simplified in the next
> revision in other ways, too. There will be no new index storage
> parameter, nor a disable_parallelddl GUC. compute_parallel_worker()
> will be called in a fairly straightforward way within
> plan_create_index_workers(), using heap blocks, as agreed to already.

+1.

> pg_restore will avoid parallelism (that will happen by setting
> "max_parallel_workers_maintenance = 0" when it runs), not because it
> cannot trust the cost model, but because it prefers to parallelize
> things its own way (with multiple restore jobs), and because execution
> speed may not be the top priority for pg_restore, unlike a live
> production system.

This part I'm not sure about. I think people care quite a lot about pg_restore speed, because they are often down when they're running it. And they may have oodles more CPUs that parallel restore can use without help from parallel query. I would be inclined to leave pg_restore alone and let the chips fall where they may.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Cost model for parallel CREATE INDEX
On Sat, Mar 4, 2017 at 2:15 PM, Peter Geoghegan wrote:
> So, I agree with Robert that we should actually use heap size for the
> main, initial determination of # of workers to use, but we still need
> to estimate the size of the final index [1], to let the cost model cap
> the initial determination when maintenance_work_mem is just too low.
> (This cap will rarely be applied in practice, as I said.)
>
> [1]
> https://wiki.postgresql.org/wiki/Parallel_External_Sort#bt_estimated_nblocks.28.29_function_in_pageinspect

Having looked at it some more, this no longer seems worthwhile. In the next revision, I will add a backstop that limits the use of parallelism based on a lack of maintenance_work_mem in a simpler manner. Namely, the worker will have to be left with a maintenance_work_mem/nworkers share of no less than 32MB in order for parallel CREATE INDEX to proceed. There doesn't seem to be any great reason to bring the volume of data to be sorted into it.

I expect the cost model to be significantly simplified in the next revision in other ways, too. There will be no new index storage parameter, nor a disable_parallelddl GUC. compute_parallel_worker() will be called in a fairly straightforward way within plan_create_index_workers(), using heap blocks, as agreed to already.

pg_restore will avoid parallelism (that will happen by setting "max_parallel_workers_maintenance = 0" when it runs), not because it cannot trust the cost model, but because it prefers to parallelize things its own way (with multiple restore jobs), and because execution speed may not be the top priority for pg_restore, unlike a live production system.

--
Peter Geoghegan
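[Editor's note: the 32MB-per-worker backstop described above could be sketched roughly as follows. This is an illustrative reconstruction, not code from the patch; the function and macro names are stand-ins.]

```c
/*
 * Illustrative sketch only -- not patch code.  It models the proposed
 * backstop: each worker must be left with a maintenance_work_mem/nworkers
 * share of at least 32MB, or the worker count is scaled back (ultimately
 * to a serial build).
 */
#define MIN_SHARE_KB (32 * 1024)        /* 32MB, expressed in KB */

static int
cap_workers_by_workmem(int nworkers, long maintenance_work_mem_kb)
{
    /* Shed workers until each remaining one gets at least 32MB */
    while (nworkers > 1 && maintenance_work_mem_kb / nworkers < MIN_SHARE_KB)
        nworkers--;

    /* If even one participant can't get 32MB, don't use parallelism */
    if (maintenance_work_mem_kb < MIN_SHARE_KB)
        nworkers = 0;

    return nworkers;
}
```

For example, with 64MB of maintenance_work_mem a request for 8 workers would be scaled back to 2, since only then does each share reach 32MB.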
Re: [HACKERS] Cost model for parallel CREATE INDEX
On Sat, Mar 4, 2017 at 6:00 AM, Stephen Frost wrote:
>> It is, but I was using that with index size, not table size. I can
>> change it to be table size, based on what you said. But the workMem
>> related cap, which probably won't end up being applied all that often
>> in practice, *should* still do something with projected index size,
>> since that really is what we're sorting, which could be very different
>> (e.g. with partial indexes).
>
> Isn't that always going to be very different, unless you're creating a
> single index across every column in the table..? Or perhaps I've
> misunderstood what you're comparing as being 'very different' in your
> last sentence.

I mean: though a primary key index or similar is smaller than the table by maybe 5X, they are still generally within an order of magnitude. Given that the number of workers is determined at logarithmic intervals, it may not actually matter that much whether the scaling is based on heap size (input size) or index size (output size), at a very high level. Despite a 5X difference.

I'm referring to the initial determination of the number of workers to be used, based on the scan the parallel CREATE INDEX has to do. So, I'm happy to go along with Robert's suggestion for V9, and have this number determined based on heap input size rather than index output size. It's good to be consistent with what we do for parallel seq scan (care about input size), and it probably won't change things by much anyway. This is generally the number that the cost model will end up going with, in practice.

However, we then need to consider that since maintenance_work_mem is doled out as maintenance_work_mem/nworkers slices for parallel CREATE INDEX, there is a sensitivity to how much memory is left per worker as workers are added. This clearly needs to be based on projected/estimated index size (output size), since that is what is being sorted, and because partial indexes imply that the size of the index could be *vastly* less than heap input size with still-sensible use of the feature. This will be applied as a cap on the first number.

So, I agree with Robert that we should actually use heap size for the main, initial determination of # of workers to use, but we still need to estimate the size of the final index [1], to let the cost model cap the initial determination when maintenance_work_mem is just too low. (This cap will rarely be applied in practice, as I said.)

[1] https://wiki.postgresql.org/wiki/Parallel_External_Sort#bt_estimated_nblocks.28.29_function_in_pageinspect

--
Peter Geoghegan
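[Editor's note: the point about logarithmic intervals can be illustrated with a rough, hypothetical model of compute_parallel_worker()-style scaling -- one worker once the relation crosses the threshold, one more each time its size triples. This is not the actual PostgreSQL function, just a sketch of the shape of the curve.]

```c
/*
 * Rough, illustrative model of compute_parallel_worker()-style
 * logarithmic scaling: one worker once rel_pages crosses the
 * threshold, plus one more each time the size triples.
 */
static int
log3_workers(long rel_pages, long threshold_pages, int max_workers)
{
    int workers;

    if (rel_pages < threshold_pages)
        return 0;

    workers = 1;
    while (rel_pages >= threshold_pages * 3 && workers < max_workers)
    {
        workers++;
        threshold_pages *= 3;
    }
    return workers;
}
```

With an 8MB threshold (1,024 pages of 8KB), a 10GB heap (1,310,720 pages) yields 7 workers while a 5X smaller 2GB index (262,144 pages) yields 6; despite the 5X size difference, the scaled worker counts differ by only one.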
Re: [HACKERS] Cost model for parallel CREATE INDEX
Peter,

* Peter Geoghegan (p...@bowt.ie) wrote:
> On Sat, Mar 4, 2017 at 12:50 AM, Robert Haas wrote:
>> If the result of
>> compute_parallel_workers() based on min_parallel_table_scan_size is
>> smaller, then use that value instead. I must be confused, because I
>> actually thought that was the exact algorithm you were describing, and
>> it sounded good to me.
>
> It is, but I was using that with index size, not table size. I can
> change it to be table size, based on what you said. But the workMem
> related cap, which probably won't end up being applied all that often
> in practice, *should* still do something with projected index size,
> since that really is what we're sorting, which could be very different
> (e.g. with partial indexes).

Isn't that always going to be very different, unless you're creating a single index across every column in the table..? Or perhaps I've misunderstood what you're comparing as being 'very different' in your last sentence.

Thanks!

Stephen
Re: [HACKERS] Cost model for parallel CREATE INDEX
On Sat, Mar 4, 2017 at 12:50 AM, Robert Haas wrote:
> If you think parallelism isn't worthwhile unless the sort was going to
> be external anyway,

I don't -- that's just when it starts to look like a safe bet that parallelism is worthwhile. There are quite a few cases where an external sort is faster than an internal sort these days, actually.

> then it seems like the obvious thing to do is
> divide the projected size of the sort by maintenance_work_mem, round
> down, and cap the number of workers to the result.

I'm sorry, I don't follow.

> If the result of
> compute_parallel_workers() based on min_parallel_table_scan_size is
> smaller, then use that value instead. I must be confused, because I
> actually thought that was the exact algorithm you were describing, and
> it sounded good to me.

It is, but I was using that with index size, not table size. I can change it to be table size, based on what you said. But the workMem related cap, which probably won't end up being applied all that often in practice, *should* still do something with projected index size, since that really is what we're sorting, which could be very different (e.g. with partial indexes).

--
Peter Geoghegan
Re: [HACKERS] Cost model for parallel CREATE INDEX
On Sat, Mar 4, 2017 at 2:17 PM, Peter Geoghegan wrote:
> On Sat, Mar 4, 2017 at 12:43 AM, Robert Haas wrote:
>> Oh. But then I don't see why you need min_parallel_anything. That's
>> just based on an estimate of the amount of data per worker vs.
>> maintenance_work_mem, isn't it?
>
> Yes -- and it's generally a pretty good estimate.
>
> I don't really know what minimum amount of memory to insist workers
> have, which is why I provisionally chose one of those GUCs as the
> threshold.
>
> Any better ideas?

I don't understand how min_parallel_anything is telling you anything about memory. It has, in general, nothing to do with that.

If you think parallelism isn't worthwhile unless the sort was going to be external anyway, then it seems like the obvious thing to do is divide the projected size of the sort by maintenance_work_mem, round down, and cap the number of workers to the result. If the result of compute_parallel_workers() based on min_parallel_table_scan_size is smaller, then use that value instead. I must be confused, because I actually thought that was the exact algorithm you were describing, and it sounded good to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
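[Editor's note: a minimal sketch of the capping rule Robert describes -- cap the scan-based worker count by the number of maintenance_work_mem-sized chunks in the projected sort, so parallelism is only used where the sort would have been external anyway. The names are illustrative, not from the patch; scan_workers stands in for the result of compute_parallel_workers() based on min_parallel_table_scan_size.]

```c
/*
 * Illustrative sketch of the capping rule, not actual patch code.
 * scan_workers stands in for the result of compute_parallel_workers()
 * based on min_parallel_table_scan_size.
 */
static int
cap_by_sort_size(int scan_workers, long projected_sort_kb,
                 long maintenance_work_mem_kb)
{
    /* How many maintenance_work_mem-sized chunks does the sort contain? */
    long mem_cap = projected_sort_kb / maintenance_work_mem_kb;

    /* Use whichever determination is smaller */
    return (mem_cap < (long) scan_workers) ? (int) mem_cap : scan_workers;
}
```

For example, a projected 1GB sort with 64MB of maintenance_work_mem would allow up to 16 workers, so a scan-based suggestion of 4 stands; a 100MB sort caps it to 1; anything under 64MB caps it to 0, i.e. a serial build.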
Re: [HACKERS] Cost model for parallel CREATE INDEX
On Sat, Mar 4, 2017 at 12:43 AM, Robert Haas wrote:
> Oh. But then I don't see why you need min_parallel_anything. That's
> just based on an estimate of the amount of data per worker vs.
> maintenance_work_mem, isn't it?

Yes -- and it's generally a pretty good estimate.

I don't really know what minimum amount of memory to insist workers have, which is why I provisionally chose one of those GUCs as the threshold.

Any better ideas?

--
Peter Geoghegan
Re: [HACKERS] Cost model for parallel CREATE INDEX
On Sat, Mar 4, 2017 at 2:01 PM, Peter Geoghegan wrote:
> On Sat, Mar 4, 2017 at 12:23 AM, Robert Haas wrote:
>>> I guess that the workMem scaling threshold thing could be
>>> min_parallel_index_scan_size, rather than min_parallel_relation_size
>>> (which we now call min_parallel_table_scan_size)?
>>
>> No, it should be based on min_parallel_table_scan_size, because that
>> is the size of the parallel heap scan that will be done as input to
>> the sort.
>
> I'm talking about the extra thing we do to prevent parallelism from
> being used when per-worker workMem is excessively low. That has much
> more to do with projected index size than current heap size.

Oh. But then I don't see why you need min_parallel_anything. That's just based on an estimate of the amount of data per worker vs. maintenance_work_mem, isn't it?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Cost model for parallel CREATE INDEX
On Sat, Mar 4, 2017 at 12:23 AM, Robert Haas wrote:
>> I guess that the workMem scaling threshold thing could be
>> min_parallel_index_scan_size, rather than min_parallel_relation_size
>> (which we now call min_parallel_table_scan_size)?
>
> No, it should be based on min_parallel_table_scan_size, because that
> is the size of the parallel heap scan that will be done as input to
> the sort.

I'm talking about the extra thing we do to prevent parallelism from being used when per-worker workMem is excessively low. That has much more to do with projected index size than current heap size.

I agree with everything else you've said, I think.

--
Peter Geoghegan
Re: [HACKERS] Cost model for parallel CREATE INDEX
On Thu, Mar 2, 2017 at 10:38 PM, Peter Geoghegan wrote:
> I'm glad. This justifies the lack of much of any "veto" on the
> logarithmic scaling. The only thing that can do that is
> max_parallel_workers_maintenance, the storage parameter
> parallel_workers (maybe this isn't a storage parameter in V9), and
> insufficient maintenance_work_mem per worker (as judged by
> min_parallel_relation_size being greater than workMem per worker).
>
> I guess that the workMem scaling threshold thing could be
> min_parallel_index_scan_size, rather than min_parallel_relation_size
> (which we now call min_parallel_table_scan_size)?

No, it should be based on min_parallel_table_scan_size, because that is the size of the parallel heap scan that will be done as input to the sort.

>> I think it's totally counter-intuitive that any hypothetical index
>> storage parameter would affect the degree of parallelism involved in
>> creating the index and also the degree of parallelism involved in
>> scanning it. Whether or not other systems do such crazy things seems
>> to me to be beside the point. I think if CREATE INDEX allows an
>> explicit specification of the degree of parallelism (a decision I
>> would favor) it should have a syntactically separate place for
>> unsaved build options vs. persistent storage parameters.
>
> I can see both sides of it.
>
> On the one hand, it's weird that you might have query performance
> adversely affected by what you thought was a storage parameter that
> only affected the index build. On the other hand, it's useful that you
> retain that as a parameter, because you may want to periodically
> REINDEX, or have a way of ensuring that pg_restore does go on to use
> parallelism, since it generally won't otherwise. (As mentioned
> already, pg_restore does not trust the cost model due to issues with
> the availability of statistics).

If you make the changes I'm proposing above, this parenthetical issue goes away, because the only statistic you need is the table size, which is what it is.

As to the rest, I think a bare REINDEX should just use the cost model as if it were CREATE INDEX, and if you want to override that behavior, you can do that by explicit syntax. I see very little utility for a setting that fixes the number of workers to be used for future reindexes: there won't be many of them, and it's kinda confusing. But even if we decide to have that, I see no justification at all for conflating it with the number of workers to be used for a scan, which is something else altogether.

> To be clear, I don't have any strong feelings on all this. I just
> think it's worth pointing out that there are reasons to not do what
> you suggest, that you might want to consider if you haven't already.

I have considered them. I also acknowledge that other people may view the situation differently than I do. I'm just telling you my opinion on the topic.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Cost model for parallel CREATE INDEX
On Thu, Mar 2, 2017 at 5:50 AM, Robert Haas wrote:
> On Wed, Mar 1, 2017 at 12:58 AM, Peter Geoghegan wrote:
>> * This scales based on output size (projected index size), not input
>> size (heap scan input). Apparently, that's what we always do right
>> now.
>
> Actually, I'm not aware of any precedent for that. I'd just pass the
> heap size to compute_parallel_workers(), leaving the index size as 0,
> and call it good. What you're doing now seems exactly backwards from
> parallel query generally.

Sorry, that's what I meant.

>> So, the main factor that
>> discourages parallel sequential scans doesn't really exist for
>> parallel CREATE INDEX.
>
> Agreed.

I'm glad. This justifies the lack of much of any "veto" on the logarithmic scaling. The only thing that can do that is max_parallel_workers_maintenance, the storage parameter parallel_workers (maybe this isn't a storage parameter in V9), and insufficient maintenance_work_mem per worker (as judged by min_parallel_relation_size being greater than workMem per worker).

I guess that the workMem scaling threshold thing could be min_parallel_index_scan_size, rather than min_parallel_relation_size (which we now call min_parallel_table_scan_size)?

In general, I would expect this to leave most CREATE INDEX statements with a parallel plan in the real world, using exactly the number of workers indicated by the logarithmic scaling. (pg_restore would also not use parallelism, because it's specially disabled -- you have to have set the storage param at some point.)

>> We could always defer the cost model to another release, and only
>> support the storage parameter for now, though that has disadvantages,
>> some less obvious [4].
>
> I think it's totally counter-intuitive that any hypothetical index
> storage parameter would affect the degree of parallelism involved in
> creating the index and also the degree of parallelism involved in
> scanning it. Whether or not other systems do such crazy things seems
> to me to be beside the point. I think if CREATE INDEX allows an
> explicit specification of the degree of parallelism (a decision I
> would favor) it should have a syntactically separate place for
> unsaved build options vs. persistent storage parameters.

I can see both sides of it.

On the one hand, it's weird that you might have query performance adversely affected by what you thought was a storage parameter that only affected the index build. On the other hand, it's useful that you retain that as a parameter, because you may want to periodically REINDEX, or have a way of ensuring that pg_restore does go on to use parallelism, since it generally won't otherwise. (As mentioned already, pg_restore does not trust the cost model due to issues with the availability of statistics).

There are reports on Google of users of these other systems being confused by all this, and I don't think that it's any different there (those other systems don't treat a parallel_workers style storage parameter much different for the purposes of index scans, or anything like that). I agree that that isn't very user friendly.

In theory, having two index storage parameters solves our problem. I don't like that either, though, since it creates a whole new problem.

To be clear, I don't have any strong feelings on all this. I just think it's worth pointing out that there are reasons to not do what you suggest, that you might want to consider if you haven't already.

--
Peter Geoghegan
Re: [HACKERS] Cost model for parallel CREATE INDEX
On Wed, Mar 1, 2017 at 12:58 AM, Peter Geoghegan wrote:
> * This scales based on output size (projected index size), not input
> size (heap scan input). Apparently, that's what we always do right
> now.

Actually, I'm not aware of any precedent for that. I'd just pass the heap size to compute_parallel_workers(), leaving the index size as 0, and call it good. What you're doing now seems exactly backwards from parallel query generally.

> So, the main factor that
> discourages parallel sequential scans doesn't really exist for
> parallel CREATE INDEX.

Agreed.

> We could always defer the cost model to another release, and only
> support the storage parameter for now, though that has disadvantages,
> some less obvious [4].

I think it's totally counter-intuitive that any hypothetical index storage parameter would affect the degree of parallelism involved in creating the index and also the degree of parallelism involved in scanning it. Whether or not other systems do such crazy things seems to me to be beside the point. I think if CREATE INDEX allows an explicit specification of the degree of parallelism (a decision I would favor) it should have a syntactically separate place for unsaved build options vs. persistent storage parameters.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] Cost model for parallel CREATE INDEX
There are a couple of open items for the parallel CREATE INDEX patch that at this point represent blockers to commit, IMV. The first is around a deficiency in the shared refcount mechanism, which is well understood and doesn't need to be rehashed on this thread. The second is the cost model, which is what I want to talk about here.

Currently, the cost model scales the number of workers at logarithmic intervals, in the style of compute_parallel_worker(), but without considering heap pages (that's actually broken out, for now). I'm calling a new function that lives in planner.c, right next to plan_cluster_use_sort(). ISTM that we ought to be considering both heap pages and index pages, which makes the new signature of compute_parallel_worker() (which now anticipates the needs of parallel index scans) interesting to me, so that's something that I need to address.

Right now, as of V8, we:

0. See if the parallel_workers *index* storage parameter is set (I've added this new storage parameter, but I think that Amit has or will add the same new index storage parameter for parallel index scan [1]).

1. Estimate/project the size of the finished index, using a new nbtpage.c function. Note that this does the right thing with partial indexes and things like that. It uses pg_statistic stats, where available. (My testing patch 0002-* has had an SQL-callable function that lets reviewers easily determine what the projected size of some existing index is, which might be a good idea to polish up and include as a general purpose tool, apropos of nothing -- it is typically very accurate [2]).

2. Input that size into the compute_parallel_worker()-style logarithmic scaling of number of workers.

3. Calculate how many workers will have at least a full maintenance_work_mem share doled out, while still having at least min_parallel_relation_size of workMem in tuplesort.c. (I guess I should say min_parallel_table_scan_size or even min_parallel_index_scan_size now, but whatever).

4. Return the smaller of the worker counts from steps 2 and 3.

So, a low maintenance_work_mem may cap our original suggested number of workers. This cap isn't particularly likely to be applied, though. Note also that the max_parallel_workers_maintenance GUC is given the opportunity to cap things off. This is the utility statement equivalent of max_parallel_workers_per_gather.

Issues with this:

* This scales based on output size (projected index size), not input size (heap scan input). Apparently, that's what we always do right now.

* This is dissimilar to how we cost parallel seq scans. There, the only cost associated with going with a parallel access path is fixed startup overheads, and IPC costs (paid in parallel_tuple_cost units). So, we're not doing a comparison against a serial and a parallel plan, even though we might want to, especially because parallel CREATE INDEX always uses temp files, unlike serial CREATE INDEX. cost_sort() is never involved in any of this, and in any case isn't prepared to cost parallel sorts right now.

* OTOH, there is less sense in doing the equivalent of charging for IPC overhead that something like a Gather node incurs costs for during planning, because to some degree the IPC involved is inherently necessary. If you were going to get an external sort anyway, well, that still involves temp files that are written to and read from. Whether or not it's the same backend that does the writing as the reading in all cases may matter very little. So, the main factor that discourages parallel sequential scans doesn't really exist for parallel CREATE INDEX.

I am tempted to move to something closer to what you see elsewhere, where a serial path and a partial path are both created. This would make some sense for parallel CREATE INDEX, because you happen to have the issue of parallelism effectively forcing an external sort. But otherwise, it wouldn't make that much sense, because parallelism is always going to help up to the point that all cores are in use, or at least not hurt. Testing shows this to be the case. It's not as if there are obvious IPC costs that push things against parallelism. This is generally good, because the danger of using too many workers is much less pronounced -- it's demonstrably very small, especially if you assume a baseline of a serial external sort based CREATE INDEX.

What direction does the cost model need to go in? I still lean towards the approach V8 takes, though the scaling should possibly use heap pages and index pages with compute_parallel_worker(), while not worrying about the input/output distinction. It's not very appealing to have to teach cost_sort() about any of this, since the things that it considers currently are hard to map onto parallelism. Besides, it's not as if there is a world of difference between a serial internal sort CREATE INDEX, and a parallel external sort CREATE INDEX with lots of memory. It's not a potentially very large difference, as we see with the sort
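[Editor's note: pulling steps 0-4 of the V8 cost model together, the determination might look roughly like the following. This is a hedged reconstruction from the description above, not the patch itself; every name is an illustrative stand-in, log3_scale() stands in for the compute_parallel_worker()-style scaling, and step 3's notion of a sufficient per-worker share is simplified to min_parallel_relation_size expressed in KB (8KB pages assumed).]

```c
/*
 * Hedged reconstruction of the V8 cost model steps, for illustration.
 * Not the patch's actual code; all names are stand-ins.
 */
static int
log3_scale(long pages, long threshold_pages, int max_workers)
{
    int workers;

    if (pages < threshold_pages)
        return 0;
    workers = 1;
    while (pages >= threshold_pages * 3 && workers < max_workers)
    {
        workers++;
        threshold_pages *= 3;
    }
    return workers;
}

static int
plan_create_index_workers_sketch(int storage_param_workers,   /* step 0 */
                                 long projected_index_pages,  /* step 1 */
                                 long min_parallel_rel_pages,
                                 long maintenance_work_mem_kb,
                                 int max_parallel_workers_maintenance)
{
    int scaled;
    int mem_workers;

    /* Step 0: an explicit parallel_workers storage parameter wins */
    if (storage_param_workers >= 0)
        return storage_param_workers;

    /* Step 2: logarithmic scaling on the projected index size */
    scaled = log3_scale(projected_index_pages, min_parallel_rel_pages,
                        max_parallel_workers_maintenance);

    /*
     * Step 3 (simplified): workers that can each be doled out a workMem
     * share of at least min_parallel_relation_size (8KB pages -> KB).
     */
    mem_workers = (int) (maintenance_work_mem_kb /
                         (min_parallel_rel_pages * 8));

    /* Step 4: return the smaller of the two determinations */
    return (mem_workers < scaled) ? mem_workers : scaled;
}
```

For instance, a projected 2GB index (262,144 pages) against an 8MB threshold and 1GB of maintenance_work_mem is not memory-constrained, so the logarithmic scaling decides; drop maintenance_work_mem to 32MB and the memory-based cap takes over.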