There are a couple of open items for the parallel CREATE INDEX patch that at this point represent blockers to commit, IMV. The first is around a deficiency in the shared refcount mechanism, which is well understood and doesn't need to be rehashed on this thread. The second is the cost model, which is what I want to talk about here.
Currently, the cost model scales the number of workers at logarithmic intervals, in the style of compute_parallel_worker(), but without considering heap pages (that's actually broken out, for now). I'm calling a new function that lives in planner.c, right next to plan_cluster_use_sort(). ISTM that we ought to be considering both heap pages and index pages, which makes the new signature of compute_parallel_worker() (which now anticipates the needs of parallel index scans) interesting to me, so that's something that I need to address.

Right now, as of V8, we:

0. See if the parallel_workers *index* storage parameter is set (I've added this new storage parameter, but I think that Amit has added, or will add, the same new index storage parameter for parallel index scan).

1. Estimate/project the size of the finished index, using a new nbtpage.c function. Note that this does the right thing with partial indexes and things like that. It uses pg_statistic stats, where available. (My testing patch 0002-* has had an SQL-callable function that lets reviewers easily determine what the projected size of some existing index is, which might be a good idea to polish up and include as a general purpose tool, apropos of nothing -- it is typically very accurate.)

2. Input that size into the compute_parallel_worker()-style logarithmic scaling of the number of workers.

3. Calculate how many workers can have a maintenance_work_mem share doled out while each still gets at least min_parallel_relation_size of workMem in tuplesort.c. (I guess I should say min_parallel_table_scan_size or even min_parallel_index_scan_size now, but whatever.)

4. Return the minimum worker calculation from steps 2 and 3. So, a low maintenance_work_mem may cap our original suggested number of workers. This cap isn't particularly likely to be applied, though.

Note also that the max_parallel_workers_maintenance GUC is given the opportunity to cap things off.
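To make the shape of this concrete, here is a rough standalone sketch of steps 2 through 4, under the assumption that they work as described above. All names here are illustrative, not the patch's actual identifiers; step 0 (the storage parameter) and step 1 (the nbtpage.c size projection) are taken as inputs:

```c
#include <assert.h>

#define BLCKSZ 8192             /* default PostgreSQL block size */

/*
 * Hypothetical sketch of the V8 worker calculation (steps 2-4 above).
 * projected_index_blocks stands in for the nbtpage.c size projection
 * from step 1; the other inputs correspond to GUCs.
 */
static int
sketch_index_build_workers(double projected_index_blocks,
                           double maintenance_work_mem_bytes,
                           double min_parallel_relation_bytes,
                           int max_parallel_workers_maintenance)
{
    double      threshold_blocks = min_parallel_relation_bytes / BLCKSZ;
    int         log_scaled = 1;
    int         mem_capped;
    int         workers;

    /*
     * Step 2: compute_parallel_worker()-style logarithmic scaling --
     * one more worker each time the projected index size triples.
     */
    while (projected_index_blocks >= threshold_blocks * 3)
    {
        log_scaled++;
        threshold_blocks *= 3;
    }

    /*
     * Step 3: the number of participants that can each be doled out a
     * workMem share of at least min_parallel_relation_size.
     */
    mem_capped = (int) (maintenance_work_mem_bytes /
                        min_parallel_relation_bytes);

    /* Step 4: take the minimum; the GUC then caps things off. */
    workers = (log_scaled < mem_capped) ? log_scaled : mem_capped;
    if (workers > max_parallel_workers_maintenance)
        workers = max_parallel_workers_maintenance;
    return workers;
}
```

For example, with an 800 MB projected index (102,400 blocks), maintenance_work_mem of 64 MB, and min_parallel_relation_size of 8 MB, the logarithmic scaling suggests 5 workers while the memory cap would allow 8, so 5 is returned; dropping maintenance_work_mem to 16 MB makes the memory cap (2) the binding limit instead.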
This is the utility statement equivalent of max_parallel_workers_per_gather.

Issues with this:

* This scales based on output size (projected index size), not input size (heap scan input). Apparently, that's what we always do right now.

* This is dissimilar to how we cost parallel seq scans. There, the only costs associated with going with a parallel access path are fixed startup overheads and IPC costs (paid in parallel_tuple_cost units). So, we're not doing a comparison between a serial plan and a parallel plan, even though we might want to, especially because parallel CREATE INDEX always uses temp files, unlike serial CREATE INDEX. cost_sort() is never involved in any of this, and in any case isn't prepared to cost parallel sorts right now.

* OTOH, there is less sense in doing the equivalent of charging for the IPC overhead that something like a Gather node incurs costs for during planning, because to some degree the IPC involved here is inherently necessary. If you were going to get an external sort anyway, that still involves temp files that are written to and read from; whether or not it's the same backend that does the writing as the reading may matter very little. So, the main factor that discourages parallel sequential scans doesn't really exist for parallel CREATE INDEX.

I am tempted to move to something closer to what you see elsewhere, where a serial path and a partial path are both created. This would make some sense for parallel CREATE INDEX, because you happen to have the issue of parallelism effectively forcing an external sort. But otherwise, it wouldn't make that much sense, because parallelism is always going to help up to the point that all cores are in use, or at least not hurt. Testing shows this to be the case. It's not as if there are obvious IPC costs that push things against parallelism.
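If the scaling did end up considering both heap pages (input) and index pages (output), a simplified standalone sketch of compute_parallel_worker()-style logic might look like the following. The real function works against a RelOptInfo and the min_parallel_table_scan_size / min_parallel_index_scan_size GUCs; the names, and the choice to take the smaller of the two suggestions, are assumptions here:

```c
#include <assert.h>

/*
 * One additional worker each time the relation triples in size past
 * the threshold -- the same logarithmic scaling discussed above.
 */
static int
log3_workers(double pages, double threshold_pages)
{
    int         workers = 1;

    while (pages >= threshold_pages * 3)
    {
        workers++;
        threshold_pages *= 3;
    }
    return workers;
}

/*
 * Simplified sketch of compute_parallel_worker()-style scaling over
 * both heap pages (input) and index pages (output).  Taking the
 * smaller of the two per-relation suggestions is the conservative
 * choice; max_workers stands in for the relevant limiting GUC.
 */
static int
sketch_compute_parallel_worker(double heap_pages, double index_pages,
                               double heap_threshold_pages,
                               double index_threshold_pages,
                               int max_workers)
{
    int         heap_workers = log3_workers(heap_pages,
                                            heap_threshold_pages);
    int         index_workers = log3_workers(index_pages,
                                             index_threshold_pages);
    int         workers = (heap_workers < index_workers) ? heap_workers
                                                         : index_workers;

    return (workers < max_workers) ? workers : max_workers;
}
```

With, say, 100,000 heap pages against a 1,024-page threshold and 30,000 index pages against a 128-page threshold, both sides suggest 5 workers; a very small projected index would pull the answer down to 1 regardless of heap size.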
This is generally good, because the danger of using too many workers is much less pronounced -- it's demonstrably very small, especially if you assume a baseline of a serial external sort based CREATE INDEX.

What direction does the cost model need to go in? I still lean towards the approach V8 takes, though the scaling should possibly use heap pages and index pages with compute_parallel_worker(), while not worrying about the input/output distinction. It's not very appealing to have to teach cost_sort() about any of this, since the things it currently considers are hard to map onto parallelism. Besides, it's not as if there is a world of difference between a serial internal sort CREATE INDEX and a parallel external sort CREATE INDEX with lots of memory. There is no potentially very large difference of the kind we see with the sort vs. index scan issue for CLUSTER, considered by plan_cluster_use_sort(). I think that cost_sort() worries too much about some things, and not enough about others, so it's hard to imagine that it will ever consistently do the right thing when we're expecting only small differences in cost.

We could always defer the cost model to another release, and only support the storage parameter for now, though that has disadvantages, some less obvious.

https://wiki.postgresql.org/wiki/Parallel_External_Sort#parallel_workers_index_storage_parameter
https://wiki.postgresql.org/wiki/Parallel_External_Sort#bt_estimated_nblocks.28.29_function_in_pageinspect
https://www.postgresql.org/message-id/CAMkU=1y_qp+QUPGk=JBJSTtcYQpW2k=v2lmytzko_8ftuuy...@mail.gmail.com
https://www.postgresql.org/message-id/cam3swzr6c+1cwghc40g9z5thfe3u2xbv55w5-tertfeooaz...@mail.gmail.com

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers