On Sat, Jan 28, 2017 at 1:34 AM, Robert Haas <robertmh...@gmail.com> wrote: > On Mon, Jan 23, 2017 at 1:03 AM, Amit Kapila <amit.kapil...@gmail.com> wrote: >> parallel_index_opt_exec_support_v6 - Removed the usage of >> GatherSupportsBackwardScan. Expanded the comments in >> ExecReScanIndexScan. > > I looked through this and in general it looks reasonable to me. > However, I did notice one thing that I think is wrong. In the > parallel bitmap heap scan patch, the second argument to > compute_parallel_worker() is the number of pages that the parallel > scan is expected to fetch from the heap. In this patch, it's the > total number of pages in the index. The former seems to me to be > better, because the point of having a threshold relation size for > parallelism is that we don't want to use a lot of workers to scan a > small number of pages -- the distribution of work won't be even, and > the potential savings are limited. If we've got a big index but are > using a very selective qual to pull out only one or a small number of > rows on a single page or a small handful of pages, we shouldn't > generate a parallel path for that. >
Agreed, that it makes sense to consider only the number of pages to scan for computation of parallel workers. I think for index scan we should consider both index and heap pages that need to be scanned (costing of index scan consider both index and heap pages). I thin where considering heap pages matter more is when the finally selected rows are scattered across heap pages or we need to apply a filter on rows after fetching from the heap. OTOH, we can consider just pages in the index as that is where mainly the parallelism works. In the attached patch (parallel_index_opt_exec_support_v7.patch), I have considered both index and heap pages, let me know if you think some other way is better. I have also prepared a separate independent patch (compute_index_pages_v1) on HEAD to compute index pages which can be used by parallel index scan. There is no change in parallel_index_scan (parallel btree scan) patch, so I am not attaching its new version. > Now, against that theory, the GUC that controls the behavior of > compute_parallel_worker() is called min_parallel_relation_size, which > might make you think that the decision is supposed to be based on the > whole size of some relation. But I think that just means we need to > rename the GUC to something like min_parallel_scan_size. Possibly we > also ought consider reducing the default value somewhat, because it > seems like both sequential and index scans can benefit even when > scanning less than 8MB. > Agreed, but let's consider it separately. The order in which patches needs to be applied: compute_index_pages_v1.patch, parallel_index_scan_v6.patch[1], parallel_index_opt_exec_support_v7.patch [1] - https://www.postgresql.org/message-id/CAA4eK1J%3DLSBpDx7i_izGJxGVUryqPe-2SKT02De-PrQvywiMxw%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
compute_index_pages_v1.patch
Description: Binary data
parallel_index_opt_exec_support_v7.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers