Re: [HACKERS] Parallel Index Scans

Amit Kapila Wed, 08 Feb 2017 22:34:38 -0800

On Sat, Feb 4, 2017 at 7:14 AM, Amit Kapila <[email protected]> wrote:
> On Sat, Feb 4, 2017 at 5:54 AM, Robert Haas <[email protected]> wrote:
>> On Wed, Feb 1, 2017 at 12:58 AM, Amit Kapila <[email protected]> wrote:
>
>> On balance, I'm somewhat inclined to think that we ought to base
>> everything on heap pages, so that we're always measuring in the same
>> units.  That's what Dilip's patch for parallel bitmap heap scan does,
>> and I think it's a reasonable choice.  However, for parallel index
>> scan, we might want to also cap the number of workers to, say,
>> index_pages/10, just so we don't pick an index scan that's going to
>> result in a very lopsided work distribution.
>>
>
> I guess in the above context you mean heap_pages or index_pages that
> are expected to be *fetched* during index scan.
>
> Yet another thought is that for parallel index scan we use
> index_pages_fetched, but use either a different GUC
> (min_parallel_index_rel_size) with a relatively lower default value
> (say equal to min_parallel_relation_size/4 = 2MB) or directly use
> min_parallel_relation_size/4 for parallel index scans.
>


I had some offlist discussion with Robert about the above point, and we
feel that basing the parallel worker computation only on heap pages
might not be future-proof, because parallel index-only scans might not
fetch any heap pages.  So, it is better to use a separate GUC for
parallel index (only) scans.  We can have two GUCs,
min_parallel_table_scan_size (8MB) and min_parallel_index_scan_size
(512kB), for computing parallel scans.  Parallel sequential scans and
parallel bitmap heap scans can use min_parallel_table_scan_size as the
threshold for computing parallel workers, as we do now.  For parallel
index scans, both min_parallel_table_scan_size and
min_parallel_index_scan_size can be used as thresholds: we can compute
parallel workers based both on the heap pages to be scanned and on the
index pages to be scanned, and then keep the minimum of the two.  This
will let us engage parallel index scans when the number of index pages
is below its threshold but there are many heap pages to be scanned, and
will also keep a maximum cap on the number of workers based on the
index scan size.

guc_parallel_index_scan_v1.patch - Renames the existing
min_parallel_relation_size to min_parallel_table_scan_size and adds a
new GUC, min_parallel_index_scan_size, with a default value of 512kB.
This patch also adjusts the computation in compute_parallel_worker to
use both GUCs.

compute_index_pages_v2.patch - Extracts the computation of the index
pages to be scanned into a separate function and uses it in the
existing code.  You will notice that I have pulled the logic that
converts clauses to indexquals up from create_index_path into
build_index_paths, as that is required to compute the number of index
and heap pages to be scanned by the scan in
parallel_index_opt_exec_support_v8.patch.  This doesn't impact any
existing functionality.

parallel_index_scan_v7.patch - Patch to parallelize btree scans;
nothing has changed from the previous version (just rebased on the
latest head).

parallel_index_opt_exec_support_v8.patch - Contains the changes to
compute parallel workers using both the heap and index pages that need
to be scanned.

Patches guc_parallel_index_scan_v1.patch and
compute_index_pages_v2.patch are independent patches; both are
required by the parallel index scan patches.

The current set of patches addresses all the reported comments.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

guc_parallel_index_scan_v1.patch

compute_index_pages_v2.patch

parallel_index_scan_v7.patch

parallel_index_opt_exec_support_v8.patch

-- 
Sent via pgsql-hackers mailing list ([email protected])