I have done a performance analysis for the TPCH queries and saw a
visible gain in 5 queries (10-25%).

Performance Data:

Benchmark : TPCH (S.F. 10)
shared_buffer : 20GB
work_mem      : 50MB
Machine       : POWER

Results are the median of three runs (explain analyze output for both
head and patch is attached in TPCH_out.tar).

          Query Execution Time (ms)

Query      Head      Patch    Improvement
Q3        18475      16558        10%
Q4         7526       5856        22%
Q7        19386      17425        10%
Q10       16994      15019        11%
Q12       13852      10117        26%

Currently we have two major problems with this patch.

Problem 1: As Andres has mentioned, HeapKeyTest uses heap_getattr,
whereas ExecQual uses slot_getattr(). So we can hit a worst-case
performance problem when very few tuples are filtered out and the
table has many columns with quals on most of them.
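To illustrate why this matters, here is a toy standalone cost model of
the two access patterns; the function names and cost units are purely
illustrative and not the actual PostgreSQL code. The idea is that
heap_getattr may have to walk the tuple from the first attribute for
each key, while slot_getattr deforms incrementally and remembers how
far it got:

```c
/*
 * Toy model (illustrative only): cost of fetching the attributes
 * listed in attnums[] (1-based, ascending) from an on-disk tuple.
 */

/* heap_getattr-style: each fetch re-walks the preceding columns
 * when attribute offsets cannot be cached. */
long
heap_getattr_model_cost(const int *attnums, int nkeys)
{
    long cost = 0;

    for (int i = 0; i < nkeys; i++)
        cost += attnums[i];     /* walk from column 1 every time */
    return cost;
}

/* slot_getattr-style: deform incrementally, caching what has
 * already been extracted into the slot. */
long
slot_getattr_model_cost(const int *attnums, int nkeys)
{
    long deformed = 0;          /* columns already extracted */
    long cost = 0;

    for (int i = 0; i < nkeys; i++)
    {
        if (attnums[i] > deformed)
        {
            cost += attnums[i] - deformed;  /* deform only the gap */
            deformed = attnums[i];
        }
        /* already-deformed columns are a cheap array lookup */
    }
    return cost;
}
```

With quals on columns 10, 20 and 30, the first model pays 10+20+30
units while the second pays only 30, which is the gap that widens as
more columns carry quals.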

Problem 2: In HeapKeyTest we are under the per-query context, whereas
in ExecQual we are under the per-tuple context, so in the former we
cannot afford any palloc.

In this patch I have addressed both concerns by exposing executor
information to the heap layer (I exposed the per-tuple context and the
slot to the heap scan descriptor), which is not a very good design.

I have other ideas in mind for solving these concerns; please share
your thoughts.

For problem 1:
I think it is better to give the task of key pushdown to the
optimizer, where we can take the decision mostly based on two
parameters:
   1. Selectivity.
   2. The column number on which the qual is given.
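To make that concrete, here is a minimal standalone sketch of the kind
of heuristic the planner could apply. The function name, the
thresholds and the parameter names are all hypothetical, not part of
the patch:

```c
#include <stdbool.h>

/*
 * Hypothetical planner heuristic: push a qual key down to the heap
 * only when it is likely to pay off.
 *
 * selectivity - estimated fraction of tuples that pass the qual
 * attnum      - 1-based column number the qual references
 *
 * Rationale:
 *   - Low selectivity (many tuples rejected) means the early
 *     rejection inside the heap scan saves the most work.
 *   - A low column number keeps the heap_getattr walk cheap, since
 *     it has to step over all preceding attributes.
 */
bool
pushdown_profitable(double selectivity, int attnum)
{
    const double max_selectivity = 0.5;  /* made-up cutoff */
    const int    max_attnum = 4;         /* made-up cutoff */

    return selectivity <= max_selectivity && attnum <= max_attnum;
}
```

In a real implementation both cutoffs would of course have to come out
of the cost model rather than being hard-coded constants.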

For problem 2:
I think for solving this we need to limit the datatypes we push down
to the heap (i.e. push down only datatypes whose qual test does not
need palloc):
   1. Don't push down variable-length datatypes.
   2. Some fixed-length datatypes such as 'name' can also palloc
(e.g. nameregexeq), so we need to block them as well.
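The two rules above could be sketched as a simple per-type filter. In
PostgreSQL the real information would come from the pg_type catalog
(typlen etc.); the struct and function names below are illustrative,
not from the patch:

```c
#include <stdbool.h>

/* Hypothetical summary of the type properties we care about. */
typedef struct TypeInfo
{
    int  typlen;          /* -1 varlena, -2 cstring, >0 fixed length */
    bool qual_may_palloc; /* e.g. 'name' via nameregexeq */
} TypeInfo;

/*
 * Sketch of the proposed filter: only push down quals on fixed-length
 * types whose comparison functions are known not to palloc.
 */
bool
type_safe_for_pushdown(const TypeInfo *t)
{
    if (t->typlen <= 0)         /* rule 1: no variable-length types */
        return false;
    if (t->qual_may_palloc)     /* rule 2: block types like 'name' */
        return false;
    return true;
}
```

So a type like int4 (typlen 4, no palloc) would qualify, while text
(varlena) and name (fixed length but its regex operator can palloc)
would not.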

*Note: To see exactly which keys are pushed down in the attached
explain analyze output, refer to this example:

without patch:
    ->  Parallel Seq Scan on orders  (cost=0.00..369972.00 rows=225038 width=20) (actual time=0.025..3216.157 rows=187204 loops=3)
          Filter: ((o_orderdate >= '1995-01-01'::date) AND (o_orderdate < '1995-04-01 00:00:00'::timestamp without time zone))
          Rows Removed by Filter: 4812796

with patch:
    ->  Parallel Seq Scan on orders  (cost=0.00..369972.00 rows=225038 width=20) (actual time=0.015..1884.993 rows=187204 loops=3)
          Filter: ((o_orderdate >= '1995-01-01'::date) AND (o_orderdate < '1995-04-01 00:00:00'::timestamp without time zone))

1. On head, the plan shows how many rows the filter discarded (Rows
Removed by Filter: 4812796). If we push down all the keys, this value
is no longer shown, because those rows are rejected inside the heap
scan before they ever reach the filter.

2. We can also see how much the actual time of the Seq Scan node is
reduced (in the example above it was 3216.157 ms on head versus
1884.993 ms with the patch).

Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

On Thu, Nov 3, 2016 at 7:29 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Nov 1, 2016 at 8:31 PM, Kouhei Kaigai <kai...@ak.jp.nec.com> wrote:
>> By the way, I'm a bit skeptical whether this enhancement is really
>> beneficial than works for this enhancement, because we can now easily
>> increase the number of processor cores to run seq-scan with qualifier,
>> especially, when it has high selectivity.
>> How about your thought?
> Are you saying we don't need to bother making sequential scans faster
> because we could just use parallel sequential scan instead?  That
> doesn't sound right to me.
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company


Attachment: heap_scankey_pushdown_expr_ctx.patch
Description: Binary data

Attachment: TPCH_out.tar
Description: Unix tar archive
