I have done a performance analysis of the TPC-H queries and saw a visible gain (10-25%) in 5 of them.
Performance Data:

Benchmark      : TPC-H (scale factor 10)
shared_buffers : 20GB
work_mem       : 50MB
Machine        : POWER

Results are the median of three runs (explain analyze results for both
head and patch are attached in TPCH_out.tar).

Query   Head (ms)   Patch (ms)   Improvement
Q3      18475       16558        10%
Q4      7526        5856         22%
Q7      19386       17425        10%
Q10     16994       15019        11%
Q12     13852       10117        26%

Currently we have two major problems with this patch.

Problem1: As Andres has mentioned, HeapKeyTest uses heap_getattr,
whereas ExecQual uses slot_getattr(). So we can hit a worst-case
performance problem when very few tuples are being filtered out and we
have a table with many columns, with quals on most of those columns.

Problem2: In HeapKeyTest we are under the per_query_ctx, whereas in
ExecQual we are under the per_tuple_ctx, so in the former we cannot
afford to do any palloc.

In this patch I have addressed both concerns by exposing executor
information to the heap (I exposed per_tuple_ctx and the slot to
HeapDesc), which is not a very good design. I have other ideas in mind
for solving these concerns; please provide your thoughts.

For problem1: I think it's better to give the task of key pushdown to
the optimizer, where we can take the decision based mostly on two
parameters:
1. Selectivity.
2. The column number on which the qual is given.

For problem2: I think for solving this we need to limit the set of
datatypes we push down to the heap (i.e. we can push down all datatypes
that don't need a palloc in the qual test):
1. Don't push down variable-length datatypes.
2. Some fixed-length datatypes such as 'name' can also palloc (e.g. in
nameregexeq), so we need to block them as well.

*Note: To understand exactly which keys are pushed down in the attached
explain analyze output, refer to this example:
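As an aside, the datatype restriction proposed for problem2 and the
optimizer heuristic proposed for problem1 could be sketched roughly as
below. This is only an illustration of the idea, not code from the
patch; the struct, function names, and the 50% thresholds are all made
up for the example. The typlen/typbyval fields mirror the pg_type
catalog columns: a by-value, fixed-length type (like date) can be
compared without any palloc, whereas a varlena type (like text) cannot.

```c
#include <stdbool.h>

/* Simplified stand-in for the pg_type information we would consult. */
typedef struct TypeInfo
{
    int  typlen;    /* -1 means varlena (variable length) */
    bool typbyval;  /* passed by value => qual test needs no palloc */
} TypeInfo;

/*
 * Problem2 rule: only push down quals on fixed-length, by-value types,
 * so the comparison cannot require a per-tuple palloc.  This also
 * excludes 'name' (fixed length but not by-value), whose regex
 * operators can palloc.
 */
static bool
type_is_pushdown_safe(const TypeInfo *t)
{
    return t->typlen > 0 && t->typbyval;
}

/*
 * Problem1 heuristic: pushing down pays off when the qual discards
 * many tuples (low selectivity) and the column is early in the tuple,
 * since heap_getattr re-walks the preceding attributes on every call
 * while slot_getattr deforms incrementally.
 */
static bool
qual_is_worth_pushing(double selectivity, int attnum, int natts)
{
    return selectivity < 0.5 && attnum <= natts / 2;
}
```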
without patch:

->  Parallel Seq Scan on orders  (cost=0.00..369972.00 rows=225038 width=20) (actual time=0.025..3216.157 rows=187204 loops=3)
      Filter: ((o_orderdate >= '1995-01-01'::date) AND (o_orderdate < '1995-04-01 00:00:00'::timestamp without time zone))
      Rows Removed by Filter: 4812796

with patch:

->  Parallel Seq Scan on orders  (cost=0.00..369972.00 rows=225038 width=20) (actual time=0.015..1884.993 rows=187204 loops=3)
      Filter: ((o_orderdate >= '1995-01-01'::date) AND (o_orderdate < '1995-04-01 00:00:00'::timestamp without time zone))

1. On head, the plan shows how many rows were discarded by the filter
(Rows Removed by Filter: 4812796). If we push down all the keys, this
value is no longer shown.
2. We can also check how much the actual time of the Seq Scan node was
reduced (in the example above it was 3216.157 ms on head versus
1884.993 ms with the patch).

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

On Thu, Nov 3, 2016 at 7:29 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Nov 1, 2016 at 8:31 PM, Kouhei Kaigai <kai...@ak.jp.nec.com> wrote:
>> By the way, I'm a bit skeptical whether this enhancement is really
>> beneficial compared to the work it requires, because we can now easily
>> increase the number of processor cores to run a seq-scan with a
>> qualifier, especially when it has high selectivity.
>> How about your thought?
>
> Are you saying we don't need to bother making sequential scans faster
> because we could just use parallel sequential scan instead?  That
> doesn't sound right to me.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
-- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers