On Fri, Jun 13, 2014 at 1:11 PM, Kyotaro HORIGUCHI < horiguchi.kyot...@lab.ntt.co.jp> wrote: > > Hello, This is the continuation from the last CF. > > This patch intends to make PG to use index for longer pathkeys > than index columns when, > > - The index is a unique index. > - All index columns are NOT NULL. > - The index column list is a subset of query_pathkeys. > > ==== > > This patch consists of mainly three parts. > > 1. Mark index as 'Uniquely orders the table' if the conditions > are satisfied. This is the same from the previous patch. > > 2. Collect the "common primary pathkeys", which is pathkeys that > can perfectly sort all involved > RTE's. (collect_common_primary_pathkeys()) > > 3. Trim the query pathkeys using the common primary pathkeys. > (trim_query_pathkeys())
I have spent some time on this patch and would like to share my findings as below with you. 1. It seems to me that the new flag (IndexOptInfo.full_ordered) introduced in this patch is not getting set properly. Please refer below test: create table t (a int, b int, c int, d text); create unique index i_t_pkey on t(a, b); insert into t (select a % 10, a / 10, a, 't' from generate_series(0, 100000) a); analyze; explain (costs off, analyze on) select distinct * from t; Unptached - postgres=# explain (costs off, analyze on) select distinct * from t; QUERY PLAN --------------------------------------------------------------------------- Unique (actual time=331.624..597.278 rows=100001 loops=1) -> Sort (actual time=330.889..430.449 rows=100001 loops=1) Sort Key: a, b, c, d Sort Method: external merge Disk: 2344kB -> Seq Scan on t (actual time=0.013..74.202 rows=100001 loops=1) Execution time: 665.332 ms (6 rows) Patch (pathkey_and_uniqueindx_typ2_v1) - postgres=# explain (costs off, analyze on) select distinct * from t; QUERY PLAN --------------------------------------------------------------------------- Unique (actual time=336.033..601.144 rows=100001 loops=1) -> Sort (actual time=336.027..436.621 rows=100001 loops=1) Sort Key: a, b, c, d Sort Method: external merge Disk: 2344kB -> Seq Scan on t (actual time=0.014..78.826 rows=100001 loops=1) Execution time: 667.387 ms (6 rows) Shouldn't above query use index scan after patch? 2. typedef struct IndexOptInfo bool predOK; /* true if predicate matches query */ bool unique; /* true if a unique index */ bool immediate; /* is uniqueness enforced immediately? */ + bool full_ordered; /* don't key columns duplicate? */ It seems you have forgotten to update out function _outIndexOptInfo(). 3. + root->distinct_pathkeys = + shorten_pathkeys_following(root, root->distinct_pathkeys,pk_guide, + true); + + root->query_pathkeys = + shorten_pathkeys_following(root,root->query_pathkeys,pk_guide, + true); Add a space after each parameter in function call. 4. +PathKeysComparison +compare_pathkeys_ignoring_strategy(List *keys1, List *keys2) a. This function return 4 different type of values (PATHKEYS_EQUAL, PATHKEYS_DIFFERENT, PATHKEYS_BETTER1, PATHKEYS_BETTER2), however the caller just checks for PATHKEYS_EQUAL, isn't it better to just return either PATHKEYS_EQUAL or PATHKEYS_DIFFERENT. b. Another idea could be that instead of writting separate function, pass an additional parameter to compare_pathkeys() to distinguish for ignore strategy case. c. In any case, I think it is good to add comments on top of newly introduced function (compare_pathkeys_ignoring_strategy) or extend the comments of compare_pathkeys() if you want to add a new parameter to it. > Finally, the new patch has grown up to twice in size.. Although > it seems a bit large, I think that side-effects are clearly > avoided. I am bit worried about the extra cycles added by this patch as compare to your previous patch; example During trimming of paths, it will build index paths (build_index_pathkeys()) which it will anyway do again during creation of index paths: create_index_paths()->get_index_paths()->build_index_paths()->build_index_pathkeys() I am not sure if this has direct impact, however I am suspecting trimming logic can add quite a few instructions even for cases when it is not beneficial. At this moment, I also don't have better idea, so lets proceed with review of this patch unless there is any other better way to achieve it. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com