Re: [HACKERS] Use unique index for longer pathkeys.
Hello, Although, yes, you're right, irrespective of the common something, and even if the dropped index was i_t1_pkey_2, which is on t1(a, b), the result tuples are sorted as expected only by the pathkey (t.a = t1.a, t1.b). It is because both t and t1 are still unique so the joined tuples are also unique, and the unique key of the result tuples is the merged pkey (t.a, t1.a, t1.b), which can be transformed to (t.a, t1.b) using the equiality between t.a and t1.a. And considering the inner relation (t1) is already sorted by (a, b), the sort itself could be elimited from the plan. I think if we could eliminate t1.c,t1.d considering indexes on individual relations (may be by using technique I have mentioned upthread or some other way), then the current code itself will eliminate the ORDER BY clause. I have tried that by using a query without having t1.c,t1.d in ORDER BY clause explain (costs off, analyze off) select * from t,t1 where t.a=t1.a order by t1.a,t1.b; QUERY PLAN -- Merge Join Merge Cond: (t1.a = t.a) - Index Scan using i_t1_pkey_2 on t1 - Index Scan using i_t_pkey on t (4 rows) Ya, the elimination seems to me so fascinate:) Okay, I think you want to handle the elimination of ORDER BY clauses at a much broader level which might handle most of simple cases as well. However I think eliminating clauses as mentioned above is itself a good optimization for many cases and more so if that can be done in a much simpler way. Yes, if can be done in much simpler way.. I guess that it could be looking from opposite side, that is, equivalence classes, anyway, I'll try it. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Use unique index for longer pathkeys.
Hello, Now drop the i_t1_pkey_1 and check the query plan again. drop index i_t1_pkey_1; explain (costs off, analyze off) select * from t,t1 where t.a=t1.a order by t1.a,t1.b,t1.c,t1.d; QUERY PLAN Sort Sort Key: t.a, t1.b, t1.c, t1.d - Merge Join Merge Cond: (t.a = t1.a) - Index Scan using i_t_pkey on t - Index Scan using i_t1_pkey_2 on t1 (6 rows) Can't above plan eliminate Sort Key even after dropping index (i_t1_pkey_1)? My patch doesn't so since there no longer a 'common primary pathkeys' in this query. Perhaps the query doesn't allow the sort eliminated. Since a is no more a pkey, t1 can have dulicate rows for the same a, so the joined relation also may have duplicte values in the column a. Sorry, I may misunderstood you. The dropped index is t1(a) not t1(a,b) so the discussion abobe is on the wrong assumption. I think irrespective of that we can trim t1.c t1.d as we have primary key (unique and non column) for t1.a, t1.b. Basically even if we don't use the primary key index, we can still trim the keys in such a case. IIUC, then Tom has mentioned the same in his message related to this issue. Although, yes, you're right, irrespective of the common something, and even if the dropped index was i_t1_pkey_2, which is on t1(a, b), the result tuples are sorted as expected only by the pathkey (t.a = t1.a, t1.b). It is because both t and t1 are still unique so the joined tuples are also unique, and the unique key of the result tuples is the merged pkey (t.a, t1.a, t1.b), which can be transformed to (t.a, t1.b) using the equiality between t.a and t1.a. And considering the inner relation (t1) is already sorted by (a, b), the sort itself could be elimited from the plan. The above could be described as below, - Join between relations with pkey yiels relation with the pkey containing the columns of both pkeys, this is created by simply concatenate them. - The concatenated pkey can be transformed using equality operators, in other words, consulting equivelance classes. - If the projected joined relation(?) contains one of the transformed pkeys above, it is a pkey for the result. And in regards to MergeJoin and NestLoop, the result pathkeys can be composed by concatenating child pathkeys in outer-inner order. If the all ORDER BY pathkeys columns appear in the concatenated pathkeys in the same order , the sort itself can be eliminated. I am referring to below text: If we have ORDER BY a, b, c and (a,b) is the primary key, then including c in the ORDER BY list is semantically redundant, *whether or not we use an indexscan on the pkey index at all*. http://www.postgresql.org/message-id/5212.1397599...@sss.pgh.pa.us Yes, my point at that time was mainly union (all) and partitioned tables so the optimization for joins was the next step for me. Although, my previous implement of collect_common_primary_pathkeys seems to have been excessively heavyweighted and it could be integrated with join optimization. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Use unique index for longer pathkeys.
On Mon, Aug 4, 2014 at 1:22 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Hello, I think irrespective of that we can trim t1.c t1.d as we have primary key (unique and non column) for t1.a, t1.b. Basically even if we don't use the primary key index, we can still trim the keys in such a case. IIUC, then Tom has mentioned the same in his message related to this issue. Although, yes, you're right, irrespective of the common something, and even if the dropped index was i_t1_pkey_2, which is on t1(a, b), the result tuples are sorted as expected only by the pathkey (t.a = t1.a, t1.b). It is because both t and t1 are still unique so the joined tuples are also unique, and the unique key of the result tuples is the merged pkey (t.a, t1.a, t1.b), which can be transformed to (t.a, t1.b) using the equiality between t.a and t1.a. And considering the inner relation (t1) is already sorted by (a, b), the sort itself could be elimited from the plan. I think if we could eliminate t1.c,t1.d considering indexes on individual relations (may be by using technique I have mentioned upthread or some other way), then the current code itself will eliminate the ORDER BY clause. I have tried that by using a query without having t1.c,t1.d in ORDER BY clause explain (costs off, analyze off) select * from t,t1 where t.a=t1.a order by t1.a,t1.b; QUERY PLAN -- Merge Join Merge Cond: (t1.a = t.a) - Index Scan using i_t1_pkey_2 on t1 - Index Scan using i_t_pkey on t (4 rows) I am referring to below text: If we have ORDER BY a, b, c and (a,b) is the primary key, then including c in the ORDER BY list is semantically redundant, *whether or not we use an indexscan on the pkey index at all*. http://www.postgresql.org/message-id/5212.1397599...@sss.pgh.pa.us Yes, my point at that time was mainly union (all) and partitioned tables so the optimization for joins was the next step for me. Okay, I think you want to handle the elimination of ORDER BY clauses at a much broader level which might handle most of simple cases as well. However I think eliminating clauses as mentioned above is itself a good optimization for many cases and more so if that can be done in a much simpler way. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Use unique index for longer pathkeys.
On Mon, Jul 28, 2014 at 3:17 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Now drop the i_t1_pkey_1 and check the query plan again. drop index i_t1_pkey_1; explain (costs off, analyze off) select * from t,t1 where t.a=t1.a order by t1.a,t1.b,t1.c,t1.d; QUERY PLAN Sort Sort Key: t.a, t1.b, t1.c, t1.d - Merge Join Merge Cond: (t.a = t1.a) - Index Scan using i_t_pkey on t - Index Scan using i_t1_pkey_2 on t1 (6 rows) Can't above plan eliminate Sort Key even after dropping index (i_t1_pkey_1)? My patch doesn't so since there no longer a 'common primary pathkeys' in this query. Perhaps the query doesn't allow the sort eliminated. Since a is no more a pkey, t1 can have dulicate rows for the same a, so the joined relation also may have duplicte values in the column a. I think irrespective of that we can trim t1.c t1.d as we have primary key (unique and non column) for t1.a, t1.b. Basically even if we don't use the primary key index, we can still trim the keys in such a case. IIUC, then Tom has mentioned the same in his message related to this issue. I am referring to below text: If we have ORDER BY a, b, c and (a,b) is the primary key, then including c in the ORDER BY list is semantically redundant, *whether or not we use an indexscan on the pkey index at all*. http://www.postgresql.org/message-id/5212.1397599...@sss.pgh.pa.us With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Use unique index for longer pathkeys.
Hello, I think there is one more disadvantage in the way current patch is done which is that you need to collect index path keys for all relations irrespective of whether they will be of any use to eliminate useless pathkeys from query_pathkeys. One trivial case that comes to mind is when there are multiple relations involved in query and ORDER BY is base on columns of only part of the tables involved in query. Like this? select x.a, x.b, y.b from x, y where x.a = y.a order by x.a, x.b; Equivalent class consists of (x.a=y.a) and (x.b), so index pathkeys for i_y is (y.a.=x.a). As a result, no common primary pathkeys found. I think it will find common pathkey incase you have an unique index on x.a (please see the example below), but currently I am not clear why there is a need for a common index path key in such a case to eliminate useless keys in ORDER BY, why can't we do it based on individual table's path key. Example: create table t (a int not null, b int not null, c int, d text); create unique index i_t_pkey on t(a, b); insert into t (select a % 10, a / 10, a, 't' from generate_series(0, 10) a); analyze; create table t1 (a int not null, b int not null, c int, d text); create unique index i_t1_pkey_1 on t1(a); create unique index i_t1_pkey_2 on t1(a, b); insert into t1 (select a * 2, a / 10, a, 't' from generate_series(0, 10) a); explain (costs off, analyze off) select * from t,t1 where t.a=t1.a order by t1.a,t1.b,t1.c,t1.d; QUERY PLAN -- Merge Join Merge Cond: (t.a = t1.a) - Index Scan using i_t_pkey on t - Index Scan using i_t1_pkey_1 on t1 (4 rows) Here we can notice that there is no separate sort key in plan. Sure, Now drop the i_t1_pkey_1 and check the query plan again. drop index i_t1_pkey_1; explain (costs off, analyze off) select * from t,t1 where t.a=t1.a order by t1.a,t1.b,t1.c,t1.d; QUERY PLAN Sort Sort Key: t.a, t1.b, t1.c, t1.d - Merge Join Merge Cond: (t.a = t1.a) - Index Scan using i_t_pkey on t - Index Scan using i_t1_pkey_2 on t1 (6 rows) Can't above plan eliminate Sort Key even after dropping index (i_t1_pkey_1)? My patch doesn't so since there no longer a 'common primary pathkeys' in this query. Perhaps the query doesn't allow the sort eliminated. Since a is no more a pkey, t1 can have dulicate rows for the same a, so the joined relation also may have duplicte values in the column a. Therefore the joined relation may be half sorted only by the column a so the sort pathkeys cannot be trimmed. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Use unique index for longer pathkeys.
On Fri, Jul 25, 2014 at 12:48 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: I think there is one more disadvantage in the way current patch is done which is that you need to collect index path keys for all relations irrespective of whether they will be of any use to eliminate useless pathkeys from query_pathkeys. One trivial case that comes to mind is when there are multiple relations involved in query and ORDER BY is base on columns of only part of the tables involved in query. Like this? select x.a, x.b, y.b from x, y where x.a = y.a order by x.a, x.b; Equivalent class consists of (x.a=y.a) and (x.b), so index pathkeys for i_y is (y.a.=x.a). As a result, no common primary pathkeys found. I think it will find common pathkey incase you have an unique index on x.a (please see the example below), but currently I am not clear why there is a need for a common index path key in such a case to eliminate useless keys in ORDER BY, why can't we do it based on individual table's path key. Example: create table t (a int not null, b int not null, c int, d text); create unique index i_t_pkey on t(a, b); insert into t (select a % 10, a / 10, a, 't' from generate_series(0, 10) a); analyze; create table t1 (a int not null, b int not null, c int, d text); create unique index i_t1_pkey_1 on t1(a); create unique index i_t1_pkey_2 on t1(a, b); insert into t1 (select a * 2, a / 10, a, 't' from generate_series(0, 10) a); explain (costs off, analyze off) select * from t,t1 where t.a=t1.a order by t1.a,t1.b,t1.c,t1.d; QUERY PLAN -- Merge Join Merge Cond: (t.a = t1.a) - Index Scan using i_t_pkey on t - Index Scan using i_t1_pkey_1 on t1 (4 rows) Here we can notice that there is no separate sort key in plan. Now drop the i_t1_pkey_1 and check the query plan again. drop index i_t1_pkey_1; explain (costs off, analyze off) select * from t,t1 where t.a=t1.a order by t1.a,t1.b,t1.c,t1.d; QUERY PLAN Sort Sort Key: t.a, t1.b, t1.c, t1.d - Merge Join Merge Cond: (t.a = t1.a) - Index Scan using i_t_pkey on t - Index Scan using i_t1_pkey_2 on t1 (6 rows) Can't above plan eliminate Sort Key even after dropping index (i_t1_pkey_1)? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Use unique index for longer pathkeys.
On Sat, Jul 26, 2014 at 11:53 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Fri, Jul 25, 2014 at 12:48 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: I think there is one more disadvantage in the way current patch is done which is that you need to collect index path keys for all relations irrespective of whether they will be of any use to eliminate useless pathkeys from query_pathkeys. One trivial case that comes to mind is when there are multiple relations involved in query and ORDER BY is base on columns of only part of the tables involved in query. Like this? select x.a, x.b, y.b from x, y where x.a = y.a order by x.a, x.b; Equivalent class consists of (x.a=y.a) and (x.b), so index pathkeys for i_y is (y.a.=x.a). As a result, no common primary pathkeys found. I think it will find common pathkey incase you have an unique index on x.a (please see the example below), but currently I am not clear why there is a need for a common index path key in such a case to eliminate useless keys in ORDER BY, why can't we do it based on individual table's path key. Example: create table t (a int not null, b int not null, c int, d text); create unique index i_t_pkey on t(a, b); insert into t (select a % 10, a / 10, a, 't' from generate_series(0, 10) a); analyze; create table t1 (a int not null, b int not null, c int, d text); create unique index i_t1_pkey_1 on t1(a); create unique index i_t1_pkey_2 on t1(a, b); insert into t1 (select a * 2, a / 10, a, 't' from generate_series(0, 10) a); explain (costs off, analyze off) select * from t,t1 where t.a=t1.a order by t1.a,t1.b,t1.c,t1.d; QUERY PLAN -- Merge Join Merge Cond: (t.a = t1.a) - Index Scan using i_t_pkey on t - Index Scan using i_t1_pkey_1 on t1 (4 rows) Here we can notice that there is no separate sort key in plan. Now drop the i_t1_pkey_1 and check the query plan again. drop index i_t1_pkey_1; explain (costs off, analyze off) select * from t,t1 where t.a=t1.a order by t1.a,t1.b,t1.c,t1.d; QUERY PLAN Sort Sort Key: t.a, t1.b, t1.c, t1.d - Merge Join Merge Cond: (t.a = t1.a) - Index Scan using i_t_pkey on t - Index Scan using i_t1_pkey_2 on t1 (6 rows) Can't above plan eliminate Sort Key even after dropping index (i_t1_pkey_1)? Here I have one additional thought which I would like to share with you to see if this patch can be done in a simpler way. In function standard_qp_callback(), can we directly trim the sortclause list based on index information in PlannerInfo. We have access to target list in this function to know exactly the relation/column information of sortclause. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Use unique index for longer pathkeys.
Hello. I've been cooled off and convinced that this patch has become quite useless by itself. It improves almost only UNIONs with ORDER BY on tables that have unioform primary keys, and needs my another patch to work. I'll try to reintegrate this patch into my 'another patch' as mentioned below and try next CF with it. I'll close this patch as 'Return with Feedback' by myself. Thank you for your thoughtful commnets. I learned a lot. = Thank you for the answers, Robert, Amit :) # By the way, this style of calling a person is quite common # among Japanese since the first-name basis implies very close # relationship or it frequently conveys offensive shade. I don't know if I have ever offended you or any other Japanese community member while interaction, but my intention was never so. The reason for using this style for calling you is during my initial 4 years of work, I have worked very closely with Japanese, so I have learned few things during that time and it was quite common to refer the way I used above, however I am not sure I have always used during communication, so if something I have used which is not common in your culture, please feel free to let me know, I will surely not do that again. I also don't mind at all and that's why I was anxious about it. Thank you for your answers. Sorry, my memory had been worn down. After some reconfirmation, this description found to be a bit (quite?) wrong. collect_common_primary_pathkeys needs root-eq_classes to be set up beforehand, not appendrels. Bacause build_index_pathkeys doesn't work before every EC member for all relation involved is already registered. standard_qp_callback registers EC members in the following path but only for the primary(?) tlist of the subquery, so EC members for the child relations are not registered at the time. .-make_pathekeys_sortclauses-make_pathkey_from_sortop -make_pathkey_from_sortinfo-get_eclass_for_sort_expr EC members for the child rels are registered in set_base_rel_sizes-set_rel_size-set_append_rel_size -add_child_rel_equivalences So trim_query_pathkeys (collect_common...) doesn't work before set_base_rel_sizes(). These EC members needed to be registered at least if root-query_pathkeys exists so this seems to be done in standard_qp_callback adding a separate inheritance tree walk. Do we really need that information to eliminate useless keys from query_pathkeys? * We have to make child entries in the EquivalenceClass data * structures as well. This is needed either if the parent * participates in some eclass joins (because we will want to consider * inner-indexscan joins on the individual children) or if the parent * has useful pathkeys (because we should try to build MergeAppend * paths that produce those sort orderings). Referring to above comment, I think we don't need it to eliminate useless keys based on primary key info from query_pathkeys, but I might be missing some case, could you please elaborate more to let me know the case/cases where this would be useful. As I said some time ago, It's not so useful for wide situation, especially for no transformation is done on the given query. If my memory is correct, there was more wider application before the EC fix for nested inheritance, but now this has rather narrow application:( Anyway, The most and almost only target of this patch is the situation that a DISTINCT is implicitly added for the query, that is UNION on tables with primary key, which is generated by my another patch which was proposed on last CF4 (or earlier). This patch is originally a part of it. https://commitfest.postgresql.org/action/patch_view?id=1279 Come to think of it, maybe I changed my mind. It's better to add a trimmed (minimum) DISTINCT ON clause directly for flattened simple UNION. Other examples are seemingly odd or unlikely as given queries. I think there is one more disadvantage in the way current patch is done which is that you need to collect index path keys for all relations irrespective of whether they will be of any use to eliminate useless pathkeys from query_pathkeys. One trivial case that comes to mind is when there are multiple relations involved in query and ORDER BY is base on columns of only part of the tables involved in query. Like this? select x.a, x.b, y.b from x, y where x.a = y.a order by x.a, x.b; Equivalent class consists of (x.a=y.a) and (x.b), so index pathkeys for i_y is (y.a.=x.a). As a result, no common primary pathkeys found. There seem to be no problem if I understand you correctly. # rel-has_elcass_joins seems not working but it is not the topic # here. Could you please explain me why the index information built in above path is not sufficient or is there any other case which I am missing? Is the description above enough to be the explaination for the place? Sorry for the inaccurate description. No issues, above
Re: [HACKERS] Use unique index for longer pathkeys.
On Tue, Jul 22, 2014 at 5:19 AM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: # By the way, this style of calling a person is quite common # among Japanese since the first-name basis implies very close # relationship or it frequently conveys offensive shade. But I'm # not sure what should I call others who're not Japases native. Typical usage on this mailing list is to refer to individuals by first name (e.g. Tom, Alvaro, Heikki) or by full name (e.g. Tom Lane, Alvaro Herrera, Heikki Linnakangas). The last name is typically included only when it might otherwise be confusing - for example, you will rarely see people use just Greg because there are several people by that name who are reasonably active, but Heikki's last name is almost never mentioned). That having been said, if you want to use the Japanese style, I don't think anyone here will mind. I am personally not totally familiar with it so please forgive me in turn if I should address you or anyone else in a manner that would be considered too familiar in another culture. I try not to do that, but I might make some mistakes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Use unique index for longer pathkeys.
On Tue, Jul 22, 2014 at 2:49 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Sorry , previous version has bugs. It stamps over the stack and crashesX( The attached is the bug fixed version, with no substantial changes. On Tue, Jul 15, 2014 at 2:17 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Hi, the attached is the revised version. Thanks Horiguchi-San for the updated patch. # By the way, this style of calling a person is quite common # among Japanese since the first-name basis implies very close # relationship or it frequently conveys offensive shade. I don't know if I have ever offended you or any other Japanese community member while interaction, but my intention was never so. The reason for using this style for calling you is during my initial 4 years of work, I have worked very closely with Japanese, so I have learned few things during that time and it was quite common to refer the way I used above, however I am not sure I have always used during communication, so if something I have used which is not common in your culture, please feel free to let me know, I will surely not do that again. Today while looking into updated patch, I was wondering why can't we eliminate useless keys in query_pathkeys when we actually build the same in function standard_qp_callback(), basically somewhat similar to what we do in standard_qp_callback-make_pathkeys_for_sortclauses-pathkey_is_redundant(). I agree that standard_qp_callback is basically more preferable. We already have index information related to base_rels before building query pathkeys. I noticed that you mentioned the below in your original mail which indicates that information related to inheritance tables is built only after set_base_rel_sizes() These steps take place between set_base_rel_sizes() and set_base_rel_pathlists() in make_one_rel(). The reason for the position is that the step 2 above needs all inheritence tables to be extended in PlannerInfo and set_base_rel_sizes (currently) does that. Sorry, my memory had been worn down. After some reconfirmation, this description found to be a bit (quite?) wrong. collect_common_primary_pathkeys needs root-eq_classes to be set up beforehand, not appendrels. Bacause build_index_pathkeys doesn't work before every EC member for all relation involved is already registered. standard_qp_callback registers EC members in the following path but only for the primary(?) tlist of the subquery, so EC members for the child relations are not registered at the time. .-make_pathekeys_sortclauses-make_pathkey_from_sortop -make_pathkey_from_sortinfo-get_eclass_for_sort_expr EC members for the child rels are registered in set_base_rel_sizes-set_rel_size-set_append_rel_size -add_child_rel_equivalences So trim_query_pathkeys (collect_common...) doesn't work before set_base_rel_sizes(). These EC members needed to be registered at least if root-query_pathkeys exists so this seems to be done in standard_qp_callback adding a separate inheritance tree walk. Do we really need that information to eliminate useless keys from query_pathkeys? * We have to make child entries in the EquivalenceClass data * structures as well. This is needed either if the parent * participates in some eclass joins (because we will want to consider * inner-indexscan joins on the individual children) or if the parent * has useful pathkeys (because we should try to build MergeAppend * paths that produce those sort orderings). Referring to above comment, I think we don't need it to eliminate useless keys based on primary key info from query_pathkeys, but I might be missing some case, could you please elaborate more to let me know the case/cases where this would be useful. I think there is one more disadvantage in the way current patch is done which is that you need to collect index path keys for all relations irrespective of whether they will be of any use to eliminate useless pathkeys from query_pathkeys. One trivial case that comes to mind is when there are multiple relations involved in query and ORDER BY is base on columns of only part of the tables involved in query. # rel-has_elcass_joins seems not working but it is not the topic # here. Could you please explain me why the index information built in above path is not sufficient or is there any other case which I am missing? Is the description above enough to be the explaination for the place? Sorry for the inaccurate description. No issues, above description is sufficient to explain why you have written patch in its current form. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Use unique index for longer pathkeys.
Sorry , previous version has bugs. It stamps over the stack and crashesX( The attached is the bug fixed version, with no substantial changes. On Tue, Jul 15, 2014 at 2:17 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Hi, the attached is the revised version. Thanks Horiguchi-San for the updated patch. # By the way, this style of calling a person is quite common # among Japanese since the first-name basis implies very close # relationship or it frequently conveys offensive shade. But I'm # not sure what should I call others who're not Japases native. Anyway, Today while looking into updated patch, I was wondering why can't we eliminate useless keys in query_pathkeys when we actually build the same in function standard_qp_callback(), basically somewhat similar to what we do in standard_qp_callback-make_pathkeys_for_sortclauses-pathkey_is_redundant(). I agree that standard_qp_callback is basically more preferable. We already have index information related to base_rels before building query pathkeys. I noticed that you mentioned the below in your original mail which indicates that information related to inheritance tables is built only after set_base_rel_sizes() These steps take place between set_base_rel_sizes() and set_base_rel_pathlists() in make_one_rel(). The reason for the position is that the step 2 above needs all inheritence tables to be extended in PlannerInfo and set_base_rel_sizes (currently) does that. Sorry, my memory had been worn down. After some reconfirmation, this description found to be a bit (quite?) wrong. collect_common_primary_pathkeys needs root-eq_classes to be set up beforehand, not appendrels. Bacause build_index_pathkeys doesn't work before every EC member for all relation involved is already registered. standard_qp_callback registers EC members in the following path but only for the primary(?) tlist of the subquery, so EC members for the child relations are not registered at the time. .-make_pathekeys_sortclauses-make_pathkey_from_sortop -make_pathkey_from_sortinfo-get_eclass_for_sort_expr EC members for the child rels are registered in set_base_rel_sizes-set_rel_size-set_append_rel_size -add_child_rel_equivalences So trim_query_pathkeys (collect_common...) doesn't work before set_base_rel_sizes(). These EC members needed to be registered at least if root-query_pathkeys exists so this seems to be done in standard_qp_callback adding a separate inheritance tree walk. # rel-has_elcass_joins seems not working but it is not the topic # here. Could you please explain me why the index information built in above path is not sufficient or is there any other case which I am missing? Is the description above enough to be the explaination for the place? Sorry for the inaccurate description. regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index 9573a9b..546e112 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -1789,9 +1789,11 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node) /* indexprs is redundant since we print indextlist */ WRITE_NODE_FIELD(indpred); WRITE_NODE_FIELD(indextlist); + /* cached pathkeys are omitted as indexkeys */ WRITE_BOOL_FIELD(predOK); WRITE_BOOL_FIELD(unique); WRITE_BOOL_FIELD(immediate); + WRITE_BOOL_FIELD(allnotnull); WRITE_BOOL_FIELD(hypothetical); /* we don't bother with fields copied from the pg_am entry */ } diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index c81efe9..c12d0e7 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -58,6 +58,7 @@ int geqo_threshold; join_search_hook_type join_search_hook = NULL; +static List *collect_common_primary_pathkeys(PlannerInfo *root); static void set_base_rel_sizes(PlannerInfo *root); static void set_base_rel_pathlists(PlannerInfo *root); static void set_rel_size(PlannerInfo *root, RelOptInfo *rel, @@ -66,6 +67,7 @@ static void set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte); static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte); +static void trim_query_pathkeys(PlannerInfo * root); static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte); static void set_foreign_size(PlannerInfo *root, RelOptInfo *rel, @@ -112,6 +114,203 @@ static void recurse_push_qual(Node *setOp, Query *topquery, RangeTblEntry *rte, Index rti, Node *qual); static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel); +/* + * collect_common_primary_pathkeys: Collects common unique and non-null index + * pathkeys from all base relations in current root. + */ +static List * +collect_common_primary_pathkeys(PlannerInfo *root) +{ + List *result = NIL; + Index rti; + int nindex = 0; + List
Re: [HACKERS] Use unique index for longer pathkeys.
On Tue, Jul 15, 2014 at 2:17 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Hi, the attached is the revised version. Thanks Horiguchi-San for the updated patch. Today while looking into updated patch, I was wondering why can't we eliminate useless keys in query_pathkeys when we actually build the same in function standard_qp_callback(), basically somewhat similar to what we do in standard_qp_callback-make_pathkeys_for_sortclauses-pathkey_is_redundant(). We already have index information related to base_rels before building query pathkeys. I noticed that you mentioned the below in your original mail which indicates that information related to inheritance tables is built only after set_base_rel_sizes() These steps take place between set_base_rel_sizes() and set_base_rel_pathlists() in make_one_rel(). The reason for the position is that the step 2 above needs all inheritence tables to be extended in PlannerInfo and set_base_rel_sizes (currently) does that. Now I am not sure why such information is not build during build_simple_rel() in below code path: build_simple_rel() { .. if (rte-inh) { ListCell *l; foreach(l, root-append_rel_list) { AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l); /* append_rel_list contains all append rels; ignore others */ if (appinfo-parent_relid != relid) continue; (void) build_simple_rel(root, appinfo-child_relid, RELOPT_OTHER_MEMBER_REL); } } } Could you please explain me why the index information built in above path is not sufficient or is there any other case which I am missing? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Use unique index for longer pathkeys.
Hi, the attached is the revised version. - defined allnotnull instead of full_ordered in RelOptInfo and changed to use it. - compare_pathkeys_ignoring_strategy() is now integrated into compare_pathkeys(). - index pathkeys caching in RelOptInfo added. (build_index_pathkeys() does) - Refactoring collect_common_primary_pathkeys mainly from a reason of eliminating code duplicates and readability. - trim_query_pathkeys now tries to trim window_pathkeys. The old comment came from my misunderstanding. Window pathkeys are not always RowExpr and RowExpr pathkeys are feedable from not only PARTITION BY clause but also ORDER BY, DISTINCT or GROUP BY clauses. The comment for the function is modified. - Some cosmetic fixes including spacing after commas. - The shortcut judgement is *not* included. regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index 9573a9b..546e112 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -1789,9 +1789,11 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node) /* indexprs is redundant since we print indextlist */ WRITE_NODE_FIELD(indpred); WRITE_NODE_FIELD(indextlist); + /* cached pathkeys are omitted as indexkeys */ WRITE_BOOL_FIELD(predOK); WRITE_BOOL_FIELD(unique); WRITE_BOOL_FIELD(immediate); + WRITE_BOOL_FIELD(allnotnull); WRITE_BOOL_FIELD(hypothetical); /* we don't bother with fields copied from the pg_am entry */ } diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index c81efe9..751eb6a 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -58,6 +58,7 @@ int geqo_threshold; join_search_hook_type join_search_hook = NULL; +static List *collect_common_primary_pathkeys(PlannerInfo *root); static void set_base_rel_sizes(PlannerInfo *root); static void set_base_rel_pathlists(PlannerInfo *root); static void set_rel_size(PlannerInfo *root, RelOptInfo *rel, @@ -66,6 +67,7 @@ static void set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte); static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte); +static void trim_query_pathkeys(PlannerInfo * root); static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte); static void set_foreign_size(PlannerInfo *root, RelOptInfo *rel, @@ -112,6 +114,203 @@ static void recurse_push_qual(Node *setOp, Query *topquery, RangeTblEntry *rte, Index rti, Node *qual); static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel); +/* + * collect_common_primary_pathkeys: Collects common unique and non-null index + * pathkeys from all base relations in current root. + */ +static List * +collect_common_primary_pathkeys(PlannerInfo *root) +{ + List *result = NIL; + Index rti; + int nindex = 0; + List **pathkeys_array0 = NULL; + List **pathkeys_array = NULL; + List **pathkeys_array2 = NULL; + bool do_conjunction = (root-simple_rel_array_size 2); + + for (rti = 1 ; rti root-simple_rel_array_size ; rti++) + { + RelOptInfo *rel = root-simple_rel_array[rti]; + int prev_nindex = 0; + ListCell *lc; + + if (rel == NULL) + continue; + + Assert (rel-relid == rti); + + /* Scan all base relations except parent relations. */ + if (root-simple_rte_array[rti]-inh || + (rel-reloptkind != RELOPT_BASEREL + rel-reloptkind != RELOPT_OTHER_MEMBER_REL)) + continue; + + if (do_conjunction) + { + if (pathkeys_array2) +memset(pathkeys_array2, 0, sizeof(List*) * nindex); + else + { +/* Rig for AND (conjunction) operation. */ +nindex = 0; + +foreach (lc, rel-indexlist) +{ + IndexOptInfo *index = (IndexOptInfo *) lfirst(lc); + + if (index-allnotnull + index-unique index-immediate + index-indpred == NIL) + nindex++; +} + +if (nindex == 0) + return NULL; /* No common primary pathkeys exists.*/ + +pathkeys_array0 = palloc0(sizeof(List*) * nindex * 2); +pathkeys_array = pathkeys_array0; +pathkeys_array2 = pathkeys_array0 + nindex; + } + } + + prev_nindex = nindex; + nindex = 0; + + foreach (lc, rel-indexlist) + { + IndexOptInfo *index = (IndexOptInfo *) lfirst(lc); + List *idx_pathkeys; + + /* Only indexes that can unique sort are needed. */ + if (!index-allnotnull || +!(index-unique index-immediate) || +index-indpred != NIL) +continue; + + idx_pathkeys = build_index_pathkeys(root, index, +ForwardScanDirection); + if (!idx_pathkeys) +continue; + + if (!do_conjunction) + { +result = lappend(result, idx_pathkeys); +nindex++; + } + else + { +/* + * AND operation: drop pathkeys which don't see identical + * one in this relation. + */ +int i; + +for (i = 0 ; i prev_nindex ; i++) +{ + if
Re: [HACKERS] Use unique index for longer pathkeys.
Thank you for reviewing, the revised patch will come later. At Mon, 14 Jul 2014 11:01:52 +0530, Amit Kapila amit.kapil...@gmail.com wrote in caa4ek1+6b6wjwf51ozmrl+mkfh8xup9j-pehqvor8se7swy...@mail.gmail.com On Fri, Jun 13, 2014 at 1:11 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Hello, This is the continuation from the last CF. This patch intends to make PG to use index for longer pathkeys than index columns when, - The index is a unique index. - All index columns are NOT NULL. - The index column list is a subset of query_pathkeys. This patch consists of mainly three parts. (snip) I have spent some time on this patch and would like to share my findings as below with you. 1. It seems to me that the new flag (IndexOptInfo.full_ordered) introduced in this patch is not getting set properly. Please refer below test: Yeah, probably you omitted the second condition above. create table t (a int, b int, c int, d text); The following definition instead will make this work. NULLs cannot be unique. | create table t (a int not null, b int not null, c int, d text); create unique index i_t_pkey on t(a, b); insert into t (select a % 10, a / 10, a, 't' from generate_series(0, 10) a); analyze; explain (costs off, analyze on) select distinct * from t; ... 2. typedef struct IndexOptInfo bool predOK; /* true if predicate matches query */ bool unique; /* true if a unique index */ bool immediate; /* is uniqueness enforced immediately? */ + bool full_ordered; /* don't key columns duplicate? */ It seems you have forgotten to update out function _outIndexOptInfo(). Mmm, it's surely what I didn't intended to make:( , but the member actually works in collect_common_primary_pathkeys. The correct (desirable) code should use (allnotnull unique) instead of (full_ordered). I'll fix this in the comming version later but they are equivelant in functionality. It's not a mistake of forgetting update out (so broken), but branching from wrong point (so it works contrary to the expectation):) 3. + root-distinct_pathkeys = + shorten_pathkeys_following(root, root-distinct_pathkeys,pk_guide, + true); + + root-query_pathkeys = + shorten_pathkeys_following(root,root-query_pathkeys,pk_guide, + true); Add a space after each parameter in function call. Thank you for pointing out. 4. +PathKeysComparison +compare_pathkeys_ignoring_strategy(List *keys1, List *keys2) a. This function return 4 different type of values (PATHKEYS_EQUAL, PATHKEYS_DIFFERENT, PATHKEYS_BETTER1, PATHKEYS_BETTER2), however the caller just checks for PATHKEYS_EQUAL, isn't it better to just return either PATHKEYS_EQUAL or PATHKEYS_DIFFERENT. Hmm.. I agree that it is practically removable, but I think it is necessary from the view of similarity to the function with the similar name. Perhaps I want another name which don't suggest the similarity to compare_pathekeys(), say, bool pathkeys_are_equal_ignoring_strategy() to change the rerurn values set. b. Another idea could be that instead of writting separate function, pass an additional parameter to compare_pathkeys() to distinguish for ignore strategy case. It sounds reasonable. According to my faint memory, the reason why the function is separated from the original one is, perhaps, simplly a editorial reason. c. In any case, I think it is good to add comments on top of newly introduced function (compare_pathkeys_ignoring_strategy) or extend the comments of compare_pathkeys() if you want to add a new parameter to it. Ouch, thank you for pointing out. I'll add comment if it remains in the next patch. Finally, the new patch has grown up to twice in size.. Although it seems a bit large, I think that side-effects are clearly avoided. I am bit worried about the extra cycles added by this patch as compare to your previous patch; example During trimming of paths, it will build index paths (build_index_pathkeys()) which it will anyway do again during creation of index paths: create_index_paths()-get_index_paths()-build_index_paths()-build_index_pathkeys() I felt the same thing. On stupid measure to reduce cycles would be caching created pathkeys in IndexOptInfo. I'll try in the next patch. I am not sure if this has direct impact, however I am suspecting trimming logic can add quite a few instructions even for cases when it is not beneficial. At this moment, I also don't have better idea, so lets proceed with review of this patch unless there is any other better way to achieve it. I suppose there's some shortcut which can exclude the cases where obviously there's no use of pathkeys trimming, for example, using only pathkey lengths. I'll seek for it. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription:
Re: [HACKERS] Use unique index for longer pathkeys.
On Mon, Jul 14, 2014 at 4:02 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: At Mon, 14 Jul 2014 11:01:52 +0530, Amit Kapila amit.kapil...@gmail.com wrote in caa4ek1+6b6wjwf51ozmrl+mkfh8xup9j-pehqvor8se7swy...@mail.gmail.com On Fri, Jun 13, 2014 at 1:11 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: I am bit worried about the extra cycles added by this patch as compare to your previous patch; example During trimming of paths, it will build index paths (build_index_pathkeys()) which it will anyway do again during creation of index paths: create_index_paths()-get_index_paths()-build_index_paths()-build_index_pathkeys() I felt the same thing. On stupid measure to reduce cycles would be caching created pathkeys in IndexOptInfo. I'll try in the next patch. I suggest to wait for overall review of patch before trying this out, we can try to see what is the best way to avoid this if required, once other part of patch is reviewed and proved to be problem free. Other than this I agree with the other points you mentioned in mail and may be you can just handle those in next version of your patch. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Use unique index for longer pathkeys.
On Fri, Jun 13, 2014 at 1:11 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Hello, This is the continuation from the last CF. This patch intends to make PG to use index for longer pathkeys than index columns when, - The index is a unique index. - All index columns are NOT NULL. - The index column list is a subset of query_pathkeys. This patch consists of mainly three parts. 1. Mark index as 'Uniquely orders the table' if the conditions are satisfied. This is the same from the previous patch. 2. Collect the common primary pathkeys, which is pathkeys that can perfectly sort all involved RTE's. (collect_common_primary_pathkeys()) 3. Trim the query pathkeys using the common primary pathkeys. (trim_query_pathkeys()) I have spent some time on this patch and would like to share my findings as below with you. 1. It seems to me that the new flag (IndexOptInfo.full_ordered) introduced in this patch is not getting set properly. Please refer below test: create table t (a int, b int, c int, d text); create unique index i_t_pkey on t(a, b); insert into t (select a % 10, a / 10, a, 't' from generate_series(0, 10) a); analyze; explain (costs off, analyze on) select distinct * from t; Unptached - postgres=# explain (costs off, analyze on) select distinct * from t; QUERY PLAN --- Unique (actual time=331.624..597.278 rows=11 loops=1) - Sort (actual time=330.889..430.449 rows=11 loops=1) Sort Key: a, b, c, d Sort Method: external merge Disk: 2344kB - Seq Scan on t (actual time=0.013..74.202 rows=11 loops=1) Execution time: 665.332 ms (6 rows) Patch (pathkey_and_uniqueindx_typ2_v1) - postgres=# explain (costs off, analyze on) select distinct * from t; QUERY PLAN --- Unique (actual time=336.033..601.144 rows=11 loops=1) - Sort (actual time=336.027..436.621 rows=11 loops=1) Sort Key: a, b, c, d Sort Method: external merge Disk: 2344kB - Seq Scan on t (actual time=0.014..78.826 rows=11 loops=1) Execution time: 667.387 ms (6 rows) Shouldn't above query use index scan after patch? 2. typedef struct IndexOptInfo bool predOK; /* true if predicate matches query */ bool unique; /* true if a unique index */ bool immediate; /* is uniqueness enforced immediately? */ + bool full_ordered; /* don't key columns duplicate? */ It seems you have forgotten to update out function _outIndexOptInfo(). 3. + root-distinct_pathkeys = + shorten_pathkeys_following(root, root-distinct_pathkeys,pk_guide, + true); + + root-query_pathkeys = + shorten_pathkeys_following(root,root-query_pathkeys,pk_guide, + true); Add a space after each parameter in function call. 4. +PathKeysComparison +compare_pathkeys_ignoring_strategy(List *keys1, List *keys2) a. This function return 4 different type of values (PATHKEYS_EQUAL, PATHKEYS_DIFFERENT, PATHKEYS_BETTER1, PATHKEYS_BETTER2), however the caller just checks for PATHKEYS_EQUAL, isn't it better to just return either PATHKEYS_EQUAL or PATHKEYS_DIFFERENT. b. Another idea could be that instead of writting separate function, pass an additional parameter to compare_pathkeys() to distinguish for ignore strategy case. c. In any case, I think it is good to add comments on top of newly introduced function (compare_pathkeys_ignoring_strategy) or extend the comments of compare_pathkeys() if you want to add a new parameter to it. Finally, the new patch has grown up to twice in size.. Although it seems a bit large, I think that side-effects are clearly avoided. I am bit worried about the extra cycles added by this patch as compare to your previous patch; example During trimming of paths, it will build index paths (build_index_pathkeys()) which it will anyway do again during creation of index paths: create_index_paths()-get_index_paths()-build_index_paths()-build_index_pathkeys() I am not sure if this has direct impact, however I am suspecting trimming logic can add quite a few instructions even for cases when it is not beneficial. At this moment, I also don't have better idea, so lets proceed with review of this patch unless there is any other better way to achieve it. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Use unique index for longer pathkeys.
Thank you for taking a look on this patch. I took a quick look at this patch, more or less because nobody else did. Duing last CF, I proposed to match long pathkeys against index columns during creating index paths. This worked fine but also it is difficult to make sure that all side-effects are eliminated. Finally Tom Lane suggested to truncate pathkeys while generation of the pathkeys itself. So this patch comes. I found your older patch quite straightforward to understand, but the new one much more difficult to follow (but that's not saying much, I am not very familiar with the planner code in general). I think it's quite natural to think so. Do you have any references to the discussion about the side-effects that needed to be eliminated with the earlier patch? The biggest side-effects (or simplly defect) found so far is discussed here, http://www.postgresql.org/message-id/01bd01cf0b4e$9b960ad0$d2c22070$@ets...@lab.ntt.co.jp This was caused by omitting the membership of the Var under inspection while cheking if the pathkeys is extensible. http://www.postgresql.org/message-id/20140107.145900.196068363.horiguchi.kyot...@lab.ntt.co.jp After all, Tom said that the right way to do this is not such whacking-a-mole thing but loosen pathkeys previously so that the planner naturally do what the previous patch did without any special treat. http://www.postgresql.org/message-id/5212.1397599...@sss.pgh.pa.us So the new patch comes. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Use unique index for longer pathkeys.
Hi. I took a quick look at this patch, more or less because nobody else did. Duing last CF, I proposed to match long pathkeys against index columns during creating index paths. This worked fine but also it is difficult to make sure that all side-effects are eliminated. Finally Tom Lane suggested to truncate pathkeys while generation of the pathkeys itself. So this patch comes. I found your older patch quite straightforward to understand, but the new one much more difficult to follow (but that's not saying much, I am not very familiar with the planner code in general). Do you have any references to the discussion about the side-effects that needed to be eliminated with the earlier patch? -- Abhijit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Use unique index for longer pathkeys.
Hello, This is the continuation from the last CF. This patch intends to make PG to use index for longer pathkeys than index columns when, - The index is a unique index. - All index columns are NOT NULL. - The index column list is a subset of query_pathkeys. The use cases for this patch are, - (Useless?) DISTINCT on the table with PK constraints. This alone is useless but this situation could take place when processing UNION (without ALL), which silently adds DISTINCT * to element queries. Especially effective with ORDER BY and LIMIT clauses. But this needs another patch for UNION optimization. This is the main motive for this patch. - Unfortunately, other use cases are somewhat boresome, mainly effective for essentially unnecessary ORDER BY or DISTINCT. To streatch a point, naively or mechanically composed queires could be saved by this patch.. Duing last CF, I proposed to match long pathkeys against index columns during creating index paths. This worked fine but also it is difficult to make sure that all side-effects are eliminated. Finally Tom Lane suggested to truncate pathkeys while generation of the pathkeys itself. So this patch comes. This patch consists of mainly three parts. 1. Mark index as 'Uniquely orders the table' if the conditions are satisfied. This is the same from the previous patch. 2. Collect the common primary pathkeys, which is pathkeys that can perfectly sort all involved RTE's. (collect_common_primary_pathkeys()) 3. Trim the query pathkeys using the common primary pathkeys. (trim_query_pathkeys()) These steps take place between set_base_rel_sizes() and set_base_rel_pathlists() in make_one_rel(). The reason for the position is that the step 2 above needs all inheritence tables to be extended in PlannerInfo and set_base_rel_sizes (currently) does that. Then the trimmed pathkeys are used in set_base_rel_pathlists so trim_query_pathkeys should be before it. (This needs to be written as a comment?) Finally, the new patch has grown up to twice in size.. Although it seems a bit large, I think that side-effects are clearly avoided. This message is attatched by two patches. 1. pathkey_and_uniqueindx_typ2_v1.patch : This new patch. 2. pathkey_and_uniqueindx_v10_20130411.patch : The last version of the previous approach. regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index fdaa964..47b8056 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -50,6 +50,7 @@ int geqo_threshold; join_search_hook_type join_search_hook = NULL; +static List *collect_common_primary_pathkeys(PlannerInfo *root); static void set_base_rel_sizes(PlannerInfo *root); static void set_base_rel_pathlists(PlannerInfo *root); static void set_rel_size(PlannerInfo *root, RelOptInfo *rel, @@ -58,6 +59,7 @@ static void set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte); static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte); +static void trim_query_pathkeys(PlannerInfo * root); static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte); static void set_foreign_size(PlannerInfo *root, RelOptInfo *rel, @@ -102,6 +104,214 @@ static void recurse_push_qual(Node *setOp, Query *topquery, RangeTblEntry *rte, Index rti, Node *qual); static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel); +/* + * collect_common_primary_pathkeys: Collects common unique and non-null index + * pathkeys from all base relations in current root. + */ + +static List * +collect_common_primary_pathkeys(PlannerInfo *root) +{ + List *result = NIL; + Index rti; + int nindex = -1; + List **pathkeys_array0 = NULL; + List **pathkeys_array = NULL; + List **pathkeys_array2 = NULL; + bool multipass = (root-simple_rel_array_size 2); + + for (rti = 1 ; rti root-simple_rel_array_size nindex ; rti++) + { + RelOptInfo *rel = root-simple_rel_array[rti]; + List **pktmp; + + if (rel == NULL) + continue; + + Assert (rel-relid == rti); + + /* + * Scan all base relations except parent relations. + */ + if (root-simple_rte_array[rti]-inh || + (rel-reloptkind != RELOPT_BASEREL + rel-reloptkind != RELOPT_OTHER_MEMBER_REL)) + continue; + + /* pathkeys_array is NULL for the first iteration */ + if (pathkeys_array == NULL) + { + ListCell *lc; + int i = 0; + + /* Skip rigging for AND operation if not needed. */ + if (multipass) + { +nindex = 0; + +foreach (lc, rel-indexlist) +{ + IndexOptInfo *index = (IndexOptInfo *) lfirst(lc); + + if (index-full_ordered index-indpred == NIL) + nindex++; +} + +if (nindex == 0) + return NULL; + +pathkeys_array0 = palloc0(sizeof(List*) * nindex * 2); +pathkeys_array = pathkeys_array0; +