Re: [HACKERS] Use unique index for longer pathkeys.

2014-08-07 Thread Kyotaro HORIGUCHI
Hello, 

  Although, yes, you're right, irrespective of the common
  something, and even if the dropped index was i_t1_pkey_2, which
  is on t1(a, b), the result tuples are sorted as expected only by
  the pathkey (t.a = t1.a, t1.b). It is because both t and t1 are
  still unique so the joined tuples are also unique, and the unique
  key of the result tuples is the merged pkey (t.a, t1.a, t1.b),
  which can be transformed to (t.a, t1.b) using the equiality
  between t.a and t1.a. And considering the inner relation (t1) is
  already sorted by (a, b), the sort itself could be elimited from
  the plan.
 
 I think if we could eliminate t1.c,t1.d considering indexes on
 individual relations (may be by using technique I have mentioned
 upthread or some other way), then the current code itself will
 eliminate the ORDER BY clause.  I have tried that by using a query
 without having t1.c,t1.d in ORDER BY clause
 
 explain (costs off, analyze off) select * from t,t1 where t.a=t1.a order by
 t1.a,t1.b;
 QUERY PLAN
 --
  Merge Join
Merge Cond: (t1.a = t.a)
-  Index Scan using i_t1_pkey_2 on t1
-  Index Scan using i_t_pkey on t
 (4 rows)

Ya, the elimination seems to me so fascinate:)

 Okay, I think you want to handle the elimination of ORDER BY clauses
 at a much broader level which might handle most of simple cases as
 well.  However I think eliminating clauses as mentioned above is itself
 a good optimization for many cases and more so if that can be done in
 a much simpler way.

Yes, if can be done in much simpler way..  I guess that it
could be looking from opposite side, that is, equivalence
classes, anyway, I'll try it.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use unique index for longer pathkeys.

2014-08-04 Thread Kyotaro HORIGUCHI
Hello,

   Now drop the i_t1_pkey_1 and check the query plan again.
  
   drop index i_t1_pkey_1;
   explain (costs off, analyze off) select * from t,t1 where t.a=t1.a
 order by
   t1.a,t1.b,t1.c,t1.d;
  QUERY PLAN
   
Sort
  Sort Key: t.a, t1.b, t1.c, t1.d
  -  Merge Join
Merge Cond: (t.a = t1.a)
-  Index Scan using i_t_pkey on t
-  Index Scan using i_t1_pkey_2 on t1
   (6 rows)
  
   Can't above plan eliminate Sort Key even after dropping index
   (i_t1_pkey_1)?
 
  My patch doesn't so since there no longer a 'common primary
  pathkeys' in this query. Perhaps the query doesn't allow the sort
  eliminated. Since a is no more a pkey, t1 can have dulicate rows
  for the same a, so the joined relation also may have duplicte
  values in the column a.

Sorry, I may misunderstood you. The dropped index is t1(a) not
t1(a,b) so the discussion abobe is on the wrong assumption.

 I think irrespective of that we can trim t1.c  t1.d as we have
 primary key (unique and non column) for t1.a, t1.b.  Basically
 even if we don't use the primary key index, we can still trim
 the keys in such a case.  IIUC, then Tom has mentioned the
 same in his message related to this issue.

Although, yes, you're right, irrespective of the common
something, and even if the dropped index was i_t1_pkey_2, which
is on t1(a, b), the result tuples are sorted as expected only by
the pathkey (t.a = t1.a, t1.b). It is because both t and t1 are
still unique so the joined tuples are also unique, and the unique
key of the result tuples is the merged pkey (t.a, t1.a, t1.b),
which can be transformed to (t.a, t1.b) using the equiality
between t.a and t1.a. And considering the inner relation (t1) is
already sorted by (a, b), the sort itself could be elimited from
the plan.

The above could be described as below,

  - Join between relations with pkey yiels relation with the pkey
containing the columns of both pkeys, this is created by
simply concatenate them.

  - The concatenated pkey can be transformed using equality
operators, in other words, consulting equivelance classes.

  - If the projected joined relation(?) contains one of the
transformed pkeys above, it is a pkey for the result.

And in regards to MergeJoin and NestLoop, the result pathkeys can
be composed by concatenating child pathkeys in outer-inner
order. If the all ORDER BY pathkeys columns appear in the
concatenated pathkeys in the same order , the sort itself can be
eliminated.


 I am referring to below text:
 
 If we have ORDER BY a, b, c and (a,b) is the
 primary key, then including c in the ORDER BY list is semantically
 redundant, *whether or not we use an indexscan on the pkey index at all*.
 
 http://www.postgresql.org/message-id/5212.1397599...@sss.pgh.pa.us

Yes, my point at that time was mainly union (all) and partitioned
tables so the optimization for joins was the next step for
me. Although, my previous implement of
collect_common_primary_pathkeys seems to have been excessively
heavyweighted and it could be integrated with join optimization.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use unique index for longer pathkeys.

2014-08-04 Thread Amit Kapila
On Mon, Aug 4, 2014 at 1:22 PM, Kyotaro HORIGUCHI 
horiguchi.kyot...@lab.ntt.co.jp wrote:

 Hello,

  I think irrespective of that we can trim t1.c  t1.d as we have
  primary key (unique and non column) for t1.a, t1.b.  Basically
  even if we don't use the primary key index, we can still trim
  the keys in such a case.  IIUC, then Tom has mentioned the
  same in his message related to this issue.

 Although, yes, you're right, irrespective of the common
 something, and even if the dropped index was i_t1_pkey_2, which
 is on t1(a, b), the result tuples are sorted as expected only by
 the pathkey (t.a = t1.a, t1.b). It is because both t and t1 are
 still unique so the joined tuples are also unique, and the unique
 key of the result tuples is the merged pkey (t.a, t1.a, t1.b),
 which can be transformed to (t.a, t1.b) using the equiality
 between t.a and t1.a. And considering the inner relation (t1) is
 already sorted by (a, b), the sort itself could be elimited from
 the plan.

I think if we could eliminate t1.c,t1.d considering indexes on
individual relations (may be by using technique I have mentioned
upthread or some other way), then the current code itself will
eliminate the ORDER BY clause.  I have tried that by using a query
without having t1.c,t1.d in ORDER BY clause

explain (costs off, analyze off) select * from t,t1 where t.a=t1.a order by
t1.a,t1.b;
QUERY PLAN
--
 Merge Join
   Merge Cond: (t1.a = t.a)
   -  Index Scan using i_t1_pkey_2 on t1
   -  Index Scan using i_t_pkey on t
(4 rows)



  I am referring to below text:
 
  If we have ORDER BY a, b, c and (a,b) is the
  primary key, then including c in the ORDER BY list is semantically
  redundant, *whether or not we use an indexscan on the pkey index at
all*.
 
  http://www.postgresql.org/message-id/5212.1397599...@sss.pgh.pa.us

 Yes, my point at that time was mainly union (all) and partitioned
 tables so the optimization for joins was the next step for
 me.

Okay, I think you want to handle the elimination of ORDER BY clauses
at a much broader level which might handle most of simple cases as
well.  However I think eliminating clauses as mentioned above is itself
a good optimization for many cases and more so if that can be done in
a much simpler way.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Use unique index for longer pathkeys.

2014-07-29 Thread Amit Kapila
On Mon, Jul 28, 2014 at 3:17 PM, Kyotaro HORIGUCHI 
horiguchi.kyot...@lab.ntt.co.jp wrote:

  Now drop the i_t1_pkey_1 and check the query plan again.
 
  drop index i_t1_pkey_1;
  explain (costs off, analyze off) select * from t,t1 where t.a=t1.a
order by
  t1.a,t1.b,t1.c,t1.d;
 QUERY PLAN
  
   Sort
 Sort Key: t.a, t1.b, t1.c, t1.d
 -  Merge Join
   Merge Cond: (t.a = t1.a)
   -  Index Scan using i_t_pkey on t
   -  Index Scan using i_t1_pkey_2 on t1
  (6 rows)
 
  Can't above plan eliminate Sort Key even after dropping index
  (i_t1_pkey_1)?

 My patch doesn't so since there no longer a 'common primary
 pathkeys' in this query. Perhaps the query doesn't allow the sort
 eliminated. Since a is no more a pkey, t1 can have dulicate rows
 for the same a, so the joined relation also may have duplicte
 values in the column a.

I think irrespective of that we can trim t1.c  t1.d as we have
primary key (unique and non column) for t1.a, t1.b.  Basically
even if we don't use the primary key index, we can still trim
the keys in such a case.  IIUC, then Tom has mentioned the
same in his message related to this issue.

I am referring to below text:

If we have ORDER BY a, b, c and (a,b) is the
primary key, then including c in the ORDER BY list is semantically
redundant, *whether or not we use an indexscan on the pkey index at all*.

http://www.postgresql.org/message-id/5212.1397599...@sss.pgh.pa.us


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Use unique index for longer pathkeys.

2014-07-28 Thread Kyotaro HORIGUCHI
Hello,

   I think there is one more disadvantage in the way current patch is
   done which is that you need to collect index path keys for all relations
   irrespective of whether they will be of any use to eliminate useless
   pathkeys from query_pathkeys.  One trivial case that comes to mind is
   when there are multiple relations involved in query and ORDER BY is
   base on columns of only part of the tables involved in query.
 
  Like this?
 
  select x.a, x.b, y.b from x, y where x.a = y.a order by x.a, x.b;
 
  Equivalent class consists of (x.a=y.a) and (x.b), so index
  pathkeys for i_y is (y.a.=x.a). As a result, no common primary
  pathkeys found.
 
 I think it will find common pathkey incase you have an unique index
 on x.a (please see the example below), but currently I am not clear
 why there is a need for a common index path key in such a case to
 eliminate useless keys in ORDER BY, why can't we do it based
 on individual table's path key.
 
 Example:
 
 create table t (a int not null, b int not null, c int, d text);
 create unique index i_t_pkey on t(a, b);
 insert into t (select a % 10, a / 10, a, 't' from generate_series(0,
 10) a);
 analyze;
 
 create table t1 (a int not null, b int not null, c int, d text);
 create unique index i_t1_pkey_1 on t1(a);
 create unique index i_t1_pkey_2 on t1(a, b);
 insert into t1 (select a * 2, a / 10, a, 't' from generate_series(0,
 10) a);
 explain (costs off, analyze off) select * from t,t1 where t.a=t1.a order by
 t1.a,t1.b,t1.c,t1.d;
 
 QUERY PLAN
 --
  Merge Join
Merge Cond: (t.a = t1.a)
-  Index Scan using i_t_pkey on t
-  Index Scan using i_t1_pkey_1 on t1
 (4 rows)
 
 Here we can notice that there is no separate sort key in plan.

Sure, 

 Now drop the i_t1_pkey_1 and check the query plan again.
 
 drop index i_t1_pkey_1;
 explain (costs off, analyze off) select * from t,t1 where t.a=t1.a order by
 t1.a,t1.b,t1.c,t1.d;
QUERY PLAN
 
  Sort
Sort Key: t.a, t1.b, t1.c, t1.d
-  Merge Join
  Merge Cond: (t.a = t1.a)
  -  Index Scan using i_t_pkey on t
  -  Index Scan using i_t1_pkey_2 on t1
 (6 rows)
 
 Can't above plan eliminate Sort Key even after dropping index
 (i_t1_pkey_1)?

My patch doesn't so since there no longer a 'common primary
pathkeys' in this query. Perhaps the query doesn't allow the sort
eliminated. Since a is no more a pkey, t1 can have dulicate rows
for the same a, so the joined relation also may have duplicte
values in the column a. Therefore the joined relation may be half
sorted only by the column a so the sort pathkeys cannot be
trimmed.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use unique index for longer pathkeys.

2014-07-26 Thread Amit Kapila
On Fri, Jul 25, 2014 at 12:48 PM, Kyotaro HORIGUCHI 
horiguchi.kyot...@lab.ntt.co.jp wrote:
  I think there is one more disadvantage in the way current patch is
  done which is that you need to collect index path keys for all relations
  irrespective of whether they will be of any use to eliminate useless
  pathkeys from query_pathkeys.  One trivial case that comes to mind is
  when there are multiple relations involved in query and ORDER BY is
  base on columns of only part of the tables involved in query.

 Like this?

 select x.a, x.b, y.b from x, y where x.a = y.a order by x.a, x.b;

 Equivalent class consists of (x.a=y.a) and (x.b), so index
 pathkeys for i_y is (y.a.=x.a). As a result, no common primary
 pathkeys found.

I think it will find common pathkey incase you have an unique index
on x.a (please see the example below), but currently I am not clear
why there is a need for a common index path key in such a case to
eliminate useless keys in ORDER BY, why can't we do it based
on individual table's path key.

Example:

create table t (a int not null, b int not null, c int, d text);
create unique index i_t_pkey on t(a, b);
insert into t (select a % 10, a / 10, a, 't' from generate_series(0,
10) a);
analyze;

create table t1 (a int not null, b int not null, c int, d text);
create unique index i_t1_pkey_1 on t1(a);
create unique index i_t1_pkey_2 on t1(a, b);
insert into t1 (select a * 2, a / 10, a, 't' from generate_series(0,
10) a);
explain (costs off, analyze off) select * from t,t1 where t.a=t1.a order by
t1.a,t1.b,t1.c,t1.d;

QUERY PLAN
--
 Merge Join
   Merge Cond: (t.a = t1.a)
   -  Index Scan using i_t_pkey on t
   -  Index Scan using i_t1_pkey_1 on t1
(4 rows)

Here we can notice that there is no separate sort key in plan.

Now drop the i_t1_pkey_1 and check the query plan again.

drop index i_t1_pkey_1;
explain (costs off, analyze off) select * from t,t1 where t.a=t1.a order by
t1.a,t1.b,t1.c,t1.d;
   QUERY PLAN

 Sort
   Sort Key: t.a, t1.b, t1.c, t1.d
   -  Merge Join
 Merge Cond: (t.a = t1.a)
 -  Index Scan using i_t_pkey on t
 -  Index Scan using i_t1_pkey_2 on t1
(6 rows)

Can't above plan eliminate Sort Key even after dropping index
(i_t1_pkey_1)?



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Use unique index for longer pathkeys.

2014-07-26 Thread Amit Kapila
On Sat, Jul 26, 2014 at 11:53 AM, Amit Kapila amit.kapil...@gmail.com
wrote:
 On Fri, Jul 25, 2014 at 12:48 PM, Kyotaro HORIGUCHI 
horiguchi.kyot...@lab.ntt.co.jp wrote:
   I think there is one more disadvantage in the way current patch is
   done which is that you need to collect index path keys for all
relations
   irrespective of whether they will be of any use to eliminate useless
   pathkeys from query_pathkeys.  One trivial case that comes to mind is
   when there are multiple relations involved in query and ORDER BY is
   base on columns of only part of the tables involved in query.
 
  Like this?
 
  select x.a, x.b, y.b from x, y where x.a = y.a order by x.a, x.b;
 
  Equivalent class consists of (x.a=y.a) and (x.b), so index
  pathkeys for i_y is (y.a.=x.a). As a result, no common primary
  pathkeys found.

 I think it will find common pathkey incase you have an unique index
 on x.a (please see the example below), but currently I am not clear
 why there is a need for a common index path key in such a case to
 eliminate useless keys in ORDER BY, why can't we do it based
 on individual table's path key.

 Example:

 create table t (a int not null, b int not null, c int, d text);
 create unique index i_t_pkey on t(a, b);
 insert into t (select a % 10, a / 10, a, 't' from generate_series(0,
10) a);
 analyze;

 create table t1 (a int not null, b int not null, c int, d text);
 create unique index i_t1_pkey_1 on t1(a);
 create unique index i_t1_pkey_2 on t1(a, b);
 insert into t1 (select a * 2, a / 10, a, 't' from generate_series(0,
10) a);
 explain (costs off, analyze off) select * from t,t1 where t.a=t1.a order
by t1.a,t1.b,t1.c,t1.d;

 QUERY PLAN
 --
  Merge Join
Merge Cond: (t.a = t1.a)
-  Index Scan using i_t_pkey on t
-  Index Scan using i_t1_pkey_1 on t1
 (4 rows)

 Here we can notice that there is no separate sort key in plan.

 Now drop the i_t1_pkey_1 and check the query plan again.

 drop index i_t1_pkey_1;
 explain (costs off, analyze off) select * from t,t1 where t.a=t1.a order
by t1.a,t1.b,t1.c,t1.d;
QUERY PLAN
 
  Sort
Sort Key: t.a, t1.b, t1.c, t1.d
-  Merge Join
  Merge Cond: (t.a = t1.a)
  -  Index Scan using i_t_pkey on t
  -  Index Scan using i_t1_pkey_2 on t1
 (6 rows)

 Can't above plan eliminate Sort Key even after dropping index
 (i_t1_pkey_1)?


Here I have one additional thought which I would like to share with
you to see if this patch can be done in a simpler way.  In function
standard_qp_callback(), can we directly trim the sortclause list based
on index information in PlannerInfo.  We have access to target list in
this function to know exactly the relation/column information of
sortclause.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Use unique index for longer pathkeys.

2014-07-25 Thread Kyotaro HORIGUCHI
Hello. 

I've been cooled off and convinced that this patch has become
quite useless by itself. It improves almost only UNIONs with
ORDER BY on tables that have unioform primary keys, and needs my
another patch to work.

I'll try to reintegrate this patch into my 'another patch' as
mentioned below and try next CF with it. I'll close this patch as
'Return with Feedback' by myself. Thank you for your thoughtful
commnets. I learned a lot.

=
Thank you for the answers, Robert, Amit :)

  # By the way, this style of calling a person is quite common
  # among Japanese since the first-name basis implies very close
  # relationship or it frequently conveys offensive shade.
 
 I don't know if I have ever offended you or any other Japanese
 community member while interaction, but my intention was never
 so.  The reason for using this style for calling you is during my initial 4
 years of work, I have worked very closely with Japanese, so I have
 learned few things during that time and it was quite common to refer
 the way I used above, however I am not sure I have always used
 during communication, so if something I have used which is not common
 in your culture, please feel free to let me know, I will surely not do that
 again.

I also don't mind at all and that's why I was anxious about
it. Thank you for your answers.

  Sorry, my memory had been worn down. After some reconfirmation,
  this description found to be a bit (quite?) wrong.
 
  collect_common_primary_pathkeys needs root-eq_classes to be set
  up beforehand, not appendrels. Bacause build_index_pathkeys
  doesn't work before every EC member for all relation involved is
  already registered.
 
 
  standard_qp_callback registers EC members in the following path
  but only for the primary(?) tlist of the subquery, so EC members
  for the child relations are not registered at the time.
 
  .-make_pathekeys_sortclauses-make_pathkey_from_sortop
 -make_pathkey_from_sortinfo-get_eclass_for_sort_expr
 
  EC members for the child rels are registered in
 
   set_base_rel_sizes-set_rel_size-set_append_rel_size
 -add_child_rel_equivalences
 
  So trim_query_pathkeys (collect_common...) doesn't work before
  set_base_rel_sizes(). These EC members needed to be registered at
  least if root-query_pathkeys exists so this seems to be done in
  standard_qp_callback adding a separate inheritance tree walk.
 
 Do we really need that information to eliminate useless keys from
 query_pathkeys?
 
 
 * We have to make child entries in the EquivalenceClass data
 * structures as well.  This is needed either if the parent
 * participates in some eclass joins (because we will want to consider
 * inner-indexscan joins on the individual children) or if the parent
 * has useful pathkeys (because we should try to build MergeAppend
 * paths that produce those sort orderings).
 
 Referring to above comment, I think we don't need it to eliminate
 useless keys based on primary key info from query_pathkeys, but I
 might be missing some case, could you please elaborate more to
 let me know the case/cases where this would be useful.

As I said some time ago, It's not so useful for wide situation,
especially for no transformation is done on the given query. If
my memory is correct, there was more wider application before the
EC fix for nested inheritance, but now this has rather narrow
application:( Anyway,

The most and almost only target of this patch is the situation
that a DISTINCT is implicitly added for the query, that is UNION
on tables with primary key, which is generated by my another
patch which was proposed on last CF4 (or earlier). This patch is
originally a part of it.

https://commitfest.postgresql.org/action/patch_view?id=1279

Come to think of it, maybe I changed my mind. It's better to add
a trimmed (minimum) DISTINCT ON clause directly for flattened
simple UNION. Other examples are seemingly odd or unlikely as
given queries.

 I think there is one more disadvantage in the way current patch is
 done which is that you need to collect index path keys for all relations
 irrespective of whether they will be of any use to eliminate useless
 pathkeys from query_pathkeys.  One trivial case that comes to mind is
 when there are multiple relations involved in query and ORDER BY is
 base on columns of only part of the tables involved in query.

Like this?

select x.a, x.b, y.b from x, y where x.a = y.a order by x.a, x.b;

Equivalent class consists of (x.a=y.a) and (x.b), so index
pathkeys for i_y is (y.a.=x.a). As a result, no common primary
pathkeys found. There seem to be no problem if I understand you
correctly.

  # rel-has_elcass_joins seems not working but it is not the topic
  # here.
 
   Could you please explain me why the index information built in above
   path is not sufficient or is there any other case which I am missing?
 
  Is the description above enough to be the explaination for the
  place? Sorry for the inaccurate description.
 
 No issues, above 

Re: [HACKERS] Use unique index for longer pathkeys.

2014-07-24 Thread Robert Haas
On Tue, Jul 22, 2014 at 5:19 AM, Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote:
 # By the way, this style of calling a person is quite common
 # among Japanese since the first-name basis implies very close
 # relationship or it frequently conveys offensive shade. But I'm
 # not sure what should I call others who're not Japases native.

Typical usage on this mailing list is to refer to individuals by first
name (e.g. Tom, Alvaro, Heikki) or by full name (e.g. Tom Lane, Alvaro
Herrera, Heikki Linnakangas).  The last name is typically included
only when it might otherwise be confusing - for example, you will
rarely see people use just Greg because there are several people by
that name who are reasonably active, but Heikki's last name is almost
never mentioned).

That having been said, if you want to use the Japanese style, I don't
think anyone here will mind.  I am personally not totally familiar
with it so please forgive me in turn if I should address you or anyone
else in a manner that would be considered too familiar in another
culture.  I try not to do that, but I might make some mistakes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use unique index for longer pathkeys.

2014-07-24 Thread Amit Kapila
On Tue, Jul 22, 2014 at 2:49 PM, Kyotaro HORIGUCHI 
horiguchi.kyot...@lab.ntt.co.jp wrote:

 Sorry , previous version has bugs. It stamps over the stack and
 crashesX( The attached is the bug fixed version, with no
 substantial changes.

  On Tue, Jul 15, 2014 at 2:17 PM, Kyotaro HORIGUCHI 
  horiguchi.kyot...@lab.ntt.co.jp wrote:
  
   Hi, the attached is the revised version.
 
  Thanks Horiguchi-San for the updated patch.

 # By the way, this style of calling a person is quite common
 # among Japanese since the first-name basis implies very close
 # relationship or it frequently conveys offensive shade.

I don't know if I have ever offended you or any other Japanese
community member while interaction, but my intention was never
so.  The reason for using this style for calling you is during my initial 4
years of work, I have worked very closely with Japanese, so I have
learned few things during that time and it was quite common to refer
the way I used above, however I am not sure I have always used
during communication, so if something I have used which is not common
in your culture, please feel free to let me know, I will surely not do that
again.

  Today while looking into updated patch, I was wondering why can't
  we eliminate useless keys in query_pathkeys when we actually build
  the same in function standard_qp_callback(), basically somewhat
  similar to what we do in
 
standard_qp_callback-make_pathkeys_for_sortclauses-pathkey_is_redundant().

 I agree that standard_qp_callback is basically more preferable.

  We already have index information related to base_rels before building
  query pathkeys.  I noticed that you mentioned the below in your original
  mail which indicates that information related to inheritance tables is
built
  only after set_base_rel_sizes()
 
  These steps take place between set_base_rel_sizes() and
  set_base_rel_pathlists() in make_one_rel(). The reason for the
  position is that the step 2 above needs all inheritence tables to
  be extended in PlannerInfo and set_base_rel_sizes (currently)
  does that.

 Sorry, my memory had been worn down. After some reconfirmation,
 this description found to be a bit (quite?) wrong.

 collect_common_primary_pathkeys needs root-eq_classes to be set
 up beforehand, not appendrels. Bacause build_index_pathkeys
 doesn't work before every EC member for all relation involved is
 already registered.


 standard_qp_callback registers EC members in the following path
 but only for the primary(?) tlist of the subquery, so EC members
 for the child relations are not registered at the time.

 .-make_pathekeys_sortclauses-make_pathkey_from_sortop
-make_pathkey_from_sortinfo-get_eclass_for_sort_expr

 EC members for the child rels are registered in

  set_base_rel_sizes-set_rel_size-set_append_rel_size
-add_child_rel_equivalences

 So trim_query_pathkeys (collect_common...) doesn't work before
 set_base_rel_sizes(). These EC members needed to be registered at
 least if root-query_pathkeys exists so this seems to be done in
 standard_qp_callback adding a separate inheritance tree walk.

Do we really need that information to eliminate useless keys from
query_pathkeys?


* We have to make child entries in the EquivalenceClass data
* structures as well.  This is needed either if the parent
* participates in some eclass joins (because we will want to consider
* inner-indexscan joins on the individual children) or if the parent
* has useful pathkeys (because we should try to build MergeAppend
* paths that produce those sort orderings).

Referring to above comment, I think we don't need it to eliminate
useless keys based on primary key info from query_pathkeys, but I
might be missing some case, could you please elaborate more to
let me know the case/cases where this would be useful.

I think there is one more disadvantage in the way current patch is
done which is that you need to collect index path keys for all relations
irrespective of whether they will be of any use to eliminate useless
pathkeys from query_pathkeys.  One trivial case that comes to mind is
when there are multiple relations involved in query and ORDER BY is
base on columns of only part of the tables involved in query.

 # rel-has_elcass_joins seems not working but it is not the topic
 # here.

  Could you please explain me why the index information built in above
  path is not sufficient or is there any other case which I am missing?

 Is the description above enough to be the explaination for the
 place? Sorry for the inaccurate description.

No issues, above description is sufficient to explain why you have
written patch in its current form.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Use unique index for longer pathkeys.

2014-07-22 Thread Kyotaro HORIGUCHI
Sorry , previous version has bugs. It stamps over the stack and
crashesX( The attached is the bug fixed version, with no
substantial changes.

 On Tue, Jul 15, 2014 at 2:17 PM, Kyotaro HORIGUCHI 
 horiguchi.kyot...@lab.ntt.co.jp wrote:
 
  Hi, the attached is the revised version.
 
 Thanks Horiguchi-San for the updated patch.

# By the way, this style of calling a person is quite common
# among Japanese since the first-name basis implies very close
# relationship or it frequently conveys offensive shade. But I'm
# not sure what should I call others who're not Japases native.

Anyway,

 Today while looking into updated patch, I was wondering why can't
 we eliminate useless keys in query_pathkeys when we actually build
 the same in function standard_qp_callback(), basically somewhat
 similar to what we do in
 standard_qp_callback-make_pathkeys_for_sortclauses-pathkey_is_redundant().

I agree that standard_qp_callback is basically more preferable.

 We already have index information related to base_rels before building
 query pathkeys.  I noticed that you mentioned the below in your original
 mail which indicates that information related to inheritance tables is built
 only after set_base_rel_sizes()
 
 These steps take place between set_base_rel_sizes() and
 set_base_rel_pathlists() in make_one_rel(). The reason for the
 position is that the step 2 above needs all inheritence tables to
 be extended in PlannerInfo and set_base_rel_sizes (currently)
 does that.

Sorry, my memory had been worn down. After some reconfirmation,
this description found to be a bit (quite?) wrong.

collect_common_primary_pathkeys needs root-eq_classes to be set
up beforehand, not appendrels. Bacause build_index_pathkeys
doesn't work before every EC member for all relation involved is
already registered.

standard_qp_callback registers EC members in the following path
but only for the primary(?) tlist of the subquery, so EC members
for the child relations are not registered at the time.

.-make_pathekeys_sortclauses-make_pathkey_from_sortop
   -make_pathkey_from_sortinfo-get_eclass_for_sort_expr

EC members for the child rels are registered in

 set_base_rel_sizes-set_rel_size-set_append_rel_size
   -add_child_rel_equivalences

So trim_query_pathkeys (collect_common...) doesn't work before
set_base_rel_sizes(). These EC members needed to be registered at
least if root-query_pathkeys exists so this seems to be done in
standard_qp_callback adding a separate inheritance tree walk.

# rel-has_elcass_joins seems not working but it is not the topic
# here.

 Could you please explain me why the index information built in above
 path is not sufficient or is there any other case which I am missing?

Is the description above enough to be the explaination for the
place? Sorry for the inaccurate description.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 9573a9b..546e112 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1789,9 +1789,11 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node)
 	/* indexprs is redundant since we print indextlist */
 	WRITE_NODE_FIELD(indpred);
 	WRITE_NODE_FIELD(indextlist);
+	/* cached pathkeys are omitted as indexkeys */
 	WRITE_BOOL_FIELD(predOK);
 	WRITE_BOOL_FIELD(unique);
 	WRITE_BOOL_FIELD(immediate);
+	WRITE_BOOL_FIELD(allnotnull);
 	WRITE_BOOL_FIELD(hypothetical);
 	/* we don't bother with fields copied from the pg_am entry */
 }
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index c81efe9..c12d0e7 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -58,6 +58,7 @@ int			geqo_threshold;
 join_search_hook_type join_search_hook = NULL;
 
 
+static List *collect_common_primary_pathkeys(PlannerInfo *root);
 static void set_base_rel_sizes(PlannerInfo *root);
 static void set_base_rel_pathlists(PlannerInfo *root);
 static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
@@ -66,6 +67,7 @@ static void set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
  Index rti, RangeTblEntry *rte);
 static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel,
    RangeTblEntry *rte);
+static void trim_query_pathkeys(PlannerInfo * root);
 static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	   RangeTblEntry *rte);
 static void set_foreign_size(PlannerInfo *root, RelOptInfo *rel,
@@ -112,6 +114,203 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
   RangeTblEntry *rte, Index rti, Node *qual);
 static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
 
+/*
+ * collect_common_primary_pathkeys: Collects common unique and non-null index
+ * pathkeys from all base relations in current root.
+ */
+static List *
+collect_common_primary_pathkeys(PlannerInfo *root)
+{
+	List *result = NIL;
+	Index rti;
+	int   nindex = 0;
+	List 

Re: [HACKERS] Use unique index for longer pathkeys.

2014-07-19 Thread Amit Kapila
On Tue, Jul 15, 2014 at 2:17 PM, Kyotaro HORIGUCHI 
horiguchi.kyot...@lab.ntt.co.jp wrote:

 Hi, the attached is the revised version.

Thanks Horiguchi-San for the updated patch.

Today while looking into updated patch, I was wondering why can't
we eliminate useless keys in query_pathkeys when we actually build
the same in function standard_qp_callback(), basically somewhat
similar to what we do in
standard_qp_callback-make_pathkeys_for_sortclauses-pathkey_is_redundant().

We already have index information related to base_rels before building
query pathkeys.  I noticed that you mentioned the below in your original
mail which indicates that information related to inheritance tables is built
only after set_base_rel_sizes()

These steps take place between set_base_rel_sizes() and
set_base_rel_pathlists() in make_one_rel(). The reason for the
position is that the step 2 above needs all inheritence tables to
be extended in PlannerInfo and set_base_rel_sizes (currently)
does that.

Now I am not sure why such information is not build during
build_simple_rel() in below code path:
build_simple_rel()
{
..
if (rte-inh)
{
ListCell   *l;

foreach(l, root-append_rel_list)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);

/* append_rel_list contains all append rels; ignore others */
if (appinfo-parent_relid != relid)
continue;

(void) build_simple_rel(root, appinfo-child_relid,
RELOPT_OTHER_MEMBER_REL);
}
}
}

Could you please explain me why the index information built in above
path is not sufficient or is there any other case which I am missing?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Use unique index for longer pathkeys.

2014-07-15 Thread Kyotaro HORIGUCHI
Hi, the attached is the revised version.

 - defined allnotnull instead of full_ordered in RelOptInfo and
   changed to use it.

 - compare_pathkeys_ignoring_strategy() is now integrated into
   compare_pathkeys().

 - index pathkeys caching in RelOptInfo
   added. (build_index_pathkeys() does)

 - Refactoring collect_common_primary_pathkeys mainly from a
   reason of eliminating code duplicates and readability.

 - trim_query_pathkeys now tries to trim window_pathkeys. The old
   comment came from my misunderstanding. Window pathkeys are not
   always RowExpr and RowExpr pathkeys are feedable from not only
   PARTITION BY clause but also ORDER BY, DISTINCT or GROUP BY
   clauses. The comment for the function is modified.

 - Some cosmetic fixes including spacing after commas.

 - The shortcut judgement is *not* included.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 9573a9b..546e112 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1789,9 +1789,11 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node)
 	/* indexprs is redundant since we print indextlist */
 	WRITE_NODE_FIELD(indpred);
 	WRITE_NODE_FIELD(indextlist);
+	/* cached pathkeys are omitted as indexkeys */
 	WRITE_BOOL_FIELD(predOK);
 	WRITE_BOOL_FIELD(unique);
 	WRITE_BOOL_FIELD(immediate);
+	WRITE_BOOL_FIELD(allnotnull);
 	WRITE_BOOL_FIELD(hypothetical);
 	/* we don't bother with fields copied from the pg_am entry */
 }
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index c81efe9..751eb6a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -58,6 +58,7 @@ int			geqo_threshold;
 join_search_hook_type join_search_hook = NULL;
 
 
+static List *collect_common_primary_pathkeys(PlannerInfo *root);
 static void set_base_rel_sizes(PlannerInfo *root);
 static void set_base_rel_pathlists(PlannerInfo *root);
 static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
@@ -66,6 +67,7 @@ static void set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
  Index rti, RangeTblEntry *rte);
 static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel,
    RangeTblEntry *rte);
+static void trim_query_pathkeys(PlannerInfo * root);
 static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	   RangeTblEntry *rte);
 static void set_foreign_size(PlannerInfo *root, RelOptInfo *rel,
@@ -112,6 +114,203 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
   RangeTblEntry *rte, Index rti, Node *qual);
 static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
 
+/*
+ * collect_common_primary_pathkeys: Collects common unique and non-null index
+ * pathkeys from all base relations in current root.
+ */
+static List *
+collect_common_primary_pathkeys(PlannerInfo *root)
+{
+	List *result = NIL;
+	Index rti;
+	int   nindex = 0;
+	List **pathkeys_array0 = NULL;
+	List **pathkeys_array = NULL;
+	List **pathkeys_array2 = NULL;
+	bool do_conjunction = (root-simple_rel_array_size  2);
+
+	for (rti = 1 ; rti  root-simple_rel_array_size ; rti++)
+	{
+		RelOptInfo *rel = root-simple_rel_array[rti];
+		int prev_nindex = 0;
+		ListCell *lc;
+
+		if (rel == NULL)
+			continue;
+
+		Assert (rel-relid == rti);
+
+		/* Scan all base relations except parent relations. */
+		if (root-simple_rte_array[rti]-inh ||
+			(rel-reloptkind != RELOPT_BASEREL 
+			 rel-reloptkind != RELOPT_OTHER_MEMBER_REL))
+			continue;
+
+		if (do_conjunction)
+		{
+			if (pathkeys_array2)
+memset(pathkeys_array2, 0, sizeof(List*) * nindex);
+			else
+			{
+/* Rig for AND (conjunction) operation. */
+nindex = 0;
+
+foreach (lc, rel-indexlist)
+{
+	IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);
+	
+	if (index-allnotnull 
+		index-unique  index-immediate 
+		index-indpred == NIL)
+		nindex++;
+}
+
+if (nindex == 0)
+	return NULL; /* No common primary pathkeys exists.*/
+
+pathkeys_array0 = palloc0(sizeof(List*) * nindex * 2);
+pathkeys_array  = pathkeys_array0;
+pathkeys_array2 = pathkeys_array0 + nindex;
+			}
+		}
+
+		prev_nindex = nindex;
+		nindex = 0;
+
+		foreach (lc, rel-indexlist)
+		{
+			IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);
+			List *idx_pathkeys;
+
+			/* Only indexes that can unique sort are needed. */
+			if (!index-allnotnull ||
+!(index-unique  index-immediate) ||
+index-indpred != NIL)
+continue;
+
+			idx_pathkeys = build_index_pathkeys(root, index,
+ForwardScanDirection);
+			if (!idx_pathkeys)
+continue;
+
+			if (!do_conjunction)
+			{
+result = lappend(result, idx_pathkeys);
+nindex++;
+			}
+			else
+			{
+/*
+ * AND operation: drop pathkeys which don't see identical
+ * one in this relation.
+ */
+int i;
+
+for (i = 0 ; i  prev_nindex ; i++)
+{
+	if 

Re: [HACKERS] Use unique index for longer pathkeys.

2014-07-14 Thread Kyotaro HORIGUCHI
Thank you for reviewing, the revised patch will come later.

At Mon, 14 Jul 2014 11:01:52 +0530, Amit Kapila amit.kapil...@gmail.com wrote 
in caa4ek1+6b6wjwf51ozmrl+mkfh8xup9j-pehqvor8se7swy...@mail.gmail.com
 On Fri, Jun 13, 2014 at 1:11 PM, Kyotaro HORIGUCHI 
 horiguchi.kyot...@lab.ntt.co.jp wrote:
 
  Hello, This is the continuation from the last CF.
 
  This patch intends to make PG to use index for longer pathkeys
  than index columns when,
 
   - The index is a unique index.
   - All index columns are NOT NULL.
   - The index column list is a subset of query_pathkeys.
 
  
 
  This patch consists of mainly three parts.
(snip)
 
 I have spent some time on this patch and would like to share my
 findings as below with you.
 
 1.
 It seems to me that the new flag (IndexOptInfo.full_ordered) introduced in
 this patch is not getting set properly.  Please refer below test:

Yeah, probably you omitted the second condition above.

 create table t (a int, b int, c int, d text);

The following definition instead will make this work. NULLs
cannot be unique.

| create table t (a int not null, b int not null, c int, d text);

 create unique index i_t_pkey on t(a, b);
 insert into t (select a % 10, a / 10, a, 't' from generate_series(0,
 10) a);
 analyze;
 explain (costs off, analyze on) select distinct * from t;
...
 2.
 typedef struct IndexOptInfo
   bool predOK; /* true if predicate matches query */
   bool unique; /* true if a unique index */
   bool immediate; /* is uniqueness enforced immediately? */
 + bool full_ordered; /* don't key columns duplicate? */
 
 It seems you have forgotten to update out function _outIndexOptInfo().

Mmm, it's surely what I didn't intended to make:( , but the
member actually works in collect_common_primary_pathkeys.

The correct (desirable) code should use (allnotnull  unique)
instead of (full_ordered). I'll fix this in the comming version
later but they are equivelant in functionality. It's not a
mistake of forgetting update out (so broken), but branching from
wrong point (so it works contrary to the expectation):)

 3.
 + root-distinct_pathkeys =
 + shorten_pathkeys_following(root, root-distinct_pathkeys,pk_guide,
 +   true);
 +
 + root-query_pathkeys =
 + shorten_pathkeys_following(root,root-query_pathkeys,pk_guide,
 +   true);
 
 Add a space after each parameter in function call.

Thank you for pointing out.

 4.
 +PathKeysComparison
 +compare_pathkeys_ignoring_strategy(List *keys1, List *keys2)
 
 a. This function return 4 different type of values (PATHKEYS_EQUAL,
PATHKEYS_DIFFERENT, PATHKEYS_BETTER1, PATHKEYS_BETTER2),
however the caller just checks for PATHKEYS_EQUAL, isn't it better to
 just
return either PATHKEYS_EQUAL or PATHKEYS_DIFFERENT.

Hmm.. I agree that it is practically removable, but I think it is
necessary from the view of similarity to the function with the
similar name. Perhaps I want another name which don't suggest the
similarity to compare_pathekeys(), say, bool
pathkeys_are_equal_ignoring_strategy() to change the rerurn
values set.

 b. Another idea could be that instead of writting separate function,
pass an additional parameter to compare_pathkeys() to distinguish
for ignore strategy case.

It sounds reasonable. According to my faint memory, the reason
why the function is separated from the original one is, perhaps,
simplly a editorial reason.

 c. In any case, I think it is good to add comments on top of newly
introduced function (compare_pathkeys_ignoring_strategy) or extend
the comments of compare_pathkeys() if you want to add a new parameter
to it.

Ouch, thank you for pointing out. I'll add comment if it remains
in the next patch.

  Finally, the new patch has grown up to twice in size.. Although
  it seems a bit large, I think that side-effects are clearly
  avoided.
 
 I am bit worried about the extra cycles added by this patch as compare
 to your previous patch; example
 During trimming of paths, it will build index paths (build_index_pathkeys())
 which it will anyway do again during creation of index paths:
 create_index_paths()-get_index_paths()-build_index_paths()-build_index_pathkeys()

I felt the same thing. On stupid measure to reduce cycles would
be caching created pathkeys in IndexOptInfo. I'll try in the next
patch.

 I am not sure if this has direct impact, however I am suspecting trimming
 logic can add quite a few instructions even for cases when it is not
 beneficial.
 At this moment, I also don't have better idea, so lets proceed with review
 of this
 patch unless there is any other better way to achieve it.

I suppose there's some shortcut which can exclude the cases where
obviously there's no use of pathkeys trimming, for example, using
only pathkey lengths. I'll seek for it.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Re: [HACKERS] Use unique index for longer pathkeys.

2014-07-14 Thread Amit Kapila
On Mon, Jul 14, 2014 at 4:02 PM, Kyotaro HORIGUCHI 
horiguchi.kyot...@lab.ntt.co.jp wrote:
 At Mon, 14 Jul 2014 11:01:52 +0530, Amit Kapila amit.kapil...@gmail.com
wrote in caa4ek1+6b6wjwf51ozmrl+mkfh8xup9j-pehqvor8se7swy...@mail.gmail.com

  On Fri, Jun 13, 2014 at 1:11 PM, Kyotaro HORIGUCHI 
  horiguchi.kyot...@lab.ntt.co.jp wrote:
  I am bit worried about the extra cycles added by this patch as compare
  to your previous patch; example
  During trimming of paths, it will build index paths
(build_index_pathkeys())
  which it will anyway do again during creation of index paths:
 
create_index_paths()-get_index_paths()-build_index_paths()-build_index_pathkeys()

 I felt the same thing. On stupid measure to reduce cycles would
 be caching created pathkeys in IndexOptInfo. I'll try in the next
 patch.

I suggest to wait for overall review of patch before trying this out,
we can try to see what is the best way to avoid this if required, once
other part of patch is reviewed and proved to be problem free.

Other than this I agree with the other points you mentioned in mail
and may be you can just handle those in next version of your patch.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Use unique index for longer pathkeys.

2014-07-13 Thread Amit Kapila
On Fri, Jun 13, 2014 at 1:11 PM, Kyotaro HORIGUCHI 
horiguchi.kyot...@lab.ntt.co.jp wrote:

 Hello, This is the continuation from the last CF.

 This patch intends to make PG to use index for longer pathkeys
 than index columns when,

  - The index is a unique index.
  - All index columns are NOT NULL.
  - The index column list is a subset of query_pathkeys.

 

 This patch consists of mainly three parts.

  1. Mark index as 'Uniquely orders the table' if the conditions
 are satisfied. This is the same from the previous patch.

  2. Collect the common primary pathkeys, which is pathkeys that
 can perfectly sort all involved
 RTE's. (collect_common_primary_pathkeys())

  3. Trim the query pathkeys using the common primary pathkeys.
 (trim_query_pathkeys())

I have spent some time on this patch and would like to share my
findings as below with you.

1.
It seems to me that the new flag (IndexOptInfo.full_ordered) introduced in
this patch is not getting set properly.  Please refer below test:

create table t (a int, b int, c int, d text);
create unique index i_t_pkey on t(a, b);
insert into t (select a % 10, a / 10, a, 't' from generate_series(0,
10) a);
analyze;
explain (costs off, analyze on) select distinct * from t;

Unptached -
postgres=# explain (costs off, analyze on) select distinct * from t;
QUERY PLAN
---
 Unique (actual time=331.624..597.278 rows=11 loops=1)
   -  Sort (actual time=330.889..430.449 rows=11 loops=1)
 Sort Key: a, b, c, d
 Sort Method: external merge  Disk: 2344kB
 -  Seq Scan on t (actual time=0.013..74.202 rows=11 loops=1)
 Execution time: 665.332 ms
(6 rows)

Patch (pathkey_and_uniqueindx_typ2_v1) -
postgres=# explain (costs off, analyze on) select distinct * from t;
QUERY PLAN
---
 Unique (actual time=336.033..601.144 rows=11 loops=1)
   -  Sort (actual time=336.027..436.621 rows=11 loops=1)
 Sort Key: a, b, c, d
 Sort Method: external merge  Disk: 2344kB
 -  Seq Scan on t (actual time=0.014..78.826 rows=11 loops=1)
 Execution time: 667.387 ms
(6 rows)

Shouldn't above query use index scan after patch?

2.
typedef struct IndexOptInfo
  bool predOK; /* true if predicate matches query */
  bool unique; /* true if a unique index */
  bool immediate; /* is uniqueness enforced immediately? */
+ bool full_ordered; /* don't key columns duplicate? */

It seems you have forgotten to update out function _outIndexOptInfo().

3.
+ root-distinct_pathkeys =
+ shorten_pathkeys_following(root, root-distinct_pathkeys,pk_guide,
+   true);
+
+ root-query_pathkeys =
+ shorten_pathkeys_following(root,root-query_pathkeys,pk_guide,
+   true);

Add a space after each parameter in function call.

4.
+PathKeysComparison
+compare_pathkeys_ignoring_strategy(List *keys1, List *keys2)

a. This function return 4 different type of values (PATHKEYS_EQUAL,
   PATHKEYS_DIFFERENT, PATHKEYS_BETTER1, PATHKEYS_BETTER2),
   however the caller just checks for PATHKEYS_EQUAL, isn't it better to
just
   return either PATHKEYS_EQUAL or PATHKEYS_DIFFERENT.
b. Another idea could be that instead of writting separate function,
   pass an additional parameter to compare_pathkeys() to distinguish
   for ignore strategy case.
c. In any case, I think it is good to add comments on top of newly
   introduced function (compare_pathkeys_ignoring_strategy) or extend
   the comments of compare_pathkeys() if you want to add a new parameter
   to it.

 Finally, the new patch has grown up to twice in size.. Although
 it seems a bit large, I think that side-effects are clearly
 avoided.

I am bit worried about the extra cycles added by this patch as compare
to your previous patch; example
During trimming of paths, it will build index paths (build_index_pathkeys())
which it will anyway do again during creation of index paths:
create_index_paths()-get_index_paths()-build_index_paths()-build_index_pathkeys()

I am not sure if this has direct impact, however I am suspecting trimming
logic can add quite a few instructions even for cases when it is not
beneficial.
At this moment, I also don't have better idea, so lets proceed with review
of this
patch unless there is any other better way to achieve it.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Use unique index for longer pathkeys.

2014-07-10 Thread Kyotaro HORIGUCHI
Thank you for taking a look on this patch.

 I took a quick look at this patch, more or less because nobody else did.
 
  Duing last CF, I proposed to match long pathkeys against index columns
  during creating index paths. This worked fine but also it is difficult
  to make sure that all side-effects are eliminated. Finally Tom Lane
  suggested to truncate pathkeys while generation of the pathkeys
  itself. So this patch comes.
 
 I found your older patch quite straightforward to understand, but the
 new one much more difficult to follow (but that's not saying much, I
 am not very familiar with the planner code in general).

I think it's quite natural to think so.

 Do you have any references to the discussion about the side-effects that
 needed to be eliminated with the earlier patch?

The biggest side-effects (or simplly defect) found so far is
discussed here,

http://www.postgresql.org/message-id/01bd01cf0b4e$9b960ad0$d2c22070$@ets...@lab.ntt.co.jp

This was caused by omitting the membership of the Var under
inspection while cheking if the pathkeys is extensible.

http://www.postgresql.org/message-id/20140107.145900.196068363.horiguchi.kyot...@lab.ntt.co.jp

After all, Tom said that the right way to do this is not such
whacking-a-mole thing but loosen pathkeys previously so that the
planner naturally do what the previous patch did without any
special treat.

http://www.postgresql.org/message-id/5212.1397599...@sss.pgh.pa.us

So the new patch comes.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use unique index for longer pathkeys.

2014-07-07 Thread Abhijit Menon-Sen
Hi.

I took a quick look at this patch, more or less because nobody else did.

 Duing last CF, I proposed to match long pathkeys against index columns
 during creating index paths. This worked fine but also it is difficult
 to make sure that all side-effects are eliminated. Finally Tom Lane
 suggested to truncate pathkeys while generation of the pathkeys
 itself. So this patch comes.

I found your older patch quite straightforward to understand, but the
new one much more difficult to follow (but that's not saying much, I
am not very familiar with the planner code in general).

Do you have any references to the discussion about the side-effects that
needed to be eliminated with the earlier patch?

-- Abhijit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Use unique index for longer pathkeys.

2014-06-13 Thread Kyotaro HORIGUCHI
Hello, This is the continuation from the last CF.

This patch intends to make PG to use index for longer pathkeys
than index columns when,

 - The index is a unique index.
 - All index columns are NOT NULL.
 - The index column list is a subset of query_pathkeys.

The use cases for this patch are,

 - (Useless?) DISTINCT on the table with PK constraints.

   This alone is useless but this situation could take place when
   processing UNION (without ALL), which silently adds DISTINCT *
   to element queries.  Especially effective with ORDER BY and
   LIMIT clauses.  But this needs another patch for UNION
   optimization. This is the main motive for this patch.

 - Unfortunately, other use cases are somewhat boresome, mainly
   effective for essentially unnecessary ORDER BY or DISTINCT. To
   streatch a point, naively or mechanically composed queires
   could be saved by this patch..



Duing last CF, I proposed to match long pathkeys against index
columns during creating index paths. This worked fine but also it
is difficult to make sure that all side-effects are
eliminated. Finally Tom Lane suggested to truncate pathkeys while
generation of the pathkeys itself. So this patch comes.

This patch consists of mainly three parts.

 1. Mark index as 'Uniquely orders the table' if the conditions
are satisfied. This is the same from the previous patch.

 2. Collect the common primary pathkeys, which is pathkeys that
can perfectly sort all involved
RTE's. (collect_common_primary_pathkeys())

 3. Trim the query pathkeys using the common primary pathkeys.
(trim_query_pathkeys())

These steps take place between set_base_rel_sizes() and
set_base_rel_pathlists() in make_one_rel(). The reason for the
position is that the step 2 above needs all inheritence tables to
be extended in PlannerInfo and set_base_rel_sizes (currently)
does that. Then the trimmed pathkeys are used in
set_base_rel_pathlists so trim_query_pathkeys should be before
it. (This needs to be written as a comment?)

Finally, the new patch has grown up to twice in size.. Although
it seems a bit large, I think that side-effects are clearly
avoided.


This message is attatched by two patches.

 1. pathkey_and_uniqueindx_typ2_v1.patch : This new patch.

 2. pathkey_and_uniqueindx_v10_20130411.patch : The last version
of the previous approach.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index fdaa964..47b8056 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -50,6 +50,7 @@ int			geqo_threshold;
 join_search_hook_type join_search_hook = NULL;
 
 
+static List *collect_common_primary_pathkeys(PlannerInfo *root);
 static void set_base_rel_sizes(PlannerInfo *root);
 static void set_base_rel_pathlists(PlannerInfo *root);
 static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
@@ -58,6 +59,7 @@ static void set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
  Index rti, RangeTblEntry *rte);
 static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel,
    RangeTblEntry *rte);
+static void trim_query_pathkeys(PlannerInfo * root);
 static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	   RangeTblEntry *rte);
 static void set_foreign_size(PlannerInfo *root, RelOptInfo *rel,
@@ -102,6 +104,214 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
   RangeTblEntry *rte, Index rti, Node *qual);
 static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
 
+/*
+ * collect_common_primary_pathkeys: Collects common unique and non-null index
+ * pathkeys from all base relations in current root.
+ */
+
+static List *
+collect_common_primary_pathkeys(PlannerInfo *root)
+{
+	List *result = NIL;
+	Index rti;
+	int   nindex = -1;
+	List **pathkeys_array0 = NULL;
+	List **pathkeys_array = NULL;
+	List **pathkeys_array2 = NULL;
+	bool multipass = (root-simple_rel_array_size  2);
+
+	for (rti = 1 ; rti  root-simple_rel_array_size  nindex ; rti++)
+	{
+		RelOptInfo *rel = root-simple_rel_array[rti];
+		List **pktmp;
+
+		if (rel == NULL)
+			continue;
+
+		Assert (rel-relid == rti);
+
+		/*
+		 * Scan all base relations except parent relations.
+		 */
+		if (root-simple_rte_array[rti]-inh ||
+			(rel-reloptkind != RELOPT_BASEREL 
+			 rel-reloptkind != RELOPT_OTHER_MEMBER_REL))
+			continue;
+
+		/* pathkeys_array is NULL for the first iteration */
+		if (pathkeys_array == NULL)
+		{
+			ListCell *lc;
+			int i = 0;
+
+			/* Skip rigging for AND operation if not needed. */
+			if (multipass)
+			{
+nindex = 0;
+
+foreach (lc, rel-indexlist)
+{
+	IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);
+
+	if (index-full_ordered  index-indpred == NIL)
+		nindex++;
+}
+
+if (nindex == 0)
+	return NULL;
+
+pathkeys_array0 = palloc0(sizeof(List*) * nindex * 2);
+pathkeys_array  = pathkeys_array0;
+