Re: [HACKERS] Get more from indices.

2014-04-18 Thread Kyotaro HORIGUCHI
Hello,

 I thought some more about this patch, and realized that it's more or less
 morally equivalent to allowing references to ungrouped variables when the
 query has a GROUP BY clause listing all the columns of the primary key.
 In that case the parser is effectively pretending that the GROUP BY list
 contains additional implicit entries that are functionally dependent on
 the entries that are actually there.  In this patch, what we want to do
 is recognize that trailing entries in an ORDER BY list are semantically
 no-ops and can be ignored because they are functionally dependent on
 earlier entries.

Ah, that sounds smarter than extending pathekys. I feel it preferable.

 Now, the reason that the parser restricts the functional dependency
 deduction to a primary key is that it wants to be able to identify a
 constraint OID that the query is dependent on to be semantically valid.
 In this case, we don't need such an OID, so just finding any old unique
 index on not-null columns is good enough.  (If someone drops the index,
 the optimization might become incorrect, but that would force replanning
 anyway.)

Agreed,

 However, this way of thinking about it shows that the patch is missing
 possible optimizations.  If we have ORDER BY a, b, c and (a,b) is the
 primary key, then including c in the ORDER BY list is semantically
 redundant, *whether or not we use an indexscan on the pkey index at all*.
 More: if we have ORDER BY a, b, c and the primary key is (b,a), we
 can still discard c from the sort requirement, even though the pkey
 index as such isn't helpful for producing the required order.

Hmm yes, it really seems expectable.

 So hacking up the pathkeys attributed to the indexscan is the wrong thing.
 Rather, what we should be looking to do is decide that c is a useless
 pathkey and remove it from the query_pathkeys, much as we'd do if we found
 c = constant in WHERE.  That would allow optimization of other query
 plans besides scan-the-pkey-index plans.

Ok, I am convinced that your suggestion - truncating
query_pathkeys by removing eventually no-op entries - seems
preferable and will have wider effect naturally - more promised.

I won't persist with the way this patch currently does but the
new patch of course can't come up within this CF. I will agree if
you decide to make this patch 'Returned with Feedback'. (I feel a
little sad for 'Rejected' but you can justly do that if you think
that the patch comming up next is utterly different from this
one:()

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2014-04-18 Thread Tom Lane
Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes:
 Ok, I am convinced that your suggestion - truncating
 query_pathkeys by removing eventually no-op entries - seems
 preferable and will have wider effect naturally - more promised.

 I won't persist with the way this patch currently does but the
 new patch of course can't come up within this CF. I will agree if
 you decide to make this patch 'Returned with Feedback'. (I feel a
 little sad for 'Rejected' but you can justly do that if you think
 that the patch comming up next is utterly different from this
 one:()

I marked it as returned with feedback.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2014-04-15 Thread Tom Lane
Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes:
 [ pathkey_and_uniqueindx_v10_20130411.patch ]

I thought some more about this patch, and realized that it's more or less
morally equivalent to allowing references to ungrouped variables when the
query has a GROUP BY clause listing all the columns of the primary key.
In that case the parser is effectively pretending that the GROUP BY list
contains additional implicit entries that are functionally dependent on
the entries that are actually there.  In this patch, what we want to do
is recognize that trailing entries in an ORDER BY list are semantically
no-ops and can be ignored because they are functionally dependent on
earlier entries.

Now, the reason that the parser restricts the functional dependency
deduction to a primary key is that it wants to be able to identify a
constraint OID that the query is dependent on to be semantically valid.
In this case, we don't need such an OID, so just finding any old unique
index on not-null columns is good enough.  (If someone drops the index,
the optimization might become incorrect, but that would force replanning
anyway.)

However, this way of thinking about it shows that the patch is missing
possible optimizations.  If we have ORDER BY a, b, c and (a,b) is the
primary key, then including c in the ORDER BY list is semantically
redundant, *whether or not we use an indexscan on the pkey index at all*.
More: if we have ORDER BY a, b, c and the primary key is (b,a), we
can still discard c from the sort requirement, even though the pkey
index as such isn't helpful for producing the required order.

So hacking up the pathkeys attributed to the indexscan is the wrong thing.
Rather, what we should be looking to do is decide that c is a useless
pathkey and remove it from the query_pathkeys, much as we'd do if we found
c = constant in WHERE.  That would allow optimization of other query
plans besides scan-the-pkey-index plans.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2014-04-10 Thread Etsuro Fujita
(2014/04/10 0:08), Tom Lane wrote:
 Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes:
 Oops! I found a bug in this patch. The previous v8 patch missed
 the case that build_index_pathkeys() could build a partial
 pathkeys from the index tlist.
 
 TBH I think that's barely the tip of the iceberg of cases where this
 patch will get the wrong answer.

 Also, I don't see it doing anything to check the ordering
 of multiple index columns

I think that the following code in index_pathkeys_are_extensible() would
check the ordering:

+   if (!pathkeys_contained_in(pathkeys, root-query_pathkeys))
+   return false;

 Also, what's with the success return
 before the loop:
 
 + if (list_length(pathkeys) == list_length(root-query_pathkeys))
 + return true;
 
 At this point you haven't proven *anything at all* about whether the
 index columns have something to do with the query_pathkeys.

I think that the two pathkeys would be proved to be equal, if the both
conditions are satisfied.

Thanks,

Best regards,
Etsuro Fujita


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2014-04-10 Thread Tom Lane
Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
 (2014/04/10 0:08), Tom Lane wrote:
 TBH I think that's barely the tip of the iceberg of cases where this
 patch will get the wrong answer.

 Also, I don't see it doing anything to check the ordering
 of multiple index columns

 I think that the following code in index_pathkeys_are_extensible() would
 check the ordering:
 + if (!pathkeys_contained_in(pathkeys, root-query_pathkeys))
 + return false;

Hm ... if you're relying on that, then what's the point of the new loop
at all?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2014-04-10 Thread Etsuro Fujita
(2014/04/10 22:25), Tom Lane wrote:
 Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
 (2014/04/10 0:08), Tom Lane wrote:
 TBH I think that's barely the tip of the iceberg of cases where this
 patch will get the wrong answer.
 
 Also, I don't see it doing anything to check the ordering
 of multiple index columns
 
 I think that the following code in index_pathkeys_are_extensible() would
 check the ordering:
 +if (!pathkeys_contained_in(pathkeys, root-query_pathkeys))
 +return false;
 
 Hm ... if you're relying on that, then what's the point of the new loop
 at all?

The point is that from the discussion [1], we allow the index pathkeys
to be extended to query_pathkeys if each *remaining* pathkey in
query_pathkey is a Var belonging to the indexed relation.  The code is
confusing, though.  Sorry, that is my faults.

Thanks,

[1] http://www.postgresql.org/message-id/29637.1389064...@sss.pgh.pa.us

Best regards,
Etsuro Fujita


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2014-04-10 Thread Kyotaro HORIGUCHI
Hi, sorry for the absense. I've been back.

Attached is the patch following the discussion below.

  (2014/04/10 0:08), Tom Lane wrote:
  TBH I think that's barely the tip of the iceberg of cases where this
  patch will get the wrong answer.
  
  Also, I don't see it doing anything to check the ordering
  of multiple index columns
  
  I think that the following code in index_pathkeys_are_extensible() would
  check the ordering:
  +  if (!pathkeys_contained_in(pathkeys, root-query_pathkeys))
  +  return false;
  
  Hm ... if you're relying on that, then what's the point of the new loop
  at all?
 
 The point is that from the discussion [1], we allow the index pathkeys
 to be extended to query_pathkeys if each *remaining* pathkey in
 query_pathkey is a Var belonging to the indexed relation.  The code is
 confusing, though.  Sorry, that is my faults.

Hmm, I found that the iterations for the part that already
checked by pathkeys_contained_in are not only needless but a bit
heavy. And the loop seems a little verbose. I did for the patch,
in index_pathkeys_are_extensible,

 - Avoiding duplicate check with pathkeys_contained_in.

   I put similar code to list_nth_cell since it is not exposed
   outside of list.c.

 - Add comment to clarify the purpose of the loop.

 - Simplify the check for the remaining keycolumns




 Thanks,
 
 [1] http://www.postgresql.org/message-id/29637.1389064...@sss.pgh.pa.us
 
 Best regards,
 Etsuro Fujita
 
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2014-04-10 Thread Kyotaro HORIGUCHI
# Sorry for accidentialy sending the previous mail unfinished.

## ...and I seem to have bombed uncertain files off out of my
## home directory by accident, too :(

=
Hi, sorry for the absense. I've been back.

Thank you for continuing this discussion.

Attached is the patch following the discussion below.

  (2014/04/10 0:08), Tom Lane wrote:
  TBH I think that's barely the tip of the iceberg of cases where this
  patch will get the wrong answer.
  
  Also, I don't see it doing anything to check the ordering
  of multiple index columns
  
  I think that the following code in index_pathkeys_are_extensible() would
  check the ordering:
  +  if (!pathkeys_contained_in(pathkeys, root-query_pathkeys))
  +  return false;
  
  Hm ... if you're relying on that, then what's the point of the new loop
  at all?
 
 The point is that from the discussion [1], we allow the index pathkeys
 to be extended to query_pathkeys if each *remaining* pathkey in
 query_pathkey is a Var belonging to the indexed relation.  The code is
 confusing, though.  Sorry, that is my faults.

Hmm, I found that the iterations for the part that already
checked by pathkeys_contained_in are not only useless but a bit
heavy. And the loop seems a little verbose. I did for the patch,
in index_pathkeys_are_extensible,

 - Avoiding duplicate check with pathkeys_contained_in.

   I put similar code to list_nth_cell since it is not exposed
   outside of list.c.

 - Add comment to clarify the purpose of the loop.

 - Simplify the check for the remaining keycolumns

I think this makes some things clearer.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index bfb4b9f..ff5c88c 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1790,6 +1790,7 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node)
 	WRITE_BOOL_FIELD(predOK);
 	WRITE_BOOL_FIELD(unique);
 	WRITE_BOOL_FIELD(immediate);
+	WRITE_BOOL_FIELD(allnotnull);
 	WRITE_BOOL_FIELD(hypothetical);
 	/* we don't bother with fields copied from the pg_am entry */
 }
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index a912174..4376e95 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -951,8 +951,11 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	{
 		index_pathkeys = build_index_pathkeys(root, index,
 			  ForwardScanDirection);
-		useful_pathkeys = truncate_useless_pathkeys(root, rel,
-	index_pathkeys);
+		if (index_pathkeys_are_extensible(root, index, index_pathkeys))
+			useful_pathkeys = root-query_pathkeys;
+		else
+			useful_pathkeys = truncate_useless_pathkeys(root, rel,
+		index_pathkeys);
 		orderbyclauses = NIL;
 		orderbyclausecols = NIL;
 	}
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 9179c61..5edca4f 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -502,6 +502,76 @@ build_index_pathkeys(PlannerInfo *root,
 }
 
 /*
+ * index_pathkeys_are_extensible
+ *	  Check whether the pathkeys are extensible to query_pathkeys.
+ */
+bool
+index_pathkeys_are_extensible(PlannerInfo *root,
+			  IndexOptInfo *index,
+			  List *pathkeys)
+{
+	bool		result;
+	ListCell   *lc1, *remain;
+
+	if (root-query_pathkeys == NIL || pathkeys == NIL)
+		return false;
+
+	/* This index is not suitable for pathkey extension */
+	if (!index-unique || !index-immediate || !index-allnotnull)
+		return false;
+
+	/* pathkeys is a prefixing proper subset of index tlist */
+	if (list_length(pathkeys)  list_length(index-indextlist))
+		return false;
+
+	if (!pathkeys_contained_in(pathkeys, root-query_pathkeys))
+		return false;
+
+	if (list_length(pathkeys) == list_length(root-query_pathkeys))
+		return true;
+
+	Assert(list_length(root-query_pathkeys)  list_length(pathkeys));
+
+	/*
+	 * Check if all unchecked elements of query_patheys are simple Vars for
+	 * the same relation.
+	 */
+
+	/* list_nth_cell is not exposed publicly.. */
+	if (list_length(pathkeys) == list_length(root-query_pathkeys) - 1)
+		remain = list_tail(root-query_pathkeys);
+	else
+	{
+		int n = list_length(pathkeys);
+
+		remain = root-query_pathkeys-head;
+		while(n--  0) remain = remain-next;
+	}
+
+	result = true;
+	for_each_cell(lc1, remain)
+	{
+		PathKey*pathkey = (PathKey *) lfirst(lc1);
+		ListCell   *lc2;
+		
+		foreach(lc2, pathkey-pk_eclass-ec_members)
+		{
+			EquivalenceMember *member = (EquivalenceMember *) lfirst(lc2);
+			
+			if (!bms_equal(member-em_relids, index-rel-relids) ||
+!IsA(member-em_expr, Var))
+			{
+result = false;
+break;
+			}
+		}
+
+		if (!result) break;
+	}
+	return result;
+}
+
+/*
  * build_expression_pathkey
  *	  Build a pathkeys list that describes an ordering by a single expression
  *	  using the given sort operator.
diff --git 

Re: [HACKERS] Get more from indices.

2014-04-09 Thread Tom Lane
Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes:
 Oops! I found a bug in this patch. The previous v8 patch missed
 the case that build_index_pathkeys() could build a partial
 pathkeys from the index tlist.

TBH I think that's barely the tip of the iceberg of cases where this
patch will get the wrong answer.  It looks like it thinks that *any*
Var belonging to the indexed relation must be the same one indexed
by the index.  Also, I don't see it doing anything to check the ordering
of multiple index columns --- for instance, an index on (a,b) and one
on (b,a) would both pass or fail identically, though surely only one
should match ORDER BY a,b,  Also, what's with the success return
before the loop:

+   if (list_length(pathkeys) == list_length(root-query_pathkeys))
+   return true;

At this point you haven't proven *anything at all* about whether the
index columns have something to do with the query_pathkeys.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2014-04-07 Thread Etsuro Fujita

Hi Horiguchi-san,

Sorry for not reviewing this patch in the last CF.

(2014/03/10 16:21), Kyotaro HORIGUCHI wrote:

Oops! I found a bug in this patch. The previous v8 patch missed
the case that build_index_pathkeys() could build a partial
pathkeys from the index tlist.

This causes the situation follows,

===
=# \d cu11
  Table public.cu11
  Column |  Type   | Modifiers
+-+---
  a  | integer | not null
  b  | integer | not null
  c  | integer |
  d  | text|
Indexes:
 cu11_a_b_idx UNIQUE, btree (a, b)

s=# explain (costs off) select * from cu11 order by a, c ,d;
   QUERY PLAN
---
  Index Scan using cu11_a_b_idx on cu11
(1 row)
===

Where the simple ORDER BY a, c, d on the table with index on
columns (a, b) results simple index scan which cannot perform the
order.

The attached v9 patche is fixed by adding a check for the case
into index_pathkeys_are_extensible(), and rebase to current HEAD.


Good catch!

ISTM that the biggest concern about this patch would be whether it's 
worth complicating the code, because the range of application of the 
patch is not so wide as is.  So, all we need to do is show important use 
cases that prove the effectiveness of the patch.  Sorry, I can't come up 
with a good idea, though.


Thanks,

Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2014-03-10 Thread Kyotaro HORIGUCHI
Oops! I found a bug in this patch. The previous v8 patch missed
the case that build_index_pathkeys() could build a partial
pathkeys from the index tlist.

This causes the situation follows,

===
=# \d cu11
 Table public.cu11
 Column |  Type   | Modifiers 
+-+---
 a  | integer | not null
 b  | integer | not null
 c  | integer | 
 d  | text| 
Indexes:
cu11_a_b_idx UNIQUE, btree (a, b)

s=# explain (costs off) select * from cu11 order by a, c ,d;
  QUERY PLAN   
---
 Index Scan using cu11_a_b_idx on cu11
(1 row)
===

Where the simple ORDER BY a, c, d on the table with index on
columns (a, b) results simple index scan which cannot perform the
order.

The attached v9 patche is fixed by adding a check for the case
into index_pathkeys_are_extensible(), and rebase to current HEAD.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index bfb4b9f..ff5c88c 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1790,6 +1790,7 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node)
 	WRITE_BOOL_FIELD(predOK);
 	WRITE_BOOL_FIELD(unique);
 	WRITE_BOOL_FIELD(immediate);
+	WRITE_BOOL_FIELD(allnotnull);
 	WRITE_BOOL_FIELD(hypothetical);
 	/* we don't bother with fields copied from the pg_am entry */
 }
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index a912174..4376e95 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -951,8 +951,11 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	{
 		index_pathkeys = build_index_pathkeys(root, index,
 			  ForwardScanDirection);
-		useful_pathkeys = truncate_useless_pathkeys(root, rel,
-	index_pathkeys);
+		if (index_pathkeys_are_extensible(root, index, index_pathkeys))
+			useful_pathkeys = root-query_pathkeys;
+		else
+			useful_pathkeys = truncate_useless_pathkeys(root, rel,
+		index_pathkeys);
 		orderbyclauses = NIL;
 		orderbyclausecols = NIL;
 	}
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 9179c61..01479f4 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -502,6 +502,65 @@ build_index_pathkeys(PlannerInfo *root,
 }
 
 /*
+ * index_pathkeys_are_extensible
+ *	  Check whether the pathkeys are extensible to query_pathkeys.
+ */
+bool
+index_pathkeys_are_extensible(PlannerInfo *root,
+			  IndexOptInfo *index,
+			  List *pathkeys)
+{
+	bool		result;
+	ListCell   *lc1;
+
+	if (root-query_pathkeys == NIL || pathkeys == NIL)
+		return false;
+
+	/* This index is not suitable for pathkey extension */
+	if (!index-unique || !index-immediate || !index-allnotnull)
+		return false;
+
+	/* pathkeys is a prefixing proper subset of index tlist */
+	if (list_length(pathkeys)  list_length(index-indextlist))
+		return false;
+
+	if (!pathkeys_contained_in(pathkeys, root-query_pathkeys))
+		return false;
+
+	if (list_length(pathkeys) == list_length(root-query_pathkeys))
+		return true;
+
+	result = true;
+	foreach(lc1, root-query_pathkeys)
+	{
+		PathKey*pathkey = (PathKey *) lfirst(lc1);
+		bool		found = false;
+		ListCell   *lc2;
+
+		foreach(lc2, pathkey-pk_eclass-ec_members)
+		{
+			EquivalenceMember *member = (EquivalenceMember *) lfirst(lc2);
+
+			if (!bms_equal(member-em_relids, index-rel-relids) ||
+!IsA(member-em_expr, Var))
+continue;
+			else
+			{
+found = true;
+break;
+			}
+		}
+
+		if (!found)
+		{
+			result = false;
+			break;
+		}
+	}
+	return result;
+}
+
+/*
  * build_expression_pathkey
  *	  Build a pathkeys list that describes an ordering by a single expression
  *	  using the given sort operator.
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 73ba2f6..c61cddb 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -333,6 +333,26 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			info-immediate = index-indimmediate;
 			info-hypothetical = false;
 
+			info-allnotnull = true;
+			for (i = 0; i  ncolumns; i++)
+			{
+int			attrno = info-indexkeys[i];
+
+if (attrno == 0)
+{
+	info-allnotnull = false;
+	break;
+}
+else if (attrno  0)
+{
+	if (!relation-rd_att-attrs[attrno - 1]-attnotnull)
+	{
+		info-allnotnull = false;
+		break;
+	}
+}
+			}
+
 			/*
 			 * Estimate the index size.  If it's not a partial index, we lock
 			 * the number-of-tuples estimate to equal the parent table; if it
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index c607b36..119bb31 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -525,6 +525,7 @@ typedef struct IndexOptInfo
 	

Re: [HACKERS] Get more from indices.

2014-03-03 Thread Kyotaro HORIGUCHI
I marked this patch as 'Ready for Committer' by myself according
to the following discussion.

Thanks.

 At Tue, 25 Feb 2014 16:35:55 -0500, Robert Haas wrote
  On Tue, Feb 25, 2014 at 3:36 AM, Kyotaro HORIGUCHI
  horiguchi.kyot...@lab.ntt.co.jp wrote:
   May I mark this patch as Ready for Committer by myself since
   this was already marked so in last CF3 and have had no objection
   or new suggestion in this CF4 for more than a month?
  
  Sounds fair.
 
 Thank you, I will send this patch to commtters after waiting
 another few days for more suggestions.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2014-02-25 Thread Kyotaro HORIGUCHI
Hello.

May I mark this patch as Ready for Committer by myself since
this was already marked so in last CF3 and have had no objection
or new suggestion in this CF4 for more than a month?

At Tue, 14 Jan 2014 18:08:10 +0900, Kyotaro HORIGUCHI wrote
 Hello, since CF4 is already closed but this patch ramains marked
   CF3
 as 'Ready for Committer', please let me re-post the latest
 version for CF4 to get rid of vanishing :-p
 

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2014-02-25 Thread Robert Haas
On Tue, Feb 25, 2014 at 3:36 AM, Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote:
 May I mark this patch as Ready for Committer by myself since
 this was already marked so in last CF3 and have had no objection
 or new suggestion in this CF4 for more than a month?

Sounds fair.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2014-02-25 Thread Kyotaro HORIGUCHI
Hello,

At Tue, 25 Feb 2014 16:35:55 -0500, Robert Haas wrote
 On Tue, Feb 25, 2014 at 3:36 AM, Kyotaro HORIGUCHI
 horiguchi.kyot...@lab.ntt.co.jp wrote:
  May I mark this patch as Ready for Committer by myself since
  this was already marked so in last CF3 and have had no objection
  or new suggestion in this CF4 for more than a month?
 
 Sounds fair.

Thank you, I will send this patch to commtters after waiting
another few days for more suggestions.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2014-01-14 Thread Kyotaro HORIGUCHI
Hello, since CF4 is already closed but this patch ramains marked
as 'Ready for Committer', please let me re-post the latest
version for CF4 to get rid of vanishing :-p

 tgl But aside from hasty typos,
 
 Oops! I've picked up wrong node. It always denies pathkeys extension.
 
 | !IsA(member, Var))
 
 is a mistake of the following.
 
 | !IsA(member-em_expr, Var))

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 4f63906..b695e40 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1790,6 +1790,7 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node)
 	WRITE_BOOL_FIELD(predOK);
 	WRITE_BOOL_FIELD(unique);
 	WRITE_BOOL_FIELD(immediate);
+	WRITE_BOOL_FIELD(allnotnull);
 	WRITE_BOOL_FIELD(hypothetical);
 	/* we don't bother with fields copied from the pg_am entry */
 }
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 606734a..0b2f529 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -952,8 +952,11 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	{
 		index_pathkeys = build_index_pathkeys(root, index,
 			  ForwardScanDirection);
-		useful_pathkeys = truncate_useless_pathkeys(root, rel,
-	index_pathkeys);
+		if (index_pathkeys_are_extensible(root, index, index_pathkeys))
+			useful_pathkeys = root-query_pathkeys;
+		else
+			useful_pathkeys = truncate_useless_pathkeys(root, rel,
+		index_pathkeys);
 		orderbyclauses = NIL;
 		orderbyclausecols = NIL;
 	}
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 9c8ede6..ad3a8b7 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -502,6 +502,60 @@ build_index_pathkeys(PlannerInfo *root,
 }
 
 /*
+ * index_pathkeys_are_extensible
+ *	  Check whether the pathkeys are extensible to query_pathkeys.
+ */
+bool
+index_pathkeys_are_extensible(PlannerInfo *root,
+			  IndexOptInfo *index,
+			  List *pathkeys)
+{
+	bool		result;
+	ListCell   *lc1;
+
+	if (root-query_pathkeys == NIL || pathkeys == NIL)
+		return false;
+
+	if (!index-unique || !index-immediate || !index-allnotnull)
+		return false;
+
+	if (!pathkeys_contained_in(pathkeys, root-query_pathkeys))
+		return false;
+
+	if (list_length(pathkeys) == list_length(root-query_pathkeys))
+		return true;
+
+	result = true;
+	foreach(lc1, root-query_pathkeys)
+	{
+		PathKey*pathkey = (PathKey *) lfirst(lc1);
+		bool		found = false;
+		ListCell   *lc2;
+
+		foreach(lc2, pathkey-pk_eclass-ec_members)
+		{
+			EquivalenceMember *member = (EquivalenceMember *) lfirst(lc2);
+
+			if (!bms_equal(member-em_relids, index-rel-relids) ||
+!IsA(member-em_expr, Var))
+continue;
+			else
+			{
+found = true;
+break;
+			}
+		}
+
+		if (!found)
+		{
+			result = false;
+			break;
+		}
+	}
+	return result;
+}
+
+/*
  * build_expression_pathkey
  *	  Build a pathkeys list that describes an ordering by a single expression
  *	  using the given sort operator.
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index de981cb..4e24220 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -333,6 +333,26 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			info-immediate = index-indimmediate;
 			info-hypothetical = false;
 
+			info-allnotnull = true;
+			for (i = 0; i  ncolumns; i++)
+			{
+int			attrno = info-indexkeys[i];
+
+if (attrno == 0)
+{
+	info-allnotnull = false;
+	break;
+}
+else if (attrno  0)
+{
+	if (!relation-rd_att-attrs[attrno - 1]-attnotnull)
+	{
+		info-allnotnull = false;
+		break;
+	}
+}
+			}
+
 			/*
 			 * Estimate the index size.  If it's not a partial index, we lock
 			 * the number-of-tuples estimate to equal the parent table; if it
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index a9219e0..d9a3b9b 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -525,6 +525,7 @@ typedef struct IndexOptInfo
 	bool		predOK;			/* true if predicate matches query */
 	bool		unique;			/* true if a unique index */
 	bool		immediate;		/* is uniqueness enforced immediately? */
+	bool		allnotnull;		/* true if index's keys are all not null */
 	bool		hypothetical;	/* true if index doesn't really exist */
 	bool		canreturn;		/* can index return IndexTuples? */
 	bool		amcanorderbyop; /* does AM support order by operator result? */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 999adaa..5ee2e56 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -196,5 +196,8 @@ extern List *truncate_useless_pathkeys(PlannerInfo *root,
 		  RelOptInfo *rel,
 		  List *pathkeys);
 extern bool 

Re: [HACKERS] Get more from indices.

2014-01-07 Thread Tom Lane
Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes:
 The following modification to v7 does this.

 =
 diff --git a/src/backend/optimizer/path/pathkeys.c 
 b/src/backend/optimizer/path/pathkeys.c
 index 380f3ba..233e21c 100644
 --- a/src/backend/optimizer/path/pathkeys.c
 +++ b/src/backend/optimizer/path/pathkeys.c
 @@ -536,7 +536,8 @@ index_pathkeys_are_extensible(PlannerInfo *root,
   {
   EquivalenceMember *member = (EquivalenceMember *) 
 lfirst(lc2);
 
 - if (!bms_equal(member-em_relids, index-rel-relids))
 + if (!bms_equal(member-em_relids, index-rel-relids) ||
 + !IsA(member, Var))
   continue;
   else
   {
 ==

I'm pretty sure that IsA test prevents the optimization from ever firing.

But aside from hasty typos, is there enough left of this optimization to
be worth the complication?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2014-01-07 Thread Kyotaro HORIGUCHI
Hello,

tgl I'm pretty sure that IsA test prevents the optimization from ever firing.

Thank you.

tgl But aside from hasty typos,

Oops! I've picked up wrong node. It always denies pathkeys extension.

| !IsA(member, Var))

is a mistake of the following.

| !IsA(member-em_expr, Var))

tgl is there enough left of this optimization to
tgl be worth the complication?

Certainly.

This patch is intended to be with my another UNION-ALL
optimization pathce at first. It does very much with
UNION-ORDERBY-LIMIT and also with UNION_ALL(Partitioned
tables)-DISTINCT-ORDERBY-LIMIT.


= test tables
create table pu1 (a int not null, b int not null, c int, d text);
create unique index i_pu1_ab on pu1 (a, b);
create table cu11 (like pu1 including all) inherits (pu1);
create table cu12 (like pu1 including all) inherits (pu1);
create table pu2 (like pu1 including all);
create table cu21 (like pu2 including all) inherits (pu2);
create table cu22 (like pu2 including all) inherits (pu2);
insert into cu11 (select a / 5, 4 - (a % 5), a, 'cu11' from 
generate_series(00, 09) a);
insert into cu12 (select a / 5, 4 - (a % 5), a, 'cu12' from 
generate_series(10, 19) a);
insert into cu21 (select a / 5, 4 - (a % 5), a, 'cu21' from 
generate_series(20, 29) a);
insert into cu22 (select a / 5, 4 - (a % 5), a, 'cu22' from 
generate_series(30, 39) a);


Following example is an implicit union derived from partitioned
tables created as above. Limit is added to highlight the effect.

9.4dev with no patch, 

| =# explain (analyze on, costs off) select distinct * from pu1 order by a, b 
limit 100;
| QUERY PLAN  
| 
|  Limit (actual time=246.653..246.730 rows=100 loops=1)
|  -  Unique (actual time=246.646..246.713 rows=100 loops=1)
|   -  Sort (actual time=246.645..246.666 rows=100 loops=1)
|Sort Key: pu1.a, pu1.b, pu1.c, pu1.d
|Sort Method: external sort  Disk: 5280kB
|-  Append (actual time=0.017..52.608 rows=20 loops=1)
| -  Seq Scan on pu1 (actual time=0.001..0.001 rows=0 loops=1)
| -  Seq Scan on cu11 (actual time=0.015..17.047 rows=10 loops=1)
| -  Seq Scan on cu12 (actual time=0.007..15.474 rows=10 loops=1)
|  Total runtime: 247.854 ms

Fairly common query. It will get following plan with this patch.


| =# explain (analyze on, costs off) select distinct * from pu1 order by a, b 
limit 100;
|   QUERY PLAN
| -
|  Limit (actual time=0.108..0.350 rows=100 loops=1)
|  -  Unique (actual time=0.104..0.329 rows=100 loops=1)
|   -  Merge Append (actual time=0.103..0.214 rows=100 loops=1)
| Sort Key: pu1.a, pu1.b, pu1.c, pu1.d
| -  Index Scan .. on pu1 (actual time=0.003..0.003 rows=0 loops=1)
| -  Index Scan .. on cu11 (actual time=0.049..0.110 rows=100 loops=1)
| -  Index Scan .. on cu12 (actual time=0.044..0.044 rows=1 loops=1)
|  Total runtime: 0.666 ms


The same could be said on union with my another union-pullup
patch.  We will get the following result with only the
union-pullup patch.

|=# explain (analyze on, costs off) select * from cu11 union select * from cu12 
union select * from cu21 union select * from cu22 order by a limit 1;
|   QUERY PLAN
|---
| Limit (actual time=474.825..482.326 rows=1 loops=1)
| -  Unique (actual time=474.824..481.357 rows=1 loops=1)
|  -  Sort (actual time=474.823..477.101 rows=1 loops=1)
|   Sort Key: cu11.a, cu11.b, cu11.c, cu11.d
|   Sort Method: external sort  Disk: 10568kB
|   -  Append (actual time=0.018..99.310 rows=40 loops=1)
|-  Seq Scan on cu11 (actual time=0.017..16.263 rows=10 loops=1)
|-  Seq Scan on cu12 (actual time=0.006..14.831 rows=10 loops=1)
|-  Seq Scan on cu21 (actual time=0.006..14.766 rows=10 loops=1)
|-  Seq Scan on cu22 (actual time=0.006..14.721 rows=10 loops=1)
| Total runtime: 484.555 ms

This is also a quite common operation but implicit DISTINCT
prevents planner from selecting index scans. (ORDER BY is not
essential but needed because planner does not consult
distinct_pathkeys on planning scans and I've left it as it is.)

The planner gives the following plan with this patch.

| =# explain (analyze on, costs off) select * from cu11 union select * from 
cu12 union select * from cu21 union select * from cu22 order by a limit 1;
|QUERY PLAN   
| -
|  Limit (actual time=0.197..14.739 rows=1 loops=1)
|  -  Unique (actual time=0.191..13.527 rows=1 loops=1)
|   -  Merge Append (actual time=0.191..7.894 rows=1 loops=1)
| 

Re: [HACKERS] Get more from indices.

2014-01-06 Thread Etsuro Fujita
Tom Lane wrote:
 Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
  [ pathkey_and_uniqueindx_v7_20131203.patch ]

 I started to look at this patch.  I don't understand the reason for the
 foreach loop in index_pathkeys_are_extensible (and the complete lack of
 comments in the patch isn't helping).  Isn't it sufficient to check that
 the index is unique/immediate/allnotnull and its ordering is a prefix of
 query_pathkeys?  If not, what's the rationale for the specific tests being
 made on the pathkeys --- this code doesn't make much sense to me.

Thank you for taking time to look at this patch.  I think it's not
sufficient to check those things.  Let me explain the reason why this patch
has that code.  The patch has that code in order to prevent
build_join_pathkeys() from building incorrect join pathkeys', where the
pathkeys for a join relation constructed by mergejoin or nestloop join are
built normally just by using the outer path's pathkeys.  Without that code,
the patch would produce an incorrect result for such a case.  An example
will be shown below.  A simple approach to avoid this problem would be to
apply this idea only when each pathkey in query_pathkeys references the
indexed relation in addition to that the index is
unique/immediate/allnotnull and its ordering is a prefix of query_pathkeys.
That's the reason.

[Data]
CREATE TABLE t (a int not null, b int not null, c int, d text);
CREATE UNIQUE INDEX i_t_ab ON t (a, b);
INSERT INTO t (SELECT a / 5, 4 - (a % 5), a, 't' FROM
generate_series(00, 09) a);
ANALYZE t;
CREATE TABLE t2 (e text, f int);
INSERT INTO t2 VALUES ('t', 2);
INSERT INTO t2 VALUES ('t', 1);
ANALYZE t2;

[Query]
EXPLAIN SELECT * FROM t, t2 WHERE t.a  10 AND t.d = t2.e ORDER BY t.a, t.b,
t.c, t.d, t2.f LIMIT 4;
   QUERY PLAN


 Limit  (cost=0.29..3.96 rows=4 width=20)
   -  Nested Loop  (cost=0.29..110.17 rows=120 width=20)
 Join Filter: (t.d = t2.e)
 -  Index Scan using i_t_ab on t  (cost=0.29..107.34 rows=60
width=14)
   Index Cond: (a  10)
 -  Materialize  (cost=0.00..1.03 rows=2 width=6)
   -  Seq Scan on t2  (cost=0.00..1.02 rows=2 width=6)
(7 rows)

SELECT * FROM t, t2 WHERE t.a  10 AND t.d = t2.e ORDER BY t.a, t.b, t.c,
t.d, t2.f LIMIT 4;
 a | b | c | d | e | f
---+---+---+---+---+---
 0 | 0 | 4 | t | t | 2
 0 | 0 | 4 | t | t | 1
 0 | 1 | 3 | t | t | 2
 0 | 1 | 3 | t | t | 1
(4 rows)

(Note the column f is sorted in the descending order.)

Sorry for the delay.

Best regards,
Etsuro Fujita



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2014-01-06 Thread Tom Lane
Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
 Thank you for taking time to look at this patch.  I think it's not
 sufficient to check those things.  Let me explain the reason why this patch
 has that code.  The patch has that code in order to prevent
 build_join_pathkeys() from building incorrect join pathkeys', where the
 pathkeys for a join relation constructed by mergejoin or nestloop join are
 built normally just by using the outer path's pathkeys.  Without that code,
 the patch would produce an incorrect result for such a case.

Ah, thanks for the example.  ISTM that really the issue is that if an
originally-unique row is expanded into multiple rows, those rows are
sort peers so far as the unique-index column(s) are concerned, and so
now the lower-order ORDER BY columns do matter after all.

The problem is that joining isn't the only way that such expansion can
happen.  Set-returning functions in the targetlist are another way,
and I'm not sure that there aren't others.  Here's an example that
I'm pretty sure breaks your patch (though I didn't actually reinstall
the patch to try it):

create or replace function rev(n int) returns setof int language plpgsql
as 'begin for i in reverse n..1 loop return next i; end loop; end';

create table tt (f1 int primary key, f2 int);

insert into tt values (1,2), (2,3);

select f1, rev(f2) from tt order by 1,2;

Also, even if the row-expansion mechanism is a join, it's certainly
insufficient to check that the lower-order sort column is an expression
in variables of the index's table.  Something like f2 + random() is
going to need an explicit sort step anyway.

These particular objections could be worked around by checking for
set-returning functions and volatile functions in the lower-order
ORDER BY expressions.  But I have to say that I think I'm losing
faith in the entire idea.  I have little confidence that there
aren't other cases that will break it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2014-01-06 Thread Kyotaro HORIGUCHI
Hello,

 Tom Lane wrote:
  I started to look at this patch.  I don't understand the reason for the
  foreach loop in index_pathkeys_are_extensible (and the complete lack of
  comments in the patch isn't helping).  Isn't it sufficient to check that
  the index is unique/immediate/allnotnull and its ordering is a prefix of
  query_pathkeys?  If not, what's the rationale for the specific tests being
  made on the pathkeys --- this code doesn't make much sense to me.
 
 Thank you for taking time to look at this patch.  I think it's not
 sufficient to check those things.  Let me explain the reason why this patch
 has that code.  The patch has that code in order to prevent
 build_join_pathkeys() from building incorrect join pathkeys', where the
 pathkeys for a join relation constructed by mergejoin or nestloop join are
 built normally just by using the outer path's pathkeys.  Without that code,
 the patch would produce an incorrect result for such a case.  An example
 will be shown below.

 A simple approach to avoid this problem would be to
 apply this idea only when each pathkey in query_pathkeys references the
 indexed relation in addition to that the index is
 unique/immediate/allnotnull and its ordering is a prefix of query_pathkeys.
 That's the reason.

Utterly disregarding the chances of joins - the patch (v7)
already does so in some extent, ignoring the possibility of
partial extension for multi-table'd pathkeys - it is also
avoidable by simply passing a boolean
'extend_pathkeys_if_possible', or splitting into two functions
regarding the boolean. The check was not a yes-or-no decision but
a how-long-it-can-be-extended measuring in the previous version
(pathkey_and_uniqueindx_v5).  It has been simplified and splitted
out as individual function after.

 [Data]
 CREATE TABLE t (a int not null, b int not null, c int, d text);
 CREATE UNIQUE INDEX i_t_ab ON t (a, b);
 INSERT INTO t (SELECT a / 5, 4 - (a % 5), a, 't' FROM
 generate_series(00, 09) a);
 ANALYZE t;
 CREATE TABLE t2 (e text, f int);
 INSERT INTO t2 VALUES ('t', 2);
 INSERT INTO t2 VALUES ('t', 1);
 ANALYZE t2;
 
 [Query]
 EXPLAIN SELECT * FROM t, t2 WHERE t.a  10 AND t.d = t2.e ORDER BY t.a, t.b,
 t.c, t.d, t2.f LIMIT 4;
QUERY PLAN
 
 
  Limit  (cost=0.29..3.96 rows=4 width=20)
-  Nested Loop  (cost=0.29..110.17 rows=120 width=20)
  Join Filter: (t.d = t2.e)
  -  Index Scan using i_t_ab on t  (cost=0.29..107.34 rows=60
 width=14)
Index Cond: (a  10)
  -  Materialize  (cost=0.00..1.03 rows=2 width=6)
-  Seq Scan on t2  (cost=0.00..1.02 rows=2 width=6)
 (7 rows)
 
 SELECT * FROM t, t2 WHERE t.a  10 AND t.d = t2.e ORDER BY t.a, t.b, t.c,
 t.d, t2.f LIMIT 4;
  a | b | c | d | e | f
 ---+---+---+---+---+---
  0 | 0 | 4 | t | t | 2
  0 | 0 | 4 | t | t | 1
  0 | 1 | 3 | t | t | 2
  0 | 1 | 3 | t | t | 1
 (4 rows)
 
 (Note the column f is sorted in the descending order.)
 
 Sorry for the delay.
 
 Best regards,
 Etsuro Fujita

With best wishes for a happy New Yaar.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2014-01-06 Thread Kyotaro HORIGUCHI
Hello,

tgl The problem is that joining isn't the only way that such expansion can
tgl happen.  Set-returning functions in the targetlist are another way,
tgl and I'm not sure that there aren't others.  Here's an example that
tgl I'm pretty sure breaks your patch (though I didn't actually reinstall
tgl the patch to try it):
tgl 
tgl create or replace function rev(n int) returns setof int language plpgsql
tgl as 'begin for i in reverse n..1 loop return next i; end loop; end';
tgl 
tgl create table tt (f1 int primary key, f2 int);
tgl 
tgl insert into tt values (1,2), (2,3);
tgl 
tgl select f1, rev(f2) from tt order by 1,2;
tgl 
tgl Also, even if the row-expansion mechanism is a join, it's certainly
tgl insufficient to check that the lower-order sort column is an expression
tgl in variables of the index's table.  Something like f2 + random() is
tgl going to need an explicit sort step anyway.
tgl 
tgl These particular objections could be worked around by checking for
tgl set-returning functions and volatile functions in the lower-order
tgl ORDER BY expressions.  But I have to say that I think I'm losing
tgl faith in the entire idea.  I have little confidence that there
tgl aren't other cases that will break it.

I think that the required condition for all these ordering
problems is generating multiple rows for single input (for a
value of any column of the same table).

If a prefixing set of values correspond to a unique index appears
only once in a result, the result must can be fully-ordered by
the extended pathkeys. This is what this patch stands on. It
never be broken while the precondition is satisfied... I think.

On the other hand, the precondition should be satisfied when all
extended columns are simple Vars of the target table. I believe
Vars cannot produce multiple result. More close checking for
every possibility could be done but it should be a overdone.

The following modification to v7 does this.

=
diff --git a/src/backend/optimizer/path/pathkeys.c 
b/src/backend/optimizer/path/pathkeys.c
index 380f3ba..233e21c 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -536,7 +536,8 @@ index_pathkeys_are_extensible(PlannerInfo *root,
{
EquivalenceMember *member = (EquivalenceMember *) 
lfirst(lc2);
 
-   if (!bms_equal(member-em_relids, index-rel-relids))
+   if (!bms_equal(member-em_relids, index-rel-relids) ||
+   !IsA(member, Var))
continue;
else
{
==

The result is

postgres=# select f1, rev(f2) from tt order by 1, 2;
 f1 | rev 
+-
  1 |   1
  1 |   2
  2 |   1
  2 |   2
  2 |   3
==

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2014-01-06 Thread Kyotaro HORIGUCHI
 I think that the required condition for all these ordering

Careless wording. the 'required' condition above means 'necessary
(but not sufficient)' condition so I think the lack of the
precondition (=multiple rows) results in prevention of the
postcondition = ordering problems.

 problems is generating multiple rows for single input (for a
 value of any column of the same table).
 
 If a prefixing set of values correspond to a unique index appears
 only once in a result, the result must can be fully-ordered by
 the extended pathkeys. This is what this patch stands on. It
 never be broken while the precondition is satisfied... I think.
 
 On the other hand, the precondition should be satisfied when all
 extended columns are simple Vars of the target table. I believe
 Vars cannot produce multiple result. More close checking for
 every possibility could be done but it should be a overdone.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2014-01-06 Thread Etsuro Fujita
Kyotaro HORIGUCHI wrote:
 tgl The problem is that joining isn't the only way that such expansion
 tgl can happen.  Set-returning functions in the targetlist are another
 tgl way, and I'm not sure that there aren't others.  Here's an example
 tgl that I'm pretty sure breaks your patch (though I didn't actually
 tgl reinstall the patch to try it):

 tgl Also, even if the row-expansion mechanism is a join, it's certainly
 tgl insufficient to check that the lower-order sort column is an
 tgl expression in variables of the index's table.  Something like f2 +
 tgl random() is going to need an explicit sort step anyway.

 tgl These particular objections could be worked around by checking for
 tgl set-returning functions and volatile functions in the lower-order
 tgl ORDER BY expressions.  But I have to say that I think I'm losing
 tgl faith in the entire idea.  I have little confidence that there
 tgl aren't other cases that will break it.

 On the other hand, the precondition should be satisfied when all extended
 columns are simple Vars of the target table. I believe Vars cannot produce
 multiple result. More close checking for every possibility could be done
 but it should be a overdone.

 The following modification to v7 does this.

It seems to me that this would be reasonable at least as the first step and
that it would be better to leave more close checking for future work.

Thanks,

Best regards,
Etsuro Fujita



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2013-12-31 Thread Tom Lane
Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
 [ pathkey_and_uniqueindx_v7_20131203.patch ]

I started to look at this patch.  I don't understand the reason for the
foreach loop in index_pathkeys_are_extensible (and the complete lack of
comments in the patch isn't helping).  Isn't it sufficient to check that
the index is unique/immediate/allnotnull and its ordering is a prefix
of query_pathkeys?  If not, what's the rationale for the specific tests
being made on the pathkeys --- this code doesn't make much sense to me.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2013-12-10 Thread Etsuro Fujita
I wrote:
 Kyotaro HORIGUCHI wrote:
  I'm convinced of the validity of your patch. I'm satisfied with it.

 Then, if there are no objections of others, I'll mark this as Ready for
 Committer.

Done.

Thanks,

Best regards,
Etsuro Fujita



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2013-12-09 Thread Kyotaro HORIGUCHI
Thank you,

  One is, you put the added code for getrelation_info() out of the block for
  the condition (info-relam == BTREE_AM_OID) (though amcanorder would be
..
 By checking the following equation in build_index_paths(), the updated
 version of the patch guarantees that the result of an index scan is ordered:
 
 index_is_ordered = (index-sortopfamily != NULL);

Oops.. I forgot about it although many times I've seen...
You're right.

   Another is, you changed pathkeys expantion to be all-or-nothing decision.
   While this change should simplify the code slightly, it also dismisses
   the oppotunity for partially-extended pathkeys. Could you let me know
   the reason why you did so.
...
  We might be able to partially-extend the original
  pathkey list optimally in something significant, but that seems useless
  complexity to me.  So, I modified the patch to do the all-or-nothing
  decision.
 
 Here I mean the optimality for use in merge joins.

Ok, I'll follow your advice. I agree on the point about merit vs
complexity.

I'm convinced of the validity of your patch. I'm satisfied with
it. Thank you.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2013-12-09 Thread Etsuro Fujita
Kyotaro HORIGUCHI wrote:
 I'm convinced of the validity of your patch. I'm satisfied with it. Thank
 you.

Thank you for the reply.

Then, if there are no objections of others, I'll mark this as Ready for
Committer.

Thanks,

Best regards,
Etsuro Fujita



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2013-12-05 Thread Kyotaro HORIGUCHI
Hello, 

  I've modified the patch to work in such a way.  Also, as ISTM the patch
  is more complicated than what the patch really does, I've simplified the
  patch.
 
 I've revised the patch a bit.  Please find attached the patch.

Thank you, but it seems to me too simplified. You made two major
functional changes.

One is, you put the added code for getrelation_info() out of the
block for the condition (info-relam == BTREE_AM_OID) (though
amcanorder would be preferable..) Anyway the reason for the place
is to guarantee 'full_ordered' index always to be orderable.  I
believe the relation between them are not obvious. So your patch
has an oppotunity to make wrong assumption for possible indexes
which is not orderable but unique. Going on your way some
additional works would be needed to judge an index to be
orderable or not on checking the extensibility of pathkeys.

Another is, you changed pathkeys expantion to be all-or-nothing
decision. While this change should simplify the code slightly, it
also dismisses the oppotunity for partially-extended
pathkeys. Could you let me know the reason why you did so.

Any thoughts?

regards,
   
-- 
Kyotaro Horiguchi
NTT Open Source Software Center
   


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2013-12-05 Thread Etsuro Fujita
Kyotaro HORIGUCHI wrote:
 Thank you, but it seems to me too simplified. You made two major
functional
 changes.

Thank you for the comments!

 One is, you put the added code for getrelation_info() out of the block for
 the condition (info-relam == BTREE_AM_OID) (though amcanorder would be
 preferable..) Anyway the reason for the place is to guarantee
'full_ordered'
 index always to be orderable.  I believe the relation between them are not
 obvious. So your patch has an oppotunity to make wrong assumption for
 possible indexes which is not orderable but unique. Going on your way some
 additional works would be needed to judge an index to be orderable or not
 on checking the extensibility of pathkeys.

By checking the following equation in build_index_paths(), the updated
version of the patch guarantees that the result of an index scan is ordered:

index_is_ordered = (index-sortopfamily != NULL);

 Another is, you changed pathkeys expantion to be all-or-nothing decision.
 While this change should simplify the code slightly, it also dismisses the
 oppotunity for partially-extended pathkeys. Could you let me know the
reason
 why you did so.

At first I thought the partially-extended pathkey list that is made from
query_pathkeys, as you proposed in the original versions of the patch.  But
I've started to doubt whether it's worth doing that because I think the
partially-extended pathkey list is merely one example while the original
pathkey list can be partially-extended in different ways, ie, ISTM the
partially-extended pathkey list doesn't necessarily have the optimality in
anything significant.  We might be able to partially-extend the original
pathkey list optimally in something significant, but that seems useless
complexity to me.  So, I modified the patch to do the all-or-nothing
decision.

Thanks,

Best regards,
Etsuro Fujita



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2013-12-05 Thread Etsuro Fujita
I wrote:
 Kyotaro HORIGUCHI wrote:
  Another is, you changed pathkeys expantion to be all-or-nothing
decision.
  While this change should simplify the code slightly, it also dismisses
  the oppotunity for partially-extended pathkeys. Could you let me know
  the
 reason
  why you did so.

 At first I thought the partially-extended pathkey list that is made from
 query_pathkeys, as you proposed in the original versions of the patch.
But
 I've started to doubt whether it's worth doing that because I think the
 partially-extended pathkey list is merely one example while the original
 pathkey list can be partially-extended in different ways, ie, ISTM the
 partially-extended pathkey list doesn't necessarily have the optimality
 in anything significant.  We might be able to partially-extend the
original
 pathkey list optimally in something significant, but that seems useless
 complexity to me.  So, I modified the patch to do the all-or-nothing
 decision.

Here I mean the optimality for use in merge joins.

Thanks,

Best regards,
Etsuro Fujita



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2013-12-03 Thread Etsuro Fujita
I wrote:
 I've modified the patch to work in such a way.  Also, as ISTM the patch
 is more complicated than what the patch really does, I've simplified the
 patch.

I've revised the patch a bit.  Please find attached the patch.

Thanks,

Best regards,
Etsuro Fujita


pathkey_and_uniqueindx_v7_20131203.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2013-11-28 Thread Etsuro Fujita
I wrote:
 I wrote:
  Kyotaro HORIGUCHI wrote:
* In get_relation_info(), the patch determines the branch
condition keyattno != ObjectIdAttributeNumber.  I fail to
understand the meaning of this branch condition.  Could you explain
 about it?

   Literally answering, it means oid cannot be NULL (if it exists).

  Understood.  Thank you for the detailed information.  But I'm not sure
  it's worth complicating the code.  What use cases are you thinking?

 Having said that, I've reconsidered about this, and start to think it
would
 be better that all system attributes are processed in the same way as are
 user attributes because that makes the code more simple.  Yes, as you
 mention, it's not necessarily guaranteed that system attributes have the
 uniqueness property in general, but that's another problem.

I've modified the patch to work in such a way.  Also, as ISTM the patch is
more complicated than what the patch really does, I've simplified the patch.
Attached is an updated version of the patch.  Could you review the patch?

Thanks,

Best regards,
Etsuro Fujita


pathkey_and_uniqueindx_v6_20131128.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2013-11-27 Thread Etsuro Fujita
Kyotaro HORIGUCHI wrote:
  * In get_relation_info(), the patch determines the branch condition
  keyattno != ObjectIdAttributeNumber.  I fail to understand the
  meaning of this branch condition.  Could you explain about it?

 Literally answering, it means oid cannot be NULL (if it exists).

Understood.  Thank you for the detailed information.  But I'm not sure it's
worth complicating the code.  What use cases are you thinking?

Thanks,

Best regards,
Etsuro Fujita



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2013-11-27 Thread Etsuro Fujita
I wrote:
 Kyotaro HORIGUCHI wrote:
   * In get_relation_info(), the patch determines the branch condition
   keyattno != ObjectIdAttributeNumber.  I fail to understand the
   meaning of this branch condition.  Could you explain about it?

  Literally answering, it means oid cannot be NULL (if it exists).

 Understood.  Thank you for the detailed information.  But I'm not sure
it's
 worth complicating the code.  What use cases are you thinking?

Having said that, I've reconsidered about this, and start to think it would
be better that all system attributes are processed in the same way as are
user attributes because that makes the code more simple.  Yes, as you
mention, it's not necessarily guaranteed that system attributes have the
uniqueness property in general, but that's another problem.

Thanks,

Best regards,
Etsuro Fujita



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2013-11-26 Thread Kyotaro HORIGUCHI
Thank you for pointing out.

  the attched pathkey_and_uniqueindx_v4_20131122.patch is changed as
  follows.
 
 The patch is compiled successfully and passes all regression tests.  Also,
 the patch works well for the tests shown in an earlier email from
 Horiguchi-san.  But in this version I found an incorrect behavior.
 
 postgres=# CREATE TABLE t (a int not null, b int not null, c int, d text);
 CREATE TABLE
 postgres=# CREATE UNIQUE INDEX i_t_ab ON t (a, b);
 CREATE INDEX
 postgres=# INSERT INTO t (SELECT a / 5, 4 - (a % 5), a, 't' FROM
 generate_series(00, 09) a);
 INSERT 0 10
 postgres=# ANALYZE t;
 ANALYZE
 postgres=# CREATE TABLE t2 (e text, f int);
 CREATE TABLE
 postgres=# INSERT INTO t2 VALUES ('t', 2);
 INSERT 0 1
 postgres=# INSERT INTO t2 VALUES ('t', 1);
 INSERT 0 1
 postgres=# ANALYZE t2;
 ANALYZE
 postgres=# EXPLAIN SELECT * FROM t, t2 WHERE t.a  10 AND t.d = t2.e ORDER
 BY t.a, t.b, t.c, t.d, t2.f LIMIT 10;
QUERY PLAN
 
 
  Limit  (cost=0.29..9.30 rows=10 width=20)
-  -  Sort  (cost=92.33..92.57 rows=94 width=20)
-Sort Key: t.a, t.b, t.c, t.d, t2.f
-  Nested Loop  (cost=0.29..129.99 rows=144 width=20)
  Join Filter: (t.d = t2.e)
  -  Index Scan using i_t_ab on t  (cost=0.29..126.80 rows=72
 width=14)
Index Cond: (a  10)
  -  Materialize  (cost=0.00..1.03 rows=2 width=6)
-  Seq Scan on t2  (cost=0.00..1.02 rows=2 width=6)
 (7 rows)

Auch. Outermost sort is omitted but necessary (Prefixed by '-' above).

 ISTM this was brought by the following change.
 
  In truncate_useless_pathkeys, if query_pathkeys is longer than pathkeys
  made from index columns old patch picked up the latter as IndexPath's
  pathkeys. But the former is more suitable according to the context here.
 
   - truncate_useless_pathkeys returns root-query_pathkeys when
 the index is fully-ordered and query_pathkeys contains the
 pathkeys made from index columns.

I implicitly put a wrong assumption that query_pathkeys is
dedicated for the relation under work, not whole subquery.

 I think it would be required to modify the patch so that the transformation
 of the pathkeys is performed only when each pathkey of query_pathkeys
 references the indexed relation.

What is wrong here is that IndexPath declares that it is ordered
in the pathkeys which includes the columns not belongs to the
table.

  (The above change might have been made according to my comment
 in an earlier email I sent.  But I have to admit my explanation
 there was not adequate.  I'm sorry for that.)

Nothing to be sorry for. It's my fault.

Anyway, I've put restriction on pathkeys_useful_for_ordering that
pick up longest pathkeys consists only ec members which matches
the required rel bitmap. With the new patch the planner makes
plan with the omitted sort and the query returns the following
result.

uniontest=# SELECT * FROM t, t2 WHERE t.a  10 AND t.d = t2.e
|  ORDER BY t.a, t.b, t.c, t.d, t2.f LIMIT 10;
|  a | b | c | d | e | f 
| ---+---+---+---+---+---
|  0 | 0 | 4 | t | t | 1  -+-- Correct order by t2.f.
|  0 | 0 | 4 | t | t | 2  -+
|  0 | 1 | 3 | t | t | 1
(snipped.)

 Here are random comments.
 
 * In grouping_planner(), the patch resets the pathkeys of the cheapest
 presorted index-path to query_pathkeys through
 get_cheapest_fractional_path_for_pathkeys().  Is this necessary?  ISTM the
 transformation of the pathkeys after the scan/join optimization would be no
 longer necessary once it has been done at the index-path creation time, ie,
 build_index_paths().  Why does the patch do that thing?

You're appliciated for pointing out. It is utterly useless code
of older patch. (Have you totally scrapped the older verions? me :-(

Removed it in this version.

 * In get_relation_info(), the patch determines the branch condition
 keyattno != ObjectIdAttributeNumber.  I fail to understand the meaning of
 this branch condition.  Could you explain about it?

Literally answering, it means oid cannot be NULL (if it exists).

Precisely, it reflects that I believed (or wished) it cannot be
NULL if oid column exists. (Other sysattrs are dismissed because
they are hardly to be a unique index column:-)

Oid in a tuple is retrived using HeapTupleGetOid() in
heap_getsysattr().

| result = ObjectIdGetDatum(HeapTupleGetOid(tup));

Then HeapTupleHeaderGetOid,

| #define HeapTupleGetOid(tuple) \
|HeapTupleHeaderGetOid((tuple)-t_data)

Finally asking HeapTupleHeaderGetOid for the value.

htup_details.h:354
| #define HeapTupleHeaderGetOid(tup) \
| ( \
|   ((tup)-t_infomask  HEAP_HASOID) ? \
|   *((Oid *) ((char *)(tup) + (tup)-t_hoff - sizeof(Oid))) \
|   : \
|   InvalidOid \
| )

So oid cannot be NULL after all even though can be InvalidOid.

On the otherhand, it can be NULL according to the definition in
heap.c.

heap.c:146
| 

Re: [HACKERS] Get more from indices.

2013-11-26 Thread Peter Eisentraut
On Fri, 2013-11-22 at 16:59 +0900, Kyotaro HORIGUCHI wrote:
 Hello. I found a bug(?) in thsi patch as I considered on another
 patch.

src/backend/optimizer/util/plancat.c:256: trailing whitespace.
src/backend/optimizer/util/plancat.c:261: space before tab in indent.





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2013-11-25 Thread Etsuro Fujita
Kyotaro HORIGUCHI wrote:
 the attched pathkey_and_uniqueindx_v4_20131122.patch is changed as
 follows.

The patch is compiled successfully and passes all regression tests.  Also,
the patch works well for the tests shown in an earlier email from
Horiguchi-san.  But in this version I found an incorrect behavior.

postgres=# CREATE TABLE t (a int not null, b int not null, c int, d text);
CREATE TABLE
postgres=# CREATE UNIQUE INDEX i_t_ab ON t (a, b);
CREATE INDEX
postgres=# INSERT INTO t (SELECT a / 5, 4 - (a % 5), a, 't' FROM
generate_series(00, 09) a);
INSERT 0 10
postgres=# ANALYZE t;
ANALYZE
postgres=# CREATE TABLE t2 (e text, f int);
CREATE TABLE
postgres=# INSERT INTO t2 VALUES ('t', 2);
INSERT 0 1
postgres=# INSERT INTO t2 VALUES ('t', 1);
INSERT 0 1
postgres=# ANALYZE t2;
ANALYZE
postgres=# EXPLAIN SELECT * FROM t, t2 WHERE t.a  10 AND t.d = t2.e ORDER
BY t.a, t.b, t.c, t.d, t2.f LIMIT 10;
   QUERY PLAN


 Limit  (cost=0.29..9.30 rows=10 width=20)
   -  Nested Loop  (cost=0.29..129.99 rows=144 width=20)
 Join Filter: (t.d = t2.e)
 -  Index Scan using i_t_ab on t  (cost=0.29..126.80 rows=72
width=14)
   Index Cond: (a  10)
 -  Materialize  (cost=0.00..1.03 rows=2 width=6)
   -  Seq Scan on t2  (cost=0.00..1.02 rows=2 width=6)
(7 rows)

postgres=# SELECT * FROM t, t2 WHERE t.a  10 AND t.d = t2.e ORDER BY t.a,
t.b, t.c, t.d, t2.f LIMIT 10;
 a | b | c | d | e | f
---+---+---+---+---+---
 0 | 0 | 4 | t | t | 2
 0 | 0 | 4 | t | t | 1
 0 | 1 | 3 | t | t | 2
 0 | 1 | 3 | t | t | 1
 0 | 2 | 2 | t | t | 2
 0 | 2 | 2 | t | t | 1
 0 | 3 | 1 | t | t | 2
 0 | 3 | 1 | t | t | 1
 0 | 4 | 0 | t | t | 2
 0 | 4 | 0 | t | t | 1
(10 rows)

(Note the column f is sorted in the descending order.)

ISTM this was brought by the following change.

 In truncate_useless_pathkeys, if query_pathkeys is longer than pathkeys
 made from index columns old patch picked up the latter as IndexPath's
 pathkeys. But the former is more suitable according to the context here.

  - truncate_useless_pathkeys returns root-query_pathkeys when
the index is fully-ordered and query_pathkeys contains the
pathkeys made from index columns.

I think it would be required to modify the patch so that the transformation
of the pathkeys is performed only when each pathkey of query_pathkeys
references the indexed relation.  (The above change might have been made
according to my comment in an earlier email I sent.  But I have to admit my
explanation there was not adequate.  I'm sorry for that.)

Here are random comments.

* In grouping_planner(), the patch resets the pathkeys of the cheapest
presorted index-path to query_pathkeys through
get_cheapest_fractional_path_for_pathkeys().  Is this necessary?  ISTM the
transformation of the pathkeys after the scan/join optimization would be no
longer necessary once it has been done at the index-path creation time, ie,
build_index_paths().  Why does the patch do that thing?

* In get_relation_info(), the patch determines the branch condition
keyattno != ObjectIdAttributeNumber.  I fail to understand the meaning of
this branch condition.  Could you explain about it?

Thanks,

Best regards,
Etsuro Fujita



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2013-11-22 Thread Kyotaro HORIGUCHI
Hello. I found a bug(?) in thsi patch as I considered on another
patch.

In truncate_useless_pathkeys, if query_pathkeys is longer than
pathkeys made from index columns old patch picked up the latter
as IndexPath's pathkeys. But the former is more suitable
according to the context here.

the attched pathkey_and_uniqueindx_v4_20131122.patch is changed
as follows.

 - Rebased to current master.

 - truncate_useless_pathkeys returns root-query_pathkeys when
   the index is fully-ordered and query_pathkeys contains the
   pathkeys made from index columns.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 606734a..43be0a5 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -953,7 +953,8 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 		index_pathkeys = build_index_pathkeys(root, index,
 			  ForwardScanDirection);
 		useful_pathkeys = truncate_useless_pathkeys(root, rel,
-	index_pathkeys);
+	index_pathkeys,
+	index-full_ordered);
 		orderbyclauses = NIL;
 		orderbyclausecols = NIL;
 	}
@@ -1015,7 +1016,8 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 		index_pathkeys = build_index_pathkeys(root, index,
 			  BackwardScanDirection);
 		useful_pathkeys = truncate_useless_pathkeys(root, rel,
-	index_pathkeys);
+	index_pathkeys,
+	index-full_ordered);
 		if (useful_pathkeys != NIL)
 		{
 			ipath = create_index_path(root, index,
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 9c8ede6..fd104b2 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -369,6 +369,29 @@ get_cheapest_path_for_pathkeys(List *paths, List *pathkeys,
 }
 
 /*
+ * path_is_ordered
+ * Return true if the path is apparently ordered on the pathkeys.
+ */
+bool
+path_is_ordered(Path *path, List *pathkeys)
+{
+	if (pathkeys_contained_in(pathkeys, path-pathkeys))
+		return true;
+
+	/*
+	 * If IndexPath is fully ordered, it is sufficiently ordered if index
+	 * pathkeys is a subset of target pathkeys
+	 */
+	if (pathkeys  path-pathkeys 
+		IsA(path, IndexPath) 
+		((IndexPath*)path)-indexinfo-full_ordered 
+		(pathkeys_contained_in(path-pathkeys, pathkeys)))
+		return true;
+
+	return false;
+}
+
+/*
  * get_cheapest_fractional_path_for_pathkeys
  *	  Find the cheapest path (for retrieving a specified fraction of all
  *	  the tuples) that satisfies the given pathkeys and parameterization.
@@ -381,6 +404,8 @@ get_cheapest_path_for_pathkeys(List *paths, List *pathkeys,
  * 'pathkeys' represents a required ordering (in canonical form!)
  * 'required_outer' denotes allowable outer relations for parameterized paths
  * 'fraction' is the fraction of the total tuples expected to be retrieved
+ * paths-pathkeys might be replaced with pathkeys so as to declare that the
+ * path is ordered on it.
  */
 Path *
 get_cheapest_fractional_path_for_pathkeys(List *paths,
@@ -403,9 +428,17 @@ get_cheapest_fractional_path_for_pathkeys(List *paths,
 			compare_fractional_path_costs(matched_path, path, fraction) = 0)
 			continue;
 
-		if (pathkeys_contained_in(pathkeys, path-pathkeys) 
+		if (path_is_ordered(path, pathkeys) 
 			bms_is_subset(PATH_REQ_OUTER(path), required_outer))
+		{
+			/*
+			 * Set required pathkeys as the path's pathkeys so as to declare
+			 * that this path is ordred on the pathkeys.
+			 */
+			if (length(path-pathkeys)  length(pathkeys))
+path-pathkeys = pathkeys;
 			matched_path = path;
+		}
 	}
 	return matched_path;
 }
@@ -798,7 +831,7 @@ build_join_pathkeys(PlannerInfo *root,
 	 * contain pathkeys that were useful for forming this joinrel but are
 	 * uninteresting to higher levels.
 	 */
-	return truncate_useless_pathkeys(root, joinrel, outer_pathkeys);
+	return truncate_useless_pathkeys(root, joinrel, outer_pathkeys, false);
 }
 
 /
@@ -1455,7 +1488,8 @@ right_merge_direction(PlannerInfo *root, PathKey *pathkey)
  * So the result is always either 0 or list_length(root-query_pathkeys).
  */
 static int
-pathkeys_useful_for_ordering(PlannerInfo *root, List *pathkeys)
+pathkeys_useful_for_ordering(PlannerInfo *root, List *pathkeys,
+			 bool pk_full_ordered)
 {
 	if (root-query_pathkeys == NIL)
 		return 0;/* no special ordering requested */
@@ -1469,23 +1503,36 @@ pathkeys_useful_for_ordering(PlannerInfo *root, List *pathkeys)
 		return list_length(root-query_pathkeys);
 	}
 
+	/*
+	 * Given that the pathkeys compose an unique key, they are useful for
+	 * ordering when they are a sublist of query_pathkeys. Expand pathkeys to
+	 * root-query_pathkeys in this case.
+	 */
+	if (pk_full_ordered 
+		pathkeys_contained_in(pathkeys, root-query_pathkeys))
+		return list_length(root-query_pathkeys);
+
 	

Re: [HACKERS] Get more from indices.

2013-11-22 Thread Etsuro Fujita
Kyotaro HORIGUCHI wrote:
 Hello. I found a bug(?) in thsi patch as I considered on another patch.

 In truncate_useless_pathkeys, if query_pathkeys is longer than pathkeys
made
 from index columns old patch picked up the latter as IndexPath's pathkeys.

 But the former is more suitable according to the context here.

 the attched pathkey_and_uniqueindx_v4_20131122.patch is changed as
follows.

  - Rebased to current master.

  - truncate_useless_pathkeys returns root-query_pathkeys when
the index is fully-ordered and query_pathkeys contains the
pathkeys made from index columns.

OK, I'd like to look at this version of the patch more closely, then.

Thanks,

Best regards,
Etsuro Fujita



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2013-11-20 Thread Etsuro Fujita
Kyotaro HORIGUCHI wrote:
 Hello, I've totally refactored the series of patches and cut out the
appropriate portion as 'unique (and non-nullable) index stuff'.

 As the discussion before, it got rid of path distinctness. This patch
works only on index 'full-orederedness', i.e., unique index on non-nullable
columns.

This is interesting!

I took a quick look at the patch.  Here is my initial comment.  I don't
think it's so good to set the pathkeys of a unique-index path to
query_pathkeys after the scan/join optimization because it can't exploit the
full-orderedness property in implementing mergejoins, i.e., it can't skip
doing an explicit sort that is considered unnecessary from the property.
So, I think the path's pathkeys should be set to query_pathkeys before the
scan/join optimization, i.e., at the index-path creation time.

If you wouldn't mind, I'd like to rework on the patch.

Thanks,

Best regards,
Etsuro Fujita



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2013-11-19 Thread Kyotaro HORIGUCHI
Hello, I've totally refactored the series of patches and cut out
the appropriate portion as 'unique (and non-nullable) index
stuff'.

As the discussion before, it got rid of path distinctness. This
patch works only on index 'full-orederedness', i.e., unique index
on non-nullable columns.

This patch itself does not so much. Will have power applied with
'Using indices for UNION' patch.


=== Making test table

create table t (a int not null, b int not null, c int, d text);
create unique index i_t_ab on t (a, b);
insert into t (select a / 5, 4 - (a % 5), a, 't' from generate_series(00, 
09) a);


=== Example 1.

not-patched=# explain select * from t order by a, b ,c limit 1;
   QUERY PLAN  
 --
  Limit  (cost=2041.00..2041.00 rows=1 width=14)
-  Sort  (cost=2041.00..2291.00 rows=10 width=14)
  Sort Key: a, b, c
  -  Seq Scan on t  (cost=0.00..1541.00 rows=10 width=14)

patched=# explain select * from t order by a, b ,c limit 1;
   QUERY PLAN 
 -
  Limit  (cost=0.29..0.33 rows=1 width=14)
-  Index Scan using i_t_ab on t  (cost=0.29..3857.04 rows=10 width=14)


=== Example 2.

not-patched=# explain select distinct * from t order by a limit 1;
 QUERY PLAN 
 ---
  Limit  (cost=1820.46..1820.47 rows=1 width=44)
-  Sort  (cost=1820.46..1835.34 rows=5951 width=44)
  Sort Key: a
  -  HashAggregate  (cost=1731.20..1790.71 rows=5951 width=44)
-  Seq Scan on t  (cost=0.00..1136.10 rows=59510 width=44)

patched=# explain select distinct * from t order by a limit 1;
  QUERY PLAN   
   
 
  Limit  (cost=0.29..1.09 rows=1 width=44)
-  Unique  (cost=0.29..4756.04 rows=5951 width=44)
  -  Index Scan using i_t_ab on t  (cost=0.29..4160.94 rows=59510 
 width=44)

The unique node could be removed technically but it requires to
care the distinctness of path/plan. So it has been put out to
Using indeces for UNION patch.


Any comments?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 606734a..43be0a5 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -953,7 +953,8 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 		index_pathkeys = build_index_pathkeys(root, index,
 			  ForwardScanDirection);
 		useful_pathkeys = truncate_useless_pathkeys(root, rel,
-	index_pathkeys);
+	index_pathkeys,
+	index-full_ordered);
 		orderbyclauses = NIL;
 		orderbyclausecols = NIL;
 	}
@@ -1015,7 +1016,8 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 		index_pathkeys = build_index_pathkeys(root, index,
 			  BackwardScanDirection);
 		useful_pathkeys = truncate_useless_pathkeys(root, rel,
-	index_pathkeys);
+	index_pathkeys,
+	index-full_ordered);
 		if (useful_pathkeys != NIL)
 		{
 			ipath = create_index_path(root, index,
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 032b2cd..5d8ee04 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -369,6 +369,29 @@ get_cheapest_path_for_pathkeys(List *paths, List *pathkeys,
 }
 
 /*
+ * path_is_ordered
+ * Return true if the path is apparently ordered on the pathkeys.
+ */
+bool
+path_is_ordered(Path *path, List *pathkeys)
+{
+	if (pathkeys_contained_in(pathkeys, path-pathkeys))
+		return true;
+
+	/*
+	 * If IndexPath is fully ordered, it is sufficiently ordered if index
+	 * pathkeys is a subset of target pathkeys
+	 */
+	if (pathkeys  path-pathkeys 
+		IsA(path, IndexPath) 
+		((IndexPath*)path)-indexinfo-full_ordered 
+		(pathkeys_contained_in(path-pathkeys, pathkeys)))
+		return true;
+
+	return false;
+}
+
+/*
  * get_cheapest_fractional_path_for_pathkeys
  *	  Find the cheapest path (for retrieving a specified fraction of all
  *	  the tuples) that satisfies the given pathkeys and parameterization.
@@ -381,6 +404,8 @@ get_cheapest_path_for_pathkeys(List *paths, List *pathkeys,
  * 'pathkeys' represents a required ordering (in canonical form!)
  * 'required_outer' denotes allowable outer relations for parameterized paths
  * 'fraction' is the fraction of the total tuples expected to be retrieved
+ * paths-pathkeys might be replaced with pathkeys so as to declare that the
+ * path is ordered on it.
  */
 Path *
 

Re: [HACKERS] Get more from indices.

2013-11-14 Thread Kyotaro HORIGUCHI
Hello, I might made insufficient explanation. Surely it is
overdone for the example.

 uniquetest=# create table t (a int, b int, c int, d text);
 uniquetest=# create unique index i_t_pkey on t(a, b);
 uniquetest=# insert into t
 (select a % 10, a / 10, a, 't' from generate_series(0, 10) a);
 uniquetest=# analyze;
 
 uniquetest=# explain (costs off, analyze on) select distinct * from t;

This is the simplest example for this patch.

The main intention of this patch is to widen application of my
another 'Index for UNION' patch.

https://commitfest.postgresql.org/action/patch_view?id=1279

 ISTM the right way to deal with this is not what you've done
 here, but just to deem the DISTINCT a no-op if there's a unique
 index on some subset of the distinct-ed columns.

It is not sufficient only to deem DISTINCT no-op in order to get
more utility of MergeAppends on IndexScans for UNION. The
following example (omitting data insertion..)

| create table cu11 (a int not null, b int not null, c int, d text);
| create unique index i_cu11_pk on cu11 (a, b);
| create table cu12 (like cu11 including all);
| explain select a, b from cu11 union select a, b from cu12
|order by a, b limit 100;

yields following plan without this patch.

|  Limit
|   -  Sort (Sort Key: cu11.a, cu11.b)
|-  HashAggregate
| -  Append
|  -  Limit
|   -  Unique
|-  Index Only Scan using cu11_a_b_idx on cu11
|  -  Limit
|   -  Unique
|-  Index Only Scan using cu12_a_b_idx on cu12

But, this could be far more effecient when the plan were as
follows if the rows are fully-ordered on the sort key.

|  Limit
|   -  Unique 
|-  Merge Append (Sort Key: cu11.a, cu11.b)
| -  Index Only Scan using cu11_a_b_idx on cu11
| -  Index Only Scan using cu12_a_b_idx on cu12

Propagation of uniqueness on MergeAppend is required to do this
but not for the first expample on this thread.

  This query is actually legally satisfied by a simple seqscan,
 which would be faster than either of the plans you mention.  In
 any case, it seems like a bad idea to me to conflate
 distinct-ness with ordering, so I don't like what you did to
 PathKeys.

Umm. Reconsidering on that and the discussion of the purose above
and the last patch of this thread, I've been wrong in where to
split the patches. I'll repost the reorganized patches on both
threads.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2013-11-13 Thread Peter Eisentraut
On Tue, 2013-11-12 at 17:48 +0900, Kyotaro HORIGUCHI wrote:
 Hello, this is the revised patch.

Since you're using git, please check your patch for trailing whitespace
with git diff --check.





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2013-11-12 Thread Kyotaro HORIGUCHI
Hello, this is the revised patch.

 Hmm, that sounds quite resonable in general. But the conflation
 is already found in grouping_planner to some extent.

Although this new patch is not split into unique-index sutff and
others, it removes current_pathkeys from grouping_planner, and
adds pathkeys and is_unique into struct Plan instead. This
eliminates independent and longstanding current_pathkeys variable
and calculus involving it from grouping_planner() so it would be
made clearer and easier to read, I expect.

  - is_unique and pathkeys is added to the type Path. (umm...)
 
  - create function like pathkeys_satisfies(check_pathkeys,
pathkeys, isuniq) or path_ordered_by(pathkeys, path) as
needed.

This was different from my thought. Finally I added
plan_is_ordered(plan, path) and some of make_nodename()
functions take pathkeys and/or uniquely_ordered as parameter and
some of others take them from given leftree plan. Plan nodes
propagate the attributes in autonomous way so explicit
current_pathkeys handling could be thrown away.

  - Plan will be set ordered and pathkeys derived from source path
and its process and grouping_planner consults it to deceide
whether to do sort (to hide the currently naked code).
 
  - Add check for NULLuble columns :-)

 Now IndexOptInfo has new attribute full_ordered and it is set
true if the index does not cover any nullAble columns in
get_relation_info().

And expect file of isolation test is modified to fit the change
in result plan.

Finally, this version became to conflict with the another UNION
patch so I detached from it and rebased to current master.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 606734a..43be0a5 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -953,7 +953,8 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 		index_pathkeys = build_index_pathkeys(root, index,
 			  ForwardScanDirection);
 		useful_pathkeys = truncate_useless_pathkeys(root, rel,
-	index_pathkeys);
+	index_pathkeys,
+	index-full_ordered);
 		orderbyclauses = NIL;
 		orderbyclausecols = NIL;
 	}
@@ -1015,7 +1016,8 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 		index_pathkeys = build_index_pathkeys(root, index,
 			  BackwardScanDirection);
 		useful_pathkeys = truncate_useless_pathkeys(root, rel,
-	index_pathkeys);
+	index_pathkeys,
+	index-full_ordered);
 		if (useful_pathkeys != NIL)
 		{
 			ipath = create_index_path(root, index,
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 6724996..954d8a8 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -363,6 +363,68 @@ get_cheapest_path_for_pathkeys(List *paths, List *pathkeys,
 }
 
 /*
+ * path_is_uniquely_ordered
+ * Return true if the path is apparently uniquely ordered on the pathkeys.
+ */
+bool
+path_is_uniquely_ordered(Path *path, List *pathkeys)
+{
+	if (pathkeys  path-pathkeys 
+		IsA(path, IndexPath) 
+		((IndexPath*)path)-indexinfo-full_ordered 
+		(pathkeys_contained_in(path-pathkeys, pathkeys)))
+	{
+		return true;
+	}
+	
+	return false;
+}
+
+
+/*
+ * path_is_ordered
+ * Return true if the path is apparently ordered on the pathkeys.  The given
+ * path might be modified so that it will be recognized later as sufficiently
+ * ordered.
+ */
+bool
+path_is_ordered(Path *path, List *pathkeys)
+{
+	if (pathkeys_contained_in(pathkeys, path-pathkeys))
+		return true;
+
+	/* path is obviously ordered when uniquely ordered */
+	if (path_is_uniquely_ordered(path, pathkeys))
+		return true;
+
+	/*
+	 * MergeAppendPath's pathkeys can be replaced by arbitrary pathkeys when
+	 * all subpaths are uniquely ordered on the target pathkeys.
+	 */
+	if (IsA(path, MergeAppendPath))
+	{
+		ListCell *lc;
+		MergeAppendPath *mpath = (MergeAppendPath *)path;
+		
+		foreach (lc, mpath-subpaths)
+		{
+			Path *subpath = (Path *) lfirst(lc);
+			if (!path_is_uniquely_ordered(subpath, pathkeys)) break;
+		}
+		if (!lc)
+		{
+			/*
+			 * This MergeAppendPath is sufficiently ordered on the target
+			 * pathkeys. Reflect that on this path.
+			 */
+			mpath-path.pathkeys = pathkeys;
+			return true;
+		}
+	}
+	return false;
+}
+
+/*
  * get_cheapest_fractional_path_for_pathkeys
  *	  Find the cheapest path (for retrieving a specified fraction of all
  *	  the tuples) that satisfies the given pathkeys and parameterization.
@@ -397,7 +459,7 @@ get_cheapest_fractional_path_for_pathkeys(List *paths,
 			compare_fractional_path_costs(matched_path, path, fraction) = 0)
 			continue;
 
-		if (pathkeys_contained_in(pathkeys, path-pathkeys) 
+		if (path_is_ordered(path, pathkeys) 
 			bms_is_subset(PATH_REQ_OUTER(path), required_outer))
 			matched_path = path;
 	}
@@ -733,7 +795,7 @@ 

Re: [HACKERS] Get more from indices.

2013-11-11 Thread Kyotaro HORIGUCHI
Thank you,

 In any case, it seems like a bad idea to me to conflate
 distinct-ness with ordering, so I don't like what you did to
 PathKeys.

Hmm, that sounds quite resonable in general. But the conflation
is already found in grouping_planner to some extent. The name
distinct_pathkey itself asserts that it is the ordering used for
distinct. And actually is used for sorting if hashed-distinct is
not selected.

Plus, these modifications in grouping_planner is required by the
patch for pathkeys.c to be effective.

I suppose the main cause of nastiness of the patch for
grouping_planner comes from the adheration of the usage of the
variable for path uniqueness with the existent code.

The additional members to Plan, say, pathkeys and ordered could
help the code to look less ugly by taking in the related code
currently appears nakedly in grouping_planner into make(or
generate)_plan() functions. Although the new attributes
somewhat look out of place..


  However, if the index is unique, wouldn't
  scanning the index produce data that actually satisfies the longer sort
  key?  It doesn't matter what the values of c,d are if there are no
  duplicates in the a,b columns.  So maybe as a separate patch, we could
  look at claiming that a unique index satisfies the entire query_pathkeys
  if it matches the first N columns of that.
 
 That would be really spiffy.

# Putting aside the trueness of the assumption for unique-index
# and pathkeys.

The expanded sufficiency check can be archieved by involving
'unique-indexed' attribute for pathkeys_contained_in(),say
pathkeys_satisfies(pathkeys, pathkeys, is_uniq), but finally
could have no effect without some extent of assist in the process
in grouping_planner like my preveous patch to be in effect, I
believe.


I'll try to rewrite the path to be as following considering less
conflating lookings in grouping_planner.

 - is_unique and pathkeys is added to the type Path. (umm...)

 - create function like pathkeys_satisfies(check_pathkeys,
   pathkeys, isuniq) or path_ordered_by(pathkeys, path) as
   needed.

 - Plan will be set ordered and pathkeys derived from source path
   and its process and grouping_planner consults it to deceide
   whether to do sort (to hide the currently naked code).

 - Add check for NULLuble columns :-)

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

  


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2013-11-11 Thread Kyotaro HORIGUCHI
Hello,

 Your patch fails the isolation test because of changed query plans:
 
 http://pgci.eisentraut.org/jenkins/job/postgresql_commitfest_world/175/artifact/src/test/isolation/regression.diffs/*view*/

Thank you for pointing out. I wasn't aware of that..

# Because it is not launched from the top-level make check...

I'll count that in next pach.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2013-11-07 Thread Peter Eisentraut
On 10/31/13, 6:43 AM, Kyotaro HORIGUCHI wrote:
 Hello, This is the last episode of the 'dance with indices'?
 series.

Your patch fails the isolation test because of changed query plans:

http://pgci.eisentraut.org/jenkins/job/postgresql_commitfest_world/175/artifact/src/test/isolation/regression.diffs/*view*/




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Get more from indices.

2013-10-31 Thread Kyotaro HORIGUCHI
Hello, This is the last episode of the 'dance with indices'?
series.


Unique indexes can sort the tuples in corresponding tables
prefectly. So this query might can use index.

 uniquetest=# create table t (a int, b int, c int, d text);
 uniquetest=# create unique index i_t_pkey on t(a, b);
 uniquetest=# insert into t
   (select a % 10, a / 10, a, 't' from generate_series(0, 10) a);
 uniquetest=# analyze;
 
 uniquetest=# explain (costs off, analyze on) select distinct * from t;
 QUERY PLAN 
 ---
 Unique (actual time=149.625..245.978 rows=11 loops=1)
  -  Sort (actual time=149.624..199.806 rows=11 loops=1)
  Sort Key: a, b, c, d
  Sort Method: external merge  Disk: 2328kB
   -  Seq Scan on t (actual time=0.016..15.597 rows=11 loops=1)
 Total runtime: 251.272 ms

With this patch, the plan for this query becomes as follows,

 uniquetest=# explain (costs off, analyze on) select distinct * from t;
   QUERY PLAN
 -
  Index Scan using i_t_pkey on t
   (actual time=0.019..32.457 rows=11 loops=1)
  Total runtime: 37.488 ms

This only seems not so valuable to archive, but with my previous
MergeAppend for UNION patch, this will have more value.

 uniquetest=# create table pu1 (a int, b int, c int, d text);
 uniquetest=# create unique index pu1_pkey_idx on pu1 (a, b);
 uniquetest=# create table cu11 (like pu1 including all) inherits (pu1);
 uniquetest=# create table cu12 (like pu1 including all) inherits (pu1);
 uniquetest=# create table pu2 (like pu1 including all);
 uniquetest=# create table cu21 (like pu1 including all) inherits (pu2);
 uniquetest=# create table cu22 (like pu1 including all) inherits (pu2);
 uniquetest=# insert into cu11 (select a / 5, 4 - (a % 5), a, 'cu11'
   from generate_series(00, 09) a);
 uniquetest=# insert into cu12 (select a / 5, 4 - (a % 5), a, 'cu12'
from generate_series(10, 19) a);
 uniquetest=# insert into cu21 (select a / 5, 4 - (a % 5), a, 'cu21'
   from generate_series(15, 24) a);
 uniquetest=# insert into cu22 (select a / 5, 4 - (a % 5), a, 'cu22'
from generate_series(25, 34) a);
 uniquetest=# explain (analyze on, costs off)
select * from pu1 union select * from pu2 order by a, b limit 100;
  QUERY PLAN
 --
  Limit (actual time=0.226..0.591 rows=100 loops=1)
   -  Unique (actual time=0.223..0.561 rows=100 loops=1)
-  Merge Append (actual time=0.223..0.470 rows=100 loops=1)
Sort Key: pu1.a, pu1.b, pu1.c, pu1.d
 -  Limit (actual time=0.125..0.326 rows=100 loops=1)
  -  Unique (actual time=0.125..0.303 rows=100 loops=1)
   -  Merge Append (actual time=0.123..0.220 rows=100 loops=1)
   Sort Key: pu1.a, pu1.b, pu1.c, pu1.d
-  Index Scan using..on pu1 (actual time=0.005..0.005 rows=0 loops=1)
-  Index Scan using..on cu11(actual time=0.071..0.128 rows=100 
 loops=1)
-  Index Scan using..on cu12 (actual time=0.045..0.045 rows=1 loops=1)
 -  Limit (actual time=0.096..0.096 rows=1 loops=1)
  -  Unique (actual time=0.096..0.096 rows=1 loops=1)
   -  Merge Append (actual time=0.094..0.094 rows=1 loops=1)
   Sort Key: pu2.a, pu2.b, pu2.c, pu2.d
-  Index Scan using..on pu2 (actual time=0.002..0.002 rows=0 loops=1)
-  Index Scan using..on cu21 (actual time=0.047..0.047 rows=1 loops=1)
-  Index Scan using..on cu22 (actual time=0.043..0.043 rows=1 loops=1)
  Total runtime: 1.987 ms

On the other hand, what we will get from the unpatched PostgreSQL is,

 uniquetest=# explain (analyze on, costs off)
  select * from pu1 union select * from pu2 order by a, b limit 100;
QUERY PLAN
 -
 Limit (actual time=535.620..535.706 rows=100 loops=1)
  -  Unique (actual time=535.618..535.695 rows=100 loops=1)
   -  Sort (actual time=535.617..535.637 rows=100 loops=1)
   Sort Key: pu1.a, pu1.b, pu1.c, pu1.d
   Sort Method: external sort  Disk: 10568kB
-  Append (actual time=0.012..144.144 rows=40 loops=1)
 -  Append (actual time=0.012..49.570 rows=20 loops=1)
  -  Seq Scan on pu1 (actual time=0.001..0.001 rows=0 loops=1)
  -  Seq Scan on cu11 (actual time=0.011..16.202 rows=10 loops=1)
  -  Seq Scan on cu12 (actual time=0.008..16.334 rows=10 loops=1)
 -  Append (actual time=0.010..54.052 rows=20 loops=1)
  -  Seq Scan on pu2 (actual time=0.000..0.000 rows=0 loops=1)
  -  Seq Scan on cu21 (actual time=0.009..16.444 

Re: [HACKERS] Get more from indices.

2013-10-31 Thread Tom Lane
Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes:
 Unique indexes can sort the tuples in corresponding tables
 prefectly. So this query might can use index.

 uniquetest=# create table t (a int, b int, c int, d text);
 uniquetest=# create unique index i_t_pkey on t(a, b);
 uniquetest=# insert into t
 (select a % 10, a / 10, a, 't' from generate_series(0, 10) a);
 uniquetest=# analyze;
 
 uniquetest=# explain (costs off, analyze on) select distinct * from t;

ISTM the right way to deal with this is not what you've done here, but
just to deem the DISTINCT a no-op if there's a unique index on some subset
of the distinct-ed columns.  This query is actually legally satisfied by
a simple seqscan, which would be faster than either of the plans you
mention.  In any case, it seems like a bad idea to me to conflate
distinct-ness with ordering, so I don't like what you did to PathKeys.

Having said that, there is the kernel of a useful idea here, I think.
The reason you don't get an indexscan already is that the DISTINCT
assumes it needs to sort by (a,b,c,d), which an index on just (a,b)
doesn't appear to satisfy.  However, if the index is unique, wouldn't
scanning the index produce data that actually satisfies the longer sort
key?  It doesn't matter what the values of c,d are if there are no
duplicates in the a,b columns.  So maybe as a separate patch, we could
look at claiming that a unique index satisfies the entire query_pathkeys
if it matches the first N columns of that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2013-10-31 Thread Robert Haas
On Thu, Oct 31, 2013 at 10:59 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  However, if the index is unique, wouldn't
 scanning the index produce data that actually satisfies the longer sort
 key?  It doesn't matter what the values of c,d are if there are no
 duplicates in the a,b columns.  So maybe as a separate patch, we could
 look at claiming that a unique index satisfies the entire query_pathkeys
 if it matches the first N columns of that.

That would be really spiffy.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2013-10-31 Thread Tom Lane
I wrote:
 Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes:
 Unique indexes can sort the tuples in corresponding tables
 prefectly. So this query might can use index.

BTW, how much of any of this is correct if the unique index contains
multiple NULL values?  We might need to restrict the optimization(s)
to cases where the unique-ified columns are all marked NOT NULL.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers