Re: [HACKERS] rangetypes spgist questions/refactoring

2014-05-28 Thread Bruce Momjian
On Tue, May 20, 2014 at 11:18:29AM -0700, Jeff Davis wrote:
 I think this can be done without breaking upgrade compatibility, because
 I think the structure already satisfies the invariants I mentioned in
 the other email (aside from the special case of a root tuple with two
 nodes and no prefix). That being said, it's a little scary to refactor
 indexing code while trying to keep it upgrade-compatible.

We can make pg_upgrade mark such indexes as invalid and create a user
script to reindex all the indexes after the upgrade.  We have done
that in the past.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] rangetypes spgist questions/refactoring

2014-05-20 Thread Jeff Davis
I am trying to understand the rangetypes spgist code and its interaction
with empty ranges. (Slightly embarrassing, because I reviewed the code.)
I see an old email here:

http://www.postgresql.org/message-id/50145a9c.7080...@enterprisedb.com

But still don't have a clear picture.

What I don't understand is:
1. Under what conditions might a tuple have no prefix (centroid), yet
also *not* be allTheSame?
2. Why would any tuple have 2 nodes? If there are some non-empty ranges,
why not make a centroid and have 4 or 5 nodes?

I added a bunch of assertions that seem reasonable to me based on my
understanding of the structure, and I attached them as a patch. They all
pass regression, which has a non-trivial set of input ranges, so there
is a reasonable chance the assertions are generally true. But if they
are true, some refactoring is in order.

It seems like the structure should be something like:

* Empty and non-empty ranges are only mixed in the root (level == 0).
* If a tuple has any non-empty ranges, there's a prefix (centroid).
AllTheSame may or may not be set (ordinarily not).
* If a tuple has only empty ranges, there's no prefix, and allTheSame is
set.
* The root tuple may have 5 nodes if there are any non-empty ranges,
otherwise 1 node (and allTheSame is set).
* Inner tuples may have either:
  - one node if it contains only empty ranges, in which case it is
allTheSame, and must be in the 5th node (quadrant) of the root node, and
be at level 1
  - four nodes if there are any non-empty ranges, in which case it has
*no* empty ranges

Note: I am using allTheSame in the logical sense; perhaps the flag is
an internal optimization that we can't rely on.

Regards,
Jeff Davis

*** a/src/backend/utils/adt/rangetypes_spgist.c
--- b/src/backend/utils/adt/rangetypes_spgist.c
***
*** 104,109  getQuadrant(TypeCacheEntry *typcache, RangeType *centroid, RangeType *tst)
--- 104,112 
  
  	range_deserialize(typcache, centroid, centroidLower, centroidUpper,
  	  centroidEmpty);
+ 
+ 	Assert(!centroidEmpty);
+ 
  	range_deserialize(typcache, tst, lower, upper, empty);
  
  	if (empty)
***
*** 138,143  spg_range_quad_choose(PG_FUNCTION_ARGS)
--- 141,158 
  	int16		quadrant;
  	TypeCacheEntry *typcache;
  
+ 	/* if there is no prefix, they must be all the same */
+ 	Assert(in-hasPrefix || in-allTheSame);
+ 	/*
+ 	 * If there are two nodes, it must either be the root or at level 1, and
+ 	 * there must be no prefix.
+ 	 */
+ 	Assert(in-nNodes != 2 ||
+ 		   (!in-hasPrefix  (in-level == 0 || in-level == 1)));
+ 	/* if there are 4 or 5 nodes, there must be a prefix */
+ 	Assert(in-nNodes != 4 || in-hasPrefix);
+ 	Assert(in-nNodes != 5 || (in-hasPrefix  in-level == 0));
+ 
  	if (in-allTheSame)
  	{
  		out-resultType = spgMatchNode;
***
*** 156,161  spg_range_quad_choose(PG_FUNCTION_ARGS)
--- 171,178 
  	 */
  	if (!in-hasPrefix)
  	{
+ 		Assert(false);
+ 
  		out-resultType = spgMatchNode;
  		if (RangeIsEmpty(inRange))
  			out-result.matchNode.nodeN = 0;
***
*** 232,237  spg_range_quad_picksplit(PG_FUNCTION_ARGS)
--- 249,261 
  	nonEmptyCount = j;
  
  	/*
+ 	 * Only the root mixes empty and non-empty tuples.
+ 	 */
+ 	Assert(nonEmptyCount == 0 ||
+ 		   nonEmptyCount == in-nTuples ||
+ 		   in-level == 0);
+ 
+ 	/*
  	 * All the ranges are empty. The best we can do is to construct an inner
  	 * node with no centroid, and put all ranges into node 0. If non-empty
  	 * ranges are added later, they will be routed to node 1.
***
*** 315,320  spg_range_quad_inner_consistent(PG_FUNCTION_ARGS)
--- 339,356 
  	 */
  	bool		needPrevious = false;
  
+ 	/* if there is no prefix, they must be all the same */
+ 	Assert(in-hasPrefix || in-allTheSame);
+ 	/*
+ 	 * If there are two nodes, it must either be the root or at level 1, and
+ 	 * there must be no prefix.
+ 	 */
+ 	Assert(in-nNodes != 2 ||
+ 		   (!in-hasPrefix  (in-level == 0 || in-level == 1)));
+ 	/* if there are 4 or 5 nodes, there must be a prefix */
+ 	Assert(in-nNodes != 4 || in-hasPrefix);
+ 	Assert(in-nNodes != 5 || (in-hasPrefix  in-level == 0));
+ 
  	if (in-allTheSame)
  	{
  		/* Report that all nodes should be visited */
***
*** 327,332  spg_range_quad_inner_consistent(PG_FUNCTION_ARGS)
--- 363,370 
  
  	if (!in-hasPrefix)
  	{
+ 		Assert(false);
+ 
  		/*
  		 * No centroid on this inner node. Such a node has two child nodes,
  		 * the first for empty ranges, and the second for non-empty ones.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rangetypes spgist questions/refactoring

2014-05-20 Thread Jeff Davis
On Tue, 2014-05-20 at 09:52 -0700, Jeff Davis wrote:
 2. Why would any tuple have 2 nodes? If there are some non-empty ranges,
 why not make a centroid and have 4 or 5 nodes?

This is slightly more complicated than I thought, because we need to do
something about the root node if a bunch of empty ranges are inserted
first. 

SpgSplitTuple seems to offer a way to handle the first non-empty range
inserted. Unfortunately, this limitation seems to kill that idea:

This new prefix value must be sufficiently less restrictive than the
original to accept the new value to be indexed, and it should be no
longer than the original prefix.

because there's no good way to know how large the root's prefix might
eventually be.

So we might be better off (not a proposal; this would break upgrade)
saying that the root always has two nodes:
  * node0 points to all empty ranges, which all live at level 1 and have
allTheSame set.
  * node1 points to all non-empty ranges, and every tuple in that
subtree has a prefix (centroid) and 4 nodes (one for each quadrant)

I am starting to see the current implementation as an optimization this
idea where the root can also have a centroid if you can find one, which
can save an extra level in the tree search.

If my analysis is correct so far, and the assertions are correct, my
proposal is something like:

 * remove dead code
 * refactor to make the invariants a little more clear
 * make the special case of the root tuple more clear
 * improve comments describing tree structure and add assertions

I think this can be done without breaking upgrade compatibility, because
I think the structure already satisfies the invariants I mentioned in
the other email (aside from the special case of a root tuple with two
nodes and no prefix). That being said, it's a little scary to refactor
indexing code while trying to keep it upgrade-compatible.

Thoughts?

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers