Re: [HACKERS] Cube extension point support // GSoC'13

2013-10-22 Thread Alexander Korotkov
On Tue, Oct 22, 2013 at 2:00 AM, Dimitri Fontaine dimi...@2ndquadrant.frwrote:

 Tom Lane t...@sss.pgh.pa.us writes:
  I believe the reason GIST has compress/decompress functions is not for
  TOAST (they predate that, if memory serves), but to allow the on-disk
  representation of an index entry to be different from the data type's
  normal representation in other ways --- think lossy storage in
 particular.

 My understanding of the use case for those functions is to do with
 storing a different data type in the index upper nodes and in the index
 leafs. It should be possible to do that in a non-lossy way, so that you
 would implement compress/decompress and not declare the RECHECK bits.

 Then again I'm talking from 8.3 era memories of when I tried to
 understand GiST enough to code the prefix extension.


Actually, I mean purpose of this particular decompress function
implementation, not compress/decompress in general. I understand that in
general compress/decompress can do useful job.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Cube extension point support // GSoC'13

2013-10-21 Thread Heikki Linnakangas

On 11.10.2013 17:39, Alexander Korotkov wrote:

On Fri, Oct 11, 2013 at 5:56 PM, Heikki Linnakangashlinnakan...@vmware.com

wrote:



2. I didn't understand this change:

  @@ -422,24 +439,14 @@ g_cube_union(PG_FUNCTION_ARGS)

  Datum
  g_cube_compress(PG_FUNCTION_**ARGS)
  {
-   PG_RETURN_DATUM(PG_GETARG_**DATUM(0));
+   GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
+   PG_RETURN_POINTER(entry);
  }

  Datum
  g_cube_decompress(PG_FUNCTION_**ARGS)
  {
 GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
-   NDBOX  *key = DatumGetNDBOX(PG_DETOAST_**DATUM(entry-key));
-
-   if (key != DatumGetNDBOX(entry-key))
-   {
-   GISTENTRY  *retval = (GISTENTRY *)
palloc(sizeof(GISTENTRY));
-
-   gistentryinit(*retval, PointerGetDatum(key),
- entry-rel, entry-page,
- entry-offset, FALSE);
-   PG_RETURN_POINTER(retval);
-   }
 PG_RETURN_POINTER(entry);
  }



What did the removed code do, and why isn't it needed anymore?


I don't understand this place even more generally. What decompress function
do is to detoast key and return new gist entry with it if needed. But can
GiST key ever be toasted? My experiments show that answer is no, it can't.
When too long key is passed then GiST falls during inserting key into page.
And I didn't find any code doing GiST key toast in git.


I guess it can't happen. Or is it possible that a toasted value that 
came from disk will be passed to these functions, without detoasting 
them somewhere along the way? Not sure.


In any case, I've committed this patch now. I left out the changes to 
compress and decompress functions, as that's unrelated to the patch at 
hand. I ran the regression test on a Windows VM, and updated the 
expected output file that the Windows build used. That leaves two 
alternative expected output files unmodified - the buildfarm will tell 
us whether they're still needed.


Thanks and congrats on your first committed patch, Stas! :-)

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cube extension point support // GSoC'13

2013-10-21 Thread Alexander Korotkov
On Mon, Oct 21, 2013 at 11:06 PM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:

 On 11.10.2013 17:39, Alexander Korotkov wrote:

 On Fri, Oct 11, 2013 at 5:56 PM, Heikki Linnakangashlinnakangas@**
 vmware.com hlinnakan...@vmware.com

 wrote:


  2. I didn't understand this change:

   @@ -422,24 +439,14 @@ g_cube_union(PG_FUNCTION_ARGS)

   Datum
   g_cube_compress(PG_FUNCTION_ARGS)
   {
 -   PG_RETURN_DATUM(PG_GETARG_DATUM(0));

 +   GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
 +   PG_RETURN_POINTER(entry);
   }

   Datum
   g_cube_decompress(PG_FUNCTION_ARGS)

   {
  GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
 -   NDBOX  *key = DatumGetNDBOX(PG_DETOAST_
 DATUM(entry-key));

 -
 -   if (key != DatumGetNDBOX(entry-key))
 -   {
 -   GISTENTRY  *retval = (GISTENTRY *)
 palloc(sizeof(GISTENTRY));
 -
 -   gistentryinit(*retval, PointerGetDatum(key),
 - entry-rel, entry-page,
 - entry-offset, FALSE);
 -   PG_RETURN_POINTER(retval);
 -   }
  PG_RETURN_POINTER(entry);
   }


  What did the removed code do, and why isn't it needed anymore?


 I don't understand this place even more generally. What decompress
 function
 do is to detoast key and return new gist entry with it if needed. But can
 GiST key ever be toasted? My experiments show that answer is no, it can't.
 When too long key is passed then GiST falls during inserting key into
 page.
 And I didn't find any code doing GiST key toast in git.


 I guess it can't happen. Or is it possible that a toasted value that came
 from disk will be passed to these functions, without detoasting them
 somewhere along the way? Not sure.


I will ask Teodor about it. When situation will be completely clear we
should cleanup confusing comments.
Also, too long GiST key shouldn't cause GiST to fall in its internals. We
should add check somewhere before.

In any case, I've committed this patch now. I left out the changes to
 compress and decompress functions, as that's unrelated to the patch at
 hand. I ran the regression test on a Windows VM, and updated the expected
 output file that the Windows build used. That leaves two alternative
 expected output files unmodified - the buildfarm will tell us whether
 they're still needed.

 Thanks and congrats on your first committed patch, Stas! :-)


Stas, congratulations from me too! Now, it's time to rebase another patches
against head. :)

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Cube extension point support // GSoC'13

2013-10-21 Thread Tom Lane
Alexander Korotkov aekorot...@gmail.com writes:
 On Mon, Oct 21, 2013 at 11:06 PM, Heikki Linnakangas 
 hlinnakan...@vmware.com wrote:
 I guess it can't happen. Or is it possible that a toasted value that came
 from disk will be passed to these functions, without detoasting them
 somewhere along the way? Not sure.

 I will ask Teodor about it. When situation will be completely clear we
 should cleanup confusing comments.

I believe the reason GIST has compress/decompress functions is not for
TOAST (they predate that, if memory serves), but to allow the on-disk
representation of an index entry to be different from the data type's
normal representation in other ways --- think lossy storage in particular.

It's possible that the code in cube's decompress/decompress functions is
unnecessary because cube doesn't actually do any such compression; the
code might've been copy-pasted from somewhere else without adequate
thought about whether it was useful for cubes.  I'd want to see some
discussion about why it's okay to change it before we do so, though.
In any case it seems separate from the claimed purpose of this patch
and thus better done in a different patch.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cube extension point support // GSoC'13

2013-10-21 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 I believe the reason GIST has compress/decompress functions is not for
 TOAST (they predate that, if memory serves), but to allow the on-disk
 representation of an index entry to be different from the data type's
 normal representation in other ways --- think lossy storage in particular.

My understanding of the use case for those functions is to do with
storing a different data type in the index upper nodes and in the index
leafs. It should be possible to do that in a non-lossy way, so that you
would implement compress/decompress and not declare the RECHECK bits.

Then again I'm talking from 8.3 era memories of when I tried to
understand GiST enough to code the prefix extension.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cube extension point support // GSoC'13

2013-10-11 Thread Heikki Linnakangas

On 09.10.2013 21:07, Robert Haas wrote:

On Tue, Sep 24, 2013 at 9:07 AM, Stas Kelvichstas.kelv...@gmail.com  wrote:

Hello

There is new version of patch. I have separated ordering operators to different 
patch (https://commitfest.postgresql.org/action/patch_view?id=1243), fixed 
formatting issues and implemented backward compatibility with old-style points 
in cube_is_point() and cube_out().

Also comparing output files I've discovered that this four files is combination 
of two types of different behavior:

1) SELECT '-1e-700'::cube AS cube;
can be (0) or (-0)

2) Amount of zeros in exponent of floating point, i.e. SELECT '1e27'::cube AS 
cube;
  can be (1e+027) or (1e+27)

On my system (OSX) it is second option in both situations. I've also tested it 
on FreeBSD 9.0 and Ubuntu 12.04 with the same results. So is there some ideas 
how can I reproduce such results?


Heikki, are you going to review this further for this CommitFest?


Sorry, I didn't realize the ball was in my court.

I went through the patch now, kibitzing over some minor style issues. 
Attached is a new version.


This seems good for commit except for two things:

1. The alternative expected output files still need to be updated. Stas 
couldn't find a system where some of those file were used. One option is 
to simply commit the patch as is, and see if the buildfarm goes red. If 
it doesn't, we can simply remove the alternative files - they are not 
used on any supported platform. If some animals go red, we'll get the 
required diff from the buildfarm output and apply. So this isn't a 
show-stopper.


2. I didn't understand this change:


@@ -422,24 +439,14 @@ g_cube_union(PG_FUNCTION_ARGS)
 Datum
 g_cube_compress(PG_FUNCTION_ARGS)
 {
-   PG_RETURN_DATUM(PG_GETARG_DATUM(0));
+   GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
+   PG_RETURN_POINTER(entry);
 }

 Datum
 g_cube_decompress(PG_FUNCTION_ARGS)
 {
GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
-   NDBOX  *key = DatumGetNDBOX(PG_DETOAST_DATUM(entry-key));
-
-   if (key != DatumGetNDBOX(entry-key))
-   {
-   GISTENTRY  *retval = (GISTENTRY *) palloc(sizeof(GISTENTRY));
-
-   gistentryinit(*retval, PointerGetDatum(key),
- entry-rel, entry-page,
- entry-offset, FALSE);
-   PG_RETURN_POINTER(retval);
-   }
PG_RETURN_POINTER(entry);
 }



What did the removed code do, and why isn't it needed anymore?


Is there a prerequisite patch that hasn't been committed yet?


No.

- Heikki
diff --git a/contrib/cube/cube.c b/contrib/cube/cube.c
index dab0e6e..853acbe 100644
--- a/contrib/cube/cube.c
+++ b/contrib/cube/cube.c
@@ -144,6 +144,7 @@ bool		g_cube_internal_consistent(NDBOX *key, NDBOX *query, StrategyNumber strate
 ** Auxiliary funxtions
 */
 static double distance_1D(double a1, double a2, double b1, double b2);
+static bool	cube_is_point_internal(NDBOX *cube);
 
 
 /*
@@ -181,6 +182,7 @@ cube_a_f8_f8(PG_FUNCTION_ARGS)
 	int			i;
 	int			dim;
 	int			size;
+	bool		point;
 	double	   *dur,
 			   *dll;
 
@@ -198,16 +200,32 @@ cube_a_f8_f8(PG_FUNCTION_ARGS)
 	dur = ARRPTR(ur);
 	dll = ARRPTR(ll);
 
-	size = offsetof(NDBOX, x[0]) +sizeof(double) * 2 * dim;
+	/* Check if it's a point */
+	point = true;
+	for (i = 0; i  dim; i++)
+	{
+		if (dur[i] != dll[i])
+		{
+			point = false;
+			break;
+		}
+	}
+
+	size = point ? POINT_SIZE(dim) : CUBE_SIZE(dim);
 	result = (NDBOX *) palloc0(size);
 	SET_VARSIZE(result, size);
-	result-dim = dim;
+	SET_DIM(result, dim);
 
 	for (i = 0; i  dim; i++)
-	{
 		result-x[i] = dur[i];
-		result-x[i + dim] = dll[i];
+
+	if (!point)
+	{
+		for (i = 0; i  dim; i++)
+			result-x[i + dim] = dll[i];
 	}
+	else
+		SET_POINT_BIT(result);
 
 	PG_RETURN_NDBOX(result);
 }
@@ -234,16 +252,14 @@ cube_a_f8(PG_FUNCTION_ARGS)
 
 	dur = ARRPTR(ur);
 
-	size = offsetof(NDBOX, x[0]) +sizeof(double) * 2 * dim;
+	size = POINT_SIZE(dim);
 	result = (NDBOX *) palloc0(size);
 	SET_VARSIZE(result, size);
-	result-dim = dim;
+	SET_DIM(result, dim);
+	SET_POINT_BIT(result);
 
 	for (i = 0; i  dim; i++)
-	{
 		result-x[i] = dur[i];
-		result-x[i + dim] = dur[i];
-	}
 
 	PG_RETURN_NDBOX(result);
 }
@@ -267,14 +283,17 @@ cube_subset(PG_FUNCTION_ARGS)
 	dx = (int32 *) ARR_DATA_PTR(idx);
 
 	dim = ARRNELEMS(idx);
-	size = offsetof(NDBOX, x[0]) +sizeof(double) * 2 * dim;
+	size = IS_POINT(c) ? POINT_SIZE(dim) : CUBE_SIZE(dim);
 	result = (NDBOX *) palloc0(size);
 	SET_VARSIZE(result, size);
-	result-dim = dim;
+	SET_DIM(result, dim);
+
+	if (IS_POINT(c))
+		SET_POINT_BIT(result);
 
 	for (i = 0; i  dim; i++)
 	{
-		if ((dx[i] = 0) || (dx[i]  c-dim))
+		if ((dx[i] = 0) || (dx[i]  DIM(c)))
 		{
 			pfree(result);
 			ereport(ERROR,
@@ -282,7 +301,8 @@ cube_subset(PG_FUNCTION_ARGS)
 	 errmsg(Index out of bounds)));
 		}
 		result-x[i] = 

Re: [HACKERS] Cube extension point support // GSoC'13

2013-10-11 Thread Alexander Korotkov
On Fri, Oct 11, 2013 at 5:56 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 2. I didn't understand this change:

  @@ -422,24 +439,14 @@ g_cube_union(PG_FUNCTION_ARGS)
  Datum
  g_cube_compress(PG_FUNCTION_**ARGS)
  {
 -   PG_RETURN_DATUM(PG_GETARG_**DATUM(0));
 +   GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
 +   PG_RETURN_POINTER(entry);
  }

  Datum
  g_cube_decompress(PG_FUNCTION_**ARGS)
  {
 GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
 -   NDBOX  *key = DatumGetNDBOX(PG_DETOAST_**DATUM(entry-key));
 -
 -   if (key != DatumGetNDBOX(entry-key))
 -   {
 -   GISTENTRY  *retval = (GISTENTRY *)
 palloc(sizeof(GISTENTRY));
 -
 -   gistentryinit(*retval, PointerGetDatum(key),
 - entry-rel, entry-page,
 - entry-offset, FALSE);
 -   PG_RETURN_POINTER(retval);
 -   }
 PG_RETURN_POINTER(entry);
  }


 What did the removed code do, and why isn't it needed anymore?


I don't understand this place even more generally. What decompress function
do is to detoast key and return new gist entry with it if needed. But can
GiST key ever be toasted? My experiments show that answer is no, it can't.
When too long key is passed then GiST falls during inserting key into page.
And I didn't find any code doing GiST key toast in git.
However taking care about key toast is sequentially repeated in GiST
related code. For example, in contrib/intarray/_int.h

#define SIGLENINT  63 /* 122 = key will *toast*, so very slow!!! */

Any ideas?

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Cube extension point support // GSoC'13

2013-10-09 Thread Robert Haas
On Tue, Sep 24, 2013 at 9:07 AM, Stas Kelvich stas.kelv...@gmail.com wrote:
 Hello

 There is new version of patch. I have separated ordering operators to 
 different patch 
 (https://commitfest.postgresql.org/action/patch_view?id=1243), fixed 
 formatting issues and implemented backward compatibility with old-style 
 points in cube_is_point() and cube_out().

 Also comparing output files I've discovered that this four files is 
 combination of two types of different behavior:

 1) SELECT '-1e-700'::cube AS cube;
 can be (0) or (-0)

 2) Amount of zeros in exponent of floating point, i.e. SELECT '1e27'::cube AS 
 cube;
  can be (1e+027) or (1e+27)

 On my system (OSX) it is second option in both situations. I've also tested 
 it on FreeBSD 9.0 and Ubuntu 12.04 with the same results. So is there some 
 ideas how can I reproduce such results?

Heikki, are you going to review this further for this CommitFest?

Is there a prerequisite patch that hasn't been committed yet?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cube extension point support // GSoC'13

2013-09-24 Thread Stas Kelvich
Hello

There is new version of patch. I have separated ordering operators to different 
patch (https://commitfest.postgresql.org/action/patch_view?id=1243), fixed 
formatting issues and implemented backward compatibility with old-style points 
in cube_is_point() and cube_out().

Also comparing output files I've discovered that this four files is combination 
of two types of different behavior:

1) SELECT '-1e-700'::cube AS cube;
can be (0) or (-0)

2) Amount of zeros in exponent of floating point, i.e. SELECT '1e27'::cube AS 
cube;
 can be (1e+027) or (1e+27)

On my system (OSX) it is second option in both situations. I've also tested it 
on FreeBSD 9.0 and Ubuntu 12.04 with the same results. So is there some ideas 
how can I reproduce such results?

Stas.



points.diff
Description: Binary data


On Sep 16, 2013, at 10:48 AM, Heikki Linnakangas wrote:

 On 12.07.2013 14:57, Stas Kelvich wrote:
 Hello.
 
 here is a patch adding to cube extension support for compressed 
 representation of point cubes. If cube is a point, i.e. has coincident lower 
 left and upper right corners, than only one corner is stored. First bit of 
 the cube header indicates whether the cube is point or not. Few moments:
 
 * Patch preserves binary compatibility with old indices
 * All functions that create cubes from user input, check whether it is a 
 point or not
 * All internal functions that can return cubes takes care of all cases where 
 a cube might become a point
 
 Great!
 
 cube_is_point() needs to still handle old-style points. An NDBOX without the 
 point-flag set, where the ll and ur coordinates for each dimension are the 
 same, still needs to be considered a point. Even if you are careful to never 
 construct such structs in the code, they can be present on-disk if you have 
 upgraded from an earlier version with pg_upgrade. Same in cube_out().
 
 * Added tests for checking correct point behavior
 
 You'll need to adjust all the expected output files, not only cube_1.out.
 
 Also this patch includes adapted Alexander Korotkov's patch with kNN-based 
 ordering operator, which he wrote for postgresql-9.0beta1 with knngist 
 patch. More info there 
 http://www.postgresql.org/message-id/aanlktimhfaq6hcibrnk0tlcqmiyhywhwaq2zd87wb...@mail.gmail.com
 
 To make review easier, it would be better to keep that as a separate patch, 
 actually. Could you split it up again, please?
 
 - Heikki


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cube extension point support // GSoC'13

2013-09-16 Thread Heikki Linnakangas

On 12.07.2013 14:57, Stas Kelvich wrote:

Hello.

here is a patch adding to cube extension support for compressed representation 
of point cubes. If cube is a point, i.e. has coincident lower left and upper 
right corners, than only one corner is stored. First bit of the cube header 
indicates whether the cube is point or not. Few moments:

* Patch preserves binary compatibility with old indices
* All functions that create cubes from user input, check whether it is a point 
or not
* All internal functions that can return cubes takes care of all cases where a 
cube might become a point


Great!

cube_is_point() needs to still handle old-style points. An NDBOX without 
the point-flag set, where the ll and ur coordinates for each dimension 
are the same, still needs to be considered a point. Even if you are 
careful to never construct such structs in the code, they can be present 
on-disk if you have upgraded from an earlier version with pg_upgrade. 
Same in cube_out().



* Added tests for checking correct point behavior


You'll need to adjust all the expected output files, not only cube_1.out.


Also this patch includes adapted Alexander Korotkov's patch with kNN-based 
ordering operator, which he wrote for postgresql-9.0beta1 with knngist patch. 
More info there 
http://www.postgresql.org/message-id/aanlktimhfaq6hcibrnk0tlcqmiyhywhwaq2zd87wb...@mail.gmail.com


To make review easier, it would be better to keep that as a separate 
patch, actually. Could you split it up again, please?


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cube extension point support // GSoC'13

2013-07-17 Thread Alexander Korotkov
On Fri, Jul 12, 2013 at 3:57 PM, Stas Kelvich stas.kelv...@gmail.comwrote:

 Hello.

 here is a patch adding to cube extension support for compressed
 representation of point cubes. If cube is a point, i.e. has coincident
 lower left and upper right corners, than only one corner is stored. First
 bit of the cube header indicates whether the cube is point or not. Few
 moments:

 * Patch preserves binary compatibility with old indices


New representation of points will work in both index and heap. So, we
should speak about just compatibility with old cubes.


 * All functions that create cubes from user input, check whether it is a
 point or not
 * All internal functions that can return cubes takes care of all cases
 where a cube might become a point
 * Added tests for checking correct point behavior

 Also this patch includes adapted Alexander Korotkov's patch with kNN-based
 ordering operator, which he wrote for postgresql-9.0beta1 with knngist
 patch. More info there
 http://www.postgresql.org/message-id/aanlktimhfaq6hcibrnk0tlcqmiyhywhwaq2zd87wb...@mail.gmail.com


I think ordering operator should be extracted into separated patch together
with another ordering operators of your project.


Patch contains some formatting issues. For example, this comment

/* Point can arise in two cases:
   1) When argument is point and r == 0
   2) When all coordinates was set to their averages */

should contain star sign on the beginning of each line. Also it will be
reflowed by pgindent. Correct formatting for this comment should look like
this:

/*--
 * Point can arise in two cases:
 * 1) When argument is point and r == 0
 * 2) When all coordinates was set to their averages
 */

See coding convention for details:
http://www.postgresql.org/docs/current/static/source-format.html

--
With best regards,
Alexander Korotkov.