Re: [HACKERS] Review: GiST support for UUIDs

2016-02-05 Thread Paul Jungwirth

On 12/25/2015 03:23 AM, Ildus Kurbangaliev wrote:

On Fri, 25 Dec 2015 13:34:25 +0300
Teodor Sigaev  wrote:

First, variables (high and low) should not be declared in the middle
of code. Second, assert will fail if ua.v64[0] == ub.v64[0] and
ua.v64[1] > ub.v64[1] although it's a possible and allowed case.
Third, actually you multiply high value by 2^64 anf low by 2^-64.
Seems, it's needed to do only one multiplication.


Thank you for review. Fixed.


Just wanted to follow up on this and see about getting it added to the 
next commitfest. It looks like Iluds's latest patch 
(btree_gist_uuid_5.patch) doesn't appear on the commitfest page here:


https://commitfest.postgresql.org/7/332/

Also the last few emails of the thread don't show up, although you can 
read them here:


http://www.postgresql.org/message-id/20151225142340.46e577dd@lp

I'm not sure the next step here. Do I click "Move to next CF" in the 
"Status" dropdown? Apologies for not being familiar with the protocol. 
I'd be sad to see this work just get ignored.


Thanks,
Paul



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: GiST support for UUIDs

2015-12-25 Thread Teodor Sigaev

Thank you, but I have some notices about
static float
uuid_parts_distance(pg_uuid_t *a, pg_uuid_t *b)
{
pg_uuid_t ua, ub;
const double mp = pow(2, -64);

uuid_cnv(a, );
uuid_cnv(b, );

Assert(ua.v64[0] > ub.v64[0]);
uint64 high = ua.v64[0] - ub.v64[0];
uint64 low = ua.v64[1] - ub.v64[1];
if (low > ua.v64[1])
high--;

return (float) (ldexp(high, 64) + (double) low * mp);
}

First, variables (high and low) should not be declared in the middle of code. 
Second, assert will fail if ua.v64[0] == ub.v64[0] and
ua.v64[1] > ub.v64[1] although it's a possible and allowed case. Third, actually 
you multiply high value by 2^64 anf low by 2^-64. Seems, it's needed to do only 
one multiplication.



Ildus Kurbangaliev wrote:

On Wed, 23 Dec 2015 16:36:23 -0800
Paul Jungwirth  wrote:


On 12/23/2015 08:10 AM, Ildus Kurbangaliev wrote:

There is a more improved version of the patch. Main idea is to
present uuid as two uint64 values, and make comparisons and penalty
calculation based on these values. This approach is much faster
than using memcmp for uuid comparisons.


Thank you for picking this up! I'm sorry I was not able to work on it
the last few months. I'm very glad to see someone wrapping it up. I'm
not a reviewer, but personally it looks like a good change to me.

Happy holidays,

Paul






Thanks! The patch was almost done and ready. I attached new version of
the patch with compability changes.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: GiST support for UUIDs

2015-12-25 Thread Ildus Kurbangaliev
On Fri, 25 Dec 2015 13:34:25 +0300
Teodor Sigaev  wrote:

> Thank you, but I have some notices about
> static float
> uuid_parts_distance(pg_uuid_t *a, pg_uuid_t *b)
> {
>  pg_uuid_t ua, ub;
>  const double mp = pow(2, -64);
> 
>  uuid_cnv(a, );
>  uuid_cnv(b, );
> 
>  Assert(ua.v64[0] > ub.v64[0]);
>  uint64 high = ua.v64[0] - ub.v64[0];
>  uint64 low = ua.v64[1] - ub.v64[1];
>  if (low > ua.v64[1])
>  high--;
> 
>  return (float) (ldexp(high, 64) + (double) low * mp);
> }
> 
> First, variables (high and low) should not be declared in the middle
> of code. Second, assert will fail if ua.v64[0] == ub.v64[0] and
> ua.v64[1] > ub.v64[1] although it's a possible and allowed case.
> Third, actually you multiply high value by 2^64 anf low by 2^-64.
> Seems, it's needed to do only one multiplication.

Thank you for review. Fixed.

-- 
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/contrib/btree_gist/Makefile b/contrib/btree_gist/Makefile
index a4b2cc7..43a6c5d 100644
--- a/contrib/btree_gist/Makefile
+++ b/contrib/btree_gist/Makefile
@@ -6,16 +6,16 @@ OBJS =  btree_gist.o btree_utils_num.o btree_utils_var.o btree_int2.o \
 btree_int4.o btree_int8.o btree_float4.o btree_float8.o btree_cash.o \
 btree_oid.o btree_ts.o btree_time.o btree_date.o btree_interval.o \
 btree_macaddr.o btree_inet.o btree_text.o btree_bytea.o btree_bit.o \
-btree_numeric.o $(WIN32RES)
+btree_numeric.o btree_uuid.o $(WIN32RES)
 
 EXTENSION = btree_gist
-DATA = btree_gist--1.1.sql btree_gist--unpackaged--1.0.sql \
-	btree_gist--1.0--1.1.sql
+DATA = btree_gist--1.2.sql btree_gist--unpackaged--1.0.sql \
+	btree_gist--1.0--1.1.sql btree_gist--1.1--1.2.sql
 PGFILEDESC = "btree_gist - B-tree equivalent GiST operator classes"
 
 REGRESS = init int2 int4 int8 float4 float8 cash oid timestamp timestamptz \
 time timetz date interval macaddr inet cidr text varchar char bytea \
-bit varbit numeric not_equal
+bit varbit numeric uuid not_equal
 
 SHLIB_LINK += $(filter -lm, $(LIBS))
 
diff --git a/contrib/btree_gist/btree_gist--1.1--1.2.sql b/contrib/btree_gist/btree_gist--1.1--1.2.sql
new file mode 100644
index 000..376e884
--- /dev/null
+++ b/contrib/btree_gist/btree_gist--1.1--1.2.sql
@@ -0,0 +1,64 @@
+/* contrib/btree_gist/btree_gist--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION btree_gist UPDATE TO '1.2'" to load this file. \quit
+
+-- Add support for indexing UUID columns
+
+-- define the GiST support methods
+CREATE FUNCTION gbt_uuid_consistent(internal,uuid,int2,oid,internal)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION gbt_uuid_fetch(internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION gbt_uuid_compress(internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION gbt_uuid_penalty(internal,internal,internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION gbt_uuid_picksplit(internal, internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION gbt_uuid_union(bytea, internal)
+RETURNS gbtreekey16
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION gbt_uuid_same(internal, internal, internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+-- Create the operator class
+CREATE OPERATOR CLASS gist_uuid_ops
+DEFAULT FOR TYPE uuid USING gist
+AS
+	OPERATOR	1	<   ,
+	OPERATOR	2	<=  ,
+	OPERATOR	3	=   ,
+	OPERATOR	4	>=  ,
+	OPERATOR	5	>   ,
+	FUNCTION	1	gbt_uuid_consistent (internal, uuid, int2, oid, internal),
+	FUNCTION	2	gbt_uuid_union (bytea, internal),
+	FUNCTION	3	gbt_uuid_compress (internal),
+	FUNCTION	4	gbt_decompress (internal),
+	FUNCTION	5	gbt_uuid_penalty (internal, internal, internal),
+	FUNCTION	6	gbt_uuid_picksplit (internal, internal),
+	FUNCTION	7	gbt_uuid_same (internal, internal, internal),
+	STORAGE		gbtreekey32;
+
+ALTER OPERATOR FAMILY gist_uuid_ops USING gist ADD
+	OPERATOR	6	<>  (uuid, uuid) ,
+	FUNCTION	9 (uuid, uuid) gbt_uuid_fetch (internal) ;
diff --git a/contrib/btree_gist/btree_gist--1.1.sql b/contrib/btree_gist/btree_gist--1.1.sql
deleted file mode 100644
index cdec964..000
--- a/contrib/btree_gist/btree_gist--1.1.sql
+++ /dev/null
@@ -1,1570 +0,0 @@
-/* contrib/btree_gist/btree_gist--1.0.sql */
-
--- complain if script is sourced in psql, rather than via CREATE EXTENSION
-\echo Use "CREATE EXTENSION btree_gist" to load this file. \quit
-
-CREATE FUNCTION gbtreekey4_in(cstring)
-RETURNS gbtreekey4
-AS 'MODULE_PATHNAME', 'gbtreekey_in'
-LANGUAGE C IMMUTABLE STRICT;
-
-CREATE FUNCTION gbtreekey4_out(gbtreekey4)
-RETURNS cstring
-AS 'MODULE_PATHNAME', 'gbtreekey_out'
-LANGUAGE C IMMUTABLE STRICT;
-
-CREATE TYPE 

Re: [HACKERS] Review: GiST support for UUIDs

2015-12-24 Thread Ildus Kurbangaliev
On Wed, 23 Dec 2015 16:36:23 -0800
Paul Jungwirth  wrote:

> On 12/23/2015 08:10 AM, Ildus Kurbangaliev wrote:
> > There is a more improved version of the patch. Main idea is to
> > present uuid as two uint64 values, and make comparisons and penalty
> > calculation based on these values. This approach is much faster
> > than using memcmp for uuid comparisons.  
> 
> Thank you for picking this up! I'm sorry I was not able to work on it 
> the last few months. I'm very glad to see someone wrapping it up. I'm 
> not a reviewer, but personally it looks like a good change to me.
> 
> Happy holidays,
> 
> Paul
> 
> 
> 
> 

Thanks! The patch was almost done and ready. I attached new version of
the patch with compability changes.

-- 
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/contrib/btree_gist/Makefile b/contrib/btree_gist/Makefile
index a4b2cc7..43a6c5d 100644
--- a/contrib/btree_gist/Makefile
+++ b/contrib/btree_gist/Makefile
@@ -6,16 +6,16 @@ OBJS =  btree_gist.o btree_utils_num.o btree_utils_var.o btree_int2.o \
 btree_int4.o btree_int8.o btree_float4.o btree_float8.o btree_cash.o \
 btree_oid.o btree_ts.o btree_time.o btree_date.o btree_interval.o \
 btree_macaddr.o btree_inet.o btree_text.o btree_bytea.o btree_bit.o \
-btree_numeric.o $(WIN32RES)
+btree_numeric.o btree_uuid.o $(WIN32RES)
 
 EXTENSION = btree_gist
-DATA = btree_gist--1.1.sql btree_gist--unpackaged--1.0.sql \
-	btree_gist--1.0--1.1.sql
+DATA = btree_gist--1.2.sql btree_gist--unpackaged--1.0.sql \
+	btree_gist--1.0--1.1.sql btree_gist--1.1--1.2.sql
 PGFILEDESC = "btree_gist - B-tree equivalent GiST operator classes"
 
 REGRESS = init int2 int4 int8 float4 float8 cash oid timestamp timestamptz \
 time timetz date interval macaddr inet cidr text varchar char bytea \
-bit varbit numeric not_equal
+bit varbit numeric uuid not_equal
 
 SHLIB_LINK += $(filter -lm, $(LIBS))
 
diff --git a/contrib/btree_gist/btree_gist--1.1--1.2.sql b/contrib/btree_gist/btree_gist--1.1--1.2.sql
new file mode 100644
index 000..376e884
--- /dev/null
+++ b/contrib/btree_gist/btree_gist--1.1--1.2.sql
@@ -0,0 +1,64 @@
+/* contrib/btree_gist/btree_gist--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION btree_gist UPDATE TO '1.2'" to load this file. \quit
+
+-- Add support for indexing UUID columns
+
+-- define the GiST support methods
+CREATE FUNCTION gbt_uuid_consistent(internal,uuid,int2,oid,internal)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION gbt_uuid_fetch(internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION gbt_uuid_compress(internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION gbt_uuid_penalty(internal,internal,internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION gbt_uuid_picksplit(internal, internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION gbt_uuid_union(bytea, internal)
+RETURNS gbtreekey16
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION gbt_uuid_same(internal, internal, internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+-- Create the operator class
+CREATE OPERATOR CLASS gist_uuid_ops
+DEFAULT FOR TYPE uuid USING gist
+AS
+	OPERATOR	1	<   ,
+	OPERATOR	2	<=  ,
+	OPERATOR	3	=   ,
+	OPERATOR	4	>=  ,
+	OPERATOR	5	>   ,
+	FUNCTION	1	gbt_uuid_consistent (internal, uuid, int2, oid, internal),
+	FUNCTION	2	gbt_uuid_union (bytea, internal),
+	FUNCTION	3	gbt_uuid_compress (internal),
+	FUNCTION	4	gbt_decompress (internal),
+	FUNCTION	5	gbt_uuid_penalty (internal, internal, internal),
+	FUNCTION	6	gbt_uuid_picksplit (internal, internal),
+	FUNCTION	7	gbt_uuid_same (internal, internal, internal),
+	STORAGE		gbtreekey32;
+
+ALTER OPERATOR FAMILY gist_uuid_ops USING gist ADD
+	OPERATOR	6	<>  (uuid, uuid) ,
+	FUNCTION	9 (uuid, uuid) gbt_uuid_fetch (internal) ;
diff --git a/contrib/btree_gist/btree_gist--1.1.sql b/contrib/btree_gist/btree_gist--1.1.sql
deleted file mode 100644
index cdec964..000
--- a/contrib/btree_gist/btree_gist--1.1.sql
+++ /dev/null
@@ -1,1570 +0,0 @@
-/* contrib/btree_gist/btree_gist--1.0.sql */
-
--- complain if script is sourced in psql, rather than via CREATE EXTENSION
-\echo Use "CREATE EXTENSION btree_gist" to load this file. \quit
-
-CREATE FUNCTION gbtreekey4_in(cstring)
-RETURNS gbtreekey4
-AS 'MODULE_PATHNAME', 'gbtreekey_in'
-LANGUAGE C IMMUTABLE STRICT;
-
-CREATE FUNCTION gbtreekey4_out(gbtreekey4)
-RETURNS cstring
-AS 'MODULE_PATHNAME', 'gbtreekey_out'
-LANGUAGE C IMMUTABLE STRICT;
-
-CREATE TYPE gbtreekey4 (
-	INTERNALLENGTH = 4,
-	INPUT  = gbtreekey4_in,
-	OUTPUT = gbtreekey4_out
-);
-
-CREATE FUNCTION gbtreekey8_in(cstring)

Re: [HACKERS] Review: GiST support for UUIDs

2015-12-23 Thread Ildus Kurbangaliev
On Tue, 15 Sep 2015 09:03:00 +0300
Teodor Sigaev  wrote:

> Paul Jungwirth wrote:
> >> Or something like this in pseudocode:
> >>
> >> numeric = int8_numeric(*(uint64 *)(>data[0])) *
> >> int8_numeric(MAX_INT64) + int8_numeric(*(uint64 *)(>data[8]))  
> >
> > This is more like what I was hoping for, rather than converting to a
> > string and back. Would you mind confirming for me: int8_numeric
> > turns an int64 into an arbitrary-precision numeric Datum? So there
> > is no overflow risk here?  
> Sure, no risk. Numeric precision is limited 1000 digits with
> magnitude 1000
> 
> 
> >
> > But it looks like int8_numeric takes a *signed* integer. Isn't that
> > a problem? I suppose I could get it working though by jumping
> > through some hoops.  
> signed vs unsigned problem does not exist actually, because of
> precision of numeric is much better than we need and presence of
> numeric_abs.
> 
> 
> > Yes, but that seems like an unrealistic concern. Even "only" 2^64 is
> > 18446744073709551616 records. Or another way of thinking about it,
> > if 64  
> :) "only"
> 
> > bits (or 32) is a good enough penalty input for all the other data
> > types, why not for UUIDs? Keep in mind type 3-5 UUIDs should be
> > evenly distributed. Perhaps we could use the bottom half (instead
> > of the top) to ensure even distribution for type 1 and 2 too.  
> it must be. But UUID could be taken for unknown source and we can't 
> predict distribution. I believe pg generates them correctly, but
> other generators could be not so good.
> 
> > It seems to me that using only the top half should be okay, but if
> > you believe it's not I'll go with the int8_numeric approach (in
> > three chunks instead of two to work around signed-vs-unsigned).  
> Yes, I believe. It is not good case when we can ruin index
> performance with special set of value.
> 
> Some difficulty  which I see is how to transform numeric penalty to 
> double as it requires by GiST. May be, penalty/(INT_MAX64*INT_MAX64 + 
> INT_MAX64)? Keep in mind, that penalty is how range will be enlarged 
> after range union, so minimal penalty is 0 and maximum is 
> 0x
> 

There is a more improved version of the patch. Main idea is to present
uuid as two uint64 values, and make comparisons and penalty calculation
based on these values. This approach is much faster than using memcmp
for uuid comparisons.

-- 
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/contrib/btree_gist/Makefile b/contrib/btree_gist/Makefile
index a4b2cc7..43a6c5d 100644
--- a/contrib/btree_gist/Makefile
+++ b/contrib/btree_gist/Makefile
@@ -6,16 +6,16 @@ OBJS =  btree_gist.o btree_utils_num.o btree_utils_var.o btree_int2.o \
 btree_int4.o btree_int8.o btree_float4.o btree_float8.o btree_cash.o \
 btree_oid.o btree_ts.o btree_time.o btree_date.o btree_interval.o \
 btree_macaddr.o btree_inet.o btree_text.o btree_bytea.o btree_bit.o \
-btree_numeric.o $(WIN32RES)
+btree_numeric.o btree_uuid.o $(WIN32RES)
 
 EXTENSION = btree_gist
-DATA = btree_gist--1.1.sql btree_gist--unpackaged--1.0.sql \
-	btree_gist--1.0--1.1.sql
+DATA = btree_gist--1.2.sql btree_gist--unpackaged--1.0.sql \
+	btree_gist--1.0--1.1.sql btree_gist--1.1--1.2.sql
 PGFILEDESC = "btree_gist - B-tree equivalent GiST operator classes"
 
 REGRESS = init int2 int4 int8 float4 float8 cash oid timestamp timestamptz \
 time timetz date interval macaddr inet cidr text varchar char bytea \
-bit varbit numeric not_equal
+bit varbit numeric uuid not_equal
 
 SHLIB_LINK += $(filter -lm, $(LIBS))
 
diff --git a/contrib/btree_gist/btree_gist--1.1--1.2.sql b/contrib/btree_gist/btree_gist--1.1--1.2.sql
new file mode 100644
index 000..376e884
--- /dev/null
+++ b/contrib/btree_gist/btree_gist--1.1--1.2.sql
@@ -0,0 +1,64 @@
+/* contrib/btree_gist/btree_gist--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION btree_gist UPDATE TO '1.2'" to load this file. \quit
+
+-- Add support for indexing UUID columns
+
+-- define the GiST support methods
+CREATE FUNCTION gbt_uuid_consistent(internal,uuid,int2,oid,internal)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION gbt_uuid_fetch(internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION gbt_uuid_compress(internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION gbt_uuid_penalty(internal,internal,internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION gbt_uuid_picksplit(internal, internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION gbt_uuid_union(bytea, internal)
+RETURNS gbtreekey16
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION gbt_uuid_same(internal, 

Re: [HACKERS] Review: GiST support for UUIDs

2015-12-23 Thread Paul Jungwirth

On 12/23/2015 08:10 AM, Ildus Kurbangaliev wrote:

There is a more improved version of the patch. Main idea is to present
uuid as two uint64 values, and make comparisons and penalty calculation
based on these values. This approach is much faster than using memcmp
for uuid comparisons.


Thank you for picking this up! I'm sorry I was not able to work on it 
the last few months. I'm very glad to see someone wrapping it up. I'm 
not a reviewer, but personally it looks like a good change to me.


Happy holidays,

Paul






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: GiST support for UUIDs

2015-12-22 Thread Michael Paquier
On Tue, Sep 15, 2015 at 3:03 PM, Teodor Sigaev  wrote:
>
>
> Paul Jungwirth wrote:
>>>
>>> Or something like this in pseudocode:
>>>
>>> numeric = int8_numeric(*(uint64 *)(>data[0])) *
>>> int8_numeric(MAX_INT64) + int8_numeric(*(uint64 *)(>data[8]))
>>
>>
>> This is more like what I was hoping for, rather than converting to a
>> string and back. Would you mind confirming for me: int8_numeric turns an
>> int64 into an arbitrary-precision numeric Datum? So there is no overflow
>> risk here?
>
> Sure, no risk. Numeric precision is limited 1000 digits with magnitude 1000
>
>
>>
>> But it looks like int8_numeric takes a *signed* integer. Isn't that a
>> problem? I suppose I could get it working though by jumping through some
>> hoops.
>
> signed vs unsigned problem does not exist actually, because of precision of
> numeric is much better than we need and presence of numeric_abs.
>
>
>> Yes, but that seems like an unrealistic concern. Even "only" 2^64 is
>> 18446744073709551616 records. Or another way of thinking about it, if 64
>
> :) "only"
>
>> bits (or 32) is a good enough penalty input for all the other data
>> types, why not for UUIDs? Keep in mind type 3-5 UUIDs should be evenly
>> distributed. Perhaps we could use the bottom half (instead of the top)
>> to ensure even distribution for type 1 and 2 too.
>
> it must be. But UUID could be taken for unknown source and we can't predict
> distribution. I believe pg generates them correctly, but other generators
> could be not so good.
>
>> It seems to me that using only the top half should be okay, but if you
>> believe it's not I'll go with the int8_numeric approach (in three chunks
>> instead of two to work around signed-vs-unsigned).
>
> Yes, I believe. It is not good case when we can ruin index performance with
> special set of value.
>
> Some difficulty  which I see is how to transform numeric penalty to double
> as it requires by GiST. May be, penalty/(INT_MAX64*INT_MAX64 + INT_MAX64)?
> Keep in mind, that penalty is how range will be enlarged after range union,
> so minimal penalty is 0 and maximum is 0x

This patch has been marked as "returned with feedback" for CF 2015-11
because of the presence of a review and a certain lack of activity..
Please free to move it to next CF if you think that's more adapted.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: GiST support for UUIDs

2015-09-15 Thread Teodor Sigaev



Paul Jungwirth wrote:

Or something like this in pseudocode:

numeric = int8_numeric(*(uint64 *)(>data[0])) *
int8_numeric(MAX_INT64) + int8_numeric(*(uint64 *)(>data[8]))


This is more like what I was hoping for, rather than converting to a
string and back. Would you mind confirming for me: int8_numeric turns an
int64 into an arbitrary-precision numeric Datum? So there is no overflow
risk here?

Sure, no risk. Numeric precision is limited 1000 digits with magnitude 1000




But it looks like int8_numeric takes a *signed* integer. Isn't that a
problem? I suppose I could get it working though by jumping through some
hoops.
signed vs unsigned problem does not exist actually, because of precision 
of numeric is much better than we need and presence of numeric_abs.




Yes, but that seems like an unrealistic concern. Even "only" 2^64 is
18446744073709551616 records. Or another way of thinking about it, if 64

:) "only"


bits (or 32) is a good enough penalty input for all the other data
types, why not for UUIDs? Keep in mind type 3-5 UUIDs should be evenly
distributed. Perhaps we could use the bottom half (instead of the top)
to ensure even distribution for type 1 and 2 too.
it must be. But UUID could be taken for unknown source and we can't 
predict distribution. I believe pg generates them correctly, but other 
generators could be not so good.



It seems to me that using only the top half should be okay, but if you
believe it's not I'll go with the int8_numeric approach (in three chunks
instead of two to work around signed-vs-unsigned).
Yes, I believe. It is not good case when we can ruin index performance 
with special set of value.


Some difficulty  which I see is how to transform numeric penalty to 
double as it requires by GiST. May be, penalty/(INT_MAX64*INT_MAX64 + 
INT_MAX64)? Keep in mind, that penalty is how range will be enlarged 
after range union, so minimal penalty is 0 and maximum is 
0x


--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: GiST support for UUIDs

2015-09-14 Thread Andres Freund
Please post reviews to the original thread unless that's already humongously 
large. Otherwise somebody needs to manually link up the threads in the CF 
application.

It also makes it much harder to follow the development because there'll likely 
be a new version of the patch after a review. Which then cab either pushed in a 
thread titled review, or in an old one, without context.

Andres
--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Review: GiST support for UUIDs

2015-09-14 Thread Teodor Sigaev

http://www.postgresql.org/message-id/flat/ca+renyvephxto1c7dfbvjp1gymuc0-3qdnwpn30-noo5mpy...@mail.gmail.com#ca+renyvephxto1c7dfbvjp1gymuc0-3qdnwpn30-noo5mpy...@mail.gmail.com

Patch looks perfect but it's still needed some work.

0) rebase to current HEAD (done, in attach)
1) UUIDSIZE ->  UUID_LEN (it's defined in utils/uuid.h, done)
2)
static double
uuid2num(const pg_uuid_t *i)
{
return *((uint64 *)i);
}
   It isn't looked as correct transformation for me. May be, it's better
   to transform to numeric type (UUID looks like a 16-digit hexademical number)
   and follow  gbt_numeric_penalty() logic (or even call directly).




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/
diff --git a/contrib/btree_gist/Makefile b/contrib/btree_gist/Makefile
index a4b2cc7..43a6c5d 100644
--- a/contrib/btree_gist/Makefile
+++ b/contrib/btree_gist/Makefile
@@ -6,16 +6,16 @@ OBJS =  btree_gist.o btree_utils_num.o btree_utils_var.o 
btree_int2.o \
 btree_int4.o btree_int8.o btree_float4.o btree_float8.o btree_cash.o \
 btree_oid.o btree_ts.o btree_time.o btree_date.o btree_interval.o \
 btree_macaddr.o btree_inet.o btree_text.o btree_bytea.o btree_bit.o \
-btree_numeric.o $(WIN32RES)
+btree_numeric.o btree_uuid.o $(WIN32RES)
 
 EXTENSION = btree_gist
-DATA = btree_gist--1.1.sql btree_gist--unpackaged--1.0.sql \
-   btree_gist--1.0--1.1.sql
+DATA = btree_gist--1.2.sql btree_gist--unpackaged--1.0.sql \
+   btree_gist--1.0--1.1.sql btree_gist--1.1--1.2.sql
 PGFILEDESC = "btree_gist - B-tree equivalent GiST operator classes"
 
 REGRESS = init int2 int4 int8 float4 float8 cash oid timestamp timestamptz \
 time timetz date interval macaddr inet cidr text varchar char bytea \
-bit varbit numeric not_equal
+bit varbit numeric uuid not_equal
 
 SHLIB_LINK += $(filter -lm, $(LIBS))
 
diff --git a/contrib/btree_gist/btree_gist--1.1--1.2.sql 
b/contrib/btree_gist/btree_gist--1.1--1.2.sql
new file mode 100644
index 000..376e884
--- /dev/null
+++ b/contrib/btree_gist/btree_gist--1.1--1.2.sql
@@ -0,0 +1,64 @@
+/* contrib/btree_gist/btree_gist--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION btree_gist UPDATE TO '1.2'" to load this file. \quit
+
+-- Add support for indexing UUID columns
+
+-- define the GiST support methods
+CREATE FUNCTION gbt_uuid_consistent(internal,uuid,int2,oid,internal)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION gbt_uuid_fetch(internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION gbt_uuid_compress(internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION gbt_uuid_penalty(internal,internal,internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION gbt_uuid_picksplit(internal, internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION gbt_uuid_union(bytea, internal)
+RETURNS gbtreekey16
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION gbt_uuid_same(internal, internal, internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT;
+
+-- Create the operator class
+CREATE OPERATOR CLASS gist_uuid_ops
+DEFAULT FOR TYPE uuid USING gist
+AS
+   OPERATOR1   <   ,
+   OPERATOR2   <=  ,
+   OPERATOR3   =   ,
+   OPERATOR4   >=  ,
+   OPERATOR5   >   ,
+   FUNCTION1   gbt_uuid_consistent (internal, uuid, int2, oid, 
internal),
+   FUNCTION2   gbt_uuid_union (bytea, internal),
+   FUNCTION3   gbt_uuid_compress (internal),
+   FUNCTION4   gbt_decompress (internal),
+   FUNCTION5   gbt_uuid_penalty (internal, internal, internal),
+   FUNCTION6   gbt_uuid_picksplit (internal, internal),
+   FUNCTION7   gbt_uuid_same (internal, internal, internal),
+   STORAGE gbtreekey32;
+
+ALTER OPERATOR FAMILY gist_uuid_ops USING gist ADD
+   OPERATOR6   <>  (uuid, uuid) ,
+   FUNCTION9 (uuid, uuid) gbt_uuid_fetch (internal) ;
diff --git a/contrib/btree_gist/btree_gist--1.1.sql 
b/contrib/btree_gist/btree_gist--1.1.sql
deleted file mode 100644
index cdec964..000
--- a/contrib/btree_gist/btree_gist--1.1.sql
+++ /dev/null
@@ -1,1570 +0,0 @@
-/* contrib/btree_gist/btree_gist--1.0.sql */
-
--- complain if script is sourced in psql, rather than via CREATE EXTENSION
-\echo Use "CREATE EXTENSION btree_gist" to load this file. \quit
-
-CREATE FUNCTION gbtreekey4_in(cstring)
-RETURNS gbtreekey4
-AS 'MODULE_PATHNAME', 'gbtreekey_in'
-LANGUAGE C IMMUTABLE STRICT;
-
-CREATE FUNCTION 

Re: [HACKERS] Review: GiST support for UUIDs

2015-09-14 Thread Teodor Sigaev



Andres Freund wrote:

Please post reviews to the original thread unless that's already humongously 
large.

I've lost original mail. Sorry.


Otherwise somebody needs to manually link up the threads in the CF application.


Of course, I did it



It also makes it much harder to follow the development because there'll likely 
be a new version of the patch
after a review. Which then cab either pushed in a thread titled review, or in 
an old one, without context.


Sorry, I tried to fix that as possible.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: GiST support for UUIDs

2015-09-14 Thread Teodor Sigaev



Paul Jungwirth wrote:

2)
 static double
 uuid2num(const pg_uuid_t *i)
 {
 return *((uint64 *)i);
 }
It isn't looked as correct transformation for me. May be, it's better
to transform to numeric type (UUID looks like a 16-digit hexademical
number)
and follow  gbt_numeric_penalty() logic (or even call directly).


Thanks for the review! A UUID is actually not stored as a string of
hexadecimal digits. (It is normally displayed that way, but with 32
digits, not 16.) Rather it is stored as an unstructured 128-bit value
(which in C is 16 unsigned chars). Here is the easy-to-misread
declaration from src/backend/utils/adt/uuid.c:
Missed number of digit, but nevertheless it doesn't matter for idea. 
Original coding uses only 8 bytes from 16 to compute penalty which could 
cause a problem with index performance. Simple way is just printing each 
4bits  with %02d modifier into string and then make a numeric value with 
a help of numeric_in.


Or something like this in pseudocode:

numeric = int8_numeric(*(uint64 *)(>data[0])) * 
int8_numeric(MAX_INT64) + int8_numeric(*(uint64 *)(>data[8]))



The only other 128-bit type I found in btree_gist was Interval. For that
type we convert to a double using INTERVAL_TO_SEC, then call
penalty_num. By my read that accepts a similar loss of precision.
Right, but precision of double  is enough to represent 1 century 
interval with 0.1 seconds accuracy which is enough for  practical 
usage. In UUID case you will take into account only half of value. Of 
course, GiST will work even with penalty function returning constant but 
each scan could become full-index-scan.




If I'm mistaken about 128-bit integer support, let me know, and maybe we
can do the penalty computation on the whole UUID. Or maybe I should just
convert the uint64 to a double before calling penalty_num? I don't
completely understand what the penalty calculation is all about, so I
welcome suggestions here.


Penalty method calculates how union key will be enlarged if insert will 
be produced in current subtree. It directly affects selectivity of subtree.


--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: GiST support for UUIDs

2015-09-14 Thread Paul Jungwirth

Or something like this in pseudocode:

numeric = int8_numeric(*(uint64 *)(>data[0])) *
int8_numeric(MAX_INT64) + int8_numeric(*(uint64 *)(>data[8]))


This is more like what I was hoping for, rather than converting to a 
string and back. Would you mind confirming for me: int8_numeric turns an 
int64 into an arbitrary-precision numeric Datum? So there is no overflow 
risk here?


But it looks like int8_numeric takes a *signed* integer. Isn't that a 
problem? I suppose I could get it working though by jumping through some 
hoops.



In UUID case you will take into account only half of value. Of
course, GiST will work even with penalty function returning constant but
each scan could become full-index-scan.


Yes, but that seems like an unrealistic concern. Even "only" 2^64 is 
18446744073709551616 records. Or another way of thinking about it, if 64 
bits (or 32) is a good enough penalty input for all the other data 
types, why not for UUIDs? Keep in mind type 3-5 UUIDs should be evenly 
distributed. Perhaps we could use the bottom half (instead of the top) 
to ensure even distribution for type 1 and 2 too.


It seems to me that using only the top half should be okay, but if you 
believe it's not I'll go with the int8_numeric approach (in three chunks 
instead of two to work around signed-vs-unsigned).


Thanks,
Paul



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: GiST support for UUIDs

2015-09-14 Thread Paul Jungwirth

2)
 static double
 uuid2num(const pg_uuid_t *i)
 {
 return *((uint64 *)i);
 }
It isn't looked as correct transformation for me. May be, it's better
to transform to numeric type (UUID looks like a 16-digit hexademical
number)
and follow  gbt_numeric_penalty() logic (or even call directly).


Thanks for the review! A UUID is actually not stored as a string of 
hexadecimal digits. (It is normally displayed that way, but with 32 
digits, not 16.) Rather it is stored as an unstructured 128-bit value 
(which in C is 16 unsigned chars). Here is the easy-to-misread 
declaration from src/backend/utils/adt/uuid.c:


#define UUID_LEN 16
struct pg_uuid_t
{
unsigned char data[UUID_LEN];
};

I would love to just cast this to a 128-bit unsigned int. But it looks 
like Postgres doesn't always have 128-bit ints available, so my code 
takes the top half and uses that for penalty calculations. It seemed to 
me that was "good enough" for this purpose.


The only other 128-bit type I found in btree_gist was Interval. For that 
type we convert to a double using INTERVAL_TO_SEC, then call 
penalty_num. By my read that accepts a similar loss of precision.


If I'm mistaken about 128-bit integer support, let me know, and maybe we 
can do the penalty computation on the whole UUID. Or maybe I should just 
convert the uint64 to a double before calling penalty_num? I don't 
completely understand what the penalty calculation is all about, so I 
welcome suggestions here.


Thanks again,
Paul











--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers