Re: [HACKERS] Selectivity estimation for intarray with @@

2015-07-21 Thread Heikki Linnakangas

On 07/21/2015 03:44 PM, Alexander Korotkov wrote:

While Uriy is on vacation, I've revised this patch a bit.


I whacked this around quite a bit, and I think it's in a committable 
state now. But if you could run whatever tests you were using before on 
this, to make sure it still produces the same estimates, that would be 
great. I didn't change the estimates it should produce, only the code 
structure.


One thing that bothers me slightly with this patch is the way it peeks 
into the Most-Common-Elements arrays, which is produced by the built-in 
type analyze function. If we ever change what statistics are collected 
for arrays, or the way they are stored, this might break. In matchsel, 
why don't we just call the built-in estimator function for each element 
that we need to probe, and not look into the statistics ourselves at 
all? I actually experimented with that, and it did slash much of the 
code, and it would be more future-proof. However, it was also a lot 
slower for queries that contain multiple values. That's understandable: 
the built-in estimator will fetch the statistics tuple, parse the 
arrays, etc. separately for each value in the query_int, while this 
patch will do it only once for the whole query, and perform a simple 
binary search for each value. So overall, I think this is OK as it is. 
But if we find that we need to use the MCE list in this fashion in more 
places in the future, it might be worthwhile to add some support code 
for this in the backend to allow extracting the stats once, and doing 
multiple lightweight estimations using the extracted stats.


Some things I fixed/changed:

* I didn't like that transformOperator() function, which looked up the 
function's name. I replaced it with separate wrapper functions for each 
operator, so that the built-in operator's OID can be hardcoded into each.


* I refactored the matchsel function heavily. I think it's more readable 
now.


* I got rid of the Int4Freq array. It didn't seem significantly easier 
to work with than the separate values/numbers arrays, so I just used 
those directly.


* Also use the matchsel estimator for ~~ (the commutator of @@)

* Also use the estimators for the obsolete @ and ~ operators. Not that I 
care much about those as they are obsolete, but seems strange not to, as 
it's a trivial matter of setting the right estimator function.


* I added an ANALYZE in the regression test. It still won't 
systematically test all the cost estimation code, and there's nothing to 
check that the estimates make sense, but at least more of the code will 
now run.


- Heikki

diff --git a/contrib/intarray/Makefile b/contrib/intarray/Makefile
index 920c5b1..5ea7f2a 100644
--- a/contrib/intarray/Makefile
+++ b/contrib/intarray/Makefile
@@ -2,10 +2,10 @@
 
 MODULE_big = _int
 OBJS = _int_bool.o _int_gist.o _int_op.o _int_tool.o \
-	_intbig_gist.o _int_gin.o $(WIN32RES)
+	_intbig_gist.o _int_gin.o _int_selfuncs.o $(WIN32RES)
 
 EXTENSION = intarray
-DATA = intarray--1.0.sql intarray--unpackaged--1.0.sql
+DATA = intarray--1.1.sql intarray--1.0--1.1.sql intarray--unpackaged--1.0.sql
 PGFILEDESC = intarray - functions and operators for arrays of integers
 
 REGRESS = _int
diff --git a/contrib/intarray/_int_selfuncs.c b/contrib/intarray/_int_selfuncs.c
new file mode 100644
index 000..4896461
--- /dev/null
+++ b/contrib/intarray/_int_selfuncs.c
@@ -0,0 +1,334 @@
+/*-
+ *
+ * _int_selfuncs.c
+ *	  Functions for selectivity estimation of intarray operators
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  contrib/intarray/_int_selfuncs.c
+ *
+ *-
+ */
+#include postgres.h
+#include _int.h
+
+#include access/htup_details.h
+#include catalog/pg_operator.h
+#include catalog/pg_statistic.h
+#include catalog/pg_type.h
+#include utils/selfuncs.h
+#include utils/syscache.h
+#include utils/lsyscache.h
+#include miscadmin.h
+
+PG_FUNCTION_INFO_V1(_int_overlap_sel);
+PG_FUNCTION_INFO_V1(_int_contains_sel);
+PG_FUNCTION_INFO_V1(_int_contained_sel);
+PG_FUNCTION_INFO_V1(_int_overlap_joinsel);
+PG_FUNCTION_INFO_V1(_int_contains_joinsel);
+PG_FUNCTION_INFO_V1(_int_contained_joinsel);
+PG_FUNCTION_INFO_V1(_int_matchsel);
+
+Datum		_int_overlap_sel(PG_FUNCTION_ARGS);
+Datum		_int_contains_sel(PG_FUNCTION_ARGS);
+Datum		_int_contained_sel(PG_FUNCTION_ARGS);
+Datum		_int_overlap_joinsel(PG_FUNCTION_ARGS);
+Datum		_int_contains_joinsel(PG_FUNCTION_ARGS);
+Datum		_int_contained_joinsel(PG_FUNCTION_ARGS);
+Datum		_int_matchsel(PG_FUNCTION_ARGS);
+
+
+static Selectivity int_query_opr_selec(ITEM *item, Datum *values, float4 *freqs,
+	int nmncelems, float4 minfreq);
+static int	compare_val_int4(const void *a, const void *b);
+
+/*
+ * Wrappers around the default array 

Re: [HACKERS] Selectivity estimation for intarray with @@

2015-07-21 Thread Alexander Korotkov
On Tue, Jul 21, 2015 at 6:49 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

 On 07/21/2015 03:44 PM, Alexander Korotkov wrote:

 While Uriy is on vacation, I've revised this patch a bit.


 I whacked this around quite a bit, and I think it's in a committable state
 now. But if you could run whatever tests you were using before on this, to
 make sure it still produces the same estimates, that would be great. I
 didn't change the estimates it should produce, only the code structure.

 One thing that bothers me slightly with this patch is the way it peeks
 into the Most-Common-Elements arrays, which is produced by the built-in
 type analyze function. If we ever change what statistics are collected for
 arrays, or the way they are stored, this might break. In matchsel, why
 don't we just call the built-in estimator function for each element that we
 need to probe, and not look into the statistics ourselves at all? I
 actually experimented with that, and it did slash much of the code, and it
 would be more future-proof. However, it was also a lot slower for queries
 that contain multiple values. That's understandable: the built-in estimator
 will fetch the statistics tuple, parse the arrays, etc. separately for each
 value in the query_int, while this patch will do it only once for the whole
 query, and perform a simple binary search for each value. So overall, I
 think this is OK as it is. But if we find that we need to use the MCE list
 in this fashion in more places in the future, it might be worthwhile to add
 some support code for this in the backend to allow extracting the stats
 once, and doing multiple lightweight estimations using the extracted
 stats.


Yeah, I see. We could end up with something like this. But probably we
would need something more general for extensions which wants to play with
statistics.
For instance, pg_trgm could estimate selectivity for text % text
operator. But in order to provide that it needs trigram statistics. Now it
could be implemented by defining separate datatype, but it's a kluge.
Probably, we would end up with custom additional statistics for datatypes.


 Some things I fixed/changed:

 * I didn't like that transformOperator() function, which looked up the
 function's name. I replaced it with separate wrapper functions for each
 operator, so that the built-in operator's OID can be hardcoded into each.

 * I refactored the matchsel function heavily. I think it's more readable
 now.

 * I got rid of the Int4Freq array. It didn't seem significantly easier to
 work with than the separate values/numbers arrays, so I just used those
 directly.

 * Also use the matchsel estimator for ~~ (the commutator of @@)


In this version of patch it's not checked if variable is actually and int[]
not query_int. See following test case.

# create table test2 as (select '1'::query_int val from
generate_series(1,100));
# analyze test2;

# explain select * from test2 where '{1}'::int[] @@ val;
ERROR:  unrecognized int query item type: 0

I've added this check.

* Also use the estimators for the obsolete @ and ~ operators. Not that I
 care much about those as they are obsolete, but seems strange not to, as
 it's a trivial matter of setting the right estimator function.

 * I added an ANALYZE in the regression test. It still won't systematically
 test all the cost estimation code, and there's nothing to check that the
 estimates make sense, but at least more of the code will now run.


You also forgot to include intarray--1.0--1.1.sql into patch. I've also
added it.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


intarray-sel-4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Selectivity estimation for intarray with @@

2015-07-21 Thread Alexander Korotkov
On Tue, Jul 21, 2015 at 5:14 PM, Teodor Sigaev teo...@sigaev.ru wrote:

 Some notices about code:

 1 Near function transformOperator() there is wrong operator name @


Fixed.


 2 int_query (and intquery) should be replaced to query_int to be
 consistent with actual type name. At least where it's used as separate
 lexeme.


Fixed.


 3 For historical reasons @@ doesn't commutate with itself, it has a
 commutator ~~. Patch assumes that @@ is self-commutator, but ~~ will use
 'contsel' as a restrict estimation. Suppose, we need to declare ~~ as
 deprecated and introduce commutating @@ operator.


I think deprecating ~~ is a subject of separate patch. I fixed patch
behavior in the different way. @@ and ~~ are now handled by the same
function. The function handles var @@ const and const ~~ var, but
doesn't handle const @@ var and var  ~~ const. It determines the case
by type of variable: it should be int[].

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


intarray_sel-3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Selectivity estimation for intarray with @@

2015-07-21 Thread Heikki Linnakangas

On 07/21/2015 07:28 PM, Alexander Korotkov wrote:

On Tue, Jul 21, 2015 at 6:49 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

In this version of patch it's not checked if variable is actually and int[]
not query_int. See following test case.

# create table test2 as (select '1'::query_int val from
generate_series(1,100));
# analyze test2;

# explain select * from test2 where '{1}'::int[] @@ val;
ERROR:  unrecognized int query item type: 0

I've added this check.

* Also use the estimators for the obsolete @ and ~ operators. Not that I

care much about those as they are obsolete, but seems strange not to, as
it's a trivial matter of setting the right estimator function.

* I added an ANALYZE in the regression test. It still won't systematically
test all the cost estimation code, and there's nothing to check that the
estimates make sense, but at least more of the code will now run.


You also forgot to include intarray--1.0--1.1.sql into patch. I've also
added it.


Thanks, committed!

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Selectivity estimation for intarray with @@

2015-07-21 Thread Alexander Korotkov
Hi!

While Uriy is on vacation, I've revised this patch a bit.

On Tue, Jul 14, 2015 at 8:55 PM, Jeff Janes jeff.ja...@gmail.com wrote:

 Hi Uriy,

 This patch looks pretty good.

 The first line of intarray--1.1.sql mis-identifies itself as /*
 contrib/intarray/intarray--1.0.sql */


Fixed.


 The real file intarray--1.0.sql file probably should not be included in
 the final patch, but I like having it for testing.


I've removed intarray--1.0.sql since it shouldn't be in final commit.


 It applies and builds cleanly over the alter operator patch (and now the
 commit as well), passes make check of the contrib module both with and
 without cassert.

 I could succesfully upgrade from version 1.0 to 1.1 without having to drop
 the gin__int_ops indexes in the process.

 I could do pg_upgrade from 9.2 and 9.4 to 9.6devel with large indexes in
 place, and then upgrade the extension to 1.1, and it worked without having
 to rebuild the index.

 It does what it says, and I think we want this.


Good.


 There were some cases where the estimates were not very good, but that
 seems to be limitation of what pg_stats makes available, not of this
 patch.  Specifically if the elements listed in the query text are not part
 of most_common_elems (or worse yet, most_common_elems is null) then it is
 left to guess with no real data, and those guesses can be pretty bad.  It
 is not this patches job to fix that, however.



 It assumes all the stats are independent and so doesn't account for
 correlation between members.  This is also how the core selectivity
 estimates work between columns, so I can't really complain about that.  It
 is easy to trick it with things like @@ '(!300  300)'::query_int, but I
 don't think that is necessary to fix that.


Analysis of all the dependencies inside query is NP-complete task. We could
try to workout simple cases, but postgres optimizer currently doesn't care
about it.

# explain select * from test where val = 'val' and val != 'val';
 QUERY PLAN
-
 Seq Scan on test  (cost=0.00..39831.00 rows=45 width=8)
   Filter: ((val  'val'::text) AND (val = 'val'::text))
(2 rows)

I think we could do the same inside intquery until we figure out some
better solution for postgres optimizer in general.


 I have not been creative enough to come up with queries for which this
 improvement in selectivity estimate causes the execution plan to change in
 important ways.  I'm sure the serious users of this module would have such
 cases, however.

 I haven't tested gist__int_ops as I can't get those indexes to build in a
 feasible amount of time.  But the selectivity estimates are independent of
 any actual index, so testing those doesn't seem to be critical.

 There is no documentation change, which makes sense as this internal stuff
 which isn't documented to start with.


Yes. For instance, tsquery make very similar selectivity estimation as
intquery, but it's assumed to be internal and isn't mentioned in
documentation.

There are no regression test changes.  Not sure about that, it does seem
 like regression tests would be desirable.


It would be nice to cover selectivity estimation with regression tests, but
AFAICS we didn't do it for any selectivity estimation functions yet.
Problem is that selectivity estimation is quite complex process and its
result depending on random sampling during analyze, floating points
operations and so on. We could make a test like with very high level of
confidence, estimate number of rows here should be in [10;100]. But it's
hard to fit such assumptions into our current regression tests
infrastructure.

I haven't gone through the C code.


I also did some coding style and comments modifications.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


intarray_sel-2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Selectivity estimation for intarray with @@

2015-07-21 Thread Teodor Sigaev

Some notices about code:

1 Near function transformOperator() there is wrong operator name @
2 int_query (and intquery) should be replaced to query_int to be consistent with 
actual type name. At least where it's used as separate lexeme.
3 For historical reasons @@ doesn't commutate with itself, it has a commutator 
~~. Patch assumes that @@ is self-commutator, but ~~ will use  'contsel' as a 
restrict estimation. Suppose, we need to declare ~~ as deprecated and introduce 
commutating @@ operator.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Selectivity estimation for intarray with @@

2015-07-14 Thread Jeff Janes
On Tue, May 26, 2015 at 4:58 AM, Uriy Zhuravlev u.zhurav...@postgrespro.ru
wrote:

 Hello.

 Attached patch based on:

 http://www.postgresql.org/message-id/capphfdssy+qepdcovxx-b4lp3ybr+qs04m6-arggknfk3fr...@mail.gmail.com

 and adds selectivity estimation functions to @@ (port from tsquery). Now we
 support , @, @ and @@.
 In addition it was written migration to version 1.1 intarray. Because of
 what
 this patch requires my other patch:
 http://www.postgresql.org/message-id/14346041.DNcb5Y1inS@dinodell

 Alexander Korotkov know about this patch.



Hi Uriy,

This patch looks pretty good.

The first line of intarray--1.1.sql mis-identifies itself as /*
contrib/intarray/intarray--1.0.sql */

The real file intarray--1.0.sql file probably should not be included in the
final patch, but I like having it for testing.

It applies and builds cleanly over the alter operator patch (and now the
commit as well), passes make check of the contrib module both with and
without cassert.

I could succesfully upgrade from version 1.0 to 1.1 without having to drop
the gin__int_ops indexes in the process.

I could do pg_upgrade from 9.2 and 9.4 to 9.6devel with large indexes in
place, and then upgrade the extension to 1.1, and it worked without having
to rebuild the index.

It does what it says, and I think we want this.

There were some cases where the estimates were not very good, but that
seems to be limitation of what pg_stats makes available, not of this
patch.  Specifically if the elements listed in the query text are not part
of most_common_elems (or worse yet, most_common_elems is null) then it is
left to guess with no real data, and those guesses can be pretty bad.  It
is not this patches job to fix that, however.

It assumes all the stats are independent and so doesn't account for
correlation between members.  This is also how the core selectivity
estimates work between columns, so I can't really complain about that.  It
is easy to trick it with things like @@ '(!300  300)'::query_int, but I
don't think that is necessary to fix that.

I have not been creative enough to come up with queries for which this
improvement in selectivity estimate causes the execution plan to change in
important ways.  I'm sure the serious users of this module would have such
cases, however.

I haven't tested gist__int_ops as I can't get those indexes to build in a
feasible amount of time.  But the selectivity estimates are independent of
any actual index, so testing those doesn't seem to be critical.

There is no documentation change, which makes sense as this internal stuff
which isn't documented to start with.

There are no regression test changes.  Not sure about that, it does seem
like regression tests would be desirable.

I haven't gone through the C code.

Cheers,

Jeff


[HACKERS] Selectivity estimation for intarray with @@

2015-05-26 Thread Uriy Zhuravlev
Hello.

Attached patch based on:
http://www.postgresql.org/message-id/capphfdssy+qepdcovxx-b4lp3ybr+qs04m6-arggknfk3fr...@mail.gmail.com

and adds selectivity estimation functions to @@ (port from tsquery). Now we 
support , @, @ and @@.
In addition it was written migration to version 1.1 intarray. Because of what 
this patch requires my other patch:
http://www.postgresql.org/message-id/14346041.DNcb5Y1inS@dinodell

Alexander Korotkov know about this patch.

Thanks. 

-- 
Uriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companydiff --git a/contrib/intarray/Makefile b/contrib/intarray/Makefile
index 920c5b1..16c829c 100644
--- a/contrib/intarray/Makefile
+++ b/contrib/intarray/Makefile
@@ -2,10 +2,10 @@
 
 MODULE_big = _int
 OBJS = _int_bool.o _int_gist.o _int_op.o _int_tool.o \
-	_intbig_gist.o _int_gin.o $(WIN32RES)
+	_intbig_gist.o _int_gin.o _int_selfuncs.o $(WIN32RES)
 
 EXTENSION = intarray
-DATA = intarray--1.0.sql intarray--unpackaged--1.0.sql
+DATA = intarray--1.0.sql intarray--1.1.sql intarray--1.0--1.1.sql intarray--unpackaged--1.0.sql
 PGFILEDESC = intarray - functions and operators for arrays of integers
 
 REGRESS = _int
diff --git a/contrib/intarray/_int.h b/contrib/intarray/_int.h
index d524f0f..739c3c0 100644
--- a/contrib/intarray/_int.h
+++ b/contrib/intarray/_int.h
@@ -140,6 +140,7 @@ typedef struct QUERYTYPE
 #define COMPUTESIZE(size)	( HDRSIZEQT + (size) * sizeof(ITEM) )
 #define QUERYTYPEMAXITEMS	((MaxAllocSize - HDRSIZEQT) / sizeof(ITEM))
 #define GETQUERY(x)  ( (x)-items )
+#define GETRQUERY(x)  ( (x)-items + ((x)-size - 1) )
 
 /* type codes for ITEM */
 #define END		0
diff --git a/contrib/intarray/_int_selfuncs.c b/contrib/intarray/_int_selfuncs.c
new file mode 100644
index 000..fd80668
--- /dev/null
+++ b/contrib/intarray/_int_selfuncs.c
@@ -0,0 +1,346 @@
+/*-
+ *
+ * _int_selfuncs.c
+ *	  Functions for selectivity estimation of intarray operators
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  contrib/intarray/_int_selfuncs.c
+ *
+ *-
+ */
+#include postgres.h
+#include _int.h
+
+#include access/htup_details.h
+#include catalog/pg_operator.h
+#include catalog/pg_statistic.h
+#include catalog/pg_type.h
+#include utils/selfuncs.h
+#include utils/syscache.h
+#include utils/lsyscache.h
+#include miscadmin.h
+
+PG_FUNCTION_INFO_V1(_int_contsel);
+PG_FUNCTION_INFO_V1(_int_contjoinsel);
+PG_FUNCTION_INFO_V1(_int_matchsel);
+
+Datum _int_contsel(PG_FUNCTION_ARGS);
+Datum _int_contjoinsel(PG_FUNCTION_ARGS);
+Datum _int_matchsel(PG_FUNCTION_ARGS);
+
+/* lookup table type for binary searching through MCELEMs */
+typedef struct
+{
+	int32	   element;
+	float4		frequency;
+} Int4Freq;
+
+
+static Oid transformOperator(Oid oprOid);
+static Selectivity int_querysel(VariableStatData *vardata, Datum constval);
+static Selectivity int_query_opr_selec(ITEM *item, Int4Freq *lookup,
+	   int length, float4 minfreq);
+static Selectivity mcelem_int_query_selec(QUERYTYPE *query,
+	 Datum *mcelem, int nmcelem,
+	 float4 *numbers, int nnumbers);
+
+static int
+compare_val_int4freq(const void *a, const void *b);
+
+#define int_query_opr_selec_no_stats(query) \
+	int_query_opr_selec(GETRQUERY(query), NULL, 0, 0)
+
+
+
+static Oid
+transformOperator(Oid oprOid)
+{
+	HeapTuple			tup;
+	Form_pg_operator	op;
+	Oid	result = InvalidOid;
+
+	tup = SearchSysCache1(OPEROID, ObjectIdGetDatum(oprOid));
+	if (!HeapTupleIsValid(tup))
+		elog(ERROR, Invalid operator: %u, oprOid);
+
+	op = (Form_pg_operator) GETSTRUCT(tup);
+
+	if (!strcmp(op-oprname.data, ))
+		result = OID_ARRAY_OVERLAP_OP;
+	else if (!strcmp(op-oprname.data, @))
+		result = OID_ARRAY_CONTAINS_OP;
+	else if (!strcmp(op-oprname.data, @))
+		result = OID_ARRAY_CONTAINED_OP;
+
+	ReleaseSysCache(tup);
+
+	if (!OidIsValid(result))
+		elog(ERROR, Invalid operator: %u, oprOid);
+
+	return result;
+}
+
+/*
+ * _int_contsel -- restriction selectivity for array @, , @ operators
+ */
+Datum
+_int_contsel(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_DATUM(DirectFunctionCall4(arraycontsel,
+			PG_GETARG_DATUM(0),
+			ObjectIdGetDatum(transformOperator(PG_GETARG_OID(1))),
+			PG_GETARG_DATUM(2),
+			PG_GETARG_DATUM(3)));
+}
+
+/*
+ * _int_contjoinsel -- join selectivity for array @, , @ operators
+ */
+Datum
+_int_contjoinsel(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_DATUM(DirectFunctionCall5(arraycontjoinsel,
+			PG_GETARG_DATUM(0),
+			ObjectIdGetDatum(transformOperator(PG_GETARG_OID(1))),
+			PG_GETARG_DATUM(2),
+			PG_GETARG_DATUM(3),
+			PG_GETARG_DATUM(4)));
+}
+
+
+/*
+ * _int_matchsel -- restriction selectivity function for intarray @@ int_query
+ */
+Datum
+_int_matchsel(PG_FUNCTION_ARGS)
+{
+	PlannerInfo *root = (PlannerInfo *) 

[HACKERS] Selectivity estimation for intarray

2015-04-29 Thread Alexander Korotkov
Hackers,

currently built-in , @, @ array operators have selectivity estimations
while same operator in intarray contrib haven't them. Problem is that
operators in intarray contrib still use contsel and contjoinsel functions
for selectivity estimation even when built-in operators receive their
specific selectivity estimations.

Attached patch adds selectivity estimation functions to  , @, @
operators in intarray contrib. In order to have less code duplication they
are just wrappers over built-in selectivity estimation functions.

However, I faced a problem of migration scripts. Currently, ALTER OPERATOR
can only change owner and schema of operator not operator parameters
themselves. Change pg_operator directly is also not an option. At first, it
would be kludge. At second, in order to correctly find corresponding
operator in pg_operator migration script need to know schema where
extension is installed. But it can't refer @extschema@ because extension is
relocatable.

My proposal is to let ALTER OPERATOR change restrict and join selectivity
functions of the operator. Also it would be useful to be able to change
commutator and negator of operator: extension could add commutators and
negators in further versions. Any thoughts?

--
With best regards,
Alexander Korotkov.


intarray-sel-1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Selectivity estimation for intarray

2015-04-29 Thread Oleg Bartunov
Any chance to have this patch in 9.5 ? Many intarray users will be happy.

On Wed, Apr 29, 2015 at 1:48 PM, Alexander Korotkov aekorot...@gmail.com
wrote:

 Hackers,

 currently built-in , @, @ array operators have selectivity estimations
 while same operator in intarray contrib haven't them. Problem is that
 operators in intarray contrib still use contsel and contjoinsel functions
 for selectivity estimation even when built-in operators receive their
 specific selectivity estimations.

 Attached patch adds selectivity estimation functions to  , @, @
 operators in intarray contrib. In order to have less code duplication they
 are just wrappers over built-in selectivity estimation functions.

 However, I faced a problem of migration scripts. Currently, ALTER OPERATOR
 can only change owner and schema of operator not operator parameters
 themselves. Change pg_operator directly is also not an option. At first, it
 would be kludge. At second, in order to correctly find corresponding
 operator in pg_operator migration script need to know schema where
 extension is installed. But it can't refer @extschema@ because extension
 is relocatable.

 My proposal is to let ALTER OPERATOR change restrict and join selectivity
 functions of the operator. Also it would be useful to be able to change
 commutator and negator of operator: extension could add commutators and
 negators in further versions. Any thoughts?

 --
 With best regards,
 Alexander Korotkov.


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




Re: [HACKERS] Selectivity estimation for intarray

2015-04-29 Thread Tom Lane
Oleg Bartunov obartu...@gmail.com writes:
 Any chance to have this patch in 9.5 ? Many intarray users will be happy.

Considering how desperately behind we are on reviewing/committing patches
that were submitted by the deadline, I don't think it would be appropriate
or fair to add on major new patches that came in months late.  Please add
this to the first 9.6 commitfest, instead.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Selectivity estimation for intarray

2015-04-29 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Alexander Korotkov aekorot...@gmail.com writes:
  My proposal is to let ALTER OPERATOR change restrict and join selectivity
  functions of the operator. Also it would be useful to be able to change
  commutator and negator of operator: extension could add commutators and
  negators in further versions. Any thoughts?
 
 I'm pretty dubious about this, because we lack any mechanism for undoing
 parser/planner decisions based on operator properties.  And there's quite
 a lot of stuff that is based on the assumption that operator properties
 will never change.
 
 An example of the pitfalls here is that we can never allow ALTER OPERATOR
 RENAME, because for example if you rename '' to '~~' that will change
 its precedence, and we have no way to fix the parse trees embedded in
 stored views to reflect that.
 
 For the specific cases you mention, perhaps it would be all right if we
 taught plancache.c to blow away *all* cached plans upon seeing any change
 in pg_operator; but that seems like a brute-force solution.

Agreed that it is- but is that really a problem...?  I've not run into
many (any?) systems where pg_operator is getting changed often...  The
worst part would be adding new operators/extensions, but perhaps we
could exclude that specific case from triggering the cache invalidation?

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Selectivity estimation for intarray

2015-04-29 Thread Tom Lane
Alexander Korotkov aekorot...@gmail.com writes:
 My proposal is to let ALTER OPERATOR change restrict and join selectivity
 functions of the operator. Also it would be useful to be able to change
 commutator and negator of operator: extension could add commutators and
 negators in further versions. Any thoughts?

I'm pretty dubious about this, because we lack any mechanism for undoing
parser/planner decisions based on operator properties.  And there's quite
a lot of stuff that is based on the assumption that operator properties
will never change.

An example of the pitfalls here is that we can never allow ALTER OPERATOR
RENAME, because for example if you rename '' to '~~' that will change
its precedence, and we have no way to fix the parse trees embedded in
stored views to reflect that.

For the specific cases you mention, perhaps it would be all right if we
taught plancache.c to blow away *all* cached plans upon seeing any change
in pg_operator; but that seems like a brute-force solution.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Selectivity estimation for intarray

2015-04-29 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
 For the specific cases you mention, perhaps it would be all right if we
 taught plancache.c to blow away *all* cached plans upon seeing any change
 in pg_operator; but that seems like a brute-force solution.

 Agreed that it is- but is that really a problem...?

Perhaps it isn't; we certainly have assumptions that pg_amop, for
instance, changes seldom enough that it's not worth tracking individual
changes.  The same might be true of pg_operator.  I'm not sure though.

The core point I'm trying to make is that making pg_operator entries
mutable is something that's going to require very careful review.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers