Re: [HACKERS] Index usage for elem-contained-by-const-range clauses
On 3/24/17 10:50 AM, David Steele wrote:
> Hi Pritam,
>
> On 3/17/17 5:41 PM, Pritam Baral wrote:
>> So sorry. I'm attaching the correct version of the original with this, in case you want to test the limited implementation, because I still have to go through Tom's list of suggestions.
>>
>> BTW, the patch is for applying on top of REL9_6_2, and while I suspect it may work on master too, I haven't tested it since the original submission (Feb 23).
>>
>>> Also, I noticed that the patch has no regression tests. Some mention of this optimization in the docs would also be nice to have.
>>>
>>> --
>>> Alexander Korotkov
>>> Postgres Professional: http://www.postgrespro.com
>>> The Russian Postgres Company
>
> This thread has been idle for a week. Please respond and/or post a new patch by 2017-03-28 00:00 AoE (UTC-12) or this submission will be marked "Returned with Feedback".

This submission has been marked "Returned with Feedback". Please feel free to resubmit to a future commitfest.

Regards,
--
-David
da...@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index usage for elem-contained-by-const-range clauses
On Sat, Mar 18, 2017 at 12:41 AM, Pritam Baral wrote:
> On Friday 10 March 2017 07:59 PM, Alexander Korotkov wrote:
>> Hi, Pritam!
>>
>> I've been assigned to review this patch.
>>
>> On Thu, Feb 23, 2017 at 2:17 AM, Pritam Baral wrote:
>>> The topic has been previously discussed[0] on the -performance mailing list, about four years ago.
>>>
>>> In that thread, Tom suggested[0] the planner could be made to "expand "intcol <@ 'x,y'::int4range" into "intcol between x and y", using something similar to the index LIKE optimization (ie, the "special operator" stuff in indxpath.c)".
>>
>> That's a cool idea. But I would say more. Sometimes it's useful to transform "intcol between x and y" into "intcol <@ 'x,y'::int4range". btree_gin supports "intcol between x and y" as an overlap of "intcol >= x" and "intcol <= y". That is very inefficient. But if this clause were transformed into "intcol <@ 'x,y'::int4range", btree_gin could handle it very efficiently.
>>
>>> This patch tries to do exactly that. It's not tied to any specific datatype, and has been tested with both builtin types and custom range types. Most of the checking for proper datatypes, operators, and btree index happens before this code, so I haven't run into any issues yet in my testing. But I'm not familiar enough with the internals to be able to confidently say it can handle all cases just yet.
>>
>> I've tried this patch. It applies cleanly, but doesn't compile.
>>
>> indxpath.c:4252:1: error: conflicting types for 'range_elem_contained_quals'
>> range_elem_contained_quals(Node *leftop, Datum rightop)
>> ^
>> indxpath.c:192:14: note: previous declaration is here
>> static List *range_elem_contained_quals(Node *leftop, Oid expr_op, Oid opfamily,
>>              ^
>>
>> Could you please recheck that you published the right version of the patch?
>
> So sorry. I'm attaching the correct version of the original with this, in case you want to test the limited implementation, because I still have to go through Tom's list of suggestions.
>
> BTW, the patch is for applying on top of REL9_6_2, and while I suspect it may work on master too, I haven't tested it since the original submission (Feb 23).

What is the idea behind basing the patch on REL9_6_2? This patch implements new functionality, and it is definitely not going to be considered for commit to stable release branches. If you are interested in getting this patch committed to master, please rebase it on the master branch. If not, please clarify the purpose of this submission.

Also, please include a version number in the patch name, so that we can distinguish one version of the patch from another.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Index usage for elem-contained-by-const-range clauses
Hi Pritam,

On 3/17/17 5:41 PM, Pritam Baral wrote:
> So sorry. I'm attaching the correct version of the original with this, in case you want to test the limited implementation, because I still have to go through Tom's list of suggestions.
>
> BTW, the patch is for applying on top of REL9_6_2, and while I suspect it may work on master too, I haven't tested it since the original submission (Feb 23).
>
>> Also, I noticed that the patch has no regression tests. Some mention of this optimization in the docs would also be nice to have.
>>
>> --
>> Alexander Korotkov
>> Postgres Professional: http://www.postgrespro.com
>> The Russian Postgres Company

This thread has been idle for a week. Please respond and/or post a new patch by 2017-03-28 00:00 AoE (UTC-12) or this submission will be marked "Returned with Feedback".

Thanks,
--
-David
da...@pgmasters.net
Re: [HACKERS] Index usage for elem-contained-by-const-range clauses
On Friday 10 March 2017 07:59 PM, Alexander Korotkov wrote:
> Hi, Pritam!
>
> I've been assigned to review this patch.
>
> On Thu, Feb 23, 2017 at 2:17 AM, Pritam Baral wrote:
>> The topic has been previously discussed[0] on the -performance mailing list, about four years ago.
>>
>> In that thread, Tom suggested[0] the planner could be made to "expand "intcol <@ 'x,y'::int4range" into "intcol between x and y", using something similar to the index LIKE optimization (ie, the "special operator" stuff in indxpath.c)".
>
> That's a cool idea. But I would say more. Sometimes it's useful to transform "intcol between x and y" into "intcol <@ 'x,y'::int4range". btree_gin supports "intcol between x and y" as an overlap of "intcol >= x" and "intcol <= y". That is very inefficient. But if this clause were transformed into "intcol <@ 'x,y'::int4range", btree_gin could handle it very efficiently.
>
>> This patch tries to do exactly that. It's not tied to any specific datatype, and has been tested with both builtin types and custom range types. Most of the checking for proper datatypes, operators, and btree index happens before this code, so I haven't run into any issues yet in my testing. But I'm not familiar enough with the internals to be able to confidently say it can handle all cases just yet.
>
> I've tried this patch. It applies cleanly, but doesn't compile.
>
> indxpath.c:4252:1: error: conflicting types for 'range_elem_contained_quals'
> range_elem_contained_quals(Node *leftop, Datum rightop)
> ^
> indxpath.c:192:14: note: previous declaration is here
> static List *range_elem_contained_quals(Node *leftop, Oid expr_op, Oid opfamily,
>              ^
>
> Could you please recheck that you published the right version of the patch?

So sorry. I'm attaching the correct version of the original with this, in case you want to test the limited implementation, because I still have to go through Tom's list of suggestions.

BTW, the patch is for applying on top of REL9_6_2, and while I suspect it may work on master too, I haven't tested it since the original submission (Feb 23).

> Also, I noticed that the patch has no regression tests. Some mention of this optimization in the docs would also be nice to have.
>
> --
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company

--
#!/usr/bin/env regards
Chhatoi Pritam Baral

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 2952bfb7c2..40a3c2c9f4 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -30,21 +30,23 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/predtest.h"
 #include "optimizer/prep.h"
 #include "optimizer/restrictinfo.h"
 #include "optimizer/var.h"
 #include "utils/builtins.h"
 #include "utils/bytea.h"
 #include "utils/lsyscache.h"
 #include "utils/pg_locale.h"
+#include "utils/rangetypes.h"
 #include "utils/selfuncs.h"
+#include "utils/typcache.h"
 
 
 #define IsBooleanOpfamily(opfamily) \
     ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
 
 #define IndexCollMatchesExprColl(idxcollation, exprcollation) \
     ((idxcollation) == InvalidOid || (idxcollation) == (exprcollation))
 
 /* Whether we are looking for plain indexscan, bitmap scan, or either */
 typedef enum
@@ -180,20 +182,21 @@ static Expr *expand_boolean_index_clause(Node *clause, int indexcol,
                             IndexOptInfo *index);
 static List *expand_indexqual_opclause(RestrictInfo *rinfo,
                           Oid opfamily, Oid idxcollation);
 static RestrictInfo *expand_indexqual_rowcompare(RestrictInfo *rinfo,
                             IndexOptInfo *index,
                             int indexcol);
 static List *prefix_quals(Node *leftop, Oid opfamily, Oid collation,
              Const *prefix, Pattern_Prefix_Status pstatus);
 static List *network_prefix_quals(Node *leftop, Oid expr_op, Oid opfamily,
                      Datum rightop);
+static List *range_elem_contained_quals(Node *leftop, Datum rightop);
 static Datum string_to_datum(const char *str, Oid datatype);
 static Const *string_to_const(const char *str, Oid datatype);
 
 
 /*
  * create_index_paths()
  *    Generate all interesting index paths for the given relation.
  *    Candidate paths are added to the rel's pathlist (using add_path).
  *
  * To be considered for an index scan, an index must match one or more
@@ -3286,20 +3289,23 @@ match_special_index_operator(Expr *clause, Oid opfamily, Oid idxcollation,
             /* the right-hand const is type text for all of these */
             pstatus = pattern_fixed_prefix(patt, Pattern_Type_Regex_IC,
                                            expr_coll, &prefix, NULL);
             isIndexable = (pstatus != Pattern_Prefix_None);
             break;
 
         case OID_INET_SUB_OP:
         case OID_INET_SUBEQ_OP:
             isIndexable = true;
             break;
+        case OID_RANGE_ELEM_CONTAINED_OP:
+            isIndexable = true;
+            break;
     }
 
     if (prefix)
Re: [HACKERS] Index usage for elem-contained-by-const-range clauses
On Sunday 12 March 2017 01:58 AM, Jim Nasby wrote:
> On 3/10/17 8:29 AM, Alexander Korotkov wrote:
>> That's a cool idea. But I would say more. Sometimes it's useful to transform "intcol between x and y" into "intcol <@ 'x,y'::int4range". btree_gin supports "intcol between x and y" as an overlap of "intcol >= x" and "intcol <= y". That is very inefficient. But if this clause were transformed into "intcol <@ 'x,y'::int4range", btree_gin could handle it very efficiently.
>
> That'd certainly be nice as well, but IMHO it's outside the scope of this patch to accomplish that.

Also, I think btree indexes are more common than btree_gin. The motivation for this originally came from trying to use the primary key of a large table in a range search, and the primary key index was the default btree.

Also, this is my first deep dive into Postgres's source code, so I took a few easy ways out, just to get started. If it's not too complex to get btree_gin to handle between queries as contained-in-range, I can give it a try.

> BTW, while we're wishing for things... Something else that would be nice is if there was a way to do these kinds of transforms without hacking the backend...

Indeed. And this was one of the things Tom said back when a similar discussion had happened (on the -performance mailing list). But seeing as how it's been almost four years since then, I decided to go ahead with the backend hacking anyway.

>> Also, I noticed that the patch has no regression tests.
>
> BTW, those tests need to pay special attention to inclusive vs exclusive bounds.

I will add regression tests, though I do have to get through all of Tom's suggestions elsewhere in this thread first.

--
#!/usr/bin/env regards
Chhatoi Pritam Baral
Re: [HACKERS] Index usage for elem-contained-by-const-range clauses
I wrote:
> * You're not bothering to insert any inputcollid into the generated comparison operator nodes. I'm not sure why that fails to fall over for text comparisons (if indeed it does fail ...) but it's wrong. Use the range type's collation there.

Oh ... looking at this again, I realize that there's an additional validity check missing: if the range type's collation doesn't match the index column's collation, we can't do this optimization at all. That check probably belongs in match_special_index_operator.

regards, tom lane
Re: [HACKERS] Index usage for elem-contained-by-const-range clauses
Pritam Baral writes:
> The topic has been previously discussed[0] on the -performance mailing list, about four years ago.
> In that thread, Tom suggested[0] the planner could be made to "expand "intcol <@ 'x,y'::int4range" into "intcol between x and y", using something similar to the index LIKE optimization (ie, the "special operator" stuff in indxpath.c)".
> This patch tries to do exactly that.

I took a quick look through this, and have some thoughts ---

* In match_special_index_operator, there are two switch statements and you've added a clause to only one of them. In the second one, you need to add a check that you're working with a btree index. I'd expect the patch as-is to fall over if an "indexkey <@ range" clause were matched to a hash index.

* You're failing to account for the case of "range @> indexkey", which really this ought to work for. That would take a bit of fooling around with the structure of match_special_index_operator to allow indexkey on the right, but we've foreseen since the beginning that that would someday be necessary. Looks like today is that day.

* The first bit in range_elem_contained_quals will fall over for an indexkey that is an expression rather than a simple Var. Probably you should just be using exprType() instead. (Not sure about whether that's sufficient in domain cases, though.) Or actually, why look at that at all? Seems like what you want is to look at the RHS input, ie the range value, and get the relevant element datatype from it. That would correspond to what will happen if the <@ operator executes normally: elem_contained_by_range does not consult the type of its LHS.

* The "return NIL" for an empty range looks pretty dubious. Even if it fails to fail altogether, it's not doing what we really want, which is to signal that the condition cannot succeed so we needn't search the index. Maybe what we should do in that case is generate an "indexkey = NULL" qual.

* Likewise, if the range is infinite, you're just returning NIL and that's leaving something on the table. Probably worth generating "indexkey IS NOT NULL" in that case; it's not going to help much in typical usage, but it would prevent scanning nulls if there are a lot of them in the index.

* elog(ERROR) doesn't return, so stuff like this is not idiomatic:

+	if (opr2oid == InvalidOid)
+	{
+		elog(ERROR, "no <= operator for opfamily %u", opfamily);
+		return NIL;
+	}

It'd be sufficient to do

+	if (opr2oid == InvalidOid)
+		elog(ERROR, "no <= operator for opfamily %u", opfamily);

* You're not bothering to insert any inputcollid into the generated comparison operator nodes. I'm not sure why that fails to fall over for text comparisons (if indeed it does fail ...) but it's wrong. Use the range type's collation there.

* It's sort of annoying that the whole thing only works for a Const range value. A different approach you might think about is to make this work more like ScalarArrayOp, ie we pass the qual through to btree as-is and teach relevant functions in access/nbtree/ how to extract index bound conditions from the range datum at runtime. That would likely end up being a significantly larger patch, though, so you might reasonably conclude it's not worth the effort.

regards, tom lane
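
The empty-range and infinite-range points in the review above can be pictured with a small standalone model. This is only a sketch, not backend code: RangeModel and expand_elem_contained are hypothetical names, the range is assumed to hold plain ints, and the result is rendered as text rather than built as RestrictInfo nodes. A one-sided range producing a single comparison is likewise an assumption of the sketch, not something stated in the thread.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for an already-deserialized range constant. */
typedef struct
{
    bool empty;      /* range contains no values at all */
    bool lower_inf;  /* no lower bound */
    bool upper_inf;  /* no upper bound */
    bool lower_inc;  /* lower bound is inclusive: '[' */
    bool upper_inc;  /* upper bound is inclusive: ']' */
    int  lower;
    int  upper;
} RangeModel;

/*
 * Render the condition that "col <@ range" would expand to.
 * buf must have room for the result; len must be greater than zero.
 */
void
expand_elem_contained(const char *col, const RangeModel *r,
                      char *buf, size_t len)
{
    int n = 0;

    buf[0] = '\0';

    if (r->empty)
    {
        /* Never true, so the index need not be searched at all. */
        snprintf(buf, len, "%s = NULL", col);
        return;
    }

    if (r->lower_inf && r->upper_inf)
    {
        /* Every non-null value qualifies; nulls can still be skipped. */
        snprintf(buf, len, "%s IS NOT NULL", col);
        return;
    }

    if (!r->lower_inf)
        n = snprintf(buf, len, "%s %s %d", col,
                     r->lower_inc ? ">=" : ">", r->lower);

    if (!r->upper_inf)
    {
        if (n < 0 || (size_t) n >= len)
            return;             /* output was truncated; stop here */
        snprintf(buf + n, len - (size_t) n, "%s%s %s %d",
                 n > 0 ? " AND " : "", col,
                 r->upper_inc ? "<=" : "<", r->upper);
    }
}
```

With lower = 1 (inclusive) and upper = 10 (exclusive) the model yields "intcol >= 1 AND intcol < 10". The empty and fully unbounded cases return early, which mirrors the idea of giving the planner a definite qual instead of an empty list.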
Re: [HACKERS] Index usage for elem-contained-by-const-range clauses
On 3/10/17 8:29 AM, Alexander Korotkov wrote:
> That's a cool idea. But I would say more. Sometimes it's useful to transform "intcol between x and y" into "intcol <@ 'x,y'::int4range". btree_gin supports "intcol between x and y" as an overlap of "intcol >= x" and "intcol <= y". That is very inefficient. But if this clause were transformed into "intcol <@ 'x,y'::int4range", btree_gin could handle it very efficiently.

That'd certainly be nice as well, but IMHO it's outside the scope of this patch to accomplish that.

BTW, while we're wishing for things... Something else that would be nice is if there was a way to do these kinds of transforms without hacking the backend...

> Also, I noticed that the patch has no regression tests.

BTW, those tests need to pay special attention to inclusive vs exclusive bounds.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
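
One reason the bounds need attention in tests: PostgreSQL stores discrete ranges such as int4range in a canonical [lower, upper) form, so '(1,10]' is held as '[2,11)'. A test written against the literal bounds may therefore exercise different comparison operators than expected. The sketch below models this with plain ints; IntBounds, in_bounds, canonicalize and boundaries_agree are hypothetical names for illustration, not backend functions, and the probe loop assumes bounds well away from INT_MIN and INT_MAX.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model of a finite, non-infinite integer range. */
typedef struct
{
    int  lower;
    int  upper;
    bool lower_inc;  /* '[' when true, '(' when false */
    bool upper_inc;  /* ']' when true, ')' when false */
} IntBounds;

/* Membership test using the comparison each bound style implies. */
bool
in_bounds(int x, IntBounds b)
{
    bool lower_ok = b.lower_inc ? (x >= b.lower) : (x > b.lower);
    bool upper_ok = b.upper_inc ? (x <= b.upper) : (x < b.upper);

    return lower_ok && upper_ok;
}

/* Rewrite bounds into [lower, upper) form; false on integer overflow. */
bool
canonicalize(IntBounds *b)
{
    if (!b->lower_inc)
    {
        if (b->lower == INT_MAX)
            return false;
        b->lower++;
        b->lower_inc = true;
    }
    if (b->upper_inc)
    {
        if (b->upper == INT_MAX)
            return false;
        b->upper++;
        b->upper_inc = false;
    }
    return true;
}

/*
 * For all four bracket styles, check that every value from lower - 1 to
 * upper + 1 gets the same answer before and after canonicalization.
 */
bool
boundaries_agree(int lower, int upper)
{
    for (int style = 0; style < 4; style++)
    {
        IntBounds orig = { lower, upper, (style & 1) != 0, (style & 2) != 0 };
        IntBounds canon = orig;

        if (!canonicalize(&canon))
            return false;

        for (int x = lower - 1; x <= upper + 1; x++)
        {
            if (in_bounds(x, orig) != in_bounds(x, canon))
                return false;
        }
    }
    return true;
}
```

A regression test along these lines would probe the values just outside, exactly on, and just inside each bound for every bracket combination, since those are the points where ">" versus ">=" (or "<" versus "<=") change the result.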
Re: [HACKERS] Index usage for elem-contained-by-const-range clauses
Hi, Pritam!

I've been assigned to review this patch.

On Thu, Feb 23, 2017 at 2:17 AM, Pritam Baral wrote:
> The topic has been previously discussed[0] on the -performance mailing list, about four years ago.
>
> In that thread, Tom suggested[0] the planner could be made to "expand "intcol <@ 'x,y'::int4range" into "intcol between x and y", using something similar to the index LIKE optimization (ie, the "special operator" stuff in indxpath.c)".

That's a cool idea. But I would say more. Sometimes it's useful to transform "intcol between x and y" into "intcol <@ 'x,y'::int4range". btree_gin supports "intcol between x and y" as an overlap of "intcol >= x" and "intcol <= y". That is very inefficient. But if this clause were transformed into "intcol <@ 'x,y'::int4range", btree_gin could handle it very efficiently.

> This patch tries to do exactly that. It's not tied to any specific datatype, and has been tested with both builtin types and custom range types. Most of the checking for proper datatypes, operators, and btree index happens before this code, so I haven't run into any issues yet in my testing. But I'm not familiar enough with the internals to be able to confidently say it can handle all cases just yet.

I've tried this patch. It applies cleanly, but doesn't compile.

indxpath.c:4252:1: error: conflicting types for 'range_elem_contained_quals'
range_elem_contained_quals(Node *leftop, Datum rightop)
^
indxpath.c:192:14: note: previous declaration is here
static List *range_elem_contained_quals(Node *leftop, Oid expr_op, Oid opfamily,
             ^

Could you please recheck that you published the right version of the patch?

Also, I noticed that the patch has no regression tests. Some mention of this optimization in the docs would also be nice to have.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Index usage for elem-contained-by-const-range clauses
On Thu, Feb 23, 2017 at 4:47 AM, Pritam Baral wrote:
> The topic has been previously discussed[0] on the -performance mailing list, about four years ago.
>
> In that thread, Tom suggested[0] the planner could be made to "expand "intcol <@ 'x,y'::int4range" into "intcol between x and y", using something similar to the index LIKE optimization (ie, the "special operator" stuff in indxpath.c)".
>
> This patch tries to do exactly that.

Please add your patch to https://commitfest.postgresql.org/ so it doesn't get overlooked.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] Index usage for elem-contained-by-const-range clauses
The topic has been previously discussed[0] on the -performance mailing list, about four years ago.

In that thread, Tom suggested[0] the planner could be made to "expand "intcol <@ 'x,y'::int4range" into "intcol between x and y", using something similar to the index LIKE optimization (ie, the "special operator" stuff in indxpath.c)".

This patch tries to do exactly that. It's not tied to any specific datatype, and has been tested with both builtin types and custom range types. Most of the checking for proper datatypes, operators, and btree index happens before this code, so I haven't run into any issues yet in my testing. But I'm not familiar enough with the internals to be able to confidently say it can handle all cases just yet.

[0]: https://www.postgresql.org/message-id/flat/9860.1364013108%40sss.pgh.pa.us#9860.1364013...@sss.pgh.pa.us

--
#!/usr/bin/env regards
Chhatoi Pritam Baral

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 2952bfb7c2..84dfd8362a 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -30,21 +30,23 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/predtest.h"
 #include "optimizer/prep.h"
 #include "optimizer/restrictinfo.h"
 #include "optimizer/var.h"
 #include "utils/builtins.h"
 #include "utils/bytea.h"
 #include "utils/lsyscache.h"
 #include "utils/pg_locale.h"
+#include "utils/rangetypes.h"
 #include "utils/selfuncs.h"
+#include "utils/typcache.h"
 
 
 #define IsBooleanOpfamily(opfamily) \
     ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
 
 #define IndexCollMatchesExprColl(idxcollation, exprcollation) \
     ((idxcollation) == InvalidOid || (idxcollation) == (exprcollation))
 
 /* Whether we are looking for plain indexscan, bitmap scan, or either */
 typedef enum
@@ -180,20 +182,22 @@ static Expr *expand_boolean_index_clause(Node *clause, int indexcol,
                             IndexOptInfo *index);
 static List *expand_indexqual_opclause(RestrictInfo *rinfo,
                           Oid opfamily, Oid idxcollation);
 static RestrictInfo *expand_indexqual_rowcompare(RestrictInfo *rinfo,
                             IndexOptInfo *index,
                             int indexcol);
 static List *prefix_quals(Node *leftop, Oid opfamily, Oid collation,
              Const *prefix, Pattern_Prefix_Status pstatus);
 static List *network_prefix_quals(Node *leftop, Oid expr_op, Oid opfamily,
                      Datum rightop);
+static List *range_elem_contained_quals(Node *leftop, Oid expr_op, Oid opfamily,
+                     Datum rightop);
 static Datum string_to_datum(const char *str, Oid datatype);
 static Const *string_to_const(const char *str, Oid datatype);
 
 
 /*
  * create_index_paths()
  *    Generate all interesting index paths for the given relation.
  *    Candidate paths are added to the rel's pathlist (using add_path).
  *
  * To be considered for an index scan, an index must match one or more
@@ -3286,20 +3290,23 @@ match_special_index_operator(Expr *clause, Oid opfamily, Oid idxcollation,
             /* the right-hand const is type text for all of these */
             pstatus = pattern_fixed_prefix(patt, Pattern_Type_Regex_IC,
                                            expr_coll, &prefix, NULL);
             isIndexable = (pstatus != Pattern_Prefix_None);
             break;
 
         case OID_INET_SUB_OP:
         case OID_INET_SUBEQ_OP:
             isIndexable = true;
             break;
+        case OID_RANGE_ELEM_CONTAINED_OP:
+            isIndexable = true;
+            break;
     }
 
     if (prefix)
     {
         pfree(DatumGetPointer(prefix->constvalue));
         pfree(prefix);
     }
 
     /* done if the expression doesn't look indexable */
     if (!isIndexable)
@@ -3614,20 +3621,27 @@ expand_indexqual_opclause(RestrictInfo *rinfo, Oid opfamily, Oid idxcollation)
             break;
 
         case OID_INET_SUB_OP:
         case OID_INET_SUBEQ_OP:
             if (!op_in_opfamily(expr_op, opfamily))
             {
                 return network_prefix_quals(leftop, expr_op, opfamily,
                                             patt->constvalue);
             }
             break;
+        case OID_RANGE_ELEM_CONTAINED_OP:
+            if (!op_in_opfamily(expr_op, opfamily))
+            {
+                return range_elem_contained_quals(leftop, expr_op, opfamily,
+                                                  patt->constvalue);
+            }
+            break;
     }
 
     /* Default case: just make a list of the unmodified indexqual */
     return list_make1(rinfo);
 }
 
 /*
  * expand_indexqual_rowcompare --- expand a single indexqual condition
  *      that is a RowCompareExpr
  *
@@ -4096,20 +4110,124 @@ network_prefix_quals(Node *leftop, Oid expr_op, Oid opfamily, Datum rightop)
                                          InvalidOid, /* not collatable */
                                          -1, opr2right,
                                          false, false),
                          InvalidOid, InvalidOid);
     result = lappend(result, make_simple_restrictinfo(expr));
 
     return result;
 }
 
 /*
+ * Given an element leftop and a range rightop, and an elem contained-by range
+ * operator, generate suitable indexqual condition(s).
+ */
+static List *
+range_elem_contained_quals(Node *leftop, Datum rightop)
+{
+    Oid         datatype;
+    Oid         opfamily;
+    Oid         opr1oid;
+    Oid         opr2oid;
+    List       *result = NIL;
+    Expr       *expr;
+    RangeType  *range;
+