Re: [HACKERS] Extending opfamilies for GIN indexes

2011-01-20 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 Actually the other way around.  An opclass is the subset of an opfamily
 that is tightly bound to an index.  The build methods have to be
 associatable with an index, so they're part of the index's opclass.
 The query methods could be loose in the opfamily.

I had understood your proposal to change that for GIN.  Thinking again,
this time keeping opfamily and opclass as they are now: an opclass is the
code we run to build and scan the index, and an opfamily is a way to use
the same index data and code in more contexts than strictly covered by an
opclass.

 The planner's not the problem here --- what's missing is the rule for
 the index AM to look up the right support functions to call at runtime.

 The trick is to associate the proper query support methods with any
 given query operator (which'd also be loose in the family, probably).
 The existing schema for pg_amop and pg_amproc is built on the assumption
 that the amoplefttype/amoprighttype are sufficient for making this
 association; but that seems to fall down if we would like to allow
 contrib modules to add new query operators that coincidentally take the
 same input types as an existing opfamily member.

Well, the opfamily machinery makes it possible to give query support to
any index whose opclass is in the family.  That is, the same set of
operators is covered by more than one opclass.

What we want to add is that more than one set of operators can find data
support in more than one kind of index.  But you still want to run
specific search code here.  So it seems to me we shouldn't attack the
problem at the operator left and right type level, but rather model the
fact that we need another level of flexibility, somewhat separating
building and maintaining the index data from the code that's used to
access it.

The example that we're working from seems to be covered if we are able to
instruct PostgreSQL that a set of opclasses are binary coercible; I
think that's the term here.

Then the idea would be to have PostgreSQL able to figure out that a
given index can be used with any binary coercible opclass, rather than
only the one used to maintain it.  What do you think?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extending opfamilies for GIN indexes

2011-01-19 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 Oh, wait a minute: there's a bad restriction there, namely that a
 contrib module could only add loose operators that had different
 declared input types from the ones known to the core opclass.  Otherwise
 there'd be a conflict with the contrib module and core needing to insert
 similarly-keyed support functions.  This would actually be enough for
 contrib/intarray (because the core operator entries are for anyarray
 not for integer[]) but it is easy to foresee cases where that wouldn't
 be good enough.  Seems like we'd need an additional key column in
 pg_amproc to really make this cover all cases.

I would have thought that such a contrib module would then need to offer
its own opfamily and opclasses, and users would have to use the specific
opclass manually, as they do e.g. for text_pattern_ops.  Can't it work
that way?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Extending opfamilies for GIN indexes

2011-01-19 Thread Tom Lane
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 Tom Lane t...@sss.pgh.pa.us writes:
 Oh, wait a minute: there's a bad restriction there, namely that a
 contrib module could only add loose operators that had different
 declared input types from the ones known to the core opclass.

 I would have thought that such contrib would then need to offer their own
 opfamily and opclasses, and users would have to use the specific opclass
 manually like they do e.g. for text_pattern_ops.  Can't it work that way?

I think you missed the point: right now, to use both the core and
intarray operators on an integer[] column, you have to create *two*
GIN indexes, which will have exactly identical contents.  I'm looking
for a way to let intarray extend the core opfamily definition so that
one index can serve.

regards, tom lane



Re: [HACKERS] Extending opfamilies for GIN indexes

2011-01-19 Thread Robert Haas
On Wed, Jan 19, 2011 at 12:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 Tom Lane t...@sss.pgh.pa.us writes:
 Oh, wait a minute: there's a bad restriction there, namely that a
 contrib module could only add loose operators that had different
 declared input types from the ones known to the core opclass.

 I would have thought that such contrib would then need to offer their own
 opfamily and opclasses, and users would have to use the specific opclass
 manually like they do e.g. for text_pattern_ops.  Can't it work that way?

 I think you missed the point: right now, to use both the core and
 intarray operators on an integer[] column, you have to create *two*
 GIN indexes, which will have exactly identical contents.  I'm looking
 for a way to let intarray extend the core opfamily definition so that
 one index can serve.

Maybe this is a dumb question, but why not just put whatever stuff
intarray[] adds directly into the core opfamily?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Extending opfamilies for GIN indexes

2011-01-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Jan 19, 2011 at 12:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I think you missed the point: right now, to use both the core and
 intarray operators on an integer[] column, you have to create *two*
 GIN indexes, which will have exactly identical contents. I'm looking
 for a way to let intarray extend the core opfamily definition so that
 one index can serve.

 Maybe this is a dumb question, but why not just put whatever stuff
 intarray[] adds directly into the core opfamily?

AFAICS that means integrating contrib/intarray into core.  Independently
of whether that's a good idea or not, PG is supposed to be an extensible
system, so it would be nice to have a solution that supported add-on
extensions.

The subtext here is that GIN, unlike the other index AMs, uses a
representation that seems pretty amenable to supporting a wide variety
of query types with a single index.  contrib/intarray's query_int
operators are not at all like the subset-inclusion-testing operators
that the core opclass supports, and it's not very hard to think of
additional cases that could be of interest to somebody (example: find
all arrays that contain some/all entries within a given integer range).
I think we're going to come up against similar situations over and over
until we find a solution.
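
To make the point about GIN's representation concrete, here is a minimal
Python sketch of an inverted index over integer arrays. It is only an
illustration, not how GIN is implemented: the function names
(extract_value, build_index, contains_all, any_in_range) and the sample
data are assumptions made up for this example. It shows one set of
posting lists, keyed by the individual integers, answering both a
containment query and the range query mentioned above.

```python
from collections import defaultdict


def extract_value(arr):
    """Build-time step: the index keys are the distinct integers in the array."""
    return sorted(set(arr))


def build_index(rows):
    """Map each integer key to the set of row ids whose array contains it."""
    index = defaultdict(set)
    for row_id, arr in rows.items():
        for key in extract_value(arr):
            index[key].add(row_id)
    return index


def contains_all(index, wanted):
    """Containment-style query: rows whose array contains every wanted key."""
    keys = set(wanted)
    if not keys:
        # The empty query is deliberately not handled in this sketch.
        raise ValueError("empty query not supported in this sketch")
    postings = [index.get(k, set()) for k in keys]
    return set.intersection(*postings)


def any_in_range(index, lo, hi):
    """Range-style query: rows with at least one key in the closed range [lo, hi]."""
    hits = set()
    for key, row_ids in index.items():
        if lo <= key <= hi:
            hits |= row_ids
    return hits


# Hypothetical sample data: row id -> integer array.
ROWS = {1: [1, 2, 3], 2: [2, 5], 3: [7, 8]}
INDEX = build_index(ROWS)
```

Both query functions read the same INDEX; only the search logic differs,
which is the property that makes sharing one index across operator sets
attractive.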

regards, tom lane



Re: [HACKERS] Extending opfamilies for GIN indexes

2011-01-19 Thread Robert Haas
On Wed, Jan 19, 2011 at 1:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Jan 19, 2011 at 12:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I think you missed the point: right now, to use both the core and
 intarray operators on an integer[] column, you have to create *two*
 GIN indexes, which will have exactly identical contents. I'm looking
 for a way to let intarray extend the core opfamily definition so that
 one index can serve.

 Maybe this is a dumb question, but why not just put whatever stuff
 intarray[] adds directly into the core opfamily?

 AFAICS that means integrating contrib/intarray into core.  Independently
 of whether that's a good idea or not, PG is supposed to be an extensible
 system, so it would be nice to have a solution that supported add-on
 extensions.

Yeah, I'm just wondering if it's worth the effort, especially in view
of a rather large patch queue we seem to have outstanding at the
moment.

 The subtext here is that GIN, unlike the other index AMs, uses a
 representation that seems pretty amenable to supporting a wide variety
 of query types with a single index.  contrib/intarray's query_int
 operators are not at all like the subset-inclusion-testing operators
 that the core opclass supports, and it's not very hard to think of
 additional cases that could be of interest to somebody (example: find
 all arrays that contain some/all entries within a given integer range).
 I think we're going to come up against similar situations over and over
 until we find a solution.

Interesting.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Extending opfamilies for GIN indexes

2011-01-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Jan 19, 2011 at 1:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 AFAICS that means integrating contrib/intarray into core.  Independently
 of whether that's a good idea or not, PG is supposed to be an extensible
 system, so it would be nice to have a solution that supported add-on
 extensions.

 Yeah, I'm just wondering if it's worth the effort, especially in view
 of a rather large patch queue we seem to have outstanding at the
 moment.

Oh, maybe we're not on the same page here: I wasn't really proposing
to do this right now, it's more of a TODO item.

Offhand the only reason to do it now would be if we settled on something
that required a layout change in pg_amop/pg_amproc.  Since we already
have one such change in 9.1, getting the additional change done in the
same release would be valuable to reduce the number of distinct cases
for pg_dump and other clients to support.

regards, tom lane



Re: [HACKERS] Extending opfamilies for GIN indexes

2011-01-19 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 I think you missed the point: right now, to use both the core and
 intarray operators on an integer[] column, you have to create *two*
 GIN indexes, which will have exactly identical contents.  I'm looking
 for a way to let intarray extend the core opfamily definition so that
 one index can serve.

That I think I understood, but then I mixed opfamily and opclasses
badly.  Let's try again.

For the GIN indexes, we have 2 methods for building the index and 3
others to search it to solve the query.  You're proposing that the 2
former methods would be in the opfamily and the 3 latter in the opclass.

We'd like to be able to use the same index (whose building depends on
the opfamily) for solving different kinds of queries, for which we can
use different traversal and search algorithms; that's the opclass.

So we would want the planner to know that in the GIN case an index built
with any opclass of a given opfamily can help answer a query that would
need any opclass of the opfamily.  Right?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Extending opfamilies for GIN indexes

2011-01-19 Thread Tom Lane
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 For the GIN indexes, we have 2 methods for building the index and 3
 others to search it to solve the query.  You're proposing that the 2
 former methods would be in the opfamily and the 3 latter in the opclass.

Actually the other way around.  An opclass is the subset of an opfamily
that is tightly bound to an index.  The build methods have to be
associatable with an index, so they're part of the index's opclass.
The query methods could be loose in the opfamily.

 So we would want the planner to know that in the GIN case an index built
 with any opclass of a given opfamily can help answer a query that would
 need any opclass of the opfamily.  Right?

The planner's not the problem here --- what's missing is the rule for
the index AM to look up the right support functions to call at runtime.

The trick is to associate the proper query support methods with any
given query operator (which'd also be loose in the family, probably).
The existing schema for pg_amop and pg_amproc is built on the assumption
that the amoplefttype/amoprighttype are sufficient for making this
association; but that seems to fall down if we would like to allow
contrib modules to add new query operators that coincidentally take the
same input types as an existing opfamily member.

regards, tom lane



[HACKERS] Extending opfamilies for GIN indexes

2011-01-18 Thread Tom Lane
I just got annoyed by the fact that contrib/intarray has support for
queries on GIN indexes on integer[] columns, but they only work if you
use the intarray-provided opclass, not the core-provided GIN opclass for
integer[] columns.  In general, of course, two different GIN opclasses
aren't compatible, but here there is precious little reason why not:
the contents of the index are the same both ways, ie, all the individual
integer keys in the arrays.  It would be a real usability improvement,
and would eliminate a foot-gun, if contrib/intarray could somehow be an
extension to the core opclass instead of an independent thing.

It seems to me that this should be possible within the opfamily/opclass
data structure.  Right now, there isn't any real application for
opfamilies for GIN (or GiST) indexes, because both of those AMs pay
attention only to the default support procs that are bound into the
opclass for an index.  But that could change.

In particular, only two of the five support procs used by GIN are
actually associated with the index, in the sense of having some impact
on what's stored in the index: the compare() and extractValue() procs.
The other three are more associated with queries, though they do depend
on having knowledge about the behavior of the compare and extractValue
procs.

So here's what I'm thinking: we could redefine a GIN opclass, per se, as
needing only compare() and extractValue() procs to be bound into it.
The other three procs, as well as the query operators, could be loose
in the containing opfamily.  The index AM would choose which set of the
other support procedures to use for a specific query by matching their
amproclefttype/amprocrighttype to the declared input types of the query
operator, much as btree does.

Having done that, contrib/intarray could work by adding loose
operators and support procs to the core opfamily for integer[].
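
The lookup rule proposed here can be sketched roughly as follows. This
is a simplified Python model, not PostgreSQL code: the OpFamily class,
the function-name strings, and exact string matching on type names are
all assumptions for illustration (real polymorphic resolution of
anyarray is ignored). The support numbers 3, 4 and 5 are taken to be
GIN's three query-side procs. Support procs are keyed by (lefttype,
righttype, procnum), and the query-side procs are chosen by matching the
operator's declared input types.

```python
# Assumed numbering of GIN's query-side support procs.
QUERY_PROC_NUMBERS = (3, 4, 5)


class OpFamily:
    """Hypothetical stand-in for the pg_amproc rows of one opfamily."""

    def __init__(self, name):
        self.name = name
        self.procs = {}  # (lefttype, righttype, procnum) -> function name

    def add_proc(self, lefttype, righttype, procnum, func):
        key = (lefttype, righttype, procnum)
        if key in self.procs:
            raise ValueError(f"duplicate support proc key {key}")
        self.procs[key] = func

    def query_procs_for(self, op_lefttype, op_righttype):
        """Pick the query support procs matching an operator's input types."""
        found = {}
        for procnum in QUERY_PROC_NUMBERS:
            func = self.procs.get((op_lefttype, op_righttype, procnum))
            if func is not None:
                found[procnum] = func
        if not found:
            raise LookupError(
                f"no query support procs for ({op_lefttype}, {op_righttype})"
            )
        return found


def build_example_family():
    """Core procs plus loose procs added by a contrib module (names made up)."""
    family = OpFamily("array_ops")
    family.add_proc("anyarray", "anyarray", 3, "core_extract_query")
    family.add_proc("anyarray", "anyarray", 4, "core_consistent")
    family.add_proc("integer[]", "query_int", 3, "intarray_extract_query")
    family.add_proc("integer[]", "query_int", 4, "intarray_consistent")
    return family
```

With this layout, an operator taking (integer[], query_int) resolves to
the contrib procs while (anyarray, anyarray) operators keep resolving to
the core ones, all against the same index.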

It's possible that this scheme would also make it really useful to have
multiple opclasses within one GIN opfamily; though offhand I'm not sure
of an application for that.  (Right now, the only reason to do that is
if you want to give opclasses for different types the same name, as we
do with the core array_ops.)

Perhaps the same could be done with GiST, although I'm less sure about
the possible usefulness there.

Comments?

BTW, this idea means that amproc entries would no longer be tightly
associated with specific GIN opclasses, so the contentious patch for
getObjectDescription should indeed get applied.

regards, tom lane



Re: [HACKERS] Extending opfamilies for GIN indexes

2011-01-18 Thread Tom Lane
I wrote:
 So here's what I'm thinking: we could redefine a GIN opclass, per se, as
 needing only compare() and extractValue() procs to be bound into it.
 The other three procs, as well as the query operators, could be loose
 in the containing opfamily.  The index AM would choose which set of the
 other support procedures to use for a specific query by matching their
 amproclefttype/amprocrighttype to the declared input types of the query
 operator, much as btree does.

 Having done that, contrib/intarray could work by adding loose
 operators and support procs to the core opfamily for integer[].

Oh, wait a minute: there's a bad restriction there, namely that a
contrib module could only add loose operators that had different
declared input types from the ones known to the core opclass.  Otherwise
there'd be a conflict with the contrib module and core needing to insert
similarly-keyed support functions.  This would actually be enough for
contrib/intarray (because the core operator entries are for anyarray
not for integer[]) but it is easy to foresee cases where that wouldn't
be good enough.  Seems like we'd need an additional key column in
pg_amproc to really make this cover all cases.
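
The restriction can be seen in a small Python sketch. Everything here is
hypothetical: the table is a plain dict standing in for pg_amproc, the
function names are made up, and the extra "group" element is just one
possible shape for the additional key column, which the text above does
not specify. With a three-part key, two operator sets that share input
types collide; a fourth key element keeps them apart.

```python
def insert_proc(table, key, func):
    """Insert a support proc, rejecting duplicate keys like a unique index would."""
    if key in table:
        raise ValueError(f"conflicting support proc for key {key}")
    table[key] = func


def demo_three_part_key():
    """Key = (lefttype, righttype, procnum): same input types collide."""
    table = {}
    insert_proc(table, ("integer[]", "integer[]", 4), "core_consistent")
    try:
        insert_proc(table, ("integer[]", "integer[]", 4), "contrib_consistent")
    except ValueError:
        return "conflict"
    return "ok"


def demo_four_part_key():
    """Key gains a hypothetical extra element naming the operator group."""
    table = {}
    insert_proc(table, ("integer[]", "integer[]", 4, "core"), "core_consistent")
    insert_proc(table, ("integer[]", "integer[]", 4, "contrib"), "contrib_consistent")
    return table
```

The second variant only shows that a wider key removes the collision; how
the index AM would pick the right group for a given query operator is
left open, as it is in the discussion.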

regards, tom lane
