Re: [HACKERS] Implied Functional index use (redux)

2007-01-28 Thread Gregory Stark

Simon Riggs [EMAIL PROTECTED] writes:

 In a thread in July last year, I raised the possibility of transforming
 a query to allow functional indexes to be utilised automatically.
 http://archives.postgresql.org/pgsql-hackers/2006-07/msg00323.php

 This idea can work and has many benefits, but there are some
 complexities. I want to summarise those issues first, then make a more
 practical and hopefully more acceptable proposal.

 Taken together the complexities would have lead us to have additional
 TRANSFORMABLE clauses on TYPEs, FUNCTIONs and potentially encoding
 schemes. All of which, I agree, just too much complexity to allow this
 to be specified.

I've thought further about this and I believe the problem is simpler than we
were thinking previously. All we need is one boolean flag on the equality
operator for the data type (or perhaps it would be more convenient to have it
on the operator class) that indicates that two objects will never compare
equal unless they're binary equal.

As long as we know that binary unequal implies operator class unequal then we
can know that any immutable function will always return the same value for any
two equal data. And therefore we can be guaranteed that any elements we want
to look up with col = $1 will be included in the elements returned by
looking up foo(col) = foo($1). We don't need to know anything about foo()
beyond its immutability.

If we want to be able to do inequality lookups as well then we'll have to have
a flag on the function indicating it's order preserving. That is for any
values a, b the property ab == f(a)=f(b) holds. The problem is that you
can't guarantee that for all operator classes since someone can always come
along and define a new operator class with a new ordering that breaks your
guarantee. I suppose we could just have the flag indicate that the property
holds for the default operator class for the argument data types.

But even if we only handled equality I think it would still be a very useful
feature. It would allow doing things like creating an index on
soundex(lastname) or crc32(url) and have them be automatically useful without
altering queries. And it would mean you wouldn't have to create redundant
indexes on lastname and lower(lastname). We could always look to generalize it
to inequalities later.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Implied Functional index use (redux)

2007-01-28 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 I've thought further about this and I believe the problem is simpler than we
 were thinking previously. All we need is one boolean flag on the equality
 operator for the data type (or perhaps it would be more convenient to have it
 on the operator class) that indicates that two objects will never compare
 equal unless they're binary equal.

Well, we could simplify it that far, but that lets out float, numeric,
most or all of the geometry types, and I'm not too sure I care to
promise it for text either (think Unicode combining characters...).
So really that's too simple IMHO.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Implied Functional index use (redux)

2007-01-26 Thread Simon Riggs
On Thu, 2007-01-25 at 16:20 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  A simpler, alternate proposal is to allow the user to specify whether a
  functional index is transformable or not using CREATE or ALTER INDEX,
  with a default of not transformable. That then leaves the responsibility
  for specifying this with the user, who as we have seen is the really
  only person really capable of judging the whole case on its merits.
 
  e.g. CREATE INDEX fooidx ON foo (foofunc(foocol1)) 
  [TABLESPACE ...] [ENABLE|DISABLE TRANSFORM] [WHERE ...];
 
 This is a foot-gun and nothing else.  I hardly think the average DBA
 will realize such subtleties as numeric equality doesn't guarantee that
 such-and-such works.  If it's not specified by the datatype author
 it's not going to be safe.

OK, no problem.

The most beneficial use case is for string handling: name lookups, case
insensitive indexing and index size reduction generally. If, for some
reason, bpchar were to be excluded then it would take away a great chunk
of benefit.

Two questions:

- Will bpchar be transformable?

- Do you see a clear way forward for specifying the information required
to allow the transform? We need to specify the operator, which might be
taken to include the datatype. (Peter suggested placing this on the
function itself, though I think current precedent is to place on the
operator.) If you can say where you want the info to live, I can work
out the details and repost.

If there's clear benefit and a clear way forward, then we might just be
OK for 8.3. If not, I'll put this back on the shelf again in favour of
other ideas.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Implied Functional index use (redux)

2007-01-26 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 If there's clear benefit and a clear way forward, then we might just be
 OK for 8.3. If not, I'll put this back on the shelf again in favour of
 other ideas.

I think this is still a long way off, and there are probably more useful
things to work on for 8.3.

Part of my antagonism stems from the fact that by means of the operator
family rewrite I've been getting rid of some longstanding but really
quite unacceptable assumptions about this operator does that.  I don't
want to see us start putting unsupported semantic assumptions back into
the optimizer; rather its assumptions about operator behavior need to be
clearly specified.  As an example, without some careful preliminary
thinking I'd have probably folded all the numeric types into one big
opfamily and thereby broken transitivity :-(, leading to bugs that would
be devilish to figure out.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Implied Functional index use (redux)

2007-01-26 Thread Simon Riggs
On Fri, 2007-01-26 at 10:58 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  If there's clear benefit and a clear way forward, then we might just be
  OK for 8.3. If not, I'll put this back on the shelf again in favour of
  other ideas.
 
 I think this is still a long way off, and there are probably more useful
 things to work on for 8.3.
 
 Part of my antagonism stems from the fact that by means of the operator
 family rewrite I've been getting rid of some longstanding but really
 quite unacceptable assumptions about this operator does that.  I don't
 want to see us start putting unsupported semantic assumptions back into
 the optimizer; rather its assumptions about operator behavior need to be
 clearly specified.  As an example, without some careful preliminary
 thinking I'd have probably folded all the numeric types into one big
 opfamily and thereby broken transitivity :-(, leading to bugs that would
 be devilish to figure out.

OK, no problems. All of the above says time, which is becoming rare as
we approach 8.3 anyways. 

I'll pick it up again in 8.4. 

Some notes-to-self for the future:

- ideally want to be able to decide transformability at CREATE INDEX
time; this will reduce planning time for functional index usage when
there is no possible transforms.

- may want to do this by having a special catalog table that holds the
cases that *will* work, to make it both safer and faster to look up.
Sort of like pg_autovacuum - absence means No.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] Implied Functional index use (redux)

2007-01-25 Thread Simon Riggs
In a thread in July last year, I raised the possibility of transforming
a query to allow functional indexes to be utilised automatically.
http://archives.postgresql.org/pgsql-hackers/2006-07/msg00323.php

This idea can work and has many benefits, but there are some
complexities. I want to summarise those issues first, then make a more
practical and hopefully more acceptable proposal.

Taken together the complexities would have lead us to have additional
TRANSFORMABLE clauses on TYPEs, FUNCTIONs and potentially encoding
schemes. All of which, I agree, just too much complexity to allow this
to be specified.

One example of this was FLOAT, where -0 and +0 are equal but not the
same in a binary form. That would normally mean we couldn't use FLOAT
for TRANSFORMABLE indexes, but of course what happens if we specify a
partial functional index, where we only index values  0. In that case,
we *can* use the transform technique again. Worse still we may have a
full (non-partial) index where there is a constraint on the column(s)
such as CHECK (value  0). So we'd need another heavy dose of
catalog-complexity to catch all the special cases. 
Yuck and double Yuck.

Even if we did that, it isn't easy for a data type author to tell
whether their type is transformable, or not **in all cases**. That would
probably lead to people saying DISABLE TRANSFORM for their data type,
just in case. Which means no benefit in practice with this feature.

- - - 

A simpler, alternate proposal is to allow the user to specify whether a
functional index is transformable or not using CREATE or ALTER INDEX,
with a default of not transformable. That then leaves the responsibility
for specifying this with the user, who as we have seen is the really
only person really capable of judging the whole case on its merits.

e.g. CREATE INDEX fooidx ON foo (foofunc(foocol1)) 
[TABLESPACE ...] [ENABLE|DISABLE TRANSFORM] [WHERE ...];

ENABLE TRANSFORM is only possible for functional indexes.

Suggestions for better syntax/naming welcome. 

Placing the TRANSFORM clause on the index as a simple boolean makes
utilising the feature more streamlined at planning time too. This would
be an extra initial check in create_index_paths() to see if the query
might benefit from transform. Most indexable WHERE clauses would be able
to be transformed, if the index allows.

The feature would be enabled by default with a GUC, but as stated above,
the default for each index would be to *not* transform unless
specifically requested by the user.

enable_index_transform = on (default)| off

EXPLAIN would not need alteration, since the modified query would show
up clearly in the output. (I can add explicit visibility if people want
that).

Overall, a fairly isolated patch, with little user interface changes.

All of the complexities would be very clearly documented as part of this
feature. That is essential to avoid user error, of which I am mindful.
But the technique has much promise, so I would like to make this option
available to designers and DBAs.

If we can agree this smoothly, then it seems possible for 8.3. 

Comments?

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Implied Functional index use (redux)

2007-01-25 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 A simpler, alternate proposal is to allow the user to specify whether a
 functional index is transformable or not using CREATE or ALTER INDEX,
 with a default of not transformable. That then leaves the responsibility
 for specifying this with the user, who as we have seen is the really
 only person really capable of judging the whole case on its merits.

 e.g. CREATE INDEX fooidx ON foo (foofunc(foocol1)) 
   [TABLESPACE ...] [ENABLE|DISABLE TRANSFORM] [WHERE ...];

This is a foot-gun and nothing else.  I hardly think the average DBA
will realize such subtleties as numeric equality doesn't guarantee that
such-and-such works.  If it's not specified by the datatype author
it's not going to be safe.

In fact, this doesn't work anyway, since it still doesn't address the
question of which equality operators we think permit us to apply
the transform.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Implied Functional Index use

2006-07-13 Thread Zeugswetter Andreas DCP SD

   - add a new boolean to pg_operator to allow us to define which 
   operators offer true equality ...
  
  This would be useful for other purposes too, as we keep coming up 
  against what's the equality operator for this datatype problems.
  However, the restriction to true equality, such that we can assume
x 
  = y implies f(x) = f(y) for every immutable function f on the
datatype

Maybe we could have a tri (or more) state flag for the equality
operators.

' ' .. not an equality op
'e' .. equality
's' .. strict equality (op only true iff the binary representation is
equal)

Andreas

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Implied Functional Index use

2006-07-13 Thread Simon Riggs
On Wed, 2006-07-12 at 22:34 -0400, Tom Lane wrote:

 More generally, I don't believe in hacks that only work for a small
 number of built-in types: to me, that's prima facie evidence that you
 haven't thought the problem through.

I agree an attempt at a simple definition of the required functional
properties hasn't yielded a great solution, so we must go deeper.


On Wed, 2006-07-12 at 15:09 +0200, Peter Eisentraut wrote:

 From a mathematical point of view, it seems cleaner to attach this property 
 to 
 functions, not operators, namely, this function preserves the following 
 relations.  This would allow extending Simon's idea to other operators such 
 as  and  and possibly more mind-boggling cases with geometric operators and 
 such.

Well, in PG, operators are functions that define various special
properties of their inputs/outputs, so it seems the correct place to put
these definitions. I agree it seems more normalised to place this
information at the function itself, but that is not current precedent.

On Thu, 2006-07-13 at 09:14 +0200, Zeugswetter Andreas DCP SD wrote:

 Maybe we could have a tri (or more) state flag for the equality
 operators.
 
 ' ' .. not an equality op
 'e' .. equality
 's' .. strict equality (op only true iff the binary representation is
 equal)

Tom has pointed out that the required functional properties are actually
a fourth state between equality and full binary equality.

I was trying to avoid introducing new single-use properties, but I think
that is the only way here. My concern was not the complexity of
specifying this for function authors, but the problem of making an
incorrect selection leading to incorrect query results. 

It seems we need this in the catalog:

 ' ' .. not an equality op
 'e' .. equality (sufficient for FKs)
 'f' .. functional equality (sufficient for this proposed optimisation)
 's' .. strict equality (op only true iff the binary representation is
 equal)

We're breaking new ground here, but that's a good thing. I'd much rather
have an optimisable and extensible type system than a hard-wired one.

There is a problem of implication here, AFAICS:
When a user SQL asks 
WHERE col1 = 7
which equality level is meant when several exist?

We currently already assume that they mean e-equality, since it is the
most useful and intuitive meaning. That is not as strict as f-equality,
so we would not be able to make implications from that.

I'll think on this some more, but not for 8.2.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Implied Functional Index use

2006-07-13 Thread Zeugswetter Andreas DCP SD

 There is a problem of implication here, AFAICS:
 When a user SQL asks 
   WHERE col1 = 7
 which equality level is meant when several exist?

Well, the operator must be unique, so there is no problem.
Unique in the sense that an operator with the same name ('=' in this
case)
and argument types cannot exist for more than one level of equality. 
(and the level should not have an effect on the resolution)

So, when we see col1 = 7 we lookup the equality level of the operator
and decide whether it is strict enough for the particular optimization.

Andreas

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Implied Functional Index use

2006-07-12 Thread Peter Eisentraut
Am Dienstag, 11. Juli 2006 23:31 schrieb Tom Lane:
 We could invent some more-complex concept involving well, this is
 equality, but there are some functions for which f(x) might differ
 from f(y) anyway and then mark the presumably-few functions that
 could produce divergent results --- examples are sgn() for float8
 and anything dependent on dscale for numeric.  This seems ugly and
 error prone however.

From a mathematical point of view, it seems cleaner to attach this property to 
functions, not operators, namely, this function preserves the following 
relations.  This would allow extending Simon's idea to other operators such 
as  and  and possibly more mind-boggling cases with geometric operators and 
such.

Admittedly, this would put a lot of additional typing load on function 
authors, but perhaps it's safer that way.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Implied Functional Index use

2006-07-12 Thread Simon Riggs
On Tue, 2006-07-11 at 17:31 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  ...
  - add a new boolean to pg_operator to allow us to define which operators
  offer true equality
  ...
 
 This would be useful for other purposes too, as we keep coming up
 against what's the equality operator for this datatype problems.
 However, the restriction to true equality, such that we can assume
 x = y implies f(x) = f(y) for every immutable function f on the datatype
 (note this need not necessarily mean bitwise equality --- it depends on
 what operations the datatype provides), seems like a problem.  For
 instance, the ordinary = operators on float and numeric are NOT true
 equality, nor do we provide any true equality in this sense for these
 common datatypes.  We could hardly get away with using this concept
 to drive foreign-key comparisons, if it doesn't work for float or
 numeric.

Normally, I would not suggest that we do things only for certain data
types only. In this case however, it seems that the reason it would work
only for INTEGER and TEXT data types is that they are simple atomic
datatypes that have the required properties. So doing this for those
datatypes only seems permissable on a theoretical basis, rather than
just because we can't be bothered to do it for more complex types.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Implied Functional Index use

2006-07-12 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 Normally, I would not suggest that we do things only for certain data
 types only. In this case however, it seems that the reason it would work
 only for INTEGER and TEXT data types is that they are simple atomic
 datatypes that have the required properties. So doing this for those
 datatypes only seems permissable on a theoretical basis, rather than
 just because we can't be bothered to do it for more complex types.

There's nothing simple nor atomic about TEXT, and in fact until very
recently text_eq was NOT true equality by this definition.  See
discussions about hu_HU locale back in December.  A number of people
thought that fix was an ugly kluge, and so we may someday go back to
a behavior in which text_eq is again not true equality --- in particular
I'm dubious that such a restriction can survive once we support multiple
encodings/collations in the same database.

More generally, I don't believe in hacks that only work for a small
number of built-in types: to me, that's prima facie evidence that you
haven't thought the problem through.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] Implied Functional Index use

2006-07-11 Thread Simon Riggs
Currently, functional indexes can be used by a query that explicitly
mentions the exact phrasing of the functional index within the WHERE
clause.

IMHO it is feasible to extend the range of WHERE clauses for which
functional indexes can be used by using implication, much in the same
way that we use transitive closure in join queries now. This has some
powerful and important uses, just one of which is index prefix
compression (so read on).

If we have a subclause within a WHERE clause of the form

col1 = k1 AND col2 = k2 AND ... colN = kN

this implies that the following clause is also true:

col1 = k1 AND col2 = k2 AND ... colN = kN
AND F(col1, col2 ... colN) = F(k1, k2 ... kN)

iff:
- the function F is IMMUTABLE, since that definition implies that the
output is always the same for any set of constant inputs.
- the operator that connects each col1,k1 pair must be defined as true
equality (note that in the above example what looks like the same
equality operator is in fact context dependent on the datatypes)
- the datatypes of each col1, k1 pair must match

If we have a Functional Index F(col1, col2...colN) then we can use the
second implied form of the subclause to recognise that the index can 
be useful for the query. This would then allow a query plan that uses an
IndexScan on the functional index with a re-check filter of the original
query clause.

An example might be a query using the clause
surname = 'RIGGS' 
would allow us to use an index defined as
metaphone(surname)
even though the original application was written before we added the
index. Note that the index would be significantly smaller than an index
on the full surname, as well as avoiding some of the hotspots caused by
certain most-frequent-values in the data.

An alternative example might be to define an index on
substr(text_col, 1, 100)
which still allows searching for longer strings without the need to
store the complete string in the index. This is effectively index prefix
compression, which is not directly possible with pgsql because of the
requirements of datatype encapsulation. (Note that this avoids the need
to have very long index keys also).

One difficulty to this is defining which operators represent
true-equality. There isn't a definition of this currently for operators
and we cannot assume that an operator called = has the required
properties, even if that is true for most built-in types. An example of
true-equality and its difficulties is for the FLOAT type minus-zero (-0)
and plus-zero (+0) have different byte representations yet when compared
with the standard FLOAT comparison operator +0 and -0 would be
considered equal. If we were to put each value through a hash-like
function such as md5() then we would clearly get different answers.

To take advantage of this, I propose to
- add code to be called from allpaths.c: when we check functional
indexes, if they exist and yet are not usable because of a lack of
explicit clauses we will check to see if there any clauses that can be
used to imply the correct use of the functional index. This is in many
ways similar to existing constraint exclusion code.
- add a new boolean to pg_operator to allow us to define which operators
offer true equality
- add a new keyword EQUALITY to CREATE/ALTER OPERATOR (which I think
implies HASHES and MERGES if they are not mentioned explicitly).

No promises for 8.2, but does anyone have further input?

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Implied Functional Index use

2006-07-11 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 ...
 - add a new boolean to pg_operator to allow us to define which operators
 offer true equality
 ...

This would be useful for other purposes too, as we keep coming up
against what's the equality operator for this datatype problems.
However, the restriction to true equality, such that we can assume
x = y implies f(x) = f(y) for every immutable function f on the datatype
(note this need not necessarily mean bitwise equality --- it depends on
what operations the datatype provides), seems like a problem.  For
instance, the ordinary = operators on float and numeric are NOT true
equality, nor do we provide any true equality in this sense for these
common datatypes.  We could hardly get away with using this concept
to drive foreign-key comparisons, if it doesn't work for float or
numeric.

We could invent some more-complex concept involving well, this is
equality, but there are some functions for which f(x) might differ
from f(y) anyway and then mark the presumably-few functions that
could produce divergent results --- examples are sgn() for float8
and anything dependent on dscale for numeric.  This seems ugly and
error prone however.

Anyone have a better idea?

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq