Re: [HACKERS] Implied Functional index use (redux)
Simon Riggs [EMAIL PROTECTED] writes: In a thread in July last year, I raised the possibility of transforming a query to allow functional indexes to be utilised automatically. http://archives.postgresql.org/pgsql-hackers/2006-07/msg00323.php This idea can work and has many benefits, but there are some complexities. I want to summarise those issues first, then make a more practical and hopefully more acceptable proposal. Taken together the complexities would have lead us to have additional TRANSFORMABLE clauses on TYPEs, FUNCTIONs and potentially encoding schemes. All of which, I agree, just too much complexity to allow this to be specified. I've thought further about this and I believe the problem is simpler than we were thinking previously. All we need is one boolean flag on the equality operator for the data type (or perhaps it would be more convenient to have it on the operator class) that indicates that two objects will never compare equal unless they're binary equal. As long as we know that binary unequal implies operator class unequal then we can know that any immutable function will always return the same value for any two equal data. And therefore we can be guaranteed that any elements we want to look up with col = $1 will be included in the elements returned by looking up foo(col) = foo($1). We don't need to know anything about foo() beyond its immutability. If we want to be able to do inequality lookups as well then we'll have to have a flag on the function indicating it's order preserving. That is for any values a, b the property ab == f(a)=f(b) holds. The problem is that you can't guarantee that for all operator classes since someone can always come along and define a new operator class with a new ordering that breaks your guarantee. I suppose we could just have the flag indicate that the property holds for the default operator class for the argument data types. But even if we only handled equality I think it would still be a very useful feature. It would allow doing things like creating an index on soundex(lastname) or crc32(url) and have them be automatically useful without altering queries. And it would mean you wouldn't have to create redundant indexes on lastname and lower(lastname). We could always look to generalize it to inequalities later. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Implied Functional index use (redux)
Gregory Stark [EMAIL PROTECTED] writes: I've thought further about this and I believe the problem is simpler than we were thinking previously. All we need is one boolean flag on the equality operator for the data type (or perhaps it would be more convenient to have it on the operator class) that indicates that two objects will never compare equal unless they're binary equal. Well, we could simplify it that far, but that lets out float, numeric, most or all of the geometry types, and I'm not too sure I care to promise it for text either (think Unicode combining characters...). So really that's too simple IMHO. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Implied Functional index use (redux)
On Thu, 2007-01-25 at 16:20 -0500, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: A simpler, alternate proposal is to allow the user to specify whether a functional index is transformable or not using CREATE or ALTER INDEX, with a default of not transformable. That then leaves the responsibility for specifying this with the user, who as we have seen is the really only person really capable of judging the whole case on its merits. e.g. CREATE INDEX fooidx ON foo (foofunc(foocol1)) [TABLESPACE ...] [ENABLE|DISABLE TRANSFORM] [WHERE ...]; This is a foot-gun and nothing else. I hardly think the average DBA will realize such subtleties as numeric equality doesn't guarantee that such-and-such works. If it's not specified by the datatype author it's not going to be safe. OK, no problem. The most beneficial use case is for string handling: name lookups, case insensitive indexing and index size reduction generally. If, for some reason, bpchar were to be excluded then it would take away a great chunk of benefit. Two questions: - Will bpchar be transformable? - Do you see a clear way forward for specifying the information required to allow the transform? We need to specify the operator, which might be taken to include the datatype. (Peter suggested placing this on the function itself, though I think current precedent is to place on the operator.) If you can say where you want the info to live, I can work out the details and repost. If there's clear benefit and a clear way forward, then we might just be OK for 8.3. If not, I'll put this back on the shelf again in favour of other ideas. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Implied Functional index use (redux)
Simon Riggs [EMAIL PROTECTED] writes: If there's clear benefit and a clear way forward, then we might just be OK for 8.3. If not, I'll put this back on the shelf again in favour of other ideas. I think this is still a long way off, and there are probably more useful things to work on for 8.3. Part of my antagonism stems from the fact that by means of the operator family rewrite I've been getting rid of some longstanding but really quite unacceptable assumptions about this operator does that. I don't want to see us start putting unsupported semantic assumptions back into the optimizer; rather its assumptions about operator behavior need to be clearly specified. As an example, without some careful preliminary thinking I'd have probably folded all the numeric types into one big opfamily and thereby broken transitivity :-(, leading to bugs that would be devilish to figure out. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Implied Functional index use (redux)
On Fri, 2007-01-26 at 10:58 -0500, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: If there's clear benefit and a clear way forward, then we might just be OK for 8.3. If not, I'll put this back on the shelf again in favour of other ideas. I think this is still a long way off, and there are probably more useful things to work on for 8.3. Part of my antagonism stems from the fact that by means of the operator family rewrite I've been getting rid of some longstanding but really quite unacceptable assumptions about this operator does that. I don't want to see us start putting unsupported semantic assumptions back into the optimizer; rather its assumptions about operator behavior need to be clearly specified. As an example, without some careful preliminary thinking I'd have probably folded all the numeric types into one big opfamily and thereby broken transitivity :-(, leading to bugs that would be devilish to figure out. OK, no problems. All of the above says time, which is becoming rare as we approach 8.3 anyways. I'll pick it up again in 8.4. Some notes-to-self for the future: - ideally want to be able to decide transformability at CREATE INDEX time; this will reduce planning time for functional index usage when there is no possible transforms. - may want to do this by having a special catalog table that holds the cases that *will* work, to make it both safer and faster to look up. Sort of like pg_autovacuum - absence means No. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
[HACKERS] Implied Functional index use (redux)
In a thread in July last year, I raised the possibility of transforming a query to allow functional indexes to be utilised automatically. http://archives.postgresql.org/pgsql-hackers/2006-07/msg00323.php This idea can work and has many benefits, but there are some complexities. I want to summarise those issues first, then make a more practical and hopefully more acceptable proposal. Taken together the complexities would have lead us to have additional TRANSFORMABLE clauses on TYPEs, FUNCTIONs and potentially encoding schemes. All of which, I agree, just too much complexity to allow this to be specified. One example of this was FLOAT, where -0 and +0 are equal but not the same in a binary form. That would normally mean we couldn't use FLOAT for TRANSFORMABLE indexes, but of course what happens if we specify a partial functional index, where we only index values 0. In that case, we *can* use the transform technique again. Worse still we may have a full (non-partial) index where there is a constraint on the column(s) such as CHECK (value 0). So we'd need another heavy dose of catalog-complexity to catch all the special cases. Yuck and double Yuck. Even if we did that, it isn't easy for a data type author to tell whether their type is transformable, or not **in all cases**. That would probably lead to people saying DISABLE TRANSFORM for their data type, just in case. Which means no benefit in practice with this feature. - - - A simpler, alternate proposal is to allow the user to specify whether a functional index is transformable or not using CREATE or ALTER INDEX, with a default of not transformable. That then leaves the responsibility for specifying this with the user, who as we have seen is the really only person really capable of judging the whole case on its merits. e.g. CREATE INDEX fooidx ON foo (foofunc(foocol1)) [TABLESPACE ...] [ENABLE|DISABLE TRANSFORM] [WHERE ...]; ENABLE TRANSFORM is only possible for functional indexes. Suggestions for better syntax/naming welcome. Placing the TRANSFORM clause on the index as a simple boolean makes utilising the feature more streamlined at planning time too. This would be an extra initial check in create_index_paths() to see if the query might benefit from transform. Most indexable WHERE clauses would be able to be transformed, if the index allows. The feature would be enabled by default with a GUC, but as stated above, the default for each index would be to *not* transform unless specifically requested by the user. enable_index_transform = on (default)| off EXPLAIN would not need alteration, since the modified query would show up clearly in the output. (I can add explicit visibility if people want that). Overall, a fairly isolated patch, with little user interface changes. All of the complexities would be very clearly documented as part of this feature. That is essential to avoid user error, of which I am mindful. But the technique has much promise, so I would like to make this option available to designers and DBAs. If we can agree this smoothly, then it seems possible for 8.3. Comments? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Implied Functional index use (redux)
Simon Riggs [EMAIL PROTECTED] writes: A simpler, alternate proposal is to allow the user to specify whether a functional index is transformable or not using CREATE or ALTER INDEX, with a default of not transformable. That then leaves the responsibility for specifying this with the user, who as we have seen is the really only person really capable of judging the whole case on its merits. e.g. CREATE INDEX fooidx ON foo (foofunc(foocol1)) [TABLESPACE ...] [ENABLE|DISABLE TRANSFORM] [WHERE ...]; This is a foot-gun and nothing else. I hardly think the average DBA will realize such subtleties as numeric equality doesn't guarantee that such-and-such works. If it's not specified by the datatype author it's not going to be safe. In fact, this doesn't work anyway, since it still doesn't address the question of which equality operators we think permit us to apply the transform. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Implied Functional Index use
- add a new boolean to pg_operator to allow us to define which operators offer true equality ... This would be useful for other purposes too, as we keep coming up against what's the equality operator for this datatype problems. However, the restriction to true equality, such that we can assume x = y implies f(x) = f(y) for every immutable function f on the datatype Maybe we could have a tri (or more) state flag for the equality operators. ' ' .. not an equality op 'e' .. equality 's' .. strict equality (op only true iff the binary representation is equal) Andreas ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Implied Functional Index use
On Wed, 2006-07-12 at 22:34 -0400, Tom Lane wrote: More generally, I don't believe in hacks that only work for a small number of built-in types: to me, that's prima facie evidence that you haven't thought the problem through. I agree an attempt at a simple definition of the required functional properties hasn't yielded a great solution, so we must go deeper. On Wed, 2006-07-12 at 15:09 +0200, Peter Eisentraut wrote: From a mathematical point of view, it seems cleaner to attach this property to functions, not operators, namely, this function preserves the following relations. This would allow extending Simon's idea to other operators such as and and possibly more mind-boggling cases with geometric operators and such. Well, in PG, operators are functions that define various special properties of their inputs/outputs, so it seems the correct place to put these definitions. I agree it seems more normalised to place this information at the function itself, but that is not current precedent. On Thu, 2006-07-13 at 09:14 +0200, Zeugswetter Andreas DCP SD wrote: Maybe we could have a tri (or more) state flag for the equality operators. ' ' .. not an equality op 'e' .. equality 's' .. strict equality (op only true iff the binary representation is equal) Tom has pointed out that the required functional properties are actually a fourth state between equality and full binary equality. I was trying to avoid introducing new single-use properties, but I think that is the only way here. My concern was not the complexity of specifying this for function authors, but the problem of making an incorrect selection leading to incorrect query results. It seems we need this in the catalog: ' ' .. not an equality op 'e' .. equality (sufficient for FKs) 'f' .. functional equality (sufficient for this proposed optimisation) 's' .. strict equality (op only true iff the binary representation is equal) We're breaking new ground here, but that's a good thing. I'd much rather have an optimisable and extensible type system than a hard-wired one. There is a problem of implication here, AFAICS: When a user SQL asks WHERE col1 = 7 which equality level is meant when several exist? We currently already assume that they mean e-equality, since it is the most useful and intuitive meaning. That is not as strict as f-equality, so we would not be able to make implications from that. I'll think on this some more, but not for 8.2. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Implied Functional Index use
There is a problem of implication here, AFAICS: When a user SQL asks WHERE col1 = 7 which equality level is meant when several exist? Well, the operator must be unique, so there is no problem. Unique in the sense that an operator with the same name ('=' in this case) and argument types cannot exist for more than one level of equality. (and the level should not have an effect on the resolution) So, when we see col1 = 7 we lookup the equality level of the operator and decide whether it is strict enough for the particular optimization. Andreas ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Implied Functional Index use
Am Dienstag, 11. Juli 2006 23:31 schrieb Tom Lane: We could invent some more-complex concept involving well, this is equality, but there are some functions for which f(x) might differ from f(y) anyway and then mark the presumably-few functions that could produce divergent results --- examples are sgn() for float8 and anything dependent on dscale for numeric. This seems ugly and error prone however. From a mathematical point of view, it seems cleaner to attach this property to functions, not operators, namely, this function preserves the following relations. This would allow extending Simon's idea to other operators such as and and possibly more mind-boggling cases with geometric operators and such. Admittedly, this would put a lot of additional typing load on function authors, but perhaps it's safer that way. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Implied Functional Index use
On Tue, 2006-07-11 at 17:31 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: ... - add a new boolean to pg_operator to allow us to define which operators offer true equality ... This would be useful for other purposes too, as we keep coming up against what's the equality operator for this datatype problems. However, the restriction to true equality, such that we can assume x = y implies f(x) = f(y) for every immutable function f on the datatype (note this need not necessarily mean bitwise equality --- it depends on what operations the datatype provides), seems like a problem. For instance, the ordinary = operators on float and numeric are NOT true equality, nor do we provide any true equality in this sense for these common datatypes. We could hardly get away with using this concept to drive foreign-key comparisons, if it doesn't work for float or numeric. Normally, I would not suggest that we do things only for certain data types only. In this case however, it seems that the reason it would work only for INTEGER and TEXT data types is that they are simple atomic datatypes that have the required properties. So doing this for those datatypes only seems permissable on a theoretical basis, rather than just because we can't be bothered to do it for more complex types. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Implied Functional Index use
Simon Riggs [EMAIL PROTECTED] writes: Normally, I would not suggest that we do things only for certain data types only. In this case however, it seems that the reason it would work only for INTEGER and TEXT data types is that they are simple atomic datatypes that have the required properties. So doing this for those datatypes only seems permissable on a theoretical basis, rather than just because we can't be bothered to do it for more complex types. There's nothing simple nor atomic about TEXT, and in fact until very recently text_eq was NOT true equality by this definition. See discussions about hu_HU locale back in December. A number of people thought that fix was an ugly kluge, and so we may someday go back to a behavior in which text_eq is again not true equality --- in particular I'm dubious that such a restriction can survive once we support multiple encodings/collations in the same database. More generally, I don't believe in hacks that only work for a small number of built-in types: to me, that's prima facie evidence that you haven't thought the problem through. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] Implied Functional Index use
Currently, functional indexes can be used by a query that explicitly mentions the exact phrasing of the functional index within the WHERE clause. IMHO it is feasible to extend the range of WHERE clauses for which functional indexes can be used by using implication, much in the same way that we use transitive closure in join queries now. This has some powerful and important uses, just one of which is index prefix compression (so read on). If we have a subclause within a WHERE clause of the form col1 = k1 AND col2 = k2 AND ... colN = kN this implies that the following clause is also true: col1 = k1 AND col2 = k2 AND ... colN = kN AND F(col1, col2 ... colN) = F(k1, k2 ... kN) iff: - the function F is IMMUTABLE, since that definition implies that the output is always the same for any set of constant inputs. - the operator that connects each col1,k1 pair must be defined as true equality (note that in the above example what looks like the same equality operator is in fact context dependent on the datatypes) - the datatypes of each col1, k1 pair must match If we have a Functional Index F(col1, col2...colN) then we can use the second implied form of the subclause to recognise that the index can be useful for the query. This would then allow a query plan that uses an IndexScan on the functional index with a re-check filter of the original query clause. An example might be a query using the clause surname = 'RIGGS' would allow us to use an index defined as metaphone(surname) even though the original application was written before we added the index. Note that the index would be significantly smaller than an index on the full surname, as well as avoiding some of the hotspots caused by certain most-frequent-values in the data. An alternative example might be to define an index on substr(text_col, 1, 100) which still allows searching for longer strings without the need to store the complete string in the index. This is effectively index prefix compression, which is not directly possible with pgsql because of the requirements of datatype encapsulation. (Note that this avoids the need to have very long index keys also). One difficulty to this is defining which operators represent true-equality. There isn't a definition of this currently for operators and we cannot assume that an operator called = has the required properties, even if that is true for most built-in types. An example of true-equality and its difficulties is for the FLOAT type minus-zero (-0) and plus-zero (+0) have different byte representations yet when compared with the standard FLOAT comparison operator +0 and -0 would be considered equal. If we were to put each value through a hash-like function such as md5() then we would clearly get different answers. To take advantage of this, I propose to - add code to be called from allpaths.c: when we check functional indexes, if they exist and yet are not usable because of a lack of explicit clauses we will check to see if there any clauses that can be used to imply the correct use of the functional index. This is in many ways similar to existing constraint exclusion code. - add a new boolean to pg_operator to allow us to define which operators offer true equality - add a new keyword EQUALITY to CREATE/ALTER OPERATOR (which I think implies HASHES and MERGES if they are not mentioned explicitly). No promises for 8.2, but does anyone have further input? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Implied Functional Index use
Simon Riggs [EMAIL PROTECTED] writes: ... - add a new boolean to pg_operator to allow us to define which operators offer true equality ... This would be useful for other purposes too, as we keep coming up against what's the equality operator for this datatype problems. However, the restriction to true equality, such that we can assume x = y implies f(x) = f(y) for every immutable function f on the datatype (note this need not necessarily mean bitwise equality --- it depends on what operations the datatype provides), seems like a problem. For instance, the ordinary = operators on float and numeric are NOT true equality, nor do we provide any true equality in this sense for these common datatypes. We could hardly get away with using this concept to drive foreign-key comparisons, if it doesn't work for float or numeric. We could invent some more-complex concept involving well, this is equality, but there are some functions for which f(x) might differ from f(y) anyway and then mark the presumably-few functions that could produce divergent results --- examples are sgn() for float8 and anything dependent on dscale for numeric. This seems ugly and error prone however. Anyone have a better idea? regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq