Re: [HACKERS] FuncExpr.collid/OpExpr.collid unworkably serving double duty

2011-03-15 Thread Peter Eisentraut
On tor, 2011-03-10 at 17:16 -0500, Tom Lane wrote:
 On the other hand ... one thing that's been bothering me is that
 select_common_collation assumes that explicit collation derivation
 doesn't bubble up in the tree, ie a COLLATE is only a forcing function
 for the immediate parent expression node.  It's not at all clear to me
 that that's a correct reading of the spec.  If it's not, the only way
 we could make it work correctly in the current design is to keep
 *two* additional fields, both the collation OID and an
 explicit/implicit
 derivation flag.  Which would be well past the level of annoying.

That's correct.  I didn't finish implementing that yet because I didn't
have a good solution for the annoyance bit.  The patch I proposed early
on that would have grouped type+typmod+collation into a separate Node
would have provided a simple solution but did not go through.  My
assumption was that this issue was not critical to the core feature and
could be solved later.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FuncExpr.collid/OpExpr.collid unworkably serving double duty

2011-03-15 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On tor, 2011-03-10 at 17:16 -0500, Tom Lane wrote:
 On the other hand ... one thing that's been bothering me is that
 select_common_collation assumes that explicit collation derivation
 doesn't bubble up in the tree, ie a COLLATE is only a forcing function
 for the immediate parent expression node.  It's not at all clear to me
 that that's a correct reading of the spec.  If it's not, the only way
 we could make it work correctly in the current design is to keep
 *two* additional fields, both the collation OID and an
 explicit/implicit
 derivation flag.  Which would be well past the level of annoying.

 That's correct.  I didn't finish implementing that yet because I didn't
 have a good solution for the annoyance bit.  The patch I proposed early
 on that would have grouped type+typmod+collation into a separate Node
 would have provided a simple solution but did not go through.  My
 assumption was that this issue was not critical to the core feature and
 could be solved later.

Adding a separate Node for those fields would hardly meet the core
objection, which is the extra space needed for the storage --- in fact,
it would make that problem considerably worse.  But what did you think
of the idea of setting collations during a post-pass, so that the
collation derivation values need only be local storage during that
one recursive routine?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FuncExpr.collid/OpExpr.collid unworkably serving double duty

2011-03-15 Thread Peter Eisentraut
On tis, 2011-03-15 at 16:16 -0400, Tom Lane wrote:
 But what did you think of the idea of setting collations during a
 post-pass, so that the collation derivation values need only be local
 storage during that one recursive routine?

That sounds reasonable.  We do need the collation value in the
expression tree permanently, but not the collation derivation, as far as
I can tell.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FuncExpr.collid/OpExpr.collid unworkably serving double duty

2011-03-14 Thread Tom Lane
I wrote:
 In any case, I am sure that that what this describes is not what our
 current code does :-(, and that we can't get anywhere close to this with
 the existing infrastructure.  So at this point I'm thinking that the
 safest approach is to rip out the result-collation caching fields and
 perform collation assignment in a parsing post-pass.  That will allow us
 to revise the collation assignment algorithm without further catversion
 bumps.

After further examination of the patch, I'm feeling that removing the
result-collation fields isn't going to be very practical after all.
The problem is that there are now exprCollation() calls all over the
place, many of them in performance-critical places, and there is no
way that exprCollation() will be a cheap operation if we remove the
result-collation fields.  I had thought that we could avoid this if
we still stored collation in a few critical places such as TargetEntry.
But that only appears to take care of some of the call sites, and anyway
it'd be a little weird to store collation there when we don't store
column type there.  For better or worse, the system is designed around
the assumption that expression trees are self-identifying as to output
type and typmod, so I think we've now got to include result collation
in that too.

This implies further bloat in expression trees of course.  By my count,
the following node types also need to store input collation, so will now
have 2 collation fields not one:
Aggref
WindowFunc
FuncExpr
OpExpr
DistinctExpr
NullIfExpr
ScalarArrayOpExpr
RowCompareExpr
MinMaxExpr

(No, wait, DistinctExpr and RowCompareExpr always yield boolean so
they don't need result collation fields, just the input collation.)

There are also several node types that the existing patch neglected to
add collation fields to, but really need to have it AFAICS; one pretty
obvious example being CoerceToDomain.  Actually I think we have to have
a result collation field in every node type that can possibly deliver a
result of a collatable type.

I'm still going to switch over to the post-pass collation assignment
method, because otherwise we're going to need to add collation state
fields as well to all of these node types ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FuncExpr.collid/OpExpr.collid unworkably serving double duty

2011-03-12 Thread Martijn van Oosterhout
On Fri, Mar 11, 2011 at 04:56:31PM -0500, Tom Lane wrote:
  In my implementation I needed to expand this to the general set of
  operators postgresql supported and relaxed this to only consider
  arguments to the function/operator that had the same type as the
  resulting type of the function/operator, since that's the only thing
  that makes sense.
 
 Yeah, that corresponds to Transact-SQL's distinction between functions
 that take a string and produce a string, versus those that produce a
 string without any string inputs.  But I don't see anything justifying
 such a distinction in the text of the standard.

I remember being bugged by the fact that the standard indeed said
nothing (that I could find) about the results of functions that
accepted something other than strings.

 Also note that the TSQL doc explicitly points out that collation labels
 can be carried up through changes of character string types, so I think
 you're wrong to say that collation is only carried through functions that
 produce exactly the same type as their input.  I'd say collation labels
 propagate through any function that has both collatable input(s) and a
 collatable result type.

Yes, this is the most logical and useful interpretation. Collations for
all the string types are essentially compatable so can be treated as
equivalent for this purpose.

(My bit about same type at input is due to my idea of extending
collation to more than just strings. I don't beleive the current patch
supports that anyway. I intended to treat all string types as
equivalent.

 In any case, I am sure that that what this describes is not what our
 current code does :-(, and that we can't get anywhere close to this with
 the existing infrastructure.  So at this point I'm thinking that the
 safest approach is to rip out the result-collation caching fields and
 perform collation assignment in a parsing post-pass.  That will allow us
 to revise the collation assignment algorithm without further catversion
 bumps.

Are you sure? I've grabbed the patch and an looking at it. It looks to
me like it is parse_expr properly combining the collations of the
subnodes, just like it is doing the type determination.

Hmm, but select_common_collation looks a bit dodgy indeed. For example,
whether a collation is explicit or not depends on the type of the
subnode. That's not right at all. During the parse phase you must not
only track the collation but also the state (default, implicit,
explicit, error, none). Otherwise you indeed get the issue where you
don't throw errors at the right time.

I had a collatestate object:

+   Oid colloid;/* OID of collation */
+   CollateType colltype;   /* Type of Collation */

Where CollateType is IMPLICIT, EXPLICIT, NONE or DEFAULT.

And then I had the function below to handle the combining. This I think
handles the collation rules correctly, but I don't think it can be
shoehorned into the current patch.

Hope this helps with your review.

Have a nice day,

+ /* This logic is dictated by the SQL standard */
+ CollateState *
+ CombineCollateState( CollateState *arg1, CollateState *arg2 )
+ {
+   /* If either argument is coerable (the default), return the other */
+   if( !arg1 || arg1-colltype == COLLATE_COERCIBLE )
+   return arg2;
+   if( !arg2 || arg2-colltype == COLLATE_COERCIBLE )
+   return arg1;
+   /* If both are explicit, they must be the same */
+   if( arg1-colltype == COLLATE_EXPLICIT  arg2-colltype == 
COLLATE_EXPLICIT )
+   {
+   if( arg1-colloid != arg2-colloid )
+   ereport( ERROR, (errmsg(Conflicting COLLATE 
clauses)));
+   /* Both same, doesn't matter which we return */
+   return arg1;
+   }
+   /* Otherwise, explicit overrides anything */
+   if( arg1-colltype == COLLATE_EXPLICIT )
+   return arg1;
+   if( arg2-colltype == COLLATE_EXPLICIT )
+   return arg2;
+   /* COLLATE_NONE can only be overridden by EXPLICIT */
+   if( arg1-colltype == COLLATE_NONE )
+   return arg1;
+   if( arg2-colltype == COLLATE_NONE )
+   return arg2;
+   
+   /* We only get here if both are implicit */
+   /* Same locale, return that implicitly */
+   if( arg1-colloid == arg2-colloid )
+   return arg1;
+   
+   /* Conflicting implicit collation, return NONE */
+   {
+   CollateState *result = makeNode(CollateState);
+   result-colltype = COLLATE_NONE;
+   result-colloid = InvalidOid;
+   return result;
+   }
+ }
+ 

-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle


signature.asc
Description: 

Re: [HACKERS] FuncExpr.collid/OpExpr.collid unworkably serving double duty

2011-03-11 Thread Martijn van Oosterhout
On Thu, Mar 10, 2011 at 05:51:31PM -0500, Tom Lane wrote:
 No, that's not what I'm on about.  Consider
 
   (((A COLLATE X) || B) || (C COLLATE Y))  (D COLLATE Z)
 
 (I've spelled out the parenthesization in full for clarity, but most
 of these parens could be omitted.)  Is this expression legal, or
 should the  operator be throwing an error for conflicting
 explicitly-derived collations?  Our code as it stands will take it,
 because no individual operator sees more than one COLLATE among its
 arguments.  But I'm not sure this is right.  The only text I can find
 in SQL2008 that seems to bear on the point is in 4.2.2:

The rules are essentially as described here:

http://msdn.microsoft.com/en-us/library/ms179886.aspx

So:

(A COLLATE X) = collation X
((A COLLATE X) || B)   = collation X
(((A COLLATE X) || B) || (C COLLATE Y))  = error

If we aren't erroring on this then we're doing it wrong. The whole
point of going through the parse tree and assigning a collation to each
node is to catch these things.

 As I read this, the collation attached to any Var clause is implicit
 (because it came from the Var's data type), and the collation attached
 to a CollateClause is presumably explicit, but where does it say what
 happens at higher levels in the expression tree?  It's at least arguable
 that the result collation of an expression is explicit if its input
 collation was explicit.  The fact that the default in case of doubt
 apparently is supposed to be explicit doesn't give any aid or comfort
 to your position either.  If explicitness comes only from the immediate
 use of COLLATE, why don't they say that?  This is worded to make one
 think that most cases will have explicit derivation, not only COLLATE.

See 9.3 Data types of results of aggregations clause (ii). It
contains essentially the rules outlined by the Transact-SQL page above.

The collation derivation and declared type collation of the result are
determined as follows.
Case:
1) If some data type in DTS has an explicit collation derivation and
declared type collation
EC1, then every data type in DTS that has an explicit collation
derivation shall have a declared
type collation that is EC1. The collation derivation is explicit and
the collation is EC1.
2) If every data type in DTS has an implicit collation derivation, then
Case:
A) If every data type in DTS has the same declared type collation IC1,
then the collation
derivation is implicit and the declared type collation is IC1.
B) Otherwise, the collation derivation is none.
3) Otherwise, the collation derivation is none.

In my implementation I needed to expand this to the general set of
operators postgresql supported and relaxed this to only consider
arguments to the function/operator that had the same type as the
resulting type of the function/operator, since that's the only thing
that makes sense.

A concatination then requires its arguments to be compatible. A substr
has the collation of its sole string argument.

I hope this helps,

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle


signature.asc
Description: Digital signature


Re: [HACKERS] FuncExpr.collid/OpExpr.collid unworkably serving double duty

2011-03-11 Thread Tom Lane
Martijn van Oosterhout klep...@svana.org writes:
 On Thu, Mar 10, 2011 at 05:51:31PM -0500, Tom Lane wrote:
 No, that's not what I'm on about.  Consider
 
 (((A COLLATE X) || B) || (C COLLATE Y))  (D COLLATE Z)

 The rules are essentially as described here:
 http://msdn.microsoft.com/en-us/library/ms179886.aspx

Hmm ... that's an interesting document, but I'm not at all sure that
it intends to describe the same rules that are in the SQL standard.
In particular I don't believe that their notion of coercible-default
matches the standard.

 In my implementation I needed to expand this to the general set of
 operators postgresql supported and relaxed this to only consider
 arguments to the function/operator that had the same type as the
 resulting type of the function/operator, since that's the only thing
 that makes sense.

Yeah, that corresponds to Transact-SQL's distinction between functions
that take a string and produce a string, versus those that produce a
string without any string inputs.  But I don't see anything justifying
such a distinction in the text of the standard.

Also note that the TSQL doc explicitly points out that collation labels
can be carried up through changes of character string types, so I think
you're wrong to say that collation is only carried through functions that
produce exactly the same type as their input.  I'd say collation labels
propagate through any function that has both collatable input(s) and a
collatable result type.

In any case, I am sure that that what this describes is not what our
current code does :-(, and that we can't get anywhere close to this with
the existing infrastructure.  So at this point I'm thinking that the
safest approach is to rip out the result-collation caching fields and
perform collation assignment in a parsing post-pass.  That will allow us
to revise the collation assignment algorithm without further catversion
bumps.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FuncExpr.collid/OpExpr.collid unworkably serving double duty

2011-03-10 Thread Martijn van Oosterhout
On Wed, Mar 09, 2011 at 04:49:28PM -0500, Tom Lane wrote:
 So I was moving some error checks around and all of a sudden the
 regression tests blew up on me, with lots of errors about how type X
 didn't support collations (which indeed it didn't).  After some
 investigation I realized what should have been apparent much earlier:
 the collations patch is trying to use one field for two different
 purposes.  In particular, collid in FuncExpr and related nodes is
 used in both of these ways:
 
 * as the collation to apply during execution of the function;
 
 * as the collation of the function's result.

Ouch, that is painful.

Looking back at my first attempt I see I made the same error, though I
had noted that it had an unusual failure mode, namely that:

func( a COLLATE x ) COLLATE y

would determine that x was the collation to use for func, not y,
and that y may be ignored. A bit of a corner case, but someone was
bound to try it.

I think I avoided the particular failure mode you found, because the
GetCollation on the FuncExpr didn't return the collation calculated for
the node, but the the collation derived from the collations of any
arguments that had the same type and the return value. So operators
like = and  automatically got NONE because none of their arguments are
booleans.

 regression=# create view vv as select 'z'::text  'y'::text as b;
 ERROR:  collations are not supported by type boolean

I'm my original idea, any data type was collatable, since I considered
ASC and DESC, NULLS FIRST/LAST to be collations every datatype had.
Thus the above wasn't an error. As long as the collation was a
collation appropriate for booleans it worked.

 There are basically two things we could do about this:
 
 1. Add two fields not one to nodes representing function/operator calls.
 
 2. Change exprCollation() to do a type_is_collatable check on the
 node result type before believing that the collid field is relevant.

It might be worthwhile adding an extra field, but I think I didn't do
it because you only need the information exactly once, while descending
the parse tree in parse_expr. But for clarity the extra field is a
definite win.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle


signature.asc
Description: Digital signature


Re: [HACKERS] FuncExpr.collid/OpExpr.collid unworkably serving double duty

2011-03-10 Thread Tom Lane
Martijn van Oosterhout klep...@svana.org writes:
 On Wed, Mar 09, 2011 at 04:49:28PM -0500, Tom Lane wrote:
 There are basically two things we could do about this:
 
 1. Add two fields not one to nodes representing function/operator calls.
 
 2. Change exprCollation() to do a type_is_collatable check on the
 node result type before believing that the collid field is relevant.

 It might be worthwhile adding an extra field, but I think I didn't do
 it because you only need the information exactly once, while descending
 the parse tree in parse_expr. But for clarity the extra field is a
 definite win.

Hmm.  That suggests a third solution: revert the addition of *all* the
collid fields except the ones that represent collation-to-apply-during-
function-execution.  (So they'd still be there in FuncExpr/OpExpr, but
not most other places.)  Then we'd have to dig down more deeply in the
expression tree during select_common_collation, but we'd save space
and avoid confusion over the meaning of the fields.

I suspect this is probably not a good idea because of the added cost in
select_common_collation: aside from probably needing more syscache
lookups, there's a potential for worse-than-linear cost behavior if we
have to repeatedly dig through a deep expression tree to find out
collations.  We had a similar case in the past [ checks archives ... see
http://archives.postgresql.org/pgsql-performance/2005-06/msg00075.php
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=ba4200246
] so I'm hesitant to go down that road again.  Still, I'll throw it out
for comment.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FuncExpr.collid/OpExpr.collid unworkably serving double duty

2011-03-10 Thread Martijn van Oosterhout
On Thu, Mar 10, 2011 at 10:34:00AM -0500, Tom Lane wrote:
 Hmm.  That suggests a third solution: revert the addition of *all* the
 collid fields except the ones that represent collation-to-apply-during-
 function-execution.  (So they'd still be there in FuncExpr/OpExpr, but
 not most other places.)  Then we'd have to dig down more deeply in the
 expression tree during select_common_collation, but we'd save space
 and avoid confusion over the meaning of the fields.

Yeah, it occurred to me if you made each collate clause translate to a
collate node that changes the collation, a bit like casts, then the
parse nodes don't need to know about collation at all.

 I suspect this is probably not a good idea because of the added cost in
 select_common_collation: aside from probably needing more syscache
 lookups, there's a potential for worse-than-linear cost behavior if we
 have to repeatedly dig through a deep expression tree to find out
 collations.  We had a similar case in the past [ checks archives ... see
 http://archives.postgresql.org/pgsql-performance/2005-06/msg00075.php
 http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=ba4200246
 ] so I'm hesitant to go down that road again.  Still, I'll throw it out
 for comment.

Two things can make a difference here:

- If you knew which operators/functions cared about the collation, the
  cost could be manageable. We don't so...

- ISTM that in theory any algorithm that is defined by recursion at
  each node, should be calculatable via a single pass of the tree by
  something like parse_expr. That's essentially what the variables are
  doing in the Expr nodes, though whether you need one or two is
  ofcourse another question.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle


signature.asc
Description: Digital signature


Re: [HACKERS] FuncExpr.collid/OpExpr.collid unworkably serving double duty

2011-03-10 Thread Tom Lane
Martijn van Oosterhout klep...@svana.org writes:
 On Thu, Mar 10, 2011 at 10:34:00AM -0500, Tom Lane wrote:
 I suspect this is probably not a good idea because of the added cost in
 select_common_collation: aside from probably needing more syscache
 lookups, there's a potential for worse-than-linear cost behavior if we
 have to repeatedly dig through a deep expression tree to find out
 collations.

 Two things can make a difference here:

 - If you knew which operators/functions cared about the collation, the
   cost could be manageable. We don't so...

Yeah, the possibility of skipping select_common_collation altogether for
most operators is pretty attractive.  Maybe we'll get to that before
we're done, but I don't want to assume it'll be done for 9.1.

 - ISTM that in theory any algorithm that is defined by recursion at
   each node, should be calculatable via a single pass of the tree by
   something like parse_expr. That's essentially what the variables are
   doing in the Expr nodes, though whether you need one or two is
   ofcourse another question.

We could do that if we were willing to go back and fill in the collation
fields after the whole expression tree is built.  If you want to fill in
at the time the FuncExpr/OpExpr is first built, then you will get O(N^2)
behavior from repeated calculations in a deep tree if you don't cache
the results for the lower levels.  Which is what the output-collation
fields would do for us.

A post-pass is not out of the question, but it's enough unlike
everything else the parser does that I'm not too thrilled about it.

Also, there's the issue that started the whole discussion, which is that
sometimes we *do* need to know, post-parse-analysis, what the result
collation of an expression tree is.  See CREATE VIEW.  If that's the
*only* thing that ever needed to know it, I wouldn't mind accepting a
double calculation of the collation for CREATE VIEW ... but somehow it
doesn't seem real likely that no other uses for the information will
emerge, and some of them might be more performance-critical.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FuncExpr.collid/OpExpr.collid unworkably serving double duty

2011-03-10 Thread Tom Lane
I wrote:
 A post-pass is not out of the question, but it's enough unlike
 everything else the parser does that I'm not too thrilled about it.

On the other hand ... one thing that's been bothering me is that
select_common_collation assumes that explicit collation derivation
doesn't bubble up in the tree, ie a COLLATE is only a forcing function
for the immediate parent expression node.  It's not at all clear to me
that that's a correct reading of the spec.  If it's not, the only way
we could make it work correctly in the current design is to keep
*two* additional fields, both the collation OID and an explicit/implicit
derivation flag.  Which would be well past the level of annoying.
But in a post-pass implementation it would be no great trouble to do
either one, and we'd not be looking at a forced initdb to change our
minds either.

Maybe a post-pass, with only collation-to-apply fields actually stored
in the tree, is the way to go.

Comments?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FuncExpr.collid/OpExpr.collid unworkably serving double duty

2011-03-10 Thread Martijn van Oosterhout
On Thu, Mar 10, 2011 at 05:16:52PM -0500, Tom Lane wrote:
 On the other hand ... one thing that's been bothering me is that
 select_common_collation assumes that explicit collation derivation
 doesn't bubble up in the tree, ie a COLLATE is only a forcing function
 for the immediate parent expression node.  It's not at all clear to me
 that that's a correct reading of the spec.  If it's not, the only way
 we could make it work correctly in the current design is to keep
 *two* additional fields, both the collation OID and an explicit/implicit
 derivation flag.  Which would be well past the level of annoying.
 But in a post-pass implementation it would be no great trouble to do
 either one, and we'd not be looking at a forced initdb to change our
 minds either.

I beleive the current interpretation, that is the COLLATE only applies
to levels above, is the correct interpretation. COLLATE binds tightly,
so

A op B COLLATE C  parses as  A op (B COLLATE C)

which is why it works. It's actually the only way that makes sense,
otherwise it becomes completely impossible to specify different
collations for a function and its return value.

For example in your example with a view:

CREATE VIEW foo AS func(x COLLATE A) COLLATE B;

B is the collation for the output column, A is the collation for the
function.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle


signature.asc
Description: Digital signature


Re: [HACKERS] FuncExpr.collid/OpExpr.collid unworkably serving double duty

2011-03-10 Thread Tom Lane
Martijn van Oosterhout klep...@svana.org writes:
 On Thu, Mar 10, 2011 at 05:16:52PM -0500, Tom Lane wrote:
 On the other hand ... one thing that's been bothering me is that
 select_common_collation assumes that explicit collation derivation
 doesn't bubble up in the tree, ie a COLLATE is only a forcing function
 for the immediate parent expression node.  It's not at all clear to me
 that that's a correct reading of the spec.

 I beleive the current interpretation, that is the COLLATE only applies
 to levels above, is the correct interpretation. COLLATE binds tightly,
 so

 A op B COLLATE C  parses as  A op (B COLLATE C)

 which is why it works.

No, that's not what I'm on about.  Consider

(((A COLLATE X) || B) || (C COLLATE Y))  (D COLLATE Z)

(I've spelled out the parenthesization in full for clarity, but most
of these parens could be omitted.)  Is this expression legal, or
should the  operator be throwing an error for conflicting
explicitly-derived collations?  Our code as it stands will take it,
because no individual operator sees more than one COLLATE among its
arguments.  But I'm not sure this is right.  The only text I can find
in SQL2008 that seems to bear on the point is in 4.2.2:

Anything that has a declared type can, if that type is a
character string type, be associated with a collation applicable
to its character set; this is known as a declared type
collation. Every declared type that is a character string type
has a collation derivation, this being either none, implicit, or
explicit. The collation derivation of a declared type with a
declared type collation that is explicitly or implicitly
specified by a data type is implicit. If the collation
derivation of a declared type that has a declared type collation
is not implicit, then it is explicit. The collation derivation
of an expression of character string type that has no declared
type collation is none.

As I read this, the collation attached to any Var clause is implicit
(because it came from the Var's data type), and the collation attached
to a CollateClause is presumably explicit, but where does it say what
happens at higher levels in the expression tree?  It's at least arguable
that the result collation of an expression is explicit if its input
collation was explicit.  The fact that the default in case of doubt
apparently is supposed to be explicit doesn't give any aid or comfort
to your position either.  If explicitness comes only from the immediate
use of COLLATE, why don't they say that?  This is worded to make one
think that most cases will have explicit derivation, not only COLLATE.

I wonder if anyone can check the behavior of nested collate clauses in
DB2 or some other probably-spec-conforming database.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] FuncExpr.collid/OpExpr.collid unworkably serving double duty

2011-03-09 Thread Tom Lane
So I was moving some error checks around and all of a sudden the
regression tests blew up on me, with lots of errors about how type X
didn't support collations (which indeed it didn't).  After some
investigation I realized what should have been apparent much earlier:
the collations patch is trying to use one field for two different
purposes.  In particular, collid in FuncExpr and related nodes is
used in both of these ways:

* as the collation to apply during execution of the function;

* as the collation of the function's result.

The trouble is that these usages are only compatible if both the
arguments and result of the function are of collatable types.  In
particular, the change I made was causing CREATE VIEW to fail on
examples like this:

regression=# create view vv as select 'z'::text  'y'::text as b;
ERROR:  collations are not supported by type boolean

because the OpExpr node must have nonzero collid to do a textual
comparison, but then exprCollation() claims that the result of the
expression has that collation too, which it does not because it's
only bool.

Aside from confusing code that tries to impute a collation to the
result of an expression, this will confuse the collation assignment code
itself: select_common_collation will mistakenly believe that
non-collatable input arguments should affect its results.  I'm not sure
how you managed to avoid such failures in the committed patch (if indeed
you did avoid them and they're not just lurking in un-regression-tested
cases); but in any case it seems far too fragile to keep it this way.

There are basically two things we could do about this:

1. Add two fields not one to nodes representing function/operator calls.

2. Change exprCollation() to do a type_is_collatable check on the
node result type before believing that the collid field is relevant.

Of course the syscache lookup implied by type_is_collatable will mean
that exprCollation is orders of magnitude slower than it is now.  So
this is a pretty straightforward space vs speed tradeoff.  I'm inclined
to think that choice #1 is the lesser evil, because I'm afraid that this
patch has already added an undesirable number of new cache lookups to
the basic expression parsing paths.

Thoughts?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers