Problem
-------

I feel that Clemens has brought up an important point, but I feel that rather 
than notice a specific error in the implementation of BETWEEN he has 
highlighted a conceptual error in SQLite.  Consider this comparison:

        "Albert" > "albert"

It would be inappropriate to write something like

        "Albert" COLLATE MYCOL1 > "albert" COLLATE MYCOL2

which means

        ("Albert" COLLATE MYCOL1) > ("albert" COLLATE MYCOL2)

Such a comparison, where each value is collated using a different function, is 
absurd.  The "COLLATE" mechanism is about the task of comparison, not the 
values.  It doesn't convert a value into another value, it tells SQLite how to 
compare two values.  In fact rather than the lines above the real way to 
express COLLATE would be

        "Albert" > COLLATE MYCOL3 "albert"

which means something like

        "Albert" (> COLLATE MYCOL3) "albert"

though you might prefer to express it

        ("Albert" > "303" COLLATE MYCOL3)

or perhaps

        ("Albert" > "303") COLLATE MYCOL3

The COLLATE clause affects the comparison, not the values.

So to extend this to the problem Clemens observed ...

        x collate nocase between 'a' and 'b'

really means something like

        x (between COLLATE NOCASE) 'a' and 'b'

though you might prefer to express it

        (x between 'a' and 'b' COLLATE NOCASE)

or perhaps

        (x between 'a' and 'b') COLLATE NOCASE

in all phrasings, the COLLATE clause applies to the comparison between x and 
'a', and to the comparison between x and 'b'.

Remedy
------

Looking at how COLLATE is implemented in SQLite

<http://www.sqlite.org/c3ref/create_collation.html>

the key is the callback function.  And that works correctly: you pass it the 
two values, and it returns the result: negative, zero or positive.  And you 
pass it both values, you can't collate one value one way and another the other 
way.  Anything it might do to the values to arrive at the result is purely an 
internal matter.  This is the way I feel it should work.

Given that this level of implementation is correct I feel that any error in 
SQLite's handling of COLLATE is closer to the expression parsing level than the 
low level implementation.  The low level implementation is fine.  The problem 
would appear to be in

<http://www.sqlite.org/lang_expr.html#collateop>

which states

"The COLLATE operator is a unary postfix operator that assigns a collating 
sequence to an expression."

This is bad.  It says that COLLATE modifies a value.  It shouldn't.  COLLATE is 
not a value operator, it's an operator operator.  It modifies the '>', not a 
value one side of it.

It's too late to contrafit this into SQLite3, but I'm wondering whether SQLite4 
might have the collation operator rethought along these lines.  I bet it 
results in a simpler parse tree and simpler code.

Simon.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to