Re: [sqlite] BETWEEN and explicit collation assignment

2013-09-04 Thread James K. Lowden
On Tue, 3 Sep 2013 18:37:42 -0500
Nico Williams  wrote:

> > There's no need to qualify string literals, as it turns out.  SQLite
> > makes a reasonable choice in that context. When comparing a string
> > literal to a column, the literal (in effect) takes on the collation
> > of the column.
> 
> But SQLite3 is dynamically typed.  Consider a query that uses a UNION
> query as a table source, where the corresponding columns of the
> sub-queries use different collations.  

Each column has a type, and every character column has a collation.  
Each individual value, each row, is compared to the string literal on
its own terms.  

When you say

where NAME = 'George'

the only reasonable thing to do is to compare each NAME with 'George'.
Even if different NAMEs have different collations (as in a UNION,
perhaps), each individual NAME has a particular collation, which
governs the comparison.  

> It's easy to build contrived (and real) queries where SQLite3 can't
> keep track of a collation -- not at run-time, much less at query
> compile-time.  

I might regret this, but if it's easy could you perhaps provide
an example?  

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-09-03 Thread Nico Williams
On Thu, Aug 29, 2013 at 6:35 PM, James K. Lowden
 wrote:
> On Tue, 27 Aug 2013 21:00:50 -0500
> Nico Williams  wrote:
>
>> Of course, lacking a syntax for associating collations with string
>> literals there will be times when some, or even all of the operands to
>> an operation that needs collation information, is missing.  In such
>> cases the collation information of some operands could be used to cast
>> the others, or lexical clues might be needed, or a default might be
>> provided.
>
> There's no need to qualify string literals, as it turns out.  SQLite
> makes a reasonable choice in that context. When comparing a string
> literal to a column, the literal (in effect) takes on the collation of
> the column.

But SQLite3 is dynamically typed.  Consider a query that uses a UNION
query as a table source, where the corresponding columns of the
sub-queries use different collations.  It's easy to build contrived
(and real) queries where SQLite3 can't keep track of a collation --
not at run-time, much less at query compile-time.  One might say that
such cases are programmer error, but they are supported nonetheless.

Still, as long as BETWEEN, IS, =, and so on, have a way to
consistently derive a collation from one or more of their operands
then things can be OK.  CAST can serve that purpose, and if users
choose to think of it as providing a collation for the operator, so be
it (though it is subtly not the same).

BTW, looking at struct Mem, I don't see SQLite3 keeping track of
collations in string values.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-29 Thread James K. Lowden
On Tue, 27 Aug 2013 21:00:50 -0500
Nico Williams  wrote:

> Of course, lacking a syntax for associating collations with string
> literals there will be times when some, or even all of the operands to
> an operation that needs collation information, is missing.  In such
> cases the collation information of some operands could be used to cast
> the others, or lexical clues might be needed, or a default might be
> provided.

There's no need to qualify string literals, as it turns out.  SQLite
makes a reasonable choice in that context. When comparing a string
literal to a column, the literal (in effect) takes on the collation of
the column.  

sqlite> create table t ("binary" char collate binary, "rtrim" char
collate rtrim, "nocase" char collate nocase); 
sqlite> insert into t values ('A', 'A   ', 'A'); 

sqlite> select * from t where "nocase" = 'a' and "rtrim" = 'A';
binary  rtrim   nocase
--  --  --
A   A   A 

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-28 Thread Konrad Hambrick
> -Original Message-
> From: sqlite-users-boun...@sqlite.org 
> [mailto:sqlite-users-boun...@sqlite.org] On Behalf
> Of James K. Lowden
> Sent: Tuesday, August 27, 2013 8:11 PM
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] BETWEEN and explicit collation assignment
> 
> On Mon, 26 Aug 2013 19:03:39 +
> Roman Fleysher <roman.fleys...@einstein.yu.edu> wrote:
> 
> > However, sometimes, in comparison we want to ignore some of the
> > attributes, or compare derived ones. Many busses can carry 25 people,
> > and may be considered equal if we simply need to transport people.
> > Busses certainly differ by other attributes.
> 
> Busses might indeed differ in many ways, but if you make NAME the
> primary key for BUSSES, the rule is not "compare BUSSES, ignoring
> columns other than NAME".  The rule is "compare BUSSES.NAME".
> 
> > it is the comparison ( "=", BETWEEN, IN , etc) statements that must
> > be modified
> 
> This not a syntax issue.  Equality is deeply embedded in the system, in
> many places where there's no SQL in play (e.g. keys).
> 
> It's a system of types and operators.  We can already convert between
> types and compare them.  If you can show some kind of comparison
> that *cannot* be done via type conversion using the operators exactly
> as they are, you might have a point.
> 

Roman --

Another issue is the fundamental but often confused difference between 
Object Oriented Programming and Relational DataBases ( AKA the Object
Relational Impedance mismatch ) 

One should not model a complex object like a Bus in a Relational DB as 
an amorphous Blob, it should instead be modeled as as a collection of 
orthogonal attributes in a set of atomic columns, each having a specific 
type in one-or-more tables.

Equality, or more generally, comparison of the column primitives is 
absolutely as James said, 'deeply embedded in the system itself'.

Be the CPU ...

To find the Bus that fits your needs, you would need to compare sets 
of primitive Attributes of interest to specific needs, invoking AND-OR 
clauses to combine those specific primitive comparisons.

For example:  to find the Name of all the short busses:

   -- one model of a bus might be:

   CREATE TABLE Bus
   (
  IdIntINTEGER PRIMARY KEY AUTOINCREMENT
, Name  varchar( 32 )  COLLATE NOCASE
, Color varchar( 16 )  COLLATE NOCASE
, Lengthnumeric( 5,3 )
, Capacity  int  
   ) 
   ;

   -- find the Name of each short bus:

   SELECT Name 
 FROM Bus
WHEREColor = 'yellow'
  AND (( Length < 20 ) 
  OR   ( Capacity >= 8 AND Capacity < 16 ))  
   ;
 
-- kjh


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-27 Thread Nico Williams
On Mon, Aug 26, 2013 at 2:03 PM, Roman Fleysher
 wrote:

Certainly associating a type conversion with a lexical instance of an
operator, applying to all of the operator's operands (or perhaps even
just to some, while having to explicitly cast others) would work, at
least for operations which can only be accessed via the given
syntactic operator.  But while some implicit type conversions are
useful and handy, I don't think *collations* can be implied, really,
at least not if there are no constant literals around (and we have no
syntax for associating collations with those!)...

Now, perhaps for most operators we can't easily pick one operand to
supply type information for implicit casting of the other(s), but for
BETWEEN you might argue that the type of the first operand should be
used to inform type conversion of the second and third operands.  That
would work, and it'd be easy to remember even.  But conceptually even
then the collation is best thought of as an attribute of types.  For
string values changing their collation is easy, thankfully: it's just
an attribute of the string.  And most systems don't keep track of
collations (or lots of other info) associated with specific strings --
this includes SQLite, IIRC.  In such systems one workaround for the
lack of compile- and/or run-time sting collation information... would
be to specify the collation syntactically.  So you have a point, but I
think the point really is that an RDBMS should keep track of column
and string collations so as to avoid having to require lexical clues
for collation.

Of course, lacking a syntax for associating collations with string
literals there will be times when some, or even all of the operands to
an operation that needs collation information, is missing.  In such
cases the collation information of some operands could be used to cast
the others, or lexical clues might be needed, or a default might be
provided.

So, in short, I think you and James are both right, and maybe we're
squaring a circle.  But still, notionally I'm with James on this:
collation is an attribute of type.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-27 Thread James K. Lowden
On Mon, 26 Aug 2013 19:03:39 +
Roman Fleysher  wrote:

> However, sometimes, in comparison we want to ignore some of the
> attributes, or compare derived ones. Many busses can carry 25 people,
> and may be considered equal if we simply need to transport people.
> Busses certainly differ by other attributes. 

Busses might indeed differ in many ways, but if you make NAME the
primary key for BUSSES, the rule is not "compare BUSSES, ignoring
columns other than NAME".  The rule is "compare BUSSES.NAME". 

> it is the comparison ( "=", BETWEEN, IN , etc) statements that must
> be modified

This not a syntax issue.  Equality is deeply embedded in the system, in
many places where there's no SQL in play (e.g. keys).  

It's a system of types and operators.  We can already convert between
types and compare them.  If you can show some kind of comparison
that *cannot* be done via type conversion using the operators exactly
as they are, you might have a point.  

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-26 Thread Roman Fleysher
> What is relevant is not determined by the type of the object, but by
> the task at hand, in other words by the comparator.

No.  There is only one meaning of "equal".   Two things are either
equal or not.  How to compare them is determined by only their type.


You are correct, James,  types are defined by the set of attributes or 
properties: things that have the same attributes are of the same type. Agree. 
You are also correct that two objects of the same type are equivalent when all 
of their attributes are identical. This may be called "byte equal". Agree too. 
However, sometimes, in comparison we want to ignore some of the attributes, or 
compare derived ones. Many busses can carry 25 people, and may be considered 
equal if we simply need to transport people. Busses certainly differ by other 
attributes. 

When we talk about error correction, if a word does not exist in the 
dictionary, we want to find a closest one --- different by one character, 
perhaps. These two words are called equal, although they are not byte equal. 
See for example 

http://en.wikipedia.org/wiki/Hamming_distance

Thus, equality or inequality is, in general, established with respect to a 
metric (a set of attributes and a rule to combine them). The list of these 
attributes and the rule must be known to the comparator. Thus different 
comparators differ by the set of attributes and rules that they apply. 
Therefore, one can speak of types of comparators, as these are their attributes.

Therefore, I agree with Simon: it is the comparison ( "=", BETWEEN, IN , etc) 
statements that must be modified with a collation, not their operands. 


Roman

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-25 Thread James K. Lowden
On Fri, 23 Aug 2013 17:08:59 +
Roman Fleysher  wrote:

> This leads to enormous growth in number of types. Another solution is
> to realize that the comparator (the thing that makes comparisons) is
> actually an object of its own

An operation is not an object. SQL has no objects.  The "enormous
growth in number of types" does not exist; at present in SQLite, after
a decade of development, that number is less than 10.  

> the rules of equality -- of any comparison -- are governed by the
> attribute(s) of the object relevant for comparison. 

Yes, because those attributes -- what I called "properties" -- define
the type.  

> What is relevant is not determined by the type of the object, but by
> the task at hand, in other words by the comparator.

No.  There is only one meaning of "equal".   Two things are either
equal or not.  How to compare them is determined by only their type. 

In mathematics, the concept of "equal" is so basic that it has its own
symbol. In computer science, we can think of it as a binary function:
two inputs producing a Boolean output.  Those inputs are the function's
only source of information.  There's no third parameter governing the
function's behavior.  

Strings are not unique in that regard.  Consider floats and integers.
How to compare

5.5 = 5

where the latter is an integer?  By implicit conversion: 

5.5 = 5.0 -- false

or explicit conversion: 

CAST(5.5 as INT) = 5 -- true

Not by modifying '='.  

The principle is the same with strings even though, as you point out,
they have potentially many properties.  Other DBMSs, for instance,
support cultural and linguistic properties for character data, thus
affecting whether e.g. 

'resume' = 'résumé'

is true.  In English the strings are equal because the accents are
mere decoration; in most other languages they are distinct because
they bear semantic information.  (It gets even trickier because two
columns may use different encodings.  People sometimes assume that
"equal" means "same bytes in same order", but it really means "same
letters in same order", where "letter" is defined culturally.)  

HTH.

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-23 Thread Roman Fleysher
You are correct, James, comparison depends on types. However, lets say we have 
suitcase and we want to test if it is bigger than the allowed limit, i.e. we 
have a biggest allowed suitcase to compare against. How do we answer: is this 
one bigger than the standard? Well, if this is a check-in bag, it has to be 
lighter than 50 lb, below a weight limit. If the suitcase is a carry-on it has 
to pass both weight limit and dimension limit. It is possible to cast the 
suitcase into carry-on subtype and conduct test there and to cast it to 
check-in type and test there too. But comparisons can be more complicated in 
general. If we have vectors, we may want to compare them by length (one 
sub-type of a vector), by one of its elements (one sub-type of vector per each 
element), by projection to a given vector (another sub-type), by angle that it 
makes with a vector (you guessed, another sub-type).

Thus, it is indeed possible to attach comparisons to types. This leads to 
enormous growth in number of types. Another solution is to realize that the 
comparator (the thing that makes comparisons) is actually an object of its own 
and it can be configured to apply different algorithms. Then, one does not need 
many  subtypes: one suitcase and one vector. This is, as I understand, the 
approach implemented in SQLite: algorithm is passed as an argument to the 
comparator along with the objects.

All to say something simple but not obvious: the rules of equality -- of any 
comparison -- are governed by the attribute(s) of the object relevant for 
comparison. What is relevant is not determined by the type of the object, but 
by the task at hand, in other words by the comparator.

Each column in a table is supposed to hold an attribute of an object (of the 
record -- the row). But strings, stored as attributes of the record, are 
objects of their own. Composite objects, composed of other objects -- 
characters. The characters  have upper and lower case, they can be printable 
and and non-printable, white spaces or not, bold and italic(?).  Therefore, 
comparing strings requires specification which of the attributes of those 
underlying objects are relevant. Collation rule is such a specification, an 
algorithm that tells comparator how to compare two strings -- objects of the 
same type. Type alone is not enough, if we want all types to fit human heads.


Roman


From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of James K. Lowden [jklow...@schemamania.org]
Sent: Thursday, August 22, 2013 9:18 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] BETWEEN and explicit collation assignment

On Thu, 22 Aug 2013 13:36:00 +0100
Simon Slavin <slav...@bigfraud.org> wrote:

> > 1.  where COLLATE( x AS NOCASE )
> > between COLLATE( 'a' AS NOCASE )
> > and COLLATE( 'b' AS NOCASE )
>
> Again, you are trying to apply a collation to a value.  And because
> BETWEEN takes three operands you are being given the opportunity to
> apply three collations for one condition.  Which is nonsense.  So the
> syntax shouldn't allow it or even suggest it would work.  You should
> be casting the 'BETWEEN', not the operands you're supplying to it.

There are two issues:

1.  Is collation a property of the value, its type, or the operation?
2.  Should the implied collation as currently implemented be retained,
or should it be replaced?

Before we can consider what syntax to use, we have to agree on what the
semantics are, on what is means to compare two things.  So let's
dispense with item #2 until we agree on #1.

Collation is not a property of the operation.  There is no other kind
of '=' than equality.  Equal is absolute; there is no modifier for it.

I understand your temptation.  I've used strcasecmp(3) a few times
myself.  But when you realize there's no such thing in SQLite or in the
relational model as an "operator modifier", you'll see the light.

Collation is not a property of the value.  Strings are just strings.
They have an *encoding*, necessarily, but not a collation.

Collation therefore must, by process of elimination, be a property of
the type.  And a good thing, too, because it's a property of the
column, and a column is a type.

Forget indexes; they're a feature of the implementation.  Think
logically, *semantically*: when are two things equal?  Two values in a
column either are or are not equal.  What determines what "equal"
*means*?  The type!

The question of whether

A = B

is true or false rests *not* on "how they are compared" but on *what*
they are.  The same is true for

'abc' = NAME

To make that obvious, let's take a little mathematical excursion.

Ask yourself how this is evaluated:

where 1 = '1'

SQL is logical (or so we like to think), and we're being asked to
compare a number to a string, an apple to an orange.  What to do?


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Nico Williams
On Aug 22, 2013 8:18 PM, "James K. Lowden"  wrote:
> Collation is not a property of the value.  Strings are just strings.
> They have an *encoding*, necessarily, but not a collation.
>
> Collation therefore must, by process of elimination, be a property of
> the type.  And a good thing, too, because it's a property of the
> column, and a column is a type.

+1.  That SQLite is duck-typed may help confuse this issue, but collation
has to be an aspect of a type or an operator, and your arguments are
convincing that it is better to associate collation with the type than with
the operators (otherwise setting a collation for an binary or trusty
operator implies more type casting, which is confusing).

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Roman Fleysher
I would not either. We can agree to agree. But clarity requires this. It is 
like "the" in English. 


From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Igor Tandetnik [i...@tandetnik.org]
Sent: Thursday, August 22, 2013 9:37 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] BETWEEN and explicit collation assignment

On 8/22/2013 9:01 PM, Roman Fleysher wrote:
>> create table t(x text collate nocase);
>> insert into t values ('A');
>> select count(*) from t where x = 'a';
>
>> In your opinion, what result should this select statement produce? 1, as
>> it does now? 0, as Simon's approach seems to imply? Something else?
>
> Since logically it is the operation that needs a modifier, then it should be 
> supplied and not derived from the operands. If I put aside backward 
> compatibility and take position of purity and clarity to the extreme (because 
> taking extreme positions clarifies thinking), then this should fail since no 
> rule to compare is fully specified. The "=" must be accompanied by a 
> modifier. Once the modifier is supplied, comparison can proceed and index can 
> be identified.
>
> Thus, I would use lengthy version for clarity:
>
> create table t(x text)
> create index t(x collate nocase)
> create index t(x collate someothercase)
> insert into t values ('A');
> select count(*) from t where x = 'a' nocase;

Well, I personally would not want to use a DBMS that requires me to do
that for every single comparison operator in every single query I use. I
suppose we would have to agree to disagree here.
--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Igor Tandetnik

On 8/22/2013 9:01 PM, Roman Fleysher wrote:

create table t(x text collate nocase);
insert into t values ('A');
select count(*) from t where x = 'a';



In your opinion, what result should this select statement produce? 1, as
it does now? 0, as Simon's approach seems to imply? Something else?


Since logically it is the operation that needs a modifier, then it should be supplied and 
not derived from the operands. If I put aside backward compatibility and take position of 
purity and clarity to the extreme (because taking extreme positions clarifies thinking), 
then this should fail since no rule to compare is fully specified. The "=" must 
be accompanied by a modifier. Once the modifier is supplied, comparison can proceed and 
index can be identified.

Thus, I would use lengthy version for clarity:

create table t(x text)
create index t(x collate nocase)
create index t(x collate someothercase)
insert into t values ('A');
select count(*) from t where x = 'a' nocase;


Well, I personally would not want to use a DBMS that requires me to do 
that for every single comparison operator in every single query I use. I 
suppose we would have to agree to disagree here.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread James K. Lowden
On Thu, 22 Aug 2013 13:36:00 +0100
Simon Slavin  wrote:

> > 1.  where COLLATE( x AS NOCASE )
> > between COLLATE( 'a' AS NOCASE )
> > and COLLATE( 'b' AS NOCASE )
> 
> Again, you are trying to apply a collation to a value.  And because
> BETWEEN takes three operands you are being given the opportunity to
> apply three collations for one condition.  Which is nonsense.  So the
> syntax shouldn't allow it or even suggest it would work.  You should
> be casting the 'BETWEEN', not the operands you're supplying to it.

There are two issues:

1.  Is collation a property of the value, its type, or the operation? 
2.  Should the implied collation as currently implemented be retained,
or should it be replaced?  

Before we can consider what syntax to use, we have to agree on what the
semantics are, on what is means to compare two things.  So let's
dispense with item #2 until we agree on #1.  

Collation is not a property of the operation.  There is no other kind
of '=' than equality.  Equal is absolute; there is no modifier for it.  

I understand your temptation.  I've used strcasecmp(3) a few times
myself.  But when you realize there's no such thing in SQLite or in the
relational model as an "operator modifier", you'll see the light.  

Collation is not a property of the value.  Strings are just strings.
They have an *encoding*, necessarily, but not a collation.  

Collation therefore must, by process of elimination, be a property of
the type.  And a good thing, too, because it's a property of the
column, and a column is a type.  

Forget indexes; they're a feature of the implementation.  Think
logically, *semantically*: when are two things equal?  Two values in a
column either are or are not equal.  What determines what "equal"
*means*?  The type!  

The question of whether 

A = B 

is true or false rests *not* on "how they are compared" but on *what*
they are.  The same is true for 

'abc' = NAME 

To make that obvious, let's take a little mathematical excursion.  

Ask yourself how this is evaluated:

where 1 = '1'

SQL is logical (or so we like to think), and we're being asked to
compare a number to a string, an apple to an orange.  What to do?  

There are two ways to treat that: report a type error, or perform a
type conversion.  Currently, as it happens, SQLite does something
rather surprising: 

sqlite> select 1 = '1', 1 * '1', 1 - '-1', 1 + '1', 1 / '1', 0 = '1';
1 = '1' 1 * '1' 1 - '-1'1 + '1' 1 / '1' 0 = '1'   
--  --  --  --  --  --
0   1   2   2   1   0 

As explained in http://www.sqlite.org/datatype3.html, "mathematical"
operators have different rules from "comparison" operators.
Mathematical binary operators demand two operands of the same type, and
the conversion rule is that they "cast both operands to the NUMERIC
storage class".  No such rule exists for logical operators; afaict the
comparison of strings to numbers is specified.  On the evidence, no
string is equal to any number, and no implicit conversion is defined.  

Note well: the operand didn't reach out and bang the operator on the
head to modify its behavior.  The operator converted the operand to a
comparable type.  Or not, as the case may be.  

All to say something simple but not obvious: the rules of equality --
of any comparision -- are governed by the type being compared.  

Now keep that in mind as you think about strings and their collations.
when we evaluate:

where 'abc' = NAME

we must know:

1.  The collation of NAME (not of the values in NAME)
2.  The collation of 'abc'
3.  The rules of conversion

Dr. Hipp told us, "If X does not have a specified collation, then the
collation of Y is used instead", which I would express as, "if only one
of two types has the default collation, it is converted to the other
type".  Collations aren't "used"; they're *honored*!  

By that rule, if NAME has the default collation, the two operands are
of the same type and can be compared.  Else, if NAME's type uses a
collation other than the default, 'abc' is converted to that type and
the comparison is performed.  

That is the easiest way -- and, by the grace of Codd, the right way --
to think about it.  Get your types in order, and then compare.  It
works for any comparison, and any type, and our friend equality remains
unbothered.  And unmodified.  

Regards, 

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Roman Fleysher
> create table t(x text collate nocase);
> insert into t values ('A');
> select count(*) from t where x = 'a';

> In your opinion, what result should this select statement produce? 1, as 
> it does now? 0, as Simon's approach seems to imply? Something else?

Since logically it is the operation that needs a modifier, then it should be 
supplied and not derived from the operands. If I put aside backward 
compatibility and take position of purity and clarity to the extreme (because 
taking extreme positions clarifies thinking), then this should fail since no 
rule to compare is fully specified. The "=" must be accompanied by a modifier. 
Once the modifier is supplied, comparison can proceed and index can be 
identified. 

Thus, I would use lengthy version for clarity:

create table t(x text)
create index t(x collate nocase)
create index t(x collate someothercase)
insert into t values ('A');
select count(*) from t where x = 'a' nocase;


Roman



From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Igor Tandetnik [i...@tandetnik.org]
Sent: Thursday, August 22, 2013 8:14 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] BETWEEN and explicit collation assignment

On 8/22/2013 6:59 PM, Roman Fleysher wrote:
> First, I do not expect any changes to SQLite to be made as a result of this 
> discussion.

You might not, but Simon seems to:

"It's too late to contrafit this into SQLite3, but I'm wondering whether
SQLite4 might have the collation operator rethought along these lines."

> You are right, the discussion is largely philosophical. (I do not imply 
> "philosophical" means "impractical", which are sometimes equated.) But the 
> questions you ask are also philosophical: for is there something that can be 
> done in C++ that can not be done in Fortran or Assembler?

There are lots of things that can be done much more quickly,
conveniently and reliably in C++ than in Fortran or Assembler. The
cost-benefit analysis has been performed, and the market has spoken.

But let's even set the costs aside: what are the benefits? In what way
is your proposed approach an improvement over the status quo? Sorry for
being dense here, but I honestly do not understand what advantages you
believe you are gaining. Could you explain it to me like I'm five?

> These are the questions, as I understand, Simon is asking. Can collate rule 
> be expressed in such a way that it is clear it modifies the operation not the 
> value?

This question would perhaps be interesting to explore if there were
widespread confusion on this topic. But I'm not aware of any evidence
that such confusion exists. I have followed this group for years (and so
did Simon): in my experience, the issue simply fails to arise in
practice. Or indeed in theory: I don't recall any prior discussion of
this topic, ever.

> Because COLLATE should be a modifier to comparisons, then COLLATE should be a 
> modifier to index. We agree here, I think. I do not know if it is possible to 
> build two indices on the same column using two different collate rules.

Yes it is possible. You can specify a COLLATE clause on a column in
CREATE INDEX statement. If you don't, then the collation assigned to
that column in CREATE TABLE, if any, is used. If neither place specifies
a collation, the default is BINARY.

> Assuming comparison should specify collate rule (implicitly or explicitly 
> without which comparison is impossible)...

That's precisely where the disagreement lies. Should comparison derive
its collation implicitly from its operands? I say yes, and that's
exactly the mechanism currently in place. Simon seems to say no, and
wants to introduce a different mechanism. What's your position?

Once again, a concrete example:

create table t(x text collate nocase);
insert into t values ('A');
select count(*) from t where x = 'a';

In your opinion, what result should this select statement produce? 1, as
it does now? 0, as Simon's approach seems to imply? Something else?

> Thus the collation should not be picked up from the column, it must be picked 
> up from the comparison.

Once again, see the example above. Are you willing to argue that 0 is
the correct result there? That would seem to be the logical conclusion
of your statement.

> It is possible to adopt column's collate rule, if it is unique, to be used by 
> default if no rule is explicitly specified, as you would like to see.

So, should the collation be picked up from the column, or should it not
be? You are making two contradictory statements right next to each
other. I'm confused.

> It is then clear that this adaptation is a shortcut which should raise FAIL 
> when columns with different collations are compared without explicit rule.

Yes, I've already conceded that the case where operands have conflicting
collations could be treated 

Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Igor Tandetnik

On 8/22/2013 8:41 PM, Simon Slavin wrote:


On 23 Aug 2013, at 1:14am, Igor Tandetnik  wrote:


Once again, a concrete example:

create table t(x text collate nocase);
insert into t values ('A');
select count(*) from t where x = 'a';

In your opinion, what result should this select statement produce? 1, as it 
does now? 0, as Simon's approach seems to imply? Something else?


You seem to thing that I had a problem with this.  I don't.  If you go back to 
my earlier post you will see that I was objecting to the way in which the 
COLLATE token was used in expressions.  The above does not use COLLATE in an 
expression and has nothing to do with my post.


I asked: "does a property of the column affect the behavior of the 
operator?" And you answered: "I see no reason for it to do that." This 
example demonstrates a reason for it to do that, does it not?

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Simon Slavin

On 23 Aug 2013, at 1:14am, Igor Tandetnik  wrote:

> Once again, a concrete example:
> 
> create table t(x text collate nocase);
> insert into t values ('A');
> select count(*) from t where x = 'a';
> 
> In your opinion, what result should this select statement produce? 1, as it 
> does now? 0, as Simon's approach seems to imply? Something else?

You seem to thing that I had a problem with this.  I don't.  If you go back to 
my earlier post you will see that I was objecting to the way in which the 
COLLATE token was used in expressions.  The above does not use COLLATE in an 
expression and has nothing to do with my post.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Igor Tandetnik

On 8/22/2013 6:59 PM, Roman Fleysher wrote:

First, I do not expect any changes to SQLite to be made as a result of this 
discussion.


You might not, but Simon seems to:

"It's too late to contrafit this into SQLite3, but I'm wondering whether 
SQLite4 might have the collation operator rethought along these lines."



You are right, the discussion is largely philosophical. (I do not imply "philosophical" 
means "impractical", which are sometimes equated.) But the questions you ask are also 
philosophical: for is there something that can be done in C++ that can not be done in Fortran or 
Assembler?


There are lots of things that can be done much more quickly, 
conveniently and reliably in C++ than in Fortran or Assembler. The 
cost-benefit analysis has been performed, and the market has spoken.


But let's even set the costs aside: what are the benefits? In what way 
is your proposed approach an improvement over the status quo? Sorry for 
being dense here, but I honestly do not understand what advantages you 
believe you are gaining. Could you explain it to me like I'm five?



These are the questions, as I understand, Simon is asking. Can collate rule be 
expressed in such a way that it is clear it modifies the operation not the 
value?


This question would perhaps be interesting to explore if there were 
widespread confusion on this topic. But I'm not aware of any evidence 
that such confusion exists. I have followed this group for years (and so 
did Simon): in my experience, the issue simply fails to arise in 
practice. Or indeed in theory: I don't recall any prior discussion of 
this topic, ever.



Because COLLATE should be a modifier to comparisons, then COLLATE should be a 
modifier to index. We agree here, I think. I do not know if it is possible to 
build two indices on the same column using two different collate rules.


Yes it is possible. You can specify a COLLATE clause on a column in 
CREATE INDEX statement. If you don't, then the collation assigned to 
that column in CREATE TABLE, if any, is used. If neither place specifies 
a collation, the default is BINARY.



Assuming comparison should specify collate rule (implicitly or explicitly 
without which comparison is impossible)...


That's precisely where the disagreement lies. Should comparison derive 
its collation implicitly from its operands? I say yes, and that's 
exactly the mechanism currently in place. Simon seems to say no, and 
wants to introduce a different mechanism. What's your position?


Once again, a concrete example:

create table t(x text collate nocase);
insert into t values ('A');
select count(*) from t where x = 'a';

In your opinion, what result should this select statement produce? 1, as 
it does now? 0, as Simon's approach seems to imply? Something else?



Thus the collation should not be picked up from the column, it must be picked 
up from the comparison.


Once again, see the example above. Are you willing to argue that 0 is 
the correct result there? That would seem to be the logical conclusion 
of your statement.



It is possible to adopt column's collate rule, if it is unique, to be used by 
default if no rule is explicitly specified, as you would like to see.


So, should the collation be picked up from the column, or should it not 
be? You are making two contradictory statements right next to each 
other. I'm confused.



It is then clear that this adaptation is a shortcut which should raise FAIL 
when columns with different collations are compared without explicit rule.


Yes, I've already conceded that the case where operands have conflicting 
collations could be treated as an error. But that doesn't requite new 
syntax or new machinery, just a minor tweak to existing rules. It 
doesn't need to raise FAIL at query execution time - all the information 
is available at prepare, it could be treated as a syntax error.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Roman Fleysher
Dear Igor,

First, I do not expect any changes to SQLite to be made as a result of this 
discussion. You are right, the discussion is largely philosophical. (I do not 
imply "philosophical" means "impractical", which are sometimes equated.) But 
the questions you ask are also philosophical: for is there something that can 
be done in C++ that can not be done in Fortran or Assembler? I am sure very 
little. Nevertheless, C++ came to exist. Thus these questions driven by 
practical considerations --- I understand --- must be replaced by some others. 
For example: can the desired action be expressed more clearly, elegantly, 
fool-proof, efficiently, etc in the language. I believe these sorts of 
questions led to development of C++, Java, etc not what could or could not be 
done in Fortran and Assembler. These are the questions, as I understand, Simon 
is asking. Can collate rule be expressed in such a way that it is clear it 
modifies the operation not the value? Perhaps, I should not speak for Simon, 
this is 
 my interpretation of his writing.

Because COLLATE should be a modifier to comparisons, then COLLATE should be a 
modifier to index. We agree here, I think. I do not know if it is possible to 
build two indices on the same column using two different collate rules. 
Assuming comparison should specify collate rule (implicitly or explicitly 
without which comparison is impossible) the same rule will identify the 
corresponding index. If available, this index would be used. Thus the collation 
should not be picked up from the column, it must be picked up from the 
comparison. It is possible to adopt column's collate rule, if it is unique, to 
be used by default if no rule is explicitly specified, as you would like to 
see. It is then clear that this adaptation is a shortcut which should raise 
FAIL when columns with different collations are compared without explicit rule. 

As a matter of taste, I prefer to tell exactly what I want to happen, rather 
than to have software figure out what I meant. I can not be sure it will figure 
what I wanted. Historically SQL statements were typed by operators, and typos 
were huge issue. Thus shortcuts were introduced to mitigate. This is in the 
past, I believe. We do not mind typing long names of variables in C++ code, we 
want the code to be self-documenting. For me having x=y and y=x comparisons 
produce different results is complex even though I know how to decipher. I do 
not mind typing an extra word -- a modifier for comparison -- to make clear 
what I want to happen. I do not think this increases complexity. Complexity is 
sometimes equated to the lack of clarity. In fact, it is the consequence. 


Roman


From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Igor Tandetnik [i...@tandetnik.org]
Sent: Thursday, August 22, 2013 5:26 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] BETWEEN and explicit collation assignment

On 8/22/2013 4:06 PM, Roman Fleysher wrote:
> Logically, I agree with Simon, collate modifier in table definition describes 
> how indices should be built, nothing more.

But for what purpose are indexes built? Isn't it for the purpose of
being used to speed up queries? Wouldn't you expect a query like

select count(*) from t where x = 'a';

to actually use an index on t(x), when available? And if you do, doesn't
that mean that the comparison (x = 'a') should use the same collation as
an index on t(x), that is, the collation associated with column x? And
if so, then you are right back to the idea that the collation used by
the comparison operator should, at least sometimes, be inferred from the
collation associated with its operands. Therefore, the COLLATE clause in
the table definition should apply not just to building indexes, but also
to the behavior of comparisons. QED.

Where is the flaw in this chain of reasoning, in your opinion?

> Comparisons, as in these examples should specify which collation is to be 
> used and if it is different from one used for indexing, then yes, Igor, index 
> can not be used. I believe this is the present behavior already.

Obviously, if you explicitly specify a collation for a comparison that
doesn't match that of an index, then the index cannot be used. The
question is, if you do *not* explicitly specify a collation for a
comparison, as in example above - would you expect an index to be used?
Personally, I would, and I rather like the fact that SQLite currently
does use an index in this case. Simon, on the other hand, appears to be
saying that he would rather the index not be used, that the comparison
be done using BINARY collation (though he never answered my direct
question about it).

> I do not think Simon asks to change the way collations work. He is asking to 
> make clear how they work. I think Simon is asking for clarity so that error 
> like "(x collate A) = (y collate 

Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Igor Tandetnik

On 8/22/2013 4:06 PM, Roman Fleysher wrote:

Logically, I agree with Simon, collate modifier in table definition describes 
how indices should be built, nothing more.


But for what purpose are indexes built? Isn't it for the purpose of 
being used to speed up queries? Wouldn't you expect a query like


select count(*) from t where x = 'a';

to actually use an index on t(x), when available? And if you do, doesn't 
that mean that the comparison (x = 'a') should use the same collation as 
an index on t(x), that is, the collation associated with column x? And 
if so, then you are right back to the idea that the collation used by 
the comparison operator should, at least sometimes, be inferred from the 
collation associated with its operands. Therefore, the COLLATE clause in 
the table definition should apply not just to building indexes, but also 
to the behavior of comparisons. QED.


Where is the flaw in this chain of reasoning, in your opinion?


Comparisons, as in these examples should specify which collation is to be used 
and if it is different from one used for indexing, then yes, Igor, index can 
not be used. I believe this is the present behavior already.


Obviously, if you explicitly specify a collation for a comparison that 
doesn't match that of an index, then the index cannot be used. The 
question is, if you do *not* explicitly specify a collation for a 
comparison, as in example above - would you expect an index to be used? 
Personally, I would, and I rather like the fact that SQLite currently 
does use an index in this case. Simon, on the other hand, appears to be 
saying that he would rather the index not be used, that the comparison 
be done using BINARY collation (though he never answered my direct 
question about it).



I do not think Simon asks to change the way collations work. He is asking to make clear 
how they work. I think Simon is asking for clarity so that error like "(x collate A) 
= (y collate B)" can not take place.


What to do with the case of (x = y) where x and y are two columns with 
different collations, then? How do you ensure that it cannot take place? 
And if you can't prevent it, then you need to do *something* in this 
case, and then you can do the exact same thing with "(x collate A) = (y 
collate B)" case - no need to invent new syntax or machinery to try and 
avoid it.



This is similar to JOIN operation, which requires two tables to have common 
column.


JOIN operation requires no such thing. I don't understand this analogy.


Using this analogy, I would write:

select * from t WHERE x = y USING NOCASE


How is this different from

select * from t WHERE x = y COLLATE NOCASE

I only see a superficial difference in the keyword used. What's the 
improvement?



I *do* understand what you and Simon are suggesting. I *do* understand 
that you want some kind of syntax that would hang a collation onto the 
operator, rather than its operands, for certain philosophical and/or 
aesthetic reasons. My point is, this would have to be done *in addition 
to*, not *instead of*, the existing mechanism (because of the whole 
"comparison must automatically pick the collation off the column in 
order to use the index" thing). So it's strictly an added complexity.


The burden is then on the authors of the proposal to demonstrate the 
benefits of the feature you are proposing, and to show that said 
benefits do in fact outweigh the cost of this added complexity. So I 
would ask you the same question I asked Simon: what exactly are the 
benefits of your approach? Is there something that can be done your way 
but can't be done the current way? Is there some bad outcome that's 
possible the current way but prevented your way? Do you just prefer your 
(as yet unspecified) syntax on purely aesthetic grounds?

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Roman Fleysher
Dear Igor,

I was reading with great interest your debate with Simon. I have come to 
respect your opinion by reading your comments. I am not a moderator, but trying 
to imply that many SQLite users are on your side reduced your image in my head. 
I left USSR (where such arguments were plenty) and would prefer to stay away 
from it. I block my eyes and ears so that I can not see/hear further arguments 
even if they are correct and even if I would agree with them if I heard them. 
This is my psychological defense.

On the issue. I find Simon's example

select * from t where x = y;
select * from t where y = x;

extremely disturbing. One has to remember implied rules to interpret this 
statement and/or to compose it properly. Spelling everything out is long, I 
agree, but makes things clear. However, his example is no different from yours 
because 

select count(*) from t where x = 'a';
select count(*) from t where 'a'=x;

works only thanks to 'a' having not collation attached and SQLite using one 
from x instead. This is exactly the point Simon is making: collations are 
"attached" to values, not to operations. From Simon's description, this 
"attachment" takes place on a higher level in SQLite since deep inside 
(comparisons) it is attached to the operator. Thus SQLite transfers collate 
rules from values to operations somewhere on the high level. This transfer is 
the issue that Simon points out.

Logically, I agree with Simon, collate modifier in table definition describes 
how indices should be built, nothing more. Comparisons, as in these examples 
should specify which collation is to be used and if it is different from one 
used for indexing, then yes, Igor, index can not be used. I believe this is the 
present behavior already.

I do not think Simon asks to change the way collations work. He is asking to 
make clear how they work. I think Simon is asking for clarity so that error 
like "(x collate A) = (y collate B)" can not take place. This is similar to 
JOIN operation, which requires two tables to have common column. Using this 
analogy, I would write:

select * from t WHERE x = y USING NOCASE


Roman

From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Igor Tandetnik [i...@tandetnik.org]
Sent: Thursday, August 22, 2013 3:53 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] BETWEEN and explicit collation assignment

On 8/22/2013 3:12 PM, Simon Slavin wrote:
>
> On 22 Aug 2013, at 8:04pm, Igor Tandetnik <i...@tandetnik.org> wrote:
>> [snip]
>
> I pretty much agree with everything you wrote there.  But it has nothing to 
> do with my original objection which was the explicit use of a COLLATE 
> operator inside an expression.

If you agree with everything I wrote, then I don't understand what
exactly you find wrong with the explicit use of a COLLATE operator
inside an expression. Is there something that can be done your way but
can't be done the current way? Is there some bad outcome that's possible
the current way but prevented your way? Do you just prefer your (as yet
unspecified) syntax on purely aesthetic grounds?

Let's assume, for the sake of argument, that the case of "(x collate A)
= (y collate B)" and similar instances of mismatched collations are made
an error (I've conceded that it may be a good idea, so there's no reason
to re-tread this ground).
--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Igor Tandetnik

On 8/22/2013 3:12 PM, Simon Slavin wrote:


On 22 Aug 2013, at 8:04pm, Igor Tandetnik  wrote:

[snip]


I pretty much agree with everything you wrote there.  But it has nothing to do 
with my original objection which was the explicit use of a COLLATE operator 
inside an expression.


If you agree with everything I wrote, then I don't understand what 
exactly you find wrong with the explicit use of a COLLATE operator 
inside an expression. Is there something that can be done your way but 
can't be done the current way? Is there some bad outcome that's possible 
the current way but prevented your way? Do you just prefer your (as yet 
unspecified) syntax on purely aesthetic grounds?


Let's assume, for the sake of argument, that the case of "(x collate A) 
= (y collate B)" and similar instances of mismatched collations are made 
an error (I've conceded that it may be a good idea, so there's no reason 
to re-tread this ground).

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Igor Tandetnik

On 8/22/2013 3:13 PM, Niall O'Reilly wrote:

But let's bring the thread back to the original problem.

What Simon Slavin seems (to me) to be pointing out is that the
counter-intuitive behaviour observed by the OP (Clemens Ladisch)
needs either to be corrected or explicitly documented; he also
seems to be trying to find a way to meet this need.


The behavior observed by Clemens looks like a bug to me. It should be 
fixed, so that the actual behavior matches the documentation. I have no 
quarrel with that.


But that's not what Simon argues:

"I feel that rather than notice a specific error in the implementation 
of BETWEEN [Clemens] has highlighted a conceptual error in SQLite."


In other words: let's not just fix this particular bug and go back to 
business as usual; let's instead completely change the way collations 
work in SQLite.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Clemens Ladisch
Niall O'Reilly wrote:
> What Simon Slavin seems (to me) to be pointing out is that the
> counter-intuitive behaviour observed by the OP (Clemens Ladisch)
> needs either to be corrected or explicitly documented

It worked correctly (i.e., as documented) before 3.7.15, so it's a bug.
Which is already fixed: .


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Niall O'Reilly
On Thu, 22 Aug 2013 13:15:19 -0400
Igor Tandetnik  wrote:

> On 8/22/2013 11:49 AM, Simon Slavin wrote:

> > Step 1 would be remove all ability to specify collation applying to a 
> > single value.
> 
> But it already applies, implicitly, to a single value that happens to be 
> a column name. It seems you want to preserve that, right?

Surely not!  

A column name is not a value, but a label for a set.

The collation associated with a column applies to each subset 
(of that set) whose cardinality is exactly two, and creates an
ordering on the set for which the column name is a label.

But let's bring the thread back to the original problem.

What Simon Slavin seems (to me) to be pointing out is that the 
counter-intuitive behaviour observed by the OP (Clemens Ladisch)
needs either to be corrected or explicitly documented; he also
seems to be trying to find a way to meet this need.

> Hi,
>
> the documentation says (on ):
>
> | The expression "x BETWEEN y and z" is logically equivalent to two
> | comparisons "x >= y AND x <= z" and works with respect to collating
> | functions as if it were two separate comparisons.
>
> However, this is not true when the first operator has an explicit
> collation assignment:
>
> SQLite version 3.7.17 2013-05-20 00:56:22
> Enter ".help" for instructions
> Enter SQL statements terminated with a ";"
> sqlite> create table t(x);
> sqlite> insert into t values('a');
> sqlite> insert into t values('A');
> sqlite> select * from t where x collate nocase between 'a' and 'b';
> a
> sqlite> select * from t where x collate nocase >= 'a' and x collate nocase <=
> 'b';
> a
> A
>
> It works only on the second and third operators:
>
> sqlite> select * from t where x between 'a' collate nocase and 'b' collate
> nocase;
> a
> A
>
> And adding it to the first operator breaks it again:
> sqlite> select * from t where x collate nocase between 'a' collate nocase and
> 'b' collate nocase;
> a 

[I would use "operand" where Clemens uses "operator".]

This behaviour is inconsistent with the documentation (Rule 1 of
section 6.1 of http://www.sqlite.org/datatype3.html#collation) because
(a) the first operand of the BETWEEN operator is precisely the left 
operand of each of the two comparisons to which the BETWEEN operator
is described as being equivalent, and (b) rule 1 just mentioned gives
precedence to the explicit collating function associated with the left
operand of a comparison.  It should not therefore be necessary to
declare a collation for the second and third operands of BETWEEN; one
should rather be able to rely on the collation declared for the first
one.  The observed behaviour indicates that precisely the opposite is 
true in practice.

AFAICS, either the code or the documentation is broken, and either one
needs to be corrected.


Best regards,
Niall O'Reilly

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Simon Slavin

On 22 Aug 2013, at 8:04pm, Igor Tandetnik  wrote:
> [snip]

I pretty much agree with everything you wrote there.  But it has nothing to do 
with my original objection which was the explicit use of a COLLATE operator 
inside an expression.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Igor Tandetnik

On 8/22/2013 2:10 PM, Simon Slavin wrote:

Sorry, hit 'send' too early.

On 22 Aug 2013, at 6:15pm, Igor Tandetnik  wrote:


But again, by what formal mechanism does a property of the column affect the 
behavior of the operator?


I see no reason for it to do that.


So to be clear:

create table t(x text collate nocase);
insert into t values ('A');
select count(*) from t where x = 'a';

Currently, this select statement returns 1. Is it your position that it 
should return 0 instead? Somehow I doubt you would find many SQLite 
users sympathetic to this position.



But my original problem, as stated, isn't with this.  It's just with the way 
that one can explicitly apply a correlation in an expression by using the 
COLLATE operator.


And my point is that, once you have collations on columns and have to 
deal with them anyway, it's not a problem to have collations on any 
other expression: you just deal with them the exact same way.


Now, if your position is that comparison operators shouldn't pay 
attention to collations associated with columns either, then the rest of 
your argument becomes internally consistent. But I don't think I would 
like to live in the world you envision. Consider my earlier example, 
slightly modified:


create table t(x text primary key collate nocase);
select count(*) from t where x = 'a';

There is an implicit index on t(x collate nocase), thanks to the PRIMARY 
KEY constraint. But the statement (in your interpretation) *cannot* use 
that index, since the comparison uses the default BINARY collation, and 
has to resort to full table scan instead. In order to take advantage of 
the index, one must write


select count(*) from t where x =NOCASE 'a';

(or whatever the syntax would be for hanging a collation onto the 
operator). I feel this is a) extremely inconvenient, and b) a severe 
violation of the principle of least astonishment.



About your example where two seemingly equivalent statements like

select * from t where x = y;
select * from t where y = x;

may produce different results - I've already stated a few times that I 
would be sympathetic to the argument that having two operands with 
different collations could be considered a syntax error. With that in 
place, both statements would be invalid, and the issue would fail to 
arise. This can be easily accomplished with a small tweak to existing 
rules, perhaps enabled with a pragma.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Simon Slavin
Sorry, hit 'send' too early.

On 22 Aug 2013, at 6:15pm, Igor Tandetnik  wrote:

> But again, by what formal mechanism does a property of the column affect the 
> behavior of the operator?

I see no reason for it to do that.  I can see why the property of the column 
might affect the behaviour of an index made on that column but there are 
problems if it affects comparisons made using values from that column.  For 
example, the meaning of something like

create table t(x text collate nocase,
y text collate rtrim);
select * from t where x = y;

is not at all clear purely from reading the SQL code.  You have to have 
actually read the picky details.  And the consequences of the picky details are 
that under SQLite3, these two queries:

select * from t where x = y;
select * from t where y = x;

can return different results.  Which I find annoying.

But my original problem, as stated, isn't with this.  It's just with the way 
that one can explicitly apply a correlation in an expression by using the 
COLLATE operator.  I feel that it doesn't reflect the real-world operation of 
correlations by SQLite.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Simon Slavin

On 22 Aug 2013, at 6:15pm, Igor Tandetnik  wrote:

> But again, by what formal mechanism does a property of the column affect the 
> behavior of the operator?

I see no reason for it to do that.  I can see why the property of the column 
might affect the behaviour of an index made on that column but there are 
problems if it affects comparisons made using values from that column.  For 
example, the meaning of something like

create table t(x text collate nocase,
y text collate rtrim);
select * from t where x = y;

is not at all clear purely from reading the SQL code.  You have to have 
actually read the picky details.  And the consequences of the picky details are 
that under SQLite3, these two queries:

select * from t where x = y;
select * from t where y = x;

can return different results.  Which I find annoying.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Igor Tandetnik

On 8/22/2013 11:49 AM, Simon Slavin wrote:


On 22 Aug 2013, at 2:36pm, Igor Tandetnik  wrote:


On 8/22/2013 8:52 AM, Simon Slavin wrote:

Nevertheless do you understand the point I'm trying to make -- that collations 
are a modifier for comparisons not individual values ?


I do understand your point. I still don't understand how it's supposed to apply 
in practice to a situation like

create table t(x text collate nocase);
select * from t where x = 'a';

Here, at the time I specify "collate nocase", there is no comparison it could 
apply to. So what is it a property of, in your thinking?


COLLATE is a property of that column


So sometimes it's a property of a column, and other times it's a 
property of a comparison operator? I predict you are going to have a 
hard time describing this notion in a formal spec.



Your second line doesn't specify a COLLATE operator in its expression, so I 
have no problem with it.


But again, by what formal mechanism does a property of the column affect 
the behavior of the operator? And if you are OK with it doing that, then 
what's wrong with the existing model, which is based on this very 
approach (collation is a property of an expression; behavior of an 
operator depends on collations associated with its operands)?


Let's put it this way: why should there be a fundamental difference 
between an expression (x) where x is a column declared with COLLATE 
NOCASE clause, and an expression ('a' COLLATE NOCASE)? What purpose 
would such a distinction serve? Why exactly are you OK with the former, 
but not the latter?



By what mechanism does it end up applying to x='a' comparison (I assume to do 
want the statement to return rows both with 'a' and 'A' in column x)? How would 
you modify the formal spec at http://sqlite.org/datatype3.html to lead to your 
desired outcome? Precise wording matters.


I don't understand why binary comparison operators are on that page at all.  
They aren't used directly as column definitions, only as parts of expressions, 
and expressions are defined on another page.  If you remove mention of 
comparison operators from that page, the rest of that page is fine.

Precise wording as you requested ?  Remove all of section 6.1 apart from the 
last paragraph.


But again, I assume you do want the expression (x='a') to sometimes 
evaluate to true when column x contains the value 'A', and other times 
evaluate to false. How would this happen? It doesn't matter which page 
describes the behavior: if it's not 
http://www.sqlite.org/datatype3.html, then I imagine it would be 
somewhere on http://www.sqlite.org/lang_expr.html. What should it say?



As I said, I could relate to your point of view better if you just did this: 
inhttp://sqlite.org/datatype3.html section 6.1, replaced two occurrences of "with precedence 
to the left operand" with "It's an error if two operands have different collations". 
This keeps the existing, well defined mechanisms intact, while neatly excluding the case you seem 
to find most objectionable.


As I wrote originally, my problem is not with the use of COLLATE in column 
definitions, it's with its use in expressions.  So my problem in documentation 
of SQLite doesn't come in datatype3, it comes in



Step 1 would be remove all ability to specify collation applying to a single 
value.


But it already applies, implicitly, to a single value that happens to be 
a column name. It seems you want to preserve that, right?



 This is just two short paragraphs on that page (search for the word 'collate').


One of those paragraphs says: "See the detailed discussion on collating 
sequences in the Datatype In SQLite3 document for additional 
information." You claim that you want to excise that very discussion 
from "datatypes" article - doesn't that mean that you would have to move 
it here, rather than incorporating it by reference? The behavior must be 
described *somewhere*.



This would remove all trace of the use of COLLATE I have a problem with, i.e. 
where it can be used in such a way as to be applied to a single value and not a 
comparison.


However, a collation would still apply to a single value that happens to 
be a column name, and comparison operators where such a value is an 
operand would have to take that into account somehow, wouldn't they? You 
are not saying that (x='a') should always use BINARY collation 
regardless of how column x was declared, are you? And if you define some 
mechanism by which the comparison takes the collation of x into account, 
then I don't understand why it's fundamentally awful and wrong to extend 
that same mechanism to the expression (('A' collate nocase) = 'a').


You instead suggest the invention of a separate mechanism whereby the 
collation is assigned directly to the operator rather than inferred from 
its operands; but you also need to keep the original mechanism around. 
So now you have to specify two mechanisms, 

Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Igor Tandetnik

On 8/22/2013 8:52 AM, Simon Slavin wrote:

Nevertheless do you understand the point I'm trying to make -- that collations 
are a modifier for comparisons not individual values ?


I do understand your point. I still don't understand how it's supposed 
to apply in practice to a situation like


create table t(x text collate nocase);
select * from t where x = 'a';

Here, at the time I specify "collate nocase", there is no comparison it 
could apply to. So what is it a property of, in your thinking? By what 
mechanism does it end up applying to x='a' comparison (I assume to do 
want the statement to return rows both with 'a' and 'A' in column x)? 
How would you modify the formal spec at http://sqlite.org/datatype3.html 
to lead to your desired outcome? Precise wording matters.


As I said, I could relate to your point of view better if you just did 
this: in http://sqlite.org/datatype3.html section 6.1, replaced two 
occurrences of "with precedence to the left operand" with "It's an error 
if two operands have different collations". This keeps the existing, 
well defined mechanisms intact, while neatly excluding the case you seem 
to find most objectionable.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Richard Hipp
On Thu, Aug 22, 2013 at 8:52 AM, Simon Slavin  wrote:

>
> Nevertheless do you understand the point I'm trying to make -- that
> collations are a modifier for comparisons not individual values ?
>
> The collation is a property of the value.  It is part of the datatype of
the value.

When you do X==Y and X and Y have different collations, SQLite chooses the
collation on the left to use for the comparison.  (PostgreSQL, being more
rigid in its typing rules, throws an error, if I'm not mistaken.)  If X
does not have a specified collation, then the collation of Y is used
instead.  If neither X nor Y have collations, then the default collation
(BINARY) is used.


-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Simon Slavin

On 22 Aug 2013, at 1:47pm, Igor Tandetnik  wrote:

> You can, actually: hulkify(x) := (case when x = 'the hulk' then 'B' else 'A' 
> end) . With such a function, "a = b collate muchStronger" is equivalent to 
> "hulkify(a)  = hulkify(b) collate BINARY". In other words, hulkify(x) is to 
> muchStronger what upper(x) is to NOCASE.

The Well Ordering Principle states that you can always devise such a function, 
given the requirements for SQLite collations.

Nevertheless do you understand the point I'm trying to make -- that collations 
are a modifier for comparisons not individual values ?

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Igor Tandetnik

On 8/22/2013 8:36 AM, Simon Slavin wrote:

Collation 'muchStronger':
If a = b return 0, else ...
If b is 'the hulk' return -1, else ...
If a is 'the hulk' return 1, else ...
return 0

(translation: people are roughly as strong as one-another, except for The Hulk 
who is stronger than everyone except himself)

This is a legitimate collation under SQLite rules, but you cannot apply it 
without knowing both of the values.


You can, actually: hulkify(x) := (case when x = 'the hulk' then 'B' else 
'A' end) . With such a function, "a = b collate muchStronger" is 
equivalent to "hulkify(a)  = hulkify(b) collate BINARY". In other words, 
hulkify(x) is to muchStronger what upper(x) is to NOCASE.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Simon Slavin

On 22 Aug 2013, at 2:55am, James K. Lowden  wrote:

> Simon Slavin  wrote:
> 
>> My problem is not with COLLATE in general.  It's with expressions.
>> Table definitions are fine.  Index definitions are fine.  It's purely
>> that my understanding of the documention says that something like this
>> 
>> "ABC" COLLATE COL1 = "3F" COLLATE COL2
> 
> (I don't understand what that syntax is supposed to mean.

Good.  Because it's nonsense.  What it implies is that COLLAGE is a unary 
postfix operator that assigns a collating sequence to an expression.  In other 
words that "ABC" is an expression, and you can assign a collation to it, and 
that "3F" is another expression and you can assign a different collation to 
that, and that having done so that testing to see whether they equal one 
another makes some kind of sense.  Which it obviously doesn't.

Yet the above syntax is what the documentation for 'expr' says should be 
allowed.  Simpler still, it implies that

"ABC" COLLATE COL1

means something.  It doesn't.  Collation in SQLite makes sense only when you 
compare two things.  That's how the function works: you have to feed it two 
values, you can't call it with just one.

> AFAIK, the
> only operands to COLLATE are BINARY, NOCASE, and RTRIM.) 
> 
>> should be allowed, and I don't agree.
> 
> You're quite right to say that collation is property of the comparison,
> not the data.  It's obvious from very fact that the same data can be
> compared in different ways.  
> 
> A column's collation in CREATE TABLE is a property of the column, not
> of the data.  That is, it's a property of the data *type*, the domain
> to which the values in the column belong.  It informs the system how to
> compare two values in the *same* column.  
> 
> The question arises: how to compare two columns with different
> collations?  Because they're drawn from different domains -- per the
> column defintions -- they have different types.  To compare two
> different types requires conversion.  That conversion may be implicit
> or explicit.  (And it many fail; some types are incommensuate.)

I feel that this is even more proof that collation type is not a property of a 
value.  Trying to figure out the consequences of a value having inherent 
collation leads to the sort of problem you describe. 

> The COLLATE operator is the sole postfix operator in SQLite's SQL, and
> weird because it appears late but binds early.  I would suggest
> instead it mimic or become part of CAST, or become its own operator.
> For example:
> 
> 1.where COLLATE( x AS NOCASE )
>   between COLLATE( 'a' AS NOCASE )
>   and COLLATE( 'b' AS NOCASE )

Again, you are trying to apply a collation to a value.  And because BETWEEN 
takes three operands you are being given the opportunity to apply three 
collations for one condition.  Which is nonsense.  So the syntax shouldn't 
allow it or even suggest it would work.  You should be casting the 'BETWEEN', 
not the operands you're supplying to it.

One problem is that your example uses a collation which can be neatly expressed 
as a unary function call.  This is a legitimate collation but it's too simple 
to force you to think the problem through.  Consider instead the following 
collation:

Collation 'muchStronger':
If a = b return 0, else ...
If b is 'the hulk' return -1, else ...
If a is 'the hulk' return 1, else ...
return 0

(translation: people are roughly as strong as one-another, except for The Hulk 
who is stronger than everyone except himself)

This is a legitimate collation under SQLite rules, but you cannot apply it 
without knowing both of the values.  It is a better example than NOCASE for 
thinking about the problem as long as you don't turn it into a unary function.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-21 Thread James K. Lowden
On Wed, 21 Aug 2013 20:26:30 +0100
Simon Slavin  wrote:

> My problem is not with COLLATE in general.  It's with expressions.
> Table definitions are fine.  Index definitions are fine.  It's purely
> that my understanding of the documention says that something like this
> 
> "ABC" COLLATE COL1 = "3F" COLLATE COL2

(I don't understand what that syntax is supposed to mean.  AFAIK, the
only operands to COLLATE are BINARY, NOCASE, and RTRIM.) 

> should be allowed, and I don't agree.

You're quite right to say that collation is property of the comparison,
not the data.  It's obvious from very fact that the same data can be
compared in different ways.  

A column's collation in CREATE TABLE is a property of the column, not
of the data.  That is, it's a property of the data *type*, the domain
to which the values in the column belong.  It informs the system how to
compare two values in the *same* column.  

The question arises: how to compare two columns with different
collations?  Because they're drawn from different domains -- per the
column defintions -- they have different types.  To compare two
different types requires conversion.  That conversion may be implicit
or explicit.  (And it many fail; some types are incommensuate.)  

The question then becomes whether to require explicit conversion and,
if so, what syntax would be clearest.  I think you'd say, and I agree,
that implicit conversion isn't attractive because any choice would be
idiosyncratic, and such choices are traps for the unwary.  

Regarding syntax, the current situation is less than idea.  I tried the
OP's query and got different results:

SQLite version 3.7.13 2012-06-11 02:05:22
...
sqlite> create table t(x);
sqlite> insert into t values('a');
sqlite> insert into t values('A');
sqlite> select * from t where x collate nocase between 'a' and 'b';
x 
--
a 
A 

Just for giggles, I tried another variation: 

sqlite> select * from t where x between 'a' and 'b' collate nocase;
x 
--
a 

The COLLATE operator is the sole postfix operator in SQLite's SQL, and
weird because it appears late but binds early.  I would suggest
instead it mimic or become part of CAST, or become its own operator.
For example:

1.  where COLLATE( x AS NOCASE )
between COLLATE( 'a' AS NOCASE )
and COLLATE( 'b' AS NOCASE )

Making it part of CAST emphasizes the fact that we're dealing in types:

2.  where CAST( x as COLLATE NOCASE ) 
between CASE( 'a' AS COLLATE NOCASE )
and CASE( 'b' AS COLLATE NOCASE )

But isn't this clearer in any case (no pun intended)? 

3.  where upper(x) between 'A' and 'B'

which suggests that each collation could become an operator in its own
right:

4.  where NOCASE(x) between NOCASE('a') and NOCASE('b')

because it allows for future expansion as more collations are added. 

--jkl


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-21 Thread Igor Tandetnik

On 8/21/2013 3:26 PM, Simon Slavin wrote:

My problem is not with COLLATE in general.  It's with expressions.  Table 
definitions are fine.  Index definitions are fine.  It's purely that my 
understanding of the documention says that something like this

"ABC" COLLATE COL1 = "3F" COLLATE COL2

should be allowed, and I don't agree.


Should you be able to write

create table t1(x1 collate col1);
create table t2(x2 collate col2);
select * from t1, t2 where x1 = x2;

If that is to be allowed, then SQLite would already need a mechanism to 
deal with your case, and then there would be no reason to explicitly 
prohibit it - just an added complexity.


Now, perhaps what you are trying to say is that it should be a syntax 
error to apply a comparison to two operands with different collations. I 
can see how such a rule can be weaved into the existing behavior in a 
consistent way. I don't think it would fly, if only for reasons of 
backward compatibility and compatibility with other database systems, 
but at least this argument is defensible.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-21 Thread Simon Slavin

On 21 Aug 2013, at 8:22pm, Igor Tandetnik  wrote:

> On 8/21/2013 2:55 PM, Simon Slavin wrote:
>> 
>> On 21 Aug 2013, at 5:02pm, Igor Tandetnik  wrote:
>> 
>>> I imagine you'd still want to be able to put COLLATE clause on the column 
>>> definition, as in "create table t (x collate NOCASE);". How is this 
>>> supposed to work in your hypothetical new world?
>> 
>> It works the same as it does now, I think.
> 
> Precisely. So why would you want to invent a completely different mechanism, 
> when you already have one and have to maintain it anyway?

My problem is not with COLLATE in general.  It's with expressions.  Table 
definitions are fine.  Index definitions are fine.  It's purely that my 
understanding of the documention says that something like this

"ABC" COLLATE COL1 = "3F" COLLATE COL2

should be allowed, and I don't agree.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-21 Thread Igor Tandetnik

On 8/21/2013 2:55 PM, Simon Slavin wrote:


On 21 Aug 2013, at 5:02pm, Igor Tandetnik  wrote:


I imagine you'd still want to be able to put COLLATE clause on the column definition, as 
in "create table t (x collate NOCASE);". How is this supposed to work in your 
hypothetical new world?


It works the same as it does now, I think.


Precisely. So why would you want to invent a completely different 
mechanism, when you already have one and have to maintain it anyway?



 The COLLATE clause in that position isn't used for storing the value itself.  
It's used later on for when two values in that column are compared with 
one-another.


... or with values in some other column of some other table, or indeed 
with arbitrary expressions.



Collation is a property of the value, similar to type and affinity - it must 
be, to allow this kind of annotation. Along with other properties, collation 
then affects the behavior of operators acting on the value.


I'm going to let the SQLite experts argue over that one.  I don't think a 
COLLATE can be a property of a value.  If it was you could compare two 
different values with different collations.  Perhaps it's a property of an 
index component.


I was imprecise. Collation is a property of an expression. It can be 
determined, for every expression, at statement prepare time, from the 
syntax of the statement and the database schema. In that, it's similar 
to affinity, but not to type (which is in fact a property of the value: 
the same expression may evaluate to values of different types at 
different points in the statement's execution).

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-21 Thread Simon Slavin

On 21 Aug 2013, at 5:02pm, Igor Tandetnik  wrote:

> I imagine you'd still want to be able to put COLLATE clause on the column 
> definition, as in "create table t (x collate NOCASE);". How is this supposed 
> to work in your hypothetical new world?

It works the same as it does now, I think.  The COLLATE clause in that position 
isn't used for storing the value itself.  It's used later on for when two 
values in that column are compared with one-another.  For example, as the 
default COLLATE for an index defined on that column.  Which would still work 
the same way.

> Collation is a property of the value, similar to type and affinity - it must 
> be, to allow this kind of annotation. Along with other properties, collation 
> then affects the behavior of operators acting on the value.

I'm going to let the SQLite experts argue over that one.  I don't think a 
COLLATE can be a property of a value.  If it was you could compare two 
different values with different collations.  Perhaps it's a property of an 
index component.


On 21 Aug 2013, at 5:11pm, Dan Kennedy  wrote:

> ORDER BY and GROUP BY clauses to consider as well. You want to do:
> 
>  SELECT ... FROM ... ORDER BY  COLLATE nocase;

Yes, those would both make sense.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-21 Thread Dan Kennedy

On 08/21/2013 11:02 PM, Igor Tandetnik wrote:

On 8/21/2013 11:17 AM, Simon Slavin wrote:
Given that this level of implementation is correct I feel that any 
error in SQLite's handling of COLLATE is closer to the expression 
parsing level than the low level implementation.  The low level 
implementation is fine.  The problem would appear to be in




which states

"The COLLATE operator is a unary postfix operator that assigns a 
collating sequence to an expression."


This is bad.  It says that COLLATE modifies a value.  It shouldn't.  
COLLATE is not a value operator, it's an operator operator.  It 
modifies the '>', not a value one side of it.


I imagine you'd still want to be able to put COLLATE clause on the 
column definition, as in "create table t (x collate NOCASE);". How is 
this supposed to work in your hypothetical new world?


Collation is a property of the value, similar to type and affinity - 
it must be, to allow this kind of annotation. Along with other 
properties, collation then affects the behavior of operators acting on 
the value.


ORDER BY and GROUP BY clauses to consider as well. You want to do:

  SELECT ... FROM ... ORDER BY  COLLATE nocase;

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-21 Thread Igor Tandetnik

On 8/21/2013 11:17 AM, Simon Slavin wrote:

Given that this level of implementation is correct I feel that any error in 
SQLite's handling of COLLATE is closer to the expression parsing level than the 
low level implementation.  The low level implementation is fine.  The problem 
would appear to be in



which states

"The COLLATE operator is a unary postfix operator that assigns a collating sequence 
to an expression."

This is bad.  It says that COLLATE modifies a value.  It shouldn't.  COLLATE is 
not a value operator, it's an operator operator.  It modifies the '>', not a 
value one side of it.


I imagine you'd still want to be able to put COLLATE clause on the 
column definition, as in "create table t (x collate NOCASE);". How is 
this supposed to work in your hypothetical new world?


Collation is a property of the value, similar to type and affinity - it 
must be, to allow this kind of annotation. Along with other properties, 
collation then affects the behavior of operators acting on the value.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-21 Thread Simon Slavin
Problem
---

I feel that Clemens has brought up an important point, but I feel that rather 
than notice a specific error in the implementation of BETWEEN he has 
highlighted a conceptual error in SQLite.  Consider this comparison:

"Albert" > "albert"

It would be inappropriate to write something like

"Albert" COLLATE MYCOL1 > "albert" COLLATE MYCOL2

which means

("Albert" COLLATE MYCOL1) > ("albert" COLLATE MYCOL2)

Such a comparison, where each value is collated using a different function, is 
absurd.  The "COLLATE" mechanism is about the task of comparison, not the 
values.  It doesn't convert a value into another value, it tells SQLite how to 
compare two values.  In fact rather than the lines above the real way to 
express COLLATE would be

"Albert" > COLLATE MYCOL3 "albert"

which means something like

"Albert" (> COLLATE MYCOL3) "albert"

though you might prefer to express it

("Albert" > "303" COLLATE MYCOL3)

or perhaps

("Albert" > "303") COLLATE MYCOL3

The COLLATE clause affects the comparison, not the values.

So to extend this to the problem Clemens observed ...

x collate nocase between 'a' and 'b'

really means something like

x (between COLLATE NOCASE) 'a' and 'b'

though you might prefer to express it

(x between 'a' and 'b' COLLATE NOCASE)

or perhaps

(x between 'a' and 'b') COLLATE NOCASE

in all phrasings, the COLLATE clause applies to the comparison between x and 
'a', and to the comparison between x and 'b'.

Remedy
--

Looking at how COLLATE is implemented in SQLite



the key is the callback function.  And that works correctly: you pass it the 
two values, and it returns the result: negative, zero or positive.  And you 
pass it both values, you can't collate one value one way and another the other 
way.  Anything it might do to the values to arrive at the result is purely an 
internal matter.  This is the way I feel it should work.

Given that this level of implementation is correct I feel that any error in 
SQLite's handling of COLLATE is closer to the expression parsing level than the 
low level implementation.  The low level implementation is fine.  The problem 
would appear to be in



which states

"The COLLATE operator is a unary postfix operator that assigns a collating 
sequence to an expression."

This is bad.  It says that COLLATE modifies a value.  It shouldn't.  COLLATE is 
not a value operator, it's an operator operator.  It modifies the '>', not a 
value one side of it.

It's too late to contrafit this into SQLite3, but I'm wondering whether SQLite4 
might have the collation operator rethought along these lines.  I bet it 
results in a simpler parse tree and simpler code.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] BETWEEN and explicit collation assignment

2013-08-21 Thread Clemens Ladisch
Hi,

the documentation says (on ):
| The expression "x BETWEEN y and z" is logically equivalent to two
| comparisons "x >= y AND x <= z" and works with respect to collating
| functions as if it were two separate comparisons.

However, this is not true when the first operator has an explicit
collation assignment:

  SQLite version 3.7.17 2013-05-20 00:56:22
  Enter ".help" for instructions
  Enter SQL statements terminated with a ";"
  sqlite> create table t(x);
  sqlite> insert into t values('a');
  sqlite> insert into t values('A');
  sqlite> select * from t where x collate nocase between 'a' and 'b';
  a
  sqlite> select * from t where x collate nocase >= 'a' and x collate nocase <= 
'b';
  a
  A

It works only on the second and third operators:

  sqlite> select * from t where x between 'a' collate nocase and 'b' collate 
nocase;
  a
  A

And adding it to the first operator breaks it again:

  sqlite> select * from t where x collate nocase between 'a' collate nocase and 
'b' collate nocase;
  a


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users