subject:"\[sqlite\] BETWEEN and explicit collation assignment"

I would not either. We can agree to agree. But clarity requires this. It is 
like "the" in English. 


From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Igor Tandetnik [i...@tandetnik.org]
Sent: Thursday, August 22, 2013 9:37 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] BETWEEN and explicit collation assignment

On 8/22/2013 9:01 PM, Roman Fleysher wrote:
>> create table t(x text collate nocase);
>> insert into t values ('A');
>> select count(*) from t where x = 'a';
>
>> In your opinion, what result should this select statement produce? 1, as
>> it does now? 0, as Simon's approach seems to imply? Something else?
>
> Since logically it is the operation that needs a modifier, then it should be 
> supplied and not derived from the operands. If I put aside backward 
> compatibility and take position of purity and clarity to the extreme (because 
> taking extreme positions clarifies thinking), then this should fail since no 
> rule to compare is fully specified. The "=" must be accompanied by a 
> modifier. Once the modifier is supplied, comparison can proceed and index can 
> be identified.
>
> Thus, I would use lengthy version for clarity:
>
> create table t(x text)
> create index t(x collate nocase)
> create index t(x collate someothercase)
> insert into t values ('A');
> select count(*) from t where x = 'a' nocase;

Well, I personally would not want to use a DBMS that requires me to do
that for every single comparison operator in every single query I use. I
suppose we would have to agree to disagree here.
--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] BETWEEN and explicit collation assignment


On 8/22/2013 9:01 PM, Roman Fleysher wrote:

create table t(x text collate nocase);
insert into t values ('A');
select count(*) from t where x = 'a';



In your opinion, what result should this select statement produce? 1, as
it does now? 0, as Simon's approach seems to imply? Something else?


Since logically it is the operation that needs a modifier, then it should be supplied and 
not derived from the operands. If I put aside backward compatibility and take position of 
purity and clarity to the extreme (because taking extreme positions clarifies thinking), 
then this should fail since no rule to compare is fully specified. The "=" must 
be accompanied by a modifier. Once the modifier is supplied, comparison can proceed and 
index can be identified.

Thus, I would use lengthy version for clarity:

create table t(x text)
create index t(x collate nocase)
create index t(x collate someothercase)
insert into t values ('A');
select count(*) from t where x = 'a' nocase;


Well, I personally would not want to use a DBMS that requires me to do 
that for every single comparison operator in every single query I use. I 
suppose we would have to agree to disagree here.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread James K. Lowden

On Thu, 22 Aug 2013 13:36:00 +0100
Simon Slavin  wrote:

> > 1.  where COLLATE( x AS NOCASE )
> > between COLLATE( 'a' AS NOCASE )
> > and COLLATE( 'b' AS NOCASE )
> 
> Again, you are trying to apply a collation to a value.  And because
> BETWEEN takes three operands you are being given the opportunity to
> apply three collations for one condition.  Which is nonsense.  So the
> syntax shouldn't allow it or even suggest it would work.  You should
> be casting the 'BETWEEN', not the operands you're supplying to it.

There are two issues:

1.  Is collation a property of the value, its type, or the operation? 
2.  Should the implied collation as currently implemented be retained,
or should it be replaced?  

Before we can consider what syntax to use, we have to agree on what the
semantics are, on what is means to compare two things.  So let's
dispense with item #2 until we agree on #1.  

Collation is not a property of the operation.  There is no other kind
of '=' than equality.  Equal is absolute; there is no modifier for it.  

I understand your temptation.  I've used strcasecmp(3) a few times
myself.  But when you realize there's no such thing in SQLite or in the
relational model as an "operator modifier", you'll see the light.  

Collation is not a property of the value.  Strings are just strings.
They have an *encoding*, necessarily, but not a collation.  

Collation therefore must, by process of elimination, be a property of
the type.  And a good thing, too, because it's a property of the
column, and a column is a type.  

Forget indexes; they're a feature of the implementation.  Think
logically, *semantically*: when are two things equal?  Two values in a
column either are or are not equal.  What determines what "equal"
*means*?  The type!  

The question of whether 

A = B 

is true or false rests *not* on "how they are compared" but on *what*
they are.  The same is true for 

'abc' = NAME 

To make that obvious, let's take a little mathematical excursion.  

Ask yourself how this is evaluated:

where 1 = '1'

SQL is logical (or so we like to think), and we're being asked to
compare a number to a string, an apple to an orange.  What to do?  

There are two ways to treat that: report a type error, or perform a
type conversion.  Currently, as it happens, SQLite does something
rather surprising: 

sqlite> select 1 = '1', 1 * '1', 1 - '-1', 1 + '1', 1 / '1', 0 = '1';
1 = '1' 1 * '1' 1 - '-1'1 + '1' 1 / '1' 0 = '1'   
--  --  --  --  --  --
0   1   2   2   1   0 

As explained in http://www.sqlite.org/datatype3.html, "mathematical"
operators have different rules from "comparison" operators.
Mathematical binary operators demand two operands of the same type, and
the conversion rule is that they "cast both operands to the NUMERIC
storage class".  No such rule exists for logical operators; afaict the
comparison of strings to numbers is specified.  On the evidence, no
string is equal to any number, and no implicit conversion is defined.  

Note well: the operand didn't reach out and bang the operator on the
head to modify its behavior.  The operator converted the operand to a
comparable type.  Or not, as the case may be.  

All to say something simple but not obvious: the rules of equality --
of any comparision -- are governed by the type being compared.  

Now keep that in mind as you think about strings and their collations.
when we evaluate:

where 'abc' = NAME

we must know:

1.  The collation of NAME (not of the values in NAME)
2.  The collation of 'abc'
3.  The rules of conversion

Dr. Hipp told us, "If X does not have a specified collation, then the
collation of Y is used instead", which I would express as, "if only one
of two types has the default collation, it is converted to the other
type".  Collations aren't "used"; they're *honored*!  

By that rule, if NAME has the default collation, the two operands are
of the same type and can be compared.  Else, if NAME's type uses a
collation other than the default, 'abc' is converted to that type and
the comparison is performed.  

That is the easiest way -- and, by the grace of Codd, the right way --
to think about it.  Get your types in order, and then compare.  It
works for any comparison, and any type, and our friend equality remains
unbothered.  And unmodified.  

Regards, 

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] BETWEEN and explicit collation assignment

> create table t(x text collate nocase);
> insert into t values ('A');
> select count(*) from t where x = 'a';

> In your opinion, what result should this select statement produce? 1, as 
> it does now? 0, as Simon's approach seems to imply? Something else?

Since logically it is the operation that needs a modifier, then it should be 
supplied and not derived from the operands. If I put aside backward 
compatibility and take position of purity and clarity to the extreme (because 
taking extreme positions clarifies thinking), then this should fail since no 
rule to compare is fully specified. The "=" must be accompanied by a modifier. 
Once the modifier is supplied, comparison can proceed and index can be 
identified. 

Thus, I would use lengthy version for clarity:

create table t(x text)
create index t(x collate nocase)
create index t(x collate someothercase)
insert into t values ('A');
select count(*) from t where x = 'a' nocase;

Roman

From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Igor Tandetnik [i...@tandetnik.org]
Sent: Thursday, August 22, 2013 8:14 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] BETWEEN and explicit collation assignment

On 8/22/2013 6:59 PM, Roman Fleysher wrote:
> First, I do not expect any changes to SQLite to be made as a result of this 
> discussion.

You might not, but Simon seems to:

"It's too late to contrafit this into SQLite3, but I'm wondering whether
SQLite4 might have the collation operator rethought along these lines."

> You are right, the discussion is largely philosophical. (I do not imply 
> "philosophical" means "impractical", which are sometimes equated.) But the 
> questions you ask are also philosophical: for is there something that can be 
> done in C++ that can not be done in Fortran or Assembler?

There are lots of things that can be done much more quickly,
conveniently and reliably in C++ than in Fortran or Assembler. The
cost-benefit analysis has been performed, and the market has spoken.

But let's even set the costs aside: what are the benefits? In what way
is your proposed approach an improvement over the status quo? Sorry for
being dense here, but I honestly do not understand what advantages you
believe you are gaining. Could you explain it to me like I'm five?

> These are the questions, as I understand, Simon is asking. Can collate rule 
> be expressed in such a way that it is clear it modifies the operation not the 
> value?

This question would perhaps be interesting to explore if there were
widespread confusion on this topic. But I'm not aware of any evidence
that such confusion exists. I have followed this group for years (and so
did Simon): in my experience, the issue simply fails to arise in
practice. Or indeed in theory: I don't recall any prior discussion of
this topic, ever.

> Because COLLATE should be a modifier to comparisons, then COLLATE should be a 
> modifier to index. We agree here, I think. I do not know if it is possible to 
> build two indices on the same column using two different collate rules.

Yes it is possible. You can specify a COLLATE clause on a column in
CREATE INDEX statement. If you don't, then the collation assigned to
that column in CREATE TABLE, if any, is used. If neither place specifies
a collation, the default is BINARY.

> Assuming comparison should specify collate rule (implicitly or explicitly 
> without which comparison is impossible)...

That's precisely where the disagreement lies. Should comparison derive
its collation implicitly from its operands? I say yes, and that's
exactly the mechanism currently in place. Simon seems to say no, and
wants to introduce a different mechanism. What's your position?

Once again, a concrete example:

create table t(x text collate nocase);
insert into t values ('A');
select count(*) from t where x = 'a';

In your opinion, what result should this select statement produce? 1, as
it does now? 0, as Simon's approach seems to imply? Something else?

> Thus the collation should not be picked up from the column, it must be picked 
> up from the comparison.

Once again, see the example above. Are you willing to argue that 0 is
the correct result there? That would seem to be the logical conclusion
of your statement.

> It is possible to adopt column's collate rule, if it is unique, to be used by 
> default if no rule is explicitly specified, as you would like to see.

So, should the collation be picked up from the column, or should it not
be? You are making two contradictory statements right next to each
other. I'm confused.

> It is then clear that this adaptation is a shortcut which should raise FAIL 
> when columns with different collations are compared without explicit rule.

Yes, I've already conceded that the case where operands have conflicting
collations could be treated

Re: [sqlite] BETWEEN and explicit collation assignment


On 8/22/2013 8:41 PM, Simon Slavin wrote:


On 23 Aug 2013, at 1:14am, Igor Tandetnik  wrote:


Once again, a concrete example:

create table t(x text collate nocase);
insert into t values ('A');
select count(*) from t where x = 'a';

In your opinion, what result should this select statement produce? 1, as it 
does now? 0, as Simon's approach seems to imply? Something else?


You seem to thing that I had a problem with this.  I don't.  If you go back to 
my earlier post you will see that I was objecting to the way in which the 
COLLATE token was used in expressions.  The above does not use COLLATE in an 
expression and has nothing to do with my post.


I asked: "does a property of the column affect the behavior of the 
operator?" And you answered: "I see no reason for it to do that." This 
example demonstrates a reason for it to do that, does it not?

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] BETWEEN and explicit collation assignment

On 23 Aug 2013, at 1:14am, Igor Tandetnik  wrote:

> Once again, a concrete example:
> 
> create table t(x text collate nocase);
> insert into t values ('A');
> select count(*) from t where x = 'a';
> 
> In your opinion, what result should this select statement produce? 1, as it 
> does now? 0, as Simon's approach seems to imply? Something else?

You seem to thing that I had a problem with this.  I don't.  If you go back to 
my earlier post you will see that I was objecting to the way in which the 
COLLATE token was used in expressions.  The above does not use COLLATE in an 
expression and has nothing to do with my post.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] BETWEEN and explicit collation assignment

On 8/22/2013 6:59 PM, Roman Fleysher wrote:

First, I do not expect any changes to SQLite to be made as a result of this
discussion.

You might not, but Simon seems to:

"It's too late to contrafit this into SQLite3, but I'm wondering whether
SQLite4 might have the collation operator rethought along these lines."

You are right, the discussion is largely philosophical. (I do not imply "philosophical"
means "impractical", which are sometimes equated.) But the questions you ask are also
philosophical: for is there something that can be done in C++ that can not be done in Fortran or
Assembler?

There are lots of things that can be done much more quickly,
conveniently and reliably in C++ than in Fortran or Assembler. The
cost-benefit analysis has been performed, and the market has spoken.

But let's even set the costs aside: what are the benefits? In what way
is your proposed approach an improvement over the status quo? Sorry for
being dense here, but I honestly do not understand what advantages you
believe you are gaining. Could you explain it to me like I'm five?

These are the questions, as I understand, Simon is asking. Can collate rule be
expressed in such a way that it is clear it modifies the operation not the
value?

This question would perhaps be interesting to explore if there were
widespread confusion on this topic. But I'm not aware of any evidence
that such confusion exists. I have followed this group for years (and so
did Simon): in my experience, the issue simply fails to arise in
practice. Or indeed in theory: I don't recall any prior discussion of
this topic, ever.

Because COLLATE should be a modifier to comparisons, then COLLATE should be a
modifier to index. We agree here, I think. I do not know if it is possible to
build two indices on the same column using two different collate rules.

Yes it is possible. You can specify a COLLATE clause on a column in
CREATE INDEX statement. If you don't, then the collation assigned to
that column in CREATE TABLE, if any, is used. If neither place specifies
a collation, the default is BINARY.

Assuming comparison should specify collate rule (implicitly or explicitly
without which comparison is impossible)...

That's precisely where the disagreement lies. Should comparison derive
its collation implicitly from its operands? I say yes, and that's
exactly the mechanism currently in place. Simon seems to say no, and
wants to introduce a different mechanism. What's your position?

Once again, a concrete example:

create table t(x text collate nocase);
insert into t values ('A');
select count(*) from t where x = 'a';

In your opinion, what result should this select statement produce? 1, as
it does now? 0, as Simon's approach seems to imply? Something else?

Thus the collation should not be picked up from the column, it must be picked
up from the comparison.

Once again, see the example above. Are you willing to argue that 0 is
the correct result there? That would seem to be the logical conclusion
of your statement.

It is possible to adopt column's collate rule, if it is unique, to be used by
default if no rule is explicitly specified, as you would like to see.

So, should the collation be picked up from the column, or should it not
be? You are making two contradictory statements right next to each
other. I'm confused.

It is then clear that this adaptation is a shortcut which should raise FAIL
when columns with different collations are compared without explicit rule.

Yes, I've already conceded that the case where operands have conflicting
collations could be treated as an error. But that doesn't requite new
syntax or new machinery, just a minor tweak to existing rules. It
doesn't need to raise FAIL at query execution time - all the information
is available at prepare, it could be treated as a syntax error.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] BETWEEN and explicit collation assignment

Dear Igor,

First, I do not expect any changes to SQLite to be made as a result of this 
discussion. You are right, the discussion is largely philosophical. (I do not 
imply "philosophical" means "impractical", which are sometimes equated.) But 
the questions you ask are also philosophical: for is there something that can 
be done in C++ that can not be done in Fortran or Assembler? I am sure very 
little. Nevertheless, C++ came to exist. Thus these questions driven by 
practical considerations --- I understand --- must be replaced by some others. 
For example: can the desired action be expressed more clearly, elegantly, 
fool-proof, efficiently, etc in the language. I believe these sorts of 
questions led to development of C++, Java, etc not what could or could not be 
done in Fortran and Assembler. These are the questions, as I understand, Simon 
is asking. Can collate rule be expressed in such a way that it is clear it 
modifies the operation not the value? Perhaps, I should not speak for Simon, 
this is 
 my interpretation of his writing.

Because COLLATE should be a modifier to comparisons, then COLLATE should be a 
modifier to index. We agree here, I think. I do not know if it is possible to 
build two indices on the same column using two different collate rules. 
Assuming comparison should specify collate rule (implicitly or explicitly 
without which comparison is impossible) the same rule will identify the 
corresponding index. If available, this index would be used. Thus the collation 
should not be picked up from the column, it must be picked up from the 
comparison. It is possible to adopt column's collate rule, if it is unique, to 
be used by default if no rule is explicitly specified, as you would like to 
see. It is then clear that this adaptation is a shortcut which should raise 
FAIL when columns with different collations are compared without explicit rule. 

As a matter of taste, I prefer to tell exactly what I want to happen, rather 
than to have software figure out what I meant. I can not be sure it will figure 
what I wanted. Historically SQL statements were typed by operators, and typos 
were huge issue. Thus shortcuts were introduced to mitigate. This is in the 
past, I believe. We do not mind typing long names of variables in C++ code, we 
want the code to be self-documenting. For me having x=y and y=x comparisons 
produce different results is complex even though I know how to decipher. I do 
not mind typing an extra word -- a modifier for comparison -- to make clear 
what I want to happen. I do not think this increases complexity. Complexity is 
sometimes equated to the lack of clarity. In fact, it is the consequence. 

Roman

From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Igor Tandetnik [i...@tandetnik.org]
Sent: Thursday, August 22, 2013 5:26 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] BETWEEN and explicit collation assignment

On 8/22/2013 4:06 PM, Roman Fleysher wrote:
> Logically, I agree with Simon, collate modifier in table definition describes 
> how indices should be built, nothing more.

But for what purpose are indexes built? Isn't it for the purpose of
being used to speed up queries? Wouldn't you expect a query like

select count(*) from t where x = 'a';

to actually use an index on t(x), when available? And if you do, doesn't
that mean that the comparison (x = 'a') should use the same collation as
an index on t(x), that is, the collation associated with column x? And
if so, then you are right back to the idea that the collation used by
the comparison operator should, at least sometimes, be inferred from the
collation associated with its operands. Therefore, the COLLATE clause in
the table definition should apply not just to building indexes, but also
to the behavior of comparisons. QED.

Where is the flaw in this chain of reasoning, in your opinion?

> Comparisons, as in these examples should specify which collation is to be 
> used and if it is different from one used for indexing, then yes, Igor, index 
> can not be used. I believe this is the present behavior already.

Obviously, if you explicitly specify a collation for a comparison that
doesn't match that of an index, then the index cannot be used. The
question is, if you do *not* explicitly specify a collation for a
comparison, as in example above - would you expect an index to be used?
Personally, I would, and I rather like the fact that SQLite currently
does use an index in this case. Simon, on the other hand, appears to be
saying that he would rather the index not be used, that the comparison
be done using BINARY collation (though he never answered my direct
question about it).

> I do not think Simon asks to change the way collations work. He is asking to 
> make clear how they work. I think Simon is asking for clarity so that error 
> like "(x collate A) = (y collate

Re: [sqlite] BETWEEN and explicit collation assignment


On 8/22/2013 4:06 PM, Roman Fleysher wrote:

Logically, I agree with Simon, collate modifier in table definition describes 
how indices should be built, nothing more.


But for what purpose are indexes built? Isn't it for the purpose of 
being used to speed up queries? Wouldn't you expect a query like


select count(*) from t where x = 'a';

to actually use an index on t(x), when available? And if you do, doesn't 
that mean that the comparison (x = 'a') should use the same collation as 
an index on t(x), that is, the collation associated with column x? And 
if so, then you are right back to the idea that the collation used by 
the comparison operator should, at least sometimes, be inferred from the 
collation associated with its operands. Therefore, the COLLATE clause in 
the table definition should apply not just to building indexes, but also 
to the behavior of comparisons. QED.


Where is the flaw in this chain of reasoning, in your opinion?


Comparisons, as in these examples should specify which collation is to be used 
and if it is different from one used for indexing, then yes, Igor, index can 
not be used. I believe this is the present behavior already.


Obviously, if you explicitly specify a collation for a comparison that 
doesn't match that of an index, then the index cannot be used. The 
question is, if you do *not* explicitly specify a collation for a 
comparison, as in example above - would you expect an index to be used? 
Personally, I would, and I rather like the fact that SQLite currently 
does use an index in this case. Simon, on the other hand, appears to be 
saying that he would rather the index not be used, that the comparison 
be done using BINARY collation (though he never answered my direct 
question about it).



I do not think Simon asks to change the way collations work. He is asking to make clear 
how they work. I think Simon is asking for clarity so that error like "(x collate A) 
= (y collate B)" can not take place.


What to do with the case of (x = y) where x and y are two columns with 
different collations, then? How do you ensure that it cannot take place? 
And if you can't prevent it, then you need to do *something* in this 
case, and then you can do the exact same thing with "(x collate A) = (y 
collate B)" case - no need to invent new syntax or machinery to try and 
avoid it.



This is similar to JOIN operation, which requires two tables to have common 
column.


JOIN operation requires no such thing. I don't understand this analogy.


Using this analogy, I would write:

select * from t WHERE x = y USING NOCASE


How is this different from

select * from t WHERE x = y COLLATE NOCASE

I only see a superficial difference in the keyword used. What's the 
improvement?



I *do* understand what you and Simon are suggesting. I *do* understand 
that you want some kind of syntax that would hang a collation onto the 
operator, rather than its operands, for certain philosophical and/or 
aesthetic reasons. My point is, this would have to be done *in addition 
to*, not *instead of*, the existing mechanism (because of the whole 
"comparison must automatically pick the collation off the column in 
order to use the index" thing). So it's strictly an added complexity.


The burden is then on the authors of the proposal to demonstrate the 
benefits of the feature you are proposing, and to show that said 
benefits do in fact outweigh the cost of this added complexity. So I 
would ask you the same question I asked Simon: what exactly are the 
benefits of your approach? Is there something that can be done your way 
but can't be done the current way? Is there some bad outcome that's 
possible the current way but prevented your way? Do you just prefer your 
(as yet unspecified) syntax on purely aesthetic grounds?

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] BETWEEN and explicit collation assignment

Dear Igor,

I was reading with great interest your debate with Simon. I have come to 
respect your opinion by reading your comments. I am not a moderator, but trying 
to imply that many SQLite users are on your side reduced your image in my head. 
I left USSR (where such arguments were plenty) and would prefer to stay away 
from it. I block my eyes and ears so that I can not see/hear further arguments 
even if they are correct and even if I would agree with them if I heard them. 
This is my psychological defense.

On the issue. I find Simon's example

select * from t where x = y;
select * from t where y = x;

extremely disturbing. One has to remember implied rules to interpret this 
statement and/or to compose it properly. Spelling everything out is long, I 
agree, but makes things clear. However, his example is no different from yours 
because 

select count(*) from t where x = 'a';
select count(*) from t where 'a'=x;

works only thanks to 'a' having not collation attached and SQLite using one 
from x instead. This is exactly the point Simon is making: collations are 
"attached" to values, not to operations. From Simon's description, this 
"attachment" takes place on a higher level in SQLite since deep inside 
(comparisons) it is attached to the operator. Thus SQLite transfers collate 
rules from values to operations somewhere on the high level. This transfer is 
the issue that Simon points out.

Logically, I agree with Simon, collate modifier in table definition describes 
how indices should be built, nothing more. Comparisons, as in these examples 
should specify which collation is to be used and if it is different from one 
used for indexing, then yes, Igor, index can not be used. I believe this is the 
present behavior already.

I do not think Simon asks to change the way collations work. He is asking to 
make clear how they work. I think Simon is asking for clarity so that error 
like "(x collate A) = (y collate B)" can not take place. This is similar to 
JOIN operation, which requires two tables to have common column. Using this 
analogy, I would write:

select * from t WHERE x = y USING NOCASE

Roman

From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Igor Tandetnik [i...@tandetnik.org]
Sent: Thursday, August 22, 2013 3:53 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] BETWEEN and explicit collation assignment

On 8/22/2013 3:12 PM, Simon Slavin wrote:
>
> On 22 Aug 2013, at 8:04pm, Igor Tandetnik <i...@tandetnik.org> wrote:
>> [snip]
>
> I pretty much agree with everything you wrote there.  But it has nothing to 
> do with my original objection which was the explicit use of a COLLATE 
> operator inside an expression.

If you agree with everything I wrote, then I don't understand what
exactly you find wrong with the explicit use of a COLLATE operator
inside an expression. Is there something that can be done your way but
can't be done the current way? Is there some bad outcome that's possible
the current way but prevented your way? Do you just prefer your (as yet
unspecified) syntax on purely aesthetic grounds?

Let's assume, for the sake of argument, that the case of "(x collate A)
= (y collate B)" and similar instances of mismatched collations are made
an error (I've conceded that it may be a good idea, so there's no reason
to re-tread this ground).
--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] BETWEEN and explicit collation assignment


On 8/22/2013 3:12 PM, Simon Slavin wrote:


On 22 Aug 2013, at 8:04pm, Igor Tandetnik  wrote:

[snip]


I pretty much agree with everything you wrote there.  But it has nothing to do 
with my original objection which was the explicit use of a COLLATE operator 
inside an expression.


If you agree with everything I wrote, then I don't understand what 
exactly you find wrong with the explicit use of a COLLATE operator 
inside an expression. Is there something that can be done your way but 
can't be done the current way? Is there some bad outcome that's possible 
the current way but prevented your way? Do you just prefer your (as yet 
unspecified) syntax on purely aesthetic grounds?


Let's assume, for the sake of argument, that the case of "(x collate A) 
= (y collate B)" and similar instances of mismatched collations are made 
an error (I've conceded that it may be a good idea, so there's no reason 
to re-tread this ground).

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] BETWEEN and explicit collation assignment


On 8/22/2013 3:13 PM, Niall O'Reilly wrote:

But let's bring the thread back to the original problem.

What Simon Slavin seems (to me) to be pointing out is that the
counter-intuitive behaviour observed by the OP (Clemens Ladisch)
needs either to be corrected or explicitly documented; he also
seems to be trying to find a way to meet this need.


The behavior observed by Clemens looks like a bug to me. It should be 
fixed, so that the actual behavior matches the documentation. I have no 
quarrel with that.


But that's not what Simon argues:

"I feel that rather than notice a specific error in the implementation 
of BETWEEN [Clemens] has highlighted a conceptual error in SQLite."


In other words: let's not just fix this particular bug and go back to 
business as usual; let's instead completely change the way collations 
work in SQLite.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Clemens Ladisch

Niall O'Reilly wrote:
> What Simon Slavin seems (to me) to be pointing out is that the
> counter-intuitive behaviour observed by the OP (Clemens Ladisch)
> needs either to be corrected or explicitly documented

It worked correctly (i.e., as documented) before 3.7.15, so it's a bug.
Which is already fixed: .


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Niall O'Reilly

On Thu, 22 Aug 2013 13:15:19 -0400
Igor Tandetnik  wrote:

> On 8/22/2013 11:49 AM, Simon Slavin wrote:

> > Step 1 would be remove all ability to specify collation applying to a 
> > single value.
> 
> But it already applies, implicitly, to a single value that happens to be 
> a column name. It seems you want to preserve that, right?

Surely not!  

A column name is not a value, but a label for a set.

The collation associated with a column applies to each subset 
(of that set) whose cardinality is exactly two, and creates an
ordering on the set for which the column name is a label.

But let's bring the thread back to the original problem.

What Simon Slavin seems (to me) to be pointing out is that the 
counter-intuitive behaviour observed by the OP (Clemens Ladisch)
needs either to be corrected or explicitly documented; he also
seems to be trying to find a way to meet this need.

> Hi,
>
> the documentation says (on ):
>
> | The expression "x BETWEEN y and z" is logically equivalent to two
> | comparisons "x >= y AND x <= z" and works with respect to collating
> | functions as if it were two separate comparisons.
>
> However, this is not true when the first operator has an explicit
> collation assignment:
>
> SQLite version 3.7.17 2013-05-20 00:56:22
> Enter ".help" for instructions
> Enter SQL statements terminated with a ";"
> sqlite> create table t(x);
> sqlite> insert into t values('a');
> sqlite> insert into t values('A');
> sqlite> select * from t where x collate nocase between 'a' and 'b';
> a
> sqlite> select * from t where x collate nocase >= 'a' and x collate nocase <=
> 'b';
> a
> A
>
> It works only on the second and third operators:
>
> sqlite> select * from t where x between 'a' collate nocase and 'b' collate
> nocase;
> a
> A
>
> And adding it to the first operator breaks it again:
> sqlite> select * from t where x collate nocase between 'a' collate nocase and
> 'b' collate nocase;
> a 

[I would use "operand" where Clemens uses "operator".]

This behaviour is inconsistent with the documentation (Rule 1 of
section 6.1 of http://www.sqlite.org/datatype3.html#collation) because
(a) the first operand of the BETWEEN operator is precisely the left 
operand of each of the two comparisons to which the BETWEEN operator
is described as being equivalent, and (b) rule 1 just mentioned gives
precedence to the explicit collating function associated with the left
operand of a comparison.  It should not therefore be necessary to
declare a collation for the second and third operands of BETWEEN; one
should rather be able to rely on the collation declared for the first
one.  The observed behaviour indicates that precisely the opposite is 
true in practice.

AFAICS, either the code or the documentation is broken, and either one
needs to be corrected.

Best regards,
Niall O'Reilly

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] BETWEEN and explicit collation assignment


On 22 Aug 2013, at 8:04pm, Igor Tandetnik  wrote:
> [snip]

I pretty much agree with everything you wrote there.  But it has nothing to do 
with my original objection which was the explicit use of a COLLATE operator 
inside an expression.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] BETWEEN and explicit collation assignment


On 8/22/2013 2:10 PM, Simon Slavin wrote:

Sorry, hit 'send' too early.

On 22 Aug 2013, at 6:15pm, Igor Tandetnik  wrote:


But again, by what formal mechanism does a property of the column affect the 
behavior of the operator?


I see no reason for it to do that.


So to be clear:

create table t(x text collate nocase);
insert into t values ('A');
select count(*) from t where x = 'a';

Currently, this select statement returns 1. Is it your position that it 
should return 0 instead? Somehow I doubt you would find many SQLite 
users sympathetic to this position.



But my original problem, as stated, isn't with this.  It's just with the way 
that one can explicitly apply a correlation in an expression by using the 
COLLATE operator.


And my point is that, once you have collations on columns and have to 
deal with them anyway, it's not a problem to have collations on any 
other expression: you just deal with them the exact same way.


Now, if your position is that comparison operators shouldn't pay 
attention to collations associated with columns either, then the rest of 
your argument becomes internally consistent. But I don't think I would 
like to live in the world you envision. Consider my earlier example, 
slightly modified:


create table t(x text primary key collate nocase);
select count(*) from t where x = 'a';

There is an implicit index on t(x collate nocase), thanks to the PRIMARY 
KEY constraint. But the statement (in your interpretation) *cannot* use 
that index, since the comparison uses the default BINARY collation, and 
has to resort to full table scan instead. In order to take advantage of 
the index, one must write


select count(*) from t where x =NOCASE 'a';

(or whatever the syntax would be for hanging a collation onto the 
operator). I feel this is a) extremely inconvenient, and b) a severe 
violation of the principle of least astonishment.



About your example where two seemingly equivalent statements like

select * from t where x = y;
select * from t where y = x;

may produce different results - I've already stated a few times that I 
would be sympathetic to the argument that having two operands with 
different collations could be considered a syntax error. With that in 
place, both statements would be invalid, and the issue would fail to 
arise. This can be easily accomplished with a small tweak to existing 
rules, perhaps enabled with a pragma.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] BETWEEN and explicit collation assignment

Sorry, hit 'send' too early.

On 22 Aug 2013, at 6:15pm, Igor Tandetnik  wrote:

> But again, by what formal mechanism does a property of the column affect the 
> behavior of the operator?

I see no reason for it to do that.  I can see why the property of the column 
might affect the behaviour of an index made on that column but there are 
problems if it affects comparisons made using values from that column.  For 
example, the meaning of something like

create table t(x text collate nocase,
y text collate rtrim);
select * from t where x = y;

is not at all clear purely from reading the SQL code.  You have to have 
actually read the picky details.  And the consequences of the picky details are 
that under SQLite3, these two queries:

select * from t where x = y;
select * from t where y = x;

can return different results.  Which I find annoying.

But my original problem, as stated, isn't with this.  It's just with the way 
that one can explicitly apply a correlation in an expression by using the 
COLLATE operator.  I feel that it doesn't reflect the real-world operation of 
correlations by SQLite.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] BETWEEN and explicit collation assignment

On 22 Aug 2013, at 6:15pm, Igor Tandetnik  wrote:

> But again, by what formal mechanism does a property of the column affect the 
> behavior of the operator?

I see no reason for it to do that.  I can see why the property of the column 
might affect the behaviour of an index made on that column but there are 
problems if it affects comparisons made using values from that column.  For 
example, the meaning of something like

create table t(x text collate nocase,
y text collate rtrim);
select * from t where x = y;

is not at all clear purely from reading the SQL code.  You have to have 
actually read the picky details.  And the consequences of the picky details are 
that under SQLite3, these two queries:

select * from t where x = y;
select * from t where y = x;

can return different results.  Which I find annoying.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] BETWEEN and explicit collation assignment

On 8/22/2013 11:49 AM, Simon Slavin wrote:

On 22 Aug 2013, at 2:36pm, Igor Tandetnik wrote:

On 8/22/2013 8:52 AM, Simon Slavin wrote:

Nevertheless do you understand the point I'm trying to make -- that collations
are a modifier for comparisons not individual values ?

I do understand your point. I still don't understand how it's supposed to apply
in practice to a situation like

create table t(x text collate nocase);
select * from t where x = 'a';

Here, at the time I specify "collate nocase", there is no comparison it could
apply to. So what is it a property of, in your thinking?

COLLATE is a property of that column

So sometimes it's a property of a column, and other times it's a
property of a comparison operator? I predict you are going to have a
hard time describing this notion in a formal spec.

Your second line doesn't specify a COLLATE operator in its expression, so I
have no problem with it.

But again, by what formal mechanism does a property of the column affect
the behavior of the operator? And if you are OK with it doing that, then
what's wrong with the existing model, which is based on this very
approach (collation is a property of an expression; behavior of an
operator depends on collations associated with its operands)?

Let's put it this way: why should there be a fundamental difference
between an expression (x) where x is a column declared with COLLATE
NOCASE clause, and an expression ('a' COLLATE NOCASE)? What purpose
would such a distinction serve? Why exactly are you OK with the former,
but not the latter?

By what mechanism does it end up applying to x='a' comparison (I assume to do
want the statement to return rows both with 'a' and 'A' in column x)? How would
you modify the formal spec at http://sqlite.org/datatype3.html to lead to your
desired outcome? Precise wording matters.

I don't understand why binary comparison operators are on that page at all.
They aren't used directly as column definitions, only as parts of expressions,
and expressions are defined on another page. If you remove mention of
comparison operators from that page, the rest of that page is fine.

Precise wording as you requested ? Remove all of section 6.1 apart from the
last paragraph.

But again, I assume you do want the expression (x='a') to sometimes
evaluate to true when column x contains the value 'A', and other times
evaluate to false. How would this happen? It doesn't matter which page
describes the behavior: if it's not
http://www.sqlite.org/datatype3.html, then I imagine it would be
somewhere on http://www.sqlite.org/lang_expr.html. What should it say?

As I said, I could relate to your point of view better if you just did this:
inhttp://sqlite.org/datatype3.html section 6.1, replaced two occurrences of "with precedence
to the left operand" with "It's an error if two operands have different collations".
This keeps the existing, well defined mechanisms intact, while neatly excluding the case you seem
to find most objectionable.

As I wrote originally, my problem is not with the use of COLLATE in column
definitions, it's with its use in expressions. So my problem in documentation
of SQLite doesn't come in datatype3, it comes in

Step 1 would be remove all ability to specify collation applying to a single
value.

But it already applies, implicitly, to a single value that happens to be
a column name. It seems you want to preserve that, right?

This is just two short paragraphs on that page (search for the word 'collate').

One of those paragraphs says: "See the detailed discussion on collating
sequences in the Datatype In SQLite3 document for additional
information." You claim that you want to excise that very discussion
from "datatypes" article - doesn't that mean that you would have to move
it here, rather than incorporating it by reference? The behavior must be
described *somewhere*.

This would remove all trace of the use of COLLATE I have a problem with, i.e.
where it can be used in such a way as to be applied to a single value and not a
comparison.

However, a collation would still apply to a single value that happens to
be a column name, and comparison operators where such a value is an
operand would have to take that into account somehow, wouldn't they? You
are not saying that (x='a') should always use BINARY collation
regardless of how column x was declared, are you? And if you define some
mechanism by which the comparison takes the collation of x into account,
then I don't understand why it's fundamentally awful and wrong to extend
that same mechanism to the expression (('A' collate nocase) = 'a').

You instead suggest the invention of a separate mechanism whereby the
collation is assigned directly to the operator rather than inferred from
its operands; but you also need to keep the original mechanism around.
So now you have to specify two mechanisms,

Re: [sqlite] BETWEEN and explicit collation assignment


On 8/22/2013 8:52 AM, Simon Slavin wrote:

Nevertheless do you understand the point I'm trying to make -- that collations 
are a modifier for comparisons not individual values ?


I do understand your point. I still don't understand how it's supposed 
to apply in practice to a situation like


create table t(x text collate nocase);
select * from t where x = 'a';

Here, at the time I specify "collate nocase", there is no comparison it 
could apply to. So what is it a property of, in your thinking? By what 
mechanism does it end up applying to x='a' comparison (I assume to do 
want the statement to return rows both with 'a' and 'A' in column x)? 
How would you modify the formal spec at http://sqlite.org/datatype3.html 
to lead to your desired outcome? Precise wording matters.


As I said, I could relate to your point of view better if you just did 
this: in http://sqlite.org/datatype3.html section 6.1, replaced two 
occurrences of "with precedence to the left operand" with "It's an error 
if two operands have different collations". This keeps the existing, 
well defined mechanisms intact, while neatly excluding the case you seem 
to find most objectionable.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] BETWEEN and explicit collation assignment

2013-08-22 Thread Richard Hipp

On Thu, Aug 22, 2013 at 8:52 AM, Simon Slavin  wrote:

>
> Nevertheless do you understand the point I'm trying to make -- that
> collations are a modifier for comparisons not individual values ?
>
> The collation is a property of the value.  It is part of the datatype of
the value.

When you do X==Y and X and Y have different collations, SQLite chooses the
collation on the left to use for the comparison.  (PostgreSQL, being more
rigid in its typing rules, throws an error, if I'm not mistaken.)  If X
does not have a specified collation, then the collation of Y is used
instead.  If neither X nor Y have collations, then the default collation
(BINARY) is used.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] BETWEEN and explicit collation assignment

On 22 Aug 2013, at 1:47pm, Igor Tandetnik  wrote:

> You can, actually: hulkify(x) := (case when x = 'the hulk' then 'B' else 'A' 
> end) . With such a function, "a = b collate muchStronger" is equivalent to 
> "hulkify(a)  = hulkify(b) collate BINARY". In other words, hulkify(x) is to 
> muchStronger what upper(x) is to NOCASE.

The Well Ordering Principle states that you can always devise such a function, 
given the requirements for SQLite collations.

Nevertheless do you understand the point I'm trying to make -- that collations 
are a modifier for comparisons not individual values ?

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] BETWEEN and explicit collation assignment


On 8/22/2013 8:36 AM, Simon Slavin wrote:

Collation 'muchStronger':
If a = b return 0, else ...
If b is 'the hulk' return -1, else ...
If a is 'the hulk' return 1, else ...
return 0

(translation: people are roughly as strong as one-another, except for The Hulk 
who is stronger than everyone except himself)

This is a legitimate collation under SQLite rules, but you cannot apply it 
without knowing both of the values.


You can, actually: hulkify(x) := (case when x = 'the hulk' then 'B' else 
'A' end) . With such a function, "a = b collate muchStronger" is 
equivalent to "hulkify(a)  = hulkify(b) collate BINARY". In other words, 
hulkify(x) is to muchStronger what upper(x) is to NOCASE.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] BETWEEN and explicit collation assignment