Re: [sqlite] SQLite character comparisons

2008-01-24 Thread Stephen Oberholtzer
I feel compelled to throw in my $0.02 here.

To everyone who thinks that SQLite should allow  'foo '  == 'foo':

SQL was originally conceived as a query *language* -- a way for a
human being to request a set of data from a database.  It was
specifically designed for ad-hoc queries.

This little 'magic space trimming' feature exists to match the
'char(N)' data type.  A char(10) field is always exactly 10 characters
long; longer strings are truncated and shorter strings are
space-padded.
Most database engines are more efficient at these, because when all
rows are the same width, the task of finding a particular row reduces
to a simple array lookup; therefore, if performance is
a critical issue (and when SQL was first formed, CPUs weren't quite as
powerful as they are now.)

But this presents a problem: the 'usual' definition of equality would
mean that any comparisons to a char(N) field would need to be N
characters long, or they would always fail.  Since it's stupid to make
people count spaces, somebody came up with the solution 'if they enter
something shorter, pad it with spaces and then compare.'

(If anyone wishes to quote the spec regarding space-extension and
varchar(N) fields, first recall that the SQL specifications have been
created by committees.)

Since SQLite does not have any concept of a fixed-width field of
character data, the whole concept of ignoring/appending trailing
spaces doesn't even apply.

-- 
-- Stevie-O
Real programmers use COPY CON PROGRAM.EXE

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] SQLite character comparisons

2008-01-21 Thread Dennis Cote

Fowler, Jeff wrote:

I may very well be mistaken - wouldn't be the first time!! I tried to
access the actual specification from ANSI (www.ansi.org), but you either
have to purchase them or access one of the "drafts" (which a couple
folks in this thread have done), but the draft may or may not match the
actual standard. While Dr. Hipp's new RTRIM collation makes this a moot
point, the source for my statement came from this article:
http://support.microsoft.com/kb/316626

  
After reading the MS article, I dug into the standard some more to see 
if I had missed something.


It seems there is a requirement that all "standard character set names" 
define a default collation that has the PAD SPACE property (see section 
10.6 general rule 2).


2) A  specifies the name of a character 
set that is defined by a
national or international standard. The character repertoire of CS, 
implied by the character set name>, are defined by the standard defining the 
character set identified by that
. The default collating sequence of the 
character set is defined
by the order of the characters in the standard and has the PAD SPACE 
characteristic.
There is a similar definition for an "implementation defined character 
set name" in general rule 3.
3) An  specifies the name 
of a character set that
is implementation-defined. The character repertoire of CS, implied by 
the character set name>, are implementation-defined. The default collating 
sequence of the
character set and whether the collating sequence has the NO PAD 
characteristic or the PAD

SPACE characteristic is implementation-defined.
But syntax rule 1 in that section seems to say that what is supported is 
implementation defined.
1) The s and character set name>s that

are supported are implementation-defined.
These are character sets, and predefined collations, can be used to 
create additional collations as described in section 11.32

 ::=
CREATE COLLATION  FOR 
FROM 
[  ]
 ::= 
 ::=
NO PAD
| PAD SPACE
If the pad characteristic is not given explicitly, it is inherited 
implicitly from the existing collation name.


So if an implementation provides a standard character set, such as 
SQL_TEXT, it must also have a default collation with the PAD SPACE 
property that is the default collation for strings in that character 
set. The user can create a new collation that explicitly overrides the 
pad characteristic and use that for NO PAD operation.


But ultimately there is no requirement to provide any standard character 
sets. So all bets are off when switching implementations (as Darren has 
suggested).


Dennis Cote


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



RE: [sqlite] SQLite character comparisons

2008-01-21 Thread Fowler, Jeff
Hi Dennis,

I may very well be mistaken - wouldn't be the first time!! I tried to
access the actual specification from ANSI (www.ansi.org), but you either
have to purchase them or access one of the "drafts" (which a couple
folks in this thread have done), but the draft may or may not match the
actual standard. While Dr. Hipp's new RTRIM collation makes this a moot
point, the source for my statement came from this article:
http://support.microsoft.com/kb/316626


As I'd mentioned, we have both SQL Server 2005 and Oracle 10g and they
default to ignore trailing spaces on character compares, although it may
be possible to turn this off for those who don't like it. I believe (but
am not sure) that MYSql and PostGreSQL ignore them by default also. I
understand many of the arguments folks have against this and there's not
much point to continuing the discussion; I won't change their mind and
they won't change mine! But I guess that's why we have Republicans and
Democrats :-)

But all is well - the RTRIM collation option is a neat and simple
solution. My thanks to Dr. Hipp for that, and also to everyone who has
shown an interest in this issue.

- Jeff


-Original Message-
From: Dennis Cote [mailto:[EMAIL PROTECTED] 
Sent: Monday, January 21, 2008 11:46 AM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] SQLite character comparisons

Fowler, Jeff wrote:
> Hello All,
>  
> Not trying to be antagonistic, but I'm curious to know how many of you
agree with Darren's sentiments on this issue. To restate briefly, ANSI
SQL-92 specifies that when comparing two character fields, trailing
spaces should be ignored. Correct me if I'm wrong Darren, but you feel
this is a bad decision, and in fact SQLite's implementation of character
comparison (respecting trailing spaces) is superior to ANSI's specs.
Keep in mind this is not some obscure issue that can be subject to
different interpretations by different vendors; it's very clearly
stated: "The ANSI standard requires padding for the character strings
used in comparisons so that their lengths match before comparing them."
>   
Jeff,

I think you are mistaken about what the ANSI spec says.

There are two string types in ANSI SQL, character strings (which come is
several subtypes), and binary strings. The following excerpts are taken
from the SQL:1999 spec.

Section 4.2.1 Character Strings and Collations describes the operations
on character strings. It describes comparisons as
> Given a collating sequence, two character strings are identical if and

> only if they are equal in accordance with the comparison rules 
> specified in Subclause 8.2, . The collating 
> sequence used for a particular comparison is determined as in 
> Subclause 4.2.3, ''Rules determining collating sequence usage''.
Binary strings are defined in Section 4.3 as;
> A binary string is a sequence of octets that does not have either a 
> character set or collation associated with it.
And their comparison is detailed in 4.3.1 as;
> All binary strings are mutually comparable. A binary string is 
> identical to another binary string if and only if it is equal to that 
> binary string in accordance with the comparison rules specified in 
> Subclause 8.2, .

General Rules 3 and 4 of section 8.2  describe the
comparison of these strings. I have copied these sections below.

> 3) The comparison of two character strings is determined as follows:
>
> a) Let CS be the collating sequence indicated in Subclause 4.2.3, 
> ''Rules determining collating sequence usage'', based on the declared 
> types of the two character strings.
>
> b) If the length in characters of X is not equal to the length in 
> characters of Y, then the shorter string is effectively replaced, for 
> the purposes of comparison, with a copy of itself that has been 
> extended to the length of the longer string by concatenation on the 
> right of one or more pad characters, where the pad character is chosen

> based on CS. If CS has the NO PAD characteristic, then the pad 
> character is an implementation-dependent character different from any 
> character in the character set of X and Y that collates less than any 
> string under CS. Otherwise, the pad character is a .
>
> c) The result of the comparison of X and Y is given by the collating 
> sequence CS.
>
> d) Depending on the collating sequence, two strings may compare as 
> equal even if they are of different lengths or contain different 
> sequences of characters.
> When any of the operations
> MAX, MIN, and DISTINCT reference a grouping column, and the UNION, 
> EXCEPT, and INTERSECT operators refer to character strings, the 
> specific value selected by these operations from a set of such equal 
> values is implementation-dependent.
>
> NOTE 129 - If the coercibility characteristic of the comparison is 
> Coercible

Re: [sqlite] SQLite character comparisons

2008-01-21 Thread drh
John Elrick <[EMAIL PROTECTED]> wrote:
> 
> If we are ignoring trailing spaces, then by definition:
> 
> ' ' = ''
> 
> and for that matter:
> 
> '  ' = '  '
> 

Good point.  I added these as test cases for the new RTRIM
collation.  http://www.sqlite.org/cvstrac/chngview?cn=4735

--
D. Richard Hipp <[EMAIL PROTECTED]>


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] SQLite character comparisons

2008-01-21 Thread Dennis Cote

Fowler, Jeff wrote:

Hello All,
 
Not trying to be antagonistic, but I'm curious to know how many of you agree with Darren's sentiments on this issue. To restate briefly, ANSI SQL-92 specifies that when comparing two character fields, trailing spaces should be ignored. Correct me if I'm wrong Darren, but you feel this is a bad decision, and in fact SQLite's implementation of character comparison (respecting trailing spaces) is superior to ANSI's specs. Keep in mind this is not some obscure issue that can be subject to different interpretations by different vendors; it's very clearly stated: "The ANSI standard requires padding for the character strings used in comparisons so that their lengths match before comparing them."
  

Jeff,

I think you are mistaken about what the ANSI spec says.

There are two string types in ANSI SQL, character strings (which come is 
several subtypes), and binary strings. The following excerpts are taken 
from the SQL:1999 spec.


Section 4.2.1 Character Strings and Collations describes the operations 
on character strings. It describes comparisons as
Given a collating sequence, two character strings are identical if and 
only if they are equal in
accordance with the comparison rules specified in Subclause 8.2, 
‘‘’’. The
collating sequence used for a particular comparison is determined as 
in Subclause 4.2.3, ‘‘Rules

determining collating sequence usage’’.

Binary strings are defined in Section 4.3 as;
A binary string is a sequence of octets that does not have either a 
character set or collation associated

with it.

And their comparison is detailed in 4.3.1 as;
All binary strings are mutually comparable. A binary string is 
identical to another binary string
if and only if it is equal to that binary string in accordance with 
the comparison rules specified in

Subclause 8.2, ‘‘’’.


General Rules 3 and 4 of section 8.2  describe the 
comparison of these strings. I have copied these sections below.



3) The comparison of two character strings is determined as follows:

a) Let CS be the collating sequence indicated in Subclause 4.2.3, 
‘‘Rules determining collating
sequence usage’’, based on the declared types of the two character 
strings.


b) If the length in characters of X is not equal to the length in 
characters of Y, then the shorter
string is effectively replaced, for the purposes of comparison, with a 
copy of itself that has
been extended to the length of the longer string by concatenation on 
the right of one or more
pad characters, where the pad character is chosen based on CS. If CS 
has the NO PAD
characteristic, then the pad character is an implementation-dependent 
character different
from any character in the character set of X and Y that collates less 
than any string under

CS. Otherwise, the pad character is a .

c) The result of the comparison of X and Y is given by the collating 
sequence CS.


d) Depending on the collating sequence, two strings may compare as 
equal even if they are
of different lengths or contain different sequences of characters. 
When any of the operations

MAX, MIN, and DISTINCT reference a grouping column, and the UNION, EXCEPT,
and INTERSECT operators refer to character strings, the specific value 
selected by these

operations from a set of such equal values is implementation-dependent.

NOTE 129 – If the coercibility characteristic of the comparison is 
Coercible, then the collating sequence
used is the default defined for the character repertoire. See also 
other Syntax Rules in this Subclause,
Subclause 10.6, ‘‘’’, and Subclause 
11.30, ‘‘’’.


4) The comparison of two binary string values, X and Y, is determined 
by comparison of their
octets with the same ordinal position. If Xi and Yi are the values of 
the i-th octets of X and Y,
respectively, and if Lx is the length in octets of X AND Ly is the 
length in octets of Y, then X is

equal to Y if and only if Ly = Ly and if Xi = Yi for all i.
I note that there is a typo in rule 4 for binary strings; Ly = Ly should 
be Lx = Ly, since binary strings can only be compared for equality.


Rule 3.b details how strings of unequal length are to be compared. It 
allows exactly the operation performed by SQLite, since it allows 
collating sequences to have a NO PAD characteristic which results in the 
shorter string comparing less than the longer string.


This distinction also appears in section 4.12 which discusses type 
conversions and mixing of data types. It says;
Values corresponding to the data types CHARACTER, CHARACTER VARYING, 
and CHARACTER
LARGE OBJECT are mutually assignable if and only if they are taken 
from the same character
repertoire. If they are from different character repertoires, then the 
value of the source of the
assignment must be translated to the character repertoire of the 
target before an assignment is
possible. Such translation may be implementation-defined and 
implicitly performed, in which case
the two character data types are also mutually 

Re: [sqlite] SQLite character comparisons

2008-01-21 Thread John Elrick

Fowler, Jeff wrote:

Hello All,
 
Not trying to be antagonistic, but I'm curious to know how many of you agree with Darren's sentiments on this issue. To restate briefly, ANSI SQL-92 specifies that when comparing two character fields, trailing spaces should be ignored. 


My $0.02.

If we are ignoring trailing spaces, then by definition:

' ' = ''

and for that matter:

'  ' = '  '

whereas

'xx' <> 'xxx'

Therefore, from the standpoint of arbitrary text,

A may or may not actually be identical to B when A = B

I don't consider that type of result to be predicable and therefore 
would respectfully state that the standard is a potential source of bugs.


What could be useful is a function:

SELECT * FROM FOO WHERE TRIM(BAR) = TRIM(:bar)

or even a shortcut function along the lines of:

SELECT * FROM FOO WHERE TEXT_EQUIVALENT(BAR, :bar)

However, in my opinion, the operator '=' is better reserved as defining 
strict equality.



John

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



RE: [sqlite] SQLite character comparisons

2008-01-20 Thread Fowler, Jeff
I don't recall asking for a vote on the subject or asking anyone to agree with 
me, but thank you for your input anyway Jay. 
 
And I do appreciate Dr H. for adding the "COLLATE RTRIM" declaration!
 
Regards,
 
- Jeff



From: Jay Sprenkle [mailto:[EMAIL PROTECTED]
Sent: Sun 1/20/2008 11:56 AM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] SQLite character comparisons



On Jan 20, 2008 10:19 AM, Fowler, Jeff <[EMAIL PROTECTED]> wrote:
>
> Not trying to be antagonistic, but I'm curious to know how many of you agree 
> with Darren's sentiments on this issue.

Sqlite is DRH's project. Voting doesn't come into the picture
anywhere. If you convince him
to fix it then he might, if not you're wasting your time asking if
people agree with you.

>Sure - we can handle this situation by writing more code looking for
spaces everywhere they might occur.

The source for the database engine is provided. It would be more
efficient to change it there.
You could also provide the code back to the community as an option for
others to use.

I'm not trying to be unpleasant, I just believe where you're headed
isn't going to be as valuable as other options.

--
The PixAddixImage Collector suite:
http://groups-beta.google.com/group/pixaddix

SqliteImporter and SqliteReplicator: Command line utilities for Sqlite
http://www.reddawn.net/~jsprenkl/Sqlite

Cthulhu Bucks!
http://www.cthulhubucks.com <http://www.cthulhubucks.com/> 

-
To unsubscribe, send email to [EMAIL PROTECTED]
-




-
To unsubscribe, send email to [EMAIL PROTECTED]
-

RE: [sqlite] SQLite character comparisons

2008-01-20 Thread Darren Duncan

At 11:19 AM -0500 1/20/08, Fowler, Jeff wrote:
To restate briefly, ANSI SQL-92 specifies that when comparing two 
character fields, trailing spaces should be ignored. Correct me if 
I'm wrong Darren, but you feel this is a bad decision, and in fact 
SQLite's implementation of character comparison (respecting trailing 
spaces) is superior to ANSI's specs.


Yes, that is indeed what I am saying.

More broadly speaking, and this may already be familiar to some of 
you who remember several of my writings over the last few years, I 
believe that while SQL has a lot of good things going for it, it also 
has numerous flaws, some of which are quite severe in their 
consequences.  I am specifically addressing the ANSI/ISO SQL standard 
itself with this blame, not any implementation in particular.


I make this assessment of SQL both in respect to how much SQL is able 
to represent the relational model of data that Codd proposed to be 
used for computer databases, and in respect to how much SQL is 
constructed according to well-established principles of good language 
design.


As far as I am concerned, any quasi-implementation of SQL that 
addresses these flaws is something to applaud.


And at times that it seems SQLite is already doing things a better 
way, I am inclined to argue in support of its current status quo.


I won't address/re-address the other perceived SQL flaws in this 
thread, to stay on topic, but I'll further clarify my position on the 
space-pad thing in light of the previous paragraphs.  It may even 
appear that I changed or reversed my position, but I don't feel that 
it changed.


1.  The most important thing to have in regards to data types and 
values is to have a fully deterministic (and preferrably simple) 
concept of value identity, that is, when 2 containers are considered 
to hold identical values or not, or should I say, when 2 appearances 
of values are in fact the same one value.


2.  The same conceptual value can have multiple physical 
representations, but this distinction is meant to be abstracted away 
from the user, so for example if the definition of a data type says 
that the representations 2.0 and 2.00 are the same value, then an 
equality test on them should return true; that said, users should not 
even see the difference then; any display of either physical 
representation to the user should be normalized to the same thing, 
such as 2, so when 2 values are considered equal by the system, they 
look the same to the user, but it is still okay to store them 
differently behind the scenes.


3.  It is okay in the general case for a system's conception of value 
identity to be different than another system's as long as the rules 
are clearly documented.  In this respect, it is okay for either 
trailing spaces to be significant, or for them to be non-significant, 
for determining identity (and by extension, equality), as long as 
these rules are consistently applied everywhere that value appears. 
Eg, 2 given character strings Foo and Bar can't be considered 
identical in some contexts and non-identical in other contexts.  If 
you want to have it both ways, you need to have 2 distinct data types 
which happen to look similar, eg a CharStrSpSignif data type and a 
CharStrSpInsig data type, and then you use values of one type in one 
context and separate values of the other type in other contexts.


4.  In this respect, if I don't misunderstand, SQLite's text data 
type is the CharStrSpSignif data type, and the SQL standard has the 
CharStrSpInsig type instead; if you consider the 2 systems as having 
different data types, then this difference of behaviour is 
explainable.  Moreover, you can have your choice of the behaviour in 
different systems by having both types implemented there to choose 
from when you want, like you can choose between text and number types 
now.


5.  A more practical example of #2, ignoring the whole spaces thing, 
is in regard to Unicode codepoints vs graphemes.  Even if you are 
using a consistent byte encoding throughout, such as just UTF-8 or 
UTF-16-LE, you still have to be concerned with the fact that Unicode 
has multiple normal forms.  Depending on your normal form, such as 
normal form C vs normal form D, you may have different sequences of 
code points representing the same grapheme.  An example of a single 
grapheme being the combination of a plain roman letter plus a 
diacritical mark or accent; in NFC, that may be a single codepoint, 
in NFD, it might be a sequence of 2 code points.  So, it is important 
for a character string data type to explicitly be considered either 
as a string of code points or of graphemes, for example.  At the 
higher level abstraction, the 2 forms of letter+accent would be 
considered identical, but in the lower level abstraction, they would 
be non-identical.  Note that afaik most high-level Unicode systems 
normally work in the highest abstraction level possible (whether they 
synchronize the normal form on 

Re: [sqlite] SQLite character comparisons

2008-01-20 Thread drh
"Fowler, Jeff" <[EMAIL PROTECTED]> wrote:
> I'm curious to know how many of you agree with Darren's 
> sentiments on this issue

Changing the behavior of SQLite to ignore trailing 
spaces is not an option for SQLite version 3, since
to do so would result in a incompatible file format
All indices created before the change would be invalid 
since they would use a different collation.  There 
are multiple thousands of SQLite applications and
hundreds of millions of existing SQLite database files
that depend on this backwards compatibility.  To make 
this change would therefore require bumping the version 
number up to SQLite 4.0.

>  
> Our app creates SQLite tables dynamically based on the output
> from user-defined queries that run against data warehouses 
> (of practically any "flavor") we have no control over, and
> we insert the results into SQLite. Sure - we can handle this
> situation by writing more code looking for spaces everywhere
> they might occur. But to me (and maybe only to me?), it makes
> sense for SQLite -- where reasonably possible -- to attempt
> to follow clear ANSI guidelines
>  

Check-in [4732] implements a built-in RTRIM collating sequence
that provides the ignore-spaces comparison semantics that you
desire.  All you have to do is add "COLLATE RTRIM" to the declarations
of text columns in your SQLite schema and SQLite will thereafter
ignore trailing spaces on comparisons involving those columns.
If you do SQL comparisons that do not involve columns, you can
put "COLLATE RTRIM" after the comparison operator itself to get
this behavior.  Example (an actual screen capture):

[EMAIL PROTECTED]:~/sqlite/bld> ./sqlite3
SQLite version 3.5.4
Enter ".help" for instructions
sqlite> SELECT 'abc'='abc   ';
0
sqlite> SELECT 'abc'='abc   ' COLLATE RTRIM;
1

--
D. Richard Hipp <[EMAIL PROTECTED]>



-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] SQLite character comparisons

2008-01-20 Thread John Stanton
I would suggest that Sqlite is correct having regard to its universal 
variable length feature and single TEXT type.  Compatibility would be 
achieved by not storing redundant trailing spaces if a string field is 
required.  More efficient storage would be a bonus.


Fowler, Jeff wrote:

Hello All,
 
Not trying to be antagonistic, but I'm curious to know how many of you agree with Darren's sentiments on this issue. To restate briefly, ANSI SQL-92 specifies that when comparing two character fields, trailing spaces should be ignored. Correct me if I'm wrong Darren, but you feel this is a bad decision, and in fact SQLite's implementation of character comparison (respecting trailing spaces) is superior to ANSI's specs. Keep in mind this is not some obscure issue that can be subject to different interpretations by different vendors; it's very clearly stated: "The ANSI standard requires padding for the character strings used in comparisons so that their lengths match before comparing them."
 
Does anyone know of another RDBMS (ANSI or no) that respects trailing spaces when comparing character data? We have both Oracle 10g and SQL Server 2005 in house and they both work "correctly" according to the specification. Has anyone tried it with DB2 or Informix? What about PostGres and MySQL? Although I haven't asked him, I'm guessing Zbigniew's suggestion a while back for an auto-trim feature stemmed from this issue. Other than saving space, would there be a need to trim data if WHERE, HAVING clauses and joins followed the spec? Also, other than performance (which seems to be the primary concern), would anyone would be negatively impacted if the current behavior were changed?
 
Our app creates SQLite tables dynamically based on the output from user-defined queries that run against data warehouses (of practically any "flavor") we have no control over, and we insert the results into SQLite. Sure - we can handle this situation by writing more code looking for spaces everywhere they might occur. But to me (and maybe only to me?), it makes sense for SQLite -- where reasonably possible -- to attempt to follow clear ANSI guidelines, allowing developers to override it only in cases where this adherence produces undesirable results. I can't see where this is undesirable from an applications standpoint, although I can see where performance may be a concern.
 
Regards,
 
Jeff Fowler
 



From: Darren Duncan [mailto:[EMAIL PROTECTED]
Sent: Fri 1/18/2008 4:47 PM
To: sqlite-users@sqlite.org
Subject: RE: [sqlite] SQLite character comparisons



At 4:11 PM -0500 1/18/08, Fowler, Jeff wrote:

"Better" depends on who you ask - I'd say it's worse, and I bet most
DBA's would agree. The ANSI standard is to ignore trailing spaces when
comparing character strings in a WHERE clause, a HAVING clause, or a
join. So I can take the exact same data, run the exact same query, yet
get a different answer from SQLite vs. Oracle or SQL Server. In fact, we
found this issue because we DID get a different answer.


And every SQL implementation already has many differences from every
other one, even with fundamentals, making them all generally
incompatible and proprietary, and this example is right in line with
the kinds of differences they have.  Other implementations have
distinct var-char and fixed-char types, while SQLite only has the
former.

Part of the problem here is that the ANSI standard is itself flawed.

(Maybe this particular case of trailing spaces is an exception, but
many parts of ANSI SQL, AFAIK, are more focused on what the syntax of
SQL is, and they say that the semantics, even of some fundamental
operations, is left up to the implementation to decide for itself.
So what good does it do you if SQL of the same syntax will compile on
different DBMSs if it behaves differently in each one?  And the
standard considers this valid.)

How many other programming langauges besides ANSI SQL treat trailing
spaces as insignificant.


Regarding whether by extension it should be impossible to create strings
with trailing spaces; I side with the SQLite developers who say it isn't
the engine's job to trim blanks in data. Most other engines I've used do
not trim spaces either, even if the field is a varchar.


And rightly so, you should not trim spaces, because spaces are significant.

Herein lies a wider part of the problem.  The ANSI SQL is
inconsistent in how it treats trailing spaces in strings.  On one
hand it wants to preserve them, but on the other hand it wants to
ignore them in its most fundamental operation other than preserving.

(With my "it should not be possible" sentence, I was not saying that
spaces should not be trimmed in the fictional scenario where a
character string does by definition not contain trailing spaces, but
that code specifying them should produce an error rather than
succeed.  The matter is analagous to what would happen if you write
code tha

Re: [sqlite] SQLite character comparisons

2008-01-20 Thread Jay Sprenkle
On Jan 20, 2008 10:19 AM, Fowler, Jeff <[EMAIL PROTECTED]> wrote:
>
> Not trying to be antagonistic, but I'm curious to know how many of you agree 
> with Darren's sentiments on this issue.

Sqlite is DRH's project. Voting doesn't come into the picture
anywhere. If you convince him
to fix it then he might, if not you're wasting your time asking if
people agree with you.

>Sure - we can handle this situation by writing more code looking for
spaces everywhere they might occur.

The source for the database engine is provided. It would be more
efficient to change it there.
You could also provide the code back to the community as an option for
others to use.

I'm not trying to be unpleasant, I just believe where you're headed
isn't going to be as valuable as other options.

--
The PixAddixImage Collector suite:
http://groups-beta.google.com/group/pixaddix

SqliteImporter and SqliteReplicator: Command line utilities for Sqlite
http://www.reddawn.net/~jsprenkl/Sqlite

Cthulhu Bucks!
http://www.cthulhubucks.com

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] SQLite character comparisons

2008-01-20 Thread Doug Currie
On Sunday, January 20, 2008 Fowler, Jeff wrote: 

> briefly, ANSI SQL-92 specifies that when comparing two character
> fields, trailing spaces should be ignored.

From SQL-92 (draft July 1992) section 4.6

> When values of unequal length are compared, if the collating
> sequence for the comparison has the NO PAD attribute and the shorter
> value is equal to a prefix of the longer value, then the shorter
> value is considered less than the longer value. If the collating
> sequence for the comparison has the PAD SPACE attribute, for the
> purposes of the comparison, the shorter value is effectively
> extended to the length of the longer by concatenation of s on
> the right.

section 8.2 also says

> 3) The comparison of two character strings is determined as fol-
>lows:
> 
>a) If the length in characters of X is not equal to the length
>  in characters of Y, then the shorter string is effectively
>  replaced, for the purposes of comparison, with a copy of
>  itself that has been extended to the length of the longer
>  string by concatenation on the right of one or more pad char-
>  acters, where the pad character is chosen based on CS. If
>  CS has the NO PAD attribute, then the pad character is an
>  implementation-dependent character different from any char-
>  acter in the character set of X and Y that collates less
>  than any string under CS. Otherwise, the pad character is a
>  .

So, using this terminology, the SQLite default collating sequence has
the NO PAD attribute, and the pad character is NUL.

Jeff, can you solve your problem with a custom collating sequence?

e

-- 
Doug Currie
Londonderry, NH, USA


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



RE: [sqlite] SQLite character comparisons

2008-01-20 Thread Fowler, Jeff
Hello All,
 
Not trying to be antagonistic, but I'm curious to know how many of you agree 
with Darren's sentiments on this issue. To restate briefly, ANSI SQL-92 
specifies that when comparing two character fields, trailing spaces should be 
ignored. Correct me if I'm wrong Darren, but you feel this is a bad decision, 
and in fact SQLite's implementation of character comparison (respecting 
trailing spaces) is superior to ANSI's specs. Keep in mind this is not some 
obscure issue that can be subject to different interpretations by different 
vendors; it's very clearly stated: "The ANSI standard requires padding for the 
character strings used in comparisons so that their lengths match before 
comparing them."
 
Does anyone know of another RDBMS (ANSI or no) that respects trailing spaces 
when comparing character data? We have both Oracle 10g and SQL Server 2005 in 
house and they both work "correctly" according to the specification. Has anyone 
tried it with DB2 or Informix? What about PostGres and MySQL? Although I 
haven't asked him, I'm guessing Zbigniew's suggestion a while back for an 
auto-trim feature stemmed from this issue. Other than saving space, would there 
be a need to trim data if WHERE, HAVING clauses and joins followed the spec? 
Also, other than performance (which seems to be the primary concern), would 
anyone would be negatively impacted if the current behavior were changed?
 
Our app creates SQLite tables dynamically based on the output from user-defined 
queries that run against data warehouses (of practically any "flavor") we have 
no control over, and we insert the results into SQLite. Sure - we can handle 
this situation by writing more code looking for spaces everywhere they might 
occur. But to me (and maybe only to me?), it makes sense for SQLite -- where 
reasonably possible -- to attempt to follow clear ANSI guidelines, allowing 
developers to override it only in cases where this adherence produces 
undesirable results. I can't see where this is undesirable from an applications 
standpoint, although I can see where performance may be a concern.
 
Regards,
 
Jeff Fowler
 


From: Darren Duncan [mailto:[EMAIL PROTECTED]
Sent: Fri 1/18/2008 4:47 PM
To: sqlite-users@sqlite.org
Subject: RE: [sqlite] SQLite character comparisons



At 4:11 PM -0500 1/18/08, Fowler, Jeff wrote:
>"Better" depends on who you ask - I'd say it's worse, and I bet most
>DBA's would agree. The ANSI standard is to ignore trailing spaces when
>comparing character strings in a WHERE clause, a HAVING clause, or a
>join. So I can take the exact same data, run the exact same query, yet
>get a different answer from SQLite vs. Oracle or SQL Server. In fact, we
>found this issue because we DID get a different answer.

And every SQL implementation already has many differences from every
other one, even with fundamentals, making them all generally
incompatible and proprietary, and this example is right in line with
the kinds of differences they have.  Other implementations have
distinct var-char and fixed-char types, while SQLite only has the
former.

Part of the problem here is that the ANSI standard is itself flawed.

(Maybe this particular case of trailing spaces is an exception, but
many parts of ANSI SQL, AFAIK, are more focused on what the syntax of
SQL is, and they say that the semantics, even of some fundamental
operations, is left up to the implementation to decide for itself.
So what good does it do you if SQL of the same syntax will compile on
different DBMSs if it behaves differently in each one?  And the
standard considers this valid.)

How many other programming langauges besides ANSI SQL treat trailing
spaces as insignificant.

>Regarding whether by extension it should be impossible to create strings
>with trailing spaces; I side with the SQLite developers who say it isn't
>the engine's job to trim blanks in data. Most other engines I've used do
>not trim spaces either, even if the field is a varchar.

And rightly so, you should not trim spaces, because spaces are significant.

Herein lies a wider part of the problem.  The ANSI SQL is
inconsistent in how it treats trailing spaces in strings.  On one
hand it wants to preserve them, but on the other hand it wants to
ignore them in its most fundamental operation other than preserving.

(With my "it should not be possible" sentence, I was not saying that
spaces should not be trimmed in the fictional scenario where a
character string does by definition not contain trailing spaces, but
that code specifying them should produce an error rather than
succeed.  The matter is analagous to what would happen if you write
code that tries to treat the character string literal 'foo' as a
number.)

>But - whether ANSI compliance is considered "bloat" is not really my
>place to comment upon.  I guess it's really is up to the SQLite team.

Y

RE: [sqlite] SQLite character comparisons

2008-01-18 Thread Evans, Mark (Tandem)
Doh!  Wrong thread.  :) :) :)

> -Original Message-
> From: Zbigniew Baniewski [mailto:[EMAIL PROTECTED]
> Sent: Friday, January 18, 2008 5:14 PM
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] SQLite character comparisons
>
> On Fri, Jan 18, 2008 at 11:04:09PM +, Evans, Mark (Tandem) wrote:
>
> > Think of NULL as "value is unknown".
> >
> > With zero length blob/text, value is known:  0-length blob/text
>
> OK, I will :)
> --
> pozdrawiam / regards
>
> Zbigniew Baniewski
>
> --
> ---
> To unsubscribe, send email to [EMAIL PROTECTED]
> --
> ---
>
>

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] SQLite character comparisons

2008-01-18 Thread Zbigniew Baniewski
On Fri, Jan 18, 2008 at 11:04:09PM +, Evans, Mark (Tandem) wrote:

> Think of NULL as "value is unknown".
> 
> With zero length blob/text, value is known:  0-length blob/text

OK, I will :)
-- 
pozdrawiam / regards

Zbigniew Baniewski

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



RE: [sqlite] SQLite character comparisons

2008-01-18 Thread Evans, Mark (Tandem)
Think of NULL as "value is unknown".

With zero length blob/text, value is known:  0-length blob/text

Cheers,
Mark

> -Original Message-
> From: Zbigniew Baniewski [mailto:[EMAIL PROTECTED]
> Sent: Friday, January 18, 2008 4:51 PM
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] SQLite character comparisons
>
> On Fri, Jan 18, 2008 at 02:13:51PM -0800, Darren Duncan wrote:
>
> > Don't read too much into that statement; I'm was not raising the
> > auto-trim thing.
>
> Yes, yes - I know... "Roma locuta"...
> --
> pozdrawiam / regards
>
> Zbigniew Baniewski
>
> --
> ---
> To unsubscribe, send email to [EMAIL PROTECTED]
> --
> ---
>
>

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] SQLite character comparisons

2008-01-18 Thread Zbigniew Baniewski
On Fri, Jan 18, 2008 at 02:13:51PM -0800, Darren Duncan wrote:

> Don't read too much into that statement; I'm was not raising the 
> auto-trim thing.

Yes, yes - I know... "Roma locuta"...
-- 
pozdrawiam / regards

Zbigniew Baniewski

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] SQLite character comparisons

2008-01-18 Thread Darren Duncan

At 10:57 PM +0100 1/18/08, Zbigniew Baniewski wrote:

On Fri, Jan 18, 2008 at 12:32:36PM -0800, Darren Duncan wrote:
 > If trailing spaces were supposed to be insignificant for an equality test,

 then it should not be possible to define a string value containing
 trailing spaces at all.


Yes, yes: quite right... the above reminds me something... ;)


Don't read too much into that statement; I'm was not raising the 
auto-trim thing.  See my previous email in this thread, which replied 
to Jeff Fowler, which addresses his own comment on my statement. -- 
Darren Duncan


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] SQLite character comparisons

2008-01-18 Thread Zbigniew Baniewski
On Fri, Jan 18, 2008 at 12:32:36PM -0800, Darren Duncan wrote:

> If trailing spaces were supposed to be insignificant for an equality test,
> then it should not be possible to define a string value containing
> trailing spaces at all.

Yes, yes: quite right... the above reminds me something... ;)
-- 
pozdrawiam / regards

Zbigniew Baniewski

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



RE: [sqlite] SQLite character comparisons

2008-01-18 Thread Darren Duncan

At 4:11 PM -0500 1/18/08, Fowler, Jeff wrote:

"Better" depends on who you ask - I'd say it's worse, and I bet most
DBA's would agree. The ANSI standard is to ignore trailing spaces when
comparing character strings in a WHERE clause, a HAVING clause, or a
join. So I can take the exact same data, run the exact same query, yet
get a different answer from SQLite vs. Oracle or SQL Server. In fact, we
found this issue because we DID get a different answer.


And every SQL implementation already has many differences from every 
other one, even with fundamentals, making them all generally 
incompatible and proprietary, and this example is right in line with 
the kinds of differences they have.  Other implementations have 
distinct var-char and fixed-char types, while SQLite only has the 
former.


Part of the problem here is that the ANSI standard is itself flawed.

(Maybe this particular case of trailing spaces is an exception, but 
many parts of ANSI SQL, AFAIK, are more focused on what the syntax of 
SQL is, and they say that the semantics, even of some fundamental 
operations, is left up to the implementation to decide for itself. 
So what good does it do you if SQL of the same syntax will compile on 
different DBMSs if it behaves differently in each one?  And the 
standard considers this valid.)


How many other programming langauges besides ANSI SQL treat trailing 
spaces as insignificant.



Regarding whether by extension it should be impossible to create strings
with trailing spaces; I side with the SQLite developers who say it isn't
the engine's job to trim blanks in data. Most other engines I've used do
not trim spaces either, even if the field is a varchar.


And rightly so, you should not trim spaces, because spaces are significant.

Herein lies a wider part of the problem.  The ANSI SQL is 
inconsistent in how it treats trailing spaces in strings.  On one 
hand it wants to preserve them, but on the other hand it wants to 
ignore them in its most fundamental operation other than preserving.


(With my "it should not be possible" sentence, I was not saying that 
spaces should not be trimmed in the fictional scenario where a 
character string does by definition not contain trailing spaces, but 
that code specifying them should produce an error rather than 
succeed.  The matter is analagous to what would happen if you write 
code that tries to treat the character string literal 'foo' as a 
number.)



But - whether ANSI compliance is considered "bloat" is not really my
place to comment upon.  I guess it's really is up to the SQLite team.


Yes it is up to the developers.  And they have already demonstrated 
willingness to do some things differently than ANSI SQL because they 
considered the differences to be improvements, or alternately 
reasonable feature cutting.



Purely from a business usability standpoint (not a programming one), I
would say there's no question that it's far more useful to do
comparisons the ANSI way.


And why is it more useful to ignore trailing spaces than respect 
them.  And if ignoring them is more useful, why do most programming 
languages (AFAIK) respect them?



 If for some reason I truly want to compare &
respect trailing spaces, I can still do that using a function such as
HEX(A) = HEX(B) or something better.


I would argue that it is rediculous to do such ugly things in order 
to do something that should be fundamental, and is simple and 
fundamental in any other language.  Better for basic '=' comparison 
to test that the values are the same, and have some other operator or 
function like 'equal_when_trimmed( v1, v2 )' when you want various 
exceptional comparisons.


-- Darren Duncan

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



RE: [sqlite] SQLite character comparisons

2008-01-18 Thread Fowler, Jeff
"Better" depends on who you ask - I'd say it's worse, and I bet most
DBA's would agree. The ANSI standard is to ignore trailing spaces when
comparing character strings in a WHERE clause, a HAVING clause, or a
join. So I can take the exact same data, run the exact same query, yet
get a different answer from SQLite vs. Oracle or SQL Server. In fact, we
found this issue because we DID get a different answer.

Regarding whether by extension it should be impossible to create strings
with trailing spaces; I side with the SQLite developers who say it isn't
the engine's job to trim blanks in data. Most other engines I've used do
not trim spaces either, even if the field is a varchar.

But - whether ANSI compliance is considered "bloat" is not really my
place to comment upon. I guess it's really is up to the SQLite team.
Purely from a business usability standpoint (not a programming one), I
would say there's no question that it's far more useful to do
comparisons the ANSI way. If for some reason I truly want to compare &
respect trailing spaces, I can still do that using a function such as
HEX(A) = HEX(B) or something better.

- Jeff Fowler

 

-Original Message-
From: Darren Duncan [mailto:[EMAIL PROTECTED] 
Sent: Friday, January 18, 2008 3:33 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] SQLite character comparisons

At 10:43 AM -0500 1/17/08, Fowler, Jeff wrote:
>Hello All,
>I've used SQL Server for over 15 years, Oracle off & on when I have no 
>choice, but SQLite for a couple weeks. I've just learned (today) that 
>SQLite respects trailing spaces when comparing two character fields.
>I.e. 'SQLITE' <> 'SQLITE '
>Is this behavior intentional? Neither SQL Server nor Oracle do this.
>Just curious as to why it works this way.

Because respecting the actual contents of the string is the better way
to do things.

The strings 'SQLITE' and 'SQLITE ' are not the same string.  Just as the
strings 'SQLITE' and 'sqlite' are not the same string.  A computer
language is more logical, predictable, and easy to use when a test for
equality or inequality actually treats every distinct value as distinct.
If trailing spaces were supposed to be insignificant for an equality
test, then it should not be possible to define a string value containing
trailing spaces at all.

Moreover, treating 'SQLITE' and 'SQLITE ' as not being the same string
also is consistent with the SQLite philosophy, because it means SQLite
has fewer exceptions to be concerned with in a simpler set of rules, and
also not having to check lengths and space pad before each compare also
makes the code simpler, and less buggy, and it saves CPU cycles.  A
value equality test is a very common and fundamental thing to do in a
DBMS, and bloating that will have a big performance impact.

-- Darren Duncan


-
To unsubscribe, send email to [EMAIL PROTECTED]

-


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] SQLite character comparisons

2008-01-18 Thread Darren Duncan

At 10:43 AM -0500 1/17/08, Fowler, Jeff wrote:

Hello All,
I've used SQL Server for over 15 years, Oracle off & on when I have no
choice, but SQLite for a couple weeks. I've just learned (today) that
SQLite respects trailing spaces when comparing two character fields.
I.e. 'SQLITE' <> 'SQLITE '
Is this behavior intentional? Neither SQL Server nor Oracle do this.
Just curious as to why it works this way.


Because respecting the actual contents of the string is the better 
way to do things.


The strings 'SQLITE' and 'SQLITE ' are not the same string.  Just as 
the strings 'SQLITE' and 'sqlite' are not the same string.  A 
computer language is more logical, predictable, and easy to use when 
a test for equality or inequality actually treats every distinct 
value as distinct.  If trailing spaces were supposed to be 
insignificant for an equality test, then it should not be possible to 
define a string value containing trailing spaces at all.


Moreover, treating 'SQLITE' and 'SQLITE ' as not being the same 
string also is consistent with the SQLite philosophy, because it 
means SQLite has fewer exceptions to be concerned with in a simpler 
set of rules, and also not having to check lengths and space pad 
before each compare also makes the code simpler, and less buggy, and 
it saves CPU cycles.  A value equality test is a very common and 
fundamental thing to do in a DBMS, and bloating that will have a big 
performance impact.


-- Darren Duncan

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] SQLite character comparisons

2008-01-17 Thread Zbigniew Baniewski
On Thu, Jan 17, 2008 at 05:25:30PM -0500, Fowler, Jeff wrote:

> By the way.. I found this snippet. If I read it right, it seems that
> IGNORING trailing spaces during string comparisons is ANSI standard.

I'm not sure. I was always avoiding such problem by "trim"-ming everything
to be inserted; either before, or during insertion (directly in SQL query).

There is remark in Postgres docs, that it does follow SQL-92 - so probably
the above is a recommendation rather than a rule.

In the Postgres docs I've found:

#v+
  The notations varchar(n) and char(n) are aliases for character varying(n)
  and character(n) [..]
  Values of type character are physically padded with spaces to the specified
  width n, and are stored and displayed that way. However, the padding spaces
  are treated as semantically insignificant. Trailing spaces are disregarded
  when comparing two values of type character, and they will be removed when
  converting a character value to one of the other string types. Note that
  trailing spaces are semantically significant in character varying and text
  values.
#v-

So, you can just use "character" type, to have what you need. But I'm still
talking about Postgres ;) - and you were asking about SQLite.

I'm trimming it all anyway... ;)
-- 
pozdrawiam / regards

Zbigniew Baniewski

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



RE: [sqlite] SQLite character comparisons

2008-01-17 Thread Fowler, Jeff
By the way.. I found this snippet. If I read it right, it seems that
IGNORING trailing spaces during string comparisons is ANSI standard.

SQL Server follows the ANSI/ISO SQL-92 specification (Section 8.2,
, General rules #3) on how to compare strings with
spaces. The ANSI standard requires padding for the character strings
used in comparisons so that their lengths match before comparing them.
The padding directly affects the semantics of WHERE and HAVING clause
predicates and other Transact-SQL string comparisons. For example,
Transact-SQL considers the strings 'abc' and 'abc ' to be equivalent for
most comparison operations. 

-Original Message-
From: Zbigniew Baniewski [mailto:[EMAIL PROTECTED] 
Sent: Thursday, January 17, 2008 5:20 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] SQLite character comparisons

On Thu, Jan 17, 2008 at 10:43:20AM -0500, Fowler, Jeff wrote:

> I've used SQL Server for over 15 years, Oracle off & on when I have no

> choice, but SQLite for a couple weeks. I've just learned (today) that 
> SQLite respects trailing spaces when comparing two character fields.
> I.e. 'SQLITE' <> 'SQLITE '
>  
> Is this behavior intentional? Neither SQL Server nor Oracle do this.
> Just curious as to why it works this way.

PostgreSQL sees the strings that way too:

mydbase=> select 'str' = 'str';
 ?column? 
--
 t
(1 row)

mydbase=> select 'str' = 'str ';
 ?column? 
--
 f
(1 row)


Those are different strings, anyway. Can't recall now, does there any
setting to change this - but you can easily find out at their website.
-- 
pozdrawiam / regards

Zbigniew Baniewski


-
To unsubscribe, send email to [EMAIL PROTECTED]

-


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] SQLite character comparisons

2008-01-17 Thread Zbigniew Baniewski
On Thu, Jan 17, 2008 at 10:43:20AM -0500, Fowler, Jeff wrote:

> I've used SQL Server for over 15 years, Oracle off & on when I have no
> choice, but SQLite for a couple weeks. I've just learned (today) that
> SQLite respects trailing spaces when comparing two character fields.
> I.e. 'SQLITE' <> 'SQLITE '
>  
> Is this behavior intentional? Neither SQL Server nor Oracle do this.
> Just curious as to why it works this way.

PostgreSQL sees the strings that way too:

mydbase=> select 'str' = 'str';
 ?column? 
--
 t
(1 row)

mydbase=> select 'str' = 'str ';
 ?column? 
--
 f
(1 row)


Those are different strings, anyway. Can't recall now, does there any
setting to change this - but you can easily find out at their website.
-- 
pozdrawiam / regards

Zbigniew Baniewski

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] SQLite character comparisons

2008-01-17 Thread Fowler, Jeff
Hello All,
 
I've used SQL Server for over 15 years, Oracle off & on when I have no
choice, but SQLite for a couple weeks. I've just learned (today) that
SQLite respects trailing spaces when comparing two character fields.
I.e. 'SQLITE' <> 'SQLITE '
 
Is this behavior intentional? Neither SQL Server nor Oracle do this.
Just curious as to why it works this way.
 
Thanks,
 
- Jeff