Fowler, Jeff wrote:
Hello All,
Not trying to be antagonistic, but I'm curious to know how many of you agree with Darren's sentiments on this issue. To restate briefly, ANSI SQL-92 specifies that when comparing two character fields, trailing spaces should be ignored. Correct me if I'm wrong Darren, but you feel this is a bad decision, and in fact SQLite's implementation of character comparison (respecting trailing spaces) is superior to ANSI's specs. Keep in mind this is not some obscure issue that can be subject to different interpretations by different vendors; it's very clearly stated: "The ANSI standard requires padding for the character strings used in comparisons so that their lengths match before comparing them."
Jeff,

I think you are mistaken about what the ANSI spec says.

There are two string types in ANSI SQL, character strings (which come is several subtypes), and binary strings. The following excerpts are taken from the SQL:1999 spec.

Section 4.2.1 Character Strings and Collations describes the operations on character strings. It describes comparisons as
Given a collating sequence, two character strings are identical if and only if they are equal in accordance with the comparison rules specified in Subclause 8.2, ‘‘<comparison predicate>’’. The collating sequence used for a particular comparison is determined as in Subclause 4.2.3, ‘‘Rules
determining collating sequence usage’’.
Binary strings are defined in Section 4.3 as;
A binary string is a sequence of octets that does not have either a character set or collation associated
with it.
And their comparison is detailed in 4.3.1 as;
All binary strings are mutually comparable. A binary string is identical to another binary string if and only if it is equal to that binary string in accordance with the comparison rules specified in
Subclause 8.2, ‘‘<comparison predicate>’’.

General Rules 3 and 4 of section 8.2 <comparison predicate> describe the comparison of these strings. I have copied these sections below.

3) The comparison of two character strings is determined as follows:

a) Let CS be the collating sequence indicated in Subclause 4.2.3, ‘‘Rules determining collating sequence usage’’, based on the declared types of the two character strings.

b) If the length in characters of X is not equal to the length in characters of Y, then the shorter string is effectively replaced, for the purposes of comparison, with a copy of itself that has been extended to the length of the longer string by concatenation on the right of one or more pad characters, where the pad character is chosen based on CS. If CS has the NO PAD characteristic, then the pad character is an implementation-dependent character different from any character in the character set of X and Y that collates less than any string under
CS. Otherwise, the pad character is a <space>.

c) The result of the comparison of X and Y is given by the collating sequence CS.

d) Depending on the collating sequence, two strings may compare as equal even if they are of different lengths or contain different sequences of characters. When any of the operations
MAX, MIN, and DISTINCT reference a grouping column, and the UNION, EXCEPT,
and INTERSECT operators refer to character strings, the specific value selected by these
operations from a set of such equal values is implementation-dependent.

NOTE 129 – If the coercibility characteristic of the comparison is Coercible, then the collating sequence used is the default defined for the character repertoire. See also other Syntax Rules in this Subclause, Subclause 10.6, ‘‘<character set specification>’’, and Subclause 11.30, ‘‘<character set definition>’’.

4) The comparison of two binary string values, X and Y, is determined by comparison of their octets with the same ordinal position. If Xi and Yi are the values of the i-th octets of X and Y, respectively, and if Lx is the length in octets of X AND Ly is the length in octets of Y, then X is
equal to Y if and only if Ly = Ly and if Xi = Yi for all i.
I note that there is a typo in rule 4 for binary strings; Ly = Ly should be Lx = Ly, since binary strings can only be compared for equality.

Rule 3.b details how strings of unequal length are to be compared. It allows exactly the operation performed by SQLite, since it allows collating sequences to have a NO PAD characteristic which results in the shorter string comparing less than the longer string.

This distinction also appears in section 4.12 which discusses type conversions and mixing of data types. It says;
Values corresponding to the data types CHARACTER, CHARACTER VARYING, and CHARACTER LARGE OBJECT are mutually assignable if and only if they are taken from the same character repertoire. If they are from different character repertoires, then the value of the source of the assignment must be translated to the character repertoire of the target before an assignment is possible. Such translation may be implementation-defined and implicitly performed, in which case the two character data types are also mutually assignable. If a store assignment would result in the loss of non-<space> characters due to truncation, then an exception condition is raised. If a retrieval assignment would result in the loss of characters due to truncation, then a warning condition is raised. The values are mutually comparable only if they are mutually assignable and can be coerced to have the same collation. The comparison of two character strings depends on the collating sequence used for the comparison (see Table 3, ‘‘Collating sequence usage for comparisons’’). When values of unequal length are compared, if the collating sequence for the comparison has the NO PAD characteristic and the shorter value is equal to a prefix of the longer value, then the shorter value is considered less than the longer value. If the collating sequence for the comparison has the PAD SPACE characteristic, for the purposes of the comparison, the shorter value is effectively extended to the length of the longer by concatenation of <space>s on the right.

Values corresponding to the binary data type are mutually assignable. If a store assignment would result in the loss of non-zero octets due to truncation, then an exception condition is raised. If a retrieval assignment would result in the loss of octets due to truncation, then a warning condition is raised. When binary string values are compared, they must have exactly the same length (in octets) to be considered equal. Binary string values can only be compared for equality.

Which again explains that a collating sequence can have a NO PAD property which prevents padding the shorter string for comparison, and that binary strings can only be compared for equality.

The only place in the standard that I can find any explicit mention of removing spaces is the description of casting a string to a numeric value. In this case the leading and trailing spaces are to be removed from the string before it is converted.

So, while the standard does allow the operation you describe (actually it does the opposite, it pads the shorter string with spaces, instead of removing trailing spaces from the longer string), it also allows the operation SQLite performs. It is simply the case that all of SQLite's collations have the NO PAD characteristic.

HTH
Dennis Cote



-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Reply via email to