Re: [sqlite] SQLite character comparisons
I feel compelled to throw in my $0.02 here. To everyone who thinks that SQLite should allow 'foo ' == 'foo': SQL was originally conceived as a query *language* -- a way for a human being to request a set of data from a database. It was specifically designed for ad-hoc queries. This little 'magic space trimming' feature exists to match the 'char(N)' data type. A char(10) field is always exactly 10 characters long; longer strings are truncated and shorter strings are space-padded. Most database engines are more efficient at these, because when all rows are the same width, the task of finding a particular row reduces to a simple array lookup; therefore, if performance is a critical issue (and when SQL was first formed, CPUs weren't quite as powerful as they are now.) But this presents a problem: the 'usual' definition of equality would mean that any comparisons to a char(N) field would need to be N characters long, or they would always fail. Since it's stupid to make people count spaces, somebody came up with the solution 'if they enter something shorter, pad it with spaces and then compare.' (If anyone wishes to quote the spec regarding space-extension and varchar(N) fields, first recall that the SQL specifications have been created by committees.) Since SQLite does not have any concept of a fixed-width field of character data, the whole concept of ignoring/appending trailing spaces doesn't even apply. -- -- Stevie-O Real programmers use COPY CON PROGRAM.EXE - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] SQLite character comparisons
Fowler, Jeff wrote: I may very well be mistaken - wouldn't be the first time!! I tried to access the actual specification from ANSI (www.ansi.org), but you either have to purchase them or access one of the "drafts" (which a couple folks in this thread have done), but the draft may or may not match the actual standard. While Dr. Hipp's new RTRIM collation makes this a moot point, the source for my statement came from this article: http://support.microsoft.com/kb/316626 After reading the MS article, I dug into the standard some more to see if I had missed something. It seems there is a requirement that all "standard character set names" define a default collation that has the PAD SPACE property (see section 10.6 general rule 2). 2) A specifies the name of a character set that is defined by a national or international standard. The character repertoire of CS, implied by the character set name>, are defined by the standard defining the character set identified by that . The default collating sequence of the character set is defined by the order of the characters in the standard and has the PAD SPACE characteristic. There is a similar definition for an "implementation defined character set name" in general rule 3. 3) An specifies the name of a character set that is implementation-defined. The character repertoire of CS, implied by the character set name>, are implementation-defined. The default collating sequence of the character set and whether the collating sequence has the NO PAD characteristic or the PAD SPACE characteristic is implementation-defined. But syntax rule 1 in that section seems to say that what is supported is implementation defined. 1) The s and character set name>s that are supported are implementation-defined. These are character sets, and predefined collations, can be used to create additional collations as described in section 11.32 ::= CREATE COLLATION FOR FROM [ ] ::= ::= NO PAD | PAD SPACE If the pad characteristic is not given explicitly, it is inherited implicitly from the existing collation name. So if an implementation provides a standard character set, such as SQL_TEXT, it must also have a default collation with the PAD SPACE property that is the default collation for strings in that character set. The user can create a new collation that explicitly overrides the pad characteristic and use that for NO PAD operation. But ultimately there is no requirement to provide any standard character sets. So all bets are off when switching implementations (as Darren has suggested). Dennis Cote - To unsubscribe, send email to [EMAIL PROTECTED] -
RE: [sqlite] SQLite character comparisons
Hi Dennis, I may very well be mistaken - wouldn't be the first time!! I tried to access the actual specification from ANSI (www.ansi.org), but you either have to purchase them or access one of the "drafts" (which a couple folks in this thread have done), but the draft may or may not match the actual standard. While Dr. Hipp's new RTRIM collation makes this a moot point, the source for my statement came from this article: http://support.microsoft.com/kb/316626 As I'd mentioned, we have both SQL Server 2005 and Oracle 10g and they default to ignore trailing spaces on character compares, although it may be possible to turn this off for those who don't like it. I believe (but am not sure) that MYSql and PostGreSQL ignore them by default also. I understand many of the arguments folks have against this and there's not much point to continuing the discussion; I won't change their mind and they won't change mine! But I guess that's why we have Republicans and Democrats :-) But all is well - the RTRIM collation option is a neat and simple solution. My thanks to Dr. Hipp for that, and also to everyone who has shown an interest in this issue. - Jeff -Original Message- From: Dennis Cote [mailto:[EMAIL PROTECTED] Sent: Monday, January 21, 2008 11:46 AM To: sqlite-users@sqlite.org Subject: Re: [sqlite] SQLite character comparisons Fowler, Jeff wrote: > Hello All, > > Not trying to be antagonistic, but I'm curious to know how many of you agree with Darren's sentiments on this issue. To restate briefly, ANSI SQL-92 specifies that when comparing two character fields, trailing spaces should be ignored. Correct me if I'm wrong Darren, but you feel this is a bad decision, and in fact SQLite's implementation of character comparison (respecting trailing spaces) is superior to ANSI's specs. Keep in mind this is not some obscure issue that can be subject to different interpretations by different vendors; it's very clearly stated: "The ANSI standard requires padding for the character strings used in comparisons so that their lengths match before comparing them." > Jeff, I think you are mistaken about what the ANSI spec says. There are two string types in ANSI SQL, character strings (which come is several subtypes), and binary strings. The following excerpts are taken from the SQL:1999 spec. Section 4.2.1 Character Strings and Collations describes the operations on character strings. It describes comparisons as > Given a collating sequence, two character strings are identical if and > only if they are equal in accordance with the comparison rules > specified in Subclause 8.2, . The collating > sequence used for a particular comparison is determined as in > Subclause 4.2.3, ''Rules determining collating sequence usage''. Binary strings are defined in Section 4.3 as; > A binary string is a sequence of octets that does not have either a > character set or collation associated with it. And their comparison is detailed in 4.3.1 as; > All binary strings are mutually comparable. A binary string is > identical to another binary string if and only if it is equal to that > binary string in accordance with the comparison rules specified in > Subclause 8.2, . General Rules 3 and 4 of section 8.2 describe the comparison of these strings. I have copied these sections below. > 3) The comparison of two character strings is determined as follows: > > a) Let CS be the collating sequence indicated in Subclause 4.2.3, > ''Rules determining collating sequence usage'', based on the declared > types of the two character strings. > > b) If the length in characters of X is not equal to the length in > characters of Y, then the shorter string is effectively replaced, for > the purposes of comparison, with a copy of itself that has been > extended to the length of the longer string by concatenation on the > right of one or more pad characters, where the pad character is chosen > based on CS. If CS has the NO PAD characteristic, then the pad > character is an implementation-dependent character different from any > character in the character set of X and Y that collates less than any > string under CS. Otherwise, the pad character is a . > > c) The result of the comparison of X and Y is given by the collating > sequence CS. > > d) Depending on the collating sequence, two strings may compare as > equal even if they are of different lengths or contain different > sequences of characters. > When any of the operations > MAX, MIN, and DISTINCT reference a grouping column, and the UNION, > EXCEPT, and INTERSECT operators refer to character strings, the > specific value selected by these operations from a set of such equal > values is implementation-dependent. > > NOTE 129 - If the coercibility characteristic of the comparison is > Coercible
Re: [sqlite] SQLite character comparisons
John Elrick <[EMAIL PROTECTED]> wrote: > > If we are ignoring trailing spaces, then by definition: > > ' ' = '' > > and for that matter: > > ' ' = ' ' > Good point. I added these as test cases for the new RTRIM collation. http://www.sqlite.org/cvstrac/chngview?cn=4735 -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] SQLite character comparisons
Fowler, Jeff wrote: Hello All, Not trying to be antagonistic, but I'm curious to know how many of you agree with Darren's sentiments on this issue. To restate briefly, ANSI SQL-92 specifies that when comparing two character fields, trailing spaces should be ignored. Correct me if I'm wrong Darren, but you feel this is a bad decision, and in fact SQLite's implementation of character comparison (respecting trailing spaces) is superior to ANSI's specs. Keep in mind this is not some obscure issue that can be subject to different interpretations by different vendors; it's very clearly stated: "The ANSI standard requires padding for the character strings used in comparisons so that their lengths match before comparing them." Jeff, I think you are mistaken about what the ANSI spec says. There are two string types in ANSI SQL, character strings (which come is several subtypes), and binary strings. The following excerpts are taken from the SQL:1999 spec. Section 4.2.1 Character Strings and Collations describes the operations on character strings. It describes comparisons as Given a collating sequence, two character strings are identical if and only if they are equal in accordance with the comparison rules specified in Subclause 8.2, ‘‘’’. The collating sequence used for a particular comparison is determined as in Subclause 4.2.3, ‘‘Rules determining collating sequence usage’’. Binary strings are defined in Section 4.3 as; A binary string is a sequence of octets that does not have either a character set or collation associated with it. And their comparison is detailed in 4.3.1 as; All binary strings are mutually comparable. A binary string is identical to another binary string if and only if it is equal to that binary string in accordance with the comparison rules specified in Subclause 8.2, ‘‘’’. General Rules 3 and 4 of section 8.2 describe the comparison of these strings. I have copied these sections below. 3) The comparison of two character strings is determined as follows: a) Let CS be the collating sequence indicated in Subclause 4.2.3, ‘‘Rules determining collating sequence usage’’, based on the declared types of the two character strings. b) If the length in characters of X is not equal to the length in characters of Y, then the shorter string is effectively replaced, for the purposes of comparison, with a copy of itself that has been extended to the length of the longer string by concatenation on the right of one or more pad characters, where the pad character is chosen based on CS. If CS has the NO PAD characteristic, then the pad character is an implementation-dependent character different from any character in the character set of X and Y that collates less than any string under CS. Otherwise, the pad character is a . c) The result of the comparison of X and Y is given by the collating sequence CS. d) Depending on the collating sequence, two strings may compare as equal even if they are of different lengths or contain different sequences of characters. When any of the operations MAX, MIN, and DISTINCT reference a grouping column, and the UNION, EXCEPT, and INTERSECT operators refer to character strings, the specific value selected by these operations from a set of such equal values is implementation-dependent. NOTE 129 – If the coercibility characteristic of the comparison is Coercible, then the collating sequence used is the default defined for the character repertoire. See also other Syntax Rules in this Subclause, Subclause 10.6, ‘‘’’, and Subclause 11.30, ‘‘’’. 4) The comparison of two binary string values, X and Y, is determined by comparison of their octets with the same ordinal position. If Xi and Yi are the values of the i-th octets of X and Y, respectively, and if Lx is the length in octets of X AND Ly is the length in octets of Y, then X is equal to Y if and only if Ly = Ly and if Xi = Yi for all i. I note that there is a typo in rule 4 for binary strings; Ly = Ly should be Lx = Ly, since binary strings can only be compared for equality. Rule 3.b details how strings of unequal length are to be compared. It allows exactly the operation performed by SQLite, since it allows collating sequences to have a NO PAD characteristic which results in the shorter string comparing less than the longer string. This distinction also appears in section 4.12 which discusses type conversions and mixing of data types. It says; Values corresponding to the data types CHARACTER, CHARACTER VARYING, and CHARACTER LARGE OBJECT are mutually assignable if and only if they are taken from the same character repertoire. If they are from different character repertoires, then the value of the source of the assignment must be translated to the character repertoire of the target before an assignment is possible. Such translation may be implementation-defined and implicitly performed, in which case the two character data types are also mutually
Re: [sqlite] SQLite character comparisons
Fowler, Jeff wrote: Hello All, Not trying to be antagonistic, but I'm curious to know how many of you agree with Darren's sentiments on this issue. To restate briefly, ANSI SQL-92 specifies that when comparing two character fields, trailing spaces should be ignored. My $0.02. If we are ignoring trailing spaces, then by definition: ' ' = '' and for that matter: ' ' = ' ' whereas 'xx' <> 'xxx' Therefore, from the standpoint of arbitrary text, A may or may not actually be identical to B when A = B I don't consider that type of result to be predicable and therefore would respectfully state that the standard is a potential source of bugs. What could be useful is a function: SELECT * FROM FOO WHERE TRIM(BAR) = TRIM(:bar) or even a shortcut function along the lines of: SELECT * FROM FOO WHERE TEXT_EQUIVALENT(BAR, :bar) However, in my opinion, the operator '=' is better reserved as defining strict equality. John - To unsubscribe, send email to [EMAIL PROTECTED] -
RE: [sqlite] SQLite character comparisons
I don't recall asking for a vote on the subject or asking anyone to agree with me, but thank you for your input anyway Jay. And I do appreciate Dr H. for adding the "COLLATE RTRIM" declaration! Regards, - Jeff From: Jay Sprenkle [mailto:[EMAIL PROTECTED] Sent: Sun 1/20/2008 11:56 AM To: sqlite-users@sqlite.org Subject: Re: [sqlite] SQLite character comparisons On Jan 20, 2008 10:19 AM, Fowler, Jeff <[EMAIL PROTECTED]> wrote: > > Not trying to be antagonistic, but I'm curious to know how many of you agree > with Darren's sentiments on this issue. Sqlite is DRH's project. Voting doesn't come into the picture anywhere. If you convince him to fix it then he might, if not you're wasting your time asking if people agree with you. >Sure - we can handle this situation by writing more code looking for spaces everywhere they might occur. The source for the database engine is provided. It would be more efficient to change it there. You could also provide the code back to the community as an option for others to use. I'm not trying to be unpleasant, I just believe where you're headed isn't going to be as valuable as other options. -- The PixAddixImage Collector suite: http://groups-beta.google.com/group/pixaddix SqliteImporter and SqliteReplicator: Command line utilities for Sqlite http://www.reddawn.net/~jsprenkl/Sqlite Cthulhu Bucks! http://www.cthulhubucks.com <http://www.cthulhubucks.com/> - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
RE: [sqlite] SQLite character comparisons
At 11:19 AM -0500 1/20/08, Fowler, Jeff wrote: To restate briefly, ANSI SQL-92 specifies that when comparing two character fields, trailing spaces should be ignored. Correct me if I'm wrong Darren, but you feel this is a bad decision, and in fact SQLite's implementation of character comparison (respecting trailing spaces) is superior to ANSI's specs. Yes, that is indeed what I am saying. More broadly speaking, and this may already be familiar to some of you who remember several of my writings over the last few years, I believe that while SQL has a lot of good things going for it, it also has numerous flaws, some of which are quite severe in their consequences. I am specifically addressing the ANSI/ISO SQL standard itself with this blame, not any implementation in particular. I make this assessment of SQL both in respect to how much SQL is able to represent the relational model of data that Codd proposed to be used for computer databases, and in respect to how much SQL is constructed according to well-established principles of good language design. As far as I am concerned, any quasi-implementation of SQL that addresses these flaws is something to applaud. And at times that it seems SQLite is already doing things a better way, I am inclined to argue in support of its current status quo. I won't address/re-address the other perceived SQL flaws in this thread, to stay on topic, but I'll further clarify my position on the space-pad thing in light of the previous paragraphs. It may even appear that I changed or reversed my position, but I don't feel that it changed. 1. The most important thing to have in regards to data types and values is to have a fully deterministic (and preferrably simple) concept of value identity, that is, when 2 containers are considered to hold identical values or not, or should I say, when 2 appearances of values are in fact the same one value. 2. The same conceptual value can have multiple physical representations, but this distinction is meant to be abstracted away from the user, so for example if the definition of a data type says that the representations 2.0 and 2.00 are the same value, then an equality test on them should return true; that said, users should not even see the difference then; any display of either physical representation to the user should be normalized to the same thing, such as 2, so when 2 values are considered equal by the system, they look the same to the user, but it is still okay to store them differently behind the scenes. 3. It is okay in the general case for a system's conception of value identity to be different than another system's as long as the rules are clearly documented. In this respect, it is okay for either trailing spaces to be significant, or for them to be non-significant, for determining identity (and by extension, equality), as long as these rules are consistently applied everywhere that value appears. Eg, 2 given character strings Foo and Bar can't be considered identical in some contexts and non-identical in other contexts. If you want to have it both ways, you need to have 2 distinct data types which happen to look similar, eg a CharStrSpSignif data type and a CharStrSpInsig data type, and then you use values of one type in one context and separate values of the other type in other contexts. 4. In this respect, if I don't misunderstand, SQLite's text data type is the CharStrSpSignif data type, and the SQL standard has the CharStrSpInsig type instead; if you consider the 2 systems as having different data types, then this difference of behaviour is explainable. Moreover, you can have your choice of the behaviour in different systems by having both types implemented there to choose from when you want, like you can choose between text and number types now. 5. A more practical example of #2, ignoring the whole spaces thing, is in regard to Unicode codepoints vs graphemes. Even if you are using a consistent byte encoding throughout, such as just UTF-8 or UTF-16-LE, you still have to be concerned with the fact that Unicode has multiple normal forms. Depending on your normal form, such as normal form C vs normal form D, you may have different sequences of code points representing the same grapheme. An example of a single grapheme being the combination of a plain roman letter plus a diacritical mark or accent; in NFC, that may be a single codepoint, in NFD, it might be a sequence of 2 code points. So, it is important for a character string data type to explicitly be considered either as a string of code points or of graphemes, for example. At the higher level abstraction, the 2 forms of letter+accent would be considered identical, but in the lower level abstraction, they would be non-identical. Note that afaik most high-level Unicode systems normally work in the highest abstraction level possible (whether they synchronize the normal form on
Re: [sqlite] SQLite character comparisons
"Fowler, Jeff" <[EMAIL PROTECTED]> wrote: > I'm curious to know how many of you agree with Darren's > sentiments on this issue Changing the behavior of SQLite to ignore trailing spaces is not an option for SQLite version 3, since to do so would result in a incompatible file format All indices created before the change would be invalid since they would use a different collation. There are multiple thousands of SQLite applications and hundreds of millions of existing SQLite database files that depend on this backwards compatibility. To make this change would therefore require bumping the version number up to SQLite 4.0. > > Our app creates SQLite tables dynamically based on the output > from user-defined queries that run against data warehouses > (of practically any "flavor") we have no control over, and > we insert the results into SQLite. Sure - we can handle this > situation by writing more code looking for spaces everywhere > they might occur. But to me (and maybe only to me?), it makes > sense for SQLite -- where reasonably possible -- to attempt > to follow clear ANSI guidelines > Check-in [4732] implements a built-in RTRIM collating sequence that provides the ignore-spaces comparison semantics that you desire. All you have to do is add "COLLATE RTRIM" to the declarations of text columns in your SQLite schema and SQLite will thereafter ignore trailing spaces on comparisons involving those columns. If you do SQL comparisons that do not involve columns, you can put "COLLATE RTRIM" after the comparison operator itself to get this behavior. Example (an actual screen capture): [EMAIL PROTECTED]:~/sqlite/bld> ./sqlite3 SQLite version 3.5.4 Enter ".help" for instructions sqlite> SELECT 'abc'='abc '; 0 sqlite> SELECT 'abc'='abc ' COLLATE RTRIM; 1 -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] SQLite character comparisons
I would suggest that Sqlite is correct having regard to its universal variable length feature and single TEXT type. Compatibility would be achieved by not storing redundant trailing spaces if a string field is required. More efficient storage would be a bonus. Fowler, Jeff wrote: Hello All, Not trying to be antagonistic, but I'm curious to know how many of you agree with Darren's sentiments on this issue. To restate briefly, ANSI SQL-92 specifies that when comparing two character fields, trailing spaces should be ignored. Correct me if I'm wrong Darren, but you feel this is a bad decision, and in fact SQLite's implementation of character comparison (respecting trailing spaces) is superior to ANSI's specs. Keep in mind this is not some obscure issue that can be subject to different interpretations by different vendors; it's very clearly stated: "The ANSI standard requires padding for the character strings used in comparisons so that their lengths match before comparing them." Does anyone know of another RDBMS (ANSI or no) that respects trailing spaces when comparing character data? We have both Oracle 10g and SQL Server 2005 in house and they both work "correctly" according to the specification. Has anyone tried it with DB2 or Informix? What about PostGres and MySQL? Although I haven't asked him, I'm guessing Zbigniew's suggestion a while back for an auto-trim feature stemmed from this issue. Other than saving space, would there be a need to trim data if WHERE, HAVING clauses and joins followed the spec? Also, other than performance (which seems to be the primary concern), would anyone would be negatively impacted if the current behavior were changed? Our app creates SQLite tables dynamically based on the output from user-defined queries that run against data warehouses (of practically any "flavor") we have no control over, and we insert the results into SQLite. Sure - we can handle this situation by writing more code looking for spaces everywhere they might occur. But to me (and maybe only to me?), it makes sense for SQLite -- where reasonably possible -- to attempt to follow clear ANSI guidelines, allowing developers to override it only in cases where this adherence produces undesirable results. I can't see where this is undesirable from an applications standpoint, although I can see where performance may be a concern. Regards, Jeff Fowler From: Darren Duncan [mailto:[EMAIL PROTECTED] Sent: Fri 1/18/2008 4:47 PM To: sqlite-users@sqlite.org Subject: RE: [sqlite] SQLite character comparisons At 4:11 PM -0500 1/18/08, Fowler, Jeff wrote: "Better" depends on who you ask - I'd say it's worse, and I bet most DBA's would agree. The ANSI standard is to ignore trailing spaces when comparing character strings in a WHERE clause, a HAVING clause, or a join. So I can take the exact same data, run the exact same query, yet get a different answer from SQLite vs. Oracle or SQL Server. In fact, we found this issue because we DID get a different answer. And every SQL implementation already has many differences from every other one, even with fundamentals, making them all generally incompatible and proprietary, and this example is right in line with the kinds of differences they have. Other implementations have distinct var-char and fixed-char types, while SQLite only has the former. Part of the problem here is that the ANSI standard is itself flawed. (Maybe this particular case of trailing spaces is an exception, but many parts of ANSI SQL, AFAIK, are more focused on what the syntax of SQL is, and they say that the semantics, even of some fundamental operations, is left up to the implementation to decide for itself. So what good does it do you if SQL of the same syntax will compile on different DBMSs if it behaves differently in each one? And the standard considers this valid.) How many other programming langauges besides ANSI SQL treat trailing spaces as insignificant. Regarding whether by extension it should be impossible to create strings with trailing spaces; I side with the SQLite developers who say it isn't the engine's job to trim blanks in data. Most other engines I've used do not trim spaces either, even if the field is a varchar. And rightly so, you should not trim spaces, because spaces are significant. Herein lies a wider part of the problem. The ANSI SQL is inconsistent in how it treats trailing spaces in strings. On one hand it wants to preserve them, but on the other hand it wants to ignore them in its most fundamental operation other than preserving. (With my "it should not be possible" sentence, I was not saying that spaces should not be trimmed in the fictional scenario where a character string does by definition not contain trailing spaces, but that code specifying them should produce an error rather than succeed. The matter is analagous to what would happen if you write code tha
Re: [sqlite] SQLite character comparisons
On Jan 20, 2008 10:19 AM, Fowler, Jeff <[EMAIL PROTECTED]> wrote: > > Not trying to be antagonistic, but I'm curious to know how many of you agree > with Darren's sentiments on this issue. Sqlite is DRH's project. Voting doesn't come into the picture anywhere. If you convince him to fix it then he might, if not you're wasting your time asking if people agree with you. >Sure - we can handle this situation by writing more code looking for spaces everywhere they might occur. The source for the database engine is provided. It would be more efficient to change it there. You could also provide the code back to the community as an option for others to use. I'm not trying to be unpleasant, I just believe where you're headed isn't going to be as valuable as other options. -- The PixAddixImage Collector suite: http://groups-beta.google.com/group/pixaddix SqliteImporter and SqliteReplicator: Command line utilities for Sqlite http://www.reddawn.net/~jsprenkl/Sqlite Cthulhu Bucks! http://www.cthulhubucks.com - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] SQLite character comparisons
On Sunday, January 20, 2008 Fowler, Jeff wrote: > briefly, ANSI SQL-92 specifies that when comparing two character > fields, trailing spaces should be ignored. From SQL-92 (draft July 1992) section 4.6 > When values of unequal length are compared, if the collating > sequence for the comparison has the NO PAD attribute and the shorter > value is equal to a prefix of the longer value, then the shorter > value is considered less than the longer value. If the collating > sequence for the comparison has the PAD SPACE attribute, for the > purposes of the comparison, the shorter value is effectively > extended to the length of the longer by concatenation of s on > the right. section 8.2 also says > 3) The comparison of two character strings is determined as fol- >lows: > >a) If the length in characters of X is not equal to the length > in characters of Y, then the shorter string is effectively > replaced, for the purposes of comparison, with a copy of > itself that has been extended to the length of the longer > string by concatenation on the right of one or more pad char- > acters, where the pad character is chosen based on CS. If > CS has the NO PAD attribute, then the pad character is an > implementation-dependent character different from any char- > acter in the character set of X and Y that collates less > than any string under CS. Otherwise, the pad character is a > . So, using this terminology, the SQLite default collating sequence has the NO PAD attribute, and the pad character is NUL. Jeff, can you solve your problem with a custom collating sequence? e -- Doug Currie Londonderry, NH, USA - To unsubscribe, send email to [EMAIL PROTECTED] -
RE: [sqlite] SQLite character comparisons
Hello All, Not trying to be antagonistic, but I'm curious to know how many of you agree with Darren's sentiments on this issue. To restate briefly, ANSI SQL-92 specifies that when comparing two character fields, trailing spaces should be ignored. Correct me if I'm wrong Darren, but you feel this is a bad decision, and in fact SQLite's implementation of character comparison (respecting trailing spaces) is superior to ANSI's specs. Keep in mind this is not some obscure issue that can be subject to different interpretations by different vendors; it's very clearly stated: "The ANSI standard requires padding for the character strings used in comparisons so that their lengths match before comparing them." Does anyone know of another RDBMS (ANSI or no) that respects trailing spaces when comparing character data? We have both Oracle 10g and SQL Server 2005 in house and they both work "correctly" according to the specification. Has anyone tried it with DB2 or Informix? What about PostGres and MySQL? Although I haven't asked him, I'm guessing Zbigniew's suggestion a while back for an auto-trim feature stemmed from this issue. Other than saving space, would there be a need to trim data if WHERE, HAVING clauses and joins followed the spec? Also, other than performance (which seems to be the primary concern), would anyone would be negatively impacted if the current behavior were changed? Our app creates SQLite tables dynamically based on the output from user-defined queries that run against data warehouses (of practically any "flavor") we have no control over, and we insert the results into SQLite. Sure - we can handle this situation by writing more code looking for spaces everywhere they might occur. But to me (and maybe only to me?), it makes sense for SQLite -- where reasonably possible -- to attempt to follow clear ANSI guidelines, allowing developers to override it only in cases where this adherence produces undesirable results. I can't see where this is undesirable from an applications standpoint, although I can see where performance may be a concern. Regards, Jeff Fowler From: Darren Duncan [mailto:[EMAIL PROTECTED] Sent: Fri 1/18/2008 4:47 PM To: sqlite-users@sqlite.org Subject: RE: [sqlite] SQLite character comparisons At 4:11 PM -0500 1/18/08, Fowler, Jeff wrote: >"Better" depends on who you ask - I'd say it's worse, and I bet most >DBA's would agree. The ANSI standard is to ignore trailing spaces when >comparing character strings in a WHERE clause, a HAVING clause, or a >join. So I can take the exact same data, run the exact same query, yet >get a different answer from SQLite vs. Oracle or SQL Server. In fact, we >found this issue because we DID get a different answer. And every SQL implementation already has many differences from every other one, even with fundamentals, making them all generally incompatible and proprietary, and this example is right in line with the kinds of differences they have. Other implementations have distinct var-char and fixed-char types, while SQLite only has the former. Part of the problem here is that the ANSI standard is itself flawed. (Maybe this particular case of trailing spaces is an exception, but many parts of ANSI SQL, AFAIK, are more focused on what the syntax of SQL is, and they say that the semantics, even of some fundamental operations, is left up to the implementation to decide for itself. So what good does it do you if SQL of the same syntax will compile on different DBMSs if it behaves differently in each one? And the standard considers this valid.) How many other programming langauges besides ANSI SQL treat trailing spaces as insignificant. >Regarding whether by extension it should be impossible to create strings >with trailing spaces; I side with the SQLite developers who say it isn't >the engine's job to trim blanks in data. Most other engines I've used do >not trim spaces either, even if the field is a varchar. And rightly so, you should not trim spaces, because spaces are significant. Herein lies a wider part of the problem. The ANSI SQL is inconsistent in how it treats trailing spaces in strings. On one hand it wants to preserve them, but on the other hand it wants to ignore them in its most fundamental operation other than preserving. (With my "it should not be possible" sentence, I was not saying that spaces should not be trimmed in the fictional scenario where a character string does by definition not contain trailing spaces, but that code specifying them should produce an error rather than succeed. The matter is analagous to what would happen if you write code that tries to treat the character string literal 'foo' as a number.) >But - whether ANSI compliance is considered "bloat" is not really my >place to comment upon. I guess it's really is up to the SQLite team. Y
RE: [sqlite] SQLite character comparisons
Doh! Wrong thread. :) :) :) > -Original Message- > From: Zbigniew Baniewski [mailto:[EMAIL PROTECTED] > Sent: Friday, January 18, 2008 5:14 PM > To: sqlite-users@sqlite.org > Subject: Re: [sqlite] SQLite character comparisons > > On Fri, Jan 18, 2008 at 11:04:09PM +, Evans, Mark (Tandem) wrote: > > > Think of NULL as "value is unknown". > > > > With zero length blob/text, value is known: 0-length blob/text > > OK, I will :) > -- > pozdrawiam / regards > > Zbigniew Baniewski > > -- > --- > To unsubscribe, send email to [EMAIL PROTECTED] > -- > --- > > - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] SQLite character comparisons
On Fri, Jan 18, 2008 at 11:04:09PM +, Evans, Mark (Tandem) wrote: > Think of NULL as "value is unknown". > > With zero length blob/text, value is known: 0-length blob/text OK, I will :) -- pozdrawiam / regards Zbigniew Baniewski - To unsubscribe, send email to [EMAIL PROTECTED] -
RE: [sqlite] SQLite character comparisons
Think of NULL as "value is unknown". With zero length blob/text, value is known: 0-length blob/text Cheers, Mark > -Original Message- > From: Zbigniew Baniewski [mailto:[EMAIL PROTECTED] > Sent: Friday, January 18, 2008 4:51 PM > To: sqlite-users@sqlite.org > Subject: Re: [sqlite] SQLite character comparisons > > On Fri, Jan 18, 2008 at 02:13:51PM -0800, Darren Duncan wrote: > > > Don't read too much into that statement; I'm was not raising the > > auto-trim thing. > > Yes, yes - I know... "Roma locuta"... > -- > pozdrawiam / regards > > Zbigniew Baniewski > > -- > --- > To unsubscribe, send email to [EMAIL PROTECTED] > -- > --- > > - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] SQLite character comparisons
On Fri, Jan 18, 2008 at 02:13:51PM -0800, Darren Duncan wrote: > Don't read too much into that statement; I'm was not raising the > auto-trim thing. Yes, yes - I know... "Roma locuta"... -- pozdrawiam / regards Zbigniew Baniewski - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] SQLite character comparisons
At 10:57 PM +0100 1/18/08, Zbigniew Baniewski wrote: On Fri, Jan 18, 2008 at 12:32:36PM -0800, Darren Duncan wrote: > If trailing spaces were supposed to be insignificant for an equality test, then it should not be possible to define a string value containing trailing spaces at all. Yes, yes: quite right... the above reminds me something... ;) Don't read too much into that statement; I'm was not raising the auto-trim thing. See my previous email in this thread, which replied to Jeff Fowler, which addresses his own comment on my statement. -- Darren Duncan - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] SQLite character comparisons
On Fri, Jan 18, 2008 at 12:32:36PM -0800, Darren Duncan wrote: > If trailing spaces were supposed to be insignificant for an equality test, > then it should not be possible to define a string value containing > trailing spaces at all. Yes, yes: quite right... the above reminds me something... ;) -- pozdrawiam / regards Zbigniew Baniewski - To unsubscribe, send email to [EMAIL PROTECTED] -
RE: [sqlite] SQLite character comparisons
At 4:11 PM -0500 1/18/08, Fowler, Jeff wrote: "Better" depends on who you ask - I'd say it's worse, and I bet most DBA's would agree. The ANSI standard is to ignore trailing spaces when comparing character strings in a WHERE clause, a HAVING clause, or a join. So I can take the exact same data, run the exact same query, yet get a different answer from SQLite vs. Oracle or SQL Server. In fact, we found this issue because we DID get a different answer. And every SQL implementation already has many differences from every other one, even with fundamentals, making them all generally incompatible and proprietary, and this example is right in line with the kinds of differences they have. Other implementations have distinct var-char and fixed-char types, while SQLite only has the former. Part of the problem here is that the ANSI standard is itself flawed. (Maybe this particular case of trailing spaces is an exception, but many parts of ANSI SQL, AFAIK, are more focused on what the syntax of SQL is, and they say that the semantics, even of some fundamental operations, is left up to the implementation to decide for itself. So what good does it do you if SQL of the same syntax will compile on different DBMSs if it behaves differently in each one? And the standard considers this valid.) How many other programming langauges besides ANSI SQL treat trailing spaces as insignificant. Regarding whether by extension it should be impossible to create strings with trailing spaces; I side with the SQLite developers who say it isn't the engine's job to trim blanks in data. Most other engines I've used do not trim spaces either, even if the field is a varchar. And rightly so, you should not trim spaces, because spaces are significant. Herein lies a wider part of the problem. The ANSI SQL is inconsistent in how it treats trailing spaces in strings. On one hand it wants to preserve them, but on the other hand it wants to ignore them in its most fundamental operation other than preserving. (With my "it should not be possible" sentence, I was not saying that spaces should not be trimmed in the fictional scenario where a character string does by definition not contain trailing spaces, but that code specifying them should produce an error rather than succeed. The matter is analagous to what would happen if you write code that tries to treat the character string literal 'foo' as a number.) But - whether ANSI compliance is considered "bloat" is not really my place to comment upon. I guess it's really is up to the SQLite team. Yes it is up to the developers. And they have already demonstrated willingness to do some things differently than ANSI SQL because they considered the differences to be improvements, or alternately reasonable feature cutting. Purely from a business usability standpoint (not a programming one), I would say there's no question that it's far more useful to do comparisons the ANSI way. And why is it more useful to ignore trailing spaces than respect them. And if ignoring them is more useful, why do most programming languages (AFAIK) respect them? If for some reason I truly want to compare & respect trailing spaces, I can still do that using a function such as HEX(A) = HEX(B) or something better. I would argue that it is rediculous to do such ugly things in order to do something that should be fundamental, and is simple and fundamental in any other language. Better for basic '=' comparison to test that the values are the same, and have some other operator or function like 'equal_when_trimmed( v1, v2 )' when you want various exceptional comparisons. -- Darren Duncan - To unsubscribe, send email to [EMAIL PROTECTED] -
RE: [sqlite] SQLite character comparisons
"Better" depends on who you ask - I'd say it's worse, and I bet most DBA's would agree. The ANSI standard is to ignore trailing spaces when comparing character strings in a WHERE clause, a HAVING clause, or a join. So I can take the exact same data, run the exact same query, yet get a different answer from SQLite vs. Oracle or SQL Server. In fact, we found this issue because we DID get a different answer. Regarding whether by extension it should be impossible to create strings with trailing spaces; I side with the SQLite developers who say it isn't the engine's job to trim blanks in data. Most other engines I've used do not trim spaces either, even if the field is a varchar. But - whether ANSI compliance is considered "bloat" is not really my place to comment upon. I guess it's really is up to the SQLite team. Purely from a business usability standpoint (not a programming one), I would say there's no question that it's far more useful to do comparisons the ANSI way. If for some reason I truly want to compare & respect trailing spaces, I can still do that using a function such as HEX(A) = HEX(B) or something better. - Jeff Fowler -Original Message- From: Darren Duncan [mailto:[EMAIL PROTECTED] Sent: Friday, January 18, 2008 3:33 PM To: sqlite-users@sqlite.org Subject: Re: [sqlite] SQLite character comparisons At 10:43 AM -0500 1/17/08, Fowler, Jeff wrote: >Hello All, >I've used SQL Server for over 15 years, Oracle off & on when I have no >choice, but SQLite for a couple weeks. I've just learned (today) that >SQLite respects trailing spaces when comparing two character fields. >I.e. 'SQLITE' <> 'SQLITE ' >Is this behavior intentional? Neither SQL Server nor Oracle do this. >Just curious as to why it works this way. Because respecting the actual contents of the string is the better way to do things. The strings 'SQLITE' and 'SQLITE ' are not the same string. Just as the strings 'SQLITE' and 'sqlite' are not the same string. A computer language is more logical, predictable, and easy to use when a test for equality or inequality actually treats every distinct value as distinct. If trailing spaces were supposed to be insignificant for an equality test, then it should not be possible to define a string value containing trailing spaces at all. Moreover, treating 'SQLITE' and 'SQLITE ' as not being the same string also is consistent with the SQLite philosophy, because it means SQLite has fewer exceptions to be concerned with in a simpler set of rules, and also not having to check lengths and space pad before each compare also makes the code simpler, and less buggy, and it saves CPU cycles. A value equality test is a very common and fundamental thing to do in a DBMS, and bloating that will have a big performance impact. -- Darren Duncan - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] SQLite character comparisons
At 10:43 AM -0500 1/17/08, Fowler, Jeff wrote: Hello All, I've used SQL Server for over 15 years, Oracle off & on when I have no choice, but SQLite for a couple weeks. I've just learned (today) that SQLite respects trailing spaces when comparing two character fields. I.e. 'SQLITE' <> 'SQLITE ' Is this behavior intentional? Neither SQL Server nor Oracle do this. Just curious as to why it works this way. Because respecting the actual contents of the string is the better way to do things. The strings 'SQLITE' and 'SQLITE ' are not the same string. Just as the strings 'SQLITE' and 'sqlite' are not the same string. A computer language is more logical, predictable, and easy to use when a test for equality or inequality actually treats every distinct value as distinct. If trailing spaces were supposed to be insignificant for an equality test, then it should not be possible to define a string value containing trailing spaces at all. Moreover, treating 'SQLITE' and 'SQLITE ' as not being the same string also is consistent with the SQLite philosophy, because it means SQLite has fewer exceptions to be concerned with in a simpler set of rules, and also not having to check lengths and space pad before each compare also makes the code simpler, and less buggy, and it saves CPU cycles. A value equality test is a very common and fundamental thing to do in a DBMS, and bloating that will have a big performance impact. -- Darren Duncan - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] SQLite character comparisons
On Thu, Jan 17, 2008 at 05:25:30PM -0500, Fowler, Jeff wrote: > By the way.. I found this snippet. If I read it right, it seems that > IGNORING trailing spaces during string comparisons is ANSI standard. I'm not sure. I was always avoiding such problem by "trim"-ming everything to be inserted; either before, or during insertion (directly in SQL query). There is remark in Postgres docs, that it does follow SQL-92 - so probably the above is a recommendation rather than a rule. In the Postgres docs I've found: #v+ The notations varchar(n) and char(n) are aliases for character varying(n) and character(n) [..] Values of type character are physically padded with spaces to the specified width n, and are stored and displayed that way. However, the padding spaces are treated as semantically insignificant. Trailing spaces are disregarded when comparing two values of type character, and they will be removed when converting a character value to one of the other string types. Note that trailing spaces are semantically significant in character varying and text values. #v- So, you can just use "character" type, to have what you need. But I'm still talking about Postgres ;) - and you were asking about SQLite. I'm trimming it all anyway... ;) -- pozdrawiam / regards Zbigniew Baniewski - To unsubscribe, send email to [EMAIL PROTECTED] -
RE: [sqlite] SQLite character comparisons
By the way.. I found this snippet. If I read it right, it seems that IGNORING trailing spaces during string comparisons is ANSI standard. SQL Server follows the ANSI/ISO SQL-92 specification (Section 8.2, , General rules #3) on how to compare strings with spaces. The ANSI standard requires padding for the character strings used in comparisons so that their lengths match before comparing them. The padding directly affects the semantics of WHERE and HAVING clause predicates and other Transact-SQL string comparisons. For example, Transact-SQL considers the strings 'abc' and 'abc ' to be equivalent for most comparison operations. -Original Message- From: Zbigniew Baniewski [mailto:[EMAIL PROTECTED] Sent: Thursday, January 17, 2008 5:20 PM To: sqlite-users@sqlite.org Subject: Re: [sqlite] SQLite character comparisons On Thu, Jan 17, 2008 at 10:43:20AM -0500, Fowler, Jeff wrote: > I've used SQL Server for over 15 years, Oracle off & on when I have no > choice, but SQLite for a couple weeks. I've just learned (today) that > SQLite respects trailing spaces when comparing two character fields. > I.e. 'SQLITE' <> 'SQLITE ' > > Is this behavior intentional? Neither SQL Server nor Oracle do this. > Just curious as to why it works this way. PostgreSQL sees the strings that way too: mydbase=> select 'str' = 'str'; ?column? -- t (1 row) mydbase=> select 'str' = 'str '; ?column? -- f (1 row) Those are different strings, anyway. Can't recall now, does there any setting to change this - but you can easily find out at their website. -- pozdrawiam / regards Zbigniew Baniewski - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] SQLite character comparisons
On Thu, Jan 17, 2008 at 10:43:20AM -0500, Fowler, Jeff wrote: > I've used SQL Server for over 15 years, Oracle off & on when I have no > choice, but SQLite for a couple weeks. I've just learned (today) that > SQLite respects trailing spaces when comparing two character fields. > I.e. 'SQLITE' <> 'SQLITE ' > > Is this behavior intentional? Neither SQL Server nor Oracle do this. > Just curious as to why it works this way. PostgreSQL sees the strings that way too: mydbase=> select 'str' = 'str'; ?column? -- t (1 row) mydbase=> select 'str' = 'str '; ?column? -- f (1 row) Those are different strings, anyway. Can't recall now, does there any setting to change this - but you can easily find out at their website. -- pozdrawiam / regards Zbigniew Baniewski - To unsubscribe, send email to [EMAIL PROTECTED] -
[sqlite] SQLite character comparisons
Hello All, I've used SQL Server for over 15 years, Oracle off & on when I have no choice, but SQLite for a couple weeks. I've just learned (today) that SQLite respects trailing spaces when comparing two character fields. I.e. 'SQLITE' <> 'SQLITE ' Is this behavior intentional? Neither SQL Server nor Oracle do this. Just curious as to why it works this way. Thanks, - Jeff