Re: [HACKERS] [SQL] Case Preservation disregarding case sensitivity?

beau hargis Mon, 30 Oct 2006 19:14:53 -0800

On Friday 27 October 2006 19:38, Joe wrote:
> Hi Beau,
>
> On Fri, 2006-10-27 at 16:23 -0700, beau hargis wrote:
> > I am hoping that there is an easy way to obtain case-preservation with
> > case-insensitivity, or at the very least, case-preservation and complete
> > case-sensitivity, or case-preservation and a consistant case-conversion
> > strategy.
> >
> > The case of the column names need to be preserved because that is the way
> > the schema is designed and most importantly (VERY, VERY IMPORTANT),
> > column names are used in apps as hash values, or as named references
> > which are case sensitive and as such need to be delivered to the client
> > in exactly in the manner specified at the time of table creation.
> >
>
> I went through the same issue in my conversion from MySQL to Postgres
> and (since I had a small application) I ended up changing up all my
> tables and columns "UserProfile" to user_profile.
>
> I'm afraid however, that it's MySQL that is the odd man out.  I haven't
> researched this completely but I believe PG follows either the FIPS-127
> or SQL-92 standard with respect to what are called "delimited
> identifiers".  Basically, this says if you want case sensitivity in
> identifier names, you have to use double quotes wherever you refer to
> the identifier.  Without the double quotes, the SQL implementor can
> either use UPPERCASE (as I believe Oracle and DB2 do) or lowercase (as
> PG does) when it displays those identifiers.
>
> Joe


Again, I am at the same point I was at when searching and searching for 
information on the problem, and I am not sure what the SQL standard has to 
say about it: I do not, and I dont think that anyone else who is struggling 
to deal with the problem does, care about case-sensitivity. I am interested 
in case preservation of column names. I do, indeed, want identifiers treated 
in a case insensitive way, but I want the case PRESERVED in the table 
definitions and I want that case, as preserved, to be reflected in the field 
names as returned by the server to any client library that connects and 
initiates a query. 

Case-preservation is not the same as case-sensitivity; nor is 
case-normalization the same as case-insensitivity. What PostgreSQL is doing 
is converting any, and all, identifiers to a lower case and then matching 
those against the identifiers (as stored in the table definition) in a 
case-sensitive manner. It 'normalizes' the case of the identifiers so that it 
has a common internal representation; the desires of the programmer and 
database architect be damned. 

Referenced specification details:
From FIPS-127:
=======================================
3. Delimited identifiers. In the previous ANSI SQL specification, it was not 
possible for an application to specify identifiers with spaces or other 
special symbols. Also, it was not possible to protect against future assaults 
on the name space for (identifier) by additions to the (reserved word) list. 
The new facility for (delimited identifier) allows a user to enclose all 
identifiers in double-quotation marks, thereby ensuring that the name defined 
or referenced may contain spaces or other special symbols and will not be 
impacted by future additions to the (reserved word) list.
=======================================

From SQL-92/Sec. 5.2:
=======================================
10)The <identifier body> of a <regular identifier> is equivalent
to an <identifier body> in which every letter that is a lower-
case letter is replaced by the equivalent upper-case letter
or letters. This treatment includes determination of equiva-
lence, representation in the Information and Definition Schemas,
representation in the diagnostics area, and similar uses.

11)The <identifier body> of a <regular identifier> (with every
letter that is a lower-case letter replaced by the equivalent
upper-case letter or letters), treated as the repetition of
a <character string literal> that specifies a <character set
specification> of SQL_TEXT, shall not be equal, according to
the comparison rules in Subclause 8.2, "<comparison predicate>",
to any <reserved word> (with every letter that is a lower-case
letter replaced by the equivalent upper-case letter or letters),
treated as the repetition of a <character string literal> that
specifies a <character set specification> of SQL_TEXT.

12)Two <regular identifier>s are equivalent if their <identifier
body>s, considered as the repetition of a <character string
literal> that specifies a <character set specification> of
SQL_TEXT, compare equally according to the comparison rules
in Subclause 8.2, "<comparison predicate>".

13)A <regular identifier> and a <delimited identifier> are equiva-
lent if the <identifier body> of the <regular identifier> (with
every letter that is a lower-case letter replaced by the equiva-
lent upper-case letter or letters) and the <delimited identifier
body> of the <delimited identifier> (with all occurrences of
<quote> replaced by <quote symbol> and all occurrences of <dou-
blequote symbol> replaced by <double quote>), considered as
the repetition of a <character string literal> that specifies a
<character set specification> of SQL_TEXT and an implementation-
defined collation that is sensitive to case, compare equally
according to the comparison rules in Subclause 8.2, "<comparison
predicate>".

14)Two <delimited identifier>s are equivalent if their <delimited
identifier body>s (with all occurrences of <quote> replaced
by <quote symbol> and all occurrences of <doublequote symbol>
replaced by <doublequote>), considered as the repetition of a
<character string literal> that specifies a <character set spec-
ification> of SQL_TEXT and an implementation-defined collation
that is sensitive to case, compare equally according to the
comparison rules in Subclause 8.2, "<comparison predicate>".
========================================

Having installed DB2 Enterprise today and taking it for a spin, it does indeed 
behave in a similar manner. However, after reading through both 
specifications, it seems that DB2 follows more of the spec than PostgreSQL. 
The specifications state that for purpose of comparing identifiers, both 
shall be converted to upper-case. DB2 displays all identifiers in upper-case 
whereas PostgreSQL displays all identifiers in lower-case. This alone would 
be a deviation from the specification. 

It seems, however, that the regular-identifier/delimited-identifier 
distinction is for the embedding of spaces or normally invalid characters 
from other character sets, not for actually creating case-sensitive 
identifiers. It is only a side-effect. It does not enforce any method for the 
representation of the identifiers which the client expects, either. Since a 
query can be constructed which specifies aliases (... AS ...), the 
presentation of the data is implementation/runtime specific. 

Considering the differences that already exist between database systems and 
their varying compliance with SQL and the various extensions that have been 
created, I do not consider that the preservation of case for identifiers 
would violate any SQL standard. The end result being eaiser migration, the 
continued operation of cross-platform tools and applications, and a database 
designer or data architect having more flexibility in the construction and 
presentation of a database.

Again, the distinction between case-sensitivity and case-preservation, and, 
case-insensitivity and case-normalization needs to be recognized. Having 
case-preservation defined in the CREATE TABLE syntax and/or a server 
configuration seems, to me, to be a valuable addition. Of course, making it 
an option would not interfere with the current state of affairs.

I have looked through the source code and have identified places which need to 
be patched to make this possible. Tomorrow I will work on that and see what 
it will impact in the code. Let me know what you think. 


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] [SQL] Case Preservation disregarding case sensitivity?

Reply via email to