Re: [HACKERS] TODO Item: Consider allowing control of upper/lower case folding of unquoted identifiers

Russell Smith Thu, 27 Mar 2008 12:52:58 -0700

Hi,

It looks like most of the hard yards will be in getting some form of consensus about what should be done for this TODO. I can't see a reason not to get started on the design now. If a decision cannot be made 4 years after the original discussion, is it worth removing the TODO, or letting it sit for another 4? But to the actual issue at hand.

Andrew Dunstan attempted to summarize the original 2004 thread (http://archives.postgresql.org/pgsql-hackers/2006-10/msg01545.php):

--

There was some discussion a couple of years ago on the -hackers list about it, so you might like to review the archives. The consensus seemed to be that behaviour would need to be set no later than createdb time.


The options I thought of were:

1. current postgres behaviour (we need to do this for legacy reasons, of course, as well as to keep happy the legions who hate using upper case for anything)

2. strictly spec compliant (same as current behaviour, but folding to upper case for unquoted identifiers rather than lower)

3. fully case sensitive even for unquoted identifiers (not spec compliant at all, but nevertheless possibly attractive especially for people migrating from MS SQLServer, where it is an option, IIRC).

--
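To make the three options concrete, here is a rough sketch in Python rather than the backend's C. The function name fold_identifier and the mode names "lower", "upper" and "preserve" are made up for illustration and are not anything in the tree. It folds ASCII letters only and ignores truncation to the maximum identifier length, so treat it as a picture of the idea and not of what the parser really does.

```python
def fold_identifier(ident: str, mode: str, quoted: bool = False) -> str:
    """Return the name that would be looked up for an identifier.

    mode is a hypothetical cluster-wide setting:
      "lower"    - option 1, current behaviour
      "upper"    - option 2, spec-style folding
      "preserve" - option 3, fully case sensitive
    """
    # Quoted identifiers are never folded, whatever the mode.
    if quoted or mode == "preserve":
        return ident
    if mode == "lower":
        return "".join(c.lower() if c.isascii() else c for c in ident)
    if mode == "upper":
        return "".join(c.upper() if c.isascii() else c for c in ident)
    raise ValueError(f"unknown folding mode: {mode}")
```

For example, fold_identifier("MyTable", "upper") gives "MYTABLE", while the same call with quoted=True leaves the name untouched.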

Supporting all 3 of these behaviours at initdb time is not too invasive or complicated from my initial investigation. The steps appear to be:

1. The parser has to parse incoming identifiers with the correct casing changes. (currently downcase_truncate_identifier)

2. The output quoting needs to quote identifiers using the same rules as the parser. (currently quote_identifier)

3. The client needs to know what quote rules are in place. (libpq: PQfname, PQfnumber)

4. psql's \ commands need to be taught that case can mean different things to different servers.

5. bootstrap needs to correctly case the tables and insert values when bootstrapping at initdb time. This is only really an issue for upper case folding.
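The output quoting step is the one that is easiest to get subtly wrong, so here is a small sketch of what "the same rules as the parser" could mean. It is Python and not the real quote_identifier. The mode names are hypothetical, and it deliberately ignores keywords and characters such as "$", both of which the real function has to care about. The rule is simply: leave a name bare only if the parser would read it back unchanged under the folding in force.

```python
import re

# Names that survive an unquoted round trip under each (hypothetical) mode.
SAFE_PATTERNS = {
    "lower": r"[a-z_][a-z0-9_]*",
    "upper": r"[A-Z_][A-Z0-9_]*",
    "preserve": r"[A-Za-z_][A-Za-z0-9_]*",
}

def quote_identifier(ident: str, mode: str) -> str:
    """Quote ident unless it would be read back unchanged when unquoted."""
    if re.fullmatch(SAFE_PATTERNS[mode], ident):
        return ident
    # Embedded double quotes are doubled, as in SQL.
    return '"' + ident.replace('"', '""') + '"'
```

So a table stored as mytable is emitted bare on a lower-folding server but has to be emitted as "mytable" on an upper-folding one, which is why a dump taken under one folding is not automatically safe to load under another.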

Many people appear to advocate a 4th option: they only want the column names to be case preserved or upper cased, and expect other identifiers to behave as they do now. This doesn't really bring us any closer to the spec; it takes us away from it, as Tom has suggested in the past. It also appears to increase the complexity and invasiveness of a patch. Being able to support case preservation/sensitivity for all identifiers at initdb time appears to be no more work than supporting the upper and lower folding versions.

The discussions around having a name as supplied and a quoted version allow lots of flexibility, probably even down to the session level. However I personally am struggling to get my head around the corner cases for that approach.

If this needs to be at createdb time, I think we add at least the following complexities:

1. The case of all relations must be altered when they are copied from the template database, or they must be quoted when copied. We have no idea what a template database might look like; all views and functions would need to be parsed to ensure they point to valid tables.

2. Shared relations must be accessible under different names in different databases, eg PG_DATABASE, pg_database.

3. The data in shared relations appears different to the same users in different databases. eg my unquoted username is MrRuss, in db1 (upper): MRRUSS, db2 (case sensitive): MrRuss, db3 (lower): mrruss

My gut tells me that's going to lead to user confusion.
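A toy illustration of the shared-relation problem, again in Python. The database names and their foldings follow the example above. The shared_roles set and the helper names are invented for the illustration, and I am assuming the role was created unquoted from db3, so it is stored as mrruss.

```python
# Per-database folding, as in the example: db1 upper, db2 case
# sensitive, db3 lower.
FOLD = {"upper": str.upper, "lower": str.lower, "preserve": lambda s: s}
DATABASES = {"db1": "upper", "db2": "preserve", "db3": "lower"}

# One shared catalog for the whole cluster. Assumption: the role was
# created unquoted from db3, so the stored name is all lower case.
shared_roles = {"mrruss"}

def visible_name(typed: str, database: str) -> str:
    """What an unquoted name turns into in the given database."""
    return FOLD[DATABASES[database]](typed)

def role_found(typed: str, database: str) -> bool:
    """Does the unquoted name match the stored role from this database?"""
    return visible_name(typed, database) in shared_roles
```

Typing MrRuss unquoted finds the role from db3 only. From db1 and db2 the same user has to remember to write "mrruss" in quotes.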

Dumping and restoring databases to different foldings can/will present an interesting challenge and I'm not sure how to support that. I don't even know if we want to support that officially.

I'm leaning towards initdb time, mainly because I think a patch can be produced that isn't too invasive and is much easier to review and actually get applied. I also think that adding the createdb time flags will push this task beyond my ability to write up a patch. Either way though, consensus on what implementation we actually want going forward will enable some more skilled developer to do this without the pain of having to flesh out the design.

In light of this email and the other comments Tom and Andrew have made, it's very easy to say 'too hard, we can't get agreement'. I would have thought that standards compliance would have been one good reason to push forward with at least the upper case folding ability. Both of the previous threads on this issue raised lots of questions about possible options, but there never seemed to be a phase of knocking the ideas around and reaching consensus. I would like to at least nail down some of the requirements, if not all. I have put forward my personal opinion, but I expect that is not of significant value as there are many others with much more experience than I.


Regards

Russell Smith

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
