Re: [HACKERS] Identifier case folding notes
On Wednesday 09 July 2008 00:35:07 Tom Lane wrote: Andreas Joseph Krogh [EMAIL PROTECTED] writes: Right. From a user's perspective 4) sounds best. I often run into problems having keywords as column-names: None of the proposals on the table will remove the need to use quotes in that case. I know, but then tools/frameworks won't fail when they produce queries like SELECT USER FROM test, because it sends USER and not user. -- Andreas Joseph Krogh [EMAIL PROTECTED] Senior Software Developer / Manager +-+ OfficeNet AS| The most difficult thing in the world is to | Karenslyst Allé 11 | know how to do a thing and to watch | PO. Box 529 Skøyen | somebody else doing it wrong, without | 0214 Oslo | comment.| NORWAY | | Tlf:+47 24 15 38 90 | | Fax:+47 24 15 38 91 | | Mobile: +47 909 56 963 | | +-+ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
Am Mittwoch, 9. Juli 2008 schrieb Gregory Stark: Peter Eisentraut [EMAIL PROTECTED] writes: One disadvantage is that one could no longer have objects that have names different only by case, but that is probably rare and incredibly stupid and can be neglected. Certainly not if you hope to claim being within a mile of spec Which I don't. -- which seems like the only point of fiddling with this. No, the point is making more applications run. Breaking this would take as further from spec-compliance than we are today. As far as standards compliance goes, I don't think there is a well-defined distance measure, but if you can come up with one, I would actually be quite keen on debating this point. :-) In any case, any of this would only be an option anyway. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
Hi, Peter Eisentraut wrote: ... 4. Compare the name data type in a case-insensitive manner. This would probably address most problem cases. Again, you can't have objects with names different in case only. One condition to implementing this would be that this behavior would have be tied down globally at initdb, because it affects system indexes and shared catalogs. That might be impractical for some, because you'd need different instances for different behaviors, especially when you want to host multiple applications or want to port an affected application to the native PostgreSQL behavior over time. The whole stuff as I understand is to fix the behavior with applications creating objects without quotes and accessing them QUOTEDUPPERCASE? Would a small script fixing the schema by using rename not fix this for many applications? T. smime.p7s Description: S/MIME Cryptographic Signature
Re: [HACKERS] Identifier case folding notes
Tino Wildenhain [EMAIL PROTECTED] writes: The whole stuff as I understand is to fix the behavior with applications creating objects without quotes and accessing them QUOTEDUPPERCASE? Would a small script fixing the schema by using rename not fix this for many applications? Well there are other circumstances where this can arise select FOO from (select col as foo from bar) Not to mention temporary objects which are created and dropped inside functions. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
Tom, Unfortunately, they almost certainly don't. I'd bet long odds that what they expect is mysql's traditional behavior, Nope. They're looking for Oracle, which is spec-complaint since they wrote that spec. --Josh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
On Tue, 2008-07-08 at 19:25 +0200, Peter Eisentraut wrote: 4. Compare the name data type in a case-insensitive manner. This would probably address most problem cases. Again, you can't have objects with names different in case only. One condition to implementing this would be that this behavior would have be tied down globally at initdb, because it affects system indexes and shared catalogs. That might be impractical for some, because you'd need different instances for different behaviors, especially when you want to host multiple applications or want to port an affected application to the native PostgreSQL behavior over time. That sounds the most workable, given your descriptions. If objects are never different solely by case alone, then you will have the same index ordering as if you had sent them all to lower case. Surely it is possible to mix the two approaches somehow? -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
Peter Eisentraut [EMAIL PROTECTED] writes: I have had some idle thoughts on the issue of identifier case folding. ... Comments? IMHO, practically the only solid argument for changing from the way we do things now is to meet the letter of the spec. The various sorts of gamesmanship you list would most definitely not meet the letter of the spec; between that and the inevitability of breaking some apps, I'm inclined to reject them all on sight. What I think would perhaps be worth investigating is a compile-time (or at latest initdb-time) option that flips the case folding behavior to SQL-spec-compliant and also changes all the built-in catalog entries to upper case. We would then have a solution we could offer to people who really need to run apps that depend on SQL-spec case folding ... and if the upper case hurts their eyes, or breaks some other apps that they wish they could run in the same DB, well it's their problem. Of course there would be large amounts of work to try to make psql, pg_dump, etc behave as nicely as possible with either case-folding rule, but it doesn't strike me as being so obviously unworkable as to be dismissed at once. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
Am Dienstag, 8. Juli 2008 schrieb Tom Lane: IMHO, practically the only solid argument for changing from the way we do things now is to meet the letter of the spec. Well no. As I have mentioned, there have actually been occasional complaints by people who can't run their code generated by closed-source applications, because they handle the case differently. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
Peter Eisentraut [EMAIL PROTECTED] writes: Am Dienstag, 8. Juli 2008 schrieb Tom Lane: IMHO, practically the only solid argument for changing from the way we do things now is to meet the letter of the spec. Well no. As I have mentioned, there have actually been occasional complaints by people who can't run their code generated by closed-source applications, because they handle the case differently. Sure, otherwise we wouldn't really be worrying about this. But if someone comes to us and says this closed source app requires some weird non-spec-compliant case folding rule, please make Postgres do that, we're going to say no. Their argument only has weight if they say their app expects the SQL-spec behavior. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
Tom, IMHO, practically the only solid argument for changing from the way we do things now is to meet the letter of the spec. The various sorts of gamesmanship you list would most definitely not meet the letter of the spec; between that and the inevitability of breaking some apps, I'm inclined to reject them all on sight. Actually, there are a number of *very* popular database tools, particularly in the Java world (such as Netbeans and BIRT) which do mix quoted and unquoted identifiers. In general, users of those tools reject PostgreSQL as broken for our nonstandard behavoir rather than trying to work around it. So it's not just a standards issue; this problem really *is* hurting us in adoption. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
On Tuesday 08 July 2008 23:04:51 Josh Berkus wrote: Tom, IMHO, practically the only solid argument for changing from the way we do things now is to meet the letter of the spec. The various sorts of gamesmanship you list would most definitely not meet the letter of the spec; between that and the inevitability of breaking some apps, I'm inclined to reject them all on sight. Actually, there are a number of *very* popular database tools, particularly in the Java world (such as Netbeans and BIRT) which do mix quoted and unquoted identifiers. In general, users of those tools reject PostgreSQL as broken for our nonstandard behavoir rather than trying to work around it. So it's not just a standards issue; this problem really *is* hurting us in adoption. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco Right. From a user's perspective 4) sounds best. I often run into problems having keywords as column-names: andreak=# create table test(user varchar); ERROR: syntax error at or near user LINE 1: create table test(user varchar); ^ andreak=# create table test(user varchar); CREATE TABLE andreak=# insert into test(USER) values('testuser'); ERROR: column USER of relation test does not exist LINE 1: insert into test(USER) values('testuser'); ^ andreak=# insert into test(user) values('testuser'); ERROR: syntax error at or near user LINE 1: insert into test(user) values('testuser'); ^ andreak=# insert into test(user) values('testuser'); INSERT 0 1 As you know, the only way of referring to the user-column is to qoute it in lowercase, which many apps and tools don't do. -- Andreas Joseph Krogh [EMAIL PROTECTED] Senior Software Developer / Manager -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
Josh Berkus [EMAIL PROTECTED] wrote: Actually, there are a number of *very* popular database tools, particularly in the Java world (such as Netbeans and BIRT) which do mix quoted and unquoted identifiers. In general, users of those tools reject PostgreSQL as broken for our nonstandard behavoir rather than trying to work around it. Do these tools expect an unquoted identifier to be treated according to the standard? As I read it, an unquoted identifier should be treated identically to the same identifier folded to uppercase and wrapped in quotes, except that it will be guaranteed to be considered an identifier, rather than possibly considered as a reserved word, etc. From our perspective, we're OK with the status quo since we always quote all identifiers. I don't think any of the suggestions would bite us (if implemented bug-free) because we also forbid names which differ only in capitalization. We help out our programmers by letting them ignore quoting (except identifiers which are reserved words) and capitalization when they write queries in our tool; we correct the capitalization and wrap the identifiers in quotes as we generate the Java query classes. Doing something like that in psql autocompletion and in other PostgreSQL tools would be a nice feature, if practicable. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
Tom Lane wrote: What I think would perhaps be worth investigating is a compile-time (or at latest initdb-time) option that flips the case folding behavior to SQL-spec-compliant and also changes all the built-in catalog entries to upper case. We would then have a solution we could offer to people who really need to run apps that depend on SQL-spec case folding ... and if the upper case hurts their eyes, or breaks some other apps that they wish they could run in the same DB, well it's their problem. +1 for a compile-time option for spec-compliant behavior. Even where the spec is stupid (timestamp with time zone literals) it'd be nice to have the option; both for feature completeness checklists and for teachers who want to teach targeting the spec. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
Ron Mayer [EMAIL PROTECTED] wrote: +1 for a compile-time option for spec-compliant behavior. Even where the spec is stupid (timestamp with time zone literals) it'd be nice to have the option; both for feature completeness checklists and for teachers who want to teach targeting the spec. In my world it would be even more important for feature completeness itself, and for production applications written to the spec for portability. But, agreed: +1 -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
Kevin Grittner [EMAIL PROTECTED] writes: Josh Berkus [EMAIL PROTECTED] wrote: Actually, there are a number of *very* popular database tools, particularly in the Java world (such as Netbeans and BIRT) which do mix quoted and unquoted identifiers. Do these tools expect an unquoted identifier to be treated according to the standard? Unfortunately, they almost certainly don't. I'd bet long odds that what they expect is mysql's traditional behavior, which is not even within hailing distance of being spec compliant. (In a quick test, it looks like mysql 5.0's default behavior is never to fold case at all; and then there's the problem that they use the wrong kind of quotes ...) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
Andreas Joseph Krogh [EMAIL PROTECTED] writes: Right. From a user's perspective 4) sounds best. I often run into problems having keywords as column-names: None of the proposals on the table will remove the need to use quotes in that case. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
Peter Eisentraut wrote: I have had some idle thoughts on the issue of identifier case folding. Maybe we can collect our ideas and inch closer to a solution sometime. Or we determine that it's useless and impossible, but then I can at least collect that result in the wiki and point future users to it. Background: PostgreSQL folds unquoted identifiers to lower case. SQL specifies folding to upper case, and most other SQL DBMS do it that way. If an application mixes referring to an object using quoted and unquoted versions of an identifier, then incompatibilities arise. We have always stressed to users that one should refer to an object always unquoted or always quoted. While that remains a good suggestion for a number of reasons, we have seen occasional complaints that some closed source applications violate this rule and therefore cannot be run on PostgreSQL. A number of solutions have been proposed over time, which I summarize here: 1. Change the lexer to fold to upper case, as it is supposed to do according to the SQL standard. This will break almost everything, because almost all built-in objects have lower-case names and thus couldn't be referred to anymore except by quoting. Changing the names of all the internal objects to upper-case names would involve vast code changes, probably break just as much, and make everything uglier. So this approach is unworkable. 2. Fold to upper case, but not when referring built-in objects. The lexer generally doesn't know what a name will refer to, so this is not possible to implement, at least without resorting to lots of hard-coding or horrid kludges. Also, a behavior like this will probably create all kinds of weird inconsistencies, resulting from putting catalog knowledge in the lexer. 3 and 4 -- Two variants of ignore case altogether: 3. Fold all identifiers to lower case, even quoted ones. This would probably in fact fix the breakage of many of the above-mentioned problem applications, and it would in general be very easy to understand for a user. And it could be implemented in about three lines. One disadvantage is that one could no longer have objects that have names different only by case, but that is probably rare and incredibly stupid and can be neglected. The main disadvantage is that the case of identifiers and in particular column labels is lost. So applications and programming interfaces that look up result columns in a case-sensitive manner would fail. And things like SELECT expr AS Nice Heading won't work properly anymore. 4. Compare the name data type in a case-insensitive manner. This would probably address most problem cases. Again, you can't have objects with names different in case only. One condition to implementing this would be that this behavior would have be tied down globally at initdb, because it affects system indexes and shared catalogs. That might be impractical for some, because you'd need different instances for different behaviors, especially when you want to host multiple applications or want to port an affected application to the native PostgreSQL behavior over time. 5. One additional approach I thought of is that you swap the case of identifiers as you lex them (upper to lower, lower to upper), and then swap them back when you send them to the client. This needs a small change in the lexer, one for sending the RowDescription, and support in pg_dump and a few other places if desired. There will, however, be a number of weird, albeit self-imposed, side-effects. I have implemented a little test patch for this. It's weird, but it works in basic ways. Obviously, no solution will ever work completely. And we probably don't want such a solution, because it would create two different and incompatible PostgreSQL universes. If we are aiming for a solution that would allow most affected applications to hobble along, we would probably serve most users. Implementing some or all of 3, 4, and 5 would probably achieve that. I'm not sure if you've read all the archive history on this. Here are the pointers from the TODO list: http://archives.postgresql.org/pgsql-hackers/2004-04/msg00818.php http://archives.postgresql.org/pgsql-hackers/2006-10/msg01527.php http://archives.postgresql.org/pgsql-hackers/2008-03/msg00849.php The fact is that we have substantial groups of users who want different things: . current users who want no change so there is no breakage in existing apps . users on other DBs who want Spec compliance . users on yet other DBs who want case preservation The last group should not be lightly dismissed - it is quite common behaviour on MSSQL as well as on MySQL, so we have some incentive to make this possible to encourage migration. I'm strongly of the opinion therefore that this should be behaviour determined at initdb time (can't make it later because of shared catalogs). I suspect that we won't be able to do all this
Re: [HACKERS] Identifier case folding notes
Peter Eisentraut [EMAIL PROTECTED] writes: One disadvantage is that one could no longer have objects that have names different only by case, but that is probably rare and incredibly stupid and can be neglected. Certainly not if you hope to claim being within a mile of spec -- which seems like the only point of fiddling with this. Breaking this would take as further from spec-compliance than we are today. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Identifier case folding notes
Andrew Dunstan wrote: I'm not sure if you've read all the archive history on this. Here are the pointers from the TODO list: http://archives.postgresql.org/pgsql-hackers/2004-04/msg00818.php http://archives.postgresql.org/pgsql-hackers/2006-10/msg01527.php http://archives.postgresql.org/pgsql-hackers/2008-03/msg00849.php The fact is that we have substantial groups of users who want different things: . current users who want no change so there is no breakage in existing apps . users on other DBs who want Spec compliance . users on yet other DBs who want case preservation The last group should not be lightly dismissed - it is quite common behaviour on MSSQL as well as on MySQL, so we have some incentive to make this possible to encourage migration. I'm strongly of the opinion therefore that this should be behaviour determined at initdb time (can't make it later because of shared catalogs). I suspect that we won't be able to do all this by simple transformations in the lexer, unlike what we do now. But I do think it's worth doing. cheers andrew Hi, as part of the http://archives.postgresql.org/pgsql-hackers/2008-03/msg00849.php thread, I did a reasonable amount of discovery work on making the 3 options andrew presents a reality. As I'm not skilled enough I never got far enough to make them all work at once. I did however get lower case and case preservation working. To make those tow work the catalogs need no changes. Some of the regressions tests expect case folding, so they did need changing to operate correctly. I was unable to make the input files to initdb correctly fold the case of system catalogs for the upper case version. I'm sure somebody with more experience would not find it as difficult as I did. Function names tended to be where all the gotchas were. Count() vs count() vs COUNT() for example. Once the db was up and running, the issue becomes all the supporting tools. psql was made to autocomplete with case preservation, I was going to make pg_dump just quote everything. I then got to the point of adding a fixed GUC like LC_LOCALE that allows psql to read the case folding situation and act according. That is where my progress ended. Attached is what i had worked in. It's a patch against 8.3.1. I know it's not CVS head, but it is what I was using at the time to experiment. Regards Russell === modified file 'src/backend/access/transam/xlog.c' --- src/backend/access/transam/xlog.c 2008-03-27 12:10:18 + +++ src/backend/access/transam/xlog.c 2008-03-27 14:15:13 + @@ -4040,6 +4040,9 @@ PGC_INTERNAL, PGC_S_OVERRIDE); SetConfigOption(lc_ctype, ControlFile-lc_ctype, PGC_INTERNAL, PGC_S_OVERRIDE); + /* Make the fixed case folding visible as GUC variables, too */ + SetConfigOption(identifier_case_folding, ControlFile-identifierCaseFolding, + PGC_INTERNAL, PGC_S_OVERRIDE); } void @@ -4290,6 +4293,10 @@ ControlFile-time = checkPoint.time; ControlFile-checkPoint = checkPoint.redo; ControlFile-checkPointCopy = checkPoint; + + /* Set the case folding option */ + strncpy(ControlFile-identifierCaseFolding, preserved, 9); + /* some additional ControlFile fields are set in WriteControlFile() */ WriteControlFile(); === modified file 'src/backend/catalog/information_schema.sql' --- src/backend/catalog/information_schema.sql 2008-03-27 12:10:18 + +++ src/backend/catalog/information_schema.sql 2008-03-27 12:12:15 + @@ -23,7 +23,7 @@ */ CREATE SCHEMA information_schema; -GRANT USAGE ON SCHEMA information_schema TO PUBLIC; +GRANT usage ON SCHEMA information_schema TO public; SET search_path TO information_schema, public; @@ -33,7 +33,7 @@ /* Expand any 1-D array into a set with integers 1..N */ CREATE FUNCTION _pg_expandarray(IN anyarray, OUT x anyelement, OUT n int) -RETURNS SETOF RECORD +RETURNS SETOF record LANGUAGE sql STRICT IMMUTABLE AS 'select $1[s], s - pg_catalog.array_lower($1,1) + 1 from pg_catalog.generate_series(pg_catalog.array_lower($1,1), @@ -214,7 +214,7 @@ CREATE VIEW information_schema_catalog_name AS SELECT CAST(current_database() AS sql_identifier) AS catalog_name; -GRANT SELECT ON information_schema_catalog_name TO PUBLIC; +GRANT SELECT ON information_schema_catalog_name TO public; /* @@ -241,9 +241,9 @@ FROM pg_auth_members m JOIN pg_authid a ON (m.member = a.oid) JOIN pg_authid b ON (m.roleid = b.oid) -WHERE pg_has_role(a.oid, 'USAGE'); +WHERE pg_has_role(a.oid, 'usage'); -GRANT SELECT ON applicable_roles TO PUBLIC; +GRANT SELECT ON applicable_roles TO public; /* @@ -256,7 +256,7 @@ FROM applicable_roles WHERE is_grantable = 'YES'; -GRANT SELECT ON administrable_role_authorizations TO PUBLIC; +GRANT SELECT ON administrable_role_authorizations TO public; /* @@ -353,7 +353,7 @@ AND a.attnum 0 AND NOT a.attisdropped AND c.relkind in ('c'); -GRANT SELECT ON attributes