subject:"\[HACKERS\] Why do we let CREATE DATABASE reassign encoding\?"

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

2009-05-05 Thread Tom Lane

I wrote:
 Peter Eisentraut pete...@gmx.net writes:
 AFAIR, the only reason that we haven't disallowed this sort of stuff
 years and years ago is that people use it; the Japanese in particular.
 I don't see what is different now.

 What's different now is that 8.4 has already established the principle
 that you have to clone template0 if you want to change the locale of a
 database.  I think this is a good time to establish the same principle
 for encodings.  (Or in other words, if we don't fix it now, when will
 be a better time?)

Attached is a proposed patch (without documentation changes as yet)
for this.  Since the code is already enforcing exact locale match when
cloning a non-template0 database, I just made it act the same for
encoding, without any strange exceptions for SQL_ASCII.

I found that mbregress.sh was already broken by the existing
restrictions, if you try to use it in a database whose default
locale isn't C.  The patch adds switches to fix that.

The patch also incidentally fixes a few ereport's that were missing
errcode values.

Last chance for objections ...

regards, tom lane

Index: src/backend/commands/dbcommands.c
===
RCS file: /cvsroot/pgsql/src/backend/commands/dbcommands.c,v
retrieving revision 1.223
diff -c -r1.223 dbcommands.c
*** src/backend/commands/dbcommands.c   5 May 2009 23:39:55 -   1.223
--- src/backend/commands/dbcommands.c   6 May 2009 00:30:59 -
***
*** 361,367 
  #endif
  (encoding == PG_SQL_ASCII  superuser(
ereport(ERROR,
!   (errmsg(encoding %s does not match locale %s,
pg_encoding_to_char(encoding),
dbctype),
 errdetail(The chosen LC_CTYPE setting requires 
encoding %s.,
--- 361,368 
  #endif
  (encoding == PG_SQL_ASCII  superuser(
ereport(ERROR,
!   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
!errmsg(encoding %s does not match locale %s,
pg_encoding_to_char(encoding),
dbctype),
 errdetail(The chosen LC_CTYPE setting requires 
encoding %s.,
***
*** 374,402 
  #endif
  (encoding == PG_SQL_ASCII  superuser(
ereport(ERROR,
!   (errmsg(encoding %s does not match locale %s,
pg_encoding_to_char(encoding),
dbcollate),
 errdetail(The chosen LC_COLLATE setting requires 
encoding %s.,
   
pg_encoding_to_char(collate_encoding;
  
/*
!* Check that the new locale is compatible with the source database.
 *
!* We know that template0 doesn't contain any indexes that depend on
!* collation or ctype, so template0 can be used as template for
!* any locale.
 */
if (strcmp(dbtemplate, template0) != 0)
{
if (strcmp(dbcollate, src_collate) != 0)
ereport(ERROR,
!   (errmsg(new collation is incompatible 
with the collation of the template database (%s), src_collate),
 errhint(Use the same collation as in 
the template database, or use template0 as template.)));
  
if (strcmp(dbctype, src_ctype) != 0)
ereport(ERROR,
!   (errmsg(new LC_CTYPE is incompatible 
with LC_CTYPE of the template database (%s), src_ctype),
 errhint(Use the same LC_CTYPE as in 
the template database, or use template0 as template.)));
}
  
--- 375,419 
  #endif
  (encoding == PG_SQL_ASCII  superuser(
ereport(ERROR,
!   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
!errmsg(encoding %s does not match locale %s,
pg_encoding_to_char(encoding),
dbcollate),
 errdetail(The chosen LC_COLLATE setting requires 
encoding %s.,
   
pg_encoding_to_char(collate_encoding;
  
/*
!* Check that the new encoding and locale settings match the source
!* database.  We insist on this because we simply copy the source data 
---
!* any non-ASCII data would be wrongly encoded, and any indexes sorted
!* according to the source locale would be wrong.
 *
!* However, we assume

[HACKERS] Why do we let CREATE DATABASE reassign encoding?

2009-04-23 Thread Tom Lane

If I have locale set to C, I can do this:

regression=# create database u8 encoding 'utf8';
CREATE DATABASE
regression=# create database l1 encoding 'latin1' template u8;
CREATE DATABASE

Had I had any actual utf8 data in u8, l1 would now contain
encoding-corrupt information.  Given that we've tried to
clamp down on encoding violations in recent releases, I wonder
why this case is still allowed.

(In non-C locales, this will typically fail because the two
different encodings can't both match the locale.  But I don't
believe it's our policy to enforce encoding validity only for
non-C locales.)

We should presumably let the encoding be changed when cloning
from template0, and probably it's reasonable to trust the user
if either source or destination DB encoding is SQL_ASCII.
In other cases I'm thinking it should fail.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

2009-04-23 Thread Heikki Linnakangas


Tom Lane wrote:

If I have locale set to C, I can do this:

regression=# create database u8 encoding 'utf8';
CREATE DATABASE
regression=# create database l1 encoding 'latin1' template u8;
CREATE DATABASE

Had I had any actual utf8 data in u8, l1 would now contain
encoding-corrupt information.  Given that we've tried to
clamp down on encoding violations in recent releases, I wonder
why this case is still allowed.


Wow, I'm surprised we allow that. Never occurred to me to try.


We should presumably let the encoding be changed when cloning
from template0, and probably it's reasonable to trust the user
if either source or destination DB encoding is SQL_ASCII.
In other cases I'm thinking it should fail.


Agreed, that's exactly what we did with per-database collation.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

2009-04-23 Thread Bill Moran

In response to Tom Lane t...@sss.pgh.pa.us:

 If I have locale set to C, I can do this:
 
 regression=# create database u8 encoding 'utf8';
 CREATE DATABASE
 regression=# create database l1 encoding 'latin1' template u8;
 CREATE DATABASE
 
 Had I had any actual utf8 data in u8, l1 would now contain
 encoding-corrupt information.  Given that we've tried to
 clamp down on encoding violations in recent releases, I wonder
 why this case is still allowed.
 
 (In non-C locales, this will typically fail because the two
 different encodings can't both match the locale.  But I don't
 believe it's our policy to enforce encoding validity only for
 non-C locales.)
 
 We should presumably let the encoding be changed when cloning
 from template0, and probably it's reasonable to trust the user
 if either source or destination DB encoding is SQL_ASCII.
 In other cases I'm thinking it should fail.

On a pedantic level, doesn't this remove the ability to have
databases on a single cluster that are different encodings?  I mean,
if template1 is utf8, and I can't change that using CREATE
DATABASE, then I'm stuck with utf8 for all databases on that
cluster ... unless I'm missing something.

Granted, there's the potential for special cases with databases used
only for templates, but as I see it, this should be allowed, it should
just fail if any data in the template can't be converted to the
desired encoding.  I mean, I can always alter template1 by inserting
non-utf8 data, and then try to use it to create a utf8 encoded
database ...

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

2009-04-23 Thread Andrew Dunstan




Tom Lane wrote:

If I have locale set to C, I can do this:

regression=# create database u8 encoding 'utf8';
CREATE DATABASE
regression=# create database l1 encoding 'latin1' template u8;
CREATE DATABASE

Had I had any actual utf8 data in u8, l1 would now contain
encoding-corrupt information.  Given that we've tried to
clamp down on encoding violations in recent releases, I wonder
why this case is still allowed.

(In non-C locales, this will typically fail because the two
different encodings can't both match the locale.  But I don't
believe it's our policy to enforce encoding validity only for
non-C locales.)

We should presumably let the encoding be changed when cloning
from template0, and probably it's reasonable to trust the user
if either source or destination DB encoding is SQL_ASCII.
In other cases I'm thinking it should fail.


  



Really? You want to forbid selecting an encoding when the source is 
template1, which is the default, and template1 is not SQL_ASCII? So the 
following sequence woiuld be illegal:


initdb -E latin1
createdb -E utf8

I think we have a bit more thinking to do on this - I don't have a 
reasonable solution immediately in my head.



cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

2009-04-23 Thread Tom Lane

Bill Moran wmo...@potentialtech.com writes:
 In response to Tom Lane t...@sss.pgh.pa.us:
 We should presumably let the encoding be changed when cloning
 from template0, and probably it's reasonable to trust the user
 if either source or destination DB encoding is SQL_ASCII.
 In other cases I'm thinking it should fail.

 On a pedantic level, doesn't this remove the ability to have
 databases on a single cluster that are different encodings?  I mean,
 if template1 is utf8, and I can't change that using CREATE
 DATABASE, then I'm stuck with utf8 for all databases on that
 cluster ... unless I'm missing something.

You're supposed to clone from template0, not template1, when creating
DBs that are different in either encoding or locale from the
installation default.  We already enforce this except for having missed
the special case of C locale.  (There might be some corner cases
involving UTF8 on Windows, too; not sure about that.)  The reason is
that template0 is expected to contain only ASCII data, but template1
might not.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

2009-04-23 Thread Heikki Linnakangas


Bill Moran wrote:

In response to Tom Lane t...@sss.pgh.pa.us:

We should presumably let the encoding be changed when cloning
from template0, and probably it's reasonable to trust the user
if either source or destination DB encoding is SQL_ASCII.
In other cases I'm thinking it should fail.


On a pedantic level, doesn't this remove the ability to have
databases on a single cluster that are different encodings?  I mean,
if template1 is utf8, and I can't change that using CREATE
DATABASE, then I'm stuck with utf8 for all databases on that
cluster ... unless I'm missing something.


You could still use template0 as template for a database with any 
encoding, like:


CREATE DATABASE .. TEMPLATE = template0;

We can special case template0 because we know its contents are pure 
7-bit ascii which is compatible with any server encoding.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

2009-04-23 Thread Tom Lane

Andrew Dunstan and...@dunslane.net writes:
 So the following sequence woiuld be illegal:

 initdb -E latin1
 createdb -E utf8

Yes, that's rather the point.  Note that it already *is* illegal
unless you happen to have selected C locale; AFAICS that is an
oversight and not intentional.  For instance, going in the other
direction in en_US locale, I get

$ createdb -E latin1 l1
createdb: database creation failed: ERROR:  encoding LATIN1 does not match 
locale en_US.utf8
DETAIL:  The chosen LC_CTYPE setting requires encoding UTF8.

You can get around this by cloning template0 instead of template1
(we assume template0 contains nothing that's encoding-specific).
Possibly the docs will need to be improved to emphasize that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

2009-04-23 Thread Andrew Dunstan




Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:
  

So the following sequence woiuld be illegal:



  

initdb -E latin1
createdb -E utf8



Yes, that's rather the point.  Note that it already *is* illegal
unless you happen to have selected C locale; AFAICS that is an
oversight and not intentional.  
  


Ok, then I agree that we need a big warning on that in the docs.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

2009-04-23 Thread Bill Moran

In response to Tom Lane t...@sss.pgh.pa.us:

 Bill Moran wmo...@potentialtech.com writes:
  In response to Tom Lane t...@sss.pgh.pa.us:
  We should presumably let the encoding be changed when cloning
  from template0, and probably it's reasonable to trust the user
  if either source or destination DB encoding is SQL_ASCII.
  In other cases I'm thinking it should fail.
 
  On a pedantic level, doesn't this remove the ability to have
  databases on a single cluster that are different encodings?  I mean,
  if template1 is utf8, and I can't change that using CREATE
  DATABASE, then I'm stuck with utf8 for all databases on that
  cluster ... unless I'm missing something.
 
 You're supposed to clone from template0, not template1, when creating
 DBs that are different in either encoding or locale from the
 installation default.  We already enforce this except for having missed
 the special case of C locale.

Ah ... was not aware of that.  It hasn't come up in my usage.

 (There might be some corner cases
 involving UTF8 on Windows, too; not sure about that.)  The reason is
 that template0 is expected to contain only ASCII data, but template1
 might not.

Makes sense, with that explanation.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

2009-04-23 Thread Greg Stark

So it would still be possible to byass this check by cloning a  
database into SQL_ASCII and then cloning it into the desired encoding?  
Doesn't sound like it really accomplishes much.


I do seem to recall some discussion about this way back. I don't  
recall the conclusion but I remember some talk about detecting an  
empty template database and the ned to reindex.


--
Greg


On 23 Apr 2009, at 19:46, Bill Moran wmo...@potentialtech.com wrote:


In response to Tom Lane t...@sss.pgh.pa.us:


If I have locale set to C, I can do this:

regression=# create database u8 encoding 'utf8';
CREATE DATABASE
regression=# create database l1 encoding 'latin1' template u8;
CREATE DATABASE

Had I had any actual utf8 data in u8, l1 would now contain
encoding-corrupt information.  Given that we've tried to
clamp down on encoding violations in recent releases, I wonder
why this case is still allowed.

(In non-C locales, this will typically fail because the two
different encodings can't both match the locale.  But I don't
believe it's our policy to enforce encoding validity only for
non-C locales.)

We should presumably let the encoding be changed when cloning
from template0, and probably it's reasonable to trust the user
if either source or destination DB encoding is SQL_ASCII.
In other cases I'm thinking it should fail.


On a pedantic level, doesn't this remove the ability to have
databases on a single cluster that are different encodings?  I mean,
if template1 is utf8, and I can't change that using CREATE
DATABASE, then I'm stuck with utf8 for all databases on that
cluster ... unless I'm missing something.

Granted, there's the potential for special cases with databases used
only for templates, but as I see it, this should be allowed, it should
just fail if any data in the template can't be converted to the
desired encoding.  I mean, I can always alter template1 by inserting
non-utf8 data, and then try to use it to create a utf8 encoded
database ...

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

2009-04-23 Thread Tom Lane

Greg Stark greg.st...@enterprisedb.com writes:
 So it would still be possible to byass this check by cloning a  
 database into SQL_ASCII and then cloning it into the desired encoding?  
 Doesn't sound like it really accomplishes much.

Well, it accomplishes preventing stupid encoding violations.  The point
came to mind when I saw a post a little bit ago on -general in which
the poster seemed to imagine that CREATE DATABASE would convert
encodings for him.  Since that is not happening in the foreseeable
future, I think we need to prevent the system from acting like it
would work.

If we wanted to be entirely anal about this, we could allow SQL_ASCII
destination with a different source encoding, but not the reverse.
However, we currently consider that you're on your own to ensure sanity
when using SQL_ASCII as far as locale goes, so I'm not sure why the
policy would be different for encoding.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

2009-04-23 Thread Andrew Dunstan




Tom Lane wrote:

If we wanted to be entirely anal about this, we could allow SQL_ASCII
destination with a different source encoding, but not the reverse.
However, we currently consider that you're on your own to ensure sanity
when using SQL_ASCII as far as locale goes, so I'm not sure why the
policy would be different for encoding.


  


The trouble is that people won't know the provenance of their database. 
I think we should try to guarantee as far as possible that if a database 
has encoding X then all the data in it is valid in that encoding.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

2009-04-23 Thread David E. Wheeler


On Apr 23, 2009, at 12:00 PM, Tom Lane wrote:


You can get around this by cloning template0 instead of template1
(we assume template0 contains nothing that's encoding-specific).
Possibly the docs will need to be improved to emphasize that.


I was just about to suggest that. With this change, template0 is  
suddenly going to be a lot more important for people to know about and  
make use of.


Best,

David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

2009-04-23 Thread Peter Eisentraut

On Thursday 23 April 2009 22:00:25 Tom Lane wrote:
 Andrew Dunstan and...@dunslane.net writes:
  So the following sequence woiuld be illegal:
 
  initdb -E latin1
  createdb -E utf8

 Yes, that's rather the point.  Note that it already is illegal
 unless you happen to have selected C locale;

AFAIR, the only reason that we haven't disallowed this sort of stuff years and 
years ago is that people use it; the Japanese in particular.  I don't see what 
is different now.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

2009-04-23 Thread Tom Lane

Peter Eisentraut pete...@gmx.net writes:
 AFAIR, the only reason that we haven't disallowed this sort of stuff
 years and years ago is that people use it; the Japanese in particular.
 I don't see what is different now.

What's different now is that 8.4 has already established the principle
that you have to clone template0 if you want to change the locale of a
database.  I think this is a good time to establish the same principle
for encodings.  (Or in other words, if we don't fix it now, when will
be a better time?)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

[HACKERS] Why do we let CREATE DATABASE reassign encoding?

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

Re: [HACKERS] Why do we let CREATE DATABASE reassign encoding?

16 matches

Site Navigation

Mail list logo

Footer information