Re: [GENERAL] Changing collate & ctype for an existing database

2017-07-14 Thread Albe Laurenz
rihad wrote:
> Btw, can we wrap the update of datcollate  and rebuilding of
> textual indices inside a transaction with effectively 0 downtime?

No.  Building indexes takes time and will lock the tables
until the transaction is done.

Yours,
Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Changing collate & ctype for an existing database

2017-07-13 Thread rihad

On 07/12/2017 11:25 PM, Tom Lane wrote:

rihad  writes:

What if only English letters are used in the textual indices (ascii
0-127), would they still be impacted after datctype
"C"->"en_US.UTF-8" change?

Yes, as even minimal testing would have told you.  C sort order is
more case-sensitive, for instance.

regards, tom lane
.

Btw, can we wrap the update of datcollate  and rebuilding of 
textual indices inside a transaction with effectively 0 downtime?




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Changing collate & ctype for an existing database

2017-07-12 Thread rihad

On 07/12/2017 11:25 PM, Tom Lane wrote:

rihad  writes:

What if only English letters are used in the textual indices (ascii
0-127), would they still be impacted after datctype
"C"->"en_US.UTF-8" change?

Yes, as even minimal testing would have told you.  C sort order is
more case-sensitive, for instance.

regards, tom lane
.

Thanks. It would be great if initdb didn't assume an implicit encoding, 
to prevent such fundamental configuration mistakes in the future. More 
often than not collation/ctype settings of an ssh login session used to 
run initdb aren't what must be used to set up the cluster. It'd be great 
if initdb didn't go any further if not provided with an explicit 
encoding. The error message would require the user to think twice before 
proceeding, and to read up on the matter. Explicit is better than 
implicit, as the old saying goes :)




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Changing collate & ctype for an existing database

2017-07-12 Thread Tom Lane
rihad  writes:
> What if only English letters are used in the textual indices (ascii 
> 0-127), would they still be impacted after datctype 
> "C"->"en_US.UTF-8" change?

Yes, as even minimal testing would have told you.  C sort order is
more case-sensitive, for instance.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Changing collate & ctype for an existing database

2017-07-12 Thread rihad

On 07/12/2017 09:31 PM, Tom Lane wrote:

rihad  writes:

On 07/12/2017 01:54 PM, Albe Laurenz wrote:

As you see, your index is still sorted according to the C collation
and scanning it returns wrong results.

This ordering issue can certainly be classified as an inconsistency, but
nothing to lose sleep over. Is this all that is normally meant when
saying "index corruption"?

Laurenz neglected to point out that if the index isn't sorted the way that
the system assumes it is, then searches may fail to find values that are
present (due to descending into the wrong subtree), and by the same token
insertions may fail to enforce uniqueness.  That's pretty corrupt in
my book.

regards, tom lane

What if only English letters are used in the textual indices (ascii 
0-127), would they still be impacted after datctype 
"C"->"en_US.UTF-8" change? Encoding has always been UTF8, btw.



postgres=# \l
  List of databases
   Name|  Owner   | Encoding |   Collate   |Ctype| Access 
privileges

---+--+--+-+-+---
 mydb| myuser   | UTF8 | C   | C   |



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Changing collate & ctype for an existing database

2017-07-12 Thread rihad

On 07/12/2017 09:31 PM, Tom Lane wrote:

rihad  writes:

On 07/12/2017 01:54 PM, Albe Laurenz wrote:

As you see, your index is still sorted according to the C collation
and scanning it returns wrong results.

This ordering issue can certainly be classified as an inconsistency, but
nothing to lose sleep over. Is this all that is normally meant when
saying "index corruption"?

Laurenz neglected to point out that if the index isn't sorted the way that
the system assumes it is, then searches may fail to find values that are
present (due to descending into the wrong subtree), and by the same token
insertions may fail to enforce uniqueness.  That's pretty corrupt in
my book.

regards, tom lane


Wow. It sure is.



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Changing collate & ctype for an existing database

2017-07-12 Thread Tom Lane
rihad  writes:
> On 07/12/2017 01:54 PM, Albe Laurenz wrote:
>> As you see, your index is still sorted according to the C collation
>> and scanning it returns wrong results.

> This ordering issue can certainly be classified as an inconsistency, but 
> nothing to lose sleep over. Is this all that is normally meant when 
> saying "index corruption"?

Laurenz neglected to point out that if the index isn't sorted the way that
the system assumes it is, then searches may fail to find values that are
present (due to descending into the wrong subtree), and by the same token
insertions may fail to enforce uniqueness.  That's pretty corrupt in
my book.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Changing collate & ctype for an existing database

2017-07-12 Thread rihad

On 07/12/2017 01:54 PM, Albe Laurenz wrote:

rihad wrote:

Hi there. We have a working database that was unfortunately created by
initdb with default ("C") collation & ctype. All other locale specific
settings have the value en_US.UTF-8 in postgresql.conf. The database
itself is multilingual and all its data is stored in UTF-8. Sorting
doesn't work correctly, though. To fix that, can I just do this:


update pg_database set datcollate='en_US.UTF-8', datctype='en_US.UTF-8'
where datname='mydb';


This does seem to work on a testing copy of the database, i.e. select
lower('БлаБлаБла') now works correctly when connected to that database.


Is there still any chance for corrupting data by doing this, or indexes
stopping working etc?

p.s.: postgres 9.6.3

As explained, yes.  Indexes on string columns will be corrupted.

See this example:

test=# CREATE DATABASE breakme LC_COLLATE "C" LC_CTYPE "C" TEMPLATE template0;
test=# \c breakme
breakme=# CREATE TABLE sort(id integer PRIMARY KEY, val text NOT NULL);
breakme=# INSERT INTO sort VALUES (1, 'LITTLE'), (2, 'big'), (3, 'b-less');
breakme=# CREATE INDEX ON sort(val);
breakme=# SET enable_seqscan=off;  -- force index use
breakme=# SELECT * FROM sort ORDER BY val;
┌┬┐
│ id │  val   │
├┼┤
│  1 │ LITTLE │
│  3 │ b-less │
│  2 │ big│
└┴┘
(3 rows)

breakme=# UPDATE pg_database SET datcollate='en_US.UTF-8', 
datctype='en_US.UTF-8' WHERE datname='breakme';
breakme=# \c breakme
breakme=# SET enable_seqscan=off;  -- force index use
breakme=# SELECT * FROM sort ORDER BY val;
┌┬┐
│ id │  val   │
├┼┤
│  1 │ LITTLE │
│  3 │ b-less │
│  2 │ big│
└┴┘
(3 rows)

breakme=# SET enable_seqscan=on;  -- this and the following force sequential 
scan
breakme=# SET enable_bitmapscan=off;
breakme=# SET enable_indexscan=off;
breakme=# SET enable_indexonlyscan=off;
breakme=# SELECT * FROM sort ORDER BY val;  -- this returns the correct order
┌┬┐
│ id │  val   │
├┼┤
│  2 │ big│
│  3 │ b-less │
│  1 │ LITTLE │
└┴┘
(3 rows)

As you see, your index is still sorted according to the C collation
and scanning it returns wrong results.

Yours,
Laurenz Albe


This ordering issue can certainly be classified as an inconsistency, but 
nothing to lose sleep over. Is this all that is normally meant when 
saying "index corruption"? What about updating or deleting the wrong row 
addressed by the textual index that hasn't been rebuilt after 
datcollate/datctype change, complete table/database corruption, or other 
scary night-time stories of this kind? Possible?




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Changing collate & ctype for an existing database

2017-07-12 Thread Albe Laurenz
rihad wrote:
> Hi there. We have a working database that was unfortunately created by
> initdb with default ("C") collation & ctype. All other locale specific
> settings have the value en_US.UTF-8 in postgresql.conf. The database
> itself is multilingual and all its data is stored in UTF-8. Sorting
> doesn't work correctly, though. To fix that, can I just do this:
> 
> 
> update pg_database set datcollate='en_US.UTF-8', datctype='en_US.UTF-8'
> where datname='mydb';
> 
> 
> This does seem to work on a testing copy of the database, i.e. select
> lower('БлаБлаБла') now works correctly when connected to that database.
> 
> 
> Is there still any chance for corrupting data by doing this, or indexes
> stopping working etc?
> 
> p.s.: postgres 9.6.3

As explained, yes.  Indexes on string columns will be corrupted.

See this example:

test=# CREATE DATABASE breakme LC_COLLATE "C" LC_CTYPE "C" TEMPLATE template0;
test=# \c breakme
breakme=# CREATE TABLE sort(id integer PRIMARY KEY, val text NOT NULL);
breakme=# INSERT INTO sort VALUES (1, 'LITTLE'), (2, 'big'), (3, 'b-less');
breakme=# CREATE INDEX ON sort(val);
breakme=# SET enable_seqscan=off;  -- force index use
breakme=# SELECT * FROM sort ORDER BY val;
┌┬┐
│ id │  val   │
├┼┤
│  1 │ LITTLE │
│  3 │ b-less │
│  2 │ big│
└┴┘
(3 rows)

breakme=# UPDATE pg_database SET datcollate='en_US.UTF-8', 
datctype='en_US.UTF-8' WHERE datname='breakme';
breakme=# \c breakme
breakme=# SET enable_seqscan=off;  -- force index use
breakme=# SELECT * FROM sort ORDER BY val;
┌┬┐
│ id │  val   │
├┼┤
│  1 │ LITTLE │
│  3 │ b-less │
│  2 │ big│
└┴┘
(3 rows)

breakme=# SET enable_seqscan=on;  -- this and the following force sequential 
scan
breakme=# SET enable_bitmapscan=off;
breakme=# SET enable_indexscan=off;
breakme=# SET enable_indexonlyscan=off;
breakme=# SELECT * FROM sort ORDER BY val;  -- this returns the correct order
┌┬┐
│ id │  val   │
├┼┤
│  2 │ big│
│  3 │ b-less │
│  1 │ LITTLE │
└┴┘
(3 rows)

As you see, your index is still sorted according to the C collation
and scanning it returns wrong results.

Yours,
Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Changing collate & ctype for an existing database

2017-07-10 Thread rihad

On 07/11/2017 12:45 AM, Tom Lane wrote:

rihad  writes:

On 07/10/2017 11:07 PM, Tom Lane wrote:

... which that isn't.  I'd suggest checking for indexes that might need
to be rebuilt with this query borrowed from the regression tests:

I ran the query on our production database. Zero results.

Really?  You have no indexes on textual columns?  That seems surprising.
Oops, of course we do, around 10-15 per db. I was initially connected to 
the postgres database when I ran the query, I thought the query you gave 
me was global by looking at it.


So, deciding NOT to reindex all of them risks the corruption of their 
relevant tables?
It could be easier to simply drop and restore the db, albeit with some 
downtime.

Thank you so much for you help.


Do I have the green light to
set datcollate='en_US.UTF-8', datctype='en_US.UTF-8'

Well, I'd double-check that result, but I suppose you can always reindex
later if you find you missed something.

regards, tom lane






--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Changing collate & ctype for an existing database

2017-07-10 Thread Tom Lane
rihad  writes:
> On 07/10/2017 11:07 PM, Tom Lane wrote:
>> ... which that isn't.  I'd suggest checking for indexes that might need
>> to be rebuilt with this query borrowed from the regression tests:

> I ran the query on our production database. Zero results.

Really?  You have no indexes on textual columns?  That seems surprising.

> Do I have the green light to
> set datcollate='en_US.UTF-8', datctype='en_US.UTF-8'

Well, I'd double-check that result, but I suppose you can always reindex
later if you find you missed something.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Changing collate & ctype for an existing database

2017-07-10 Thread rihad

On 07/10/2017 11:07 PM, Tom Lane wrote:

rihad  writes:

On 07/10/2017 08:42 PM, Tom Lane wrote:

No, your indexes on text/char/varchar columns will be corrupted
(because their sort order will now be wrong).  If you can reindex
them before doing anything more with the database, you'd be ok
... I think.  Testing on a scratch copy of the database would be
a good idea, if this is valuable data.

Thank you, Tom. But can I still do it for the template1 database?
update pg_database set datcollate='en_US.UTF-8', datctype='en_US.UTF-8'
where datname='template1';

It'd be safe to do it on template0, and also on template1 as long as that
has only the original contents ...


It's empty, only hosting a few extensions.

... which that isn't.  I'd suggest checking for indexes that might need
to be rebuilt with this query borrowed from the regression tests:

SELECT indexrelid::regclass, indrelid::regclass, iclass, icoll
FROM (SELECT indexrelid, indrelid,
  unnest(indclass) as iclass, unnest(indcollation) as icoll
   FROM pg_index) ss
WHERE icoll != 0 AND iclass !=
 (SELECT oid FROM pg_opclass
  WHERE opcname = 'text_pattern_ops' AND opcmethod =
(SELECT oid FROM pg_am WHERE amname = 'btree'));

I ran the query on our production database. Zero results. Do I have the 
green light to


set datcollate='en_US.UTF-8', datctype='en_US.UTF-8'

for all our working databases? :) Or for template0 & template1 only?




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Changing collate & ctype for an existing database

2017-07-10 Thread Tom Lane
rihad  writes:
> On 07/10/2017 08:42 PM, Tom Lane wrote:
>> No, your indexes on text/char/varchar columns will be corrupted
>> (because their sort order will now be wrong).  If you can reindex
>> them before doing anything more with the database, you'd be ok
>> ... I think.  Testing on a scratch copy of the database would be
>> a good idea, if this is valuable data.

> Thank you, Tom. But can I still do it for the template1 database?

> update pg_database set datcollate='en_US.UTF-8', datctype='en_US.UTF-8'
> where datname='template1';

It'd be safe to do it on template0, and also on template1 as long as that
has only the original contents ...

> It's empty, only hosting a few extensions.

... which that isn't.  I'd suggest checking for indexes that might need
to be rebuilt with this query borrowed from the regression tests:

SELECT indexrelid::regclass, indrelid::regclass, iclass, icoll
FROM (SELECT indexrelid, indrelid,
 unnest(indclass) as iclass, unnest(indcollation) as icoll
  FROM pg_index) ss
WHERE icoll != 0 AND iclass !=
(SELECT oid FROM pg_opclass
 WHERE opcname = 'text_pattern_ops' AND opcmethod =
   (SELECT oid FROM pg_am WHERE amname = 'btree'));

> Now I can't even create a database having a different collation:
> $ createdb -O myuser --locale='en_US.UTF-8' mydb
> createdb: database creation failed: ERROR:  new collation (en_US.UTF-8) is 
> incompatible with the collation of the template database (C)
> HINT:  Use the same collation as in the template database, or use template0 
> as template.

Read the HINT ...

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Changing collate & ctype for an existing database

2017-07-10 Thread rihad

On 07/10/2017 08:42 PM, Tom Lane wrote:

rihad  writes:

Hi there. We have a working database that was unfortunately created by
initdb with default ("C") collation & ctype. All other locale specific
settings have the value en_US.UTF-8 in postgresql.conf. The database
itself is multilingual and all its data is stored in UTF-8. Sorting
doesn't work correctly, though. To fix that, can I just do this:
update pg_database set datcollate='en_US.UTF-8', datctype='en_US.UTF-8'
where datname='mydb';

No, your indexes on text/char/varchar columns will be corrupted
(because their sort order will now be wrong).  If you can reindex
them before doing anything more with the database, you'd be ok
... I think.  Testing on a scratch copy of the database would be
a good idea, if this is valuable data.

regards, tom lane


Thank you, Tom. But can I still do it for the template1 database?

update pg_database set datcollate='en_US.UTF-8', datctype='en_US.UTF-8'
where datname='template1';

It's empty, only hosting a few extensions. Now I can't even create a 
database having a different collation:


$ createdb -O myuser --locale='en_US.UTF-8' mydb
createdb: database creation failed: ERROR:  new collation (en_US.UTF-8) 
is incompatible with the collation of the template database (C)
HINT:  Use the same collation as in the template database, or use 
template0 as template.




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Changing collate & ctype for an existing database

2017-07-10 Thread Tom Lane
rihad  writes:
> Hi there. We have a working database that was unfortunately created by 
> initdb with default ("C") collation & ctype. All other locale specific 
> settings have the value en_US.UTF-8 in postgresql.conf. The database 
> itself is multilingual and all its data is stored in UTF-8. Sorting 
> doesn't work correctly, though. To fix that, can I just do this:

> update pg_database set datcollate='en_US.UTF-8', datctype='en_US.UTF-8' 
> where datname='mydb';

No, your indexes on text/char/varchar columns will be corrupted
(because their sort order will now be wrong).  If you can reindex
them before doing anything more with the database, you'd be ok
... I think.  Testing on a scratch copy of the database would be
a good idea, if this is valuable data.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general