rihad wrote:
> Hi there. We have a working database that was unfortunately created by
> initdb with default ("C") collation & ctype. All other locale specific
> settings have the value en_US.UTF-8 in postgresql.conf. The database
> itself is multilingual and all its data is stored in UTF-8. Sorting
> doesn't work correctly, though. To fix that, can I just do this:
> 
> 
> update pg_database set datcollate='en_US.UTF-8', datctype='en_US.UTF-8'
> where datname='mydb';
> 
> 
> This does seem to work on a testing copy of the database, i.e. select
> lower('БлаБлаБла') now works correctly when connected to that database.
> 
> 
> Is there still any chance for corrupting data by doing this, or indexes
> stopping working etc?
> 
> p.s.: postgres 9.6.3

As explained, yes.  Indexes on string columns will be corrupted.

See this example:

test=# CREATE DATABASE breakme LC_COLLATE "C" LC_CTYPE "C" TEMPLATE template0;
test=# \c breakme
breakme=# CREATE TABLE sort(id integer PRIMARY KEY, val text NOT NULL);
breakme=# INSERT INTO sort VALUES (1, 'LITTLE'), (2, 'big'), (3, 'b-less');
breakme=# CREATE INDEX ON sort(val);
breakme=# SET enable_seqscan=off;  -- force index use
breakme=# SELECT * FROM sort ORDER BY val;
┌────┬────────┐
│ id │  val   │
├────┼────────┤
│  1 │ LITTLE │
│  3 │ b-less │
│  2 │ big    │
└────┴────────┘
(3 rows)

breakme=# UPDATE pg_database SET datcollate='en_US.UTF-8', 
datctype='en_US.UTF-8' WHERE datname='breakme';
breakme=# \c breakme
breakme=# SET enable_seqscan=off;  -- force index use
breakme=# SELECT * FROM sort ORDER BY val;
┌────┬────────┐
│ id │  val   │
├────┼────────┤
│  1 │ LITTLE │
│  3 │ b-less │
│  2 │ big    │
└────┴────────┘
(3 rows)

breakme=# SET enable_seqscan=on;  -- this and the following force sequential 
scan
breakme=# SET enable_bitmapscan=off;
breakme=# SET enable_indexscan=off;
breakme=# SET enable_indexonlyscan=off;
breakme=# SELECT * FROM sort ORDER BY val;  -- this returns the correct order
┌────┬────────┐
│ id │  val   │
├────┼────────┤
│  2 │ big    │
│  3 │ b-less │
│  1 │ LITTLE │
└────┴────────┘
(3 rows)

As you see, your index is still sorted according to the C collation
and scanning it returns wrong results.

Yours,
Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to