Re: [HACKERS] pg_upgrade, locale and encoding

2014-10-09 Thread Bruce Momjian
On Tue, Oct  7, 2014 at 03:52:24PM +0300, Heikki Linnakangas wrote:
> While looking at bug #11431, I noticed that pg_upgrade still seems
> to think that encoding and locale are cluster-wide properties. We
> got per-database locale support in 8.4, and encoding has been
> per-database much longer than that.
> 
> pg_upgrade checks the encoding and locale of template0 in both
> clusters, and throws an error if they don't match. But it doesn't
> check the locale or encoding of postgres or template1 databases.
> That leads to problems if e.g. the postgres database was dropped and
> recreated with a different encoding or locale in the old cluster. We
> will merrily upgrade it, but strings in the database will be
> incorrectly encoded.

Wow, I never thought someone would do that, but they certainly could ---
good catch.

> I propose the attached patch, for git master. It's more complicated
> in back-branches, as they still support upgrading from pre-8.4
> clusters. We haven't heard any complaints from the field on this, so
> I don't think it's worth trying to back-patch this.

Agreed.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_upgrade, locale and encoding

2014-10-07 Thread Heikki Linnakangas
While looking at bug #11431, I noticed that pg_upgrade still seems to 
think that encoding and locale are cluster-wide properties. We got 
per-database locale support in 8.4, and encoding has been per-database 
much longer than that.


pg_upgrade checks the encoding and locale of template0 in both clusters, 
and throws an error if they don't match. But it doesn't check the locale 
or encoding of postgres or template1 databases. That leads to problems 
if e.g. the postgres database was dropped and recreated with a different 
encoding or locale in the old cluster. We will merrily upgrade it, but 
strings in the database will be incorrectly encoded.


I propose the attached patch, for git master. It's more complicated in 
back-branches, as they still support upgrading from pre-8.4 clusters. We 
haven't heard any complaints from the field on this, so I don't think 
it's worth trying to back-patch this.


This slightly changes the way the locale comparison works. First, it 
ignores the encoding suffix of the locale name. It's of course important 
that the databases have a compatible encoding, but pg_database has a 
separate field for encoding, and that's now compared directly. Secondly, 
it tries to canonicalize the names, by calling setlocale(). That seems 
like a good idea, in response to bug #11431 
(http://www.postgresql.org/message-id/5424090e.9060...@vmware.com).


- Heikki
>From ff44c80710ce16a8268ecfe63b6306026d4db87f Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Tue, 7 Oct 2014 15:38:53 +0300
Subject: [PATCH 1/1] In pg_upgrade, check the encoding and locale of template1
 and postgres dbs.

Lc_collate and lc_ctype have been per-database settings since server version
8.4, but we were still treating them as cluster-wide options, fetching the
values for template0 database, and comparing them. That's backwards; we
don't care about the encoding and locale of the template0 database, as
template0 is guaranteed to contain only ASCII characters. But if there are
any other databases that exist on both clusters (in particular template1 and
postgres databases), their encodings and locales must be compatible.

No backpatching, as earlier versions of pg_upgrade still support upgrading
from 8.3 servers. That would be more complicated.
---
 contrib/pg_upgrade/check.c   | 204 ++-
 contrib/pg_upgrade/controldata.c |  34 ---
 contrib/pg_upgrade/info.c|  14 ++-
 contrib/pg_upgrade/pg_upgrade.h  |   6 +-
 4 files changed, 85 insertions(+), 173 deletions(-)

diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
index bbfcab7..3df4b95 100644
--- a/contrib/pg_upgrade/check.c
+++ b/contrib/pg_upgrade/check.c
@@ -14,12 +14,10 @@
 #include "pg_upgrade.h"
 
 
-static void set_locale_and_encoding(ClusterInfo *cluster);
 static void check_new_cluster_is_empty(void);
-static void check_locale_and_encoding(ControlData *oldctrl,
-		  ControlData *newctrl);
-static bool equivalent_locale(const char *loca, const char *locb);
-static bool equivalent_encoding(const char *chara, const char *charb);
+static void check_databases_are_compatible(void);
+static void check_locale_and_encoding(DbInfo *olddb, DbInfo *newdb);
+static bool equivalent_locale(int category, const char *loca, const char *locb);
 static void check_is_install_user(ClusterInfo *cluster);
 static void check_for_prepared_transactions(ClusterInfo *cluster);
 static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
@@ -81,8 +79,6 @@ check_and_dump_old_cluster(bool live_check)
 	if (!live_check)
 		start_postmaster(&old_cluster, true);
 
-	set_locale_and_encoding(&old_cluster);
-
 	get_pg_database_relfilenode(&old_cluster);
 
 	/* Extract a list of databases and tables from the old cluster */
@@ -127,13 +123,10 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-	set_locale_and_encoding(&new_cluster);
-
-	check_locale_and_encoding(&old_cluster.controldata, &new_cluster.controldata);
-
 	get_db_and_rel_infos(&new_cluster);
 
 	check_new_cluster_is_empty();
+	check_databases_are_compatible();
 
 	check_loadable_libraries();
 
@@ -279,93 +272,24 @@ check_cluster_compatibility(bool live_check)
 
 
 /*
- * set_locale_and_encoding()
- *
- * query the database to get the template0 locale
- */
-static void
-set_locale_and_encoding(ClusterInfo *cluster)
-{
-	ControlData *ctrl = &cluster->controldata;
-	PGconn	   *conn;
-	PGresult   *res;
-	int			i_encoding;
-	int			cluster_version = cluster->major_version;
-
-	conn = connectToServer(cluster, "template1");
-
-	/* for pg < 80400, we got the values from pg_controldata */
-	if (cluster_version >= 80400)
-	{
-		int			i_datcollate;
-		int			i_datctype;
-
-		res = executeQueryOrDie(conn,
-"SELECT datcollate, datctype "
-"FROM	pg_catalog.pg_database "
-"WHERE	datname = 'template0' ");
-		assert(PQntuples(res) == 1);
-
-		i_datcollate = PQfnumber(res, "datcollate");
-		i_datctype = PQfnumber(res