Thank you for committing this.

At Mon, 13 Mar 2017 21:07:39 +0200, Heikki Linnakangas <> wrote 
in <>
> On 03/13/2017 08:53 PM, Tom Lane wrote:
> > Heikki Linnakangas <> writes:
> >> It would be nice to run the map_checker tool one more time, though, to
> >> verify that the mappings match those from PostgreSQL 9.6.
> >
> > +1
> >
> >> Just to be sure, and after that the map checker can go to the dustbin.
> >
> > Hm, maybe we should keep it around for the next time somebody has a
> > bright
> > idea in this area?
> The map checker compares old-style maps with the new radix maps. The
> next time 'round, we'll need something that compares the radix maps
> with the next great thing. Not sure how easy it would be to adapt.
> Hmm. A somewhat different approach might be more suitable for testing
> across versions, though. We could modify the perl scripts slightly to
> print out SQL statements that exercise every mapping. For every
> supported conversion, the SQL script could:
> 1. create a database in the source encoding.
> 2. set client_encoding='<target encoding>'
> 3. SELECT a string that contains every character in the source
> encoding.

There are many encodings that can be client-encoding but cannot
be database-encoding. And some encodings such as UTF-8 has
several one-way conversion. If we do something like this, it
would be as the following.

1. Encoding test
1-1. create a database in UTF-8
1-2. set client_encoding='<source encoding>'
1-3. INSERT all characters defined in the source encoding.
1-4. set client_encoding='UTF-8'
1-5. SELECT a string that contains every character in UTF-8.
2. Decoding test

.... sucks!

I would like to use convert() function. It can be a large
PL/PgSQL function or a series of "SELECT convert(...)"s. The
latter is doable on-the-fly (by not generating/storing the whole

| -- Test for SJIS->UTF-8 conversion
| ...
| SELECT convert('\0000', 'SJIS', 'UTF-8'); -- results in error
| ...
| SELECT convert('\897e', 'SJIS', 'UTF-8');

> You could then run those SQL statements against old and new server
> version, and verify that you get the same results.

Including the result files in the repository will make this easy
but unacceptably bloats. Put mb/Unicode/README.sanity_check?


Kyotaro Horiguchi
NTT Open Source Software Center

Sent via pgsql-hackers mailing list (
To make changes to your subscription:

Reply via email to