I wrote:
> 3. Try to select some "more portable" non-ASCII character, perhaps U+00A0
> (non breaking space) or U+00E1 (a-acute).  I think this would probably
> work for most encodings but it might still fail in the Far East.  Another
> objection is that the expected/plpython_unicode.out file would contain
> that character in UTF8 form.  In principle that would work, since the test
> sets client_encoding = utf8 explicitly, but I'm worried about accidental
> corruption of the expected file by text editors, file transfers, etc.
> (The current usage of U+0080 doesn't suffer from this risk because psql
> special-cases printing of multibyte UTF8 control characters, so that we
> get exactly "\u0080".)
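(For anyone wanting to reproduce this kind of survey quickly outside the server: the convertibility of a candidate character can be approximated with Python's codecs standing in for PostgreSQL's server-side encodings. Note the caveats in the comments — Python's conversion tables are not identical to the server's, it has no EUC_TW codec at all, and "gb2312" is only a rough stand-in for EUC_CN — so this is a sketch, not the authoritative test.)

```python
# Rough portability check for the candidate characters, using Python's
# codecs as stand-ins for a few of PostgreSQL's server-side encodings.
# Caveats: Python's conversion tables differ from the server's, there is
# no EUC_TW codec in Python, and "gb2312" only approximates EUC_CN.

CANDIDATES = {
    "U+00A0 non-breaking space": "\u00a0",
    "U+00E1 a-acute": "\u00e1",
}
ENCODINGS = ["latin1", "latin2", "euc_jp", "euc_kr", "gb2312"]

def convertible(ch, enc):
    """True if ch has a representation in the target encoding."""
    try:
        ch.encode(enc)
        return True
    except UnicodeEncodeError:
        return False

for label, ch in CANDIDATES.items():
    failures = [enc for enc in ENCODINGS if not convertible(ch, enc)]
    print("%s: fails in %s" % (label, ", ".join(failures) or "none"))
```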
I did a little bit of experimentation and determined that none of the LATIN1 characters are significantly more portable than what we've got: for instance, a-acute fails to convert into 16 of the 33 supported server-side encodings (versus 17 failures for U+0080). However, non-breaking space is significantly better: it converts into all our supported server encodings except EUC_CN, EUC_JP, EUC_KR, and EUC_TW. It seems unlikely that we'll do better than that with anything but a basic ASCII character.

In principle we could make the test "pass" even in those four encodings by adding variant expected files, but I doubt it's worth it. I'd be inclined to just add a comment to the regression test file indicating that that's a known failure case, and move on.

			regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers