On Thu, 2009-10-15 at 00:43 +0300, Peter Eisentraut wrote:
> On Sun, 2009-10-04 at 10:48 -0400, Tom Lane wrote:
> > Peter Eisentraut <pete...@gmx.net> writes:
> > > I understand the annoyance, but I think we do need to have an organized
> > > way to do testing of non-ASCII data and in particular UTF8 data, because
> > > there are an increasing number of special code paths for those.
> >
> > Well, if you want to keep the test, we should put in the variant with
> > \200, because it is now clear that that is in fact the right answer
> > in a nontrivial number of environments (arguably *more* cases than
> > in which "\u0080" is correct).
>
> I put in a new variant file. Let's see if it works.
[http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/pl/plpython/expected/plpython_unicode_0.out]

Actually, what I committed was really the output I got. Now, with your commit, my tests started failing again.

The difference turns out to be caused by glibc. When you print an invalid UTF-8 byte sequence using "%.*s" while LC_CTYPE is a UTF-8 locale (e.g., en_US.utf8), it prints nothing. Presumably the precision handling gets confused trying to count characters, rather than bytes, in the invalid sequence.

Test program:

#include <locale.h>
#include <stdio.h>

int
main()
{
	setlocale(LC_ALL, "");
	printf("%.*s", 1, "\200");
	return 0;
}

This prints nothing (check with od) when LC_CTYPE is en_US.utf8.

I think this can be filed under trouble caused by a mismatch between LC_CTYPE and the client encoding, and doesn't need further fixing, but it's good to keep in mind.

Let's see what the Solaris builds say now.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers