Re: [HACKERS] psql crashes on encoding mismatch

2011-01-13 Hitoshi Harada
2011/1/13 Tom Lane t...@sss.pgh.pa.us:
> Hitoshi Harada umi.tan...@gmail.com writes:
>> I found a crash case (assertion failure) when running psql -f
>> utf8_encoded_script.sql against client_encoding = shift_jis in
>> postgresql.conf. Though the encoding mismatch is obviously the user's
>> fault, a crash doesn't explain anything to him.
>
> I'm not too impressed with this patch: it seems like the most it will
> accomplish is to move the failure to some other, equally obscure, place
> --- because you'll still have a string that's invalidly encoded.
> Moreover, if you've got wrongly encoded data, it wouldn't be hard at all
> for it to mess up psql's lexing; consider cases such as a
> character-that's-not-as-long-as-we-think just in front of a quote mark.
>
> Shouldn't we instead try to verify the multibyte encoding somewhere
> upstream of here?

I had thought about that before working on the patch, too. However, the
fact that psql (via fe-misc.c in libpq) has PQmblen() but no PQverifymb()
suggested to me that encoding verification should perhaps be done on the
server side. I may not fully grasp the lexing problem you describe with
the quote mark, but my crash sample has multibyte characters in an SQL
comment, which the server ignores when parsing anyway. If we decided that
this case should raise an error, wouldn't some existing applications
break? I can imagine applications in the same encoding-mismatch situation
that run without problems simply because they never hit the case I found
by chance.

For reference, I've attached the test-case SQL file. To reproduce the crash:

1. initdb
2. edit client_encoding = shift_jis in postgresql.conf
3. start postgres
4. psql -f case_utf8.sql

Note: the file must keep its LF line endings as attached; with CR-LF
line endings the problem does not reproduce.
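One possible way the line-ending detail could matter is sketched below in C. This is a toy scanner, not psql's lexer, and not a diagnosis of the reported crash: the bytes are assumed (a comment ending in the UTF-8 character あ, bytes E3 81 82), not taken from case_utf8.sql, and the helpers `toy_sjis_char_len` and `toy_find_comment_end` are hypothetical names using a simplified Shift-JIS length rule. Read as Shift-JIS, the final byte 0x82 looks like a lead byte, so it swallows the LF and the comment appears to run on; with CR-LF only the CR is swallowed and the LF is still found.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified Shift-JIS length rule (not PostgreSQL's
 * pg_sjis_mblen): bytes 0x81-0x9F and 0xE0-0xFC start a two-byte
 * character; every other byte is treated as a one-byte character. */
static size_t toy_sjis_char_len(unsigned char c)
{
    if ((c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC))
        return 2;
    return 1;
}

/* Toy "--" comment scanner: starting at 'start', step one Shift-JIS
 * character at a time and return the offset of the LF that ends the
 * comment, or 'len' if no LF is ever seen as the start of a character. */
static size_t toy_find_comment_end(const char *text, size_t len, size_t start)
{
    const unsigned char *s = (const unsigned char *) text;
    size_t i = start;

    assert(text != NULL || len == 0);
    while (i < len)
    {
        if (s[i] == '\n')
            return i;
        /* A lead byte makes us skip the following byte unexamined. */
        i += toy_sjis_char_len(s[i]);
    }
    return len;
}
```

With genuinely Shift-JIS-encoded text (あ is 82 A0 in Shift-JIS) the same scanner finds the LF normally, which is why the mismatch, not the scanner, is the root problem.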

Regards,

-- 
Hitoshi Harada


Attachment: case_utf8.sql (binary data)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql crashes on encoding mismatch

2011-01-12 Tom Lane
Hitoshi Harada umi.tan...@gmail.com writes:
> I found a crash case (assertion failure) when running psql -f
> utf8_encoded_script.sql against client_encoding = shift_jis in
> postgresql.conf. Though the encoding mismatch is obviously the user's
> fault, a crash doesn't explain anything to him.

I'm not too impressed with this patch: it seems like the most it will
accomplish is to move the failure to some other, equally obscure, place
--- because you'll still have a string that's invalidly encoded.
Moreover, if you've got wrongly encoded data, it wouldn't be hard at all
for it to mess up psql's lexing; consider cases such as a
character-that's-not-as-long-as-we-think just in front of a quote mark.
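A toy C sketch of the quote-mark hazard follows. It assumes a lexer that advances one character at a time using a simplified Shift-JIS length rule; the helper names are hypothetical and this is not psql's lexer. The UTF-8 bytes for あ are E3 81 82; read as Shift-JIS, E3 pairs with 81, and 0x82 is then taken as a lead byte, so the closing quote right after it is consumed as a trail byte and the literal appears unterminated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified Shift-JIS lead-byte rule (not PostgreSQL code). */
static size_t sketch_sjis_len(unsigned char c)
{
    return ((c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC)) ? 2 : 1;
}

/* Count the quote marks a lexer would see if it advanced one
 * Shift-JIS character at a time. */
static size_t count_quotes_by_char(const char *text, size_t len)
{
    const unsigned char *s = (const unsigned char *) text;
    size_t i = 0, quotes = 0;

    assert(text != NULL || len == 0);
    while (i < len)
    {
        if (s[i] == '\'')
            quotes++;
        i += sketch_sjis_len(s[i]);  /* a lead byte hides the next byte */
    }
    return quotes;
}

/* Count the quote marks actually present, byte by byte. */
static size_t count_quotes_by_byte(const char *text, size_t len)
{
    size_t quotes = 0;

    for (size_t i = 0; i < len; i++)
        if (text[i] == '\'')
            quotes++;
    return quotes;
}
```

For `SELECT 'あ';` written in UTF-8, the byte count finds two quotes but the character-stepping count finds one, so everything after the literal would be lexed as string contents.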

Shouldn't we instead try to verify the multibyte encoding somewhere
upstream of here?
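As a rough illustration of verifying upstream, here is a C sketch that checks a buffer against simplified Shift-JIS rules before any lexing happens. The function names and the exact byte ranges are assumptions for illustration, modeled on the usual Shift-JIS lead/trail ranges; this is not PostgreSQL's verifier, and where such a check would live in psql is exactly the open question here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified Shift-JIS byte classes (not PostgreSQL code). */
static int is_sjis_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

static int is_sjis_trail(unsigned char c)
{
    return (c >= 0x40 && c <= 0x7E) || (c >= 0x80 && c <= 0xFC);
}

/* Return the offset of the first byte that does not start a valid
 * Shift-JIS character, or 'len' if the whole buffer is valid. */
static size_t sjis_first_invalid(const char *text, size_t len)
{
    const unsigned char *s = (const unsigned char *) text;
    size_t i = 0;

    assert(text != NULL || len == 0);
    while (i < len)
    {
        unsigned char c = s[i];

        if (c <= 0x7F || (c >= 0xA1 && c <= 0xDF))
            i += 1;                     /* ASCII or half-width katakana */
        else if (is_sjis_lead(c) && i + 1 < len && is_sjis_trail(s[i + 1]))
            i += 2;                     /* complete two-byte character */
        else
            return i;                   /* bad lead, bad or missing trail */
    }
    return len;
}
```

Returning an offset rather than a plain yes/no lets a caller report where the bad byte sequence starts, which is more useful to the user than a crash or a confusing lexer error further on.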

regards, tom lane
