Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Robert Haas
On Thu, Jun 9, 2011 at 12:39 AM, Jeevan Chalke jeevan.cha...@enterprisedb.com wrote: It's a problem, but without an efficient algorithm for Unicode case folding, any fix we attempt to implement seems like it'll just be moving the problem around. Agreed. I read on another mail thread that

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: But now that I re-think about it, I guess what I'm confused about is this code here: if (ch >= 'A' && ch <= 'Z') ch += 'a' - 'A'; else if (IS_HIGHBIT_SET(ch) && isupper(ch))

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Robert Haas
On Thu, Jun 9, 2011 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: But now that I re-think about it, I guess what I'm confused about is this code here: if (ch >= 'A' && ch <= 'Z') ch += 'a' - 'A';

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Thu, Jun 9, 2011 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote: We are relying on isupper() to not return true when presented with a character fragment in a multibyte locale. Based on Jeevan's original message, it seems like that's not always the

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Robert Haas
On Thu, Jun 9, 2011 at 10:15 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Jun 9, 2011 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote: We are relying on isupper() to not return true when presented with a character fragment in a multibyte locale.

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Thu, Jun 9, 2011 at 10:15 AM, Tom Lane t...@sss.pgh.pa.us wrote: If we need to work around brain-dead isupper() tests, maybe the best thing is to implement two versions of the loop: if (encoding is single byte) ... loop as
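A minimal sketch of the workaround proposed above, under stated assumptions. The function name downcase_ident_sketch and the single_byte_encoding flag are hypothetical and not taken from the PostgreSQL source; the sketch also folds the "two versions of the loop" into one loop guarded by a flag, which is a simplification. The point it illustrates: isupper()/tolower() are consulted for high-bit bytes only when each byte is a whole character, so fragments of multibyte characters are never touched.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not the actual PostgreSQL function).
 * Lower-cases an identifier in place, byte by byte.
 *
 * single_byte_encoding is an assumed flag supplied by the caller:
 *   true  -> every byte is a complete character, so locale-aware
 *            folding of high-bit bytes is permitted;
 *   false -> high-bit bytes may be fragments of multibyte characters
 *            and are left exactly as they are.
 */
static void
downcase_ident_sketch(char *ident, size_t len, bool single_byte_encoding)
{
    for (size_t i = 0; i < len; i++)
    {
        unsigned char ch = (unsigned char) ident[i];

        if (ch >= 'A' && ch <= 'Z')
            ch += 'a' - 'A';            /* plain ASCII: always safe */
        else if (single_byte_encoding && (ch & 0x80) && isupper(ch))
            ch = (unsigned char) tolower(ch);   /* locale-dependent */

        ident[i] = (char) ch;
    }
}
```

With single_byte_encoding set to false, a UTF-8 identifier keeps all of its non-ASCII bytes unchanged regardless of what the platform's isupper() reports for them; only A-Z are folded.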

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Robert Haas
On Thu, Jun 9, 2011 at 10:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Jun 9, 2011 at 10:15 AM, Tom Lane t...@sss.pgh.pa.us wrote: If we need to work around brain-dead isupper() tests, maybe the best thing is to implement two versions of the loop:

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Thu, Jun 9, 2011 at 10:15 AM, Tom Lane t...@sss.pgh.pa.us wrote: If we need to work around brain-dead isupper() tests, maybe the best thing is to implement two versions of the loop: if (encoding is single byte) That seems like a clear

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Robert Haas
On Thu, Jun 9, 2011 at 11:17 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Jun 9, 2011 at 10:15 AM, Tom Lane t...@sss.pgh.pa.us wrote: If we need to work around brain-dead isupper() tests, maybe the best thing is to implement two versions of the loop:

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Thu, Jun 9, 2011 at 11:17 AM, Tom Lane t...@sss.pgh.pa.us wrote: Hmm ... while the above is easy enough to do in the backend, where we can look at pg_database_encoding_max_length, we have also got instances of this coding pattern in

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Robert Haas
On Thu, Jun 9, 2011 at 1:22 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Jun 9, 2011 at 11:17 AM, Tom Lane t...@sss.pgh.pa.us wrote: Hmm ... while the above is easy enough to do in the backend, where we can look at pg_database_encoding_max_length, we

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: Right. Understood. So let's look at the cases (from git grep pg_strcasecmp and pg_strncasecmp): Also pg_toupper and pg_tolower. Right offhand, it looks like psql *does* assume it can lower-case identifiers this way :-(

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Robert Haas
On Thu, Jun 9, 2011 at 2:58 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Right. Understood. So let's look at the cases (from git grep pg_strcasecmp and pg_strncasecmp): Also pg_toupper and pg_tolower. Right offhand, it looks like psql *does* assume it

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-08 Thread Jeevan Chalke
On Wed, Jun 8, 2011 at 6:22 AM, Robert Haas robertmh...@gmail.com wrote: 2011/6/7 Jeevan Chalke jeevan.cha...@enterprisedb.com: since we smash the identifier to lower case using the downcase_truncate_identifier() function, the solution is to make this function wide-char aware, like

[HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-07 Thread Jeevan Chalke
Hi Tom, The issue is on Windows: as the attached failure.out file shows (after running failure.sql), we get the error ERROR: invalid byte sequence for encoding UTF8: 0xe59aff. Please note that the byte sequence we got from the database is e5 9a ff, whereas the actual byte sequence for the wide character
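To see why e5 9a ff is rejected, here is a small sketch of a structural check for a single three-byte UTF-8 sequence. The function name is hypothetical, and the check covers only the bit patterns (lead byte 1110xxxx followed by two continuation bytes 10xxxxxx); it does not reject overlong forms or surrogates, so it is not a full validator. The byte 0xff fails the continuation test, and in fact 0xff never appears anywhere in well-formed UTF-8.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper for illustration only.
 * Returns true if buf holds exactly one structurally well-formed
 * three-byte UTF-8 sequence. Overlong encodings and surrogate code
 * points are NOT checked here.
 */
static bool
is_utf8_3byte_sketch(const unsigned char *buf, size_t len)
{
    if (len != 3)
        return false;

    /* Lead byte of a three-byte sequence looks like 1110xxxx. */
    if ((buf[0] & 0xF0) != 0xE0)
        return false;

    /* Both trailing bytes must look like 10xxxxxx (0x80..0xBF). */
    return (buf[1] & 0xC0) == 0x80 && (buf[2] & 0xC0) == 0x80;
}
```

For the reported bytes, 0xe5 passes the lead-byte test and 0x9a is a valid continuation byte, but 0xff has its top two bits set to 11 rather than 10, so the sequence is rejected.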

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-07 Thread Robert Haas
2011/6/7 Jeevan Chalke jeevan.cha...@enterprisedb.com: since we smash the identifier to lower case using the downcase_truncate_identifier() function, the solution is to make this function wide-char aware, like the LOWER() function. I see some discussion related to