subject:"\[HACKERS\] Invalid byte sequence for encoding UTF8, caused due to non wide\-char\-aware downcase_truncate

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Robert Haas

On Thu, Jun 9, 2011 at 12:39 AM, Jeevan Chalke
jeevan.cha...@enterprisedb.com wrote:
 It's a problem, but without an efficient algorithm for Unicode case
 folding, any fix we attempt to implement seems like it'll just be
 moving the problem around.

 Agree.

 I read on other mail thread that str_tolower() is a  wide-character-aware
 lower function but it is also a collation-aware and hence might change its
 behaviour wrt change in locale. However, Tom suggested that we need to have
 non-locale-dependent case folding algorithm.

 But still for same locale on same machine, where we can able to create a
 table, insert some data, we cannot retrieve it. Don't you think it is more
 serious and we need a quick solution here? As said earlier it may even lead
 to pg_dump failures. Given that str_tolower() functionality is locale
 dependent but still it will resolve this particular issue. Not sure, there
 might be a performance issue but at-least we are not giving an error.

Well, as I understand it, the problem here is that if someone goes and
changes the locale, then you might massively break the user's
application.  For example, if the user says:

CREATE TABLE FOO (...);
SELECT * FROM FOO;

...that'll work, of course, because whatever you get when you downcase
FOO will be the same both times.  But if the locale now changes, then
the next...

SELECT * FROM FOO;

...might fail, because the new downcasing of FOO might not match the old one.

You could argue that that's better than the current situation, but
it's not clear-cut.

But now that I re-think about it, I guess what I'm confused about is
this code here:

if (ch = 'A'  ch = 'Z')
ch += 'a' - 'A';
else if (IS_HIGHBIT_SET(ch)  isupper(ch))
ch = tolower(ch);
result[i] = (char) ch;

It seems to me that we're downcasing the first byte of each wide
character and ignoring the rest... which seems like it can't possibly
be a good idea in a multi-byte encoding.  Perhaps we could keep that
approach for single-byte encodings and just pass through multi-byte
characters untouched?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 But now that I re-think about it, I guess what I'm confused about is
 this code here:

 if (ch = 'A'  ch = 'Z')
 ch += 'a' - 'A';
 else if (IS_HIGHBIT_SET(ch)  isupper(ch))
 ch = tolower(ch);
 result[i] = (char) ch;

The expected behavior there is that case-folding of non-ASCII characters
will occur in single-byte encodings but nothing will happen to
multi-byte characters.  We are relying on isupper() to not return true
when presented with a character fragment in a multibyte locale.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Robert Haas

On Thu, Jun 9, 2011 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 But now that I re-think about it, I guess what I'm confused about is
 this code here:

                 if (ch = 'A'  ch = 'Z')
                         ch += 'a' - 'A';
                 else if (IS_HIGHBIT_SET(ch)  isupper(ch))
                         ch = tolower(ch);
                 result[i] = (char) ch;

 The expected behavior there is that case-folding of non-ASCII characters
 will occur in single-byte encodings but nothing will happen to
 multi-byte characters.  We are relying on isupper() to not return true
 when presented with a character fragment in a multibyte locale.

Based on Jeevan's original message, it seems like that's not always
the case, at least on Windows.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 On Thu, Jun 9, 2011 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 We are relying on isupper() to not return true
 when presented with a character fragment in a multibyte locale.

 Based on Jeevan's original message, it seems like that's not always
 the case, at least on Windows.

Hmm.  Now that you mention it, I think the same has been said about OSX.

If we need to work around brain-dead isupper() tests, maybe the best
thing is to implement two versions of the loop:

if (encoding is single byte)
... loop as it stands ...
else
... loop without the else if part

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Robert Haas

On Thu, Jun 9, 2011 at 10:15 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Jun 9, 2011 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 We are relying on isupper() to not return true
 when presented with a character fragment in a multibyte locale.

 Based on Jeevan's original message, it seems like that's not always
 the case, at least on Windows.

 Hmm.  Now that you mention it, I think the same has been said about OSX.

 If we need to work around brain-dead isupper() tests, maybe the best
 thing is to implement two versions of the loop:

        if (encoding is single byte)
                ... loop as it stands ...
        else
                ... loop without the else if part

That seems like a clear improvement.  It's a long way from perfect,
but still worthwhile.

Would we back-patch that?  Just do it in master?  Wait for 9.2?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 On Thu, Jun 9, 2011 at 10:15 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 If we need to work around brain-dead isupper() tests, maybe the best
 thing is to implement two versions of the loop:
 
if (encoding is single byte)
... loop as it stands ...
else
... loop without the else if part

 That seems like a clear improvement.  It's a long way from perfect,
 but still worthwhile.

 Would we back-patch that?  Just do it in master?  Wait for 9.2?

It looks to me like a portability bug fix, so I'd say back-patch.

I'll take care of it, unless you're hot to do so?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Robert Haas

On Thu, Jun 9, 2011 at 10:38 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Jun 9, 2011 at 10:15 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 If we need to work around brain-dead isupper() tests, maybe the best
 thing is to implement two versions of the loop:

        if (encoding is single byte)
                ... loop as it stands ...
        else
                ... loop without the else if part

 That seems like a clear improvement.  It's a long way from perfect,
 but still worthwhile.

 Would we back-patch that?  Just do it in master?  Wait for 9.2?

 It looks to me like a portability bug fix, so I'd say back-patch.

 I'll take care of it, unless you're hot to do so?

Nope, have at it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 On Thu, Jun 9, 2011 at 10:15 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 If we need to work around brain-dead isupper() tests, maybe the best
 thing is to implement two versions of the loop:
if (encoding is single byte)

 That seems like a clear improvement.  It's a long way from perfect,
 but still worthwhile.

Hmm ... while the above is easy enough to do in the backend, where we
can look at pg_database_encoding_max_length, we have also got instances
of this coding pattern in src/port/pgstrcasecmp.c.  It's a lot less
obvious how to make the test in frontend environments.  Thoughts anyone?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Robert Haas

On Thu, Jun 9, 2011 at 11:17 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Jun 9, 2011 at 10:15 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 If we need to work around brain-dead isupper() tests, maybe the best
 thing is to implement two versions of the loop:
        if (encoding is single byte)

 That seems like a clear improvement.  It's a long way from perfect,
 but still worthwhile.

 Hmm ... while the above is easy enough to do in the backend, where we
 can look at pg_database_encoding_max_length, we have also got instances
 of this coding pattern in src/port/pgstrcasecmp.c.  It's a lot less
 obvious how to make the test in frontend environments.  Thoughts anyone?

I'm not sure if this helps at all, but an awful lot of those tests are
against hard-coded strings that are known to contain only ASCII
characters.  Is there some way we can optimize this for that case?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 On Thu, Jun 9, 2011 at 11:17 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Hmm ... while the above is easy enough to do in the backend, where we
 can look at pg_database_encoding_max_length, we have also got instances
 of this coding pattern in src/port/pgstrcasecmp.c.  It's a lot less
 obvious how to make the test in frontend environments.  Thoughts anyone?

 I'm not sure if this helps at all, but an awful lot of those tests are
 against hard-coded strings that are known to contain only ASCII
 characters.  Is there some way we can optimize this for that case?

For the places where we're just looking for a match to a fixed all-ASCII
string, an ASCII-only downcasing would be sufficient, and would
eliminate the whole problem.  But I doubt all the callers fall into that
class.

What I'm particularly worried about at the moment is whether we are
assuming anywhere that the frontend side can duplicate the backend's
identifier downcasing behavior.  That seems like a complete morass,
because (1) they might not have the same locale, (2) they might not
have the same encoding, (3) even if they do, the same locale is known
to behave differently on different platforms.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Robert Haas

On Thu, Jun 9, 2011 at 1:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Jun 9, 2011 at 11:17 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Hmm ... while the above is easy enough to do in the backend, where we
 can look at pg_database_encoding_max_length, we have also got instances
 of this coding pattern in src/port/pgstrcasecmp.c.  It's a lot less
 obvious how to make the test in frontend environments.  Thoughts anyone?

 I'm not sure if this helps at all, but an awful lot of those tests are
 against hard-coded strings that are known to contain only ASCII
 characters.  Is there some way we can optimize this for that case?

 For the places where we're just looking for a match to a fixed all-ASCII
 string, an ASCII-only downcasing would be sufficient, and would
 eliminate the whole problem.  But I doubt all the callers fall into that
 class.

 What I'm particularly worried about at the moment is whether we are
 assuming anywhere that the frontend side can duplicate the backend's
 identifier downcasing behavior.  That seems like a complete morass,
 because (1) they might not have the same locale, (2) they might not
 have the same encoding, (3) even if they do, the same locale is known
 to behave differently on different platforms.

Right.  Understood.  So let's look at the cases (from git grep
pg_strcasecmp and pg_strncasecmp):

contrib/dict_int: Fixed strings only, and it's all backend code anyway.
contrib/dict_xsyn: Fixed strings only, and it's all backend code anyway.
contrib/hstore: Fixed strings only, and it's all backend code anyway.
contrib/pg_upgrade: Used to compare LC_COLLATE, LC_CTYPE, and encoding names.
contrib/pgbench: Definitely front-end code, but it's all fixed strings.
contrib/pgcrypto: All fixed strings except for one instance in
px_find_digit.  But it's all backend
contrib/spi: One instance, not a fixed string, but it's backend code.
contrib/unaccent: One instance, not a fixed string, but it's backend code.
src/backend/*: Backend code, obviously.
src/bin/initdb: Strings from a constant lookup table
(tsearch_config_languages) only.
src/bin/pg_basebackup: Fixed strings only.
src/bin/pg_ctl: Fixed strings only.
src/bin/pg_dump: Fixed strings only.
src/bin/psql: Fixed strings only.  In a couple of cases they are not
constants - help.c uses strings from to generated file sql_help.h, and
tab-complete.c uses strings from a constant array called
words_after_create[].  But these are constant lookup tables.
src/include: access/reloptions.h uses strncasecmp() as part of a
macro.  That should be OK as long as no one tries to include this in
frontend code, which seems rather impractical.
src/interfaces/ecpg/ecpglib: Fixed strings.
src/interfaces/ecpg/pgtypeslib: Fixed strings, and strings from a
constant lookup table, only.
src/interfaces/ecpg/preproc: This looks a bit worrisome.  It seems we
might be using it on identifiers here.
src/interfaces/libpq: This is attempting to match a wildcard
certificate name against a hostname, in two different places.
src/port/chklocale.c: Fixed strings or ones from a lookup table.
src/timezone/pgtz.c: Matches input strings against filenames read from the OS.

So mostly I think these are OK.  The instance in
src/interfaces/ecpg/preproc looks like the most likely candidate for a
problem spot.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 Right.  Understood.  So let's look at the cases (from git grep
 pg_strcasecmp and pg_strncasecmp):

Also pg_toupper and pg_tolower.  Right offhand, it looks like psql
*does* assume it can lower-case identifiers this way :-(

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-09 Thread Robert Haas

On Thu, Jun 9, 2011 at 2:58 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Right.  Understood.  So let's look at the cases (from git grep
 pg_strcasecmp and pg_strncasecmp):

 Also pg_toupper and pg_tolower.  Right offhand, it looks like psql
 *does* assume it can lower-case identifiers this way :-(

Blarg.  I dunno what to do about that - but surely we must have the
encoding available somewhere?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-08 Thread Jeevan Chalke

On Wed, Jun 8, 2011 at 6:22 AM, Robert Haas robertmh...@gmail.com wrote:

 2011/6/7 Jeevan Chalke jeevan.cha...@enterprisedb.com:
  since we smash the identifier to lower case using
  downcase_truncate_identifier() function, the solution is to make this
  function should be wide-char aware, like LOWER() function functionality.
 
  I see some discussion related to downcase_truncate_identifier() and
  wide-char aware function, but seems like we lost somewhere.
  (http://archives.postgresql.org/pgsql-hackers/2010-11/msg01385.php)
  This invalid byte sequence issue seems like a more serious issue, because
 it
  might lead e.g to pg_dump failures.

 It's a problem, but without an efficient algorithm for Unicode case
 folding, any fix we attempt to implement seems like it'll just be
 moving the problem around.


Agree.

I read on other mail thread that str_tolower() is a  wide-character-aware
lower function but it is also a collation-aware and hence might change its
behaviour wrt change in locale. However, Tom suggested that we need to have
non-locale-dependent case folding algorithm.

But still for same locale on same machine, where we can able to create a
table, insert some data, we cannot retrieve it. Don't you think it is more
serious and we need a quick solution here? As said earlier it may even lead
to pg_dump failures. Given that str_tolower() functionality is locale
dependent but still it will resolve this particular issue. Not sure, there
might be a performance issue but at-least we are not giving an error.

Please excuse me, if community already had a lot of discussion and kept this
behaviour intentionally knowing all these errors and serious issues.

Thanks



 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company




-- 
Jeevan B Chalke
Senior Software Engineer, RD
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 30589500

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are not
the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, archiving, or
copying of this communication is strictly prohibited. If you have received
this e-mail in error, please notify the sender immediately by reply e-mail
and delete this message.

[HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-07 Thread Jeevan Chalke

Hi Tom,

Issue is on Windows:

If you see in attached failure.out file, (after running failure.sql) we are
getting ERROR:  invalid
byte sequence for encoding UTF8: 0xe59aff error. Please note that byte
sequence we got from database is e5 9a ff, where as actual byte sequence for
the wide character '功' is e5 8a 9f.


'功'  == UNICODE Character
e5 8a 9f  == Original Byte Sequence for the given characters
e5 9a ff  == downcase_truncate_identifier() result, which is invalid UTF8
representation, stored in pg_catalog table

While displaying on client, we receive this invalid byte sequence which
throws an error. Note that UTF8 characters have predefined character ranges
for each byte which is checked in pg_utf8_islegal() function. Here is the
code snippet:

==
a = source[2];
if (a  0x80 || a  0xBF)
return false;
==
Note that source[2] = ff, which does not fall into the valid range which
results in illegal UTF8 character sequence. If you carefully see the
original one i.e. 9f, which falls within the range.

since we smash the identifier to lower case using
downcase_truncate_identifier() function, the solution is to make this
function should be wide-char aware, like LOWER() function functionality.

I see some discussion related to downcase_truncate_identifier() and
wide-char aware function, but seems like we lost somewhere.
(http://archives.postgresql.org/pgsql-hackers/2010-11/msg01385.php)
This invalid byte sequence issue seems like a more serious issue, because it
might lead e.g to pg_dump failures.

I have tested this on PG9.0 beta4 (one click installers), BTW, we have
observed same with earlier version as well.

Attached is the .sql and its output (run on PG9.0 beta4).

Any thoughts???

Thanks

-- 
Jeevan B Chalke
Senior Software Engineer, RD
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 30589500

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are not
the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, archiving, or
copying of this communication is strictly prohibited. If you have received
this e-mail in error, please notify the sender immediately by reply e-mail
and delete this message.
SELECT version();
set client_encoding to EUC_CN;
SELECT name,setting FROM pg_settings WHERE name like 'lc%' OR name like '%encoding';
 create table  加入 (  用户名 text, 新功能 varchar);
 insert into  加入 values('- 隐私政策 ',' 使用条款');
 insert into  加入 values('计划政策',' 登录到');
 select 新功能 from  加入;
 select * from  加入;
 drop table  加入 ;


failure.out
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

2011-06-07 Thread Robert Haas

2011/6/7 Jeevan Chalke jeevan.cha...@enterprisedb.com:
 since we smash the identifier to lower case using
 downcase_truncate_identifier() function, the solution is to make this
 function should be wide-char aware, like LOWER() function functionality.

 I see some discussion related to downcase_truncate_identifier() and
 wide-char aware function, but seems like we lost somewhere.
 (http://archives.postgresql.org/pgsql-hackers/2010-11/msg01385.php)
 This invalid byte sequence issue seems like a more serious issue, because it
 might lead e.g to pg_dump failures.

It's a problem, but without an efficient algorithm for Unicode case
folding, any fix we attempt to implement seems like it'll just be
moving the problem around.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

[HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

Re: [HACKERS] Invalid byte sequence for encoding UTF8, caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

16 matches

Site Navigation

Mail list logo

Footer information