Re: [HACKERS] Database object names and libpq in UTF-8 locale on Windows

Andrew Dunstan Mon, 22 Oct 2012 10:38:01 -0700


On 10/22/2012 12:53 PM, Sebastien FLAESCH wrote:


[Issues with unquoted utf8 identifiers in Windows 1252 locale]

I suspect this has something to do with the fact that non-quoted
identifiers are converted to lowercase, and because my LC_CTYPE is
English_United States.1252, the conversion to lowercase fails...



Quite possibly. The code comment says this:

        /*
         * SQL99 specifies Unicode-aware case normalization, which we don't yet
         * have the infrastructure for.  Instead we use tolower() to provide a
         * locale-aware translation.  However, there are some locales where this
         * is not right either (eg, Turkish may do strange things with 'i' and
         * 'I').  Our current compromise is to use tolower() for characters with
         * the high bit set, and use an ASCII-only downcasing for 7-bit
         * characters.
         */

For now your best bet is probably either to avoid non-ASCII UTF8 characters in identifiers or to quote the identifiers.

Given we're calling tolower() on a single byte in the code referred to, should we even be doing that when we have a multi-byte encoding and the high bit is set?
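To make the question concrete, here is a rough sketch of the kind of guard I have in mind. It is not the actual PostgreSQL source: the function name downcase_ident_sketch and the single_byte_encoding flag are made up for illustration, and a real patch would have to get that information from the database encoding.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not the PostgreSQL function): downcase an
 * identifier in place, one byte at a time.
 *
 * single_byte_encoding is an assumed flag saying whether one byte is
 * always one character in the identifier's encoding.
 */
void
downcase_ident_sketch(char *ident, size_t len, bool single_byte_encoding)
{
    for (size_t i = 0; i < len; i++)
    {
        unsigned char ch = (unsigned char) ident[i];

        if (ch >= 'A' && ch <= 'Z')
        {
            /* ASCII-only downcasing for 7-bit characters */
            ch += 'a' - 'A';
        }
        else if (single_byte_encoding && (ch & 0x80) && isupper(ch))
        {
            /* locale-aware downcasing, only when a byte is a whole character */
            ch = (unsigned char) tolower(ch);
        }
        /* otherwise leave the byte alone, e.g. part of a multi-byte sequence */

        ident[i] = (char) ch;
    }
}
```

With the flag false, the two bytes of a UTF-8 "É" (0xC3 0x89) pass through unchanged. Without such a guard, a locale using code page 1252 could map 0xC3 to 0xE3 and leave an invalid UTF-8 sequence behind, which would be consistent with the failure reported here.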

Aside: I'd love to fix up our treatment of identifiers, but there is probably a LOT of very tedious work involved.


cheers

andrew




--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
