Hi all,
I would appreciate some help or comments on this topic...
Is this the wrong mailing list to ask such question?
Seb

On 10/18/2012 10:15 AM, Sebastien FLAESCH wrote:
Hello,

Using PostgreSQL 9.2.1 on Windows, I am facing a strange character set
issue
with a UTF-8 database.

Maybe this is expected, but want to be sure that I am not missing
something.

On Windows, I have created a database with:

ENCODING = 'UTF-8'
LC_COLLATE = 'English_United States.1252'
LC_CTYPE = 'English_United States.1252'

I am using libpq with a simple C program.
The program handles UTF-8 strings.

While it's not a traditional "UNICODE" Windows application using WCHAR, it
should be possible to send and get back UTF-8 data with PostgreSQL.

Interacting with the Windows system in the system locale is another
story, but
from a pure C / SQL / libpq point of view, as long as the PostgreSQL client
encoding is properly defined to UTF-8, it should work, and it does mainly:

- I can use UTF-8 string constants in my queries.
- I can pass UTF-8 data to the database with parameterized queries.
- I can fetch UTF-8 data from the database.
- I can create db object names with UTF-8 characters.

But the db object names must be specified with double quotes:
When I do not use quoted db object names, I get a strange problem.
The table is created, I can use it in my program, I can even use it in the
pgAdmin query tool, but in the pgAdmin db browser, there is no table name
displayed in the treeview...

Further, when selecting schema information from pg_class, I can see that
the
table exists, but there is nothing displayed in the relname attribute...

It appears that the problem disappears when using a "C" collation are
char type:

ENCODING = 'UTF-8'
LC_COLLATE = 'C'
LC_CTYPE = 'C'

I suspect this has something to do with the fact that non-quoted
identifiers
are converted to lowercase, and because my LC_CTYPE is English_United
States.1252,
the conversion to lowercase fails...

But:

- why does PostgreSQL accept to create the table with invalid UTF-8
characters?

- why can I make queries in the query tool with the UTF-8 table name?
(note that show client_encoding in the query tool gives me "UNICODE" -
is this
the same as "UTF-8"?)


So is there a bug, or do I have to use a C collation and char type for
UTF-8
databases on Windows?

I know that UTF-8 is not supported by the Windows C/POSIX library
(setlocale,
and co).

Does it mean that it's not realistic to use UTF-8 encoding for PostgreSQL
databases on Windows...?

Thanks for reading.
Sebastien FLAESCH
Four Js Development Tools





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to