I am trying to create database columns whose names contain umlauts, using the UTF8 client encoding.
However, the server seems to mess up the column names. In particular, it seems
to perform a lowercase operation on each byte of the UTF-8 multi-byte sequence.
Here is my code:
const wchar_t *strName = L"id_äß";
wstring strCreate = wstring(L"create table test_umlaut(") + strName + L" integer primary key)";
PGconn *pConn = PQsetdbLogin("", "", NULL, NULL, "dev503", "postgres", "******");
if (!pConn) FAIL;
if (PQsetClientEncoding(pConn, "UTF-8")) FAIL;
PGresult *pResult = PQexec(pConn, "drop table test_umlaut");
if (pResult) PQclear(pResult);
pResult = PQexec(pConn, ToUtf8(strCreate.c_str()).c_str());
if (pResult) PQclear(pResult);
pResult = PQexec(pConn, "select * from test_umlaut");
if (!pResult) FAIL;
if (PQresultStatus(pResult)!=PGRES_TUPLES_OK) FAIL;
if (PQnfields(pResult)!=1) FAIL;
const char *fName = PQfname(pResult,0);
ShowW("Name: ", strName);
ShowA("in UTF8: ", ToUtf8(strName).c_str());
ShowA("from DB: ", fName);
ShowW("in UTF16: ", ToWide(fName).c_str());
PQclear(pResult);
PQreset(pConn);
(ShowA/W call OutputDebugStringA/W, and ToUtf8/ToWide use
WideCharToMultiByte/MultiByteToWideChar with CP_UTF8.)
And this is the output generated:
Name: id_äß
in UTF8: id_äß
from DB: id_ã¤ãÿ
in UTF16: id_???
It seems like the backend thinks the name is in ANSI encoding, not in UTF-8.
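To illustrate this suspicion, here is a minimal sketch that lower-cases the UTF-8 bytes of the name one byte at a time, treating each byte as a Windows-1252 character. LowerByte1252 and LowerBytewise are hypothetical helpers written only for this illustration; they are a guess at the effect, not the server's actual code.

```cpp
#include <string>

// Hypothetical helper: lower-case one byte as if it were a single
// Windows-1252 character (an assumption about what a single-byte
// lower-casing routine would do under a 1252 locale).
unsigned char LowerByte1252(unsigned char c)
{
    if (c >= 'A' && c <= 'Z')
        return static_cast<unsigned char>(c + 0x20);
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7)   // upper-case accented letters, skipping the multiplication sign
        return static_cast<unsigned char>(c + 0x20);
    if (c == 0x8A || c == 0x8C || c == 0x8E)   // S-caron, OE ligature, Z-caron
        return static_cast<unsigned char>(c + 0x10);
    if (c == 0x9F)                             // Y-diaeresis maps to 0xFF
        return 0xFF;
    return c;
}

// Apply the single-byte lower-casing to every byte of a string,
// ignoring the fact that the string is really multi-byte UTF-8.
std::string LowerBytewise(const std::string &s)
{
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i)
        out += static_cast<char>(LowerByte1252(static_cast<unsigned char>(s[i])));
    return out;
}
```

The UTF-8 bytes of "äß" are C3 A4 C3 9F. Under this sketch they become E3 A4 E3 FF, which is exactly what Windows-1252 displays as "ã¤ãÿ", matching the "from DB" line above.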
If I change the strCreate query and add double quotes around the column name,
then the problem disappears. But the original name is already in lowercase, so
I think it should also work without quoting the column name.
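For reference, the quoting workaround can be sketched as follows. QuoteIdent is a hypothetical helper, not part of libpq; it wraps the identifier in double quotes and doubles any embedded double quote, following the standard SQL rule for quoted identifiers.

```cpp
#include <string>

// Hypothetical helper: turn a UTF-8 identifier into a quoted SQL
// identifier so the server does not case-fold it.
std::string QuoteIdent(const std::string &name)
{
    std::string out = "\"";
    for (std::string::size_type i = 0; i < name.size(); ++i) {
        if (name[i] == '"')
            out += '"';        // an embedded quote is written twice
        out += name[i];
    }
    out += '"';
    return out;
}
```

The create statement would then be built from the UTF-8 form of the name, for example "create table test_umlaut(" + QuoteIdent(utf8Name) + " integer primary key)", where utf8Name stands for the result of ToUtf8(strName). libpq also provides PQescapeIdentifier for the same purpose.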
Am I missing some setup in either the database or in the use of libpq?
I’m using PostgreSQL 9.2.1, compiled by Visual C++ build 1600, 64-bit.
The database uses:
ENCODING = 'UTF8'
LC_COLLATE = 'English_United Kingdom.1252'
LC_CTYPE = 'English_United Kingdom.1252'
Thanks for any help,
Martin