Re: [HACKERS] UTF-8 encoding problem w/ libpq
Thanks Andrew. I will test the next release.

Martin

> -----Original Message-----
> From: Andrew Dunstan [mailto:and...@dunslane.net]
> Sent: 08 June 2013 16:43
> To: Tom Lane
> Cc: Heikki Linnakangas; k...@rice.edu; Martin Schäfer; pgsql-hack...@postgresql.org
> Subject: Re: [HACKERS] UTF-8 encoding problem w/ libpq
>
> On 06/03/2013 02:41 PM, Andrew Dunstan wrote:
> > On 06/03/2013 02:28 PM, Tom Lane wrote:
> > > I wonder though if we couldn't just fix this code to not do anything
> > > to high-bit-set bytes in multibyte encodings.
> >
> > That's exactly what I suggested back in November.
>
> This thread seems to have gone cold, so I have applied the fix I
> originally suggested along these lines to all live branches.
>
> At least that means we won't produce junk, but we still need to work out
> how to downcase multi-byte characters.
>
> If anyone thinks there are other places in the code that need similar
> treatment, they are welcome to find them. I have not yet found one.
>
> cheers
>
> andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UTF-8 encoding problem w/ libpq
> Can't really blame Windows on that. On Windows, we don't require that
> the encoding and LC_CTYPE's charset match. The OP used UTF-8 encoding in
> the server, but LC_CTYPE=English_United Kingdom.1252, i.e. LC_CTYPE
> implies WIN1252 encoding. We allow that and it generally works on
> Windows because in varstr_cmp, we use MultiByteToWideChar() followed by
> wcscoll_l(), which doesn't care about the charset implied by LC_CTYPE.
> But for isupper(), it matters.

Does this mean that the UTF-8 messing up would disappear if the database were using a different locale for LC_CTYPE? If so, which locale should I use? This would be useful for a temporary workaround.

> > We talked about this before and went off into the weeds about whether
> > it was sensible to try to use towlower() and whether that wouldn't
> > create undesirably platform-sensitive results. I wonder though if we
> > couldn't just fix this code to not do anything to high-bit-set bytes
> > in multibyte encodings.
>
> Yeah, we should do that. It makes no sense to call isupper or tolower on
> bytes belonging to multi-byte characters.

Actually, I would expect that 'create table HÄUSER (...)' would create a table named 'häuser', and not a table named 'hÄuser', so towlower seems the right choice IMHO.

Martin
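
The proposal quoted above (leave high-bit-set bytes alone when folding case in a multibyte encoding) can be illustrated with a small sketch. This is not PostgreSQL's actual code; `downcase_ascii_only` is a hypothetical helper name, and the sketch assumes the identifier arrives as a UTF-8 byte string.

```cpp
#include <string>

// Hypothetical helper, not the PostgreSQL implementation: fold an unquoted
// identifier to lower case, touching only the ASCII letters A-Z.
// Every byte with the high bit set (>= 0x80) is part of a UTF-8 multi-byte
// sequence and is copied through unchanged.
std::string downcase_ascii_only(const std::string &ident)
{
    std::string out = ident;
    for (char &c : out)
    {
        unsigned char ch = static_cast<unsigned char>(c);
        if (ch >= 'A' && ch <= 'Z')
            c = static_cast<char>(ch - 'A' + 'a');
        // Bytes >= 0x80 are deliberately left as they are.
    }
    return out;
}
```

With this rule, the UTF-8 bytes of 'HÄUSER' become those of 'hÄuser': no junk is produced, but 'Ä' is not downcased, which is exactly the trade-off discussed in this message.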
[HACKERS] UTF-8 encoding problem w/ libpq
I try to create database columns with umlauts, using the UTF8 client encoding. However, the server seems to mess up the column names. In particular, it seems to perform a lowercase operation on each byte of the UTF-8 multi-byte sequence.

Here is my code:

    const wchar_t *strName = L"id_äß";
    wstring strCreate = wstring(L"create table test_umlaut(") + strName + L" integer primary key)";

    PGconn *pConn = PQsetdbLogin("", "", NULL, NULL, "dev503", "postgres", "**");
    if (!pConn) FAIL;
    if (PQsetClientEncoding(pConn, "UTF-8")) FAIL;

    PGresult *pResult = PQexec(pConn, "drop table test_umlaut");
    if (pResult) PQclear(pResult);

    pResult = PQexec(pConn, ToUtf8(strCreate.c_str()).c_str());
    if (pResult) PQclear(pResult);

    pResult = PQexec(pConn, "select * from test_umlaut");
    if (!pResult) FAIL;
    if (PQresultStatus(pResult)!=PGRES_TUPLES_OK) FAIL;
    if (PQnfields(pResult)!=1) FAIL;
    const char *fName = PQfname(pResult,0);

    ShowW("Name: ", strName);
    ShowA("in UTF8: ", ToUtf8(strName).c_str());
    ShowA("from DB: ", fName);
    ShowW("in UTF16: ", ToWide(fName).c_str());

    PQclear(pResult);
    PQreset(pConn);

(ShowA/W call OutputDebugStringA/W, and ToUtf8/ToWide use WideCharToMultiByte/MultiByteToWideChar with CP_UTF8.)

And this is the output generated:

    Name:     id_äß
    in UTF8:  id_äß
    from DB:  id_ã¤ãÿ
    in UTF16: id_???

It seems like the backend thinks the name is in ANSI encoding, not in UTF-8.

If I change the strCreate query and add double quotes around the column name, then the problem disappears. But the original name is already in lowercase, so I think it should also work without quoting the column name.

Am I missing some setup in either the database or in the use of libpq?

I’m using PostgreSQL 9.2.1, compiled by Visual C++ build 1600, 64-bit

The database uses:

    ENCODING = 'UTF8'
    LC_COLLATE = 'English_United Kingdom.1252'
    LC_CTYPE = 'English_United Kingdom.1252'

Thanks for any help,
Martin
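
The 'from DB' output above is consistent with each UTF-8 byte being lowercased as if it were a Windows-1252 character. The following sketch reproduces that effect outside the server. Both function names are hypothetical, and `tolower_win1252` is only an assumed stand-in for what tolower() does under a Windows-1252 LC_CTYPE; it models just the mappings needed for this illustration.

```cpp
#include <string>

// Assumed approximation of tolower() under a Windows-1252 LC_CTYPE.
// Not taken from any real C runtime; for illustration only.
unsigned char tolower_win1252(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        return static_cast<unsigned char>(ch + 0x20);
    if (ch >= 0xC0 && ch <= 0xDE && ch != 0xD7)   // À..Þ, except ×
        return static_cast<unsigned char>(ch + 0x20);
    if (ch == 0x8A || ch == 0x8C || ch == 0x8E)   // Š, Œ, Ž
        return static_cast<unsigned char>(ch + 0x10);
    if (ch == 0x9F)                               // Ÿ -> ÿ
        return 0xFF;
    return ch;
}

// Lowercase every byte independently, ignoring multi-byte boundaries.
// This is the faulty behaviour described in the report.
std::string downcase_bytewise(const std::string &utf8)
{
    std::string out = utf8;
    for (char &c : out)
        c = static_cast<char>(tolower_win1252(static_cast<unsigned char>(c)));
    return out;
}
```

The name 'id_äß' is the byte sequence `69 64 5F C3 A4 C3 9F` in UTF-8. Under the assumed mapping, `C3` (Ã in Windows-1252) becomes `E3` (ã) and `9F` (Ÿ) becomes `FF` (ÿ), giving `69 64 5F E3 A4 E3 FF`. Read as Windows-1252 that is 'id_ã¤ãÿ', and it is no longer valid UTF-8, which would explain the '???' in the UTF-16 conversion.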
Re: [HACKERS] UTF-8 encoding problem w/ libpq
> -----Original Message-----
> From: k...@rice.edu [mailto:k...@rice.edu]
> Sent: 03 June 2013 16:48
> To: Martin Schäfer
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] UTF-8 encoding problem w/ libpq
>
> On Mon, Jun 03, 2013 at 03:40:14PM +0100, Martin Schäfer wrote:
> > I try to create database columns with umlauts, using the UTF8 client
> > encoding. However, the server seems to mess up the column names. In
> > particular, it seems to perform a lowercase operation on each byte of
> > the UTF-8 multi-byte sequence.
> >
> > [...]
> >
> > If I change the strCreate query and add double quotes around the
> > column name, then the problem disappears. But the original name is
> > already in lowercase, so I think it should also work without quoting
> > the column name.
> >
> > [...]
>
> Hi Martin,
>
> If you do not want the lowercase behavior, you must put double-quotes
> around the column name per the documentation:
>
> http://www.postgresql.org/docs/9.2/interactive/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
>
> section 4.1.1.
>
> Regards,
> Ken

The original name 'id_äß' is already in lowercase. The backend should leave it unchanged IMO.

Regards,
Martin
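
For the double-quoting workaround, libpq offers PQescapeIdentifier(), which needs a live connection. The standalone sketch below only illustrates the underlying quoting rule from section 4.1.1 (wrap the name in double quotes and double any embedded double quote). `quote_identifier` is a hypothetical helper name, not a libpq function.

```cpp
#include <string>

// Hypothetical helper: turn a raw identifier into a quoted SQL identifier.
// A quoted identifier is not case-folded by the server, so the bytes of
// 'id_äß' would be stored exactly as sent.
std::string quote_identifier(const std::string &ident)
{
    std::string out = "\"";
    for (char c : ident)
    {
        if (c == '"')
            out += '"';   // an embedded double quote is written twice
        out += c;
    }
    out += '"';
    return out;
}
```

A statement could then be assembled as `"create table test_umlaut(" + quote_identifier(name) + " integer primary key)"`, where `name` holds the UTF-8 column name. In real client code PQescapeIdentifier() is probably the better choice, since it takes the connection's encoding into account.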
Re: [HACKERS] Incorrect cursor behaviour with gist index
> Okay. I'll go fix the core code, and you can take out whatever you want
> in GiST/GIN.

Which PostgreSQL versions will contain the fix?

Regards,
Martin Schaefer