Re: [HACKERS] UTF-8 encoding problem w/ libpq

2013-06-10 Thread Martin Schäfer
Thanks Andrew. I will test the next release.

Martin

 -----Original Message-----
 From: Andrew Dunstan [mailto:and...@dunslane.net]
 Sent: 08 June 2013 16:43
 To: Tom Lane
 Cc: Heikki Linnakangas; k...@rice.edu; Martin Schäfer; pgsql-hack...@postgresql.org
 Subject: Re: [HACKERS] UTF-8 encoding problem w/ libpq
 
 
 On 06/03/2013 02:41 PM, Andrew Dunstan wrote:
 
 > On 06/03/2013 02:28 PM, Tom Lane wrote:
 >> I wonder though if we couldn't just fix this code to not do
 >> anything to high-bit-set bytes in multibyte encodings.
 >
 >
 > That's exactly what I suggested back in November.
 
 
 This thread seems to have gone cold, so I have applied the fix I originally
 suggested along these lines to all live branches.
 
 At least that means we won't produce junk, but we still need to work out
 how to downcase multi-byte characters.
 
 If anyone thinks there are other places in the code that need similar
 treatment, they are welcome to find them. I have not yet found one.
 
 
 cheers
 
 andrew
 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UTF-8 encoding problem w/ libpq

2013-06-04 Thread Martin Schäfer
> Can't really blame Windows on that. On Windows, we don't require that the
> encoding and LC_CTYPE's charset match. The OP used UTF-8 encoding in the
> server, but LC_CTYPE=English_United Kingdom.1252, ie. LC_CTYPE implies
> WIN1252 encoding. We allow that and it generally works on Windows
> because in varstr_cmp, we use MultiByteToWideChar() followed by
> wcscoll_l(), which doesn't care about the charset implied by LC_CTYPE.
> But for isupper(), it matters.

Does this mean that the UTF-8 messing up would disappear if the database were 
using a different locale for LC_CTYPE? If so, which locale should I use?
This would be useful for a temporary workaround.

>> We talked about this before and went off into the weeds about whether
>> it was sensible to try to use towlower() and whether that wouldn't
>> create undesirably platform-sensitive results.  I wonder though if we
>> couldn't just fix this code to not do anything to high-bit-set bytes
>> in multibyte encodings.
>
> Yeah, we should do that. It makes no sense to call isupper or tolower on
> bytes belonging to multi-byte characters.

Actually, I would expect that 'create table HÄUSER (...)' would create a table 
named 'häuser', and not a table named 'hÄuser', so towlower seems the right 
choice IMHO.
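
To make the trade-off concrete, here is a minimal sketch of the "leave high-bit-set bytes alone" approach. It is an illustration only, not the code in the PostgreSQL backend or the committed patch; the function name `downcaseAsciiOnly` is made up for this example.

```cpp
#include <string>

// Hypothetical helper (not PostgreSQL's actual code): fold only ASCII A-Z
// and pass every byte with the high bit set through unchanged, so that
// multi-byte UTF-8 sequences are never modified.
std::string downcaseAsciiOnly(const std::string &ident)
{
    std::string out = ident;
    for (char &c : out)
    {
        unsigned char ch = static_cast<unsigned char>(c);
        if (ch >= 'A' && ch <= 'Z')
            c = static_cast<char>(ch - 'A' + 'a');
        // Bytes >= 0x80 are part of a multi-byte character: leave them as is.
    }
    return out;
}
```

With UTF-8 input, "HÄUSER" (bytes 48 C3 84 55 53 45 52) comes out as "hÄuser": the name is no longer corrupted, but Ä is not folded to ä. Folding it as well would need a wide-character routine such as towlower(), which is the platform-sensitivity concern raised in the quoted text.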

Martin



[HACKERS] UTF-8 encoding problem w/ libpq

2013-06-03 Thread Martin Schäfer
I try to create database columns with umlauts, using the UTF8 client encoding. 
However, the server seems to mess up the column names. In particular, it seems 
to perform a lowercase operation on each byte of the UTF-8 multi-byte sequence.

Here is my code:

const wchar_t *strName = L"id_äß";
wstring strCreate = wstring(L"create table test_umlaut(") + strName + L" integer primary key)";

PGconn *pConn = PQsetdbLogin("", "", NULL, NULL, "dev503", "postgres", "**");
if (!pConn) FAIL;
if (PQsetClientEncoding(pConn, "UTF-8")) FAIL;

PGresult *pResult = PQexec(pConn, "drop table test_umlaut");
if (pResult) PQclear(pResult);

pResult = PQexec(pConn, ToUtf8(strCreate.c_str()).c_str());
if (pResult) PQclear(pResult);

pResult = PQexec(pConn, "select * from test_umlaut");
if (!pResult) FAIL;
if (PQresultStatus(pResult)!=PGRES_TUPLES_OK) FAIL;
if (PQnfields(pResult)!=1) FAIL;
const char *fName = PQfname(pResult,0);

ShowW("Name: ", strName);
ShowA("in UTF8:  ", ToUtf8(strName).c_str());
ShowA("from DB:  ", fName);
ShowW("in UTF16: ", ToWide(fName).c_str());

PQclear(pResult);
PQreset(pConn);

(ShowA/W call OutputDebugStringA/W, and ToUtf8/ToWide use 
WideCharToMultiByte/MultiByteToWideChar with CP_UTF8.)

And this is the output generated:

Name: id_äß
in UTF8:  id_äß
from DB:  id_ã¤ãÿ
in UTF16: id_???

It seems like the backend thinks the name is in ANSI encoding, not in UTF-8.
If I change the strCreate query and add double quotes around the column name, 
then the problem disappears. But the original name is already in lowercase, so 
I think it should also work without quoting the column name.
Am I missing some setup in either the database or in the use of libpq?
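
The output above can be reproduced without a database. The sketch below is a hedged illustration, not the backend's code: `tolowerCp1252` is a hand-written stand-in for what tolower() would return under a Windows-1252 locale, and `foldBytewise` applies it to each byte of a UTF-8 string, which is the behaviour suspected here.

```cpp
#include <string>

// Hypothetical stand-in for tolower() under a Windows-1252 locale, written
// out by hand so the example does not depend on the locales installed.
unsigned char tolowerCp1252(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        return static_cast<unsigned char>(ch + ('a' - 'A'));
    if (ch >= 0xC0 && ch <= 0xDE && ch != 0xD7)      // À..Þ, except ×
        return static_cast<unsigned char>(ch + 0x20);
    switch (ch)
    {
        case 0x8A: return 0x9A;                      // Š -> š
        case 0x8C: return 0x9C;                      // Œ -> œ
        case 0x8E: return 0x9E;                      // Ž -> ž
        case 0x9F: return 0xFF;                      // Ÿ -> ÿ
        default:   return ch;
    }
}

// Lowercase each byte independently, ignoring that the input is UTF-8.
std::string foldBytewise(const std::string &utf8)
{
    std::string out = utf8;
    for (char &c : out)
        c = static_cast<char>(tolowerCp1252(static_cast<unsigned char>(c)));
    return out;
}
```

For "id_äß" in UTF-8 (69 64 5F C3 A4 C3 9F) this yields 69 64 5F E3 A4 E3 FF. Read as Windows-1252, those bytes display as "id_ã¤ãÿ", matching the "from DB" line, and they are no longer valid UTF-8, which is consistent with the "id_???" shown after conversion to UTF-16.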

I’m using PostgreSQL 9.2.1, compiled by Visual C++ build 1600, 64-bit

The database uses:
ENCODING = 'UTF8'
LC_COLLATE = 'English_United Kingdom.1252'
LC_CTYPE = 'English_United Kingdom.1252'

Thanks for any help,

Martin



Re: [HACKERS] UTF-8 encoding problem w/ libpq

2013-06-03 Thread Martin Schäfer


 -----Original Message-----
 From: k...@rice.edu [mailto:k...@rice.edu]
 Sent: 03 June 2013 16:48
 To: Martin Schäfer
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] UTF-8 encoding problem w/ libpq
 
 Hi Martin,
 
 If you do not want the lowercase behavior, you must put double-quotes
 around the column name per the documentation:
 
 http://www.postgresql.org/docs/9.2/interactive/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
 
 section 4.1.1.
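
 As a rough sketch of the quoting workaround: an identifier wrapped in double quotes is taken verbatim, with any embedded double quote doubled. The helper name `quoteIdentifier` below is invented for illustration; in real libpq code, PQescapeIdentifier() is the library routine for this, and it also takes the connection's encoding into account.

```cpp
#include <string>

// Hypothetical helper: wrap a name in double quotes so the server does not
// case-fold it, doubling any embedded double quote as SQL requires.
// Unlike libpq's PQescapeIdentifier(), this does no encoding validation.
std::string quoteIdentifier(const std::string &name)
{
    std::string out = "\"";
    for (char c : name)
    {
        if (c == '"')
            out += '"';   // escape an embedded quote by doubling it
        out += c;
    }
    out += '"';
    return out;
}
```

 Building the statement as "create table test_umlaut(" + quoteIdentifier(name) + " integer primary key)" would then send the column name as a quoted identifier, which bypasses the case folding altogether.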
 
 Regards,
 Ken

The original name 'id_äß' is already in lowercase. The backend should leave it 
unchanged IMO.

Regards,
Martin



Re: [HACKERS] Incorrect cursor behaviour with gist index

2008-10-20 Thread Martin Schäfer
> Okay.  I'll go fix the core code, and you can take out
> whatever you want in GiST/GIN.

Which PostgreSQL versions will contain the fix?

Regards,

Martin Schaefer
