Andrew Dunstan wrote:
Heikki Linnakangas wrote:
Another update attached. It occurred to me that the memchr approach is
only safe for server encodings, where the non-first bytes of a multi-byte character always have the high bit set.


We currently make the following assumption in the code:

    * These four characters, and the CSV escape and quote characters, are
    * assumed the same in frontend and backend encodings.
    *

The four characters are the carriage return, line feed, backslash and dot.

I think the requirement might actually be somewhat stronger than that, i.e. that none of these characters will appear as a non-first byte of any multi-byte character in a client encoding. If that's right, then we should be able to write CopyReadLineText without bothering about multi-byte chars. If it's not right, then I suspect we have some cases that can fail now anyway.

No, we don't require that, and we do handle it correctly. We use pg_encoding_mblen to determine the length of each character in CopyReadLineText when the encoding is a client-only encoding, and only look at the first byte of each character. In CopyReadAttributesText, where we have a similar loop, we've already transformed the input to server encoding.
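
To illustrate the difference, here is a minimal standalone sketch, not the actual copy.c code. It assumes a Shift-JIS-like client encoding, where the second byte of a two-byte character can be 0x5C (the ASCII backslash). sjis_mblen is a hypothetical, simplified stand-in for pg_encoding_mblen, and both find_backslash_* helpers are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-in for pg_encoding_mblen() on a
 * Shift-JIS-like encoding: lead bytes 0x81-0x9F and 0xE0-0xFC start a
 * two-byte character, everything else is a single byte.
 */
static int
sjis_mblen(const unsigned char *s)
{
    unsigned char c = *s;

    if ((c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC))
        return 2;
    return 1;
}

/*
 * Character-by-character scan: only the first byte of each character is
 * compared, so a 0x5C trail byte is never mistaken for a backslash.
 * Returns the byte offset of the first real backslash, or -1.
 */
static long
find_backslash_mb(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len)
    {
        if (buf[i] == '\\')
            return (long) i;
        i += (size_t) sjis_mblen(buf + i);
    }
    return -1;
}

/*
 * Byte-wise scan with memchr: only safe when non-first bytes always have
 * the high bit set, which does not hold for this encoding.
 */
static long
find_backslash_memchr(const unsigned char *buf, size_t len)
{
    const unsigned char *p = memchr(buf, '\\', len);

    return p ? (long) (p - buf) : -1;
}
```

With the input bytes 0x83 0x5C 'a' '\\' 'b', where 0x83 0x5C is a single two-byte character, the character-wise scan reports the backslash at offset 3, while the memchr scan stops at the trail byte at offset 1.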

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)