Andrew Dunstan wrote:
Heikki Linnakangas wrote:
Another update attached: It occurred to me that the memchr approach is
only safe for server encodings, where the non-first bytes of a
multi-byte character always have the hi-bit set.
We currently make the following assumption in the code:
* These four characters, and the CSV escape and quote characters, are
* assumed the same in frontend and backend encodings.
*
The four characters are the carriage return, line feed, backslash and dot.
I think the requirement might well actually be somewhat stronger than
that: i.e. that none of these will appear as a non-first byte in any
multi-byte client encoding character. If that's right, then we should be
able to write CopyReadLineText without bothering about multi-byte chars.
If it's not right then I suspect we have some cases that can fail now
anyway.
No, we don't require that, and we do handle it correctly. We use
pg_encoding_mblen to determine the length of each character in
CopyReadLineText when the encoding is a client-only encoding, and only
look at the first byte of each character. In CopyReadAttributesText,
where we have a similar loop, we've already transformed the input to
server encoding.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-patches