Re: [PATCHES] CopyReadLineText optimization

Heikki Linnakangas Thu, 06 Mar 2008 10:53:33 -0800

Andrew Dunstan wrote:

Heikki Linnakangas wrote:
Another update attached: It occurred to me that the memchr approach is
only safe for server encodings, where the non-first bytes of a
multi-byte character always have the hi-bit set.
We currently make the following assumption in the code:

    * These four characters, and the CSV escape and quote characters, are
    * assumed the same in frontend and backend encodings.
    *

The four characters are the carriage return, line feed, backslash and dot.
I think the requirement might well actually be somewhat stronger than that: i.e. that none of these will appear as a non-first byte in any multi-byte client encoding character. If that's right, then we should be able to write CopyReadLineText without bothering about multi-byte chars. If it's not right then I suspect we have some cases that can fail now anyway.

No, we don't require that, and we do handle it correctly. We use pg_encoding_mblen to determine the length of each character in CopyReadLineText when the encoding is a client-only encoding, and only look at the first byte of each character. In CopyReadAttributesText, where we have a similar loop, we've already transformed the input to server encoding.
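
To make the difference concrete, here is a minimal standalone sketch, not the actual copy.c code. It assumes a Shift-JIS-like client encoding, where a trail byte can be 0x5C (the ASCII backslash). sjis_like_mblen is a hypothetical stand-in for pg_encoding_mblen, and both find_byte_* functions are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for pg_encoding_mblen(), hard-wired to
 * Shift-JIS-style lead bytes: 0x81-0x9F and 0xE0-0xFC start a two-byte
 * character, everything else is treated as a single byte.
 */
static size_t
sjis_like_mblen(const unsigned char *s)
{
    if ((*s >= 0x81 && *s <= 0x9F) || (*s >= 0xE0 && *s <= 0xFC))
        return 2;
    return 1;
}

/*
 * Byte-wise search with memchr. Only safe when the non-first bytes of a
 * multi-byte character always have the high bit set, so they can never
 * equal an ASCII byte. Returns the offset of the match, or -1.
 */
static long
find_byte_naive(const unsigned char *buf, size_t len, unsigned char c)
{
    const unsigned char *p;

    assert(buf != NULL);
    p = memchr(buf, c, len);
    return p ? (long) (p - buf) : -1;
}

/*
 * Character-wise search: advance by the length of each character and
 * compare only its first byte, so trail bytes are never inspected.
 * Returns the offset of the match, or -1.
 */
static long
find_byte_mbaware(const unsigned char *buf, size_t len, unsigned char c)
{
    size_t      i = 0;

    assert(buf != NULL);
    while (i < len)
    {
        if (buf[i] == c)
            return (long) i;
        i += sjis_like_mblen(buf + i);
    }
    return -1;
}
```

With the input bytes 0x95 0x5C 'a' '\\' 'n', the memchr version reports a backslash at offset 1, which is really the second byte of a two-byte character, while the character-wise loop skips that byte and finds the real backslash at offset 3.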


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
