Re: [PATCHES] CopyReadLineText optimization

Andrew Dunstan Thu, 06 Mar 2008 10:46:25 -0800


Heikki Linnakangas wrote:
> Heikki Linnakangas wrote:
>> Heikki Linnakangas wrote:
>>> Attached is a patch that modifies CopyReadLineText so that it uses
>>> memchr to speed up the scan. The nice thing about memchr is that we
>>> can take advantage of any clever optimizations that might be in libc
>>> or compiler.
>>
>> Here's an updated version of the patch. The principle is the same,
>> but the same optimization is now used for CSV input as well, and
>> there's more comments.
>
> Another update attached: It occurred to me that the memchr approach is
> only safe for server encodings, where the non-first bytes of a
> multi-byte character always have the hi-bit set.


We currently make the following assumption in the code:

    * These four characters, and the CSV escape and quote characters, are
    * assumed the same in frontend and backend encodings.
    *

The four characters are the carriage return, line feed, backslash and dot.

I think the requirement might well be somewhat stronger than that: none of these characters may appear as a non-first byte of a multi-byte character in any client encoding. If that's right, then we should be able to write CopyReadLineText without bothering about multi-byte chars. If it's not right, then I suspect we have some cases that can fail now anyway. (I believe some client encodings at least use backslash in subsequent chars, and that's a nasty one because the "\." end sequence is hard-coded.) I believe all the chars up to 0x2f are safe; that includes both quote chars and dot.


cheers

andrew

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)