Heikki Linnakangas wrote:
> Heikki Linnakangas wrote:
>> Heikki Linnakangas wrote:
>>> Attached is a patch that modifies CopyReadLineText so that it uses
>>> memchr to speed up the scan. The nice thing about memchr is that we
>>> can take advantage of any clever optimizations that might be in libc
>>> or compiler.
>> Here's an updated version of the patch. The principle is the same,
>> but the same optimization is now used for CSV input as well, and
>> there's more comments.
> Another update attached: It occurred to me that the memchr approach is
> only safe for server encodings, where the non-first bytes of a
> multi-byte character always have the hi-bit set.
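For readers without the patch at hand, here is a minimal sketch of the kind of scan being discussed. It is not the patch's code: find_line_end is a hypothetical helper, and it only shows a line-feed search delegated to memchr. The search is safe under the condition quoted above, because a byte with the high bit set can never equal '\n' (0x0a).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): return the offset of the
 * first line feed in buf[0..len), or len if there is none.
 *
 * Assumes an encoding in which every non-first byte of a multi-byte
 * character has the high bit set, so a raw 0x0a byte is always a real
 * line feed.
 */
static size_t
find_line_end(const char *buf, size_t len)
{
    const char *nl;

    assert(buf != NULL);

    /* memchr lets libc use whatever optimized byte search it has. */
    nl = memchr(buf, '\n', len);

    return nl ? (size_t) (nl - buf) : len;
}
```

A caller would treat a return value equal to len as "no complete line in the buffer yet" and read more data before scanning again.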
We currently make the following assumption in the code:
* These four characters, and the CSV escape and quote characters, are
* assumed the same in frontend and backend encodings.
*
The four characters are the carriage return, line feed, backslash and dot.
I think the requirement might well actually be somewhat stronger than
that: namely, that none of these characters will appear as a non-first
byte in any multi-byte client encoding character. If that's right, then
we should be able to write CopyReadLineText without bothering about
multi-byte chars. If it's not right, then I suspect we have some cases
that can fail now anyway. (I believe some client encodings at least use
backslash in subsequent bytes, and that's a nasty one because the "\."
end sequence is hard-coded. I believe all the chars up to 0x2f are safe;
that includes both quote chars and dot.)
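To make the backslash worry concrete, here is a small sketch using Shift-JIS as the example client encoding. In Shift-JIS the katakana "ソ" is encoded as the two bytes 0x83 0x5c, so its second byte is identical to an ASCII backslash. The function names are hypothetical, and the lead-byte ranges (0x81-0x9f and 0xe0-0xfc) are stated here as an assumption about Shift-JIS, not taken from the PostgreSQL source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed Shift-JIS lead-byte ranges: 0x81-0x9f and 0xe0-0xfc. */
static int
is_sjis_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9f) || (c >= 0xe0 && c <= 0xfc);
}

/*
 * Naive byte search: returns the offset of the first 0x5c byte, or -1.
 * It cannot tell a real backslash from the trailing byte of a
 * two-byte character.
 */
static ptrdiff_t
find_backslash_naive(const char *buf, size_t len)
{
    const char *p;

    assert(buf != NULL);
    p = memchr(buf, '\\', len);
    return p ? (ptrdiff_t) (p - buf) : -1;
}

/*
 * Encoding-aware search: steps over each two-byte character as a unit,
 * so only a backslash in first-byte position is reported.
 */
static ptrdiff_t
find_backslash_sjis(const char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL);
    while (i < len)
    {
        unsigned char c = (unsigned char) buf[i];

        if (is_sjis_lead_byte(c) && i + 1 < len)
        {
            i += 2;             /* skip lead byte and its trailing byte */
            continue;
        }
        if (c == '\\')
            return (ptrdiff_t) i;
        i++;
    }
    return -1;
}
```

For the four input bytes 0x83 0x5c 0x5c 0x2e ("ソ" followed by "\."), the naive search stops at offset 1, inside the character, while the encoding-aware search finds the real backslash at offset 2. Bytes up to 0x2f never fall in a trailing-byte position in this sketch's model, which is why the dot and the quote characters do not have the same problem.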
cheers
andrew