Re: [PATCHES] CopyReadLineText optimization

Andrew Dunstan Thu, 06 Mar 2008 11:30:37 -0800


Heikki Linnakangas wrote:

Andrew Dunstan wrote:
Heikki Linnakangas wrote:
Another update attached: It occurred to me that the memchr approach is
only safe for server encodings, where the non-first bytes of a multi-byte character always have the hi-bit set.
We currently make the following assumption in the code:
    * These four characters, and the CSV escape and quote characters, are
    * assumed the same in frontend and backend encodings.
    *
The four characters are the carriage return, line feed, backslash and dot.
I think the requirement might well actually be somewhat stronger than that: i.e. that none of these will appear as a non-first byte in any multi-byte client encoding character. If that's right, then we should be able to write CopyReadLineText without bothering about multi-byte chars. If it's not right then I suspect we have some cases that can fail now anyway.
No, we don't require that, and we do handle it correctly. We use pg_encoding_mblen to determine the length of each character in CopyReadLineText when the encoding is a client-only encoding, and only look at the first byte of each character. In CopyReadAttributesText, where we have a similar loop, we've already transformed the input to server encoding.
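
To make the distinction in the quoted exchange concrete, here is a minimal sketch in C, not the actual copy.c code. It assumes a toy Shift-JIS-like encoding in which the second byte of a two-byte character may fall in the ASCII range. toy_mblen is a hypothetical stand-in for pg_encoding_mblen, and the two find_byte_* helpers are made-up names for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for pg_encoding_mblen(): byte length of the
 * character starting at s in a toy Shift-JIS-like encoding. Lead bytes
 * 0x81-0x9F and 0xE0-0xFC start a two-byte character whose second byte
 * may be an ASCII value such as 0x5C (backslash).
 */
static int
toy_mblen(const unsigned char *s)
{
    unsigned char c = *s;

    return ((c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC)) ? 2 : 1;
}

/*
 * Byte-wise scan with memchr. Only safe when ASCII values can never
 * occur as a non-first byte of a multi-byte character.
 */
static ptrdiff_t
find_byte_memchr(const unsigned char *buf, size_t len, unsigned char target)
{
    const unsigned char *p = memchr(buf, target, len);

    return p ? p - buf : -1;
}

/*
 * Character-wise scan: compare only the first byte of each character,
 * then skip the whole character using its encoded length.
 */
static ptrdiff_t
find_byte_charwise(const unsigned char *buf, size_t len, unsigned char target)
{
    size_t i = 0;

    assert(target < 0x80);      /* we only ever search for ASCII bytes */

    while (i < len)
    {
        if (buf[i] == target)
            return (ptrdiff_t) i;
        i += (size_t) toy_mblen(buf + i);
    }
    return -1;
}
```

With the input bytes 0x83 0x5C 'a' 0x5C 'b', the memchr scan stops at the trailing byte of the two-byte character, while the character-wise scan skips it and finds the real backslash. That is the case that makes memchr unsafe for such client-only encodings.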


Oops. I see that now. Funny how I missed it when I went looking for it :-(

I think I understand the patch now :-)

I'm still a bit worried about applying it unless it gets some adaptive behaviour or something so that we don't cause any serious performance regressions in some cases. Also, could we perhaps benefit from inlining some calls, or is your compiler doing that anyway?


cheers

andrew

