Andrew Dunstan wrote:
Another question that occurred to me - did you try using strpbrk() to
look for the next interesting character rather than your homegrown
searcher gadget? If so, how did that perform?
It looks like strpbrk() performs poorly:
unpatched:
testname | min duration
Heikki Linnakangas wrote:
Andrew Dunstan wrote:
Another question that occurred to me - did you try using strpbrk() to
look for the next interesting character rather than your homegrown
searcher gadget? If so, how did that perform?
It looks like strpbrk() performs poorly:
Yes, not
Andrew Dunstan wrote:
Heikki Linnakangas wrote:
Andrew Dunstan wrote:
I'm still a bit worried about applying it unless it gets some
adaptive behaviour or something so that we don't cause any serious
performance regressions in some cases.
I'll try to come up with something. At the most
Heikki Linnakangas wrote:
Andrew Dunstan wrote:
I'm still a bit worried about applying it unless it gets some
adaptive behaviour or something so that we don't cause any serious
performance regressions in some cases.
I'll try to come up with something. At the most conservative end, we
Heikki Linnakangas wrote:
Heikki Linnakangas wrote:
Heikki Linnakangas wrote:
Attached is a patch that modifies CopyReadLineText so that it uses
memchr to speed up the scan. The nice thing about memchr is that we
can take advantage of any clever optimizations that might be in libc
or
Andrew Dunstan wrote:
Heikki Linnakangas wrote:
Another update attached: It occurred to me that the memchr approach is
only safe for server encodings, where the non-first bytes of a
multi-byte character always have the hi-bit set.
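To illustrate the hi-bit point above, here is a small sketch of why a raw byte search is only safe when trailing bytes cannot look like ASCII. The byte sequences are the standard encodings of katakana "so" (U+30BD); Shift-JIS is used purely as an example of an encoding without the hi-bit property, and the helper name `first_backslash_offset` is hypothetical, not something taken from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Katakana "so" (U+30BD) followed by a newline, in two encodings. */
/* Shift-JIS: the trailing byte 0x5C is the same byte as an ASCII backslash. */
static const unsigned char sjis_so[] = {0x83, 0x5C, '\n'};
/* UTF-8: every byte of the multi-byte character has the high bit set. */
static const unsigned char utf8_so[] = {0xE3, 0x82, 0xBD, '\n'};

/*
 * Hypothetical helper: offset of the first 0x5C byte in buf, or -1 if
 * there is none. This is a plain byte search with no knowledge of
 * character boundaries, which is exactly what memchr does.
 */
static long
first_backslash_offset(const unsigned char *buf, size_t len)
{
    const unsigned char *hit = memchr(buf, 0x5C, len);

    return hit ? (long) (hit - buf) : -1;
}
```

On the Shift-JIS bytes the search reports a "backslash" in the middle of the character; on the UTF-8 bytes it finds nothing, which is the property the memchr approach relies on.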
We currently make the following assumption in the code:
Heikki Linnakangas writes:
Andrew Dunstan wrote:
We currently make the following assumption in the code:
* These four characters, and the CSV escape and quote characters, are
* assumed the same in frontend and backend encodings.
The four characters are the carriage
Tom Lane wrote:
BTW, I notice that the code allows CSV escape and quote characters that
have the high bit set (in single-byte server encodings that is). Is
this a good idea? It seems like such are extremely unlikely to be the
same in two different encodings. Maybe we should restrict to the
Heikki Linnakangas wrote:
Andrew Dunstan wrote:
Heikki Linnakangas wrote:
Another update attached: It occurred to me that the memchr approach is
only safe for server encodings, where the non-first bytes of a
multi-byte character always have the hi-bit set.
We currently make the
Andrew Dunstan wrote:
I'm still a bit worried about applying it unless it gets some adaptive
behaviour or something so that we don't cause any serious performance
regressions in some cases.
I'll try to come up with something. At the most conservative end, we
could fall back to the current
Heikki Linnakangas wrote:
Andrew Dunstan wrote:
I'm still a bit worried about applying it unless it gets some
adaptive behaviour or something so that we don't cause any serious
performance regressions in some cases.
I'll try to come up with something. At the most conservative end, we
Andrew Dunstan wrote:
Heikki Linnakangas wrote:
Andrew Dunstan wrote:
I'm still a bit worried about applying it unless it gets some
adaptive behaviour or something so that we don't cause any serious
performance regressions in some cases.
I'll try to come up with something. At the most
On Thu, 6 Mar 2008, Heikki Linnakangas wrote:
At the most conservative end, we could fall back to the current method
on the first escape, quote or backslash character.
I would just count the number of escaped/quote characters on each line,
and then at the end of the line switch modes between
Greg Smith wrote:
On Thu, 6 Mar 2008, Heikki Linnakangas wrote:
At the most conservative end, we could fall back to the current
method on the first escape, quote or backslash character.
I would just count the number of escaped/quote characters on each
line, and then at the end of the line
Heikki Linnakangas wrote:
Heikki Linnakangas wrote:
Attached is a patch that modifies CopyReadLineText so that it uses
memchr to speed up the scan. The nice thing about memchr is that we
can take advantage of any clever optimizations that might be in libc
or compiler.
Here's an updated
Heikki Linnakangas wrote:
So the overhead of using memchr slows us down if there are a lot of
escape or quote characters. The breakeven point seems to be about 1 in
8 characters. I'm not sure if that's a good tradeoff or not...
How about we test the first buffer read in from the file
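The message above is cut off in the archive, so the following is only a guess at the general shape of such a check, not the proposal itself. It is a minimal sketch that samples one buffer, counts quote and escape characters, and compares the density against the "about 1 in 8" breakeven quoted above. The names `count_specials`, `should_use_memchr`, and `SPECIAL_DENSITY_DIVISOR` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed threshold, taken from the "about 1 in 8" breakeven figure. */
#define SPECIAL_DENSITY_DIVISOR 8

/* Hypothetical helper: count quote and escape bytes in a sample buffer. */
static size_t
count_specials(const char *buf, size_t len, char quote, char escape)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++)
    {
        /* A byte is counted once even if quote and escape are the same. */
        if (buf[i] == quote || buf[i] == escape)
            n++;
    }
    return n;
}

/*
 * Hypothetical decision function: prefer the memchr scan only when fewer
 * than 1 in SPECIAL_DENSITY_DIVISOR bytes of the sample are special.
 * At or above that density, fall back to the byte-at-a-time loop.
 */
static bool
should_use_memchr(const char *sample, size_t len, char quote, char escape)
{
    if (len == 0)
        return false;
    return count_specials(sample, len, quote, escape) * SPECIAL_DENSITY_DIVISOR < len;
}
```

A one-off sample like this is cheap, but it assumes the first buffer is representative of the rest of the file, which may not hold for every input.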
Your patch has been added to the PostgreSQL unapplied patches list at:
http://momjian.postgresql.org/cgi-bin/pgpatches
It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.
---
Heikki Linnakangas wrote:
Attached is a patch that modifies CopyReadLineText so that it uses
memchr to speed up the scan. The nice thing about memchr is that we can
take advantage of any clever optimizations that might be in libc or
compiler.
Here's an updated version of the patch. The
The purpose of CopyReadLineText is to scan the input buffer, and find
the next newline, taking into account any escape characters. It
currently operates in a loop, one byte at a time, searching for LF, CR,
or a backslash. That's a bit slow: I've been running oprofile on COPY,
and I've seen
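The two scanning strategies described above can be sketched roughly as follows. This is not the code from the patch: both function names are hypothetical, and the sketch only covers the text-mode case of looking for LF, CR, or a backslash in a buffer of known length.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical baseline: inspect one byte at a time. */
static const char *
next_special_bytewise(const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        char c = buf[i];

        if (c == '\n' || c == '\r' || c == '\\')
            return buf + i;
    }
    return NULL;
}

/*
 * Hypothetical memchr variant: one memchr call per special character.
 * After each hit the search window is shrunk to end just before that hit,
 * so any later hit is necessarily earlier in the buffer.
 */
static const char *
next_special_memchr(const char *buf, size_t len)
{
    static const char specials[] = {'\n', '\r', '\\'};
    const char *best = NULL;

    for (size_t i = 0; i < sizeof(specials); i++)
    {
        const char *hit = memchr(buf, specials[i], len);

        if (hit != NULL)
        {
            best = hit;
            len = (size_t) (hit - buf);
        }
    }
    return best;
}
```

Both functions return a pointer to the first special byte, or NULL if the buffer contains none. The memchr variant hands the inner loop to libc, which pays off when special characters are sparse and costs extra call overhead when they are dense.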
CopyReadLineText optimization
The purpose of CopyReadLineText is to scan the input buffer,
and find the next newline, taking into account any escape
characters. It currently operates in a loop, one byte at a
time, searching for LF, CR, or a backslash. That's a bit
slow: I've been running