Re: [PATCHES] CopyReadLineText optimization

2008-03-10 Heikki Linnakangas
Andrew Dunstan wrote: Another question that occurred to me - did you try using strpbrk() to look for the next interesting character rather than your homegrown searcher gadget? If so, how did that perform? It looks like strpbrk() performs poorly: unpatched: testname | min duration
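
For comparison, a strpbrk()-based search for the next "interesting" byte might look like the sketch below. It is an illustration only, not the code that was benchmarked. The helper name `next_special_strpbrk` is hypothetical. Because strpbrk() stops at the first NUL byte, the sketch assumes a NUL-terminated buffer with no embedded NULs, which a raw COPY input buffer does not guarantee.

```c
#include <string.h>

/* Hypothetical helper: offset of the next LF, CR or backslash in a
 * NUL-terminated buffer, or -1 if there is none. strpbrk() cannot be
 * given a length, so it scans until the terminating NUL. */
static long
next_special_strpbrk(const char *buf)
{
    const char *p = strpbrk(buf, "\n\r\\");

    return p ? (long) (p - buf) : -1;
}
```

For example, `next_special_strpbrk("abc\\def")` returns 3, the index of the backslash.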

Re: [PATCHES] CopyReadLineText optimization

2008-03-10 Andrew Dunstan
Heikki Linnakangas wrote: Andrew Dunstan wrote: Another question that occurred to me - did you try using strpbrk() to look for the next interesting character rather than your homegrown searcher gadget? If so, how did that perform? It looks like strpbrk() performs poorly: Yes, not

Re: [PATCHES] CopyReadLineText optimization

2008-03-08 Heikki Linnakangas
Andrew Dunstan wrote: Heikki Linnakangas wrote: Andrew Dunstan wrote: I'm still a bit worried about applying it unless it gets some adaptive behaviour or something so that we don't cause any serious performance regressions in some cases. I'll try to come up with something. At the most

Re: [PATCHES] CopyReadLineText optimization

2008-03-07 Andrew Dunstan
Heikki Linnakangas wrote: Andrew Dunstan wrote: I'm still a bit worried about applying it unless it gets some adaptive behaviour or something so that we don't cause any serious performance regressions in some cases. I'll try to come up with something. At the most conservative end, we

Re: [PATCHES] CopyReadLineText optimization

2008-03-06 Andrew Dunstan
Heikki Linnakangas wrote: Heikki Linnakangas wrote: Heikki Linnakangas wrote: Attached is a patch that modifies CopyReadLineText so that it uses memchr to speed up the scan. The nice thing about memchr is that we can take advantage of any clever optimizations that might be in libc or
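
memchr() searches for a single byte in a length-bounded buffer, so it also works on data containing NUL bytes. One way to find the nearest of several special characters with it is to call memchr() once per character and keep the earliest hit. The sketch below does this and shrinks the search window after each hit. This is a hypothetical illustration of the general technique, not the code in the attached patch. The name `next_special_memchr` is made up.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: index of the first byte in buf[0..len) that is one
 * of specials[0..nspecials), or len if there is none. Each memchr() call
 * only searches the part of the buffer before the best match found so far,
 * so later calls get cheaper. */
static size_t
next_special_memchr(const char *buf, size_t len,
                    const char *specials, size_t nspecials)
{
    size_t limit = len;

    for (size_t i = 0; i < nspecials; i++)
    {
        const char *hit = memchr(buf, specials[i], limit);

        if (hit != NULL)
            limit = (size_t) (hit - buf);
    }
    return limit;
}
```

With `"abc\\d\ne"` and the special set `"\n\r\\"`, the newline is found first at index 5, and the window is then narrowed until the backslash at index 3 wins.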

Re: [PATCHES] CopyReadLineText optimization

2008-03-06 Heikki Linnakangas
Andrew Dunstan wrote: Heikki Linnakangas wrote: Another update attached: It occurred to me that the memchr approach is only safe for server encodings, where the non-first bytes of a multi-byte character always have the hi-bit set. We currently make the following assumption in the code:
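
The encoding concern can be made concrete with a single character. The CJK character U+8868 is encoded as 0x95 0x5C in Shift-JIS, so its trailing byte equals an ASCII backslash. In UTF-8 it is encoded as 0xE8 0xA1 0xA8, and every byte has the high bit set. A bytewise memchr() scan for `'\\'` therefore misfires on the Shift-JIS form but not on the UTF-8 form. (PostgreSQL only accepts Shift-JIS as a client encoding, which is why the safety argument is phrased in terms of server encodings.) The helper name `trailing_bytes_high` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* U+8868 in two encodings. */
static const unsigned char sjis_u8868[] = {0x95, 0x5C};       /* 0x5C == '\\' */
static const unsigned char utf8_u8868[] = {0xE8, 0xA1, 0xA8};

/* Returns 1 if every non-first byte of one encoded character has the high
 * bit set, the property a bytewise search for ASCII delimiters relies on. */
static int
trailing_bytes_high(const unsigned char *ch, size_t len)
{
    for (size_t i = 1; i < len; i++)
        if ((ch[i] & 0x80) == 0)
            return 0;
    return 1;
}
```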

Re: [PATCHES] CopyReadLineText optimization

2008-03-06 Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes: Andrew Dunstan wrote: We currently make the following assumption in the code: * These four characters, and the CSV escape and quote characters, are * assumed the same in frontend and backend encodings. The four characters are the carriage

Re: [PATCHES] CopyReadLineText optimization

2008-03-06 Andrew Dunstan
Tom Lane wrote: BTW, I notice that the code allows CSV escape and quote characters that have the high bit set (in single-byte server encodings that is). Is this a good idea? It seems like such are extremely unlikely to be the same in two different encodings. Maybe we should restrict to the
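
Tom Lane's suggestion is truncated above. A restriction that follows from his stated reasoning would accept a CSV quote or escape character only if its high bit is clear. The sketch below assumes that reading. The function name `csv_special_char_ok` is hypothetical and does not come from the COPY option-checking code.

```c
#include <stdbool.h>

/* Hypothetical validation, assuming the restriction is "no high-bit
 * bytes": a high-bit byte in a single-byte server encoding is unlikely to
 * denote the same character in the client's encoding. */
static bool
csv_special_char_ok(char c)
{
    return ((unsigned char) c & 0x80) == 0;
}
```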

Re: [PATCHES] CopyReadLineText optimization

2008-03-06 Andrew Dunstan
Heikki Linnakangas wrote: Andrew Dunstan wrote: Heikki Linnakangas wrote: Another update attached: It occurred to me that the memchr approach is only safe for server encodings, where the non-first bytes of a multi-byte character always have the hi-bit set. We currently make the

Re: [PATCHES] CopyReadLineText optimization

2008-03-06 Heikki Linnakangas
Andrew Dunstan wrote: I'm still a bit worried about applying it unless it gets some adaptive behaviour or something so that we don't cause any serious performance regressions in some cases. I'll try to come up with something. At the most conservative end, we could fall back to the current
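
The "most conservative" fallback could be sketched as a scanner that uses memchr() until the first backslash appears and then stays in byte-at-a-time mode. The sketch makes these assumptions: it handles only text-format LF line endings (no CR, no CSV quotes), it treats a backslash as escaping the next byte, and it uses the hypothetical names `Scanner`, `find_eol` and `find_eol_bytewise`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct
{
    bool use_memchr;            /* false once any backslash has been seen */
} Scanner;

/* Slow path: byte-at-a-time, skipping the byte after each backslash. */
static size_t
find_eol_bytewise(const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        if (buf[i] == '\\')
            i++;                /* escaped byte cannot end the line */
        else if (buf[i] == '\n')
            return i;
    }
    return len;
}

/* Index of the terminating '\n' in buf[0..len), or len if none. */
static size_t
find_eol(Scanner *sc, const char *buf, size_t len)
{
    if (sc->use_memchr)
    {
        const char *nl = memchr(buf, '\n', len);
        size_t      eol = nl ? (size_t) (nl - buf) : len;

        /* The fast answer is only valid if no backslash precedes it. */
        if (memchr(buf, '\\', eol) == NULL)
            return eol;
        sc->use_memchr = false; /* first escape seen: fall back for good */
    }
    return find_eol_bytewise(buf, len);
}
```

On `"a\\\nb\n"` the fast path sees a backslash before the first newline, so it switches modes and returns 4, because the escaped newline does not end the line.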

Re: [PATCHES] CopyReadLineText optimization

2008-03-06 Andrew Dunstan
Heikki Linnakangas wrote: Andrew Dunstan wrote: I'm still a bit worried about applying it unless it gets some adaptive behaviour or something so that we don't cause any serious performance regressions in some cases. I'll try to come up with something. At the most conservative end, we

Re: [PATCHES] CopyReadLineText optimization

2008-03-06 Heikki Linnakangas
Andrew Dunstan wrote: Heikki Linnakangas wrote: Andrew Dunstan wrote: I'm still a bit worried about applying it unless it gets some adaptive behaviour or something so that we don't cause any serious performance regressions in some cases. I'll try to come up with something. At the most

Re: [PATCHES] CopyReadLineText optimization

2008-03-06 Greg Smith
On Thu, 6 Mar 2008, Heikki Linnakangas wrote: At the most conservative end, we could fall back to the current method on the first escape, quote or backslash character. I would just count the number of escaped/quote characters on each line, and then at the end of the line switch modes between
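
Greg Smith's per-line idea might be sketched as follows. Count the special characters on each line, then at the end of the line pick the scanning method for the next line. His message is truncated, so the exact switching rule is unknown. The threshold `max_specials` and the names `LineModeState`, `count_specials` and `end_of_line` are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct
{
    bool   use_memchr;          /* scanning method for the next line */
    size_t max_specials;        /* assumed tuning knob, not from the thread */
} LineModeState;

/* Count bytes in line[0..len) that belong to specials[0..nspecials). */
static size_t
count_specials(const char *line, size_t len,
               const char *specials, size_t nspecials)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++)
        if (memchr(specials, line[i], nspecials) != NULL)
            n++;
    return n;
}

/* Called at the end of each line to choose the mode for the next one. */
static void
end_of_line(LineModeState *st, size_t specials_in_line)
{
    st->use_memchr = specials_in_line <= st->max_specials;
}
```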

Re: [PATCHES] CopyReadLineText optimization

2008-03-06 Andrew Dunstan
Greg Smith wrote: On Thu, 6 Mar 2008, Heikki Linnakangas wrote: At the most conservative end, we could fall back to the current method on the first escape, quote or backslash character. I would just count the number of escaped/quote characters on each line, and then at the end of the line

Re: [PATCHES] CopyReadLineText optimization

2008-03-05 Heikki Linnakangas
Heikki Linnakangas wrote: Heikki Linnakangas wrote: Attached is a patch that modifies CopyReadLineText so that it uses memchr to speed up the scan. The nice thing about memchr is that we can take advantage of any clever optimizations that might be in libc or compiler. Here's an updated

Re: [PATCHES] CopyReadLineText optimization

2008-03-05 Andrew Dunstan
Heikki Linnakangas wrote: So the overhead of using memchr slows us down if there's a lot of escape or quote characters. The breakeven point seems to be about 1 in 8 characters. I'm not sure if that's a good tradeoff or not... How about we test the first buffer read in from the file
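
Andrew Dunstan's proposal to test the first buffer could use the reported breakeven of roughly 1 special character in 8 as a one-shot decision rule. The sketch below is a hypothetical illustration. The function name `sample_favors_memchr` is made up, and whether the sample should be the whole first buffer or part of it is left open in the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: decide once, from the first buffer read, whether special
 * bytes are rare enough (fewer than about 1 in 8) for memchr() to win. */
static bool
sample_favors_memchr(const char *buf, size_t len,
                     const char *specials, size_t nspecials)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++)
        if (memchr(specials, buf[i], nspecials) != NULL)
            n++;

    /* n / len < 1/8, rearranged to avoid division; an empty sample
     * yields false and keeps the byte-at-a-time method. */
    return n * 8 < len;
}
```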

Re: [PATCHES] CopyReadLineText optimization

2008-03-03 Bruce Momjian
Your patch has been added to the PostgreSQL unapplied patches list at: http://momjian.postgresql.org/cgi-bin/pgpatches It will be applied as soon as one of the PostgreSQL committers reviews and approves it. ---

Re: [PATCHES] CopyReadLineText optimization

2008-02-29 Heikki Linnakangas
Heikki Linnakangas wrote: Attached is a patch that modifies CopyReadLineText so that it uses memchr to speed up the scan. The nice thing about memchr is that we can take advantage of any clever optimizations that might be in libc or compiler. Here's an updated version of the patch. The

[PATCHES] CopyReadLineText optimization

2008-02-23 Heikki Linnakangas
The purpose of CopyReadLineText is to scan the input buffer, and find the next newline, taking into account any escape characters. It currently operates in a loop, one byte at a time, searching for LF, CR, or a backslash. That's a bit slow: I've been running oprofile on COPY, and I've seen
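
The byte-at-a-time approach described above reduces to a loop like the one below. This is a simplified illustration, not the actual CopyReadLineText code, which also tracks quotes, escapes and buffer refills. The name `next_special_bytewise` is hypothetical.

```c
#include <stddef.h>

/* Simplified: index of the first LF, CR or backslash in buf[0..len),
 * or len if there is none. Every byte costs three comparisons. */
static size_t
next_special_bytewise(const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        char c = buf[i];

        if (c == '\n' || c == '\r' || c == '\\')
            return i;
    }
    return len;
}
```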

Re: [PATCHES] CopyReadLineText optimization

2008-02-23 Luke Lonergan
The purpose of CopyReadLineText is to scan the input buffer, and find the next newline, taking into account any escape characters. It currently operates in a loop, one byte at a time, searching for LF, CR, or a backslash. That's a bit slow: I've been running