Cool!  It's been a while since we've done the same kind of thing :-)

- Luke 

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> Heikki Linnakangas
> Sent: Saturday, February 23, 2008 5:30 PM
> To: pgsql-patches@postgresql.org
> Subject: [PATCHES] CopyReadLineText optimization
> 
> The purpose of CopyReadLineText is to scan the input buffer, 
> and find the next newline, taking into account any escape 
> characters. It currently operates in a loop, one byte at a 
> time, searching for LF, CR, or a backslash. That's a bit 
> slow: I've been running oprofile on COPY, and I've seen 
> CopyReadLine take around 10% of the CPU time, and Joshua 
> Drake just posted a very similar profile to hackers.
> 
> Attached is a patch that modifies CopyReadLineText so that it 
> uses memchr to speed up the scan. The nice thing about memchr 
> is that we can take advantage of any clever optimizations 
> that might be in libc or the compiler.
> 
> In the tests I've been running, it roughly halves the time 
> spent in CopyReadLine (including the new memchr calls), thus 
> reducing the total CPU overhead by ~5%. I'm planning to run 
> more tests with data that has backslashes and with different 
> width tables to see what the worst-case and best-case 
> performance is like. Also, it doesn't work for CSV format at 
> the moment; that needs to be fixed.
> 
> 5% isn't exactly breathtaking, but it's a start. I tried the 
> same trick in CopyReadAttributesText, but unfortunately it 
> doesn't seem to help there because you need to "stop" the 
> efficient word-at-a-time scan that memchr does (at least with 
> glibc, YMMV) whenever there's a column separator, while in 
> CopyReadLineText you get to process the whole line in one 
> call, assuming there are no backslashes.
> 
> -- 
>    Heikki Linnakangas
>    EnterpriseDB   http://www.enterprisedb.com
> 
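
For readers following the thread, here is a minimal sketch of the memchr
idea described in the quoted message. It is not the attached patch: the
function name find_line_end is hypothetical, it only looks for LF (the
real CopyReadLineText also has to handle CR), and it assumes text-format
rules where a backslash escapes the byte that follows it.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the patch: return a pointer to the
 * first unescaped '\n' in buf[0..len), or NULL if the buffer holds no
 * complete line yet.  Assumes a backslash escapes the following byte.
 */
static const char *
find_line_end(const char *buf, size_t len)
{
    const char *p = buf;
    const char *end = buf + len;

    while (p < end)
    {
        /* Let libc find the next newline candidate in one call. */
        const char *nl = memchr(p, '\n', (size_t) (end - p));
        const char *stop = nl ? nl : end;

        /* Only the bytes before that candidate can contain an escape. */
        const char *bs = memchr(p, '\\', (size_t) (stop - p));

        if (bs == NULL)
            return nl;      /* fast path: no backslash in the way */

        /* A trailing backslash needs more input before we can decide. */
        if (end - bs < 2)
            return NULL;

        /* Skip the backslash and the byte it escapes, then rescan. */
        p = bs + 2;
    }
    return NULL;
}
```

When the line has no backslashes, the whole line is covered by two memchr
calls, which is the case the quoted message describes as processing the
line in one go. Every backslash forces the scan to stop and restart, so
heavily escaped data would see less benefit.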

