I did some performance checks after the recent code commit. The good news is that the parsing speed of COPY is now MUCH faster, which is great. It is about 5 times faster - about 100MB/sec on my machine (previously 20MB/sec at best, usually less).
The better news is that my original patch parsing speed reaches 120MB/sec, about 20MB/sec faster than the code that's now in CVS. This can be significant for the long scheme of things and for large data sets. Maybe we can improve the current code a bit more to reach this number. I performed those measurement by executing *only the parsing logic* of the COPY pipeline. All data conversion (functioncall3(string...)) and tuple handling (form_heaptuple etc...) and insertion were manually disabled. So the only code measured is reading from disk and parsing to the attribute level. Cheers, Alon. On 8/7/05 1:21 AM, "Luke Lonergan" <[EMAIL PROTECTED]> wrote: > Tom, > > On 8/6/05 9:08 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote: > >> "Luke Lonergan" <[EMAIL PROTECTED]> writes: >>>> I had some difficulty in generating test cases that weren't largely >>>> I/O-bound, but AFAICT the patch as applied is about the same speed >>>> as what you submitted. >> >>> You achieve the important objective of knocking the parsing stage down a >>> lot, but your parsing code is actually about 20% slower than Alon's. >> >> I would like to see the exact test case you are using to make this >> claim; the tests I did suggested my code is the same speed or faster. > > I showed mine - you show yours :-) Apparently our e-mail crossed. > >> As best I can tell, my version of CopyReadAttributes is significantly >> quicker than Alon's, approximately balancing out the fact that my >> version of CopyReadLine is slower. I did the latter first, and would >> now be tempted to rewrite it in the same style as CopyReadAttributes, >> ie one pass of memory-to-memory copy using pointers rather than buffer >> indexes. > > See previous timings - looks like Alon's parsing is substantially faster. > However, I'd like him to confirm by running with the "shunt" placed at > different stages, in this case between parse and attribute conversion (not > attribute parse). > >> BTW, late today I figured out a way to get fairly reproducible >> non-I/O-bound numbers about COPY FROM: use a trigger that suppresses >> the actual inserts, thus: >> >> create table foo ... >> create function noway() returns trigger as >> 'begin return null; end' language plpgsql; >> create trigger noway before insert on foo >> for each row execute procedure noway(); >> then repeat: >> copy foo from '/tmp/foo.data'; > > Cool! That's a better way than hacking code and inserting shunts. > > Alon will likely hit this tomorrow. > > - Luke > ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq