Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Luke Lonergan
Simon, That part of the code was specifically written to take advantage of processing pipelines in the hardware, not because the actual theoretical algorithm for that approach was itself faster. Yup, good point. Nobody's said what compiler/hardware they have been using, so since both

Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: Nobody's said what compiler/hardware they have been using, so since both Alon and Tom say their character finding logic is faster, it is likely to be down to that? Name your platforms gentlemen, please. I tested on HPPA with gcc 2.95.3 and on a Pentium 4

Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Tom Lane
Luke Lonergan [EMAIL PROTECTED] writes: Yes, I think one thing we've learned is that there are important parts of the code, those that are in the data path (COPY, sort, spill to disk, etc) that are in dire need of optimization. For instance, the fgetc() pattern should be banned everywhere in
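
For readers who have not seen the pattern being criticized, here is a minimal sketch of the contrast, not code from any of the patches in this thread. The first function calls fgetc() once per byte; the second reads a block with fread() and scans it with memchr(). The function names and the 64 kB CHUNK_SIZE are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE 65536        /* hypothetical block size, not from the thread */

/* Per-character version: one stdio call for every input byte. */
static long
count_lines_fgetc(FILE *fp)
{
    long        nlines = 0;
    int         c;

    assert(fp != NULL);
    while ((c = fgetc(fp)) != EOF)
    {
        if (c == '\n')
            nlines++;
    }
    return nlines;
}

/* Buffered version: one fread() per block, then memchr() over the block. */
static long
count_lines_buffered(FILE *fp)
{
    static char buf[CHUNK_SIZE];
    long        nlines = 0;
    size_t      nread;

    assert(fp != NULL);
    while ((nread = fread(buf, 1, sizeof(buf), fp)) > 0)
    {
        const char *p = buf;
        const char *end = buf + nread;

        /* Jump from newline to newline inside the block. */
        while (p < end &&
               (p = memchr(p, '\n', (size_t) (end - p))) != NULL)
        {
            nlines++;
            p++;
        }
    }
    return nlines;
}
```

Both functions return the same count for the same stream; how much faster the buffered one runs depends on the platform and the C library, which is the point argued over in the messages above.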

Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Alvaro Herrera
On Wed, Aug 10, 2005 at 09:16:08AM -0700, Luke Lonergan wrote: On 8/10/05 8:37 AM, Tom Lane [EMAIL PROTECTED] wrote: Luke, I dislike whacking people upside the head, but this discussion seems to presume that raw speed on Intel platforms is the only thing that matters. We have a few

Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Bruce Momjian
Luke Lonergan wrote: Tom, On 8/10/05 8:37 AM, Tom Lane [EMAIL PROTECTED] wrote: Luke, I dislike whacking people upside the head, but this discussion seems to presume that raw speed on Intel platforms is the only thing that matters. We have a few other concerns. Portability,

Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Joshua D. Drake
Also, as we proved the last time the correctness argument was thrown in, we can fix the bugs and still make it a lot faster - and I would stick to that whether it's a PA-RISC, DEC Alpha, Intel or AMD or even Ultra Sparc. Luke, this comment doesn't work. Do you have a test case that shows that

Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Bruce Momjian
Alvaro Herrera wrote: Another question that comes to mind is: have you tried another compiler? I see you are all using GCC at most 3.4; maybe the new optimizing infrastructure in GCC 4.1 means you can have most of the speedup without uglifying the code. What about Intel's compiler?

Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Luke Lonergan
Alvaro, On 8/10/05 9:46 AM, Alvaro Herrera [EMAIL PROTECTED] wrote: AFAIR he never claimed otherwise ... his point was that to gain that additional speedup, the code has to be made considerably worse (in maintainability terms.) Have you (or Alon) tried to port the rest of the speed

Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Alvaro Herrera
On Wed, Aug 10, 2005 at 12:57:18PM -0400, Bruce Momjian wrote: Alvaro Herrera wrote: Another question that comes to mind is: have you tried another compiler? I see you are all using GCC at most 3.4; maybe the new optimizing infrastructure in GCC 4.1 means you can have most of the speedup

Re: [PATCHES] COPY FROM performance improvements

2005-08-09 Thread Alon Goldshuv
I did some performance checks after the recent code commit. The good news is that the parsing speed of COPY is now MUCH faster, which is great. It is about 5 times faster - about 100MB/sec on my machine (previously 20MB/sec at best, usually less). The better news is that my original patch

Re: [PATCHES] COPY FROM performance improvements

2005-08-09 Thread Andrew Dunstan
Alon Goldshuv wrote: I performed those measurements by executing *only the parsing logic* of the COPY pipeline. All data conversion (functioncall3(string...)) and tuple handling (form_heaptuple etc...) and insertion were manually disabled. So the only code measured is reading from disk and

Re: [PATCHES] COPY FROM performance improvements

2005-08-06 Thread Tom Lane
Alon Goldshuv [EMAIL PROTECTED] writes: New patch attached. It includes very minor changes. These are changes that were committed to CVS 3 weeks ago (copy.c 1.247) which I missed in the previous patch. I've applied this with (rather extensive) revisions. I didn't like what you had done with

Re: [PATCHES] COPY FROM performance improvements

2005-08-06 Thread Luke Lonergan
Tom, Thanks for finding the bugs and reworking things. I had some difficulty in generating test cases that weren't largely I/O-bound, but AFAICT the patch as applied is about the same speed as what you submitted. You achieve the important objective of knocking the parsing stage down a lot,

Re: [PATCHES] COPY FROM performance improvements

2005-08-06 Thread Luke Lonergan
Tom, The previous timings were for a table with 15 columns of mixed type. We also test with 1 column to make the parsing overhead more apparent. In the case of 1 text column with 145MB of input data: Your patch: Time: 6612.599 ms Alon's patch: Time: 6119.244 ms Alon's patch is 7.5%
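
The arithmetic behind a figure like this can be checked with a small helper. This is only a sketch: the function name percent_time_saved is hypothetical, and it assumes the percentage is the reduction in elapsed time measured against the slower run.

```c
#include <assert.h>

/*
 * Hypothetical helper: percentage of elapsed time saved by the improved
 * run, relative to the baseline run. Both arguments are in milliseconds.
 */
static double
percent_time_saved(double baseline_ms, double improved_ms)
{
    assert(baseline_ms > 0.0);
    return 100.0 * (baseline_ms - improved_ms) / baseline_ms;
}
```

With the two timings quoted in this message, percent_time_saved(6612.599, 6119.244) comes to about 7.46, which rounds to the 7.5% given.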

Re: [PATCHES] COPY FROM performance improvements

2005-08-06 Thread Luke Lonergan
Tom, My direct e-mails to you are apparently blocked, so I'll send this to the list. I've attached the case we use for load performance testing, with the data generator modified to produce a single row version of the dataset. I do believe that you/we will need to invert the processing loop to

Re: [PATCHES] COPY FROM performance improvements

2005-08-06 Thread Tom Lane
Luke Lonergan [EMAIL PROTECTED] writes: I had some difficulty in generating test cases that weren't largely I/O-bound, but AFAICT the patch as applied is about the same speed as what you submitted. You achieve the important objective of knocking the parsing stage down a lot, but your parsing

Re: [PATCHES] COPY FROM performance improvements

2005-08-06 Thread Luke Lonergan
Tom, On 8/6/05 9:08 PM, Tom Lane [EMAIL PROTECTED] wrote: Luke Lonergan [EMAIL PROTECTED] writes: I had some difficulty in generating test cases that weren't largely I/O-bound, but AFAICT the patch as applied is about the same speed as what you submitted. You achieve the important

Re: [PERFORM] [PATCHES] COPY FROM performance improvements

2005-08-02 Thread Alon Goldshuv
Tom, Thanks for pointing it out. I made the small required modifications to match copy.c version 1.247 and sent it to -patches list. New patch is V16. Alon. On 8/1/05 7:51 PM, Tom Lane [EMAIL PROTECTED] wrote: Alon Goldshuv [EMAIL PROTECTED] writes: This patch appears to reverse out the

Re: [PERFORM] [PATCHES] COPY FROM performance improvements

2005-08-01 Thread Tom Lane
Alon Goldshuv [EMAIL PROTECTED] writes: This patch appears to reverse out the most recent committed changes in copy.c. Which changes do you refer to? I thought I accommodated all the recent changes (I recall some changes to the tupletable/tupleslot interface, HEADER in cvs, and hex escapes

Re: [PATCHES] COPY FROM performance improvements

2005-07-22 Thread Joshua D. Drake
Luke Lonergan wrote: Joshua, On 7/21/05 7:53 PM, Joshua D. Drake [EMAIL PROTECTED] wrote: Well I know that isn't true at least not with ANY of the Dells my customers have purchased in the last 18 months. They are still really, really slow. That's too bad, can you cite some model numbers?

Re: [PATCHES] COPY FROM performance improvements

2005-07-22 Thread Patrick Welche
On Thu, Jul 21, 2005 at 09:19:04PM -0700, Luke Lonergan wrote: Joshua, On 7/21/05 7:53 PM, Joshua D. Drake [EMAIL PROTECTED] wrote: Well I know that isn't true at least not with ANY of the Dells my customers have purchased in the last 18 months. They are still really, really slow.

Re: [PATCHES] COPY FROM performance improvements

2005-07-21 Thread Mark Wong
I just ran through a few tests with the v14 patch against 100GB of data from dbt3 and found a 30% improvement; 3.6 hours vs 5.3 hours. Just to give a few details, I only loaded data and started a COPY in parallel for each of the data files:

Re: [PATCHES] COPY FROM performance improvements

2005-07-21 Thread Luke Lonergan
Cool! At what rate does your disk setup write sequential data, e.g.: time dd if=/dev/zero of=bigfile bs=8k count=50 (sized for 2x RAM on a system with 2GB) BTW - the Compaq smartarray controllers are pretty broken on Linux from a performance standpoint in our experience. We've had

Re: [PATCHES] COPY FROM performance improvements

2005-07-21 Thread Joshua D. Drake
Luke Lonergan wrote: Cool! At what rate does your disk setup write sequential data, e.g.: time dd if=/dev/zero of=bigfile bs=8k count=50 (sized for 2x RAM on a system with 2GB) BTW - the Compaq smartarray controllers are pretty broken on Linux from a performance standpoint in our

Re: [PATCHES] COPY FROM performance improvements

2005-07-21 Thread Luke Lonergan
Joshua, On 7/21/05 5:08 PM, Joshua D. Drake [EMAIL PROTECTED] wrote: O.k. this strikes me as interesting, now we know that Compaq and Dell are borked for Linux. Is there a name brand server (read Enterprise) that actually does provide reasonable performance? I think late model Dell (post the

Re: [PATCHES] COPY FROM performance improvements

2005-07-21 Thread Luke Lonergan
Joshua, On 7/21/05 7:53 PM, Joshua D. Drake [EMAIL PROTECTED] wrote: Well I know that isn't true at least not with ANY of the Dells my customers have purchased in the last 18 months. They are still really, really slow. That's too bad, can you cite some model numbers? SCSI? I have great

Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Andrew Dunstan
Alon Goldshuv wrote: I revisited my patch and removed the code duplications that were there, and added support for CSV with buffered input, so CSV now runs faster too (although it is not as optimized as the TEXT format parsing). So now TEXT,CSV and BINARY are all parsed in CopyFrom(), like in

Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Mark Wong
On Thu, 14 Jul 2005 17:22:18 -0700 Alon Goldshuv [EMAIL PROTECTED] wrote: I revisited my patch and removed the code duplications that were there, and added support for CSV with buffered input, so CSV now runs faster too (although it is not as optimized as the TEXT format parsing). So now

Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Alon Goldshuv
Hi Mark, I improved the data *parsing* capabilities of COPY, and didn't touch the data conversion or data insertion parts of the code. The parsing improvement will vary largely depending on the ratio of parsing -to- converting and inserting. Therefore, the speed increase really depends on the
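
The dependence on that ratio can be sketched with a simple serial cost model. This is an assumption made for illustration, not something measured in this thread: if parsing accounts for a fraction f of the total load time and is made k times faster, the whole load speeds up by 1 / ((1 - f) + f / k). The function name overall_speedup is hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical helper: overall speedup of a load when only the parsing
 * stage gets faster.
 *   parse_fraction: share of total time spent parsing, in [0, 1]
 *   parse_speedup:  factor by which parsing alone is sped up (> 0)
 */
static double
overall_speedup(double parse_fraction, double parse_speedup)
{
    assert(parse_fraction >= 0.0 && parse_fraction <= 1.0);
    assert(parse_speedup > 0.0);
    return 1.0 / ((1.0 - parse_fraction) + parse_fraction / parse_speedup);
}
```

Under this model, a load that spends half its time parsing gains a factor of about 1.67 from a fivefold parsing speedup, while a load that spends a tenth of its time parsing gains less than 10%.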

Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Mark Wong
Hi Alon, Yeah, that helps. I just need to break up my scripts a little to just load the data and not build indexes. Is the following information good enough to give a guess about the data I'm loading, if you don't mind? ;) Here's a link to my script to create tables:

Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Alon Goldshuv
Mark, Thanks for the info. Yes, isolating indexes out of the picture is a good idea for this purpose. I can't really give a guess to how fast the load rate should be. I don't know how your system is configured, and all the hardware characteristics (and even if I knew that info I may not be able

Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Andrew Dunstan
Mark, You should definitely not be doing this sort of thing, I believe: CREATE TABLE orders ( o_orderkey INTEGER, o_custkey INTEGER, o_orderstatus CHAR(1), o_totalprice REAL, o_orderDATE DATE, o_orderpriority CHAR(15), o_clerk CHAR(15),

Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Mark Wong
Whoopsies, yeah good point about the PRIMARY KEY. I'll fix that. Mark On Tue, 19 Jul 2005 18:17:52 -0400 Andrew Dunstan [EMAIL PROTECTED] wrote: Mark, You should definitely not be doing this sort of thing, I believe: CREATE TABLE orders ( o_orderkey INTEGER, o_custkey

Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Luke Lonergan
Good points on all. Another element in the performance expectations is the ratio of CPU speed to I/O subsystem speed, as Alon had hinted earlier. This patch substantially (500%) improves the efficiency of parsing in the COPY path, which, on a 3GHz P4 desktop with a commodity disk drive represents

Re: [PATCHES] COPY FROM performance improvements

2005-06-28 Thread Andrew Dunstan
Luke, Alon OK, I'm going to apply the patch to my copy and try to get my head around it. meanwhile: . we should not be describing things as old or new. The person reading the code might have no knowledge of the history, and should not need to. . we should not have slow and fast either. We

Re: [PATCHES] COPY FROM performance improvements

2005-06-28 Thread Bruce Momjian
Luke Lonergan wrote: Patch to update pgindent with new symbols and fix a bug in an awk section (extra \\ in front of a ')'). Yea, that '\' wasn't needed. I applied the following patch to use // instead of for patterns, and removed the unneeded backslash. I will update the typedefs in a

Re: [PATCHES] COPY FROM performance improvements

2005-06-27 Thread Andrew Dunstan
Luke Lonergan wrote: Yah - I think I fixed several mis-indented comments. I'm using vim with tabstop=4. I personally don't like tabs in text and would prefer them expanded using spaces, but that's a nice way to make small formatting changes look huge in a cvs diff. You might like to

Re: [PATCHES] COPY FROM performance improvements

2005-06-26 Thread Bruce Momjian
Please change 'if(' to 'if (', and remove parentheses like this: for(start = s; (*s != c) && (s < (start + len)) ; s++) My only other comment is, Yow, that is a massive patch. --- Luke Lonergan wrote: Tom, Is it
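
For readers unfamiliar with the loop under review: it scans at most len bytes starting at s, looking for the byte c. Below is a sketch in the requested style (a space after for, no redundant parentheses), assuming the operators lost from the quoted line in this archive were && and <. The function name find_char_bounded is hypothetical, and the bounds test is placed first here so that *s is never read past the end, which differs from the order in the quoted line.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return a pointer to the first occurrence of c in
 * the len bytes starting at s, or NULL if c does not occur in that range.
 */
static const char *
find_char_bounded(const char *s, char c, size_t len)
{
    const char *start;

    assert(s != NULL);
    for (start = s; s < start + len && *s != c; s++)
        ;                       /* the loop header does all the work */
    return (s < start + len) ? s : NULL;
}
```

The standard memchr() function performs the same bounded search and is the usual choice when no extra per-byte work is needed.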

Re: [PATCHES] COPY FROM performance improvements

2005-06-26 Thread Bruce Momjian
Luke Lonergan wrote: Attached has spaces between if,for, and foreach and (, e.g., if( is now if (. It definitely looks better to me :-) Massive patch - agreed. Less bloated than it was yesterday though. Good, thanks. What about the Protocol version 2? Looks like it could be added back

Re: [PATCHES] COPY FROM performance improvements

2005-06-26 Thread Bruce Momjian
Luke Lonergan wrote: Bruce, Well, there has been no discussion about removing version 2 support, so it seems it is required. This should do it - see attached. Those parentheses are still there: for (start = s; (*s != c) && (s < (start + len)) ; s++) It should be: for (start