Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Luke Lonergan
Tom, On 8/10/05 8:37 AM, "Tom Lane" <[EMAIL PROTECTED]> wrote: > Luke, I dislike whacking people upside the head, but this discussion > seems to presume that raw speed on Intel platforms is the only thing > that matters. We have a few other concerns. Portability, readability, > maintainability,

Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Alvaro Herrera
On Wed, Aug 10, 2005 at 12:57:18PM -0400, Bruce Momjian wrote: > Alvaro Herrera wrote: > > Another question that comes to mind is: have you tried another compiler? > > I see you are all using GCC at most 3.4; maybe the new optimizing > > infrastructure in GCC 4.1 means you can have most of the spee

Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Luke Lonergan
Alvaro, On 8/10/05 9:46 AM, "Alvaro Herrera" <[EMAIL PROTECTED]> wrote: > AFAIR he never claimed otherwise ... his point was that to gain that > additional speedup, the code has to be made considerable "worse" (in > maintenability terms.) Have you (or Alon) tried to port the rest of the > speed

Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Bruce Momjian
Alvaro Herrera wrote: > Another question that comes to mind is: have you tried another compiler? > I see you are all using GCC at most 3.4; maybe the new optimizing > infrastructure in GCC 4.1 means you can have most of the speedup without > uglifying the code. What about Intel's compiler? Enterp

Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Joshua D. Drake
Also, as we proved the last time the correctness argument was thrown in, we can fix the bugs and still make it a lot faster - and I would stick to that whether it's a PA-RISC, DEC Alpha, Intel or AMD or event Ultra Sparc. Luke this comment doesn't work. Do you have a test case that shows that o

Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Bruce Momjian
Luke Lonergan wrote: > Tom, > > On 8/10/05 8:37 AM, "Tom Lane" <[EMAIL PROTECTED]> wrote: > > > Luke, I dislike whacking people upside the head, but this discussion > > seems to presume that raw speed on Intel platforms is the only thing > > that matters. We have a few other concerns. Portabili

Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Alvaro Herrera
On Wed, Aug 10, 2005 at 09:16:08AM -0700, Luke Lonergan wrote: > On 8/10/05 8:37 AM, "Tom Lane" <[EMAIL PROTECTED]> wrote: > > > Luke, I dislike whacking people upside the head, but this discussion > > seems to presume that raw speed on Intel platforms is the only thing > > that matters. We have

Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Tom Lane
"Luke Lonergan" <[EMAIL PROTECTED]> writes: > Yes, I think one thing we've learned is that there are important parts > of the code, those that are in the data path (COPY, sort, spill to > disk, etc) that are in dire need of optimization. For instance, the > fgetc() pattern should be banned everywh

Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes: > Nobody's said what compiler/hardware they have been using, so since both > Alon and Tom say their character finding logic is faster, it is likely > to be down to that? Name your platforms gentlemen, please. I tested on HPPA with gcc 2.95.3 and on a Pentium

Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Luke Lonergan
Simon, > That part of the code was specifically written to take advantage of > processing pipelines in the hardware, not because the actual theoretical > algorithm for that approach was itself faster. Yup, good point. > Nobody's said what compiler/hardware they have been using, so since both >

Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Simon Riggs
On Tue, 2005-08-09 at 21:48 -0700, Luke Lonergan wrote: > The key thing that is missing is the lack of micro-parallelism in the > character processing in this version. By "inverting the loop", or putting > the characters into a buffer on the outside, then doing fast character > scanning inside wi

Re: [PATCHES] COPY FROM performance improvements

2005-08-09 Thread Luke Lonergan
Tom, > As best I can tell, my version of CopyReadAttributes is significantly > quicker than Alon's, approximately balancing out the fact that my > version of CopyReadLine is slower. I did the latter first, and would > now be tempted to rewrite it in the same style as CopyReadAttributes, > ie one

Re: [PATCHES] COPY FROM performance improvements

2005-08-09 Thread Luke Lonergan
Andrew, > Arguably this might exaggerate the effect quite significantly. Users > will want to know the real time effect on a complete COPY. Depending on > how much the pasing is in the total time your 20% improvement in parsing > might only be a small fraction of 20% improvement in COPY. Arguabl

Re: [PATCHES] COPY FROM performance improvements

2005-08-09 Thread Andrew Dunstan
Alon Goldshuv wrote: I performed those measurement by executing *only the parsing logic* of the COPY pipeline. All data conversion (functioncall3(string...)) and tuple handling (form_heaptuple etc...) and insertion were manually disabled. So the only code measured is reading from disk and pars

Re: [PATCHES] COPY FROM performance improvements

2005-08-09 Thread Alon Goldshuv
I did some performance checks after the recent code commit. The good news is that the parsing speed of COPY is now MUCH faster, which is great. It is about 5 times faster - about 100MB/sec on my machine (previously 20MB/sec at best, usually less). The better news is that my original patch parsing

Re: [PATCHES] COPY FROM performance improvements

2005-08-06 Thread Luke Lonergan
Tom, On 8/6/05 9:08 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote: > "Luke Lonergan" <[EMAIL PROTECTED]> writes: >>> I had some difficulty in generating test cases that weren't largely >>> I/O-bound, but AFAICT the patch as applied is about the same speed >>> as what you submitted. > >> You achieve t

Re: [PATCHES] COPY FROM performance improvements

2005-08-06 Thread Tom Lane
"Luke Lonergan" <[EMAIL PROTECTED]> writes: >> I had some difficulty in generating test cases that weren't largely >> I/O-bound, but AFAICT the patch as applied is about the same speed >> as what you submitted. > You achieve the important objective of knocking the parsing stage down a > lot, but y

Re: [PATCHES] COPY FROM performance improvements

2005-08-06 Thread Luke Lonergan
Tom, My direct e-mails to you are apparently blocked, so I'll send this to the list. I've attached the case we use for load performance testing, with the data generator modified to produce a single row version of the dataset. I do believe that you/we will need to invert the processing loop to ge

Re: [PATCHES] COPY FROM performance improvements

2005-08-06 Thread Luke Lonergan
Tom, The previous timings were for a table with 15 columns of mixed type. We also test with 1 column to make the parsing overhead more apparent. In the case of 1 text column with 145MB of input data: Your patch: Time: 6612.599 ms Alon's patch: Time: 6119.244 ms Alon's patch is 7.5% faste

Re: [PATCHES] COPY FROM performance improvements

2005-08-06 Thread Luke Lonergan
Tom, Thanks for finding the bugs and reworking things. > I had some difficulty in generating test cases that weren't largely > I/O-bound, but AFAICT the patch as applied is about the same speed > as what you submitted. You achieve the important objective of knocking the parsing stage down a lot,

Re: [PATCHES] COPY FROM performance improvements

2005-08-06 Thread Tom Lane
"Alon Goldshuv" <[EMAIL PROTECTED]> writes: > New patch attached. It includes very minor changes. These are changes that > were committed to CVS 3 weeks ago (copy.c 1.247) which I missed in the > previous patch. I've applied this with (rather extensive) revisions. I didn't like what you had done

Re: [PATCHES] COPY FROM performance improvements

2005-07-22 Thread Patrick Welche
On Thu, Jul 21, 2005 at 09:19:04PM -0700, Luke Lonergan wrote: > Joshua, > > On 7/21/05 7:53 PM, "Joshua D. Drake" <[EMAIL PROTECTED]> wrote: > > Well I know that isn't true at least not with ANY of the Dells my > > customers have purchased in the last 18 months. They are still really, > > really

Re: [PATCHES] COPY FROM performance improvements

2005-07-22 Thread Joshua D. Drake
Luke Lonergan wrote: Joshua, On 7/21/05 7:53 PM, "Joshua D. Drake" <[EMAIL PROTECTED]> wrote: Well I know that isn't true at least not with ANY of the Dells my customers have purchased in the last 18 months. They are still really, really slow. That's too bad, can you cite some model numbers

Re: [PATCHES] COPY FROM performance improvements

2005-07-21 Thread Andrew Dunstan
this discussion belongs on -performance cheers andrew Luke Lonergan said: > Joshua, > > On 7/21/05 7:53 PM, "Joshua D. Drake" <[EMAIL PROTECTED]> wrote: >> Well I know that isn't true at least not with ANY of the Dells my >> customers have purchased in the last 18 months. They are still really

Re: [PATCHES] COPY FROM performance improvements

2005-07-21 Thread Luke Lonergan
Joshua, On 7/21/05 7:53 PM, "Joshua D. Drake" <[EMAIL PROTECTED]> wrote: > Well I know that isn't true at least not with ANY of the Dells my > customers have purchased in the last 18 months. They are still really, > really slow. That's too bad, can you cite some model numbers? SCSI? > I have gr

Re: [PATCHES] COPY FROM performance improvements

2005-07-21 Thread Joshua D. Drake
I think late model Dell (post the bad chipset problem, circa 2001-2?) and IBM and Sun servers are fine because they all use simple SCSI adapters from LSI or Adaptec. Well I know that isn't true at least not with ANY of the Dells my customers have purchased in the last 18 months. They are stil

Re: [PATCHES] COPY FROM performance improvements

2005-07-21 Thread Luke Lonergan
Joshua, On 7/21/05 5:08 PM, "Joshua D. Drake" <[EMAIL PROTECTED]> wrote: > O.k. this strikes me as interesting, now we know that Compaq and Dell > are borked for Linux. Is there a name brand server (read Enterprise) > that actually does provide reasonable performance? I think late model Dell (po

Re: [PATCHES] COPY FROM performance improvements

2005-07-21 Thread Joshua D. Drake
Luke Lonergan wrote: Cool! At what rate does your disk setup write sequential data, e.g.: time dd if=/dev/zero of=bigfile bs=8k count=50 (sized for 2x RAM on a system with 2GB) BTW - the Compaq smartarray controllers are pretty broken on Linux from a performance standpoint in our experie

Re: [PATCHES] COPY FROM performance improvements

2005-07-21 Thread Luke Lonergan
Cool! At what rate does your disk setup write sequential data, e.g.: time dd if=/dev/zero of=bigfile bs=8k count=50 (sized for 2x RAM on a system with 2GB) BTW - the Compaq smartarray controllers are pretty broken on Linux from a performance standpoint in our experience. We've had disastr

Re: [PATCHES] COPY FROM performance improvements

2005-07-21 Thread Mark Wong
I just ran through a few tests with the v14 patch against 100GB of data from dbt3 and found a 30% improvement; 3.6 hours vs 5.3 hours. Just to give a few details, I only loaded data and started a COPY in parallel for each the data files: http://www.testing.osdl.org/projects/dbt3testing/res

Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Luke Lonergan
Good points on all, another element in the performance expectations is the ratio of CPU speed to I/O subsystem speed, as Alon had hinted earlier. This patch substantially (500%) improves the efficiency of parsing in the COPY path, which, on a 3GHz P4 desktop with a commodity disk drive represents

Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Mark Wong
Whoopsies, yeah good point about the PRIMARY KEY. I'll fix that. Mark On Tue, 19 Jul 2005 18:17:52 -0400 Andrew Dunstan <[EMAIL PROTECTED]> wrote: > Mark, > > You should definitely not be doing this sort of thing, I believe: > > CREATE TABLE orders ( > o_orderkey INTEGER, > o_cust

Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Andrew Dunstan
Mark, You should definitely not be doing this sort of thing, I believe: CREATE TABLE orders ( o_orderkey INTEGER, o_custkey INTEGER, o_orderstatus CHAR(1), o_totalprice REAL, o_orderDATE DATE, o_orderpriority CHAR(15), o_clerk CHAR(15),

Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Alon Goldshuv
Mark, Thanks for the info. Yes, isolating indexes out of the picture is a good idea for this purpose. I can't really give a guess to how fast the load rate should be. I don't know how your system is configured, and all the hardware characteristics (and even if I knew that info I may not be able

Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Mark Wong
Hi Alon, Yeah, that helps. I just need to break up my scripts a little to just load the data and not build indexes. Is the following information good enough to give a guess about the data I'm loading, if you don't mind? ;) Here's a link to my script to create tables: http://developer.osdl.org/m

Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Alon Goldshuv
Hi Mark, I improved the data *parsing* capabilities of COPY, and didn't touch the data conversion or data insertion parts of the code. The parsing improvement will vary largely depending on the ratio of parsing -to- converting and inserting. Therefore, the speed increase really depends on the na

Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Mark Wong
On Thu, 14 Jul 2005 17:22:18 -0700 "Alon Goldshuv" <[EMAIL PROTECTED]> wrote: > I revisited my patch and removed the code duplications that were there, and > added support for CSV with buffered input, so CSV now runs faster too > (although it is not as optimized as the TEXT format parsing). So now

Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Andrew Dunstan
Alon Goldshuv wrote: I revisited my patch and removed the code duplications that were there, and added support for CSV with buffered input, so CSV now runs faster too (although it is not as optimized as the TEXT format parsing). So now TEXT,CSV and BINARY are all parsed in CopyFrom(), like in

Re: [PATCHES] COPY FROM performance improvements

2005-06-28 Thread Bruce Momjian
Luke Lonergan wrote: > Patch to update pgindent with new symbols and fix a bug in an awk section > (extra \\ in front of a ')'). Yea, that '\' wasn't needed. I applied the following patch to use // instead of "" for patterns, and removed the unneeded backslash. I will update the typedefs in a se

Re: [PATCHES] COPY FROM performance improvements

2005-06-28 Thread Andrew Dunstan
Luke, Alon OK, I'm going to apply the patch to my copy and try to get my head around it. meanwhile: . we should not be describing things as "old" or "new". The person reading the code might have no knowledge of the history, and should not need to. . we should not have "slow" and "fast" eith

Re: [PATCHES] COPY FROM performance improvements

2005-06-27 Thread Andrew Dunstan
Luke Lonergan wrote: Andrew, You might like to look at running pgindent (see src/tools/pgindent) over the file before cutting a patch. Since this is usually run over each file just before a release, the only badness should be things from recent patches. I've attached two patches, o

Re: [PATCHES] COPY FROM performance improvements

2005-06-27 Thread Andrew Dunstan
Luke Lonergan wrote: Yah - I think I fixed several mis-indented comments. I'm using vim with tabstop=4. I personally don't like tabs in text and would prefer them expanded using spaces, but that's a nice way to make small formatting changes look huge in a cvs diff. You might like to lo

Re: [PATCHES] COPY FROM performance improvements

2005-06-26 Thread Alvaro Herrera
On Sun, Jun 26, 2005 at 09:17:14PM -0700, Luke Lonergan wrote: > Bruce, > > > > Those parentheses are still there: > > > > for (start = s; (*s != c) && (s < (start + len)) ; s++) > > > > It should be: > > > > for (start = s; *s != c && s < start + len; s++) > > Thanks - didn't unde

Re: [PATCHES] COPY FROM performance improvements

2005-06-26 Thread Bruce Momjian
Luke Lonergan wrote: > Bruce, > > > Well, there has been no discussion about removing version 2 support, so > > it seems it is required. > > This should do it - see attached. Those parentheses are still there: for (start = s; (*s != c) && (s < (start + len)) ; s++) It should be: f

Re: [PATCHES] COPY FROM performance improvements

2005-06-26 Thread Bruce Momjian
Luke Lonergan wrote: > Attached has spaces between if,for, and foreach and "(", e.g., "if(" is now > "if (". It definitely looks better to me :-) > > Massive patch - agreed. Less bloated than it was yesterday though. Good, thanks. > What about the Protocol version 2? Looks like it could be ad

Re: [PATCHES] COPY FROM performance improvements

2005-06-26 Thread Bruce Momjian
Please change 'if(' to 'if (', and remove parenthese like this: for(start = s; (*s != c) && (s < (start + len)) ; s++) My only other comment is, "Yow, that is a massive patch". --- Luke Lonergan wrote: > Tom, > > >

Re: [PATCHES] COPY FROM performance improvements

2005-06-25 Thread Luke Lonergan
> If the StringInfo API were extended to use lengths... Er, like appendBinaryStringInfo() perhaps? ;-) Looks like what we need is there, it only lacks the offset adjustment of our bytebuffer. We should redo the copy.c with appendBinaryStringInfo() I think. - Luke ---

Re: [PATCHES] COPY FROM performance improvements

2005-06-25 Thread Luke Lonergan
Alon, > Hmm, now that I look back at them I can't remember why I thought it is > slower. Certainly using appendStringInfoCharMacro for every char is very > slow, but I could probably use appendStringInfoString and it should be as > fast as using the bytebuffer, they both do a straight forward mem

Re: [PATCHES] COPY FROM performance improvements

2005-06-25 Thread Alon Goldshuv
Hmm, now that I look back at them I can't remember why I thought it is slower. Certainly using appendStringInfoCharMacro for every char is very slow, but I could probably use appendStringInfoString and it should be as fast as using the bytebuffer, they both do a straight forward memcpy. Alon. On

Re: [PATCHES] COPY FROM performance improvements

2005-06-25 Thread Tom Lane
"Alon Goldshuv" <[EMAIL PROTECTED]> writes: > A struct "bytebuffer" is used instead of a StringInfoData for storing the > line and attributes. A StringInfoData is actually really cool and useful, > but we don't really need it's formatting capabilities in COPY FROM (as far > as I know), and so the b

Re: [PATCHES] COPY FROM performance improvements

2005-06-25 Thread Tom Lane
"Michael Paesold" <[EMAIL PROTECTED]> writes: > What about this one: > case COPY_OLD_FE: > +ereport(ERROR, > + (errcode(ERRCODE_CONNECTION_FAILURE), > + errmsg("Old FE protocal not yet supported in fast COPY processing > (CopyGetData())"))); > This can't stay like it is, can it

Re: [PATCHES] COPY FROM performance improvements

2005-06-25 Thread Alon Goldshuv
Just remembered another thing. A struct "bytebuffer" is used instead of a StringInfoData for storing the line and attributes. A StringInfoData is actually really cool and useful, but we don't really need it's formatting capabilities in COPY FROM (as far as I know), and so the bytebuffer is more st

Re: [PATCHES] COPY FROM performance improvements

2005-06-25 Thread Luke Lonergan
Alon, > If I remember more relevant things I'll post them here. > > Hope that helps, please ask more if there are unclear things! Good detail, very useful! - Luke ---(end of broadcast)--- TIP 7: don't forget to increase your free space map sett

Re: [PATCHES] COPY FROM performance improvements

2005-06-25 Thread Alon Goldshuv
> I haven't seen any demand for users to be allowed to specify an escape char > in text mode (BTW, that's what the docs call what you have called delimited > mode). Maybe there's a case for it, but I somewhat doubt it. I ran into users that could not load their data using COPY because their data

Re: [PATCHES] COPY FROM performance improvements

2005-06-25 Thread Luke Lonergan
Andrew, > What I would like to have is a high level description of > . how the new text mode code differs from the old text mode code, and > . which part of the change is responsible for how much performance gain. > > Maybe I have missed that in previous discussion, but this change is > sufficien

Re: [PATCHES] COPY FROM performance improvements

2005-06-25 Thread Alon Goldshuv
> What about this one: > case COPY_OLD_FE: > +ereport(ERROR, > + (errcode(ERRCODE_CONNECTION_FAILURE), > + errmsg("Old FE protocal not yet supported in fast COPY processing > (CopyGetData())"))); > + > > This can't stay like it is, can it? > Who uses the old FE protocol for CO

Re: [PATCHES] COPY FROM performance improvements

2005-06-25 Thread Michael Paesold
Luke Lonergan wrote: I've attached Alon's patch ported to the CVS trunk. It applies cleanly and passes the regressions. With fsync=false it is 40% faster loading a sample dataset with 15 columns of varied type. It's 19% faster with fsync=true. This patch separates the CopyFrom code into two

Re: [PATCHES] COPY FROM performance improvements

2005-06-25 Thread Andrew Dunstan
Alon Goldshuv said: > Andrew, > >> 4. We should indeed do this for CSV, especially since a lot of the >> relevant logic for detecting attribute starts is already there for CSV >> in CopyReadLine. I'm prepared to help you do that if necessary, since >> I'm guilty of perpetrating that code. > > That

Re: [PATCHES] COPY FROM performance improvements

2005-06-25 Thread Andrew Dunstan
Luke Lonergan said: >> 4. We should indeed do this for CSV, especially since a lot of the >> relevant logic for detecting attribute starts is already there for CSV >> in CopyReadLine. I'm prepared to help you do that if necessary, since >> I'm guilty of perpetrating that code. > > That would be aw

Re: [PATCHES] COPY FROM performance improvements

2005-06-25 Thread Alon Goldshuv
Andrew, Thanks for your initial feedback. > 2. This comment raises a flag in my mind: > > + * each attribute begins. If a specific attribute is not used for this > + * COPY command (ommitted from the column list), a value of 0 will be > assigned.+ * For example: for table foo(a,b,c,d,e) and COPY

Re: [PATCHES] COPY FROM performance improvements

2005-06-25 Thread Andrew Dunstan
Luke Lonergan said: > I've attached Alon's patch ported to the CVS trunk. It applies cleanly > and passes the regressions. With fsync=false it is 40% faster loading > a sample dataset with 15 columns of varied type. It's 19% faster with > fsync=true. > > This patch separates the CopyFrom code in

Re: [PATCHES] COPY FROM performance improvements

2005-06-24 Thread Luke Lonergan
We're porting this to 8.1, we should be able to get it done within the next couple of days. The changes needed are to the way that the unmodified CSV and binary code paths are preserved alongside the changed "delimited" text COPY. - Luke ---(end of broadcast)---

Re: [PATCHES] COPY FROM performance improvements

2005-06-23 Thread Luke Lonergan
Note that this patch is against 8.0.2. - Luke ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]