Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Luke Lonergan
Simon,
 
 That part of the code was specifically written to take advantage of
 processing pipelines in the hardware, not because the actual theoretical
 algorithm for that approach was itself faster.

Yup, good point.
 
 Nobody's said what compiler/hardware they have been using, so since both
 Alon and Tom say their character finding logic is faster, it is likely
 to be down to that? Name your platforms gentlemen, please.

In this case, we've been using gcc (3.2.3 RHEL3 Linux, 3.4.3 Solaris 10) on 
Opteron and Intel Xeon and Pentium 4.  Alon's performance comparisons for the 
parsing alone were done on an HT-enabled P4 3.0GHz on RHEL3 with gcc 3.2.3, 
probably with optimization -O2, but possibly -O3.
 
Note that the level of microparallelism on upcoming CPUs is increasing along 
with pipeline depth.  Though there will be a step back on the Intel line with 
the introduction of the Centrino-based Xeon cores in 2006/7, other CPUs 
continue the trend, and I expect the next generation of multi-core CPUs may 
introduce threaded micro-architectures which can also be scheduled as 
pipelines.  The gcc 4 compiler introduces auto-vectorization, which may further 
improve some loops.
 
I think the key thing is to make as much parallelism as possible apparent to 
the compiler, which will generally mean loops.  That means faster code on all 
modern CPUs, and it won't hurt performance on older CPUs.

 My feeling is that we may learn something here that applies more widely
 across many parts of the code.

Yes, I think one thing we've learned is that there are important parts of the 
code, namely those in the data path (COPY, sort, spill to disk, etc.), that are 
in dire need of optimization.  For instance, the fgetc() pattern should be 
banned everywhere in the data path.
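
To illustrate what I mean (this is just a sketch, not actual copy.c code), the
difference is between making one library call per input byte and scanning a
block that has already been read into a local buffer:

#include <stdio.h>

/* Per-character pattern: one fgetc() call and one branch per input byte. */
static long
count_newlines_fgetc(FILE *fp)
{
    long    nlines = 0;
    int     c;

    while ((c = fgetc(fp)) != EOF)
        if (c == '\n')
            nlines++;
    return nlines;
}

/* Block pattern: read 64KB at a time, then scan it in a tight loop that
 * the compiler can pipeline (or auto-vectorize with gcc 4). */
static long
count_newlines_block(FILE *fp)
{
    char    buf[65536];
    long    nlines = 0;
    size_t  nread, i;

    while ((nread = fread(buf, 1, sizeof(buf), fp)) > 0)
        for (i = 0; i < nread; i++)
            if (buf[i] == '\n')
                nlines++;
    return nlines;
}

The real parsing loop obviously also has to handle delimiters, quotes and
escapes, so it is less regular than this, but the principle of scanning a
buffered block in a loop is the same.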
 
BTW - we are tracking down (in our spare time :-() the extremely slow sort 
performance.  We're seeing sort rates of 1.7MB/s on our fastest machines, even 
when work_mem is equal to the square root of the sort set size.  This is a 
*serious* problem for us and we aren't getting to it - ideas are welcome.
 
Optimization here means both the use of good fundamental algorithms and 
micro-optimization (minimizing memory copies, exposing long runs of operations 
to the compiler, maximizing computational intensity by working in 
cache-resident blocks, etc.).
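
To illustrate the cache-resident-blocks point with a sketch (the 64KB block
size is an assumption, and the two passes here just stand in for real work
such as line splitting and attribute splitting):

#include <stddef.h>

#define BLOCK_SIZE 65536        /* assumed, roughly cache-sized */

/* Illustrative only: make both passes over each block while it is still
 * cache-resident, instead of making each pass over the entire input. */
static void
scan_blocked(const char *data, size_t len, long *newlines, long *tabs)
{
    size_t  off, n, i;

    *newlines = *tabs = 0;
    for (off = 0; off < len; off += BLOCK_SIZE)
    {
        n = (len - off < BLOCK_SIZE) ? len - off : BLOCK_SIZE;

        for (i = 0; i < n; i++)         /* first pass over the block */
            if (data[off + i] == '\n')
                (*newlines)++;
        for (i = 0; i < n; i++)         /* second pass, data still hot */
            if (data[off + i] == '\t')
                (*tabs)++;
    }
}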
 
- Luke






Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 Nobody's said what compiler/hardware they have been using, so since both
 Alon and Tom say their character finding logic is faster, it is likely
 to be down to that? Name your platforms gentlemen, please.

I tested on HPPA with gcc 2.95.3 and on a Pentium 4 with gcc 3.4.3.
Got pretty much the same results on both.

regards, tom lane



Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Tom Lane
Luke Lonergan [EMAIL PROTECTED] writes:
 Yes, I think one thing we've learned is that there are important parts
 of the code, those that are in the data path (COPY, sort, spill to
 disk, etc) that are in dire need of optimization.  For instance, the
 fgetc() pattern should be banned everywhere in the data path.

Luke, I dislike whacking people upside the head, but this discussion
seems to presume that raw speed on Intel platforms is the only thing
that matters.  We have a few other concerns.  Portability, readability,
maintainability, and correctness all trump platform-specific
optimizations.  The COPY patch as presented lost badly on all those
counts, and you are lucky that it didn't get rejected completely.

regards, tom lane



Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Alvaro Herrera
On Wed, Aug 10, 2005 at 09:16:08AM -0700, Luke Lonergan wrote:

 On 8/10/05 8:37 AM, Tom Lane [EMAIL PROTECTED] wrote:
 
  Luke, I dislike whacking people upside the head, but this discussion
  seems to presume that raw speed on Intel platforms is the only thing
  that matters.  We have a few other concerns.  Portability, readability,
  maintainability, and correctness all trump platform-specific
  optimizations.  The COPY patch as presented lost badly on all those
  counts, and you are lucky that it didn't get rejected completely.
 
 It's a pleasure working with you too Tom :-)
 
 Until you present a result on a platform that is faster than Alon's in the
 code that was modified, our proof still stands that his is 20% faster than
 yours.

AFAIR he never claimed otherwise ... his point was that to gain that
additional speedup, the code has to be made considerably worse (in
maintainability terms).  Have you (or Alon) tried to port the rest of the
speed improvement to the new code?  Maybe it's possible to have at least
some of it without worsening the maintainability too badly.

Another question that comes to mind is: have you tried another compiler?
I see you are all using GCC at most 3.4; maybe the new optimizing
infrastructure in GCC 4.1 means you can have most of the speedup without
uglifying the code.  What about Intel's compiler?

 PostgreSQL needs major improvement to compete with Oracle and even MySQL on
 speed.  No whacking on the head is going to change that.

Certainly.  I think the point is what cost do we want to pay for the
speedup.  I think we all agree that even if we gain a 200% speedup by
rewriting COPY in assembly, it's simply not acceptable.

Another point may be that Bizgres can have a custom patch for the extra
speedup, without inflicting the maintenance cost on the community.

-- 
Alvaro Herrera (alvherre[a]alvh.no-ip.org)
Freedom is like money; he who does not know how to use it loses it (Alvarez)



Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Bruce Momjian
Luke Lonergan wrote:
 Tom,
 
 On 8/10/05 8:37 AM, Tom Lane [EMAIL PROTECTED] wrote:
 
  Luke, I dislike whacking people upside the head, but this discussion
  seems to presume that raw speed on Intel platforms is the only thing
  that matters.  We have a few other concerns.  Portability, readability,
  maintainability, and correctness all trump platform-specific
  optimizations.  The COPY patch as presented lost badly on all those
  counts, and you are lucky that it didn't get rejected completely.
 
 It's a pleasure working with you too Tom :-)
 
 Until you present a result on a platform that is faster than Alon's in the
 code that was modified, our proof still stands that his is 20% faster than
 yours.

Well, we could write it in assembler and make it even faster. :-)

I assume no one is suggesting that, so in such cases we need to weigh
readability against performance.  I have not looked at the patch issues,
but usually loop unrolling is the opposite of readability, so we have to
make a tradeoff.  We have used macros in places where function call
overhead is a major hit, so we can consider loop unrolling in places
that are a major performance hit.  The macros we have used have
maintained the readability of the function call (unless you look at the
macro contents), so perhaps the optimizations you suggest can be done
with a similar eye to readability.
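
As a purely illustrative sketch (the names here are invented, not taken from
the patch), the macro approach keeps the call site readable while removing
the per-character call overhead:

/* A per-character test written as a function ... */
static int
is_row_delimiter(char c, char delim)
{
    return c == delim || c == '\n';
}

/* ... and the same test as a macro, which reads the same at the call site
 * but avoids a function call inside the per-character loop. */
#define IS_ROW_DELIMITER(c, delim)  ((c) == (delim) || (c) == '\n')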

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Joshua D. Drake

Also, as we proved the last time the correctness argument was thrown in, we
can fix the bugs and still make it a lot faster - and I would stick to that
whether it's a PA-RISC, DEC Alpha, Intel or AMD or even Ultra Sparc.


Luke, this comment doesn't work. Do you have a test case that shows you are 
accurate on an Ultra Sparc or PPC? Both of these CPUs are pretty big players 
in the Enterprise space.



PostgreSQL needs major improvement to compete with Oracle and even MySQL on
speed.  No whacking on the head is going to change that.


I am going to assume that you forgot to clarify this statement with IN 
BULK LOADING, because if you didn't I would like to see your test results.


My very real-life experience shows that MySQL can't keep up with 
PostgreSQL under load.


Nobody here disputes that PostgreSQL needs to improve. If it didn't need
to improve I would be out of business because it would be perfect.


Sincerely,

Joshua D. Drake





- Luke






--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/



Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Bruce Momjian
Alvaro Herrera wrote:
 Another question that comes to mind is: have you tried another compiler?
 I see you are all using GCC at most 3.4; maybe the new optimizing
 infrastructure in GCC 4.1 means you can have most of the speedup without
 uglifying the code.  What about Intel's compiler?

EnterpriseDB mentioned on their web page they use the Intel compiler, so
I assume they saw a speedup by using it:

http://www.enterprisedb.com/edb-db.do

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Luke Lonergan
Alvaro,

On 8/10/05 9:46 AM, Alvaro Herrera [EMAIL PROTECTED] wrote:

 AFAIR he never claimed otherwise ... his point was that to gain that
 additional speedup, the code has to be made considerably worse (in
 maintainability terms).  Have you (or Alon) tried to port the rest of the
 speed improvement to the new code?  Maybe it's possible to have at least
 some of it without worsening the maintainability too badly.

As I suggested previously, there is another, more maintainable way to get
more performance from the parsing logic.

It involves replacing something like this:


char c = input_routine();

if (c == '\n')
{
    ...
}
else if (...)
{
    ...
}


With something like this:


char carr[32];
int  nread, i;

nread = input_routine_new(carr, 32);

for (i = 0; i < nread; i++)
{
    if (carr[i] == '\n')
    {
        ...
    }
}


And this section would run much faster (3x?).

This is what I think could make the overall patch 50% faster than it is now
(on the parsing part).

The issue that I expect we'll hear about is that since the parsing is
already 500% faster, it has vanished in the profile.  That's why Tom's
testing is not showing much difference between his and Alon's code; we
actually had to disable the other sections to bring the parsing forward to
where we see the bigger difference.

However, what I'm arguing here and elsewhere is that there's still a lot
more of this kind of optimization to be done.  12 MB/s COPY speed is not
enough.  There's 40% of the time in processing left to smack down.
  
 Another question that comes to mind is: have you tried another compiler?
 I see you are all using GCC at most 3.4; maybe the new optimizing
 infrastructure in GCC 4.1 means you can have most of the speedup without
 uglifying the code.  What about Intel's compiler?

We have routinely distributed PostgreSQL with the Intel compiler, up until
recently.  Interestingly, GCC now beats it handily in our tests on Opteron
and matches it on Xeon, which is too bad - it's my favorite compiler.

The problem with this code is that the character-parsing core doesn't have
enough micro-parallelism without loops.  The compiler can only do register
optimizations and branch prediction (poorly) unless it is given more to
work with.

 PostgreSQL needs major improvement to compete with Oracle and even MySQL on
 speed.  No whacking on the head is going to change that.
 
 Certainly.  I think the point is what cost do we want to pay for the
 speedup.  I think we all agree that even if we gain a 200% speedup by
 rewriting COPY in assembly, it's simply not acceptable.

Understood, and I totally agree.

 Another point may be that Bizgres can have a custom patch for the extra
 speedup, without inflicting the maintenance cost on the community.

We are committed to making Postgres the best DBMS for Business Intelligence.
Bizgres makes it safe for businesses to rely on open source for their
production uses.  As far as features go, I think the best way for our
customers is to make sure that Bizgres features are supporting the
PostgreSQL core and vice versa.

- Luke 





Re: [PATCHES] COPY FROM performance improvements

2005-08-10 Thread Alvaro Herrera
On Wed, Aug 10, 2005 at 12:57:18PM -0400, Bruce Momjian wrote:
 Alvaro Herrera wrote:
  Another question that comes to mind is: have you tried another compiler?
  I see you are all using GCC at most 3.4; maybe the new optimizing
  infrastructure in GCC 4.1 means you can have most of the speedup without
  uglifying the code.  What about Intel's compiler?
 
 EnterpriseDB mentioned on their web page they use the Intel compiler, so
 I assume they saw a speedup by using it:
 
   http://www.enterprisedb.com/edb-db.do

Yes, but notice these guys are Greenplum, not EDB.

-- 
Alvaro Herrera (alvherre[a]alvh.no-ip.org)
There are two moments in a man's life when he should not speculate:
when he can afford it and when he cannot (Mark Twain)



Re: [PATCHES] COPY FROM performance improvements

2005-08-09 Thread Alon Goldshuv
I did some performance checks after the recent code commit.

The good news is that the parsing speed of COPY is now MUCH faster, which is
great. It is about 5 times faster - about 100MB/sec on my machine
(previously 20MB/sec at best, usually less).

The better news is that my original patch's parsing speed reaches 120MB/sec,
about 20MB/sec faster than the code that's now in CVS. This can be
significant in the long run and for large data sets. Maybe we can improve
the current code a bit more to reach this number.

I performed those measurements by executing *only the parsing logic* of the
COPY pipeline. All data conversion (functioncall3(string...)), tuple
handling (form_heaptuple etc...) and insertion were manually disabled. So
the only code measured is reading from disk and parsing to the attribute
level.

Cheers,
Alon.





On 8/7/05 1:21 AM, Luke Lonergan [EMAIL PROTECTED] wrote:

 Tom,
 
 On 8/6/05 9:08 PM, Tom Lane [EMAIL PROTECTED] wrote:
 
 Luke Lonergan [EMAIL PROTECTED] writes:
 I had some difficulty in generating test cases that weren't largely
 I/O-bound, but AFAICT the patch as applied is about the same speed
 as what you submitted.
 
 You achieve the important objective of knocking the parsing stage down a
 lot, but your parsing code is actually about 20% slower than Alon's.
 
 I would like to see the exact test case you are using to make this
 claim; the tests I did suggested my code is the same speed or faster.
 
 I showed mine - you show yours :-)  Apparently our e-mail crossed.
   
 As best I can tell, my version of CopyReadAttributes is significantly
 quicker than Alon's, approximately balancing out the fact that my
 version of CopyReadLine is slower.  I did the latter first, and would
 now be tempted to rewrite it in the same style as CopyReadAttributes,
 ie one pass of memory-to-memory copy using pointers rather than buffer
 indexes.
 
 See previous timings - looks like Alon's parsing is substantially faster.
 However, I'd like him to confirm by running with the shunt placed at
 different stages, in this case between parse and attribute conversion (not
 attribute parse).
  
 BTW, late today I figured out a way to get fairly reproducible
 non-I/O-bound numbers about COPY FROM: use a trigger that suppresses
 the actual inserts, thus:
 
 create table foo ...
 create function noway() returns trigger as
 'begin return null; end' language plpgsql;
 create trigger noway before insert on foo
   for each row execute procedure noway();
 then repeat:
 copy foo from '/tmp/foo.data';
 
 Cool!  That's a better way than hacking code and inserting shunts.
  
 Alon will likely hit this tomorrow.
 
 - Luke
 





Re: [PATCHES] COPY FROM performance improvements

2005-08-09 Thread Andrew Dunstan



Alon Goldshuv wrote:


I performed those measurement by executing *only the parsing logic* of the
COPY pipeline. All data conversion (functioncall3(string...)) and tuple
handling (form_heaptuple etc...) and insertion were manually disabled. So
the only code measured is reading from disk and parsing to the attribute
level.
 



Arguably this might exaggerate the effect quite significantly.  Users 
will want to know the real effect on the total time of a complete COPY. 
Depending on how much of that total time parsing accounts for, your 20% 
improvement in parsing might translate into only a small fraction of a 20% 
improvement in COPY.


Like you, I'm happy we have seen a 5 times improvement in parsing. Is it 
possible you can factor out something smallish from your patch that 
might make up the balance?


cheers

andrew



Re: [PATCHES] COPY FROM performance improvements

2005-08-06 Thread Tom Lane
Alon Goldshuv [EMAIL PROTECTED] writes:
 New patch attached. It includes very minor changes. These are changes that
 were committed to CVS 3 weeks ago (copy.c 1.247) which I missed in the
 previous patch.

I've applied this with (rather extensive) revisions.  I didn't like what
you had done with the control structure --- loading the input buffer
only at the outermost loop level was a bad design choice IMHO.  You had
sprinkled the code with an unreasonable number of special cases in order
to try to cope with the effects of that mistake, but there were lots
of problems still left.  Some of the bugs I noticed:

* Broke old-protocol COPY, since that has no provision for stopping at
the EOF marker except by parsing the data carefully to start with.  The
backend would just hang up unless the total data size chanced to be a
multiple of 64K.

* Subtle change in interpretation of \. EOF marker (the existing code
will recognize it even when not at start of line).

* Seems to have thrown away detection of newline format discrepancies.

* Fails for zero-column tables.

* Broke display of column values during error context callback (would
always show the first column contents no matter which one is being
complained of).

* DetectLineEnd mistakenly assumes CR mode if very last character of first
bufferload is CR; need to reserve judgment until next char is available.

* DetectLineEnd fails to account for backslashed control characters,
so it will e.g. accept \ followed by \n as determining the newline
style.

* Fails to apply encoding conversion if first line exceeds copy buf
size, because when DetectLineEnd fails the quick-exit path doesn't do
it.

* There seem to be several bugs associated with the fact that input_buf[]
always has 64K of data in it even when EOF has been reached on the
input.  One example:
echo -n 123 >zzz1
psql> create temp table t1(f1 text);
psql> copy t1 from '/home/tgl/zzz1';
psql> select * from t1;
hmm ... where'd that 64K of whitespace come from?

I rewrote the patch in a way that retained most of the speedups without
changing the basic control structure (except for replacing multiple
CopyReadAttribute calls with one CopyReadAttributes call per line).

I had some difficulty in generating test cases that weren't largely
I/O-bound, but AFAICT the patch as applied is about the same speed
as what you submitted.

regards, tom lane



Re: [PATCHES] COPY FROM performance improvements

2005-08-06 Thread Luke Lonergan
Tom,

Thanks for finding the bugs and reworking things.

 I had some difficulty in generating test cases that weren't largely
 I/O-bound, but AFAICT the patch as applied is about the same speed
 as what you submitted.

You achieve the important objective of knocking the parsing stage down a
lot, but your parsing code is actually about 20% slower than Alon's.

Before your patch:
  Time: 14205.606 ms

With your patch:
  Time: 10565.374 ms

With Alon's patch:
  Time: 10289.845 ms

The parsing part of the code in your version is slower, but as a percentage
of the total it's hidden. The loss of 0.3 seconds on 143MB means:

- If parsing takes a total of 0.9 seconds, the parsing rate is 160MB/s
(143/0.9)

- If we add another 0.3 seconds to parsing to bring it to 1.2, then the
parsing rate becomes 120MB/s

When we improve the next stages of the processing (attribute conversion,
write to disk), this will stand out a lot more.  Our objective is to get the
COPY rate *much* faster than the current poky rate of 14MB/s (after this
patch).

- Luke

On 8/6/05 2:04 PM, Tom Lane [EMAIL PROTECTED] wrote:

 Alon Goldshuv [EMAIL PROTECTED] writes:
 New patch attached. It includes very minor changes. These are changes that
 were committed to CVS 3 weeks ago (copy.c 1.247) which I missed in the
 previous patch.
 
 I've applied this with (rather extensive) revisions.  I didn't like what
 you had done with the control structure --- loading the input buffer
 only at the outermost loop level was a bad design choice IMHO.  You had
 sprinkled the code with an unreasonable number of special cases in order
 to try to cope with the effects of that mistake, but there were lots
 of problems still left.  Some of the bugs I noticed:
 
 * Broke old-protocol COPY, since that has no provision for stopping at
 the EOF marker except by parsing the data carefully to start with.  The
 backend would just hang up unless the total data size chanced to be a
 multiple of 64K.
 
 * Subtle change in interpretation of \. EOF marker (the existing code
 will recognize it even when not at start of line).
 
 * Seems to have thrown away detection of newline format discrepancies.
 
 * Fails for zero-column tables.
 
 * Broke display of column values during error context callback (would
 always show the first column contents no matter which one is being
 complained of).
 
 * DetectLineEnd mistakenly assumes CR mode if very last character of first
 bufferload is CR; need to reserve judgment until next char is available.
 
 * DetectLineEnd fails to account for backslashed control characters,
 so it will e.g. accept \ followed by \n as determining the newline
 style.
 
 * Fails to apply encoding conversion if first line exceeds copy buf
 size, because when DetectLineEnd fails the quick-exit path doesn't do
 it.
 
 * There seem to be several bugs associated with the fact that input_buf[]
 always has 64K of data in it even when EOF has been reached on the
 input.  One example:
 echo -n 123 >zzz1
 psql> create temp table t1(f1 text);
 psql> copy t1 from '/home/tgl/zzz1';
 psql> select * from t1;
 hmm ... where'd that 64K of whitespace come from?
 
 I rewrote the patch in a way that retained most of the speedups without
 changing the basic control structure (except for replacing multiple
 CopyReadAttribute calls with one CopyReadAttributes call per line).
 
 I had some difficulty in generating test cases that weren't largely
 I/O-bound, but AFAICT the patch as applied is about the same speed
 as what you submitted.
 
 regards, tom lane
 
 





Re: [PATCHES] COPY FROM performance improvements

2005-08-06 Thread Luke Lonergan
Tom,

The previous timings were for a table with 15 columns of mixed type.  We
also test with 1 column to make the parsing overhead more apparent.  In the
case of 1 text column with 145MB of input data:

Your patch:
  Time: 6612.599 ms

Alon's patch:
  Time: 6119.244 ms


Alon's patch is 7.5% faster here, where it was only 3% faster on the 15
column case.  This is consistent with a large difference in parsing speed
between your approach and Alon's.

I'm pretty sure that the mistake you refer to is responsible for the speed
improvement, and was deliberately chosen to minimize memory copies, etc.
Given that we're looking ahead to getting much higher speeds, approaching
current high performance disk speeds, we've been looking more closely at the
parsing speed.  It comes down to a tradeoff between elegant code and speed.

We'll prove it in lab tests soon, where we measure the parsing rate
directly, but these experiments show it clearly, though indirectly.

- Luke



On 8/6/05 2:04 PM, Tom Lane [EMAIL PROTECTED] wrote:

 Alon Goldshuv [EMAIL PROTECTED] writes:
 New patch attached. It includes very minor changes. These are changes that
 were committed to CVS 3 weeks ago (copy.c 1.247) which I missed in the
 previous patch.
 
 I've applied this with (rather extensive) revisions.  I didn't like what
 you had done with the control structure --- loading the input buffer
 only at the outermost loop level was a bad design choice IMHO.  You had
 sprinkled the code with an unreasonable number of special cases in order
 to try to cope with the effects of that mistake, but there were lots
 of problems still left.  Some of the bugs I noticed:
 
 * Broke old-protocol COPY, since that has no provision for stopping at
 the EOF marker except by parsing the data carefully to start with.  The
 backend would just hang up unless the total data size chanced to be a
 multiple of 64K.
 
 * Subtle change in interpretation of \. EOF marker (the existing code
 will recognize it even when not at start of line).
 
 * Seems to have thrown away detection of newline format discrepancies.
 
 * Fails for zero-column tables.
 
 * Broke display of column values during error context callback (would
 always show the first column contents no matter which one is being
 complained of).
 
 * DetectLineEnd mistakenly assumes CR mode if very last character of first
 bufferload is CR; need to reserve judgment until next char is available.
 
 * DetectLineEnd fails to account for backslashed control characters,
 so it will e.g. accept \ followed by \n as determining the newline
 style.
 
 * Fails to apply encoding conversion if first line exceeds copy buf
 size, because when DetectLineEnd fails the quick-exit path doesn't do
 it.
 
 * There seem to be several bugs associated with the fact that input_buf[]
 always has 64K of data in it even when EOF has been reached on the
 input.  One example:
 echo -n 123 >zzz1
 psql> create temp table t1(f1 text);
 psql> copy t1 from '/home/tgl/zzz1';
 psql> select * from t1;
 hmm ... where'd that 64K of whitespace come from?
 
 I rewrote the patch in a way that retained most of the speedups without
 changing the basic control structure (except for replacing multiple
 CopyReadAttribute calls with one CopyReadAttributes call per line).
 
 I had some difficulty in generating test cases that weren't largely
 I/O-bound, but AFAICT the patch as applied is about the same speed
 as what you submitted.
 
 regards, tom lane
 
 





Re: [PATCHES] COPY FROM performance improvements

2005-08-06 Thread Luke Lonergan
Tom,

My direct e-mails to you are apparently blocked, so I'll send this to the
list.

I've attached the case we use for load performance testing, with the data
generator modified to produce a single row version of the dataset.

I do believe that you/we will need to invert the processing loop to get the
maximum parsing speed.  We will be implementing much higher loading speeds,
which require it, in order to compete with Oracle, Netezza and Teradata, so
we'll have to work this out in the best interests of our users.
- Luke



IVP.tgz
Description: Binary data



Re: [PATCHES] COPY FROM performance improvements

2005-08-06 Thread Tom Lane
Luke Lonergan [EMAIL PROTECTED] writes:
 I had some difficulty in generating test cases that weren't largely
 I/O-bound, but AFAICT the patch as applied is about the same speed
 as what you submitted.

 You achieve the important objective of knocking the parsing stage down a
 lot, but your parsing code is actually about 20% slower than Alon's.

I would like to see the exact test case you are using to make this
claim; the tests I did suggested my code is the same speed or faster.
The particular test case I was using was the tenk1 data from the
regression database, duplicated out to about 600K rows so as to run
long enough to measure with some degree of repeatability.

As best I can tell, my version of CopyReadAttributes is significantly
quicker than Alon's, approximately balancing out the fact that my
version of CopyReadLine is slower.  I did the latter first, and would
now be tempted to rewrite it in the same style as CopyReadAttributes,
ie one pass of memory-to-memory copy using pointers rather than buffer
indexes.
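
For illustration (this is just a sketch, not the code as committed), the
pointer style amounts to walking the line once with source and destination
pointers, handling the delimiter and backslash escapes as it goes:

/* Sketch only: copy one attribute value, de-escaping as we go, in a single
 * memory-to-memory pass using pointers rather than buffer indexes. */
static const char *
copy_one_attribute(const char *src, const char *srcend,
                   char *dst, char delim)
{
    while (src < srcend && *src != delim && *src != '\n')
    {
        char    c = *src++;

        if (c == '\\' && src < srcend)
        {
            c = *src++;
            if (c == 'n')
                c = '\n';
            else if (c == 't')
                c = '\t';
            /* remaining escape sequences omitted from this sketch */
        }
        *dst++ = c;
    }
    *dst = '\0';
    return src;                 /* points at the delimiter or newline */
}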

BTW, late today I figured out a way to get fairly reproducible
non-I/O-bound numbers about COPY FROM: use a trigger that suppresses
the actual inserts, thus:

create table foo ...
create function noway() returns trigger as
'begin return null; end' language plpgsql;
create trigger noway before insert on foo
  for each row execute procedure noway();
then repeat:
copy foo from '/tmp/foo.data';

If the source file is not too large to fit in kernel disk cache, then
after the first iteration there is no I/O at all.  I got numbers
that were reproducible within less than 1%, as opposed to 5% or more
variation when the thing was partially I/O bound.  Pretty useless in the
real world, of course, but great for timing COPY's data-pushing.

regards, tom lane



Re: [PATCHES] COPY FROM performance improvements

2005-08-06 Thread Luke Lonergan
Tom,

On 8/6/05 9:08 PM, Tom Lane [EMAIL PROTECTED] wrote:

 Luke Lonergan [EMAIL PROTECTED] writes:
 I had some difficulty in generating test cases that weren't largely
 I/O-bound, but AFAICT the patch as applied is about the same speed
 as what you submitted.
 
 You achieve the important objective of knocking the parsing stage down a
 lot, but your parsing code is actually about 20% slower than Alon's.
 
 I would like to see the exact test case you are using to make this
 claim; the tests I did suggested my code is the same speed or faster.

I showed mine - you show yours :-)  Apparently our e-mail crossed.
  
 As best I can tell, my version of CopyReadAttributes is significantly
 quicker than Alon's, approximately balancing out the fact that my
 version of CopyReadLine is slower.  I did the latter first, and would
 now be tempted to rewrite it in the same style as CopyReadAttributes,
 ie one pass of memory-to-memory copy using pointers rather than buffer
 indexes.

See previous timings - looks like Alon's parsing is substantially faster.
However, I'd like him to confirm by running with the shunt placed at
different stages, in this case between parse and attribute conversion (not
attribute parse).
 
 BTW, late today I figured out a way to get fairly reproducible
 non-I/O-bound numbers about COPY FROM: use a trigger that suppresses
 the actual inserts, thus:
 
 create table foo ...
 create function noway() returns trigger as
 'begin return null; end' language plpgsql;
 create trigger noway before insert on foo
   for each row execute procedure noway();
 then repeat:
 copy foo from '/tmp/foo.data';

Cool!  That's a better way than hacking code and inserting shunts.
 
Alon will likely hit this tomorrow.

- Luke





Re: [PERFORM] [PATCHES] COPY FROM performance improvements

2005-08-02 Thread Alon Goldshuv
Tom,

Thanks for pointing it out. I made the small required modifications to match
copy.c version 1.247 and sent it to the -patches list. The new patch is V16.

Alon.


On 8/1/05 7:51 PM, Tom Lane [EMAIL PROTECTED] wrote:

 Alon Goldshuv [EMAIL PROTECTED] writes:
 This patch appears to reverse out the most recent committed changes in
 copy.c.
 
 Which changes do you refer to? I thought I accommodated all the recent
 changes (I recall some changes to the tupletable/tupleslot interface, HEADER
 in cvs, and hex escapes and maybe one or 2 more). What did I miss?
 
 The latest touch of copy.c, namely this patch:
 
 2005-07-10 17:13  tgl
 
 * doc/src/sgml/ref/create_type.sgml, src/backend/commands/copy.c,
 src/backend/commands/typecmds.c, src/backend/tcop/fastpath.c,
 src/backend/tcop/postgres.c, src/backend/utils/adt/arrayfuncs.c,
 src/backend/utils/adt/date.c, src/backend/utils/adt/numeric.c,
 src/backend/utils/adt/rowtypes.c,
 src/backend/utils/adt/timestamp.c, src/backend/utils/adt/varbit.c,
 src/backend/utils/adt/varchar.c, src/backend/utils/adt/varlena.c,
 src/backend/utils/mb/mbutils.c, src/include/catalog/catversion.h,
 src/include/catalog/pg_proc.h,
 src/test/regress/expected/type_sanity.out,
 src/test/regress/sql/type_sanity.sql: Change typreceive function
 API so that receive functions get the same optional arguments as
 text input functions, ie, typioparam OID and atttypmod.  Make all
 the datatypes that use typmod enforce it the same way in typreceive
 as they do in typinput.  This fixes a problem with failure to
 enforce length restrictions during COPY FROM BINARY.
 
 It was rather obvious, given that the first chunk of the patch backed up
 the file's CVS version stamp from 1.247 to 1.246 :-(
 
 regards, tom lane
 





Re: [PERFORM] [PATCHES] COPY FROM performance improvements

2005-08-01 Thread Tom Lane
Alon Goldshuv [EMAIL PROTECTED] writes:
 This patch appears to reverse out the most recent committed changes in
 copy.c.

 Which changes do you refer to? I thought I accommodated all the recent
 changes (I recall some changes to the tupletable/tupleslot interface, HEADER
 in cvs, and hex escapes and maybe one or 2 more). What did I miss?

The latest touch of copy.c, namely this patch:

2005-07-10 17:13  tgl

* doc/src/sgml/ref/create_type.sgml, src/backend/commands/copy.c,
src/backend/commands/typecmds.c, src/backend/tcop/fastpath.c,
src/backend/tcop/postgres.c, src/backend/utils/adt/arrayfuncs.c,
src/backend/utils/adt/date.c, src/backend/utils/adt/numeric.c,
src/backend/utils/adt/rowtypes.c,
src/backend/utils/adt/timestamp.c, src/backend/utils/adt/varbit.c,
src/backend/utils/adt/varchar.c, src/backend/utils/adt/varlena.c,
src/backend/utils/mb/mbutils.c, src/include/catalog/catversion.h,
src/include/catalog/pg_proc.h,
src/test/regress/expected/type_sanity.out,
src/test/regress/sql/type_sanity.sql: Change typreceive function
API so that receive functions get the same optional arguments as
text input functions, ie, typioparam OID and atttypmod.  Make all
the datatypes that use typmod enforce it the same way in typreceive
as they do in typinput.  This fixes a problem with failure to
enforce length restrictions during COPY FROM BINARY.

It was rather obvious, given that the first chunk of the patch backed up
the file's CVS version stamp from 1.247 to 1.246 :-(

regards, tom lane



Re: [PATCHES] COPY FROM performance improvements

2005-07-22 Thread Joshua D. Drake

Luke Lonergan wrote:

Joshua,

On 7/21/05 7:53 PM, Joshua D. Drake [EMAIL PROTECTED] wrote:


Well I know that isn't true at least not with ANY of the Dells my
customers have purchased in the last 18 months. They are still really,
really slow.



That's too bad, can you cite some model numbers?  SCSI?


Yeah I will get them and post, but yes they are all SCSI.





I have great success with Silicon Image as long as I am running them
with Linux software RAID. The LSI controllers are also really nice.



That's good to hear, I gave up on Silicon Image controllers on Linux about 1
year ago, which kernel are you using with success?


Any of the 2.6 kernels. Also, the later 2.4 kernels (2.4.22+, I believe) 
support it pretty well too.


Silicon Image controllers are the most popular, so it's important to see
them supported well, though I'd rather see more SATA headers than 2 off of
the built-in chipsets.


- Luke




--
Your PostgreSQL solutions provider, Command Prompt, Inc.
24x7 support - 1.800.492.2240, programming, and consulting
Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
http://www.commandprompt.com / http://www.postgresql.org



Re: [PATCHES] COPY FROM performance improvements

2005-07-22 Thread Patrick Welche
On Thu, Jul 21, 2005 at 09:19:04PM -0700, Luke Lonergan wrote:
 Joshua,
 
 On 7/21/05 7:53 PM, Joshua D. Drake [EMAIL PROTECTED] wrote:
  Well I know that isn't true at least not with ANY of the Dells my
  customers have purchased in the last 18 months. They are still really,
  really slow.
 
 That's too bad, can you cite some model numbers?  SCSI?

I would be interested too, given

   http://www.netbsd.org/cgi-bin/query-pr-single.pl?number=30531


Cheers,

Patrick



Re: [PATCHES] COPY FROM performance improvements

2005-07-21 Thread Mark Wong
I just ran through a few tests with the v14 patch against 100GB of data
from dbt3 and found a 30% improvement; 3.6 hours vs 5.3 hours.  Just to
give a few details, I only loaded data and started a COPY in parallel
for each of the data files:
http://www.testing.osdl.org/projects/dbt3testing/results/fast_copy/

Here's a visual of my disk layout, for those familiar with the database schema:

http://www.testing.osdl.org/projects/dbt3testing/results/fast_copy/layout-dev4-010-dbt3.html

I have 6 arrays of fourteen 15k rpm drives in a split-bus configuration
attached to a 4-way itanium2 via 6 compaq smartarray pci-x controllers.

Let me know if you have any questions.

Mark



Re: [PATCHES] COPY FROM performance improvements

2005-07-21 Thread Luke Lonergan
Cool!

At what rate does your disk setup write sequential data, e.g.:
  time dd if=/dev/zero of=bigfile bs=8k count=500000

(sized for 2x RAM on a system with 2GB)

BTW - the Compaq smartarray controllers are pretty broken on Linux from a
performance standpoint in our experience.  We've had disastrously bad
results from the SmartArray 5i and 6 controllers on kernels from 2.4 -
2.6.10, on the order of 20MB/s.

For comparison, the results on our dual opteron with a single LSI SCSI
controller with software RAID0 on a 2.6.10 kernel:

[EMAIL PROTECTED] dbfast]$ time dd if=/dev/zero of=bigfile bs=8k count=500000
500000+0 records in
500000+0 records out

real    0m24.702s
user    0m0.077s
sys     0m8.794s

Which calculates out to about 161MB/s.

- Luke


On 7/21/05 2:55 PM, Mark Wong [EMAIL PROTECTED] wrote:

 I just ran through a few tests with the v14 patch against 100GB of data
 from dbt3 and found a 30% improvement; 3.6 hours vs 5.3 hours.  Just to
 give a few details, I only loaded data and started a COPY in parallel
 for each the data files:
 http://www.testing.osdl.org/projects/dbt3testing/results/fast_copy/
 
 Here's a visual of my disk layout, for those familiar with the database
 schema:
 http://www.testing.osdl.org/projects/dbt3testing/results/fast_copy/layout-dev4
 -010-dbt3.html
 
 I have 6 arrays of fourteen 15k rpm drives in a split-bus configuration
 attached to a 4-way itanium2 via 6 compaq smartarray pci-x controllers.
 
 Let me know if you have any questions.
 
 Mark
 





Re: [PATCHES] COPY FROM performance improvements

2005-07-21 Thread Joshua D. Drake

Luke Lonergan wrote:

Cool!

At what rate does your disk setup write sequential data, e.g.:
  time dd if=/dev/zero of=bigfile bs=8k count=500000

(sized for 2x RAM on a system with 2GB)

BTW - the Compaq smartarray controllers are pretty broken on Linux from a
performance standpoint in our experience.  We've had disastrously bad
results from the SmartArray 5i and 6 controllers on kernels from 2.4 -
2.6.10, on the order of 20MB/s.


O.k. this strikes me as interesting, now we know that Compaq and Dell 
are borked for Linux. Is there a name brand server (read Enterprise) 
that actually does provide reasonable performance?




For comparison, the results on our dual opteron with a single LSI SCSI
controller with software RAID0 on a 2.6.10 kernel:

[EMAIL PROTECTED] dbfast]$ time dd if=/dev/zero of=bigfile bs=8k count=500000
500000+0 records in
500000+0 records out

real    0m24.702s
user    0m0.077s
sys     0m8.794s

Which calculates out to about 161MB/s.

- Luke


On 7/21/05 2:55 PM, Mark Wong [EMAIL PROTECTED] wrote:



I just ran through a few tests with the v14 patch against 100GB of data
from dbt3 and found a 30% improvement; 3.6 hours vs 5.3 hours.  Just to
give a few details, I only loaded data and started a COPY in parallel
for each the data files:
http://www.testing.osdl.org/projects/dbt3testing/results/fast_copy/

Here's a visual of my disk layout, for those familiar with the database
schema:
http://www.testing.osdl.org/projects/dbt3testing/results/fast_copy/layout-dev4
-010-dbt3.html

I have 6 arrays of fourteen 15k rpm drives in a split-bus configuration
attached to a 4-way itanium2 via 6 compaq smartarray pci-x controllers.

Let me know if you have any questions.

Mark









--
Your PostgreSQL solutions provider, Command Prompt, Inc.
24x7 support - 1.800.492.2240, programming, and consulting
Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
http://www.commandprompt.com / http://www.postgresql.org



Re: [PATCHES] COPY FROM performance improvements

2005-07-21 Thread Luke Lonergan
Joshua,

On 7/21/05 5:08 PM, Joshua D. Drake [EMAIL PROTECTED] wrote:

 O.k. this strikes me as interesting, now we know that Compaq and Dell
 are borked for Linux. Is there a name brand server (read Enterprise)
 that actually does provide reasonable performance?

I think late model Dell (post the bad chipset problem, circa 2001-2?) and
IBM and Sun servers are fine because they all use simple SCSI adapters from
LSI or Adaptec.

The HP Smartarray is an aberration, they don't have good driver support for
Linux and as a consequence have some pretty bad problems with both
performance and stability.  On Windows they perform quite well.

Also - there are very big issues with some SATA controllers and Linux we've
seen, particularly the Silicon Image, Highpoint and other non-Intel
controllers. Not sure about Nvidia, but the only ones I trust now are 3Ware
and the others mentioned in earlier posts.

- Luke

 





Re: [PATCHES] COPY FROM performance improvements

2005-07-21 Thread Luke Lonergan
Joshua,

On 7/21/05 7:53 PM, Joshua D. Drake [EMAIL PROTECTED] wrote:
 Well I know that isn't true at least not with ANY of the Dells my
 customers have purchased in the last 18 months. They are still really,
 really slow.

That's too bad, can you cite some model numbers?  SCSI?

 I have great success with Silicon Image as long as I am running them
 with Linux software RAID. The LSI controllers are also really nice.

That's good to hear, I gave up on Silicon Image controllers on Linux about 1
year ago, which kernel are you using with success?  Silicon Image
controllers are the most popular, so it's important to see them supported
well, though I'd rather see more SATA headers than 2 off of the built-in
chipsets. 

- Luke





Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Andrew Dunstan



Alon Goldshuv wrote:


I revisited my patch and removed the code duplications that were there, and
added support for CSV with buffered input, so CSV now runs faster too
(although it is not as optimized as the TEXT format parsing). So now
TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original file.

Patch attached.

 



I do not have time to review this 2900-line patch analytically, nor to 
benchmark it. I have done some functional testing of it on Windows, and 
tried to break it in text and CSV modes, and with both Unix and Windows 
type line endings - I have not observed any breakage.


This does need lots of eyeballs, though.

cheers

andrew



Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Mark Wong
On Thu, 14 Jul 2005 17:22:18 -0700
Alon Goldshuv [EMAIL PROTECTED] wrote:

 I revisited my patch and removed the code duplications that were there, and
 added support for CSV with buffered input, so CSV now runs faster too
 (although it is not as optimized as the TEXT format parsing). So now
 TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original file.

Hi Alon,

I'm curious, what kind of system are you testing this on?  I'm trying to
load 100GB of data in our dbt3 workload on a 4-way itanium2.  I'm
interested in the results you would expect.

Mark



Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Alon Goldshuv
Hi Mark,

I improved the data *parsing* capabilities of COPY, and didn't touch the
data conversion or data insertion parts of the code. The parsing improvement
will vary greatly depending on the ratio of parsing to converting and
inserting.

Therefore, the speed increase really depends on the nature of your data:

100GB file with
long data rows (lots of parsing)
Small number of columns (small number of attr conversions per row)
fewer rows (fewer tuple insertions)

Will show the best performance improvements.

However, same file size 100GB with
Short data rows (minimal parsing)
large number of columns (large number of attr conversions per row)
AND/OR
more rows (more tuple insertions)

Will show improvements but not as significant.
In general I'll estimate 40%-95% improvement in load speed for the 1st case
and 10%-40% for the 2nd. But that also depends on the hardware, disk speed
etc... This is for TEXT format. As for CSV, it may be faster but not as much
as I specified here. BINARY will stay the same as before.

HTH
Alon.






On 7/19/05 12:54 PM, Mark Wong [EMAIL PROTECTED] wrote:

 On Thu, 14 Jul 2005 17:22:18 -0700
 Alon Goldshuv [EMAIL PROTECTED] wrote:
 
 I revisited my patch and removed the code duplications that were there, and
 added support for CSV with buffered input, so CSV now runs faster too
 (although it is not as optimized as the TEXT format parsing). So now
 TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original file.
 
 Hi Alon,
 
 I'm curious, what kind of system are you testing this on?  I'm trying to
 load 100GB of data in our dbt3 workload on a 4-way itanium2.  I'm
 interested in the results you would expect.
 
 Mark
 





Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Mark Wong
Hi Alon,

Yeah, that helps.  I just need to break up my scripts a little to just
load the data and not build indexes.

Is the following information good enough to give a guess about the data
I'm loading, if you don't mind? ;)  Here's a link to my script to create
tables:
http://developer.osdl.org/markw/mt/getfile.py?id=eaf16b7831588729780645b2bb44f7f23437e432path=scripts/pgsql/create_tables.sh.in

File sizes:
-rw-r--r--  1 markw 50 2.3G Jul  8 15:03 customer.tbl
-rw-r--r--  1 markw 50  74G Jul  8 15:03 lineitem.tbl
-rw-r--r--  1 markw 50 2.1K Jul  8 15:03 nation.tbl
-rw-r--r--  1 markw 50  17G Jul  8 15:03 orders.tbl
-rw-r--r--  1 markw 50 2.3G Jul  8 15:03 part.tbl
-rw-r--r--  1 markw 50  12G Jul  8 15:03 partsupp.tbl
-rw-r--r--  1 markw 50  391 Jul  8 15:03 region.tbl
-rw-r--r--  1 markw 50 136M Jul  8 15:03 supplier.tbl

Number of rows:
# wc -l *.tbl
   15000000 customer.tbl
  600037902 lineitem.tbl
         25 nation.tbl
  150000000 orders.tbl
   20000000 part.tbl
   80000000 partsupp.tbl
          5 region.tbl
    1000000 supplier.tbl

Thanks,
Mark

On Tue, 19 Jul 2005 14:05:56 -0700
Alon Goldshuv [EMAIL PROTECTED] wrote:

 Hi Mark,
 
 I improved the data *parsing* capabilities of COPY, and didn't touch the
 data conversion or data insertion parts of the code. The parsing improvement
 will vary largely depending on the ratio of parsing -to- converting and
 inserting. 
 
 Therefore, the speed increase really depends on the nature of your data:
 
 100GB file with
 long data rows (lots of parsing)
 Small number of columns (small number of attr conversions per row)
 less rows (less tuple insertions)
 
 Will show the best performance improvements.
 
 However, same file size 100GB with
 Short data rows (minimal parsing)
 large number of columns (large number of attr conversions per row)
 AND/OR
 more rows (more tuple insertions)
 
 Will show improvements but not as significant.
 In general I'll estimate 40%-95% improvement in load speed for the 1st case
 and 10%-40% for the 2nd. But that also depends on the hardware, disk speed
 etc... This is for TEXT format. As for CSV, it may be faster but not as much
 as I specified here. BINARY will stay the same as before.
 
 HTH
 Alon.
 
 
 
 
 
 
 On 7/19/05 12:54 PM, Mark Wong [EMAIL PROTECTED] wrote:
 
  On Thu, 14 Jul 2005 17:22:18 -0700
  Alon Goldshuv [EMAIL PROTECTED] wrote:
  
  I revisited my patch and removed the code duplications that were there, and
  added support for CSV with buffered input, so CSV now runs faster too
  (although it is not as optimized as the TEXT format parsing). So now
  TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original 
  file.
  
  Hi Alon,
  
  I'm curious, what kind of system are you testing this on?  I'm trying to
  load 100GB of data in our dbt3 workload on a 4-way itanium2.  I'm
  interested in the results you would expect.
  
  Mark
  
 



Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Alon Goldshuv
Mark,

Thanks for the info.

Yes, taking indexes out of the picture is a good idea for this purpose.

I can't really guess how fast the load rate should be. I don't know how your
system is configured, or all the hardware characteristics (and even if I knew
that info I may not be able to guess...). I am pretty confident that the load
will be faster than before - I'll risk that ;-) Looking at your TPC-H sizes
and metadata, I'll estimate that partsupp, customer and orders will have the
most significant increase in load rate. You could start with those.

I guess the only way to really know is to try... Load several times with the
existing PG-COPY and then load several times with the patched COPY and
compare. I'll be curious to hear your results.

Thx,
Alon.

 


On 7/19/05 2:37 PM, Mark Wong [EMAIL PROTECTED] wrote:

 Hi Alon,
 
 Yeah, that helps.  I just need to break up my scripts a little to just
 load the data and not build indexes.
 
 Is the following information good enough to give a guess about the data
 I'm loading, if you don't mind? ;)  Here's a link to my script to create
 tables:
 http://developer.osdl.org/markw/mt/getfile.py?id=eaf16b7831588729780645b2bb44f
 7f23437e432path=scripts/pgsql/create_tables.sh.in
 
 File sizes:
 -rw-r--r--  1 markw 50 2.3G Jul  8 15:03 customer.tbl
 -rw-r--r--  1 markw 50  74G Jul  8 15:03 lineitem.tbl
 -rw-r--r--  1 markw 50 2.1K Jul  8 15:03 nation.tbl
 -rw-r--r--  1 markw 50  17G Jul  8 15:03 orders.tbl
 -rw-r--r--  1 markw 50 2.3G Jul  8 15:03 part.tbl
 -rw-r--r--  1 markw 50  12G Jul  8 15:03 partsupp.tbl
 -rw-r--r--  1 markw 50  391 Jul  8 15:03 region.tbl
 -rw-r--r--  1 markw 50 136M Jul  8 15:03 supplier.tbl
 
 Number of rows:
 # wc -l *.tbl
   15000000 customer.tbl
  600037902 lineitem.tbl
         25 nation.tbl
  150000000 orders.tbl
   20000000 part.tbl
   80000000 partsupp.tbl
          5 region.tbl
    1000000 supplier.tbl
 
 Thanks,
 Mark
 
 On Tue, 19 Jul 2005 14:05:56 -0700
 Alon Goldshuv [EMAIL PROTECTED] wrote:
 
 Hi Mark,
 
 I improved the data *parsing* capabilities of COPY, and didn't touch the
 data conversion or data insertion parts of the code. The parsing improvement
 will vary largely depending on the ratio of parsing -to- converting and
 inserting. 
 
 Therefore, the speed increase really depends on the nature of your data:
 
 100GB file with
 long data rows (lots of parsing)
 Small number of columns (small number of attr conversions per row)
 less rows (less tuple insertions)
 
 Will show the best performance improvements.
 
 However, same file size 100GB with
 Short data rows (minimal parsing)
 large number of columns (large number of attr conversions per row)
 AND/OR
 more rows (more tuple insertions)
 
 Will show improvements but not as significant.
 In general I'll estimate 40%-95% improvement in load speed for the 1st case
 and 10%-40% for the 2nd. But that also depends on the hardware, disk speed
 etc... This is for TEXT format. As for CSV, it may be faster but not as much
 as I specified here. BINARY will stay the same as before.
 
 HTH
 Alon.
 
 
 
 
 
 
 On 7/19/05 12:54 PM, Mark Wong [EMAIL PROTECTED] wrote:
 
 On Thu, 14 Jul 2005 17:22:18 -0700
 Alon Goldshuv [EMAIL PROTECTED] wrote:
 
 I revisited my patch and removed the code duplications that were there, and
 added support for CSV with buffered input, so CSV now runs faster too
 (although it is not as optimized as the TEXT format parsing). So now
 TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original
 file.
 
 Hi Alon,
 
 I'm curious, what kind of system are you testing this on?  I'm trying to
 load 100GB of data in our dbt3 workload on a 4-way itanium2.  I'm
 interested in the results you would expect.
 
 Mark
 
 
 





Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Andrew Dunstan

Mark,

You should definitely not be doing this sort of thing, I believe:

CREATE TABLE orders (
o_orderkey INTEGER,
o_custkey INTEGER,
o_orderstatus CHAR(1),
o_totalprice REAL,
o_orderDATE DATE,
o_orderpriority CHAR(15),
o_clerk CHAR(15),
o_shippriority INTEGER,
o_comment VARCHAR(79),
PRIMARY KEY (o_orderkey))

Create the table with no constraints, load the data, then set up primary keys 
and whatever other constraints you want using ALTER TABLE. Last time I did a 
load like this (albeit 2 orders of magnitude smaller) I saw a 50% speedup from 
deferring constraint creation.


cheers

andrew



Mark Wong wrote:


Hi Alon,

Yeah, that helps.  I just need to break up my scripts a little to just
load the data and not build indexes.

Is the following information good enough to give a guess about the data
I'm loading, if you don't mind? ;)  Here's a link to my script to create
tables:
http://developer.osdl.org/markw/mt/getfile.py?id=eaf16b7831588729780645b2bb44f7f23437e432path=scripts/pgsql/create_tables.sh.in

File sizes:
-rw-r--r--  1 markw 50 2.3G Jul  8 15:03 customer.tbl
-rw-r--r--  1 markw 50  74G Jul  8 15:03 lineitem.tbl
-rw-r--r--  1 markw 50 2.1K Jul  8 15:03 nation.tbl
-rw-r--r--  1 markw 50  17G Jul  8 15:03 orders.tbl
-rw-r--r--  1 markw 50 2.3G Jul  8 15:03 part.tbl
-rw-r--r--  1 markw 50  12G Jul  8 15:03 partsupp.tbl
-rw-r--r--  1 markw 50  391 Jul  8 15:03 region.tbl
-rw-r--r--  1 markw 50 136M Jul  8 15:03 supplier.tbl

Number of rows:
# wc -l *.tbl
  15000000 customer.tbl
 600037902 lineitem.tbl
        25 nation.tbl
 150000000 orders.tbl
  20000000 part.tbl
  80000000 partsupp.tbl
         5 region.tbl
   1000000 supplier.tbl

Thanks,
Mark

On Tue, 19 Jul 2005 14:05:56 -0700
Alon Goldshuv [EMAIL PROTECTED] wrote:

 


Hi Mark,

I improved the data *parsing* capabilities of COPY, and didn't touch the
data conversion or data insertion parts of the code. The parsing improvement
will vary widely depending on the ratio of parsing to converting and
inserting. 


Therefore, the speed increase really depends on the nature of your data:

100GB file with
long data rows (lots of parsing)
Small number of columns (small number of attr conversions per row)
fewer rows (fewer tuple insertions)

Will show the best performance improvements.

However, same file size 100GB with
Short data rows (minimal parsing)
large number of columns (large number of attr conversions per row)
AND/OR
more rows (more tuple insertions)

Will show improvements but not as significant.
In general I'll estimate 40%-95% improvement in load speed for the 1st case
and 10%-40% for the 2nd. But that also depends on the hardware, disk speed
etc... This is for TEXT format. As for CSV, it may be faster but not as much
as I specified here. BINARY will stay the same as before.

HTH
Alon.






On 7/19/05 12:54 PM, Mark Wong [EMAIL PROTECTED] wrote:

   


On Thu, 14 Jul 2005 17:22:18 -0700
Alon Goldshuv [EMAIL PROTECTED] wrote:

 


I revisited my patch and removed the code duplications that were there, and
added support for CSV with buffered input, so CSV now runs faster too
(although it is not as optimized as the TEXT format parsing). So now
TEXT, CSV and BINARY are all parsed in CopyFrom(), like in the original file.
   


Hi Alon,

I'm curious, what kind of system are you testing this on?  I'm trying to
load 100GB of data in our dbt3 workload on a 4-way itanium2.  I'm
interested in the results you would expect.

Mark

 



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

 



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Mark Wong
Whoopsies, yeah good point about the PRIMARY KEY.  I'll fix that.

Mark

On Tue, 19 Jul 2005 18:17:52 -0400
Andrew Dunstan [EMAIL PROTECTED] wrote:

 Mark,
 
 You should definitely not be doing this sort of thing, I believe:
 
 CREATE TABLE orders (
   o_orderkey INTEGER,
   o_custkey INTEGER,
   o_orderstatus CHAR(1),
   o_totalprice REAL,
   o_orderDATE DATE,
   o_orderpriority CHAR(15),
   o_clerk CHAR(15),
   o_shippriority INTEGER,
   o_comment VARCHAR(79),
   PRIMARY KEY (o_orderkey))
 
 Create the table with no constraints, load the data, then set up primary keys 
 and whatever other constraints you want using ALTER TABLE. Last time I did a 
 load like this (albeit 2 orders of magnitude smaller) I saw a 50% speedup 
 from deferring constraint creation.
 
 
 cheers
 
 andrew
 
 
 
 Mark Wong wrote:
 
 Hi Alon,
 
 Yeah, that helps.  I just need to break up my scripts a little to just
 load the data and not build indexes.
 
 Is the following information good enough to give a guess about the data
 I'm loading, if you don't mind? ;)  Here's a link to my script to create
 tables:
 http://developer.osdl.org/markw/mt/getfile.py?id=eaf16b7831588729780645b2bb44f7f23437e432path=scripts/pgsql/create_tables.sh.in
 
 File sizes:
 -rw-r--r--  1 markw 50 2.3G Jul  8 15:03 customer.tbl
 -rw-r--r--  1 markw 50  74G Jul  8 15:03 lineitem.tbl
 -rw-r--r--  1 markw 50 2.1K Jul  8 15:03 nation.tbl
 -rw-r--r--  1 markw 50  17G Jul  8 15:03 orders.tbl
 -rw-r--r--  1 markw 50 2.3G Jul  8 15:03 part.tbl
 -rw-r--r--  1 markw 50  12G Jul  8 15:03 partsupp.tbl
 -rw-r--r--  1 markw 50  391 Jul  8 15:03 region.tbl
 -rw-r--r--  1 markw 50 136M Jul  8 15:03 supplier.tbl
 
 Number of rows:
 # wc -l *.tbl
   15000000 customer.tbl
  600037902 lineitem.tbl
         25 nation.tbl
  150000000 orders.tbl
   20000000 part.tbl
   80000000 partsupp.tbl
          5 region.tbl
    1000000 supplier.tbl
 
 Thanks,
 Mark
 
 On Tue, 19 Jul 2005 14:05:56 -0700
 Alon Goldshuv [EMAIL PROTECTED] wrote:
 
   
 
 Hi Mark,
 
 I improved the data *parsing* capabilities of COPY, and didn't touch the
 data conversion or data insertion parts of the code. The parsing improvement
 will vary widely depending on the ratio of parsing to converting and
 inserting. 
 
 Therefore, the speed increase really depends on the nature of your data:
 
 100GB file with
 long data rows (lots of parsing)
 Small number of columns (small number of attr conversions per row)
 fewer rows (fewer tuple insertions)
 
 Will show the best performance improvements.
 
 However, same file size 100GB with
 Short data rows (minimal parsing)
 large number of columns (large number of attr conversions per row)
 AND/OR
 more rows (more tuple insertions)
 
 Will show improvements but not as significant.
 In general I'll estimate 40%-95% improvement in load speed for the 1st case
 and 10%-40% for the 2nd. But that also depends on the hardware, disk speed
 etc... This is for TEXT format. As for CSV, it may be faster but not as much
 as I specified here. BINARY will stay the same as before.
 
 HTH
 Alon.
 
 
 
 
 
 
 On 7/19/05 12:54 PM, Mark Wong [EMAIL PROTECTED] wrote:
 
 
 
 On Thu, 14 Jul 2005 17:22:18 -0700
 Alon Goldshuv [EMAIL PROTECTED] wrote:
 
   
 
 I revisited my patch and removed the code duplications that were there, 
 and
 added support for CSV with buffered input, so CSV now runs faster too
 (although it is not as optimized as the TEXT format parsing). So now
 TEXT, CSV and BINARY are all parsed in CopyFrom(), like in the original 
 file.
 
 
 Hi Alon,
 
 I'm curious, what kind of system are you testing this on?  I'm trying to
 load 100GB of data in our dbt3 workload on a 4-way itanium2.  I'm
 interested in the results you would expect.
 
 Mark
 
   
 
 

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Luke Lonergan
Good points all around; another element in the performance expectations is the
ratio of CPU speed to I/O subsystem speed, as Alon had hinted earlier.

This patch substantially (500%) improves the efficiency of parsing in the
COPY path, which, on a 3GHz P4 desktop with a commodity disk drive
represents 8 seconds of a total of 30 seconds of processing time.  So, by reducing
the parsing time from 8 seconds to 1.5 seconds, the overall COPY time is
reduced from 30 seconds to 23.5 seconds, or a speedup of about 20%.
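
Restating that arithmetic (nothing new, just the numbers above spelled out):
the non-parsing work is 30 - 8 = 22 seconds, so cutting parsing to about 1.5
seconds gives 22 + 1.5 = 23.5 seconds total, a 30/23.5 ~= 1.28x speedup, or
roughly 20% less elapsed time.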

On a dual 2.2GHz Opteron machine with a 6-disk SCSI RAID subsystem capable
of 240MB/s sequential reads and writes, the ratios change and we see a 35%
to 95% increase in COPY performance, with the bottleneck being the CPU.
The disk is only running at about 90MB/s during this period.

I'd expect that as CPU speed decreases relative to I/O speed (and the
Itanium and Itanium 2 are quite slow), you should see a larger effect from
the parsing improvements.

One good way to validate the effect is to watch the I/O bandwidth using
vmstat 1 (on Linux) while the load is progressing.  Compare the I/O rate with
the unpatched source and with the patched source; if it is the same in both
cases, you will see no benefit from the patch (you are I/O limited).

If you check your underlying sequential write speed, you will be
bottlenecked at roughly half that in performing COPY because of the
write-through of the WAL.
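
As a rough illustration with the numbers above (an estimate, not a
measurement): on the array that sustains 240MB/s of sequential writes, COPY
would be expected to top out near 120MB/s of table data, since roughly the
same volume is written again to the WAL.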

- Luke

On 7/19/05 3:51 PM, Mark Wong [EMAIL PROTECTED] wrote:

 Whoopsies, yeah good point about the PRIMARY KEY.  I'll fix that.
 
 Mark
 
 On Tue, 19 Jul 2005 18:17:52 -0400
 Andrew Dunstan [EMAIL PROTECTED] wrote:
 
 Mark,
 
 You should definitely not be doing this sort of thing, I believe:
 
 CREATE TABLE orders (
 o_orderkey INTEGER,
 o_custkey INTEGER,
 o_orderstatus CHAR(1),
 o_totalprice REAL,
 o_orderDATE DATE,
 o_orderpriority CHAR(15),
 o_clerk CHAR(15),
 o_shippriority INTEGER,
 o_comment VARCHAR(79),
 PRIMARY KEY (o_orderkey))
 
 Create the table with no constraints, load the data, then set up primary keys
 and whatever other constraints you want using ALTER TABLE. Last time I did a
 load like this (albeit 2 orders of magnitude smaller) I saw a 50% speedup
 from deferring constraint creation.
 
 
 cheers
 
 andrew
 
 
 
 Mark Wong wrote:
 
 Hi Alon,
 
 Yeah, that helps.  I just need to break up my scripts a little to just
 load the data and not build indexes.
 
 Is the following information good enough to give a guess about the data
 I'm loading, if you don't mind? ;)  Here's a link to my script to create
 tables:
  http://developer.osdl.org/markw/mt/getfile.py?id=eaf16b7831588729780645b2bb44f7f23437e432path=scripts/pgsql/create_tables.sh.in
 
 File sizes:
 -rw-r--r--  1 markw 50 2.3G Jul  8 15:03 customer.tbl
 -rw-r--r--  1 markw 50  74G Jul  8 15:03 lineitem.tbl
 -rw-r--r--  1 markw 50 2.1K Jul  8 15:03 nation.tbl
 -rw-r--r--  1 markw 50  17G Jul  8 15:03 orders.tbl
 -rw-r--r--  1 markw 50 2.3G Jul  8 15:03 part.tbl
 -rw-r--r--  1 markw 50  12G Jul  8 15:03 partsupp.tbl
 -rw-r--r--  1 markw 50  391 Jul  8 15:03 region.tbl
 -rw-r--r--  1 markw 50 136M Jul  8 15:03 supplier.tbl
 
 Number of rows:
 # wc -l *.tbl
  15000000 customer.tbl
 600037902 lineitem.tbl
        25 nation.tbl
 150000000 orders.tbl
  20000000 part.tbl
  80000000 partsupp.tbl
         5 region.tbl
   1000000 supplier.tbl
 
 Thanks,
 Mark
 
 On Tue, 19 Jul 2005 14:05:56 -0700
 Alon Goldshuv [EMAIL PROTECTED] wrote:
 
  
 
 Hi Mark,
 
 I improved the data *parsing* capabilities of COPY, and didn't touch the
 data conversion or data insertion parts of the code. The parsing
 improvement
 will vary widely depending on the ratio of parsing to converting and
 inserting. 
 
 Therefore, the speed increase really depends on the nature of your data:
 
 100GB file with
 long data rows (lots of parsing)
 Small number of columns (small number of attr conversions per row)
 fewer rows (fewer tuple insertions)
 
 Will show the best performance improvements.
 
 However, same file size 100GB with
 Short data rows (minimal parsing)
 large number of columns (large number of attr conversions per row)
 AND/OR
 more rows (more tuple insertions)
 
 Will show improvements but not as significant.
 In general I'll estimate 40%-95% improvement in load speed for the 1st case
 and 10%-40% for the 2nd. But that also depends on the hardware, disk speed
 etc... This is for TEXT format. As for CSV, it may be faster but not as
 much
 as I specified here. BINARY will stay the same as before.
 
 HTH
 Alon.
 
 
 
 
 
 
 On 7/19/05 12:54 PM, Mark Wong [EMAIL PROTECTED] wrote:
 

 
 On Thu, 14 Jul 2005 17:22:18 -0700
 Alon Goldshuv [EMAIL PROTECTED] wrote:
 
  
 
 I revisited my patch and removed the code duplications that were there,
 and
 added support for CSV with buffered input, so CSV now runs faster too
 (although it is not as optimized as the TEXT format parsing). So now
 TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original
 file.
   

Re: [PATCHES] COPY FROM performance improvements

2005-06-28 Thread Andrew Dunstan

Luke, Alon

OK, I'm going to apply the patch to my copy and try to get my head 
around it. Meanwhile:


. we should not be describing things as "old" or "new".  The person 
reading the code might have no knowledge of the history, and should not 
need to.
. we should not have "slow" and "fast" either. We should have "text", 
"csv" and "binary".


IOW, the patch comments make it look as though the patch is intended for 
after-the-fact application rather than incorporation into the main code.


Are you looking at putting CSV mode into the fast code? Please let me 
know if you have questions about that. There are only a few days left to 
whip this into shape.


cheers

andrew

Luke Lonergan wrote:


Andrew,

 


Something strange has happened. I suspect that you've inadvertently used
GNU indent or an unpatched BSD indent. pgindent needs a special patched
BSD indent to work according to the PG standards - see the README
   



OK - phew!  I generated new symbols for pgindent, fixed a bug in its awk
scripting (diff attached), and ran it against the CVS tip copy.c; the only
changes were minor formatting adjustments consistent with the rest of the
code.  I then pgindent'ed the COPY FROM performance-modified code, and it
looks good and tests clean.

Only formatting changes to the previous patch for copy.c attached.

Patch to update pgindent with new symbols and fix a bug in an awk section
(extra \\ in front of a ')').

- Luke 

 



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PATCHES] COPY FROM performance improvements

2005-06-28 Thread Bruce Momjian
Luke Lonergan wrote:
 Patch to update pgindent with new symbols and fix a bug in an awk section
 (extra \\ in front of a ')').

Yea, that '\' wasn't needed.  I applied the following patch to use //
instead of "" for patterns, and removed the unneeded backslash.

I will update the typedefs in a separate commit.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073
Index: src/tools/pgindent/pgindent
===
RCS file: /cvsroot/pgsql/src/tools/pgindent/pgindent,v
retrieving revision 1.73
diff -c -c -r1.73 pgindent
*** src/tools/pgindent/pgindent 7 Oct 2004 14:15:50 -   1.73
--- src/tools/pgindent/pgindent 28 Jun 2005 23:12:07 -
***
*** 50,62 
if (NR = 2)
print line1;
if (NR = 2 
!   line2 ~ ^{[]*$ 
!   line1 !~ ^struct 
!   line1 !~ ^enum 
!   line1 !~ ^typedef 
!   line1 !~ ^extern[  ][  ]*\C\ 
!   line1 !~ = 
!   line1 ~ \))
print int  pgindent_func_no_var_fix;;
line1 = line2;
}
--- 50,62 
if (NR = 2)
print line1;
if (NR = 2 
!   line2 ~ /^{[]*$/ 
!   line1 !~ /^struct/ 
!   line1 !~ /^enum/ 
!   line1 !~ /^typedef/ 
!   line1 !~ /^extern[  ][  ]*C/ 
!   line1 !~ /=/ 
!   line1 ~ /)/)
print int  pgindent_func_no_var_fix;;
line1 = line2;
}
***
*** 70,77 
line2 = $0;
if (skips  0)
skips--;
!   if (line1 ~ ^#ifdef[   ]*__cplusplus 
!   line2 ~ ^extern[   ]*\C\[]*$)
{
print line1;
print line2;
--- 70,77 
line2 = $0;
if (skips  0)
skips--;
!   if (line1 ~ /^#ifdef[   ]*__cplusplus/ 
!   line2 ~ /^extern[   ]*C[  ]*$/)
{
print line1;
print line2;
***
*** 81,88 
line2 = ;
skips = 2;
}
!   else if (line1 ~ ^#ifdef[  ]*__cplusplus 
!   line2 ~ ^}[]*$)
{
print line1;
print /* Close extern \C\ */;
--- 81,88 
line2 = ;
skips = 2;
}
!   else if (line1 ~ /^#ifdef[  ]*__cplusplus/ 
!   line2 ~ /^}[]*$/)
{
print line1;
print /* Close extern \C\ */;
***
*** 1732,1738 
  # work around misindenting of function with no variables defined
awk '
{
!   if ($0 ~ ^[]*int[  ]*pgindent_func_no_var_fix;)
{
if (getline  $0 != )
print $0;
--- 1732,1738 
  # work around misindenting of function with no variables defined
awk '
{
!   if ($0 ~ /^[]*int[  ]*pgindent_func_no_var_fix;/)
{
if (getline  $0 != )
print $0;
***
*** 1751,1759 
  # line3 = $0;  
  # if (skips  0)
  # skips--;
! # if (line1 ~*{$ 
! # line2 ~*[^;{}]*;$ 
! # line3 ~*}$)
  # {
  # print line2;
  # line2 = ;
--- 1751,1759 
  # line3 = $0;  
  # if (skips  0)
  # skips--;
! # if (line1 ~ /   *{$/ 
! # line2 ~ /   *[^;{}]*;$/ 
! # line3 ~ /   *}$/)
  #   

Re: [PATCHES] COPY FROM performance improvements

2005-06-27 Thread Andrew Dunstan



Luke Lonergan wrote:


Yah - I think I fixed several mis-indented comments.  I'm using vim with
tabstop=4.  I personally don't like tabs in text and would prefer them
expanded using spaces, but that's a nice way to make small formatting
changes look huge in a cvs diff.

 



You might like to look at running pgindent (see src/tools/pgindent) over 
the file before cutting a patch. Since this is usually run over each 
file just before a release, the only badness should be things from 
recent patches.


cheers

andrew

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PATCHES] COPY FROM performance improvements

2005-06-26 Thread Bruce Momjian

Please change 'if(' to 'if (', and remove parentheses like this:

   for(start = s; (*s != c) && (s < (start + len)) ; s++)

My only other comment is, Yow, that is a massive patch.

---

Luke Lonergan wrote:
 Tom,
 
  Is it really faster than appendStringInfoString or
  appendBinaryStringInfo?
 
 Apparently not; the attached patch strips out the other bytebuffer and replaces
 it with appendBinaryStringInfo.
 
 - Luke
 

[ Attachment, skipping... ]

 
 ---(end of broadcast)---
 TIP 2: you can get off all lists at once with the unregister command
 (send unregister YourEmailAddressHere to [EMAIL PROTECTED])

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PATCHES] COPY FROM performance improvements

2005-06-26 Thread Bruce Momjian
Luke Lonergan wrote:
 Attached has spaces between if,for, and foreach and (, e.g., if( is now
 if (.  It definitely looks better to me :-)
 
 Massive patch - agreed.  Less bloated than it was yesterday though.

Good, thanks.

 What about the Protocol version 2?  Looks like it could be added back
 without too much trouble.

Well, there has been no discussion about removing version 2 support, so
it seems it is required.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PATCHES] COPY FROM performance improvements

2005-06-26 Thread Bruce Momjian
Luke Lonergan wrote:
 Bruce,
 
  Well, there has been no discussion about removing version 2 support, so
  it seems it is required.
 
 This should do it - see attached.

Those parentheses are still there:

  for (start = s; (*s != c) && (s < (start + len)) ; s++)

It should be:

  for (start = s; *s != c && s < start + len; s++)

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq