Tom,

Thanks for finding the bugs and reworking things.

> I had some difficulty in generating test cases that weren't largely
> I/O-bound, but AFAICT the patch as applied is about the same speed
> as what you submitted.

You achieve the important objective of knocking the parsing stage down a
lot, but your parsing code is actually about 20% slower than Alon's.

Before your patch:
  Time: 14205.606 ms

With your patch:
  Time: 10565.374 ms

With Alon's patch:
  Time: 10289.845 ms

The parsing part of the code in your version is slower, but as a percentage
of the total it's hidden. The loss of 0.3 seconds on 143MB means:

- If parsing takes a total of 0.9 seconds, the parsing rate is 160MB/s
(143/0.9)

- If we add another 0.3 seconds to parsing to bring it to 1.2, then the
parsing rate becomes 120MB/s

When we improve the next stages of the processing (attribute conversion,
write-to disk), this will stand out a lot more.  Our objective is to get the
COPY rate *much* faster than the current poky rate of 14MB/s (after this
patch).

- Luke

On 8/6/05 2:04 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

> "Alon Goldshuv" <[EMAIL PROTECTED]> writes:
>> New patch attached. It includes very minor changes. These are changes that
>> were committed to CVS 3 weeks ago (copy.c 1.247) which I missed in the
>> previous patch.
> 
> I've applied this with (rather extensive) revisions.  I didn't like what
> you had done with the control structure --- loading the input buffer
> only at the outermost loop level was a bad design choice IMHO.  You had
> sprinkled the code with an unreasonable number of special cases in order
> to try to cope with the effects of that mistake, but there were lots
> of problems still left.  Some of the bugs I noticed:
> 
> * Broke old-protocol COPY, since that has no provision for stopping at
> the EOF marker except by parsing the data carefully to start with.  The
> backend would just hang up unless the total data size chanced to be a
> multiple of 64K.
> 
> * Subtle change in interpretation of \. EOF marker (the existing code
> will recognize it even when not at start of line).
> 
> * Seems to have thrown away detection of newline format discrepancies.
> 
> * Fails for zero-column tables.
> 
> * Broke display of column values during error context callback (would
> always show the first column contents no matter which one is being
> complained of).
> 
> * DetectLineEnd mistakenly assumes CR mode if very last character of first
> bufferload is CR; need to reserve judgment until next char is available.
> 
> * DetectLineEnd fails to account for backslashed control characters,
> so it will e.g. accept \ followed by \n as determining the newline
> style.
> 
> * Fails to apply encoding conversion if first line exceeds copy buf
> size, because when DetectLineEnd fails the quick-exit path doesn't do
> it.
> 
> * There seem to be several bugs associated with the fact that input_buf[]
> always has 64K of data in it even when EOF has been reached on the
> input.  One example:
> echo -n 123 >zzz1
> psql> create temp table t1(f1 text);
> psql> copy t1 from '/home/tgl/zzz1';
> psql> select * from t1;
> hmm ... where'd that 64K of whitespace come from?
> 
> I rewrote the patch in a way that retained most of the speedups without
> changing the basic control structure (except for replacing multiple
> CopyReadAttribute calls with one CopyReadAttributes call per line).
> 
> I had some difficulty in generating test cases that weren't largely
> I/O-bound, but AFAICT the patch as applied is about the same speed
> as what you submitted.
> 
> regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
> 



---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Reply via email to