Re: [PATCHES] COPY FROM performance improvements

Alon Goldshuv Tue, 19 Jul 2005 14:10:48 -0700

Hi Mark,

I improved the data *parsing* capabilities of COPY, and didn't touch the
data conversion or data insertion parts of the code. The size of the
improvement will vary widely depending on the ratio of time spent parsing
to time spent converting and inserting.

Therefore, the speed increase really depends on the nature of your data:

A 100GB file with
long data rows (lots of parsing),
a small number of columns (few attr conversions per row), and
fewer rows (fewer tuple insertions)

will show the best performance improvements.

However, a file of the same 100GB size with
short data rows (minimal parsing),
a large number of columns (many attr conversions per row),
AND/OR
more rows (more tuple insertions)

will show improvements, but not as significant.
In general I'd estimate a 40%-95% improvement in load speed for the 1st case
and 10%-40% for the 2nd, but that also depends on the hardware, disk speed,
etc. These numbers are for TEXT format. CSV may be faster too, but not by as
much as I specified here. BINARY will stay the same as before.
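
To make the dependence on the parsing ratio concrete, here is a rough
Amdahl's-law style sketch in Python. It is not taken from the patch: the
function name `overall_speedup`, the parse fractions, and the 3x parsing
speedup are all made-up illustrative values, not measurements. It only
assumes that parsing is the one stage that gets faster while conversion and
insertion cost the same as before.

```python
def overall_speedup(parse_fraction: float, parse_speedup: float) -> float:
    """Estimate the whole-load speedup when only the parsing stage gets faster.

    parse_fraction: share of the original load time spent parsing (0.0 to 1.0).
    parse_speedup:  how many times faster parsing alone becomes (> 0).
    Both inputs are hypothetical; neither comes from the patch or a benchmark.
    """
    if not 0.0 <= parse_fraction <= 1.0:
        raise ValueError("parse_fraction must be between 0 and 1")
    if parse_speedup <= 0.0:
        raise ValueError("parse_speedup must be positive")
    # Conversion and insertion time is unchanged; only the parsing share shrinks.
    new_relative_time = (1.0 - parse_fraction) + parse_fraction / parse_speedup
    return 1.0 / new_relative_time


if __name__ == "__main__":
    # Two hypothetical data shapes, both with the same assumed 3x faster parser.
    shapes = [
        ("long rows, few columns (parse-heavy)", 0.6),
        ("short rows, many columns (conversion-heavy)", 0.2),
    ]
    for label, fraction in shapes:
        gain = (overall_speedup(fraction, 3.0) - 1.0) * 100.0
        print(f"{label}: about {gain:.1f}% faster load")
```

The parse-heavy shape comes out well ahead of the conversion-heavy one under
the same parser speedup, which is the pattern described above. The real
numbers depend on your data and hardware.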

HTH
Alon.

On 7/19/05 12:54 PM, "Mark Wong" <[EMAIL PROTECTED]> wrote:

> On Thu, 14 Jul 2005 17:22:18 -0700
> "Alon Goldshuv" <[EMAIL PROTECTED]> wrote:
> 
>> I revisited my patch and removed the code duplications that were there, and
>> added support for CSV with buffered input, so CSV now runs faster too
>> (although it is not as optimized as the TEXT format parsing). So now
>> TEXT, CSV and BINARY are all parsed in CopyFrom(), like in the original file.
> 
> Hi Alon,
> 
> I'm curious, what kind of system are you testing this on?  I'm trying to
> load 100GB of data in our dbt3 workload on a 4-way itanium2.  I'm
> interested in the results you would expect.
> 
> Mark
> 
