I believe you should definitely not be doing this sort of thing for a bulk load:

        o_orderkey INTEGER,
        o_custkey INTEGER,
        o_orderstatus CHAR(1),
        o_totalprice REAL,
        o_orderdate DATE,
        o_orderpriority CHAR(15),
        o_clerk CHAR(15),
        o_shippriority INTEGER,
        o_comment VARCHAR(79),
        PRIMARY KEY (o_orderkey))

Create the table with no constraints, load the data, then set up primary keys 
and whatever other constraints you want using ALTER TABLE. Last time I did a 
load like this (albeit 2 orders of magnitude smaller) I saw a 50% speedup from 
deferring constraint creation.
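As a sketch, using the orders table from the snippet above (the file path is a placeholder; adjust delimiter and path to your setup), the deferred-constraint load order would look something like:

```sql
-- 1. Create the table with no constraints.
CREATE TABLE orders (
    o_orderkey      INTEGER,
    o_custkey       INTEGER,
    o_orderstatus   CHAR(1),
    o_totalprice    REAL,
    o_orderdate     DATE,
    o_orderpriority CHAR(15),
    o_clerk         CHAR(15),
    o_shippriority  INTEGER,
    o_comment       VARCHAR(79)
);

-- 2. Bulk-load the data ('|' is the usual TPC-H field delimiter).
COPY orders FROM '/path/to/orders.tbl' WITH DELIMITER '|';

-- 3. Add the primary key (and any other constraints/indexes) afterwards.
ALTER TABLE orders ADD PRIMARY KEY (o_orderkey);
```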



Mark Wong wrote:

Hi Alon,

Yeah, that helps.  I just need to break up my scripts a little to just
load the data and not build indexes.

Is the following information good enough to give a guess about the data
I'm loading, if you don't mind? ;)  Here's a link to my script to create

File sizes:
-rw-r--r--  1 markw 50 2.3G Jul  8 15:03 customer.tbl
-rw-r--r--  1 markw 50  74G Jul  8 15:03 lineitem.tbl
-rw-r--r--  1 markw 50 2.1K Jul  8 15:03 nation.tbl
-rw-r--r--  1 markw 50  17G Jul  8 15:03 orders.tbl
-rw-r--r--  1 markw 50 2.3G Jul  8 15:03 part.tbl
-rw-r--r--  1 markw 50  12G Jul  8 15:03 partsupp.tbl
-rw-r--r--  1 markw 50  391 Jul  8 15:03 region.tbl
-rw-r--r--  1 markw 50 136M Jul  8 15:03 supplier.tbl

Number of rows:
# wc -l *.tbl
   15000000 customer.tbl
  600037902 lineitem.tbl
         25 nation.tbl
  150000000 orders.tbl
   20000000 part.tbl
   80000000 partsupp.tbl
          5 region.tbl
    1000000 supplier.tbl


On Tue, 19 Jul 2005 14:05:56 -0700
"Alon Goldshuv" <[EMAIL PROTECTED]> wrote:

Hi Mark,

I improved the data *parsing* capabilities of COPY, and didn't touch the
data conversion or data insertion parts of the code. The parsing improvement
will vary largely depending on the ratio of parsing to converting; therefore,
the speed increase really depends on the nature of your data:

A 100GB file with:
- long data rows (lots of parsing)
- a small number of columns (few attribute conversions per row)
- fewer rows (fewer tuple insertions)

will show the best performance improvements.

However, the same 100GB file size with:
- short data rows (minimal parsing)
- a large number of columns (many attribute conversions per row)
- more rows (more tuple insertions)

will show improvements, but not as significant.
In general I'd estimate a 40%-95% improvement in load speed for the 1st case
and 10%-40% for the 2nd, but that also depends on the hardware, disk speed,
etc. This is for TEXT format. CSV may also be faster, but not by as much as
specified here. BINARY will stay the same as before.
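For reference, the three input formats discussed can be exercised with COPY statements along these lines (file paths are placeholders; '|' matches the .tbl files listed earlier):

```sql
COPY lineitem FROM '/path/to/lineitem.tbl' WITH DELIMITER '|';  -- TEXT (default)
COPY lineitem FROM '/path/to/lineitem.csv' WITH CSV;            -- CSV
COPY lineitem FROM '/path/to/lineitem.bin' WITH BINARY;         -- BINARY
```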


On 7/19/05 12:54 PM, "Mark Wong" <[EMAIL PROTECTED]> wrote:

On Thu, 14 Jul 2005 17:22:18 -0700
"Alon Goldshuv" <[EMAIL PROTECTED]> wrote:

I revisited my patch and removed the code duplications that were there, and
added support for CSV with buffered input, so CSV now runs faster too
(although it is not as optimized as the TEXT format parsing). So now
TEXT, CSV and BINARY are all parsed in CopyFrom(), like in the original file.

Hi Alon,

I'm curious, what kind of system are you testing this on?  I'm trying to
load 100GB of data in our dbt3 workload on a 4-way itanium2.  I'm
interested in the results you would expect.


---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

