Whoopsies, yeah good point about the PRIMARY KEY.  I'll fix that.

Mark

On Tue, 19 Jul 2005 18:17:52 -0400
Andrew Dunstan <[EMAIL PROTECTED]> wrote:

> Mark,
> 
> You should definitely not be doing this sort of thing, I believe:
> 
> CREATE TABLE orders (
>       o_orderkey INTEGER,
>       o_custkey INTEGER,
>       o_orderstatus CHAR(1),
>       o_totalprice REAL,
>       o_orderDATE DATE,
>       o_orderpriority CHAR(15),
>       o_clerk CHAR(15),
>       o_shippriority INTEGER,
>       o_comment VARCHAR(79),
>       PRIMARY KEY (o_orderkey))
> 
> Create the table with no constraints, load the data, then set up primary keys 
> and whatever other constraints you want using ALTER TABLE. Last time I did a 
> load like this (albeit 2 orders of magnitude smaller) I saw a 50% speedup 
> from deferring constarint creation.
> 
> 
> cheers
> 
> andrew
> 
> 
> 
> Mark Wong wrote:
> 
> >Hi Alon,
> >
> >Yeah, that helps.  I just need to break up my scripts a little to just
> >load the data and not build indexes.
> >
> >Is the following information good enough to give a guess about the data
> >I'm loading, if you don't mind? ;)  Here's a link to my script to create
> >tables:
> >http://developer.osdl.org/markw/mt/getfile.py?id=eaf16b7831588729780645b2bb44f7f23437e432&path=scripts/pgsql/create_tables.sh.in
> >
> >File sizes:
> >-rw-r--r--  1 markw 50 2.3G Jul  8 15:03 customer.tbl
> >-rw-r--r--  1 markw 50  74G Jul  8 15:03 lineitem.tbl
> >-rw-r--r--  1 markw 50 2.1K Jul  8 15:03 nation.tbl
> >-rw-r--r--  1 markw 50  17G Jul  8 15:03 orders.tbl
> >-rw-r--r--  1 markw 50 2.3G Jul  8 15:03 part.tbl
> >-rw-r--r--  1 markw 50  12G Jul  8 15:03 partsupp.tbl
> >-rw-r--r--  1 markw 50  391 Jul  8 15:03 region.tbl
> >-rw-r--r--  1 markw 50 136M Jul  8 15:03 supplier.tbl
> >
> >Number of rows:
> ># wc -l *.tbl
> >    15000000 customer.tbl
> >   600037902 lineitem.tbl
> >          25 nation.tbl
> >   150000000 orders.tbl
> >    20000000 part.tbl
> >    80000000 partsupp.tbl
> >           5 region.tbl
> >     1000000 supplier.tbl
> >
> >Thanks,
> >Mark
> >
> >On Tue, 19 Jul 2005 14:05:56 -0700
> >"Alon Goldshuv" <[EMAIL PROTECTED]> wrote:
> >
> >  
> >
> >>Hi Mark,
> >>
> >>I improved the data *parsing* capabilities of COPY, and didn't touch the
> >>data conversion or data insertion parts of the code. The parsing improvement
> >>will vary largely depending on the ratio of parsing -to- converting and
> >>inserting. 
> >>
> >>Therefore, the speed increase really depends on the nature of your data:
> >>
> >>100GB file with
> >>long data rows (lots of parsing)
> >>Small number of columns (small number of attr conversions per row)
> >>less rows (less tuple insertions)
> >>
> >>Will show the best performance improvements.
> >>
> >>However, same file size 100GB with
> >>Short data rows (minimal parsing)
> >>large number of columns (large number of attr conversions per row)
> >>AND/OR
> >>more rows (more tuple insertions)
> >>
> >>Will show improvements but not as significant.
> >>In general I'll estimate 40%-95% improvement in load speed for the 1st case
> >>and 10%-40% for the 2nd. But that also depends on the hardware, disk speed
> >>etc... This is for TEXT format. As for CSV, it may be faster but not as much
> >>as I specified here. BINARY will stay the same as before.
> >>
> >>HTH
> >>Alon.
> >>
> >>
> >>
> >>
> >>
> >>
> >>On 7/19/05 12:54 PM, "Mark Wong" <[EMAIL PROTECTED]> wrote:
> >>
> >>    
> >>
> >>>On Thu, 14 Jul 2005 17:22:18 -0700
> >>>"Alon Goldshuv" <[EMAIL PROTECTED]> wrote:
> >>>
> >>>      
> >>>
> >>>>I revisited my patch and removed the code duplications that were there, 
> >>>>and
> >>>>added support for CSV with buffered input, so CSV now runs faster too
> >>>>(although it is not as optimized as the TEXT format parsing). So now
> >>>>TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original 
> >>>>file.
> >>>>        
> >>>>
> >>>Hi Alon,
> >>>
> >>>I'm curious, what kind of system are you testing this on?  I'm trying to
> >>>load 100GB of data in our dbt3 workload on a 4-way itanium2.  I'm
> >>>interested in the results you would expect.
> >>>
> >>>Mark
> >>>
> >>>      
> >>>
> >

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Reply via email to