If it has to read a majority (or even a good percentage) of the rows in
question, a sequential scan is probably faster ... and, as Jim pointed out,
a temp table can often be a useful medium for getting speed in a load and
then allowing you to clean/alter the data for a final (easy) push.
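
Something along these lines, roughly (table and column names here are only
placeholders for whatever the real schema looks like):

  -- 1. stage the raw monthly file in a temp table with the same shape
  --    as the real table (no indexes needed on the staging table)
  create temp table property_load as select * from property where false;

  -- 2. bulk load the file; exact COPY options depend on the file format
  copy property_load from '/tmp/monthly_dump.csv';

  -- 3. clean/de-dup in the staging table, then push only the new rows
  --    across with an anti-join on whatever identifies a property
  insert into property (street, suburb, owner)
  select l.street, l.suburb, l.owner
  from property_load l
  left join property p
    on p.street = l.street and p.suburb = l.suburb
  where p.street is null;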

G
-----Original Message-----
From:   Rudi Starcevic [mailto:[EMAIL PROTECTED]]
Sent:   Tue 8/10/2004 8:33 PM
To:     [EMAIL PROTECTED]
Cc:     
Subject:        Re: [PERFORM] Bulk Insert and Index use
Hi Jim,

Thanks for your time.

 > If the bulk load has the possibility of duplicating data

Yes, each row will require either:

a) One SELECT + One INSERT
or
b) One SELECT + One UPDATE
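
Roughly, per row (column and value names here are just made up for
illustration):

  -- the existence check both cases start with
  select 1 from property
  where street = 'Some St' and suburb = 'Some Suburb';

  -- a) no match found: insert the new property
  insert into property (street, suburb, owner)
  values ('Some St', 'Some Suburb', 'J. Smith');

  -- b) match found: update the existing row instead
  update property
  set owner = 'J. Smith'
  where street = 'Some St' and suburb = 'Some Suburb';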

I did think of using more than one table, i.e. a temp table.
As each month's worth of data is added I expect to see
a change from lots of INSERTs to lots of UPDATEs.

Perhaps when the UPDATEs become more dominant it would
be best to start using indexes.

While INSERTs are more prevalent perhaps a seq. scan is better.

I guess, of all the options available, it boils down to which
is quicker for my data: an index or a sequential scan.
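
I suppose the way to settle it is to time both plans on a copy of the
data, something like this (again, names are only placeholders):

  -- see which plan the planner picks, and how long it really takes
  explain analyze
  select * from property
  where street = 'Some St' and suburb = 'Some Suburb';

  -- then discourage the index scan to time the sequential scan as well
  set enable_indexscan = off;
  explain analyze
  select * from property
  where street = 'Some St' and suburb = 'Some Suburb';
  set enable_indexscan = on;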

Many thanks.

Jim J wrote:

> If the bulk load has the possibility of duplicating data, then you need 
> to change methods.  Try bulk loading into a temp table,  index it like 
> the original, eliminate the dups and merge the tables.
> 
> It is also possible to do an insert from the temp table into the final 
> table like:
> insert into original (x, y, z)
>   select temp.x, temp.y, temp.z
>   from temp left join original on temp.street = original.street
>   where original.street is null;
> 
> Good Luck
> Jim
> 
> Rudi Starcevic wrote:
> 
>> Hi,
>>
>> I have a question on bulk checking, inserting into a table and
>> how best to use an index for performance.
>>
>> The data I have to work with is a monthly CD Rom csv data dump of
>> 300,000 property owners from one area/shire.
>>
>> So every CD has 300,000-odd lines, each line of which goes into the 
>> 'property' table.
>>
>> Beginning with the first CD each line should require one SELECT and
>> one INSERT as it will be the first property with this address.
>>
>> The SELECT uses fields like 'street' and 'suburb', to check for an 
>> existing property,
>> so I have built an index on those fields.
>>
>> My question is: does each INSERT rebuild the index on the 'street' and 
>> 'suburb' fields?
>> I believe it does, but I'm asking to be sure.
>>
>> If this is the case I guess performance will suffer when I have, say, 
>> 200,000
>> rows in the table.
>>
>> Would it be like:
>>
>> a) Use index to search on 'street' and 'suburb'
>> b) No result? Insert new record
>> c) Rebuild index on 'street' and 'suburb'
>>
>> for each row?
>> Would this mean that after 200,000 rows each INSERT will require
>> an index covering hundreds of thousands of rows to be rebuilt?
>>
>> So far I believe my only options are to use either an index
>> or a sequential scan and see which is faster.
>>
>> A minute for your thoughts and/or suggestions would be great.
>>
>> Thanks.
>> Regards,
>> Rudi.
>>


-- 


Regards,
Rudi.

Internet Media Productions
