Re: [PERFORM] bulk insert performance problem
On Mon, Apr 07, 2008 at 11:01:18PM -0400, Christian Bourque wrote:
> I have a performance problem with a script that does massive bulk
> inserts into 6 tables. When the script starts the performance is really
> good but it degrades minute after minute and takes almost a day to
> finish!

How do you do this bulk insert?

depesz

--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA. here's my CV!" :)
http://www.depesz.com/ - a blog for you (and my CV)

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] bulk insert performance problem
Christian Bourque wrote:
> Any ideas? Are there any other improvements I could make?

Are you using the COPY syntax in the import script, or individual INSERT
statements? Using COPY will always be *much* faster.

I believe COPY always appends to tables rather than replacing their
contents, so you can combine this technique with splitting the task into
multiple COPY statements. That has never been necessary in my case,
though: switching from INSERTs to a COPY statement always provided the
huge performance improvement I needed.

It's easy to confuse pg_dump -d with psql -d, and it's too bad they mean
very different things. For pg_dump, -d causes INSERT statements to be
generated instead of a COPY statement. This is a mistake I have made in
the past, because I expected it to work like psql -d, where -d means the
database name. I suppose the safe thing to do is to avoid using -d
altogether!

Mark
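To illustrate the difference Mark describes, here is a minimal sketch; the
table and file names are hypothetical:

```sql
-- One INSERT per row: each statement is parsed, planned and executed
-- separately.
INSERT INTO items (id, name) VALUES (1, 'first');
INSERT INTO items (id, name) VALUES (2, 'second');

-- One COPY for the whole file: a single statement streams every row.
-- (Server-side COPY reads a file on the server; from a client, psql's
-- \copy meta-command does the same with a local file.)
COPY items (id, name) FROM '/tmp/items.csv' WITH CSV;
```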
Re: [PERFORM] bulk insert performance problem
On Tue, 8 Apr 2008, Mark Stosberg wrote:
>> Any ideas? Are there any other improvements I could make?
>
> Are you using the COPY syntax in the import script, or individual
> INSERT statements? Using COPY will always be *much* faster.

PostgreSQL (the latest versions, at least) has an optimisation when you
create a table in the same transaction in which you load data into it. So
if you have a database dump, load it using psql -1, which wraps the entire
operation in a single transaction. Of course, a COPY dump will load a lot
faster than an INSERT dump.

Matthew

--
What goes up must come down. Ask any system administrator.
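The same-transaction optimisation Matthew mentions can also be used
directly in a hand-written load script; a sketch, with hypothetical table
and file names:

```sql
BEGIN;
-- Because the table is created in the same transaction that fills it,
-- recent PostgreSQL versions can avoid WAL-logging the loaded rows
-- (when WAL archiving is not enabled).
CREATE TABLE items (id integer, name text);
COPY items FROM '/tmp/items.csv' WITH CSV;
COMMIT;
```

For a dump file, `psql -1 -f dump.sql dbname` gets the same effect by
wrapping the whole restore in one transaction.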
Re: [PERFORM] bulk insert performance problem
> I have a performance problem with a script that does massive bulk
> inserts into 6 tables. When the script starts the performance is really
> good but it degrades minute after minute and takes almost a day to
> finish!

Looks like foreign key checks are slowing you down.

- Batch the INSERTs into transactions (a few thousand per transaction).
- Run ANALYZE once in a while so the FK checks use indexes.
- Are there any DELETEs in your script which might hit nonindexed
  REFERENCES... columns and cascade?
- Do you really need to check the FKs on the fly while inserting? That
  is, do you handle FK violations? Or perhaps your data is already
  consistent? In that case, load the data without any constraints (and
  without any indexes), and add the indexes and foreign key constraints
  after the loading is finished.
- Use COPY instead of INSERT.

If you use your script to process data, perhaps you could import the raw,
unprocessed data into a table (with COPY) and process it with SQL. This is
usually much faster than doing a zillion inserts.
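The "load first, validate once" suggestion above might look like this; all
table, index, and constraint names here are made up for illustration:

```sql
-- Drop the constraint and index before the load...
ALTER TABLE orders DROP CONSTRAINT orders_customer_fk;
DROP INDEX orders_customer_idx;

COPY orders FROM '/tmp/orders.csv' WITH CSV;

-- ...then rebuild. The foreign key is checked once, in bulk, with an
-- index available, instead of once per inserted row.
CREATE INDEX orders_customer_idx ON orders (customer_id);
ALTER TABLE orders ADD CONSTRAINT orders_customer_fk
  FOREIGN KEY (customer_id) REFERENCES customers (id);
```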
[PERFORM] bulk insert performance problem
Hi,

I have a performance problem with a script that does massive bulk inserts
into 6 tables. When the script starts the performance is really good but
it degrades minute after minute and takes almost a day to finish!

I have tried almost everything suggested on this list: changed our
external RAID array from RAID 5 to RAID 10, tweaked postgresql.conf to the
best of my knowledge, moved pg_xlog to a different array, and dropped the
tables before running the script. But the performance gain was negligible
even after all these changes...

IMHO the hardware that we use should be up to the task: Dell PowerEdge
6850, 4 x 3.0GHz dual-core Xeon, 8GB RAM, 3 x 300GB SAS 10K in RAID 5 for
/ and 6 x 300GB SAS 10K in RAID 10 (MD1000) for PG data; the data
filesystem is ext3 mounted with noatime and data=writeback. Running on
openSUSE 10.3 with PostgreSQL 8.2.7. The server is dedicated to
PostgreSQL...

We tested the same script and schema with Oracle 10g on the same machine
and it took only 2.5 hours to complete! What I don't understand is that
with Oracle the performance seems consistent throughout, but with PG it
deteriorates over time...

Any ideas? Are there any other improvements I could make?

Thanks

Christian
Re: [PERFORM] bulk insert performance problem
Christian Bourque wrote:
> Hi,
>
> I have a performance problem with a script that does massive bulk
> inserts into 6 tables. When the script starts the performance is really
> good but it degrades minute after minute and takes almost a day to
> finish!

Would I be correct in guessing that there are foreign key relationships
between those tables, and that significant numbers of indexes are in use?

The foreign key checking costs will go up as the tables grow, and AFAIK
the indexes get a bit more expensive to maintain too.

If possible, you should drop your foreign key relationships and your
indexes, insert your data, then re-create the indexes and foreign keys.
The foreign keys will be rechecked when you recreate them, and it's
*vastly* faster to do it that way. Similarly, building an index from
scratch is quite a bit faster than progressively adding to it. Of course,
dropping the indexes is only useful if you aren't querying the tables as
you build them.

Also, if you're loading data using stored procedures, you should avoid
the use of exception blocks. I had some major problems with my bulk data
conversion code due to overuse of exception blocks, which created large
numbers of subtransactions behind the scenes and slowed everything to a
crawl.

--
Craig Ringer
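To make the subtransaction cost concrete, here is a PL/pgSQL sketch of the
pattern Craig warns about; the function and table names are hypothetical:

```sql
CREATE OR REPLACE FUNCTION load_row(p_id integer, p_name text)
RETURNS void AS $$
BEGIN
  BEGIN  -- this inner BEGIN...EXCEPTION block starts a subtransaction
    INSERT INTO items (id, name) VALUES (p_id, p_name);
  EXCEPTION WHEN unique_violation THEN
    NULL;  -- silently skip duplicate rows
  END;
END;
$$ LANGUAGE plpgsql;
```

Called a few million times during a bulk load, that inner block means a
few million subtransactions. If duplicates can be removed from the input
beforehand, dropping the EXCEPTION block avoids the overhead entirely.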
Re: [PERFORM] bulk insert performance problem
Craig Ringer wrote:
> If possible you should drop your foreign key relationships and your
> indexes, insert your data, then re-create the indexes and foreign keys.
> [...] Of course, dropping the indexes is only useful if you aren't
> querying the tables as you build them.

If you are, add ANALYZE commands throughout the import, e.g. every 10,000
rows. Then your checks should be a bit faster.

The other suggestion would be to do block commits:

begin;
do stuff for 5000 rows;
commit;

Repeat until finished.

--
Postgresql php tutorials
http://www.designmagick.com/
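Combined, the two suggestions might look like this in the import script;
the batch size and table name are illustrative:

```sql
BEGIN;
-- ...5000 INSERTs...
COMMIT;

ANALYZE items;  -- refresh planner stats so FK lookups keep using indexes

BEGIN;
-- ...next 5000 INSERTs...
COMMIT;
```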
Re: [PERFORM] bulk insert performance problem
I use 1 row, with a big blob.

2008-04-08
bitaoxiao

From: Chris
Sent: 2008-04-08 11:35:57
To: Christian Bourque
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] bulk insert performance problem