Re: [PERFORM] bulk insert performance problem
On Mon, Apr 07, 2008 at 11:01:18PM -0400, Christian Bourque wrote:
> I have a performance problem with a script that does massive bulk
> inserts into 6 tables. When the script starts the performance is really
> good but it degrades minute after minute and takes almost a day to
> finish!

How do you do this bulk insert?

depesz

--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA. here's my CV!" :)
http://www.depesz.com/ - a blog for you (and my CV)

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] bulk insert performance problem
Christian Bourque wrote:
> Any ideas? Are there any other improvements I could make?

Are you using the COPY syntax in the import script, or individual INSERT
statements? Using COPY will always be *much* faster.

I believe COPY always appends to tables rather than replacing their
contents, so you can combine this technique with splitting the task into
multiple COPY statements. That has never been necessary in my case,
though: switching from INSERTs to a COPY statement always provided the
huge performance improvement I needed.

It's easy to confuse pg_dump -d with psql -d, and it's too bad they mean
very different things. For pg_dump, -d causes INSERT statements to be
generated instead of a COPY statement. This is a mistake I have made in
the past, because I expected it to work like psql -d, where -d means the
database name. I suppose the safe thing to do is to avoid using -d
altogether!

Mark
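To illustrate the difference Mark describes, here is a minimal sketch; the
table and file names are hypothetical:

```sql
-- One INSERT per row: each statement is parsed, planned and executed
-- separately.
INSERT INTO items (id, name) VALUES (1, 'first');
INSERT INTO items (id, name) VALUES (2, 'second');

-- One COPY for the whole file: a single statement streams every row.
-- (Server-side COPY reads a file on the server; from a client, psql's
-- \copy meta-command does the same with a local file.)
COPY items (id, name) FROM '/tmp/items.csv' WITH CSV;
```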
Re: [PERFORM] bulk insert performance problem
On Tue, 8 Apr 2008, Mark Stosberg wrote:
>> Any ideas? Are there any other improvements I could make?
>
> Are you using the COPY syntax in the import script, or individual
> INSERT statements? Using COPY will always be *much* faster.

PostgreSQL (the latest versions, at least) has an optimisation when you
create a table in the same transaction in which you load data into it. So
if you have a database dump, load it using psql -1, which wraps the entire
operation in a single transaction. Of course, a COPY dump will load a lot
faster than an INSERT dump.

Matthew

--
What goes up must come down. Ask any system administrator.
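The same-transaction optimisation Matthew mentions can also be used
directly in a hand-written load script; a sketch, with hypothetical table
and file names:

```sql
BEGIN;
-- Because the table is created in the same transaction that fills it,
-- recent PostgreSQL versions can avoid WAL-logging the loaded rows
-- (when WAL archiving is not enabled).
CREATE TABLE items (id integer, name text);
COPY items FROM '/tmp/items.csv' WITH CSV;
COMMIT;
```

For a dump file, `psql -1 -f dump.sql dbname` gets the same effect by
wrapping the whole restore in one transaction.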
Re: [PERFORM] bulk insert performance problem
> I have a performance problem with a script that does massive bulk
> inserts into 6 tables. When the script starts the performance is really
> good but it degrades minute after minute and takes almost a day to
> finish!

Looks like foreign key checks are slowing you down.

- Batch the INSERTs into transactions (a few thousand per transaction).
- Run ANALYZE once in a while so the FK checks use indexes.
- Are there any DELETEs in your script which might hit nonindexed
  REFERENCES... columns and cascade?
- Do you really need to check the FKs on the fly while inserting? That
  is, do you handle FK violations? Or perhaps your data is already
  consistent? In that case, load the data without any constraints (and
  without any indexes), and add the indexes and foreign key constraints
  after the loading is finished.
- Use COPY instead of INSERT.

If you use your script to process data, perhaps you could import the raw,
unprocessed data into a table (with COPY) and process it with SQL. This is
usually much faster than doing a zillion inserts.
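The "load first, validate once" suggestion above might look like this; all
table, index, and constraint names here are made up for illustration:

```sql
-- Drop the constraint and index before the load...
ALTER TABLE orders DROP CONSTRAINT orders_customer_fk;
DROP INDEX orders_customer_idx;

COPY orders FROM '/tmp/orders.csv' WITH CSV;

-- ...then rebuild. The foreign key is checked once, in bulk, with an
-- index available, instead of once per inserted row.
CREATE INDEX orders_customer_idx ON orders (customer_id);
ALTER TABLE orders ADD CONSTRAINT orders_customer_fk
  FOREIGN KEY (customer_id) REFERENCES customers (id);
```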
[PERFORM] bulk insert performance problem
Hi,

I have a performance problem with a script that does massive bulk inserts
into 6 tables. When the script starts the performance is really good but
it degrades minute after minute and takes almost a day to finish!

I have tried almost everything suggested on this list: changed our
external RAID array from RAID 5 to RAID 10, tweaked postgresql.conf to the
best of my knowledge, moved pg_xlog to a different array, and dropped the
tables before running the script. But the performance gain was negligible
even after all these changes...

IMHO the hardware that we use should be up to the task: Dell PowerEdge
6850, 4 x 3.0GHz dual-core Xeon, 8GB RAM, 3 x 300GB SAS 10K in RAID 5 for
/ and 6 x 300GB SAS 10K in RAID 10 (MD1000) for PG data; the data
filesystem is ext3 mounted with noatime and data=writeback. Running on
openSUSE 10.3 with PostgreSQL 8.2.7. The server is dedicated to
PostgreSQL...

We tested the same script and schema with Oracle 10g on the same machine
and it took only 2.5 hours to complete! What I don't understand is that
with Oracle the performance seems consistent throughout, but with PG it
deteriorates over time...

Any ideas? Are there any other improvements I could make?

Thanks

Christian
Re: [PERFORM] bulk insert performance problem
Christian Bourque wrote:
> Hi,
>
> I have a performance problem with a script that does massive bulk
> inserts into 6 tables. When the script starts the performance is really
> good but it degrades minute after minute and takes almost a day to
> finish!

Would I be correct in guessing that there are foreign key relationships
between those tables, and that significant numbers of indexes are in use?

The foreign key checking costs will go up as the tables grow, and AFAIK
the indexes get a bit more expensive to maintain too.

If possible, you should drop your foreign key relationships and your
indexes, insert your data, then re-create the indexes and foreign keys.
The foreign keys will be rechecked when you recreate them, and it's
*vastly* faster to do it that way. Similarly, building an index from
scratch is quite a bit faster than progressively adding to it. Of course,
dropping the indexes is only useful if you aren't querying the tables as
you build them.

Also, if you're loading data using stored procedures, you should avoid
the use of exception blocks. I had some major problems with my bulk data
conversion code due to overuse of exception blocks, which created large
numbers of subtransactions behind the scenes and slowed everything to a
crawl.

--
Craig Ringer
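To make the subtransaction cost concrete, here is a PL/pgSQL sketch of the
pattern Craig warns about; the function and table names are hypothetical:

```sql
CREATE OR REPLACE FUNCTION load_row(p_id integer, p_name text)
RETURNS void AS $$
BEGIN
  BEGIN  -- this inner BEGIN...EXCEPTION block starts a subtransaction
    INSERT INTO items (id, name) VALUES (p_id, p_name);
  EXCEPTION WHEN unique_violation THEN
    NULL;  -- silently skip duplicate rows
  END;
END;
$$ LANGUAGE plpgsql;
```

Called a few million times during a bulk load, that inner block means a
few million subtransactions. If duplicates can be removed from the input
beforehand, dropping the EXCEPTION block avoids the overhead entirely.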
Re: [PERFORM] bulk insert performance problem
Craig Ringer wrote:
> If possible you should drop your foreign key relationships and your
> indexes, insert your data, then re-create the indexes and foreign keys.
> [...] Of course, dropping the indexes is only useful if you aren't
> querying the tables as you build them.

If you are, add ANALYZE commands throughout the import, e.g. every 10,000
rows. Then your checks should be a bit faster.

The other suggestion would be to do block commits:

begin;
do stuff for 5000 rows;
commit;

Repeat until finished.

--
Postgresql php tutorials
http://www.designmagick.com/
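Combined, the two suggestions might look like this in the import script;
the batch size and table name are illustrative:

```sql
BEGIN;
-- ...5000 INSERTs...
COMMIT;

ANALYZE items;  -- refresh planner stats so FK lookups keep using indexes

BEGIN;
-- ...next 5000 INSERTs...
COMMIT;
```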
Re: [PERFORM] bulk insert performance problem
I use 1 row, with a big blob.

2008-04-08
bitaoxiao

From: Chris
Sent: 2008-04-08 11:35:57
To: Christian Bourque
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] bulk insert performance problem