[PERFORM] How to import large amounts of data?
Hi all,

What is the best way to import data into tables? I have to import 90.000 rows into a table, and doing it as individual INSERTs takes ages. Would COPY be faster? Is there any other alternative to INSERT/COPY?

Cheers!
Re: [PERFORM] How to import large amounts of data?
On Thu, Dec 29, 2005 at 10:48:26AM +0100, Arnau wrote:
> What is the best way to import data into tables? I have to import
> 90.000 rows into a table, and doing it as individual INSERTs takes
> ages. Would COPY be faster? Is there any other alternative to
> INSERT/COPY?

There are multiple reasons why your INSERTs might be slow:

- Are you using a separate transaction for each INSERT instead of batching them into one or a few transactions? (Usually, the per-transaction overhead is a lot higher than the per-row insertion cost.)
- Do you have a foreign key without a matching index in the other table? (In newer versions of PostgreSQL, EXPLAIN ANALYZE can help with this; do a single INSERT and see where it spends its time. Older versions won't show such things, though.)
- Do you have an insertion trigger taking time? (Ditto wrt. EXPLAIN ANALYZE.)

COPY will be faster than INSERT regardless, though (for more than a few rows, at least).

/* Steinar */
-- Homepage: http://www.sesse.net/
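[Editor's note: a minimal sketch of the diagnosis Steinar describes. The tables, columns, and values here are hypothetical, not from the thread, and the per-trigger timing line in the output only appears in newer releases (8.1 and later, if memory serves):

    -- Hypothetical schema: every insert into "orders" fires a
    -- foreign-key check against "customers".
    CREATE TABLE customers (customer_id integer PRIMARY KEY);
    CREATE TABLE orders (
        order_id    serial  PRIMARY KEY,
        customer_id integer REFERENCES customers (customer_id),
        amount      numeric
    );
    INSERT INTO customers VALUES (1);

    -- Time a single insert; a slow foreign-key check or user trigger
    -- shows up in the trigger statistics. Roll back to discard the
    -- test row.
    BEGIN;
    EXPLAIN ANALYZE
    INSERT INTO orders (customer_id, amount) VALUES (1, 9.99);
    ROLLBACK;

If trigger time dominates the total, the constraint or trigger is the problem rather than the insert itself.]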
Re: [PERFORM] How to import large amounts of data?
On Thu, 29 Dec 2005, Arnau wrote:
> What is the best way to import data into tables? I have to import
> 90.000 rows into a table, and doing it as individual INSERTs takes
> ages. Would COPY be faster? Is there any other alternative to
> INSERT/COPY?

Wrap the INSERTs inside a BEGIN/COMMIT block and it will be a lot faster. COPY is faster still, but for just 90.000 rows I wouldn't bother.

--
/Dennis Björklund
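[Editor's note: a sketch of both suggestions, reusing the hypothetical orders table from the example above; the file path and data values are illustrative:

    -- One transaction for the whole batch: one commit (and one WAL
    -- flush) for all rows instead of one per row.
    BEGIN;
    INSERT INTO orders (customer_id, amount) VALUES (1, 9.99);
    INSERT INTO orders (customer_id, amount) VALUES (1, 19.99);
    -- ... thousands more rows ...
    COMMIT;

    -- COPY loads the whole file in one statement. The default format
    -- is tab-separated text, one row per line. The server-side form
    -- reads a file on the database host:
    COPY orders (customer_id, amount) FROM '/tmp/orders.dat';
    -- From psql, \copy reads a client-side file instead:
    -- \copy orders (customer_id, amount) from 'orders.dat'

Without the explicit BEGIN/COMMIT, each INSERT runs in its own transaction and pays the full commit cost by itself, which is usually where the time goes.]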
Re: [PERFORM] How to import large amounts of data?
On Thursday 29 December 2005 10:48, Arnau wrote:
> What is the best way to import data into tables? I have to import
> 90.000 rows into a table, and doing it as individual INSERTs takes
> ages. Would COPY be faster? Is there any other alternative to
> INSERT/COPY?

I am doing imports twice that size daily, and found the following method most efficient (other than using COPY):

- Use a plpgsql function to do the actual INSERT (or UPDATE/INSERT if needed).
- Inside a transaction, execute SELECT statements with the maximum possible number of insert-function calls in one go. This minimizes the number of round trips between the client and the server.

Teemu
Re: [PERFORM] How to import large amounts of data?
> I am doing imports twice that size daily, and found the following
> method most efficient (other than using COPY):
>
> - Use a plpgsql function to do the actual INSERT (or UPDATE/INSERT if
>   needed).
> - Inside a transaction, execute SELECT statements with the maximum
>   possible number of insert-function calls in one go. This minimizes
>   the number of round trips between the client and the server.

Thanks Teemu! Could you paste an example of one of those functions? ;-) An example of those SELECTs would also be great; I'm not sure I have completely understood what you mean.

--
Arnau
Re: [PERFORM] How to import large amounts of data?
On Thursday 29 December 2005 17:19, Arnau wrote:
> > - Use a plpgsql function to do the actual INSERT (or UPDATE/INSERT if
> >   needed).
> > - Inside a transaction, execute SELECT statements with the maximum
> >   possible number of insert-function calls in one go. This minimizes
> >   the number of round trips between the client and the server.
>
> Thanks Teemu! Could you paste an example of one of those functions? ;-)
> An example of those SELECTs would also be great; I'm not sure I have
> completely understood what you mean.

An insert function like:

    CREATE OR REPLACE FUNCTION insert_values(the_value1 numeric, the_value2 numeric)
    RETURNS void
    LANGUAGE plpgsql VOLATILE
    AS $$
    BEGIN
        -- VALUES is a reserved word in SQL, so a table named "values"
        -- has to be double-quoted.
        INSERT INTO "values" (value1, value2)
        VALUES (the_value1, the_value2);
        RETURN;
    END;
    $$;

Then execute queries like

    SELECT insert_values(1,2), insert_values(2,3), insert_values(3,4);

with as many insert_values calls per statement as possible. I think the transaction (BEGIN/COMMIT) adds little further benefit once you have at least hundreds of calls in one SELECT.

Teemu
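[Editor's note: to make Teemu's example self-contained, it also needs the target table, whose definition the original message never showed; the one below is assumed:

    -- Hypothetical target table for insert_values(); quoted because
    -- VALUES is a reserved word.
    CREATE TABLE "values" (
        value1 numeric,
        value2 numeric
    );

The speedup comes from amortizing client/server round trips: each insert_values() call still runs one INSERT on the server, but hundreds of them travel in a single query.]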
[PERFORM] Process executing COPY opens and reads every table on the system
I have an instance of PG 7.4 where I would really like to execute some schema changes, but every schema change is blocked waiting for a process doing a COPY. That query is:

COPY drill.trades (manager, sec_id, ticker, bridge_tkr, date, type, short, quantity, price, prin, net_money, factor) TO stdout;

So it's only involved with a single table in a single schema. Unfortunately, what this process is doing is opening and reading every table in the database:

# strace -e open,close -p 29859
Process 29859 attached - interrupt to quit
open("/var/lib/postgres/data/base/7932340/2442094542", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/358185104.16", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2023517557.1", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2023517557.3", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2023517557.5", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2023517557.5", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2023517557.3", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2023517557.6", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2023517557.9", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2023517557.3", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/358185104.16", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/358185104.16", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/358185104.16", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2414561511", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2426495316", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2426495316", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2414561511", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2426495316", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2426495316", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2414561511", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/358185104.16", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2429205386", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2429205433", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/358185104.16", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2426495316", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2414561511", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/358185104.16", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/358185104.16", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2429205441", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/358185104.16", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2414561511", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2426495316", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2023517557.3", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2023517557.10", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/358185104.16", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/358185104.16", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2023517557.9", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/2298808676/2298808939.10", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/358185104.15", O_RDWR) = 49
close(49) = 0
open("/var/lib/postgres/data/base/7932340/2414561511", O_RDWR) = 49
close(49) = 0
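[Editor's note on reading that output: the numeric path components are the database OID and each relation's relfilenode, so, assuming one can connect to the affected database, a standard catalog query maps them back to names:

    -- Which relation is stored in file 358185104?
    SELECT relname, relkind
    FROM pg_class
    WHERE relfilenode = 358185104;

Suffixes like .16 are 1GB segment numbers, so 358185104.16 belongs to a relation more than 16GB in size.]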
[PERFORM] Invulnerable VACUUM process thrashing everything
A few weeks ago, the autovacuum on my instance of pg 7.4 unilaterally decided to VACUUM a table which has not been updated in over a year and is more than one terabyte on disk. Because of the very high transaction load on this database, this VACUUM has been ruining performance for almost a month. Unfortunately it seems invulnerable to killing by signals:

# ps ax | grep VACUUM
15308 ?        D    588:00 postgres: postgres skunk [local] VACUUM
# kill -HUP 15308
# ps ax | grep VACUUM
15308 ?        D    588:00 postgres: postgres skunk [local] VACUUM
# kill -INT 15308
# ps ax | grep VACUUM
15308 ?        D    588:00 postgres: postgres skunk [local] VACUUM
# kill -PIPE 15308
# ps ax | grep VACUUM
15308 ?        D    588:00 postgres: postgres skunk [local] VACUUM

o/~ But the cat came back, the very next day ...

I assume that if I kill this with SIGKILL, that will bring down every other postgres process, so that should be avoided. But surely there is a way to interrupt this. If I had some reason to shut down the instance, I'd be screwed, it seems.

-jwb
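[Editor's note: the offending backend can also be identified from inside the database. On 7.4 the view below only shows query text if stats_command_string is enabled, and the column names changed in later releases (procpid/current_query became pid/query):

    -- Find backends currently running a VACUUM (7.4-era column names).
    SELECT procpid, usename, current_query
    FROM pg_stat_activity
    WHERE current_query LIKE '%VACUUM%';

The pid it reports is the same one ps shows; as the follow-ups below note, SIGTERM is eventually honored, while SIGKILL forces a full restart.]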
Re: [PERFORM] Invulnerable VACUUM process thrashing everything
Ick. Can you get users and foreign connections off that machine, lock them out for some period, and renice the VACUUM?

Shedding load and keeping it off while VACUUM runs at high priority might allow it to finish in a reasonable amount of time. Or shedding load and dropping the VACUUM priority might allow a kill signal to get through.

Hope this helps,
Ron

At 05:09 PM 12/29/2005, Jeffrey W. Baker wrote:
> A few weeks ago, the autovacuum on my instance of pg 7.4 unilaterally
> decided to VACUUM a table which has not been updated in over a year and
> is more than one terabyte on disk. Because of the very high transaction
> load on this database, this VACUUM has been ruining performance for
> almost a month. Unfortunately it seems invulnerable to killing by
> signals: [...]
Re: [PERFORM] Invulnerable VACUUM process thrashing everything
On Thu, 2005-12-29 at 22:53, Russ Garrett wrote:
> In my experience a kill -9 has never resulted in any data loss in this
> situation (it will cause postgres to detect that the process died, shut
> down, then recover), and most of the time it only causes a 5-10 sec
> outage. I'd definitely hesitate to recommend it in a production context
> though, especially since I think there are some known race-condition
> bugs in 7.4.
>
> VACUUM *will* respond to a SIGTERM, but it doesn't check very often -
> I've often had to wait hours for it to determine that it's been killed,
> and my tables aren't anywhere near 1TB. Maybe this is a place where
> things could be improved...

FWIW, I murdered this process with SIGKILL, and the recovery was very short.

> Incidentally, I have to kill -9 some of our MySQL instances quite
> regularly because they do odd things. Not something you want to be
> doing, especially when MySQL takes 30 mins to recover.

Agreed. After a mysql shutdown with MyISAM, all tables must be checked and usually many need to be repaired. This takes a really long time.

-jwb
Re: [PERFORM] Invulnerable VACUUM process thrashing everything
In my experience a kill -9 has never resulted in any data loss in this situation (it will cause postgres to detect that the process died, shut down, then recover), and most of the time it only causes a 5-10 sec outage. I'd definitely hesitate to recommend it in a production context though, especially since I think there are some known race-condition bugs in 7.4.

VACUUM *will* respond to a SIGTERM, but it doesn't check very often - I've often had to wait hours for it to determine that it's been killed, and my tables aren't anywhere near 1TB. Maybe this is a place where things could be improved...

Incidentally, I have to kill -9 some of our MySQL instances quite regularly because they do odd things. Not something you want to be doing, especially when MySQL takes 30 mins to recover.

Russ Garrett
Last.fm Ltd.
[EMAIL PROTECTED]

Ron wrote:
> Ick. Can you get users and foreign connections off that machine, lock
> them out for some period, and renice the VACUUM? Shedding load and
> keeping it off while VACUUM runs at high priority might allow it to
> finish in a reasonable amount of time. Or shedding load and dropping
> the VACUUM priority might allow a kill signal to get through. [...]
Re: [PERFORM] Invulnerable VACUUM process thrashing everything
Russ Garrett <[EMAIL PROTECTED]> writes:
> VACUUM *will* respond to a SIGTERM, but it doesn't check very often -
> I've often had to wait hours for it to determine that it's been killed,
> and my tables aren't anywhere near 1TB. Maybe this is a place where
> things could be improved...

Hmm, there are CHECK_FOR_INTERRUPTS calls in all the loops that seem significant to me. Is there anything odd about your database schema? Unusual index types or data types, maybe? Also, what PG version are you using?

If you notice a VACUUM not responding to SIGTERM promptly, it'd be useful to attach to the backend process with gdb and get a stack trace to find out what it's doing.

			regards, tom lane