Re: [PERFORM] Best practice to load a huge table from ORACLE to PG
Hi, On Sunday 27 April 2008, Greg Smith wrote: than SQL*PLUS. Then on the PostgreSQL side, you could run multiple COPY sessions importing at once to read this data all back in, because COPY will bottleneck at the CPU level before the disks will if you've got reasonable storage hardware. The latest pgloader version has been made to handle this exact case, so if you want to take this route, please consider pgloader 2.3.0: http://pgloader.projects.postgresql.org/#_parallel_loading http://pgfoundry.org/projects/pgloader/ Another good reason to consider using pgloader is when the data file contains erroneous input lines and you don't want the COPY transaction to abort: the error lines will be rejected by pgloader while the correct ones get COPYed in. Regards, -- dim
Re: [PERFORM] [pgsql-advocacy] Benchmarks WAS: Sun Talks about MySQL
Josh Berkus [EMAIL PROTECTED] writes: I think TPC-E will make both of these major improvements much more important. I suspect it would be hard to get 8.2 to even pass TPC-E due to the checkpoint dropouts. You'd be surprised, then. We're still horribly, horribly lock-bound on TPC-E; on anything over 4 cores lock resolution chokes us to death. See Jignesh's and Paul's various posts about attempts to fix this. Most of those posts have been about scalability issues with extremely large numbers of sessions. Those are interesting too, and they may be limiting our results in benchmarks which depend on such a configuration (which I don't think includes TPC-E, but the benchmark Jignesh has been writing about is some Java application benchmark which may be such a beast), but they don't directly relate to whether we're passing TPC-E. What I was referring to by passing TPC-E was the criteria for a conformant benchmark run. TPC-C has, IIRC, only two relevant criteria: 95th percentile response time < 5s and average response time < 95th percentile response time. You can pass those even if 1 transaction in 20 takes 10-20s, which is more than enough to cover checkpoints and other random sources of inconsistent performance. TPC-E has more stringent requirements which explicitly require very consistent response times, and I doubt 8.2 would have been able to pass them. So the performance-limiting factors, whether they be I/O, CPU, lock contention, or whatever, don't even come into play. We wouldn't have any conformant results whatsoever, not even low values limited by contention. 8.3, however, should be in a better position to pass. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] [pgsql-advocacy] Benchmarks WAS: Sun Talks about MySQL
Gregory Stark wrote: TPC-E has more stringent requirements which explicitly require very consistent response times and I doubt 8.2 would have been able to pass them. Sure it would. Just not for a very large scale factor ;-). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
[PERFORM] Replication System
All, We have a table, table1, which gets inserts and updates daily in high numbers, because of which its size is increasing and we have to vacuum it every alternate day. Vacuuming table1 takes almost 30 min, and during that time the site is down. We need to cut down on this downtime, so we thought of having a replication system, in which the replicated DB will be up while the master is getting vacuumed. Can anybody suggest which replication solution would be best suited for this? Thanks for any help ~ Gauri
Re: [PERFORM] Replication System
2008/4/28 Gauri Kanekar [EMAIL PROTECTED]: All, We have a table, table1, which gets inserts and updates daily in high numbers, because of which its size is increasing and we have to vacuum it every alternate day. Vacuuming table1 takes almost 30 min, and during that time the site is down. We need to cut down on this downtime, so we thought of having a replication system, in which the replicated DB will be up while the master is getting vacuumed. Can anybody suggest which replication solution would be best suited for this? Thanks for any help ~ Gauri I hope you're not using VACUUM FULL... (Standard reply for this type of question:) What version of PostgreSQL are you using? Have you tried autovacuum? Run plain vacuum on this table even more often (like every half hour) and it should not take as long, and it will save space. If you still have trouble, run vacuum analyse verbose table1; and see what it says. If you're doing it right you should be able to vacuum with the database up. It sounds like you might be happier with a fix for the problem rather than a complex workaround which will actually solve a completely different problem. Regards Peter.
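A minimal sketch of the diagnostic Peter suggests (table name taken from the original post; run against the live database):

```sql
-- Plain (non-FULL) vacuum can run while the site stays up.
VACUUM ANALYZE VERBOSE table1;

-- The verbose output reports, per table and index, how many dead row
-- versions were removed and how many pages contain usable free space.
-- If those numbers keep growing between runs, vacuum is not keeping up
-- and needs to run more often.
```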
Re: [PERFORM] Replication System
Peter, We are doing vacuum full every alternate day. We also do vacuum analyze very often. We are currently using version 8.1.3. Autovacuum is already on, but table1 is so busy that autovacuum doesn't get a sufficient chance to vacuum it :(. We have already tried all the options listed by you; that's why we reached the decision of having a replication system. So, any suggestions on that :). Thanks ~ Gauri On Mon, Apr 28, 2008 at 7:28 PM, Peter Childs [EMAIL PROTECTED] wrote: 2008/4/28 Gauri Kanekar [EMAIL PROTECTED]: All, We have a table, table1, which gets inserts and updates daily in high numbers, because of which its size is increasing and we have to vacuum it every alternate day. Vacuuming table1 takes almost 30 min, and during that time the site is down. We need to cut down on this downtime, so we thought of having a replication system, in which the replicated DB will be up while the master is getting vacuumed. Can anybody suggest which replication solution would be best suited for this? Thanks for any help ~ Gauri I hope you're not using VACUUM FULL... (Standard reply for this type of question:) What version of PostgreSQL are you using? Have you tried autovacuum? Run plain vacuum on this table even more often (like every half hour) and it should not take as long, and it will save space. If you still have trouble, run vacuum analyse verbose table1; and see what it says. If you're doing it right you should be able to vacuum with the database up. It sounds like you might be happier with a fix for the problem rather than a complex workaround which will actually solve a completely different problem. Regards Peter. -- Regards Gauri
Re: [PERFORM] Replication System
On Mon, 2008-04-28 at 19:35 +0530, Gauri Kanekar wrote: Peter, We are doing vacuum full every alternate day. We also do vacuum analyze very often. We are currently using version 8.1.3. Autovacuum is already on, but table1 is so busy that autovacuum doesn't get a sufficient chance to vacuum it :(. You should seriously consider upgrading to PG 8.3. There have been substantial improvements to VACUUM since 8.1. Brad.
Re: [PERFORM] Replication System
Gauri Kanekar wrote: Peter, We are doing vacuum full every alternate day. We also do vacuum analyze very often. We are currently using version 8.1.3. Autovacuum is already on, but table1 is so busy that autovacuum doesn't get a sufficient chance to vacuum it :(. We have already tried all the options listed by you; that's why we reached the decision of having a replication system. So, any suggestions on that :). Thanks ~ Gauri We use Slony for exactly this type of situation. It's not the most user-friendly piece of software, but it works well enough that I can schedule maintenance windows (we're a 24/7 shop) and do clustering and other tasks on our DB to reclaim space, etc. -salman
Re: [PERFORM] Replication System
That's one of the things to be done in the near future, but it needs some changes from the application point of view :( ... so it just got escalated for that reason. But for now, which one would be a well-suited replication system? ~ Gauri On Mon, Apr 28, 2008 at 7:43 PM, Brad Nicholson [EMAIL PROTECTED] wrote: On Mon, 2008-04-28 at 19:35 +0530, Gauri Kanekar wrote: Peter, We are doing vacuum full every alternate day. We also do vacuum analyze very often. We are currently using version 8.1.3. Autovacuum is already on, but table1 is so busy that autovacuum doesn't get a sufficient chance to vacuum it :(. You should seriously consider upgrading to PG 8.3. There have been substantial improvements to VACUUM since 8.1. Brad. -- Regards Gauri
Re: [PERFORM] Replication System
Salman, Slony doesn't do automatic failover, and we would appreciate a system with automatic failover :( ~ Gauri On Mon, Apr 28, 2008 at 7:46 PM, salman [EMAIL PROTECTED] wrote: Gauri Kanekar wrote: Peter, We are doing vacuum full every alternate day. We also do vacuum analyze very often. We are currently using version 8.1.3. Autovacuum is already on, but table1 is so busy that autovacuum doesn't get a sufficient chance to vacuum it :(. We have already tried all the options listed by you; that's why we reached the decision of having a replication system. So, any suggestions on that :). Thanks ~ Gauri We use Slony for exactly this type of situation. It's not the most user-friendly piece of software, but it works well enough that I can schedule maintenance windows (we're a 24/7 shop) and do clustering and other tasks on our DB to reclaim space, etc. -salman -- Regards Gauri
[PERFORM] Where does a novice go to make it run faster?
So, it is time to improve performance; it is running too slow. AFAIK (as a novice) there are a few general areas: 1) hardware 2) rewriting my queries and table structures 3) using more predefined queries 4) tweaking parameters in the db conf files. Of these points: 1) is nothing I can do about right now, but in the future perhaps. 2) will be quite hard right now since there is more code than time. 3) almost like 2 but perhaps more do-able with the current constraints. 4) This seems to be the easiest one to start with... So what should I do/read concerning point 4? If you have other good suggestions I'd be very interested in them. Thank you :-)
Re: [PERFORM] Benchmarks WAS: Sun Talks about MySQL
Greg, What I was referring to by passing TPC-E was the criteria for a conformant benchmark run. TPC-C has, IIRC, only two relevant criteria: 95th percentile response time < 5s and average response time < 95th percentile response time. You can pass those even if 1 transaction in 20 takes 10-20s, which is more than enough to cover checkpoints and other random sources of inconsistent performance. We can do this now. I'm unhappy because we're at about 1/4 of Oracle performance, but we certainly pass -- even with 8.2. --Josh
[PERFORM] shared_buffer/DRBD performances
I have completed the benchmarks I made with Postgres, which I talked about some weeks ago on the postgres performance mailing list (see the post shared_buffers). At the following link you can find a doc that contains the generated graphs. http://www.mediafire.com/?lk4woomsxlc Regards Gaetano Mendola
Re: [PERFORM] Where does a novice go to make it run faster?
1) hardware 2) rewriting my queries and table structures 3) using more predefined queries 4) tweaking parameters in the db conf files. Of these points: 1) is nothing I can do about right now, but in the future perhaps. 2) will be quite hard right now since there is more code than time. 3) almost like 2 but perhaps more do-able with the current constraints. 4) This seems to be the easiest one to start with... So what should I do/read concerning point 4? If you have other good suggestions I'd be very interested in them. Thank you :-) You could provide some information: PostgreSQL version, the type of queries you're running, some EXPLAIN ANALYZE output for those, what type of hardware you're running on, and what OS is installed. -- regards Claus When lenity and cruelty play for a kingdom, the gentlest gamester is the soonest winner. Shakespeare
Re: [PERFORM] Replication System
On Mon, Apr 28, 2008 at 07:35:37PM +0530, Gauri Kanekar wrote: Peter, We are doing vacuum full every alternate day. We also do vacuum analyze very often. VACUUM FULL is making your problem worse, not better. Don't do that. We are currently using version 8.1.3. You need to upgrade immediately to the latest 8.1 stability and security release, which is 8.1.11. This is a drop-in replacement, and it's an urgent fix in your case. Autovacuum is already on, but table1 is so busy that autovacuum doesn't get a sufficient chance to vacuum it :(. You probably need to tune autovacuum not to touch that table, and just vacuum that table in a constant loop or something. VACUUM should _never_ take the site down. If it does, you're doing it wrong. We have already tried all the options listed by you; that's why we reached the decision of having a replication system. So, any suggestions on that :). I think you will find that no replication system will solve your underlying problems. That said, I happen to work for a company that will sell you a replication system that works with 8.1 if you really want it. A -- Andrew Sullivan [EMAIL PROTECTED] +1 503 667 4564 x104 http://www.commandprompt.com/
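A sketch of the "tune autovacuum not to touch that table" step on 8.1, where per-table autovacuum settings live in the pg_autovacuum system catalog (column layout assumed from the 8.1 catalog; -1 means "use the default"):

```sql
-- Tell 8.1's autovacuum daemon to skip table1 entirely; it will be
-- vacuumed explicitly instead (e.g. a plain VACUUM ANALYZE from cron
-- every half hour or so).
INSERT INTO pg_autovacuum
    (vacrelid, enabled,
     vac_base_thresh, vac_scale_factor,
     anl_base_thresh, anl_scale_factor,
     vac_cost_delay, vac_cost_limit)
VALUES ('table1'::regclass, false, -1, -1, -1, -1, -1, -1);
```

From 8.3 onward this catalog is gone and the same thing is done with per-table storage parameters, which is one more reason the upgrade advice in this thread matters.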
Re: [PERFORM] Replication System
On Mon, Apr 28, 2008 at 07:48:48PM +0530, Gauri Kanekar wrote: Slony doesn't do automatic failover, and we would appreciate a system with automatic failover :( No responsible asynchronous system will give you automatic failover. You can lose data that way. A -- Andrew Sullivan [EMAIL PROTECTED] +1 503 667 4564 x104 http://www.commandprompt.com/
Re: [PERFORM] Where does a novice go to make it run faster?
A B wrote: So, it is time to improve performance; it is running too slow. AFAIK (as a novice) there are a few general areas: 1) hardware 2) rewriting my queries and table structures 3) using more predefined queries 4) tweaking parameters in the db conf files. Of these points: 1) is nothing I can do about right now, but in the future perhaps. 2) will be quite hard right now since there is more code than time. 3) almost like 2 but perhaps more do-able with the current constraints. 4) This seems to be the easiest one to start with... So what should I do/read concerning point 4? If you have other good suggestions I'd be very interested in them. Thank you :-) First, change log_min_duration_statement in your log settings to something like 1000 (one second). This will show you which statements take the longest. Second, use EXPLAIN ANALYZE on those statements to determine what is taking a long time, and focus on optimizing the statements that take the longest to execute. That ought to get you a long way down the road. -Dennis
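The two steps Dennis describes, sketched concretely (the query is a hypothetical example, not from the thread):

```sql
-- Step 1, in postgresql.conf: log any statement taking longer
-- than one second (the value is in milliseconds).
--   log_min_duration_statement = 1000

-- Step 2: take a statement that shows up in the log and inspect
-- where the time actually goes.
EXPLAIN ANALYZE
SELECT *
FROM orders                 -- hypothetical table for illustration
WHERE customer_id = 42;
```

EXPLAIN ANALYZE actually executes the query, so the per-node actual times in its output can be compared against the planner's estimates to find the expensive step.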
Re: [PERFORM] Where does a novice go to make it run faster?
A B wrote: So, it is time to improve performance; it is running too slow. AFAIK (as a novice) there are a few general areas: 1) hardware 2) rewriting my queries and table structures 3) using more predefined queries 4) tweaking parameters in the db conf files. Of these points: 1) is nothing I can do about right now, but in the future perhaps. 2) will be quite hard right now since there is more code than time. 3) almost like 2 but perhaps more do-able with the current constraints. 4) This seems to be the easiest one to start with... So what should I do/read concerning point 4? If you have other good suggestions I'd be very interested in them. Go back to step zero - gather information that would be helpful in giving advice. For starters: - What hardware do you currently have? - What OS and version of PG? - How big is the database? - What is the nature of the workload (small queries or data-mining, how many simultaneous clients, transaction rate, etc.)? - Is PG sharing the machine with other workloads? Then edit your postgresql.conf file to gather data (see http://www.postgresql.org/docs/8.3/interactive/monitoring-stats.html). With stat collection enabled, you can often find some low-hanging fruit like indexes that aren't used (look in pg_stat_user_indexes) - sometimes because the query didn't cast something in the WHERE clause correctly. Also look at http://www.postgresql.org/docs/8.3/interactive/runtime-config-logging.html - especially the log_min_duration_statement setting, to find long-running queries. You will probably need to try different settings and watch the log. Logging impacts performance, so don't just set it to log everything and forget it; you need to play with it. Don't discount step 2 - you may find you can rewrite one inefficient but frequent query, or add a useful index on the server. Cheers, Steve
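A query along the lines Steve suggests for spotting never-used indexes (a sketch, not taken from the original post; requires the stats collector to be running):

```sql
-- Indexes that have never been scanned since the statistics were
-- last reset: candidates for dropping, or a sign that a query is
-- not matching the index as intended.
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY relname;
```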
Re: [PERFORM] Where does a novice go to make it run faster?
[EMAIL PROTECTED] (A B) writes: So, it is time to improve performance; it is running too slow. AFAIK (as a novice) there are a few general areas: 1) hardware 2) rewriting my queries and table structures 3) using more predefined queries 4) tweaking parameters in the db conf files. Of these points: 1) is nothing I can do about right now, but in the future perhaps. 2) will be quite hard right now since there is more code than time. 3) almost like 2 but perhaps more do-able with the current constraints. 4) This seems to be the easiest one to start with... So what should I do/read concerning point 4? If you have other good suggestions I'd be very interested in them. Thank you :-) In order of ease of implementation, it tends to be... 1. Tweak postgresql.conf 2. Make sure you ran VACUUM + ANALYZE 3. Find some expensive queries and try to improve them, which might involve changing the queries and/or adding relevant indices 4. Add RAM to your server 5. Add disk to your server 6. Redesign your application's DB schema so that it is more performant by design. The URL below may have some material of value... -- select 'cbbrowne' || '@' || 'cbbrowne.com'; http://linuxfinances.info/info/postgresqlperformance.html It is usually a good idea to put a capacitor of a few microfarads across the output, as shown.
[PERFORM] Very poor performance loading 100M of sql data using copy
Hi all: We are loading in a number (100+) of sql files that are about 100M in size. It takes about three hours to load the file. There is very little load on the database other than the copy from operations. We are running postgresql-8.1.3 under Centos 4 on a RAID 1/0 array with 4 disks (so we have only one spindle). The partitions are set up in an LVM and iostat 5 shows (for one report):

avg-cpu:  %user  %nice  %sys  %iowait  %idle
           1.70   0.00  0.80    51.40  46.10

Device:   tps     Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda       179.20  1472.00     2561.60     7360      12808
sda1        0.00     0.00        0.00        0          0
sda2      385.20  1462.40     2561.60     7312      12808
dm-0        0.80     0.00        6.40        0         32
dm-1        0.00     0.00        0.00        0          0
dm-2        0.00     0.00        0.00        0          0
dm-3        0.00     0.00        0.00        0          0
dm-4        4.40     0.00       35.20        0        176
dm-5        0.00     0.00        0.00        0          0
dm-6      380.00  1462.40     2520.00     7312      12600

dm-6 is where the data files reside and dm-4 is where the WAL archives are kept. Note all the DM's are on the same RAID 0 device /dev/sda2. A sample psql command file to load the data is:

BEGIN;
COPY peers (observe_start, observe_end, geo_scope, geo_value, peer_a, peer_b) FROM stdin WITH NULL AS '';
(data here)
(3 more copy commands to different tables w/ data)
COMMIT;

The primary keys for the tables being loaded are composite keys using 4-7 columns, so that may be part of the issue. From postgres.conf:

shared_buffers = 3000
#temp_buffers = 1000            # min 100, 8KB each
#max_prepared_transactions = 5  # can be 0 or more (see max_locks_per_transaction)
work_mem = 2048                 # min 64, size in KB
maintenance_work_mem = 65536    # min 1024, size in KB
#max_stack_depth = 2048         # min 100, size in KB

The prior settings for work_mem/maintenance_work_mem were the defaults:

#work_mem = 1024                # min 64, size in KB
#maintenance_work_mem = 16384   # min 1024, size in KB

I also took a look at disk-io hit rates:

# select * from pg_statio_user_tables;
 relid | schema | relname      | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit
-------+--------+--------------+----------------+---------------+---------------+--------------
 17282 | public | providers    |      179485097 |      78832253 |        835008 |    196903582
 17264 | public | events       |              0 |             0 |               |
 17262 | public | days         |            495 |           219 |           478 |           16
 17276 | public | peers        |      147435004 |     114304828 |       1188908 |    295569499
 17288 | public | rankings     |      564638938 |     345456664 |     275607291 |   1341727605
 17270 | public | market_share |         131932 |         90048 |          5408 |       182100

market_share did have one tidx_blks_read reported, but all the other TOAST fields (toast_blks_read, toast_blks_hit, tidx_blks_read, tidx_blks_hit) were empty for all rows. This looks like we have whole indexes in memory except for the days table, which has a low update rate, so I am not worried about that. However, for heap_blks_read and heap_blks_hit we get a different story:

 relname      | hit_percent
--------------+-------------
 providers    |       43.92
 days         |       44.24
 peers        |       77.52
 rankings     |       61.18
 market_share |       68.25

so we see a 43% hit ratio for providers up to a 77% hit ratio for peers. Not horrible hit rates given that we are more data warehousing than OLTP, but I am not sure what effect increasing these (by increasing shared_buffers, I think) will have on the COPY operation. I would suspect none. To try to solve this speed issue, I checked the logs and was seeing a few of these (27 seconds apart):

2008-04-21 11:36:43 UTC @(2761)i: LOG: checkpoints ...

so I changed:

checkpoint_segments = 30
checkpoint_warning = 150

in postgres.conf and reloaded postgres. I have only seen one of these log messages in the past week. I have turned off autovacuum.
I have increased maintenance_work_mem as mentioned above (although I didn't expect it to do anything unless we drop/recreate indexes). I have increased work_mem as mentioned above. The only things I can think of are increasing shared memory or dropping indexes. I don't see any indication in the docs that increasing shared memory would help speed up a copy operation. The only indexes we have to drop are the ones on the primary keys (there is one non-primary key index in the database as well). Can you drop an index on the primary key for a table and add it back later? Am I correct in saying: the primary key index is what enforces the unique constraint in the table? If the index is dropped and non-unique primary key data has been added, what happens when you re-add the index?
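The hit percentages quoted in the message can be derived directly from pg_statio_user_tables; a sketch of the kind of query that produces them (not taken from the original post):

```sql
-- Heap buffer-cache hit ratio per table, as a percentage.
-- nullif() avoids division by zero for tables with no I/O yet.
SELECT relname,
       round(heap_blks_hit * 100.0
             / nullif(heap_blks_read + heap_blks_hit, 0), 2)
           AS hit_percent
FROM pg_statio_user_tables
WHERE heap_blks_read + heap_blks_hit > 0
ORDER BY relname;
```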
Re: [PERFORM] Replication System
On Mon, 28 Apr 2008, Gauri Kanekar wrote: We are doing vacuum full every alternate day. We also do vacuum analyze very often. We are currently using version 8.1.3... We have already tried all the options listed by you; that's why we reached the decision of having a replication system. Andrew Sullivan has already given a response here I agree with; I wanted to expand on that. You have a VACUUM problem. The fact that you need (or feel you need) to VACUUM FULL every other day says there's something very wrong here. The way to solve most VACUUM problems is to VACUUM more often, so that the work in each individual one never gets so big that your system takes an unacceptable hit, and you shouldn't ever need VACUUM FULL. Since your problem is being aggravated because you're running a dangerously obsolete version, that's one of the first things you should fix--upgrade to at least the latest 8.1 if you can't deal with a larger version migration. The fact that you're happily running 8.1.3 says you most certainly haven't tried all the other options here. Every minute you spend looking into a replication system is wasted time you could be spending on the right fix. You've fallen into the common trap where you're so fixated on a particular technical solution that you're now ignoring suggestions on how to resolve the root problem. Replication is hard to get going even on a system that works perfectly, and replicating a known buggy system just to work around a problem really sounds like a bad choice. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] Very poor performance loading 100M of sql data using copy
John Rouillard wrote: We are running postgresql-8.1.3 under Centos 4 You should upgrade, at least to the latest minor release of the 8.1 series (8.1.11), as there have been a bunch of important bug and security fixes. Or even better, upgrade to 8.3, which has reduced the storage size of variable-length datatypes like text/char/varchar in particular. As your COPY is I/O bound, reducing storage size will translate directly into improved performance. dm-6 is where the data files reside and dm-4 is where the WAL archives are kept. Note all the DM's are on the same RAID 0 device /dev/sda2. Another reason to upgrade to 8.3: if you CREATE or TRUNCATE the table in the same transaction as you COPY into it, you can avoid WAL logging of the loaded data, which will in the best case double your performance, as your WAL is on the same physical drives as the data files. The only indexes we have to drop are the ones on the primary keys (there is one non-primary key index in the database as well). Can you drop an index on the primary key for a table and add it back later? Am I correct in saying: the primary key index is what enforces the unique constraint in the table? If the index is dropped and non-unique primary key data has been added, what happens when you re-add the index? Yes, the index is what enforces the uniqueness. You can drop the primary key constraint, and add it back after the load with ALTER TABLE. If the load introduces any non-unique primary keys, adding the primary key constraint will give you an error and fail. Dropping and recreating the indexes is certainly worth trying. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
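Heikki's drop-and-recreate suggestion, sketched in SQL for the peers table from earlier in the thread (the constraint name and primary-key column list are assumptions based on the COPY command shown there):

```sql
-- Drop the primary key (and its index) before the bulk load;
-- the default constraint name "peers_pkey" is assumed here.
ALTER TABLE peers DROP CONSTRAINT peers_pkey;

-- ... bulk COPY into peers here ...

-- Re-add the primary key afterwards. This rebuilds the unique index
-- in one pass; it fails with a unique-violation error if the load
-- introduced duplicate keys, which must then be deleted before retrying.
ALTER TABLE peers ADD PRIMARY KEY
    (observe_start, observe_end, geo_scope, geo_value, peer_a, peer_b);
```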
Re: [PERFORM] Very poor performance loading 100M of sql data using copy
On Mon, Apr 28, 2008 at 06:53:09PM +0100, Heikki Linnakangas wrote: John Rouillard wrote: We are running postgresql-8.1.3 under Centos 4 You should upgrade, at least to the latest minor release of the 8.1 series (8.1.11), as there has been a bunch of important bug and security fixes. Or even better, upgrade to 8.3, which has reduced the storage size of especially variable length datatypes like text/char/varchar in particular. As your COPY is I/O bound, reducing storage size will translate directly to improved performance. Yup. Just saw that suggestion in an unrelated email. dm-6 is where the data files reside and dm-4 is where the WAL archives are kept. Note all the DM's are on the same RAID 0 device /dev/sda2. Another reason to upgrade to 8.3: if you CREATE or TRUNCATE the table in the same transaction as you COPY into it, you can avoid WAL logging of the loaded data, which will in the best case double your performance as your WAL is on the same physical drives as the data files. We can't do this as we are backfilling a couple of months of data into tables with existing data. The only indexes we have to drop are the ones on the primary keys (there is one non-primary key index in the database as well). Can you drop an index on the primary key for a table and add it back later? Am I correct in saying: the primary key index is what enforces the unique constraint in the table? If the index is dropped and non-unique primary key data has been added, what happens when you re-add the index? Yes, the index is what enforces the uniqueness. You can drop the primary key constraint, and add it back after the load with ALTER TABLE. If the load introduces any non-unique primary keys, adding the primary key constraint will give you an error and fail. That's the part I am worried about. I guess using psql to delete the problem row then re-adding the index will work. Dropping and recreating the indexes is certainly worth trying. Thanks for the info. 
-- -- rouilj John Rouillard System Administrator Renesys Corporation 603-244-9084 (cell) 603-643-9300 x 111
Re: [PERFORM] Very poor performance loading 100M of sql data using copy
On Mon, 28 Apr 2008, John Rouillard wrote: 2008-04-21 11:36:43 UTC @(2761)i: LOG: checkpoints ... (27 seconds apart) so I changed: checkpoint_segments = 30 checkpoint_warning = 150 That's good, but you might go higher than 30 for a bulk loading operation like this, particularly on 8.1 where checkpoints are no fun. Using 100 is not unreasonable. shared_buffers = 3000 I don't see any indication in the docs that increasing shared memory would help speed up a copy operation. The index blocks use buffer space, and what ends up happening if there's not enough memory is that they are written out more often than they need to be (and with your I/O hardware you need to avoid writes unless absolutely necessary). Theoretically the OS is caching around that situation, but it's better to avoid it. You didn't say how much RAM you have, but you should start with a factor of 10 increase, to 30,000, and see if that helps; if so, try making it large enough to use 1/4 of total server memory. 3000 is only giving the server 24MB of RAM to work with, and it's unfair to expect it to work well in that situation. While not relevant to this exercise, you'll need to set effective_cache_size to a useful value one day as well. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
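Greg's arithmetic as a postgresql.conf sketch (on 8.1, shared_buffers is a count of 8 KB pages, so the numbers work out as shown; the effective_cache_size value is a placeholder):

```sql
-- shared_buffers = 3000     -- current setting: 3000 x 8 KB = ~24 MB
-- shared_buffers = 30000    -- suggested first step: ~234 MB
-- checkpoint_segments = 100 -- generous for bulk loading on 8.1
-- effective_cache_size = ...  -- later: roughly the OS cache size,
--                              -- also counted in 8 KB pages on 8.1
```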
Re: [PERFORM] Where does a novice go to make it run faster?
You got the order slightly wrong, I guess.

1) hardware
Would only come first if your RAM is really too small, or you use RAID5 on write-heavy tables, or what limits you is transaction fsync (hint: 8.3). Adding RAM is cheap.

2) rewriting my queries and table structures
This should really come first. Log expensive queries. Note that an expensive query can be a slow query, or a rather fast query that you execute lots of times, or a very simple and fast query that you execute really, really too often. Now ask yourself:

* What is this query supposed to do? Do I need this query?
Example: You put your sessions in a database? => Perhaps put them in the good old filesystem. Your PHP is loading lots of configuration from the database for every page? => Cache it, generate some PHP code once and include it, put it in the session if it depends on the user, but don't reload the thing on each page! This feature is useless? => Do you really need to display a birthday cake on your forum for those users who have their birthday today? UPDATEs... => Do you really need to update the last time a user was online every time? What about updating it every 5 minutes instead?
* Is this query inside a loop? => Use JOIN.
* Do I need all the rows from this query?
Example: You use pagination and perform the same query changing LIMIT/OFFSET? => Perform the query once, retrieve the first N pages of results, cache it in the session or in a table.
* You have a website? => Use lighttpd and fastcgi.
* Do I need all the columns from this query?
* Do I suffer from locking?
etc.

Now you should see some easy targets. For the queries that are slow, use EXPLAIN ANALYZE. Question your schema. etc.
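The "query inside a loop? => Use JOIN" bullet above, as a sketch with hypothetical tables:

```sql
-- Instead of one query per row inside an application loop:
--   SELECT name FROM users WHERE id = $1;   -- executed N times
-- fetch everything in a single round trip with a join:
SELECT o.id, o.total, u.name
FROM orders o                    -- hypothetical tables for illustration
JOIN users u ON u.id = o.user_id
WHERE o.created_at > now() - interval '1 day';
```

One joined query lets the planner pick an efficient plan and avoids N network round trips and N plan/execute cycles.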
Re: [PERFORM] Benchmarks WAS: Sun Talks about MySQL
Josh Berkus [EMAIL PROTECTED] writes: Greg, What I was referring to by passing TPC-E was the criteria for a conformant benchmark run. TPC-C has, IIRC, only two relevant criteria: 95th percentile response time < 5s and average response time < 95th percentile response time. You can pass those even if 1 transaction in 20 takes 10-20s, which is more than enough to cover checkpoints and other random sources of inconsistent performance. We can do this now. I'm unhappy because we're at about 1/4 of Oracle performance, but we certainly pass -- even with 8.2. We certainly can pass TPC-C. I'm curious what you mean by 1/4 though? On similar hardware? Or the maximum we can scale to is 1/4 as large as Oracle? Can you point me to the actual benchmark runs you're referring to? But I just made an off-hand comment that I doubt 8.2 could pass TPC-E, which has much more stringent requirements. It has requirements like: the throughput computed over any period of one hour, sliding over the Steady State by increments of ten minutes, varies from the Reported Throughput by no more than 2% -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support!
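The quoted TPC-E steadiness rule can be sketched as a small check (a simplified illustration of the sliding-window test, not the official audit procedure; the timestamps and reported rate below are made up):

```python
def within_tpce_steadiness(completion_times, reported_tps,
                           window=3600, step=600):
    """Check that throughput over every 1-hour window, sliding by 10
    minutes, stays within 2% of the reported throughput (simplified
    TPC-E-style steadiness rule)."""
    if not completion_times:
        return False
    start, end = min(completion_times), max(completion_times)
    t = start
    while t + window <= end:
        count = sum(1 for c in completion_times if t <= c < t + window)
        tps = count / window
        if abs(tps - reported_tps) > 0.02 * reported_tps:
            return False
        t += step
    return True

# A perfectly steady stream of 10 tx/s over two hours passes:
steady = [i * 0.1 for i in range(72000)]
print(within_tpce_steadiness(steady, reported_tps=10.0))  # True
```

A workload that pauses during checkpoints would show depressed windows and fail this test even if its overall average is fine, which is exactly why checkpoint dropouts matter more for TPC-E than for TPC-C.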
Re: [PERFORM] Very poor performance loading 100M of sql data using copy
John Rouillard wrote: We can't do this as we are backfilling a couple of months of data into tables with existing data. Is this a one-off loading of historic data or an ongoing thing? The only indexes we have to drop are the ones on the primary keys (there is one non-primary-key index in the database as well). If this amount of data importing is ongoing, then one thing I would try is partitioning (this could be worthwhile anyway with the amount of data you appear to have). Create an inherited table for the month being imported, load the data into it, then add the check constraints and indexes, and modify the rules/triggers to handle the inserts to the parent table. -- Shane Ambler pgSQL (at) Sheeky (dot) Biz Get Sheeky @ http://Sheeky.Biz
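The inherited-table approach described above might look like this, using the 8.x constraint-exclusion partitioning scheme (table, column, and file names are hypothetical):

```sql
-- Child table for the month being backfilled
CREATE TABLE measurements_2008_02 (
    CHECK (logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01')
) INHERITS (measurements);

-- Load into the bare child first, then build its indexes
COPY measurements_2008_02 FROM '/tmp/feb.csv' WITH CSV;
CREATE INDEX measurements_2008_02_logdate
    ON measurements_2008_02 (logdate);

-- Route new inserts for that month from the parent to the child
CREATE RULE measurements_insert_2008_02 AS
    ON INSERT TO measurements
    WHERE (logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01')
    DO INSTEAD INSERT INTO measurements_2008_02 VALUES (NEW.*);
```

With constraint_exclusion enabled, queries filtered on logdate will skip partitions whose CHECK constraint rules them out.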
Re: [PERFORM] Best practice to load a huge table from ORACLE to PG
Adonias Malosso wrote: Hi All, I'd like to know what's the best practice to LOAD a 70 million row, 101 column table from ORACLE to PGSQL. The current approach is to dump the data to CSV and then COPY it into Postgresql. Uhm. 101 columns, you say? Sounds interesting. There are data loaders like: http://pgfoundry.org/projects/pgloader/ which could speed up loading the data over a plain COPY of the CSV. I wonder how much normalizing could help. Tino
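The dump-to-CSV-then-COPY route might look like this on the Postgres side (the file path and table name are placeholders):

```sql
-- Server-side COPY: the file must be readable by the postgres
-- server process, and the path is resolved on the server machine
COPY big_table FROM '/data/oracle_dump.csv' WITH CSV;

-- Or stream it from the client side through psql instead:
--   \copy big_table FROM 'oracle_dump.csv' WITH CSV
```

For parallel loading as suggested earlier in the thread, split the CSV into chunks and run one COPY session per chunk in separate connections.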
Re: [PERFORM] Benchmarks WAS: Sun Talks about MySQL
On Mon, 28 Apr 2008 14:40:25 -0400 Gregory Stark [EMAIL PROTECTED] wrote: We certainly can pass TPC-C. I'm curious what you mean by 1/4 though? On similar hardware? Or the maximum we can scale to is 1/4 as large as Oracle? Can you point me to the actual benchmark runs you're referring to? I would be curious as well, considering there has been zero evidence provided to back up such a statement. I am not saying it isn't true; it wouldn't be surprising to me if Oracle outperformed PostgreSQL in TPC-C, but I would sure like to see in general how well we do (or don't). Sincerely, Joshua D. Drake -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL SPI Liaison | SPI Director | PostgreSQL political pundit
Re: [PERFORM] Replication Syatem
On Mon, Apr 28, 2008 at 9:38 AM, Gauri Kanekar [EMAIL PROTECTED] wrote: All, We have a table, "table1", which gets inserts and updates daily in high numbers, because of which its size is increasing and we have to vacuum it every alternate day. Vacuuming "table1" takes almost 30min and during that time the site is down. Slony is an open source replication system built for Postgres. But the real problem is that you are doing a vacuum full every day. This is highly invasive. Take a look at the postgres docs on vacuuming the db. Analyze is best done on a daily basis. If you have a lot of deletes, then plain VACUUM will mark that space for reuse. The postgres documentation describes the various vacuum options and explains the merits of each. Hope that helps. Radhika -- It is all a matter of perspective. You choose your view by choosing where to stand. --Larry Wall
[PERFORM] Postgres Benchmark looking for maintainer
Hello, Executive summary: Last year I wrote a database benchmark which simulates a forum. It works on Postgres and MySQL. It could be useful. I have no time to touch this, so it is rotting on my hard drive. Who wants to adopt it? I will put it on pgFoundry. I can spend a few hours documenting the source and writing some documentation, and pass the package to someone who might be interested and more available. Details: The benchmark is a forum-type load (actually it came from me arguing with the phpBB team, lol) but, unlike all forums I know, correctly optimized. A bunch of forums are created, and there is a website (in PHP), very basic, which allows you to browse the forums, view topics, and insert posts. It displays the usual forum info like last post, number of topics or posts in a forum, number of posts in a topic, etc. Then there is a benchmarking client, written in Python. It spawns a number of users who perform real-life actions, like viewing pages and adding posts, and there are a few simulated moderators who will, once in a while, destroy topics and even forums. This client can hit the PHP website via HTTP. However, postgres is so fast that you would need several PHP servers to kill it. So, I added a multi-backend capability to the client: it can hit the database directly, performing the queries the PHP script would have performed. However, postgres is still so fast that you won't be able to benchmark anything more powerful than a Core 2; the client would need to be rewritten in a compiled language like Java. Also, retrieving the posts' text easily blasted the 100 Mbps connection between server and client, so you would need Gigabit Ethernet. So, the load is very realistic (it would mimic a real forum pretty well); but in order to benchmark it you must simulate humongous traffic levels. The only difference is that my benchmark does a lot more writing (post insertions) than a normal forum; I wanted the database to grow big in a few hours.
It also works on MySQL, so you can get a good laugh. Actually I was able to extract some good performance out of MySQL, after lots of headaches, except that I was never able to make it use more than 1 core. Contrary to the usual benchmarks, the code is optimized for MySQL and for Postgres, and the stored procedures are too. Thus, what is compared is not a least-common-denominator implementation that happens to work on both databases, but two implementations specifically targeted at and optimized for each database. The benchmark is also pretty simple (unlike the TPC) but it is useful: first it is CPU-bound, then IO-bound, and clustering the tables does a lot for performance (you can test auto-cluster), checkpoints are very visible, etc. So it can provide useful information that is easier to understand than a very complex benchmark. Originally the purpose of the benchmark was to test postgres' full-text search; the code is still there. Regards, Pierre
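A minimal sketch of the kind of client described above -- simulated users each performing a weighted-random stream of forum actions (the action names and weights here are invented for illustration; a real client would issue HTTP requests or SQL instead of bumping counters):

```python
import random
import threading

# Hypothetical action mix for a simulated forum user
ACTIONS = ["view_topic", "view_forum", "post_reply", "new_topic"]
WEIGHTS = [60, 25, 12, 3]

def simulate_user(n_actions, counters, lock):
    """One simulated user: pick weighted-random actions and record them."""
    rng = random.Random()
    for _ in range(n_actions):
        action = rng.choices(ACTIONS, weights=WEIGHTS, k=1)[0]
        with lock:
            counters[action] = counters.get(action, 0) + 1

counters, lock = {}, threading.Lock()
users = [threading.Thread(target=simulate_user, args=(1000, counters, lock))
         for _ in range(10)]
for u in users:
    u.start()
for u in users:
    u.join()
print(sum(counters.values()))  # 10000 actions in total
```

As the post notes, a thread-per-user Python client tops out quickly; the same structure in a compiled language (or with the database calls batched) scales much further.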
Re: [PERFORM] Best practice to load a huge table from ORACLE to PG
Jonah, Thank you for the answer. Good to know about this EnterpriseDB feature. I'll proceed using pgloader. Regards. Adonias Malosso On Sat, Apr 26, 2008 at 10:14 PM, Jonah H. Harris [EMAIL PROTECTED] wrote: On Sat, Apr 26, 2008 at 9:25 AM, Adonias Malosso [EMAIL PROTECTED] wrote: I'd like to know what's the best practice to LOAD a 70 million row, 101 column table from ORACLE to PGSQL. The fastest and easiest method would be to dump the data from Oracle into CSV/delimited format using something like ociuldr (http://www.anysql.net/en/ociuldr.html) and load it back into PG using pg_bulkload (which is a helluva lot faster than COPY). Of course, you could try other things as well... such as setting up generic connectivity to PG and inserting the data into a PG table over the database link. Similarly, while I hate to see shameless self-plugs in the community, the *fastest* method you could use is dblink_ora_copy, contained in EnterpriseDB's PG+ Advanced Server; it uses an optimized OCI connection to COPY the data directly from Oracle into Postgres, which also saves you the intermediate step of dumping the data. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/
Re: [PERFORM] Replication Syatem
[EMAIL PROTECTED] (Gauri Kanekar) writes: We have a table, "table1", which gets inserts and updates daily in high numbers, because of which its size is increasing and we have to vacuum it every alternate day. Vacuuming "table1" takes almost 30min and during that time the site is down. We need to cut down on this downtime. So we thought of having a replication system, in which the replicated DB will be up while the master is getting vacuumed. Can anybody suggest which will be the best suited replication solution for this? The only reason that it would be necessary for VACUUM to take the site down would be if you are running version 7.1, which was obsoleted in 2002, which, it should be noted, was SIX YEARS AGO. As has been noted, you seem to be presupposing a remarkably complex solution to resolve a problem which is likely to be better handled via running VACUUM rather more frequently. -- output = reverse(ofni.sesabatadxunil @ enworbbc) http://www3.sympatico.ca/cbbrowne/postgresql.html Rules of the Evil Overlord #181. I will decree that all hay be shipped in tightly-packed bales. Any wagonload of loose hay attempting to pass through a checkpoint will be set on fire. http://www.eviloverlord.com/
Re: [PERFORM] Best practice to load a huge table from ORACLE to PG
On Mon, Apr 28, 2008 at 5:37 PM, Adonias Malosso [EMAIL PROTECTED] wrote: Thank you for the answer. Good to know about this EnterpriseDB feature. No problem. I'll proceed using pgloader. That's fine. Though, I'd really suggest pg_bulkload; it's quite a bit faster. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/
Re: [PERFORM] Replication Syatem
Basically we have some background processes which update "table1", and we don't want the application to make any changes to "table1" during vacuum. Vacuum requires an exclusive lock on "table1", and if any of the background or application processes are running, vacuum doesn't kick off. That's the reason we need to take the site down. ~ Gauri On Tue, Apr 29, 2008 at 3:13 AM, Chris Browne [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] (Gauri Kanekar) writes: We have a table, "table1", which gets inserts and updates daily in high numbers, because of which its size is increasing and we have to vacuum it every alternate day. Vacuuming "table1" takes almost 30min and during that time the site is down. We need to cut down on this downtime. So we thought of having a replication system, in which the replicated DB will be up while the master is getting vacuumed. Can anybody suggest which will be the best suited replication solution for this? The only reason that it would be necessary for VACUUM to take the site down would be if you are running version 7.1, which was obsoleted in 2002, which, it should be noted, was SIX YEARS AGO. As has been noted, you seem to be presupposing a remarkably complex solution to resolve a problem which is likely to be better handled via running VACUUM rather more frequently. -- output = reverse(ofni.sesabatadxunil @ enworbbc) http://www3.sympatico.ca/cbbrowne/postgresql.html Rules of the Evil Overlord #181. I will decree that all hay be shipped in tightly-packed bales. Any wagonload of loose hay attempting to pass through a checkpoint will be set on fire. http://www.eviloverlord.com/ -- Regards Gauri
Re: [PERFORM] Replication Syatem
But unless we do a full vacuum the space is not recovered. That's why we prefer full vacuum. ~ Gauri On Tue, Apr 29, 2008 at 10:38 AM, Greg Smith [EMAIL PROTECTED] wrote: On Tue, 29 Apr 2008, Gauri Kanekar wrote: Basically we have some background processes which update "table1", and we don't want the application to make any changes to "table1" during vacuum. Vacuum requires an exclusive lock on "table1", and if any of the background or application processes are running, vacuum doesn't kick off. VACUUM FULL needs an exclusive lock; the regular one does not in 8.1. It's one of the reasons FULL should be avoided. If you do regular VACUUM frequently enough, you shouldn't ever need to do a FULL one anyway. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Regards Gauri
Re: [PERFORM] Replication Syatem
On Tue, 29 Apr 2008, Gauri Kanekar wrote: Basically we have some background processes which update "table1", and we don't want the application to make any changes to "table1" during vacuum. Vacuum requires an exclusive lock on "table1", and if any of the background or application processes are running, vacuum doesn't kick off. VACUUM FULL needs an exclusive lock; the regular one does not in 8.1. It's one of the reasons FULL should be avoided. If you do regular VACUUM frequently enough, you shouldn't ever need to do a FULL one anyway. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
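The distinction Greg describes, in command form (using "table1" from the thread):

```sql
-- Plain (lazy) VACUUM: runs alongside normal reads and writes,
-- marks dead space for reuse, never takes an exclusive lock
VACUUM ANALYZE table1;

-- VACUUM FULL: takes an ACCESS EXCLUSIVE lock and blocks all
-- access to the table until it finishes -- avoid on a live site
-- VACUUM FULL table1;
```

Run the plain form often enough (or let autovacuum do it) and the table stops growing without ever needing the exclusive-lock variant.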
Re: [PERFORM] Replication Syatem
Andrew, Can you explain in detail why you said vacuum full is making things worse? We do vacuum full, as vacuum verbose analyze doesn't regain space for us. ~ Gauri On Mon, Apr 28, 2008 at 9:52 PM, Andrew Sullivan [EMAIL PROTECTED] wrote: On Mon, Apr 28, 2008 at 07:35:37PM +0530, Gauri Kanekar wrote: Peter, We are doing vacuum full every alternate day. We also do vacuum analyze very often. VACUUM FULL is making your problem worse, not better. Don't do that. We are currently using version 8.1.3. You need immediately to upgrade to the latest 8.1 stability and security release, which is 8.1.11. This is a drop-in replacement. It's an urgent fix for your case. Autovacuum is already on. But "table1" is so busy that autovacuum doesn't get a sufficient chance to vacuum it :(. You probably need to tune autovacuum not to do that table, and just vacuum that table in a constant loop or something. VACUUM should _never_ take the site down. If it does, you're doing it wrong. We have already tried all the options listed by you; that's why we reached the decision of having a replication system. So any suggestion on that :). I think you will find that no replication system will solve your underlying problems. That said, I happen to work for a company that will sell you a replication system to work with 8.1 if you really want it. A -- Andrew Sullivan [EMAIL PROTECTED] +1 503 667 4564 x104 http://www.commandprompt.com/ -- Regards Gauri
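On 8.1, the per-table tuning Andrew suggests is done by inserting a row into the pg_autovacuum system catalog (the thresholds below are made-up examples; check the 8.1 docs for the exact column list before relying on this):

```sql
-- Make autovacuum visit table1 much more aggressively (8.1-era catalog;
-- later releases use ALTER TABLE ... SET (autovacuum_...) instead)
INSERT INTO pg_autovacuum
    (vacrelid, enabled,
     vac_base_thresh, vac_scale_factor,
     anl_base_thresh, anl_scale_factor,
     vac_cost_delay, vac_cost_limit)
VALUES
    ('table1'::regclass, true,
     200, 0.05,    -- vacuum after 200 + 5% of rows change
     100, 0.02,    -- analyze after 100 + 2% of rows change
     -1, -1);      -- -1 means: use the global cost settings
```

Alternatively, set enabled to false for the table and run plain VACUUM on it from a cron loop, as suggested above.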
Re: [PERFORM] Replication Syatem
On Tue, Apr 29, 2008 at 10:41 AM, Gauri Kanekar [EMAIL PROTECTED] wrote: But unless we do a full vacuum the space is not recovered. That's why we prefer full vacuum. There is no point in recovering the space by moving tuples and truncating the relation (that's what VACUUM FULL does) because you are doing frequent updates on the table, and those would again extend the relation. If you run plain VACUUM, that will recover dead space and update the free space map. It may not be able to reduce the table size, but you should not be bothered much about that, because the following updates/inserts will fill in the fragmented free space. You may want to check your FSM settings as well to make sure that you are tracking free space properly. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
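Checking whether the free space map is large enough on 8.1 (the numbers below are illustrative):

```sql
-- A database-wide VACUUM VERBOSE ends with a summary along the lines of:
--   INFO:  free space map contains 114520 pages in 39 relations
--   DETAIL: ... 170000 page slots are required to track all free space.
-- If the "required" figure exceeds max_fsm_pages, free space is being
-- lost; raise the limits in postgresql.conf (restart required):
--   max_fsm_pages     = 200000   -- comfortably above "required"
--   max_fsm_relations = 1000
VACUUM VERBOSE;
```

An undersized FSM is a common reason plain VACUUM "doesn't regain space": freed pages it cannot track are never reused, so the table keeps growing anyway.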
Re: [PERFORM] Replication Syatem
On Tue, Apr 29, 2008 at 11:16 AM, Gauri Kanekar [EMAIL PROTECTED] wrote: Andrew, Can you explain in detail why you said vacuum full is making things worse? 1. VACUUM FULL takes an exclusive lock on the table. That makes the table unavailable for reads/writes. 2. VACUUM FULL moves live tuples around. When a tuple is moved, the old index entry is deleted and a new index entry is inserted. This causes index bloat, which is hard to recover from. We do vacuum full, as vacuum verbose analyze doesn't regain space for us. As I mentioned in the other reply, you are not gaining much by regaining space. The subsequent UPDATEs/INSERTs will quickly extend the relation and you lose all the work done by VACUUM FULL. Plain VACUUM will update the FSM to track the free space scattered across the relation, which is later reused by updates/inserts. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com