[GENERAL] large table
I am trying to figure out what would have caused a table in a PostgreSQL 8.4.16 database to get into a state where there is only 1 live tuple, and there has only ever been 1 tuple, but the size of the table is huge.

CREATE TABLE public.myTable (
    myColumn timestamp with time zone NOT NULL
);

Note: there is no primary key or index on this table.

CREATE OR REPLACE FUNCTION public.myFunc() RETURNS VOID AS $$
BEGIN
    UPDATE public.myTable SET myColumn = CLOCK_TIMESTAMP();
    IF NOT FOUND THEN
        INSERT INTO public.myTable(myColumn) VALUES (CLOCK_TIMESTAMP());
    END IF;
END;
$$ LANGUAGE plpgsql VOLATILE STRICT SECURITY DEFINER;

CREATE OR REPLACE FUNCTION public.wasUpdated(OUT was_updated boolean) RETURNS BOOLEAN AS $$
BEGIN
    was_updated := COALESCE((SELECT myColumn FROM public.myTable) > (CLOCK_TIMESTAMP() - INTERVAL '5 SECOND'), FALSE);
END;
$$ LANGUAGE plpgsql VOLATILE STRICT SECURITY DEFINER;

SELECT * FROM pg_stat_all_tables WHERE relname = 'myTable';

relid            | 16713
schemaname       | public
relname          | myTable
seq_scan         | 3991833
seq_tup_read     | 3992001
idx_scan         |
idx_tup_fetch    |
n_tup_ins        | 0
n_tup_upd        | 3775409
n_tup_del        | 0
n_tup_hot_upd    | 3771173
n_live_tup       | 949135
n_dead_tup       | 183
last_vacuum      |
last_autovacuum  | 2014-09-18 11:28:47.63545+00
last_analyze     |
last_autoanalyze | 2014-09-18 11:27:47.134432+00

The stats are very far off, with n_live_tup at 949135 when there is only a single row in the table. Autovacuum appears to be running on a regular basis.

SELECT * FROM pgstattuple('public.myTable');

table_len          | 34709504
tuple_count        | 1
tuple_len          | 32
tuple_percent      | 0
dead_tuple_count   | 105
dead_tuple_len     | 3360
dead_tuple_percent | 0.01
free_space         | 30757308
free_percent       | 88.61

The actual size of the table is around 33 MB. The myFunc function is called every 2.5 seconds and the wasUpdated function every 2 seconds, by separate processes. I realize that running a VACUUM FULL or CLUSTER command on the table will resolve the issue, but I am more interested in a root cause that explains why this table would end up in this state. I have tried to reproduce the issue by running the exact same setup and have not been able to get the table to grow like this example. Any plausible causes or explanations would be much appreciated.

Luke
Re: [GENERAL] large table
On Monday, September 22, 2014 11:17:05 AM Luke Coldiron wrote: The actual size of the table is around 33 MB. The myFunc function is called every 2.5 seconds and the wasUpdated function every 2 seconds, by separate processes. I realize that running a VACUUM FULL or CLUSTER command on the table will resolve the issue, but I am more interested in a root cause that explains why this table would end up in this state. I have tried to reproduce the issue by running the exact same setup and have not been able to get the table to grow like this example. Any plausible causes or explanations would be much appreciated. Luke

I'd guess that some other process held a transaction open for a couple of weeks, and that prevented any vacuuming from taking place.

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] large table
On Mon, 22 Sep 2014 11:17:05 -0700 Luke Coldiron lukecoldi...@hotmail.com wrote: I am trying to figure out what would have caused a table in a PostgreSQL 8.4.16 database to get into a state where there is only 1 live tuple, and there has only ever been 1 tuple, but the size of the table is huge. CREATE TABLE public.myTable( myColumn timestamp with time zone NOT NULL); Note: there is no primary key or index on this table. CREATE OR REPLACE FUNCTION public.myFunc() RETURNS VOID AS $$ BEGIN UPDATE public.myTable SET myColumn = CLOCK_TIMESTAMP(); IF NOT FOUND THEN INSERT INTO public.myTable(myColumn) VALUES (CLOCK_TIMESTAMP()); END IF; END; $$ LANGUAGE plpgsql VOLATILE STRICT SECURITY DEFINER; CREATE OR REPLACE FUNCTION public.wasUpdated(OUT was_updated boolean) RETURNS BOOLEAN AS $$ BEGIN was_updated := COALESCE((SELECT myColumn FROM public.myTable) > (CLOCK_TIMESTAMP() - INTERVAL '5 SECOND'), FALSE); END; $$ LANGUAGE plpgsql VOLATILE STRICT SECURITY DEFINER; SELECT * FROM pg_stat_all_tables WHERE relname = 'myTable'; [snip] The actual size of the table is around 33 MB. The myFunc function is called every 2.5 seconds and the wasUpdated function every 2 seconds, by separate processes. I realize that running a VACUUM FULL or CLUSTER command on the table will resolve the issue, but I am more interested in a root cause that explains why this table would end up in this state. I have tried to reproduce the issue by running the exact same setup and have not been able to get the table to grow like this example. Any plausible causes or explanations would be much appreciated.

The UPDATE in myFunc() creates a new row every 2.5 seconds when it updates the row. The data from those rows is only reclaimed when a vacuum is run. So (for example) if autovacuum only triggers a vacuum every 250 seconds, there will be 249 rows worth of space in the table, on average. With the other process querying the table, it's possible that the row that it's looking at will be something that _should_ be reclaimable, so vacuum may not clear up all the free space.

As far as running the exact setup: if you're not getting the same results, then your setup isn't exactly the same. It's likely that there are things going on in the setup you're curious about that you're not aware of, such as additional queries on the table, additional load that causes operations to take a little longer, thus resulting in different overlap of competing operations, etc.

Keep in mind that a short-lived incident might have resulted in table bloat that won't be reclaimed by autovacuum. I.e., if autovacuum wasn't running for a while, this table would just keep bloating; then when you start autovacuum, it will maintain the table size, but it won't get any smaller.

I can't make any sense of the data you provided, it's all on separate rows and I've given up on trying to figure out what number goes with which value, so I don't know exactly what the situation is. It's likely that you can improve on the situation by tweaking the autovacuum settings for this table to vacuum it more aggressively.

Although, you don't seem to have a _problem_ that you've stated. Are you seeing performance issues? Is 33 MB too much data, filling up the drive (not being sarcastic here, as there are various mobile applications where 33 MB could be important, even now)? Because, if this isn't actually causing any problems, I wouldn't really worry about it.
-- Bill Moran I need your help to succeed: http://gamesbybill.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
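As a rough sketch of the per-table autovacuum tuning suggested above, 8.4 accepts autovacuum settings as per-table storage parameters; the values below are only placeholders to illustrate the syntax, not recommendations from this thread:

ALTER TABLE public.myTable SET (
    autovacuum_vacuum_threshold = 50,      -- vacuum after ~50 dead tuples instead of the default formula
    autovacuum_vacuum_scale_factor = 0.0,  -- don't scale the threshold with table size
    autovacuum_vacuum_cost_delay = 0       -- let autovacuum on this tiny table run without throttling
);

A plain VACUUM VERBOSE on the table afterwards shows whether the dead-tuple count actually stays low once the new settings are in effect.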
Re: [GENERAL] large table
From: ahodg...@simkin.ca To: pgsql-general@postgresql.org Subject: Re: [GENERAL] large table Date: Mon, 22 Sep 2014 11:34:45 -0700

On Monday, September 22, 2014 11:17:05 AM Luke Coldiron wrote: The actual size of the table is around 33 MB. The myFunc function is called every 2.5 seconds and the wasUpdated function every 2 seconds, by separate processes. I realize that running a VACUUM FULL or CLUSTER command on the table will resolve the issue, but I am more interested in a root cause that explains why this table would end up in this state. I have tried to reproduce the issue by running the exact same setup and have not been able to get the table to grow like this example. Any plausible causes or explanations would be much appreciated. Luke

I'd guess that some other process held a transaction open for a couple of weeks, and that prevented any vacuuming from taking place.

Interesting idea; on the surface I'm not sure how this would have happened in the system, but I can certainly explore forcing this to happen and see if the result is similar.

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] large table
Date: Mon, 22 Sep 2014 14:38:52 -0400 From: wmo...@potentialtech.com To: lukecoldi...@hotmail.com CC: pgsql-general@postgresql.org Subject: Re: [GENERAL] large table

On Mon, 22 Sep 2014 11:17:05 -0700 Luke Coldiron lukecoldi...@hotmail.com wrote: I am trying to figure out what would have caused a table in a PostgreSQL 8.4.16 database to get into a state where there is only 1 live tuple, and there has only ever been 1 tuple, but the size of the table is huge. CREATE TABLE public.myTable( myColumn timestamp with time zone NOT NULL); Note: there is no primary key or index on this table. CREATE OR REPLACE FUNCTION public.myFunc() RETURNS VOID AS $$ BEGIN UPDATE public.myTable SET myColumn = CLOCK_TIMESTAMP(); IF NOT FOUND THEN INSERT INTO public.myTable(myColumn) VALUES (CLOCK_TIMESTAMP()); END IF; END; $$ LANGUAGE plpgsql VOLATILE STRICT SECURITY DEFINER; CREATE OR REPLACE FUNCTION public.wasUpdated(OUT was_updated boolean) RETURNS BOOLEAN AS $$ BEGIN was_updated := COALESCE((SELECT myColumn FROM public.myTable) > (CLOCK_TIMESTAMP() - INTERVAL '5 SECOND'), FALSE); END; $$ LANGUAGE plpgsql VOLATILE STRICT SECURITY DEFINER; SELECT * FROM pg_stat_all_tables WHERE relname = 'myTable'; [snip] The actual size of the table is around 33 MB. The myFunc function is called every 2.5 seconds and the wasUpdated function every 2 seconds, by separate processes. I realize that running a VACUUM FULL or CLUSTER command on the table will resolve the issue, but I am more interested in a root cause that explains why this table would end up in this state. I have tried to reproduce the issue by running the exact same setup and have not been able to get the table to grow like this example. Any plausible causes or explanations would be much appreciated.

The UPDATE in myFunc() creates a new row every 2.5 seconds when it updates the row. The data from those rows is only reclaimed when a vacuum is run. So (for example) if autovacuum only triggers a vacuum every 250 seconds, there will be 249 rows worth of space in the table, on average. With the other process querying the table, it's possible that the row that it's looking at will be something that _should_ be reclaimable, so vacuum may not clear up all the free space. As far as running the exact setup: if you're not getting the same results, then your setup isn't exactly the same. It's likely that there are things going on in the setup you're curious about that you're not aware of, such as additional queries on the table, additional load that causes operations to take a little longer, thus resulting in different overlap of competing operations, etc.

It is possible and that is part of what I am trying to discover however I am very familiar with the system / code base and in this case there is a single process updating the timestamp and a single process reading the timestamp. There are no other user processes programmed to interact with this table outside of potentially what Postgres is doing.

Keep in mind that a short-lived incident might have resulted in table bloat that won't be reclaimed by autovacuum. I.e., if autovacuum wasn't running for a while, this table would just keep bloating; then when you start autovacuum, it will maintain the table size, but it won't get any smaller.

I thought this as well and have run tests with autovacuum turned off and I don't see this issue occur over my 1000s of updates. The updates become hot updates and reuse dead tuples.
I can't make any sense of the data you provided, it's all on separate rows and I've given up on trying to figure out what number goes with which value, so I don't know exactly what the situation is. It's likely that you can improve on the situation by tweaking the autovacuum settings for this table to vacuum it more aggressively.

Sorry about that, the email client that I am using messed up the formatting. Here is another attempt.

SELECT * FROM pg_stat_all_tables WHERE relname = 'myTable';

relid            | 16713
schemaname       | public
relname          | myTable
seq_scan         | 3995023
seq_tup_read     | 3995296
idx_scan         |
idx_tup_fetch    |
n_tup_ins        | 0
n_tup_upd        | 3778598
n_tup_del        | 0
n_tup_hot_upd    | 3774362
n_live_tup       | 949135
n_dead_tup       | 124
last_vacuum      |
last_autovacuum  | 2014-09-18 11:28:47.63545+00
last_analyze     |
last_autoanalyze | 2014-09-18 11:27:47.134432+00

SELECT * FROM pgstattuple('public.myTable');

table_len          | 34709504
tuple_count        | 1
tuple_len          | 32
tuple_percent      | 0
dead_tuple_count   | 105
dead_tuple_len     | 3360
dead_tuple_percent | 0.01
free_space         | 30757308
free_percent       | 88.61

So far having autovacuum on or off has not caused the problem to occur. Originally I was thinking
Re: [GENERAL] large table
On 9/22/2014 12:33 PM, Luke Coldiron wrote: It is possible and that is part of what I am trying to discover however I am very familiar with the system / code base and in this case there is a single process updating the timestamp and a single process reading the timestamp. There are no other user processes programmed to interact with this table outside of potentially what Postgres is doing. ANY other connection to the same postgres server, even to a different database, that has an open long running transaction (most frequently, Idle In Transaction) will block autovacuum from marking the old tuples as reusable. -- john r pierce 37N 122W somewhere on the middle of the left coast -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
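A quick way to check for the kind of connection being described is to look for sessions sitting idle inside a transaction; a sketch against 8.4's pg_stat_activity (the procpid and current_query columns were renamed in later releases):

SELECT procpid, datname, usename, xact_start, backend_start, current_query
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
ORDER BY xact_start;

Any row whose xact_start is hours or days old is a candidate for having pinned the old tuples in place.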
Re: [GENERAL] large table
On Mon, 22 Sep 2014 12:15:27 -0700 Luke Coldiron lukecoldi...@hotmail.com wrote: I'd guess that some other process held a transaction open for a couple of weeks, and that prevented any vacuuming from taking place. Interesting idea; on the surface I'm not sure how this would have happened in the system, but I can certainly explore forcing this to happen and see if the result is similar.

It happened when I developed with Java+Hibernate. It opened a transaction and made a lot of inserts and deletes while the app ran. It created GB-size tables with few rows, and a permanent 'IDLE in TRANSACTION' stops any autovacuum.

--- Eduardo Morras emorr...@yahoo.es -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] large table
On Mon, 22 Sep 2014 12:46:21 -0700 John R Pierce pie...@hogranch.com wrote: On 9/22/2014 12:33 PM, Luke Coldiron wrote: It is possible and that is part of what I am trying to discover however I am very familiar with the system / code base and in this case there is a single process updating the timestamp and a single process reading the timestamp. There are no other user processes programmed to interact with this table outside of potentially what Postgres is doing. ANY other connection to the same postgres server, even to a different database, that has an open long running transaction (most frequently, Idle In Transaction) will block autovacuum from marking the old tuples as reusable. As a possibility, I've seen this happen when people connected to the DB using various GUI tools (can't remember the exact one where we saw this) that started and held open a transaction without the user realizing it. This prevented autovacuum from getting any useful work done until our Nagios monitoring detected the idle transaction and an engineer tracked down who was doing it and had them close the program. IMHO, too many GUI tools make it too easy to do something without realizing the consequences. On a related note, I'm curious as to how an open transaction affects HOT updates (if at all). This is an area of behavior I have little experience with to date. -- Bill Moran I need your help to succeed: http://gamesbybill.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] large table
Date: Mon, 22 Sep 2014 12:46:21 -0700 From: pie...@hogranch.com To: pgsql-general@postgresql.org Subject: Re: [GENERAL] large table On 9/22/2014 12:33 PM, Luke Coldiron wrote: It is possible and that is part of what I am trying to discover however I am very familiar with the system / code base and in this case there is a single process updating the timestamp and a single process reading the timestamp. There are no other user processes programmed to interact with this table outside of potentially what Postgres is doing. ANY other connection to the same postgres server, even to a different database, that has an open long running transaction (most frequently, Idle In Transaction) will block autovacuum from marking the old tuples as reusable. Good point, I wasn't thinking about this as a possibility. This is a very good possibility considering the behavior of the rest of the system. -- john r pierce 37N 122W somewhere on the middle of the left coast -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] large table starting sequence scan because of default_statistics_target
Hi, we are running a large 8.3 database and had some trouble with the default statistics target. We had set it on one special table some time ago, when a growing table started being read with sequence scans. Last week we manually clustered this table (create table as ... order by; drop the original table; rename the temp table to the original name). Of course the statistics target was dropped and we did not remember to set it again.

Why does default_statistics_target default to 10? The documentation tells me the only drawback is a longer ANALYZE run. We are setting it to 100 in postgresql.conf and we did not see a much longer run of ANALYZE. Of course, smaller tables won't need a setting of 100. But small tables are usually not very interesting when it comes to performance. With a setting of 10 you run into difficult problems if your table grows. Suddenly an execution plan changes and you get sequence scans on your largest table! We had such problems and it was annoying to have a real slowdown just because of this minor configuration parameter.

I suggest setting it to 100 by default:
- no problems for small installations
- no problems for DBAs who always adjust their system in every possible way
- no problems for growing databases with unequally distributed data

But maybe there are some other reasons for not setting it to a higher value. If so, please tell me.

kind regards Janning

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
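For reference, the statistics target can also be raised for just the columns that matter, which is roughly the per-table fix described above; a sketch, with the table and column names invented for illustration:

-- raise the target for one column only, then refresh the statistics
ALTER TABLE big_table ALTER COLUMN important_column SET STATISTICS 100;
ANALYZE big_table;

-- or raise it instance-wide in postgresql.conf
-- default_statistics_target = 100

The per-column setting survives later ANALYZE runs, but, as the post notes, it is lost if the table itself is rebuilt by hand.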
Re: [GENERAL] large table starting sequence scan because of default_statistics_target
On Mon, Mar 16, 2009 at 7:15 AM, Janning Vygen vy...@kicktipp.de wrote: Hi, Why does default_statistics_target default to 10? I suggest setting it to 100 by default:

Already done in 8.4.

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] large table starting sequence scan because of default_statistics_target
On Monday 16 March 2009 15:13:51 Scott Marlowe wrote: On Mon, Mar 16, 2009 at 7:15 AM, Janning Vygen vy...@kicktipp.de wrote: Hi, Why does default_statistics_target default to 10? I suggest setting it to 100 by default: Already done in 8.4

GREAT! Sorry for not searching the archives or changelog before. Janning

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] large table starting sequence scan because of default_statistics_target
On Mon, Mar 16, 2009 at 8:24 AM, Janning Vygen vy...@kicktipp.de wrote: On Monday 16 March 2009 15:13:51 Scott Marlowe wrote: On Mon, Mar 16, 2009 at 7:15 AM, Janning Vygen vy...@kicktipp.de wrote: Hi, Why does default_statistics_target default to 10? I suggest setting it to 100 by default: Already done in 8.4 GREAT! Sorry for not searching the archives or changelog before.

Hey, no problem, there's plenty of new stuff coming up in 8.4, and a lot of it doesn't show up on the general list anyway. This just makes sure a few more people know about this change.

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] large table vacuum issues
On Jan 5, 2008 5:38 AM, Ed L. [EMAIL PROTECTED] wrote: We need some advice on how to handle some large table autovacuum issues. One of our 8.1.2 autovacuums is launching a DB-wide vacuum on our 270GB database to prevent xid wrap-around, but is getting hung-up and/or bogged down for hours on a 40gb table and taking the server performance down with it, apparently due to an IO bottleneck. The autovac child process becomes completely unresponsive to SIGTERM/SIGINT; only a sigkill restart with disabling the autovac daemon gets us back to adequate performance for now.

Looks like you haven't been vacuuming for a while, have you? Because it seems the autovac was disabled but was invoked forcefully to avoid wraparound. If in fact the wraparound happens you will lose data. When autovacuum is processing a table it wouldn't take more time than what a normal vacuum would take.

What might really help you is a temporary increase in maintenance work memory (maintenance_work_mem). What's your current setting? How much RAM do you have? If you can afford more memory, increase it to a significantly higher value to help speed up the vacuum process. I understand it might impact some other system activity, but you need a vacuum, and fast, before you lose all data. You need to get rid of dead rows first and then have a healthy vacuuming schedule, either a daily cron job or autovac, set up to suit your workload. Prevention is of course better than the cure. ;)

Thanks, -- Usama Munir Dar http://www.linkedin.com/in/usamadar Consultant Architect Cell:+92 321 5020666 Skype: usamadar
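A sketch of the session-level bump being suggested here; on 8.1 the value is given in kilobytes, and both the number and the table name below are placeholders:

SET maintenance_work_mem = 524288;   -- roughly 512 MB, purely illustrative
VACUUM VERBOSE public.big_table;
RESET maintenance_work_mem;

Raising it only for the session doing the vacuum avoids changing the server-wide setting for everything else.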
Re: [GENERAL] large table vacuum issues
Ed L. [EMAIL PROTECTED] wrote: On Friday 04 January 2008 6:21 pm, Scott Marlowe wrote: On Jan 4, 2008 6:38 PM, Ed L. [EMAIL PROTECTED] wrote: We need some advice on how to handle some large table autovacuum issues. One of our 8.1.2 First of all, update your 8.1 install to 8.1.10. Failing to keep up with bug fixes is negligent. who knows, you might be getting bitten by a bug that was fixed between 8.1.2 and 8.1.10 Could be. But like you said, who knows. In some environments, downtime for upgrading costs money (and more), too, sometimes even enough to make it negligent to take downtime to keep up with bug fixes (and of course, the new bugs) which may or may not be a factor at hand.

Upgrades along the 8.1.x branch take something on the order of 5 minutes (if you're meticulous and serialize the process). If you haven't set yourself up so you can schedule 5 minutes of downtime once a month or so, then the negligence occurred much earlier than at the failure to upgrade.

While the time required to restart a DB may be negligible, there are often upstream/downstream dependencies that greatly expand the actual downtime for the customer.

Like what? The point to the double-dot branch is that upgrades don't affect dependencies.

How much would downtime need to cost before you thought it negligent to upgrade immediately? It's a tradeoff, not well-supported by simple pronouncements, one the customer and provider are best qualified to make.

Not really. Unscheduled downtime is _always_ more expensive than scheduled downtime. Scheduled downtime isn't going to put you in breach of contract if you've got an uptime guarantee. If you're really in a situation where you need 100% uptime, then you're still negligent for not having something like Slony to allow you to switch production to another server so you can alternate maintenance between the two. This is something along the lines of the RAID 5 argument: no matter how you argue it, it's a bad idea. If you claim you can't afford to buy more hardware, then you made a mistake in pricing out your product to your client.

-- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
[GENERAL] large table vacuum issues
We need some advice on how to handle some large table autovacuum issues. One of our 8.1.2 autovacuums is launching a DB-wide vacuum on our 270GB database to prevent xid wrap-around, but is getting hung-up and/or bogged down for hours on a 40gb table and taking the server performance down with it, apparently due to an IO bottleneck. The autovac child process becomes completely unresponsive to SIGTERM/SIGINT; only a sigkill restart with disabling the autovac daemon gets us back to adequate performance for now. We are discussing how to partition the table (difficult due to existing foreign keys in other tables), and archiving/clearing data. Are there any other tricks to get it past this large table for the time being and still get the xid wraparound fix? TIA. Ed ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] large table vacuum issues
On Jan 4, 2008 6:38 PM, Ed L. [EMAIL PROTECTED] wrote: We need some advice on how to handle some large table autovacuum issues. One of our 8.1.2

First of all, update your 8.1 install to 8.1.10. Failing to keep up with bug fixes is negligent. who knows, you might be getting bitten by a bug that was fixed between 8.1.2 and 8.1.10

autovacuums is launching a DB-wide vacuum on our 270GB database to prevent xid wrap-around, but is getting hung-up and/or bogged down for hours on a 40gb table and taking the server performance down with it, apparently due to an IO bottleneck.

Have you tried adjusting the

#vacuum_cost_delay = 0         # 0-1000 milliseconds
#vacuum_cost_page_hit = 1      # 0-10000 credits
#vacuum_cost_page_miss = 10    # 0-10000 credits
#vacuum_cost_page_dirty = 20   # 0-10000 credits
#vacuum_cost_limit = 200       # 0-10000 credits

settings to something so as to make vacuum less intrusive? might be the easiest fix.

Are there any other tricks to get it past this large table for the time being and still get the xid wraparound fix?

the other trick would be to do a dump / restore of your whole db, which can often be quicker than vacuuming it if it's got a lot of dead tuples in it.

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] large table vacuum issues
On Friday 04 January 2008 6:21 pm, Scott Marlowe wrote: Have you tried adjusting the

#vacuum_cost_delay = 0         # 0-1000 milliseconds
#vacuum_cost_page_hit = 1      # 0-10000 credits
#vacuum_cost_page_miss = 10    # 0-10000 credits
#vacuum_cost_page_dirty = 20   # 0-10000 credits
#vacuum_cost_limit = 200       # 0-10000 credits

settings to something so as to make vacuum less intrusive? might be the easiest fix.

Any particular suggested changes for these parameters?

Ed

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] large table vacuum issues
On Friday 04 January 2008 6:21 pm, Scott Marlowe wrote: On Jan 4, 2008 6:38 PM, Ed L. [EMAIL PROTECTED] wrote: We need some advice on how to handle some large table autovacuum issues. One of our 8.1.2 First of all, update your 8.1 install to 8.1.10. Failing to keep up with bug fixes is negligent. who knows, you might be getting bitten by a bug that was fixed between 8.1.2 and 8.1.10

Could be. But like you said, who knows. In some environments, downtime for upgrading costs money (and more), too, sometimes even enough to make it negligent to take downtime to keep up with bug fixes (and of course, the new bugs) which may or may not be a factor at hand. While the time required to restart a DB may be negligible, there are often upstream/downstream dependencies that greatly expand the actual downtime for the customer.

How much would downtime need to cost before you thought it negligent to upgrade immediately? It's a tradeoff, not well-supported by simple pronouncements, one the customer and provider are best qualified to make.

Ed

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] large table vacuum issues
On Jan 4, 2008 7:41 PM, Ed L. [EMAIL PROTECTED] wrote: On Friday 04 January 2008 6:21 pm, Scott Marlowe wrote: On Jan 4, 2008 6:38 PM, Ed L. [EMAIL PROTECTED] wrote: We need some advice on how to handle some large table autovacuum issues. One of our 8.1.2 First of all, update your 8.1 install to 8.1.10. Failing to keep up with bug fixes is negligent. who knows, you might be getting bitten by a bug that was fixed between 8.1.2 and 8.1.10 Could be. But like you said, who knows. In some environments, downtime for upgrading costs money (and more), too, sometimes even enough to make it negligent to take downtime to keep up with bug fixes (and of course, the new bugs) which may or may not be a factor at hand. While the time required to restart a DB may be negligible, there are often upstream/downstream dependencies that greatly expand the actual downtime for the customer. How much would downtime need to cost before you thought it negligent to upgrade immediately? It's a tradeoff, not well-supported by simple pronouncements, one the customer and provider are best qualified to make.

And how much would downtime cost you if your database got corrupted by those bugs and you had to restore from backups? You can use something like Slony and a two-server setup to reduce the downtime to mere seconds. I know about downtime and its costs, I work at an airline reservation company. Even we have scheduled maintenance windows, albeit few and far between. I find it hard to believe you've had none since 8.1.2 came out.

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] large table vacuum issues
Ed L. wrote: On Friday 04 January 2008 6:21 pm, Scott Marlowe wrote: On Jan 4, 2008 6:38 PM, Ed L. [EMAIL PROTECTED] wrote: We need some advice on how to handle some large table autovacuum issues. One of our 8.1.2 First of all, update your 8.1 install to 8.1.10. Failing to keep up with bug fixes is negligent. who knows, you might be getting bitten by a bug that was fixed between 8.1.2 and 8.1.10 Could be. But like you said, who knows. In some environments, downtime for upgrading costs money (and more), too, sometimes even enough to make it negligent to take downtime to keep up with bug fixes (and of course, the new bugs) which may or may not be a factor at hand. While the time required to restart a DB may be negligible, there are often upstream/downstream dependencies that greatly expand the actual downtime for the customer. How much would downtime need to cost before you thought it negligent to upgrade immediately? It's a tradeoff, not well-supported by simple pronouncements, one the customer and provider are best qualified to make.

You make a valid argument above but you forget a couple of minor points. How much money does it cost when your customer:

* gets sued for a breach of security because they couldn't afford a 30 minute downtime at 3am? (I assume 30 minutes only because you do need to shutdown external services).
* loses all their data because of a corner case function they are running that causes pages to become corrupt?

Just curious...

Sincerely, Joshua D. Drake

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] large table vacuum issues
On Jan 4, 2008 7:29 PM, Ed L. [EMAIL PROTECTED] wrote: On Friday 04 January 2008 6:21 pm, Scott Marlowe wrote: Have you tried adjusting the

#vacuum_cost_delay = 0         # 0-1000 milliseconds
#vacuum_cost_page_hit = 1      # 0-10000 credits
#vacuum_cost_page_miss = 10    # 0-10000 credits
#vacuum_cost_page_dirty = 20   # 0-10000 credits
#vacuum_cost_limit = 200       # 0-10000 credits

settings to something so as to make vacuum less intrusive? might be the easiest fix. Any particular suggested changes for these parameters?

Well, it really depends on your I/O subsystem, but a good start would be to read this section of the manual: http://www.postgresql.org/docs/8.2/interactive/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-VACUUM-COST

I'd start setting the delay to 10 or 20 and seeing if vacuuming has a low enough impact to allow it to run in the background, even during peak hours. Keep an eye on vmstat / iostat output while vacuum is running to see if you're flooding your I/O or not.

note that there's a whole other set of vars for autovacuum (at least in 8.2, don't know about 8.1) that you can set so that regular vacuums can happen with greater or less priority than autovacuuming.

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
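A sketch of trying those numbers on a manual vacuum of the problem table; the cost settings are per-session, and the table name is a placeholder:

SET vacuum_cost_delay = 20;   -- milliseconds, in the 10-20 range suggested above
SET vacuum_cost_limit = 200;
VACUUM VERBOSE ANALYZE public.big_table;

Watching vmstat / iostat while this runs, as suggested, shows whether the chosen delay keeps the I/O load tolerable.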
Re: [GENERAL] large table problem
Thanks for the redirect... After profiling my client memory usage and using the built-in cursor functionality I discovered that another part of my program was causing the memory overflow and that the ResultSet iteration was doing exactly what it should have all along.

On 4/21/07, Kris Jurka [EMAIL PROTECTED] wrote: On Fri, 20 Apr 2007, Jason Nerothin wrote: I'm trying to work my way around a large query problem. Not too unexpectedly, the app server (EJB3/JPA) is choking on the queries which are unnamed native queries in Java parlance. Work-around attempt 1 was to call directly to the JDBC driver, but the cursor doesn't dispose of the memory in the ResultSet once I've passed it by (OutOfMemoryError) and the documentation suggests that cursor behavior is a little buggy for the current postgres driver. (The docs suggest implementing a custom stored procedure to provide iteration.) I'm not sure what documentation you're reading: http://jdbc.postgresql.org/documentation/82/query.html#query-with-cursor and it works as advertised. Kris Jurka

-- Jason Nerothin Programmer/Analyst IV - Database Administration UCLA-DOE Institute for Genomics Proteomics Howard Hughes Medical Institute 611 C.E. Young Drive East | Tel: (310) 206-3907 105 Boyer Hall, Box 951570 | Fax: (310) 206-3914 Los Angeles, CA 90095. USA | Mail: [EMAIL PROTECTED] http://www.mbi.ucla.edu/~jason
Re: [GENERAL] large table problem
On Fri, 20 Apr 2007, Jason Nerothin wrote: I'm trying to work my way around a large query problem. Not too unexpectedly, the app server (EJB3/JPA) is choking on the queries which are unnamed native queries in Java parlance. Work-around attempt 1 was to call directly to the JDBC driver, but the cursor doesn't dispose of the memory in the ResultSet once I've passed it by (OutOfMemoryError) and the documentation suggests that cursor behavior is a little buggy for the current postgres driver. (The docs suggest implementing a custom stored procedure to provide iteration.)

I'm not sure what documentation you're reading: http://jdbc.postgresql.org/documentation/82/query.html#query-with-cursor and it works as advertised.

Kris Jurka

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[GENERAL] large table problem
I'm trying to work my way around a large query problem. In my system, I've created a number of large materialized views that are the output of some computationally expensive stored procedures on other large tables in my system. They are intended to serve as staging tables for the next phase of computation, such that I can call (the presumably efficient): select * from my_mv_table; and then have at the data on the other side of a Java application server. These tables range anywhere from 400,000 to 32,000,000 rows, though individual records are only 5 integers wide.

Not too unexpectedly, the app server (EJB3/JPA) is choking on the queries which are unnamed native queries in Java parlance. Work-around attempt 1 was to call directly to the JDBC driver, but the cursor doesn't dispose of the memory in the ResultSet once I've passed it by (OutOfMemoryError) and the documentation suggests that cursor behavior is a little buggy for the current postgres driver. (The docs suggest implementing a custom stored procedure to provide iteration.) Attempt number 2, now underway, is to pass LIMIT and OFFSET values to the query which Postgres handles quite effectively as long as the OFFSET value is less than the total number of rows in the table. When the value is greater than num_rows, the query hangs for minutes.

So my question is, does Postgres keep any metadata around about un-indexed table sizes? select count(*) from my_table itself can take a minute to process. If I had ready access to that information, I could kluge up my code with something like:

num_rows = getNumRows();
while( offset < num_rows ){
    processData( select( offset += window_size, window_size ) );
}

At the moment the best option I have is to write a stored procedure to populate a table mv_sizes, but not only is this a pain in the patoot, it just seems sick and wrong. Am I missing something painfully obvious?

Jason

-- Jason Nerothin Programmer/Analyst IV - Database Administration UCLA-DOE Institute for Genomics Proteomics Howard Hughes Medical Institute 611 C.E. Young Drive East | Tel: (310) 206-3907 105 Boyer Hall, Box 951570 | Fax: (310) 206-3914 Los Angeles, CA 90095. USA | Mail: [EMAIL PROTECTED] http://www.mbi.ucla.edu/~jason
Re: [GENERAL] large table problem
Jason Nerothin [EMAIL PROTECTED] writes: Attempt number 2, now underway, is to pass LIMIT and OFFSET values to the query which Postgres handles quite effectively as long as the OFFSET value is less than the total number of rows in the table. When the value is greater than num_rows, the query hangs for minutes. I don't actually believe the above; using successively larger offsets should get slower and slower in a smooth manner, because the only thing OFFSET does is throw away scanned rows just before they would have been returned to the client. I think you've confused yourself somehow. the documentation suggests that cursor behavior is a little buggy for the current postgres driver. How old a driver are you using? Because a cursor is definitely what you want to use for retrieving millions of rows. It strikes me that pgsql-jdbc might be a more suitable group of people to ask about this than the -general list ... regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
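For what it's worth, the server-side cursor the driver uses under the covers can also be sketched in plain SQL, independent of the JDBC fetch-size mechanism; the cursor name and batch size here are just illustrative:

BEGIN;
DECLARE mv_cur NO SCROLL CURSOR FOR SELECT * FROM my_mv_table;
FETCH FORWARD 10000 FROM mv_cur;   -- repeat until it returns no rows
CLOSE mv_cur;
COMMIT;

Each FETCH only materializes one batch on the client, which is the same effect setFetchSize() is meant to give through the driver.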
[GENERAL] Large Table Performance
List, I've got a problem where I need to make a table that is going to grow by an average of 230,000 records per day. There are only 20 columns in the table, mostly char and integer. It's FK'd in two places to another table for import/export transaction id's and I have a serial primary key and an index on a date column for when I need to search (every search is done inside a date range). I thought it would be OK but after a few weeks of operation I have more than five million records in there. Some queries take more than five minutes to complete and I'm sad about that. How can I make this faster? I could munge dates into integers if they're faster, I'm OK with that. What can I tweak in the configuration file to speed things up? What about some drastic schema change that more experience would have shown me? I cannot show the full schema but it's like this:

-- My import/export data information
table ie_data (id serial primary key, date date, [12 other columns here])
big_transaction_table(id serial primary key, import_id int w/FK, export_id int w/FK, date date, [20 other necessary transaction detail columns])

So when I say select x,y,z from big_transaction_table where date >= '10/2/2005' and date <= '10/4/2005' and transaction_status in (1,2,3) order by date; it takes five+ minutes. TIA for any suggestions.

/djb

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Large Table Performance
On Fri, Oct 21, 2005 at 05:25:22PM -0700, Edoceo Lists wrote: [summary of situation] Some queries take more than five minutes to complete and I'm sad about that. How can I make this faster? You might get more help on pgsql-performance, which is specifically for discussions of performance issues. I could munge dates into integers if their faster, I'm OK with that. Let's identify the problem before thinking about possible solutions. What can I tweak in the configuration file to speed things up? What version of PostgreSQL are you using? What operating system? What kind of hardware (CPU, disks, amount of memory, etc.)? Are you regularly vacuuming and analyzing all tables in the database? Have you tuned your configuration at all? What are the following settings? shared_buffers work_mem (8.x) or sort_mem (7.x) effective_cache_size random_page_cost You might want to read through a tuning guide like this one: http://www.powerpostgresql.com/PerfList So when I say select x,y,z from big_transaction_table where date='10/2/2005' and date='10/4/2005' and transaction_status in (1,2,3) order by date; it takes five+ minutes. Please post the EXPLAIN ANALYZE output (not just EXPLAIN) of this query; that'll show us what the query planner is doing and how accurate its row count estimates are. If the row counts are way off then you might see an improvement by increasing the statistics target for the date and maybe the transaction_status columns. Or you might just need to analyze the table to update its statistics, and possibly vacuum it to get rid of dead tuples. -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
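A sketch of the kind of output being asked for, using the query from the original post; if the planner's row estimates turn out to be far off, the per-column statistics target can be raised and the table re-analyzed:

EXPLAIN ANALYZE
SELECT x, y, z
FROM big_transaction_table
WHERE date >= '10/2/2005' AND date <= '10/4/2005'
  AND transaction_status IN (1, 2, 3)
ORDER BY date;

ALTER TABLE big_transaction_table ALTER COLUMN date SET STATISTICS 100;
ANALYZE big_transaction_table;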
Re: [GENERAL] Large Table Performance
On 22 Oct 2005, at 01:25, Edoceo Lists wrote: List, I've got a problem where I need to make a table that is going to grow by an average of 230,000 records per day. There are only 20 columns in the table, mostly char and integer. It's FK'd in two places to another table for import/export transaction id's and I have a serial primary key and an index on a date column for when I need to search (every search is done inside a date range). I thought it would be OK but after a few weeks of operation I have more than five million records in there. Some queries take more than five minutes to complete and I'm sad about that. How can I make this faster? I could munge dates into integers if they're faster, I'm OK with that. What can I tweak in the configuration file to speed things up? What about some drastic schema change that more experience would have shown me? I cannot show the full schema but it's like this: -- My import/export data information table ie_data (id serial primary key, date date, [12 other columns here]) big_transaction_table(id serial primary key, import_id int w/FK, export_id int w/FK, date date, [20 other necessary transaction detail columns]) So when I say select x,y,z from big_transaction_table where date >= '10/2/2005' and date <= '10/4/2005' and transaction_status in (1,2,3) order by date; it takes five+ minutes. TIA for any suggestions.

What hardware are you on? What query plans (output from explain) do your queries give you? What PG version? We do about 100,000 rows a minute (300 MB+) a day so I suspect your queries are doing full table scans or something. Of course we don't use any FKs so I suppose they could be biting you.

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Large table search question
John Wells wrote: Guys, I have a general question about designing databases for large data sets. I was speaking with a colleague about an application we're preparing to build. One of the application's tables will potentially contain 2 million or more names, containing (at least) the fields first_name, last_name, middle_name and prefix. A common lookup the application will require is the full name, so prefix + first_name + middle_name + last_name. My friend's suggestion was to create a lookup field in the table itself, which would contain a concatenation of these fields created during insert. So, for each record, we'd have each individual field and then a full_name field that would contain the combination of the ind. fields. His argument is that this will make lookups in this manner extremely fast and efficient.

Might, might not. No figures to back up his argument. It'll certainly make updates slower and less efficient. In fact, since each row will store the data twice you'll get fewer rows per disk page, which means (potentially) more disk reads when you need to get several rows.

I agree with his assertion, but get the feeling that this is sort of an ugly design. Would a compound index on these fields really be less efficient?

Doubtful, I'd certainly not try his solution until I'd tried the simple way first. If you really want to try your friend's approach on PG you can build a functional index. As of 7.4, these can be expressions rather than just single function calls, so you can do something like:

CREATE INDEX my_idx_1 ON table1 ( prefix || ' ' || first_name ...);

If you're using 7.3.x you'll need to wrap that expression in a function and index the function instead. In your case though, I'd just build a compound index and leave it at that.

-- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend
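The compound-index alternative recommended at the end of that reply would look something like the sketch below; the table name is invented, the columns are the ones named in the question:

CREATE INDEX names_full_name_idx ON names_table (last_name, first_name, middle_name, prefix);

Putting the column you filter on most often (or the most selective one) first matters; a lookup that supplies all four fields can then use the one index without storing the concatenated text a second time.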
Re: [GENERAL] Large table search question
I don't want to butt in, I just find this an interesting discussion and would like to add my 2 cents: I have read this in the manual (PostgreSQL 7.4beta4 documentation, Chapter 11.3 Multicolumn Indexes). Quote: Multicolumn indexes should be used sparingly. Most of the time, an index on a single column is sufficient and saves space and time. Indexes with more than three columns are unlikely to be helpful unless the usage of the table is extremely stylized.

This makes me question the usefulness, in terms of performance, of multi-column indices. Furthermore it states that multicolumn indices will only be used by the planner if the fields of the index are used with the AND operator in the where clause of your select (same chapter). We had a table with 6 million+ records and a few tests with explain revealed that none of the multi-column indices were actually used! This while regular analyzes were done, and the data never changes (no mutations).

I don't seem to grasp the full meaning of the above. Am I better off using several single-field indices, or do multicolumn indices offer an advantage? If so, in which case? Just made me wonder...

Regards, Stijn Vanroye

-Original Message- From: Richard Huxton [mailto:[EMAIL PROTECTED] Sent: Tuesday, 1 June 2004 10:44 To: John Wells Cc: [EMAIL PROTECTED] Subject: Re: [GENERAL] Large table search question[Scanned]

John Wells wrote: Guys, I have a general question about designing databases for large data sets. I was speaking with a colleague about an application we're preparing to build. One of the application's tables will potentially contain 2 million or more names, containing (at least) the fields first_name, last_name, middle_name and prefix. A common lookup the application will require is the full name, so prefix + first_name + middle_name + last_name. My friend's suggestion was to create a lookup field in the table itself, which would contain a concatenation of these fields created during insert. So, for each record, we'd have each individual field and then a full_name field that would contain the combination of the ind. fields. His argument is that this will make lookups in this manner extremely fast and efficient.

Might, might not. No figures to back up his argument. It'll certainly make updates slower and less efficient. In fact, since each row will store the data twice you'll get fewer rows per disk page, which means (potentially) more disk reads when you need to get several rows.

I agree with his assertion, but get the feeling that this is sort of an ugly design. Would a compound index on these fields really be less efficient?

Doubtful, I'd certainly not try his solution until I'd tried the simple way first. If you really want to try your friend's approach on PG you can build a functional index. As of 7.4, these can be expressions rather than just single function calls, so you can do something like:

CREATE INDEX my_idx_1 ON table1 ( prefix || ' ' || first_name ...);

If you're using 7.3.x you'll need to wrap that expression in a function and index the function instead. In your case though, I'd just build a compound index and leave it at that.

-- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [GENERAL] Large table search question
Stijn Vanroye wrote: I don't want to butt in, I just find this an interesting discussion and would like to add my 2 cents: I have read this in the manual (PostgreSQL 7.4beta4 documentation, Chapter 11.3 Multicolumn Indexes). Quote: Multicolumn indexes should be used sparingly. Most of the time, an index on a single column is sufficient and saves space and time. Indexes with more than three columns are unlikely to be helpful unless the usage of the table is extremely stylized. This makes me question the usefulness, in terms of performance, of multi-column indices. Furthermore it states that multicolumn indices will only be used by the planner if the fields of the index are used with the AND operator in the where clause of your select (same chapter). We had a table with 6 million+ records and a few tests with explain revealed that none of the multi-column indices were actually used! This while regular analyzes were done, and the data never changes (no mutations).

Indeed - in many cases the additional costs of keeping a multi-column index up to date, and of reading it, outweigh the benefits on the few queries that actually use them. Looking at John's example, if he defined an index (first_name, last_name) then ordering by last_name can't ever use that index. If you execute a lot of queries for last_name=Smith AND first_name=John then it might well help, there are a lot of Smiths to choose from. On the other hand, my last_name=Huxton and there aren't many of those in the phone book, so if a lot of your data is Huxtons rather than Smiths then you might just want an index on last_name.

I don't seem to grasp the full meaning of the above. Am I better off using several single-field indices, or do multicolumn indices offer an advantage? If so, in which case? Just made me wonder...

There's really no alternative to testing. The statistics tables are very useful here. Unless you have good reason not to, always turn the statistics gathering on, and take snapshots regularly to keep an eye on where PG is exerting the most effort.

-- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
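A sketch of the kind of statistics check being described, assuming the stats collector is enabled; the pg_stat_user_indexes view shows how often each index has actually been used:

SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan;

Indexes whose idx_scan stays at zero over a representative workload are candidates for dropping, subject to the usual caution about unique indexes and rarely-run reports.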
Re: [GENERAL] Large table search question[Scanned]
Thanks for the reply. I was afraid it would come down to testing each individual situation. The table I mentioned (6 million+ records) actually is a phonebook. And searching and filtering is possible on almost any combination of fields. So there's an index on each individual field now and that's it. Works relatively fast now anyway.

Regards.

Richard Huxton wrote: Stijn Vanroye wrote: I don't want to butt in, I just find this an interesting discussion and would like to add my 2 cents: I have read this in the manual (PostgreSQL 7.4beta4 documentation, Chapter 11.3 Multicolumn Indexes). Quote: Multicolumn indexes should be used sparingly. Most of the time, an index on a single column is sufficient and saves space and time. Indexes with more than three columns are unlikely to be helpful unless the usage of the table is extremely stylized. This makes me question the usefulness, in terms of performance, of multi-column indices. Furthermore it states that multicolumn indices will only be used by the planner if the fields of the index are used with the AND operator in the where clause of your select (same chapter). We had a table with 6 million+ records and a few tests with explain revealed that none of the multi-column indices were actually used! This while regular analyzes were done, and the data never changes (no mutations).

Indeed - in many cases the additional costs of keeping a multi-column index up to date, and of reading it, outweigh the benefits on the few queries that actually use them. Looking at John's example, if he defined an index (first_name, last_name) then ordering by last_name can't ever use that index. If you execute a lot of queries for last_name=Smith AND first_name=John then it might well help, there are a lot of Smiths to choose from. On the other hand, my last_name=Huxton and there aren't many of those in the phone book, so if a lot of your data is Huxtons rather than Smiths then you might just want an index on last_name.

I don't seem to grasp the full meaning of the above. Am I better off using several single-field indices, or do multicolumn indices offer an advantage? If so, in which case? Just made me wonder...

There's really no alternative to testing. The statistics tables are very useful here. Unless you have good reason not to, always turn the statistics gathering on, and take snapshots regularly to keep an eye on where PG is exerting the most effort.

-- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[GENERAL] Large table search question
Guys, I have a general question about designing databases for large data sets. I was speaking with a colleague about an application we're preparing to build. One of the application's tables will potentially contain 2 million or more names, containing (at least) the fields first_name, last_name, middle_name and prefix. A common lookup the application will require is the full name, so prefix + first_name + middle_name + last_name.

My friend's suggestion was to create a lookup field in the table itself, which would contain a concatenation of these fields created during insert. So, for each record, we'd have each individual field and then a full_name field that would contain the combination of the ind. fields. His argument is that this will make lookups in this manner extremely fast and efficient.

I agree with his assertion, but get the feeling that this is sort of an ugly design. Would a compound index on these fields really be less efficient? Thanks for your help!

John

---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [GENERAL] Large table search question
John Wells [EMAIL PROTECTED] writes: A common lookup the application will require is the full name, so prefix + first_name + middle_name + last_name. My friend's suggestion was to create a lookup field in the table itself, which would contain a concatenation of these fields created during insert. So, for each record, we'd have each individual field and then a full_name field that would contain the combination of the ind. fields. His argument is that this will make lookups in this manner extremely fast and efficient.

Not unless you then add an index on that field, which would imply doubly redundant storage of the data (primary fields, lookup field, lookup field's index). You don't actually need the lookup field in Postgres: you can create the computed index directly. For instance

create index fooi on foo ((first_name || middle_name || last_name));

select * from foo where (first_name || middle_name || last_name) = 'JohnQPublic';

This is still kinda grim on storage space, but at least it's 2x not 3x.

regards, tom lane

---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [GENERAL] Large table load (40 million rows) - WAL hassles
Version: Postgres 7.1.2 A product we are developing requires frequent loading of a large number of rows into a table. We are using the copy file command, but with WAL we are effectively doubling the amount of disk writing we are doing. After the rows are loaded we do a create index. Is there a way to turn off WAL or otherwise speed up the table loading process? Would db_restore be faster? The rows are loaded in sorted order. Does this impact index creation negatively or positively? We are currently working with test data but we estimate production data to be 6 - 9 billion rows. Is anyone else running with these volumes?

We have a problem with 7.1.X where the WAL logs stay around longer than needed. Maybe someone has a patch with that fix.

-- Bruce Momjian | http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026

---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[GENERAL] Large table load (40 million rows) - WAL hassles
Version: Postgres 7.1.2

A product we are developing requires frequent loading of a large number of rows into a table. We are using the copy file command, but with WAL we are effectively doubling the amount of disk writing we are doing. After the rows are loaded we do a create index. Is there a way to turn off WAL or otherwise speed up the table loading process? Would db_restore be faster? The rows are loaded in sorted order. Does this impact index creation negatively or positively? We are currently working with test data but we estimate production data to be 6 - 9 billion rows. Is anyone else running with these volumes?

Simon

---(end of broadcast)--- TIP 6: Have you searched our list archives? http://www.postgresql.org/search.mpl
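As a generic sketch of the load order being described, with the COPY done first and the index built only afterwards; the file path, table and column names are placeholders, and nothing here avoids the WAL writes, it only keeps index maintenance out of the load itself:

COPY big_load_table FROM '/data/load_file.copy';
CREATE INDEX big_load_table_key_idx ON big_load_table (key_column);
VACUUM ANALYZE big_load_table;

Loading pre-sorted data at least means the subsequent index build sees the rows in roughly key order, though how much that helps is worth measuring rather than assuming.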