[GENERAL] large table

2014-09-22 Thread Luke Coldiron
I am trying to figure out what would have caused a table in PostgreSQL 8.4.16 
to get into a state where it has only 1 live tuple, and has only ever had 1 
tuple, but the size of the table is huge.

CREATE TABLE public.myTable(  myColumn timestamp with time zone NOT NULL);

Note: there is no primary key or index on this table.
CREATE OR REPLACE FUNCTION public.myFunc() RETURNS VOID AS $$
BEGIN
   UPDATE public.myTable SET myColumn = CLOCK_TIMESTAMP();
   IF NOT FOUND THEN
      INSERT INTO public.myTable(myColumn) VALUES (CLOCK_TIMESTAMP());
   END IF;
END;
$$ LANGUAGE plpgsql VOLATILE STRICT SECURITY DEFINER;

CREATE OR REPLACE FUNCTION public.wasUpdated(OUT was_updated boolean) RETURNS BOOLEAN AS $$
BEGIN
   was_updated := COALESCE((SELECT myColumn FROM public.myTable) > (CLOCK_TIMESTAMP() - INTERVAL '5 SECOND'), FALSE);
END;
$$ LANGUAGE plpgsql VOLATILE STRICT SECURITY DEFINER;
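For reference, a minimal sketch of the SQL the two processes issue (the callers
themselves are external; frequencies as described further down):

-- writer process, every 2.5 seconds
SELECT public.myFunc();

-- reader process, every 2 seconds; true if the timestamp is under 5 seconds old
SELECT was_updated FROM public.wasUpdated();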
SELECT *
FROM pg_stat_all_tables
WHERE relname = 'myTable';

 relid            | 16713
 schemaname       | public
 relname          | myTable
 seq_scan         | 3991833
 seq_tup_read     | 3992001
 idx_scan         |
 idx_tup_fetch    |
 n_tup_ins        | 0
 n_tup_upd        | 3775409
 n_tup_del        | 0
 n_tup_hot_upd    | 3771173
 n_live_tup       | 949135
 n_dead_tup       | 183
 last_vacuum      |
 last_autovacuum  | 2014-09-18 11:28:47.63545+00
 last_analyze     |
 last_autoanalyze | 2014-09-18 11:27:47.134432+00

The stats are very far off with n_live_tup at 949135 when there is only a 
single row in the table. Autovacuum appears to be running on a regular basis.
SELECT *
FROM pgstattuple('public.myTable');

 table_len          | 34709504
 tuple_count        | 1
 tuple_len          | 32
 tuple_percent      | 0
 dead_tuple_count   | 105
 dead_tuple_len     | 3360
 dead_tuple_percent | 0.01
 free_space         | 30757308
 free_percent       | 88.61

The actual size of the table is around 33 MB.
The myFunc function is called every 2.5 seconds and the wasUpdated function 
every 2 seconds by separate processes. 
I realize that running a VACUUM FULL or CLUSTER command on the table will 
resolve the issue, but I am more interested in a root cause that explains why 
this table would end up in this state. I have tried to reproduce the issue by 
running the exact same setup and have not been able to get the table to grow like 
this example. Any plausible causes or explanations would be much appreciated.
Luke 
  

Re: [GENERAL] large table

2014-09-22 Thread Alan Hodgson
On Monday, September 22, 2014 11:17:05 AM Luke Coldiron wrote:
 The actual size of the table is around 33 MB.
 The myFunc function is called every 2.5 seconds and the wasUpdated function
 every 2 seconds by separate processes. I realize that running a FULL VACUUM
 or CLUSTER command on the table will resolve the issue but I am more
 interested in a root cause that explains why this table would end up in
 this state. I have tried to reproduce this issue by running the exact setup
 and have not been able to get the table to grow like this example. Any
 plausible cause'es or explanations would be much appreciated. Luke

I'd guess that some other process held a transaction open for a couple of 
weeks, and that prevented any vacuuming from taking place.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] large table

2014-09-22 Thread Bill Moran
On Mon, 22 Sep 2014 11:17:05 -0700
Luke Coldiron lukecoldi...@hotmail.com wrote:

 I am trying to figure out what would have caused a table in a PostgreSQL 
 8.4.16 to get into a state where there is only 1 live tuple and has only ever 
 had one 1 tuple but the size of the table is huge.
 
 CREATE TABLE public.myTable(  myColumn timestamp with time zone NOT NULL);
 
 Note: there is no primary key or index on this table.
 CREATE OR REPLACE FUNCTION public.myFunc() RETURNS VOID AS $$
 BEGIN
    UPDATE public.myTable SET myColumn = CLOCK_TIMESTAMP();
    IF NOT FOUND THEN
       INSERT INTO public.myTable(myColumn) VALUES (CLOCK_TIMESTAMP());
    END IF;
 END;
 $$ LANGUAGE plpgsql VOLATILE STRICT SECURITY DEFINER;
 CREATE OR REPLACE FUNCTION public.wasUpdated(OUT was_updated boolean) RETURNS BOOLEAN AS $$
 BEGIN
    was_updated := COALESCE((SELECT myColumn FROM public.myTable) > (CLOCK_TIMESTAMP() - INTERVAL '5 SECOND'), FALSE);
 END;
 $$ LANGUAGE plpgsql VOLATILE STRICT SECURITY DEFINER;
 SELECT * FROM pg_stat_all_tables WHERE relname = 'myTable';

[snip]

 The actual size of the table is around 33 MB.
 The myFunc function is called every 2.5 seconds and the wasUpdated function 
 every 2 seconds by separate processes. 
 I realize that running a FULL VACUUM or CLUSTER command on the table will 
 resolve the issue but I am more interested in a root cause that explains why 
 this table would end up in this state. I have tried to reproduce this issue 
 by running the exact setup and have not been able to get the table to grow 
 like this example. Any plausible cause'es or explanations would be much 
 appreciated.
  
The UPDATE in myFunc() creates a new row version every 2.5 seconds when it updates
the row. The space from those dead row versions is only reclaimed when a vacuum is
run.  So (for example) if autovacuum only triggers a vacuum every 250 seconds, there
will be 249 rows worth of space in the table, on average.  With the other process
querying the table, it's possible that the row it's looking at will be something
that _should_ be reclaimable, so vacuum may not clear up all the free space.

As far as running the exact setup: if you're not getting the same results, then your
setup isn't exactly the same.  It's likely that there are things going on in the setup
you're curious about that you're not aware of, such as additional queries on the
table, additional load that causes operations to take a little longer, thus resulting
in different overlap of competing operations, etc.

Keep in mind that a short-lived incident might have resulted in table bloat that
won't be reclaimed by autovacuum.  I.e., if autovacuum wasn't running for a while,
this table would just keep bloating; then when you start autovacuum, it will
maintain the table size, but it won't get any smaller.

I can't make any sense of the data you provided, it's all on separate rows and I've
given up on trying to figure out what number goes with which value, so I don't know
exactly what the situation is.  It's likely that you can improve on the situation
by tweaking the autovacuum settings for this table to vacuum it more aggressively.
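For example, something along these lines should work on 8.4's per-table storage
parameters (the numbers are only a starting point, not a tested recommendation):

ALTER TABLE public.myTable SET (
    autovacuum_vacuum_threshold = 50,       -- vacuum after ~50 dead tuples
    autovacuum_vacuum_scale_factor = 0.0,   -- don't scale the threshold with table size
    autovacuum_vacuum_cost_delay = 0        -- tiny table, no need to throttle
);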

That said, you don't seem to have stated an actual _problem_.  Are you seeing
performance issues?  Is 33MB too much data and filling up the drive?  (Not being
sarcastic here, as there are various mobile applications where 33MB could be
important, even now.)  Because if this isn't actually causing any problems, I
wouldn't really worry about it.

-- 
Bill Moran
I need your help to succeed:
http://gamesbybill.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] large table

2014-09-22 Thread Luke Coldiron
  From: ahodg...@simkin.ca
  To: pgsql-general@postgresql.org
  Subject: Re: [GENERAL] large table
  Date: Mon, 22 Sep 2014 11:34:45 -0700
  
  On Monday, September 22, 2014 11:17:05 AM Luke Coldiron wrote:
   The actual size of the table is around 33 MB.
   The myFunc function is called every 2.5 seconds and the wasUpdated 
   function
   every 2 seconds by separate processes. I realize that running a FULL 
   VACUUM
   or CLUSTER command on the table will resolve the issue but I am more
   interested in a root cause that explains why this table would end up in
   this state. I have tried to reproduce this issue by running the exact 
   setup
   and have not been able to get the table to grow like this example. Any
   plausible cause'es or explanations would be much appreciated. Luke
  
  I'd guess that some other process held a transaction open for a couple of 
  week, and that prevented any vacuuming from taking place.
  
Interesting idea; on the surface I'm not sure how this would have happened in 
the system, but I can certainly explore forcing it to happen and see if the 
result is similar.
  -- 
  Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
  To make changes to your subscription:
  http://www.postgresql.org/mailpref/pgsql-general
  

Re: [GENERAL] large table

2014-09-22 Thread Luke Coldiron
 Date: Mon, 22 Sep 2014 14:38:52 -0400
 From: wmo...@potentialtech.com
 To: lukecoldi...@hotmail.com
 CC: pgsql-general@postgresql.org
 Subject: Re: [GENERAL] large table
 
 On Mon, 22 Sep 2014 11:17:05 -0700
 Luke Coldiron lukecoldi...@hotmail.com wrote:
 
  I am trying to figure out what would have caused a table in a PostgreSQL 
  8.4.16 to get into a state where there is only 1 live tuple and has only 
  ever had one 1 tuple but the size of the table is huge.
  
  CREATE TABLE public.myTable(  myColumn timestamp with time zone NOT NULL);
  
  Note: there is no primary key or index on this table.
  CREATE OR REPLACE FUNCTION public.myFunc() RETURNS VOID AS $$
  BEGIN
     UPDATE public.myTable SET myColumn = CLOCK_TIMESTAMP();
     IF NOT FOUND THEN
        INSERT INTO public.myTable(myColumn) VALUES (CLOCK_TIMESTAMP());
     END IF;
  END;
  $$ LANGUAGE plpgsql VOLATILE STRICT SECURITY DEFINER;
  CREATE OR REPLACE FUNCTION public.wasUpdated(OUT was_updated boolean) RETURNS BOOLEAN AS $$
  BEGIN
     was_updated := COALESCE((SELECT myColumn FROM public.myTable) > (CLOCK_TIMESTAMP() - INTERVAL '5 SECOND'), FALSE);
  END;
  $$ LANGUAGE plpgsql VOLATILE STRICT SECURITY DEFINER;
  SELECT * FROM pg_stat_all_tables WHERE relname = 'myTable';
 
 [snip]
 
  The actual size of the table is around 33 MB.
  The myFunc function is called every 2.5 seconds and the wasUpdated function 
  every 2 seconds by separate processes. 
  I realize that running a FULL VACUUM or CLUSTER command on the table will 
  resolve the issue but I am more interested in a root cause that explains 
  why this table would end up in this state. I have tried to reproduce this 
  issue by running the exact setup and have not been able to get the table to 
  grow like this example. Any plausible cause'es or explanations would be 
  much appreciated.
 
 The UPDATE in myFunc() creates a new row every 2.5 seconds when it updates 
 the row.
 The data from those rows is only reclaimed when a vacuum is run.  So (for 
 example)
 if autovacuum only triggers a vacuum every 250 seconds, there will be 249 
 rows worth
 of space in the table, on average.  With the other process querying the 
 table, it's
 possible that the row that it's looking at will be a something that _should_ 
 be
 reclaimable, so vacuum may not clear up all the free space.
 
 As far as running the exact setup: if you're not getting the same results, 
 then
 your setup isn't exactly the same.  It's likely that there are things going 
 on in the
 setup you're curious about that you're not aware of, such as additional 
 queries on
 the table, additional load that causes operations to take a little longer, 
 thus
 resulting in different overlap of competing operations, etc.
It is possible, and that is part of what I am trying to discover; however, I am 
very familiar with the system / code base, and in this case there is a single 
process updating the timestamp and a single process reading the timestamp. 
There are no other user processes programmed to interact with this table, 
outside of whatever Postgres itself is doing.
 
 Keep in mind that a short-lived incident might have resulted in table bloat 
 that
 won't be reclaimed by autovacuum.  I.e., if autovacuum wasn't running for a 
 while,
 this table would just keep bloating; then when you start autovacuum, it will
 maintain the table size, but it won't get any smaller.
I thought this as well and have run tests with autovacuum turned off, and I 
don't see the issue occur over thousands of updates. The updates become HOT 
updates and reuse the dead tuples.
 
 I can't make any sense of the data you provided, it's all on seperate rows 
 and I've
 given up on trying to figure out what number goes with which value, so I 
 don't know
 exactly what the situation is.  It's likely that you can improve on the 
 situation
 by tweaking the autovacuum settings for this table to vacuum it more 
 aggressively.
Sorry about that, the email client I am using messed up the formatting. 
Here is another attempt.

SELECT *
FROM pg_stat_all_tables
WHERE relname = 'myTable';

 relid            | 16713
 schemaname       | public
 relname          | myTable
 seq_scan         | 3995023
 seq_tup_read     | 3995296
 idx_scan         |
 idx_tup_fetch    |
 n_tup_ins        | 0
 n_tup_upd        | 3778598
 n_tup_del        | 0
 n_tup_hot_upd    | 3774362
 n_live_tup       | 949135
 n_dead_tup       | 124
 last_vacuum      |
 last_autovacuum  | 2014-09-18 11:28:47.63545+00
 last_analyze     |
 last_autoanalyze | 2014-09-18 11:27:47.134432+00

SELECT *
FROM pgstattuple('public.myTable');

 table_len          | 34709504
 tuple_count        | 1
 tuple_len          | 32
 tuple_percent      | 0
 dead_tuple_count   | 105
 dead_tuple_len     | 3360
 dead_tuple_percent | 0.01
 free_space         | 30757308
 free_percent       | 88.61

So far having autovacuum on or off has not caused the problem to occur. 
Originally I was thinking

Re: [GENERAL] large table

2014-09-22 Thread John R Pierce

On 9/22/2014 12:33 PM, Luke Coldiron wrote:


It is possible and that is part of what I am trying to discover 
however I am very familiar with the system / code base and in this 
case there is a single process updating the timestamp and a single 
process reading the timestamp. There are no other user processes 
programmed to interact with this table outside of potentially what 
Postgres is doing.


ANY other connection to the same postgres server, even to a different 
database, that has an open long running transaction (most frequently, 
Idle In Transaction) will block autovacuum from marking the old tuples 
as reusable.
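A quick way to look for that, as a rough sketch against 8.4's pg_stat_activity
(the idle marker and the procpid column are as they appear in 8.4; later releases
renamed them):

SELECT datname, procpid, usename, xact_start, current_query
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
ORDER BY xact_start;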




--
john r pierce  37N 122W
somewhere on the middle of the left coast



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] large table

2014-09-22 Thread Eduardo Morras
On Mon, 22 Sep 2014 12:15:27 -0700
Luke Coldiron lukecoldi...@hotmail.com wrote:

   I'd guess that some other process held a transaction open for a
   couple of week, and that prevented any vacuuming from taking
   place.
   
 Interesting idea, on the surface I'm not sure how this would have
 happened in the system but I can certainly explore forcing this to
 happen and see if the result is similar.

It happened to me when I developed with Java+Hibernate. It opened a transaction and 
made a lot of inserts and deletes while the app ran. It created GB-sized tables 
with few rows, and a permanent 'IDLE in TRANSACTION' stops any autovacuum.

---   ---
Eduardo Morras emorr...@yahoo.es


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] large table

2014-09-22 Thread Bill Moran
On Mon, 22 Sep 2014 12:46:21 -0700
John R Pierce pie...@hogranch.com wrote:

 On 9/22/2014 12:33 PM, Luke Coldiron wrote:
 
  It is possible and that is part of what I am trying to discover 
  however I am very familiar with the system / code base and in this 
  case there is a single process updating the timestamp and a single 
  process reading the timestamp. There are no other user processes 
  programmed to interact with this table outside of potentially what 
  Postgres is doing.
 
 ANY other connection to the same postgres server, even to a different 
 database, that has an open long running transaction (most frequently, 
 Idle In Transaction) will block autovacuum from marking the old tuples 
 as reusable.

As a possibility, I've seen this happen when people connected to the DB
using various GUI tools (can't remember the exact one where we saw this)
that started and held open a transaction without the user realizing it.
This prevented autovacuum from getting any useful work done until our
Nagios monitoring detected the idle transaction and an engineer tracked
down who was doing it and had them close the program.  IMHO, too many
GUI tools make it too easy to do something without realizing the
consequences.

On a related note, I'm curious as to how an open transaction affects HOT
updates (if at all).  This is an area of behavior I have little experience
with to date.

-- 
Bill Moran
I need your help to succeed:
http://gamesbybill.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] large table

2014-09-22 Thread Luke Coldiron


 Date: Mon, 22 Sep 2014 12:46:21 -0700
 From: pie...@hogranch.com
 To: pgsql-general@postgresql.org
 Subject: Re: [GENERAL] large table
 
 On 9/22/2014 12:33 PM, Luke Coldiron wrote:
 
  It is possible and that is part of what I am trying to discover 
  however I am very familiar with the system / code base and in this 
  case there is a single process updating the timestamp and a single 
  process reading the timestamp. There are no other user processes 
  programmed to interact with this table outside of potentially what 
  Postgres is doing.
 
 ANY other connection to the same postgres server, even to a different 
 database, that has an open long running transaction (most frequently, 
 Idle In Transaction) will block autovacuum from marking the old tuples 
 as reusable.
 
Good point, I wasn't thinking about this. It is a very plausible explanation 
considering the behavior of the rest of the system.
 
 -- 
 john r pierce  37N 122W
 somewhere on the middle of the left coast
 
 
 
 -- 
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general
  

[GENERAL] large table starting sequence scan because of default_statistic_target

2009-03-16 Thread Janning Vygen
Hi,

we are running a large 8.3 database and had some trouble with the default 
statistics target. We had raised it on one special table some time ago, when we 
ran into a problem with a growing table starting to use sequential scans.

Last week we manually clustered this table (create table as ... order by; 
drop the original table, rename the temp table to the original name). Of course 
the statistics target was dropped in the process and we did not remember to set 
it again.
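(For what it's worth, the per-column setting we lost is re-applied with something
like the following; the table and column names here are just placeholders:)

ALTER TABLE mytable ALTER COLUMN mycolumn SET STATISTICS 100;
ANALYZE mytable;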

Why does default_statistics_target default to 10?

The documentation tells me the only drawback is a longer ANALYZE run. We are 
setting it to 100 in postgresql.conf and we did not see a much longer run of 
ANALYZE. Of course, smaller tables won't need a setting of 100, but small 
tables are usually not very interesting when it comes to performance. 

With a setting of 10 you run into difficult problems as your table grows. 
Suddenly an execution plan changes and you get sequential scans on your largest 
table! We had such problems and it was annoying to have a real slowdown just 
because of this minor configuration parameter.

I suggest setting it to 100 by default:
- no problems for small installations
- no problems for DBAs who always adjust their system in every possible way
- no problems for growing databases with unequally distributed data

But maybe there are other reasons not to set it to a higher value. If 
so, please tell me.

kind regards
Janning

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] large table starting sequence scan because of default_statistic_target

2009-03-16 Thread Scott Marlowe
On Mon, Mar 16, 2009 at 7:15 AM, Janning Vygen vy...@kicktipp.de wrote:
 Hi,

 Why does default_statistic_target defaults to 10?

 I suggest to setting it to 100 by default:

Already done in 8.4

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] large table starting sequence scan because of default_statistic_target

2009-03-16 Thread Janning Vygen
On Monday 16 March 2009 15:13:51 Scott Marlowe wrote:
 On Mon, Mar 16, 2009 at 7:15 AM, Janning Vygen vy...@kicktipp.de wrote:
  Hi,
 
  Why does default_statistic_target defaults to 10?
 
  I suggest to setting it to 100 by default:

 Already done in 8.4

GREAT! sorry for not searching the archives or changelog before. 

Janning

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] large table starting sequence scan because of default_statistic_target

2009-03-16 Thread Scott Marlowe
On Mon, Mar 16, 2009 at 8:24 AM, Janning Vygen vy...@kicktipp.de wrote:
 On Monday 16 March 2009 15:13:51 Scott Marlowe wrote:
 On Mon, Mar 16, 2009 at 7:15 AM, Janning Vygen vy...@kicktipp.de wrote:
  Hi,
 
  Why does default_statistic_target defaults to 10?
 
  I suggest to setting it to 100 by default:

 Already done in 8.4

 GREAT! sorry for not searching the archives or changelog before.

Hey, no problem, there's plenty of new stuff coming up in 8.4, and a
lot of it doesn't show up on the general list anyway.  This just makes
sure a few more people know about this change.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] large table vacuum issues

2008-01-06 Thread Usama Dar
On Jan 5, 2008 5:38 AM, Ed L. [EMAIL PROTECTED] wrote:

 We need some advice on how to handle some large table autovacuum
 issues.  One of our 8.1.2 autovacuums is launching a DB-wide
 vacuum on our 270GB database to prevent xid wrap-around, but is
 getting hung-up and/or bogged down for hours on a 40gb table and
 taking the server performance down with it, apparently due to an
 IO bottleneck.  The autovac child process becomes completely
 unresponsive to SIGTERM/SIGINT; only a sigkill restart with
 disabling the autovac daemon gets us back to adequate
 performance for now.


Looks like you haven't been vacuuming for a while, have you? It seems
autovacuum was disabled but was invoked forcefully to avoid xid
wraparound. If the wraparound actually happens you will lose data. When
autovacuum is processing a table it shouldn't take more time than a
normal vacuum would take.

What might really help is a temporary increase in maintenance_work_mem.
What's your current setting? How much RAM do you have? If you can afford
more memory, increase it to a significantly higher value to help speed up the
vacuum process. I understand it might impact some other system activity, but
you need a vacuum, and fast, before you lose data.
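For example, a rough sketch for the session that runs the one-off vacuum (the
value is illustrative only; on 8.1 the setting is an integer in kB):

SET maintenance_work_mem = 524288;  -- roughly 512MB, size it to your RAM
VACUUM VERBOSE;                     -- database-wide, as needed for the xid wraparound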

You need to get rid of the dead rows first and then have a healthy vacuuming
schedule, either a daily cron job or autovacuum, set up to match your workload.
Prevention is of course better than the cure. ;)

 Thanks,
-- 
Usama Munir Dar http://www.linkedin.com/in/usamadar
Consultant Architect
Cell:+92 321 5020666
Skype: usamadar


Re: [GENERAL] large table vacuum issues

2008-01-05 Thread Bill Moran
Ed L. [EMAIL PROTECTED] wrote:

 On Friday 04 January 2008 6:21 pm, Scott Marlowe wrote:
  On Jan 4, 2008 6:38 PM, Ed L. [EMAIL PROTECTED] wrote:
   We need some advice on how to handle some large table
   autovacuum issues.  One of our 8.1.2
 
  First of all, update your 8.1 install to 8.1.10.  Failing to
  keep up with bug fixes is negligent.  who knows, you might be
  getting bitten by a bug that was fixed between 8.1.2 and
  8.1.10
 
 Could be.  But like you said, who knows.  In some environments, 
 downtime for upgrading costs money (and more), too, sometimes 
 even enough to make it negligent to take downtime to keep up 
 with bug fixes (and of course, the new bugs) which may or may 
 not be a factor at hand.

Upgrades along the 8.1.x branch take something on the order of
5 minutes (if you're meticulous and serialize the process).

If you haven't set yourself up so you can schedule 5 minutes of
downtime once a month or so, then the negligence occurred much
earlier than at the failure to upgrade.

 While the time required to restart a 
 DB may be neglible, there are often upstream/downstream 
 dependencies that greatly expand the actual downtime for the 
 customer.

Like what?  The point to the double-dot branch is that upgrades
don't affect dependencies.

 How much would downtime need to cost before you 
 thought it negligent to upgrade immediately?  It's a tradeoff, 
 not well-supported by simple pronouncements, one the customer 
 and provider are best qualified to make.

Not really.  Unscheduled downtime is _always_ more expensive than
scheduled downtime.  Scheduled downtime isn't going to put you in
breach of contract if you've got an uptime guarantee.

If you're really in a situation where you need 100% uptime, then
you're still negligent for not having something like Slony to allow
you to switch production to another server so you can alternate
maintenance between the two.

This is something like the RAID 5 argument: no matter how you argue
it, it's a bad idea.  If you claim you can't afford to buy more hardware,
then you made a mistake in pricing out your product to your client.

-- 
Bill Moran
http://www.potentialtech.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


[GENERAL] large table vacuum issues

2008-01-04 Thread Ed L.
We need some advice on how to handle some large table autovacuum 
issues.  One of our 8.1.2 autovacuums is launching a DB-wide 
vacuum on our 270GB database to prevent xid wrap-around, but is 
getting hung-up and/or bogged down for hours on a 40gb table and 
taking the server performance down with it, apparently due to an 
IO bottleneck.  The autovac child process becomes completely 
unresponsive to SIGTERM/SIGINT; only a sigkill restart with 
disabling the autovac daemon gets us back to adequate 
performance for now.

We are discussing how to partition the table (difficult due to 
existing foreign keys in other tables), and archiving/clearing 
data.

Are there any other tricks to get it past this large table for 
the time being and still get the xid wraparound fix?

TIA.

Ed

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] large table vacuum issues

2008-01-04 Thread Scott Marlowe
On Jan 4, 2008 6:38 PM, Ed L. [EMAIL PROTECTED] wrote:
 We need some advice on how to handle some large table autovacuum
 issues.  One of our 8.1.2

First of all, update your 8.1 install to 8.1.10.  Failing to keep up
with bug fixes is negligent.  Who knows, you might be getting bitten
by a bug that was fixed between 8.1.2 and 8.1.10.

 autovacuums is launching a DB-wide
 vacuum on our 270GB database to prevent xid wrap-around, but is
 getting hung-up and/or bogged down for hours on a 40gb table and
 taking the server performance down with it, apparently due to an
 IO bottleneck.

Have you tried adjusting the

#vacuum_cost_delay = 0          # 0-1000 milliseconds
#vacuum_cost_page_hit = 1       # 0-10000 credits
#vacuum_cost_page_miss = 10     # 0-10000 credits
#vacuum_cost_page_dirty = 20    # 0-10000 credits
#vacuum_cost_limit = 200        # 0-10000 credits

settings to something so as to make vacuum less intrusive?  might be
the easiest fix.

 Are there any other tricks to get it past this large table for
 the time being and still get the xid wraparound fix?

the other trick would be to do a dump / restore of your whole db,
which can often be quicker than vacuuming it if it's got a lot of dead
tuples in it.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] large table vacuum issues

2008-01-04 Thread Ed L.
On Friday 04 January 2008 6:21 pm, Scott Marlowe wrote:

 Have you tried adjusting the

 #vacuum_cost_delay = 0  # 0-1000 milliseconds
 #vacuum_cost_page_hit = 1   # 0-1 credits
 #vacuum_cost_page_miss = 10 # 0-1 credits
 #vacuum_cost_page_dirty = 20# 0-1 credits
 #vacuum_cost_limit = 200# 0-1 credits

 settings to something so as to make vacuum less intrusive? 
 might be the easiest fix.

Any particular suggested changes for these parameters?

Ed

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] large table vacuum issues

2008-01-04 Thread Ed L.
On Friday 04 January 2008 6:21 pm, Scott Marlowe wrote:
 On Jan 4, 2008 6:38 PM, Ed L. [EMAIL PROTECTED] wrote:
  We need some advice on how to handle some large table
  autovacuum issues.  One of our 8.1.2

 First of all, update your 8.1 install to 8.1.10.  Failing to
 keep up with bug fixes is negligent.  who knows, you might be
 getting bitten by a bug that was fixed between 8.1.2 and
 8.1.10

Could be.  But like you said, who knows.  In some environments, 
downtime for upgrading costs money (and more), too, sometimes 
even enough to make it negligent to take downtime to keep up 
with bug fixes (and of course, the new bugs) which may or may 
not be a factor at hand.  While the time required to restart a 
DB may be neglible, there are often upstream/downstream 
dependencies that greatly expand the actual downtime for the 
customer.  How much would downtime need to cost before you 
thought it negligent to upgrade immediately?  It's a tradeoff, 
not well-supported by simple pronouncements, one the customer 
and provider are best qualified to make.

Ed

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] large table vacuum issues

2008-01-04 Thread Scott Marlowe
On Jan 4, 2008 7:41 PM, Ed L. [EMAIL PROTECTED] wrote:
 On Friday 04 January 2008 6:21 pm, Scott Marlowe wrote:
  On Jan 4, 2008 6:38 PM, Ed L. [EMAIL PROTECTED] wrote:
   We need some advice on how to handle some large table
   autovacuum issues.  One of our 8.1.2
 
  First of all, update your 8.1 install to 8.1.10.  Failing to
  keep up with bug fixes is negligent.  who knows, you might be
  getting bitten by a bug that was fixed between 8.1.2 and
  8.1.10

 Could be.  But like you said, who knows.  In some environments,
 downtime for upgrading costs money (and more), too, sometimes
 even enough to make it negligent to take downtime to keep up
 with bug fixes (and of course, the new bugs) which may or may
 not be a factor at hand.  While the time required to restart a
 DB may be neglible, there are often upstream/downstream
 dependencies that greatly expand the actual downtime for the
 customer.  How much would downtime need to cost before you
 thought it negligent to upgrade immediately?  It's a tradeoff,
 not well-supported by simple pronouncements, one the customer
 and provider are best qualified to make.

And how much would downtime cost you if your database got corrupted by
those bugs and you had to restore from backups?  You can use something
like slony and a two server setup to reduce the downtime to mere
seconds.  I know about downtime and its costs, I work at an airline
reservation company.  Even we have scheduled maintenance windows,
albeit few and far between.  I find it hard to believe you've had none
since 8.1.2 came out.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] large table vacuum issues

2008-01-04 Thread Joshua D. Drake

Ed L. wrote:

On Friday 04 January 2008 6:21 pm, Scott Marlowe wrote:

On Jan 4, 2008 6:38 PM, Ed L. [EMAIL PROTECTED] wrote:

We need some advice on how to handle some large table
autovacuum issues.  One of our 8.1.2

First of all, update your 8.1 install to 8.1.10.  Failing to
keep up with bug fixes is negligent.  who knows, you might be
getting bitten by a bug that was fixed between 8.1.2 and
8.1.10


Could be.  But like you said, who knows.  In some environments, 
downtime for upgrading costs money (and more), too, sometimes 
even enough to make it negligent to take downtime to keep up 
with bug fixes (and of course, the new bugs) which may or may 
not be a factor at hand.  While the time required to restart a 
DB may be neglible, there are often upstream/downstream 
dependencies that greatly expand the actual downtime for the 
customer.  How much would downtime need to cost before you 
thought it negligent to upgrade immediately?  It's a tradeoff, 
not well-supported by simple pronouncements, one the customer 
and provider are best qualified to make.


You make a valid argument above but you forget a couple of minor points.

How much money does it cost when your customer:

* gets sued for a breach of security because they couldn't afford a 30 
minute downtime at 3am? (I assume 30 minutes only because you do need to 
shut down external services.)

* loses all their data because of a corner-case function they are 
running that causes pages to become corrupt?


Just curious...

Sincerely,

Joshua D. Drake



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] large table vacuum issues

2008-01-04 Thread Scott Marlowe
On Jan 4, 2008 7:29 PM, Ed L. [EMAIL PROTECTED] wrote:
 On Friday 04 January 2008 6:21 pm, Scott Marlowe wrote:
 
  Have you tried adjusting the
 
  #vacuum_cost_delay = 0  # 0-1000 milliseconds
  #vacuum_cost_page_hit = 1   # 0-1 credits
  #vacuum_cost_page_miss = 10 # 0-1 credits
  #vacuum_cost_page_dirty = 20# 0-1 credits
  #vacuum_cost_limit = 200# 0-1 credits
 
  settings to something so as to make vacuum less intrusive?
  might be the easiest fix.

 Any particular suggested changes for these parameters?

Well, it really depends on your I/O subsystem, but a good start would
be to read this section of the manual:

http://www.postgresql.org/docs/8.2/interactive/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-VACUUM-COST

I'd start setting the delay to 10 or 20 and seeing if vacuuming has a
low enough impact to allow it to run in the background, even during
peak hours.
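For a manual vacuum you can also try it per-session first, something like this
(the numbers are only a starting point and the table name is a placeholder):

SET vacuum_cost_delay = 20;    -- sleep 20ms each time the cost limit is reached
SET vacuum_cost_limit = 200;   -- the default; lower it to throttle harder
VACUUM VERBOSE ANALYZE bigtable;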

Keep an eye on vmstat / iostat output while vacuum is running to see
if you're flooding your I/O or not.

note that there's a whole other set of vars for autovacuum (at least
in 8.2, don't know about 8.1) that you can set so that regular vacuums
can happen with greater or less priority than autovacuuming.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] large table problem

2007-05-03 Thread Jason Nerothin

Thanks for the redirect... After profiling my client memory usage and using
the built-in cursor functionality I discovered that another part of my
program was causing the memory overflow and that the ResultSet iteration was
doing exactly what it should have all along.

On 4/21/07, Kris Jurka [EMAIL PROTECTED] wrote:




On Fri, 20 Apr 2007, Jason Nerothin wrote:

 I'm trying to work my way around a large query problem.

 Not too unexpectedly, the app server (EJB3/JPA) is choking on the
queries
 which are unnamed native queries in Java parliance. Work-around
attempt 1
 was to call directly to the JDBC driver, but the cursor doesn't dispose
of
 the memory in the ResultSet once I've passed it by (OutOfMemoryError)
and
 the documentation suggests that cursor behavior is a little buggy for
the
 current postgres driver. (The docs suggest implementing a custom stored
 procedure to provide iteration.)

I'm not sure what documentation you're reading:

http://jdbc.postgresql.org/documentation/82/query.html#query-with-cursor

and it works as advertised.

Kris Jurka





--

Jason Nerothin
Programmer/Analyst IV - Database Administration
UCLA-DOE Institute for Genomics & Proteomics
Howard Hughes Medical Institute

611 C.E. Young Drive East   | Tel: (310) 206-3907
105 Boyer Hall, Box 951570  | Fax: (310) 206-3914
Los Angeles, CA 90095. USA | Mail: [EMAIL PROTECTED]

http://www.mbi.ucla.edu/~jason



Re: [GENERAL] large table problem

2007-04-21 Thread Kris Jurka



On Fri, 20 Apr 2007, Jason Nerothin wrote:


I'm trying to work my way around a large query problem.

Not too unexpectedly, the app server (EJB3/JPA) is choking on the queries
which are unnamed native queries in Java parliance. Work-around attempt 1
was to call directly to the JDBC driver, but the cursor doesn't dispose of
the memory in the ResultSet once I've passed it by (OutOfMemoryError) and
the documentation suggests that cursor behavior is a little buggy for the
current postgres driver. (The docs suggest implementing a custom stored
procedure to provide iteration.)


I'm not sure what documentation you're reading:

http://jdbc.postgresql.org/documentation/82/query.html#query-with-cursor

and it works as advertised.

Kris Jurka

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[GENERAL] large table problem

2007-04-20 Thread Jason Nerothin

I'm trying to work my way around a large query problem.

In my system, I've created a number of large materialized views that are the
output of some computationally expensive stored procedures on other large
tables in my system. They are intended to serve as staging tables for the
next phase of computation, such that I can call (the presumably efficient):

select * from my_mv_table;

and then have at the data on the other side of a Java application server.

These tables range anywhere from 400,000 to 32,000,000 rows, though
individual records are only 5 integers wide.

Not too unexpectedly, the app server (EJB3/JPA) is choking on the queries
which are unnamed native queries in Java parlance. Work-around attempt 1 
was to call directly to the JDBC driver, but the cursor doesn't dispose of
the memory in the ResultSet once I've passed it by (OutOfMemoryError) and
the documentation suggests that cursor behavior is a little buggy for the
current postgres driver. (The docs suggest implementing a custom stored
procedure to provide iteration.) Attempt number 2, now underway, is to pass
LIMIT and OFFSET values to the query which Postgres handles quite
effectively as long as the OFFSET value is less than the total number of
rows in the table. When the value is greater than num_rows, the query
hangs for minutes.

So my question is, does Postgres keep any metadata around about un-indexed
table sizes? select count(*) from my_table itself can take a minute to
process. If I had ready access to that information, I could kluge up my code
with something like:

num_rows = getNumRows();
while( offset < num_rows ){
   processData( select( offset += window_size, window_size ) )
}

At the moment the best option I have is to write a stored proceedure to
populate a table mv_sizes, but not only is this a pain in the patoot, it
just seems sick and wrong.

Am I missing something painfully obvious?

Jason
--

Jason Nerothin
Programmer/Analyst IV - Database Administration
UCLA-DOE Institute for Genomics & Proteomics
Howard Hughes Medical Institute

611 C.E. Young Drive East   | Tel: (310) 206-3907
105 Boyer Hall, Box 951570  | Fax: (310) 206-3914
Los Angeles, CA 90095. USA | Mail: [EMAIL PROTECTED]

http://www.mbi.ucla.edu/~jason



Re: [GENERAL] large table problem

2007-04-20 Thread Tom Lane
Jason Nerothin [EMAIL PROTECTED] writes:
 Attempt number 2, now underway, is to pass
 LIMIT and OFFSET values to the query which Postgres handles quite
 effectively as long as the OFFSET value is less than the total number of
 rows in the table. When the value is greater than num_rows, the query
 hangs for minutes.

I don't actually believe the above; using successively larger offsets
should get slower and slower in a smooth manner, because the only thing
OFFSET does is throw away scanned rows just before they would have been
returned to the client.  I think you've confused yourself somehow.

 the documentation suggests that cursor behavior is a little buggy for the
 current postgres driver.

How old a driver are you using?  Because a cursor is definitely what you
want to use for retrieving millions of rows.
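At the SQL level a cursor-based fetch loop looks roughly like the sketch below
(my_mv_table is the table named earlier in the thread; the JDBC driver does the
equivalent under the hood when autocommit is off and a fetch size is set):

BEGIN;
DECLARE big_cur CURSOR FOR SELECT * FROM my_mv_table;
FETCH FORWARD 10000 FROM big_cur;   -- repeat until it returns no rows
CLOSE big_cur;
COMMIT;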

It strikes me that pgsql-jdbc might be a more suitable group of people
to ask about this than the -general list ...

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


[GENERAL] Large Table Performance

2005-10-22 Thread David Busby

List,
  I've got a problem where I need to make a table that is going to grow by an average of 230,000 records per day. 
There are only 20 columns in the table, mostly char and integer.  It's FK'd in two places to another table for 
import/export transaction id's and I have a serial primary key and an index on a date column for when I need to search 
(every search is done inside a date range).  I thought it would be OK but after a few weeks of operation I have more 
than five million records in there.  Some queries take more than five minutes to complete and I'm sad about that.  How 
can I make this faster?  I could munge dates into integers if they're faster, I'm OK with that.  What can I tweak in the 
configuration file to speed things up?  What about some drastic schema change that more experience would have shown me? 
 I cannot show the full schema but it's like this:


-- My import/export data information table
ie_data (id serial primary key, date date, [12 other columns here])

big_transaction_table(id serial primary key, import_id int w/FK, export_id int w/FK, date date, [20 other necessary 
transaction detail columns])


So when I say
select x,y,z from big_transaction_table where date >= '10/2/2005' and date <= '10/4/2005' and transaction_status in (1,2,3) 
order by date;

it takes five+ minutes.

TIA for any suggestions.

/djb

---(end of broadcast)---
TIP 6: explain analyze is your friend


[GENERAL] Large Table Performance

2005-10-21 Thread Edoceo Lists

List,
  I've got a problem where I need to make a table that is going to grow by an average of 230,000 records per day. There 
are only 20 columns in the table, mostly char and integer.  It's FK'd in two places to another table for import/export 
transaction id's and I have a serial primary key and an index on a date column for when I need to search (every search 
is done inside a date range).  I thought it would be OK but after a few weeks of operation I have more than five million 
records in there.  Some queries take more than five minutes to complete and I'm sad about that.  How can I make this 
faster?  I could munge dates into integers if they're faster, I'm OK with that.  What can I tweak in the configuration 
file to speed things up?  What about some drastic schema change that more experience would have shown me?  I cannot show 
the full schema but it's like this:


-- My import/export data information table
ie_data (id serial primary key, date date, [12 other columns here])

big_transaction_table(id serial primary key, import_id int w/FK, export_id int w/FK, date date, [20 other necessary 
transaction detail columns])


So when I say
select x,y,z from big_transaction_table where date >= '10/2/2005' and date <= '10/4/2005' and transaction_status in (1,2,3) 
order by date;

it takes five+ minutes.

TIA for any suggestions.

/djb

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Large Table Performance

2005-10-21 Thread Michael Fuhr
On Fri, Oct 21, 2005 at 05:25:22PM -0700, Edoceo Lists wrote:

[summary of situation]

 Some queries take more than five minutes to complete and I'm sad
 about that.  How can I make this faster?

You might get more help on pgsql-performance, which is specifically
for discussions of performance issues.

 I could munge dates into integers if their faster, I'm OK with that.

Let's identify the problem before thinking about possible solutions.

 What can I tweak in the configuration file to speed things up?

What version of PostgreSQL are you using?  What operating system?
What kind of hardware (CPU, disks, amount of memory, etc.)?  Are
you regularly vacuuming and analyzing all tables in the database?
Have you tuned your configuration at all?  What are the following
settings?

shared_buffers
work_mem (8.x) or sort_mem (7.x)
effective_cache_size
random_page_cost

You might want to read through a tuning guide like this one:

http://www.powerpostgresql.com/PerfList

 So when I say
 select x,y,z from big_transaction_table where date >= '10/2/2005' and 
 date <= '10/4/2005' and transaction_status in (1,2,3) order by date;
 it takes five+ minutes.

Please post the EXPLAIN ANALYZE output (not just EXPLAIN) of this
query; that'll show us what the query planner is doing and how
accurate its row count estimates are.  If the row counts are way
off then you might see an improvement by increasing the statistics
target for the date and maybe the transaction_status columns.  Or
you might just need to analyze the table to update its statistics,
and possibly vacuum it to get rid of dead tuples.
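If it does turn out to be a statistics problem, the per-column knobs look roughly
like this (names taken from your query; 100 is just an example target):

ALTER TABLE big_transaction_table ALTER COLUMN date SET STATISTICS 100;
ALTER TABLE big_transaction_table ALTER COLUMN transaction_status SET STATISTICS 100;
ANALYZE big_transaction_table;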

-- 
Michael Fuhr

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Large Table Performance

2005-10-21 Thread Alex Stapleton


On 22 Oct 2005, at 01:25, Edoceo Lists wrote:


List,
  I've got a problem where I need to make a table that is going to  
grow by an average of 230,000 records per day. There are only 20  
columns in the table, mostly char and integer.  It's FK'd in two  
places to another table for import/export transaction id's and I  
have a serial primary key and an index on a date column for when I  
need to search (every search is done inside a date range).  I  
thought it would be OK but after a few weeks of operation I have  
more than five million records in there.  Some queries take more  
than five minutes to complete and I'm sad about that.  How can I  
make this faster?  I could munge dates into integers if their  
faster, I'm OK with that.  What can I tweak in the configuration  
file to speed things up?  What about some drastic schema change  
that more experience would have shown me?  I cannot show the full  
schema but it's like this:


-- My import/export data information table
ie_data (id serial primary key, date date, [12 other columns here])

big_transaction_table(id serial primary key, import_id int w/FK,  
export_id int w/FK, date date, [20 other necessary transaction  
detail columns])


So when I say
select x,y,z from big_transaction_table where date >= '10/2/2005' and  
date <= '10/4/2005' and transaction_status in (1,2,3) order by date;

it takes five+ minutes.

TIA for any suggestions.


What hardware are you on? What query plans (output from explain) do  
your queries give you? What PG version?


We do about 100,000 rows a minute (300 MB+) a day so I suspect your  
queries are doing full table scans or something. Of course we don't  
use any FKs so I suppose they could be biting you.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] Large table search question

2004-06-01 Thread Richard Huxton
John Wells wrote:
Guys,
I have a general question about designing databases for large data sets.
I was speaking with a colleague about an application we're preparing to
build.  One of the application's tables will potentially contain 2 million
or more names, containing (at least) the fields first_name, last_name,
middle_name and prefix.
A common lookup the application will require is the full name, so prefix +
first_name + middle_name + last_name.
My friend's suggestion was to create a lookup field in the table itself,
which would contain a concatenation of these fields created during insert.
 So, for each record, we'd having each individual field and then a
full_name field that would contain the combination of the ind. fields. 
His argument is that this will make lookups in this manner extremely fast
and efficient.
Might, might not. No figures to back up his argument. It'll certainly 
make updates slower and less efficient. In fact, since each row will 
store the data twice, you'll get fewer rows per disk page, which means 
(potentially) more disk reads when you need to get several rows.

I agree with his assertion, but get the feeling that this is sort of an
ugly design.  Would a compound index on these fields really be less
efficient?
Doubtful, I'd certainly not try his solution until I'd tried the simple 
way first.

If you really want to try your friend's approach on PG you can build a 
functional index. As of 7.4, these can be expressions rather than just 
indexes so you can do something like:

CREATE INDEX my_idx_1 ON table1 ((prefix || ' ' || first_name ...));
If you're using 7.3.x you'll need to wrap that expression in a function 
and index the function instead.

In your case though, I'd just build a compound index and leave it at that.
--
  Richard Huxton
  Archonet Ltd
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [GENERAL] Large table search question

2004-06-01 Thread Stijn Vanroye
I don't want to butt in, I just find this an interesting discussion and would like to 
add my 2 cents:

I have read this in the manual (PostgreSQL 7.4beta4 documentation, Chapter 11.3 
Multicolumn Indexes):
Quote:
"Multicolumn indexes should be used sparingly. Most of the time, an index on a single 
column is sufficient and saves space and time. Indexes with more than three columns 
are unlikely to be helpful unless the usage of the table is extremely stylized."
This makes me question the usefulness, in terms of performance, of multi-column 
indices. Furthermore it states that multicolumn indices will only be used by the 
planner if the fields of the index are used with the AND operator in the WHERE clause 
of your select (same chapter).

We had a table with 6 million+ records, and a few tests with EXPLAIN revealed that none 
of the multi-column indices were actually used! This while regular analyzes were 
done, and the data never changes (no mutations).

I don't seem to grasp the full meaning of the above. Am I better off using several 
single-field indices, or do multicolumn indices offer an advantage? If so, in which 
case? Just made me wonder...

Regards,

Stijn Vanroye

-Original Message-
From: Richard Huxton [mailto:[EMAIL PROTECTED]
Sent: dinsdag 1 juni 2004 10:44
To: John Wells
Cc: [EMAIL PROTECTED]
Subject: Re: [GENERAL] Large table search question[Scanned]


John Wells wrote:
 Guys,
 
 I have a general question about designing databases for large data sets.
 
 I was speaking with a colleague about an application we're preparing to
 build.  One of the application's tables will potentially contain 2 million
 or more names, containing (at least) the fields first_name, last_name,
 middle_name and prefix.
 
 A common lookup the application will require is the full name, so prefix +
 first_name + middle_name + last_name.
 
 My friend's suggestion was to create a lookup field in the table itself,
 which would contain a concatenation of these fields created during insert.
  So, for each record, we'd having each individual field and then a
 full_name field that would contain the combination of the ind. fields. 
 His argument is that this will make lookups in this manner extremely fast
 and efficient.

Might, might not. No figures to back up his argument. It'll certainly 
make updates slower and less efficient. In fact, since each row will 
store the data twice you'll get less rows per disk-page which means 
(potentially) more disk reads when you need to get several rows.

 I agree with his assertion, but get the feeling that this is sort of an
 ugly design.  Would a compound index on these fields really be less
 efficient?

Doubtful, I'd certainly not try his solution until I'd tried the simple 
way first.

If you really want to try your friend's approach on PG you can build a 
functional index. As of 7.4, these can be expressions rather than just 
indexes so you can do something like:

CREATE INDEX my_idx_1 ON table1 ( prefix || ' ' || first_name ...);

If you're using 7.3.x you'll need to wrap that expression in a function 
and index the function instead.

In your case though, I'd just build a compound index and leave it at that.

-- 
   Richard Huxton
   Archonet Ltd

---(end of broadcast)---
TIP 8: explain analyze is your friend

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [GENERAL] Large table search question

2004-06-01 Thread Richard Huxton
Stijn Vanroye wrote:
I don't want to but in, I just find this an interesting discussion
and would like to add my 2 cents:
I have read this in the manual: (PostgreSQL 7.4beta4 documentation,
Chapter 11.3 Multicolumn Indexes) Qoute: Multicolumn indexes should
be used sparingly. Most of the time, an index on a single column is
sufficient and saves space and time. Indexes with more than three
columns are unlikely to be helpful unless the usage of the table is
extremely stylized. This makes me think of the usefullness in means
of performance off multi-column indices. Furthermore it states that
mulicolumn indeces will only be used by the planner if the fields of
the index are used with the AND operator in the where clause of your
select. (Same chapter).
We had a table with 6million+ records and a few tests with explain
reveiled that none of the multi-column indeces where actually used!
This while regualar analyzes where done, and the data never changes
(no mutations).
Indeed - in many cases the additional costs of keeping a multi-column 
index up to date, and of reading it outweigh the benefits on the few 
queries that actually use them.

Looking at John's example, if he defined an index on (first_name, 
last_name) then ordering by last_name can't ever use that index. If you 
execute a lot of queries for last_name='Smith' AND first_name='John' 
then it might well help; there are a lot of Smiths to choose from. On 
the other hand, my last_name is 'Huxton' and there aren't many of those in 
the phone book, so if a lot of your data is Huxtons rather than 
Smiths then you might just want an index on last_name alone.
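To make that concrete, a rough sketch (the index names are made up; table1 is
just a placeholder as in the earlier example):

-- helps last_name='Smith' AND first_name='John', and last_name='Smith' alone,
-- but generally not first_name='John' on its own
CREATE INDEX names_last_first_idx ON table1 (last_name, first_name);

-- often enough by itself if last_name is reasonably selective
CREATE INDEX names_last_idx ON table1 (last_name);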

I don't seem to grasp the full meaning of the above. Am I better of
using several single field indices, or do mulitcolumn indices offer
an advantage? If so in which case? Just made me wander...
There's really no alternative to testing. The statistics tables are very 
useful here. Unless you have good reason not to, always turn the 
statistics gathering on, and take snapshot regularly to keep an eye on 
where PG is exerting the most effort.
--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [GENERAL] Large table search question[Scanned]

2004-06-01 Thread Stijn Vanroye
Thanks for the reply. I was afraid it would come down to testing each individual 
situation.
The table I mentioned (6 million+ records) actually is a phonebook, and searching and 
filtering is possible on almost any combination of fields. So there's an index on each 
individual field now and that's it. It works relatively fast now anyway.

Regards.


Richard Huxton wrote:
 Stijn Vanroye wrote:
  I don't want to but in, I just find this an interesting discussion
  and would like to add my 2 cents:
  
  I have read this in the manual: (PostgreSQL 7.4beta4 documentation,
  Chapter 11.3 Multicolumn Indexes) Qoute: Multicolumn indexes should
  be used sparingly. Most of the time, an index on a single column is
  sufficient and saves space and time. Indexes with more than three
  columns are unlikely to be helpful unless the usage of the table is
  extremely stylized. This makes me think of the usefullness in means
  of performance off multi-column indices. Furthermore it states that
  mulicolumn indeces will only be used by the planner if the fields of
  the index are used with the AND operator in the where clause of your
  select. (Same chapter).
  
  We had a table with 6million+ records and a few tests with explain
  reveiled that none of the multi-column indeces where actually used!
  This while regualar analyzes where done, and the data never changes
  (no mutations).
 
 Indeed - in many cases the additional costs of keeping a multi-column 
 index up to date, and of reading it outweigh the benefits on the few 
 queries that actually use them.
 
 Looking at John's example, if he defined an index (first_name, 
 last_name) then ordering by last_name can't ever use that 
 index. If you 
 execute a lot of queries for last_name=Smith AND first_name=John 
 then it might well help, there are a lot of Smiths to 
 choose from. On 
 the other hand, my last_name=Huxton and there aren't many 
 of those in 
 the phone book, so if a lot of your data is Huxtons rather than 
 Smiths then you might just want an index on last_name.
 
  I don't seem to grasp the full meaning of the above. Am I better of
  using several single field indices, or do mulitcolumn indices offer
  an advantage? If so in which case? Just made me wander...
 
 There's really no alternative to testing. The statistics 
 tables are very 
 useful here. Unless you have good reason not to, always turn the 
 statistics gathering on, and take snapshot regularly to keep 
 an eye on 
 where PG is exerting the most effort.
 -- 
Richard Huxton
Archonet Ltd
 

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[GENERAL] Large table search question

2004-05-30 Thread John Wells
Guys,

I have a general question about designing databases for large data sets.

I was speaking with a colleague about an application we're preparing to
build.  One of the application's tables will potentially contain 2 million
or more names, containing (at least) the fields first_name, last_name,
middle_name and prefix.

A common lookup the application will require is the full name, so prefix +
first_name + middle_name + last_name.

My friend's suggestion was to create a lookup field in the table itself,
which would contain a concatenation of these fields created during insert.
 So, for each record, we'd have each individual field and then a
full_name field that would contain the combination of the ind. fields. 
His argument is that this will make lookups in this manner extremely fast
and efficient.

I agree with his assertion, but get the feeling that this is sort of an
ugly design.  Would a compound index on these fields really be less
efficient?

Thanks for your help!

John

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [GENERAL] Large table search question

2004-05-30 Thread Tom Lane
John Wells [EMAIL PROTECTED] writes:
 A common lookup the application will require is the full name, so prefix +
 first_name + middle_name + last_name.

 My friend's suggestion was to create a lookup field in the table itself,
 which would contain a concatenation of these fields created during insert.
  So, for each record, we'd having each individual field and then a
 full_name field that would contain the combination of the ind. fields. 
 His argument is that this will make lookups in this manner extremely fast
 and efficient.

Not unless you then add an index on that field, which would imply doubly
redundant storage of the data (primary fields, lookup field, lookup
field's index).

You don't actually need the lookup field in Postgres: you can create the
computed index directly.  For instance

create index fooi on foo ((first_name || middle_name || last_name));

select * from foo
where (first_name || middle_name || last_name) = 'JohnQPublic';

This is still kinda grim on storage space, but at least it's 2x not 3x.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [GENERAL] Large table load (40 millon rows) - WAL hassles

2001-07-12 Thread Bruce Momjian

 Version: Postgres 7.1.2
 
 A product we are developing requires frequent loading
 of a large number of rows into a table.  We are using
 the copy file command, but with WAL we are effectively
 doubling the amount of disk writing we are doing.
 
 After the rows are loaded we do a create index.
 
 Is there a way to turn off WAL or otherwise speeding up
 the table loading process? Would db_restore be faster?
 
 The rows are loaded in sorted order.  Does this impact
 index creation negatively or positively?
 
 We are currently working with test data but we estimate
 production data to be 6 - 9 billion rows.  Is anyone
 else running with these volumes?
 

We have a problem with 7.1.X where the WAL logs stay around longer than
needed.  Maybe someone has a patch with that fix.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



[GENERAL] Large table load (40 millon rows) - WAL hassles

2001-07-10 Thread simon lai

Version: Postgres 7.1.2

A product we are developing requires frequent loading
of a large number of rows into a table.  We are using
the copy file command, but with WAL we are effectively
doubling the amount of disk writing we are doing.

After the rows are loaded we do a create index.

Is there a way to turn off WAL or otherwise speed up
the table loading process? Would db_restore be faster?

The rows are loaded in sorted order.  Does this impact
index creation negatively or positively?

We are currently working with test data but we estimate
production data to be 6 - 9 billion rows.  Is anyone
else running with these volumes?

Simon

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl