Re: [PERFORM] 12 hour table vacuums

2007-10-24 Thread Harald Fuchs
In article [EMAIL PROTECTED],
Ron St-Pierre [EMAIL PROTECTED] writes:

 For what it's worth NUMERIC columns take more space than you might expect.
 Figure a minimum of 12 bytes apiece, so your rows are at about 1.5k each even if the
 non-numeric columns aren't large themselves. What are the other columns?

 The NUMERIC columns hold currency related values, with values ranging
 from a few cents to the billions, as well as a few negative numbers.

What's the required precision? If it's just cents (or maybe tenths
thereof), you could use BIGINT to store the amount in this precision.
This would give you exact values with much less space.
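As a rough sketch of the idea (the helper names here are illustrative, not from the thread), amounts stored as an integer number of cents stay exact while easily covering billions and negative values:

```python
# Sketch: currency as an integer count of cents, i.e. what a BIGINT
# amount-in-cents column would hold. Names are hypothetical.

def to_cents(dollars: int, cents: int) -> int:
    """Encode a dollars-and-cents amount as a single signed integer of cents."""
    sign = -1 if dollars < 0 else 1
    return sign * (abs(dollars) * 100 + cents)

def format_amount(total_cents: int) -> str:
    """Render an integer-cents amount back as a decimal string."""
    sign = "-" if total_cents < 0 else ""
    dollars, cents = divmod(abs(total_cents), 100)
    return f"{sign}{dollars}.{cents:02d}"

# Billions fit comfortably: BIGINT ranges up to ~9.2e18 cents.
assert format_amount(to_cents(3_000_000_000, 25)) == "3000000000.25"
assert format_amount(to_cents(-5, 7)) == "-5.07"
```

NUMERIC remains the right choice only if fractional cents or varying precision are actually required.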


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] 12 hour table vacuums

2007-10-24 Thread Jean-David Beyer
Ron St-Pierre wrote:
 We vacuum only a few of our tables nightly, this one is the last one
 because it takes longer to run. I'll probably re-index it soon, but I
 would appreciate any advice on how to speed up the vacuum process (and
 the db in general).

I am a novice to postgreSQL, so I have no answers for you. But for my own
education, I am confused by some of your post.
 
 Okay, here's our system:
   postgres 8.1.4

I have postgresql-8.1.9-1.el5

   Linux version 2.4.21

I imagine you mean Linux kernel version; I have 2.6.18-8.1.15.el5PAE

   Red Hat Linux 3.2.3

I have no clue what this means. Red Hat Linux 3 must have been in the early
1990s. RHL 5 came out about 1998 IIRC.

Red Hat Enterprise Linux 3, on the other hand, was not numbered like that,
as I recall. I no longer run that, but my current RHEL5 is named like this:

Red Hat Enterprise Linux Server release 5 (Tikanga)

and for my CentOS 4 system, it is

CentOS release 4.5 (Final)

Did RHEL3 go with the second dot in their release numbers? I do not remember
that.

   8 GB ram
   Intel(R) Xeon(TM) CPU 3.20GHz
   Raid 5
   autovacuum=off

Why would you not have that on?

   serves as the application server and database server
   server is co-located in another city, hardware upgrade is not
 currently an option
 
 Here's the table information:
 The table has 140,000 rows, 130 columns (mostly NUMERIC), 60 indexes.

I have designed databases, infrequently, but since the late 1970s. In my
experience, my tables had relatively few columns, rarely over 10. Are you
sure this table needs so many? Why not, e.g., 13 tables averaging 10 columns
each?

OTOH, 140,000 rows is not all that many. I have a 6,000,000 row table in my
little database on my desktop, and I do not even consider that large.
Imagine the size of a database belonging to the IRS, for example. Surely it
would have at least one row for each taxpayer and each employer (possibly in
two tables, or two databases).

Here are the last few lines of a VACUUM VERBOSE; command for that little
database. The 6,000,000 row table is not in the database at the moment, nor
are some of the other tables, but two relatively (for me) large tables are.

CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  free space map contains 166 pages in 76 relations
DETAIL:  A total of 1280 page slots are in use (including overhead).
1280 page slots are required to track all free space.
Current limits are:  4 page slots, 1000 relations, using 299 KB.
VACUUM
stock=> select count(*) from ranks;   [table has 10 columns]
  count
--------
 981030
(1 row)

stock=> select count(*) from ibd;     [table has 8 columns]
   count
---------
 1099789
(1 row)

And this is the time for running that psql process, most of which was
consumed by slow typing on my part.

real1m40.206s
user0m0.027s
sys 0m0.019s

My non-default settings for this are

# - Memory -

shared_buffers = 251000
work_mem  = 32768
max_fsm_pages  = 4

I have 8GBytes RAM on this machine, and postgreSQL is the biggest memory
user. I set shared_buffers high to try to get some entire (small) tables in
RAM and to be sure there is room for indices.

-- 
  .~.  Jean-David Beyer  Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey    http://counter.li.org
 ^^-^^ 08:40:01 up 1 day, 58 min, 1 user, load average: 4.08, 4.13, 4.17

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] 12 hour table vacuums

2007-10-23 Thread Bill Moran
In response to Ron St-Pierre [EMAIL PROTECTED]:

 We vacuum only a few of our tables nightly, this one is the last one 
 because it takes longer to run. I'll probably re-index it soon, but I 
 would appreciate any advice on how to speed up the vacuum process (and 
 the db in general).

I doubt anyone can provide meaningful advice without the output of
vacuum verbose.

 
 Okay, here's our system:
postgres 8.1.4
Linux version 2.4.21
Red Hat Linux 3.2.3
8 GB ram
Intel(R) Xeon(TM) CPU 3.20GHz
Raid 5
autovacuum=off
serves as the application server and database server
server is co-located in another city, hardware upgrade is not 
 currently an option
 
 Here's the table information:
 The table has 140,000 rows, 130 columns (mostly NUMERIC), 60 indexes. It 
 is probably our 'key' table in the database and gets called by almost 
 every query (usually joined to others). The table gets updated only 
 about 10 times a day. We were running autovacuum but it interfered with 
 the updates so we shut it off. We vacuum this table nightly, and it 
 currently takes about 12 hours to vacuum it. Not much else is running 
 during this period, nothing that should affect the table.
 
 Here are the current non-default postgresql.conf settings:
 max_connections = 100
 shared_buffers = 5
 work_mem = 9192
 maintenance_work_mem = 786432
 max_fsm_pages = 7
 vacuum_cost_delay = 200
 vacuum_cost_limit = 100
 bgwriter_delay = 1
 fsync = on
 checkpoint_segments = 64
 checkpoint_timeout = 1800
 effective_cache_size = 27
 random_page_cost = 2
 log_destination = 'stderr'
 redirect_stderr = on
 client_min_messages = warning
 log_min_messages = warning
 stats_start_collector = off
 stats_command_string = on
 stats_block_level = on
 stats_row_level = on
 autovacuum = off 
 autovacuum_vacuum_threshold = 2000
 deadlock_timeout = 1
 max_locks_per_transaction = 640
 add_missing_from = on
 
 As I mentioned, any insights into changing the configuration to optimize 
 performance are most welcome.
 
 Thanks
 
 Ron
 
 ---(end of broadcast)---
 TIP 5: don't forget to increase your free space map settings
 
 
 
 
 
 


-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023




---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] 12 hour table vacuums

2007-10-23 Thread Tom Lane
Ron St-Pierre [EMAIL PROTECTED] writes:
 The table has 140,000 rows, 130 columns (mostly NUMERIC), 60 indexes. It 
 is probably our 'key' table in the database and gets called by almost 
 every query (usually joined to others). The table gets updated only 
 about 10 times a day. We were running autovacuum but it interfered with 
 the updates so we shut it off. We vacuum this table nightly, and it 
 currently takes about 12 hours to vacuum it. Not much else is running 
 during this period, nothing that should affect the table.

Here is your problem:

 vacuum_cost_delay = 200

If you are only vacuuming when nothing else is happening, you shouldn't
be using vacuum_cost_delay at all: set it to 0.  In any case this value
is probably much too high.  I would imagine that if you watch the
machine while the vacuum is running you'll find both CPU and I/O load
near zero ... which is nice, unless you would like the vacuum to finish
sooner.
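In postgresql.conf terms, the suggestion amounts to something like this (a sketch for an off-peak-only vacuum schedule; everything else unchanged):

```
vacuum_cost_delay = 0    # disable cost-based throttling so off-peak vacuums run flat out
```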

In unrelated comments:

 maintenance_work_mem = 786432

That seems awfully high, too.

 max_fsm_pages = 7

And this possibly too low --- are you sure you are not leaking disk
space?

 stats_start_collector = off
 stats_command_string = on
 stats_block_level = on
 stats_row_level = on

These are not self-consistent.
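For reference, a self-consistent combination would turn the collector on, since the command-, block- and row-level settings collect nothing without it:

```
stats_start_collector = on    # the three settings below depend on this
stats_command_string = on
stats_block_level = on
stats_row_level = on
```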

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] 12 hour table vacuums

2007-10-23 Thread Alvaro Herrera
Ron St-Pierre wrote:

 Okay, here's our system:
   postgres 8.1.4

Upgrade to 8.1.10

 Here's the table information:
 The table has 140,000 rows, 130 columns (mostly NUMERIC), 60 indexes.

60 indexes?  You gotta be kidding.  You really have 60 columns on which
to scan?

 vacuum_cost_delay = 200
 vacuum_cost_limit = 100

Isn't this a bit high?  What happens if you cut the delay to, say, 10?
(considering you've lowered the limit to half the default)

-- 
Alvaro Herrera  Developer, http://www.PostgreSQL.org/
Someone said that it is at least an order of magnitude more work to do
production software than a prototype. I think he is wrong by at least
an order of magnitude.  (Brian Kernighan)

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] 12 hour table vacuums

2007-10-23 Thread Ron St-Pierre

Bill Moran wrote:
 In response to Ron St-Pierre [EMAIL PROTECTED]:

  We vacuum only a few of our tables nightly, this one is the last one
  because it takes longer to run. I'll probably re-index it soon, but I
  would appreciate any advice on how to speed up the vacuum process (and
  the db in general).

 I doubt anyone can provide meaningful advice without the output of
 vacuum verbose.

The cron job is still running:

 /usr/local/pgsql/bin/vacuumdb -d imperial -t stock.fdata -v -z > /usr/local/pgsql/bin/fdata.txt

I'll post the output when it's finished.

Ron


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] 12 hour table vacuums

2007-10-23 Thread Gregory Stark
Ron St-Pierre [EMAIL PROTECTED] writes:

 We vacuum only a few of our tables nightly, this one is the last one because
 it takes longer to run. I'll probably re-index it soon, but I would
 appreciate any advice on how to speed up the vacuum process (and the db in
 general).
...
 vacuum_cost_delay = 200

Well speeding up vacuum isn't really useful in itself. In fact you have vacuum
configured to run quite slowly by having vacuum_cost_delay set so high. You
have it set to sleep 200ms every few pages. If you lower that it'll run faster
but take more bandwidth away from the foreground tasks.
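A back-of-envelope sketch of what that throttle costs, using the poster's settings and the default vacuum_cost_page_miss of 10 (all figures rough):

```python
# With vacuum_cost_delay = 200 and vacuum_cost_limit = 100, vacuum sleeps
# 200 ms every time its accumulated page costs reach 100.
delay_ms = 200
cost_limit = 100
page_miss_cost = 10                 # default cost charged per page read from disk

pages_per_sleep = cost_limit / page_miss_cost    # 10 uncached pages per cycle
sleep_per_page_ms = delay_ms / pages_per_sleep   # 20 ms of sleep per page

# 140,000 rows at ~1.5 kB each is roughly 26,000 8 kB heap pages:
table_pages = 140_000 * 1536 // 8192             # 26250
est_sleep_s = table_pages * sleep_per_page_ms / 1000
print(est_sleep_s)                               # ~525 s for one heap pass
```

Even the heap pass alone spends minutes doing nothing but sleeping, and scanning 60 indexes multiplies the pages visited many times over.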

 Here's the table information:
 The table has 140,000 rows, 130 columns (mostly NUMERIC), 60 indexes. 

For what it's worth NUMERIC columns take more space than you might expect.
Figure a minimum of 12 bytes apiece, so your rows are at about 1.5k each even if the
non-numeric columns aren't large themselves. What are the other columns?
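The arithmetic behind the "about 1.5k each" figure, under the stated assumption of ~130 mostly-NUMERIC columns at a minimum of ~12 bytes apiece:

```python
# Rough per-row width: 130 columns at 12 bytes minimum per NUMERIC value.
numeric_cols = 130
min_numeric_bytes = 12
approx_row_bytes = numeric_cols * min_numeric_bytes
assert approx_row_bytes == 1560   # ~1.5 kB, before any per-row overhead
```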

 We were running autovacuum but it interfered with the updates so we shut it
 off. 

Was it just the I/O bandwidth? I'm surprised as your vacuum_cost_delay is
quite high. Manual vacuum doesn't do anything differently from autovacuum,
neither should interfere directly with updates except by taking away
I/O bandwidth.

 We vacuum this table nightly, and it currently takes about 12 hours to
 vacuum it. Not much else is running during this period, nothing that should
 affect the table.

Is this time increasing over time? If once a day isn't enough then you may be
accumulating more and more dead space over time. In which case you may be
better off running it during prime time with a large vacuum_cost_delay (like
the 200 you have configured) rather than trying to get to run fast enough to
fit in the off-peak period.

 deadlock_timeout = 1

I would not suggest having this quite this high. Raising it from the default
is fine but having a value larger than your patience is likely to give you the
false impression that something is hung if you should ever get a deadlock.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] 12 hour table vacuums

2007-10-23 Thread Bill Moran
In response to Ron St-Pierre [EMAIL PROTECTED]:

 Bill Moran wrote:
  In response to Ron St-Pierre [EMAIL PROTECTED]:

   We vacuum only a few of our tables nightly, this one is the last one
   because it takes longer to run. I'll probably re-index it soon, but I
   would appreciate any advice on how to speed up the vacuum process (and
   the db in general).

  I doubt anyone can provide meaningful advice without the output of
  vacuum verbose.

Understood, however I may have spoken too soon.  It appears that Tom
found an obvious issue with your config that seems likely to be the
problem.

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] 12 hour table vacuums

2007-10-23 Thread Ron St-Pierre

Tom Lane wrote:
 Here is your problem:

  vacuum_cost_delay = 200

 If you are only vacuuming when nothing else is happening, you shouldn't
 be using vacuum_cost_delay at all: set it to 0.  In any case this value
 is probably much too high.  I would imagine that if you watch the
 machine while the vacuum is running you'll find both CPU and I/O load
 near zero ... which is nice, unless you would like the vacuum to finish
 sooner.

Yeah, I've noticed that CPU, mem and I/O load are really low when this
is running. I'll change that setting.

 In unrelated comments:

  maintenance_work_mem = 786432

 That seems awfully high, too.

Any thoughts on a more reasonable value?

  max_fsm_pages = 7

 And this possibly too low ---

The default appears to be 2, so I upped it to 7. I'll try 16
(max_fsm_relations*16).

 are you sure you are not leaking disk
 space?

What do you mean leaking disk space?

 stats_start_collector = off
 stats_command_string = on
 stats_block_level = on
 stats_row_level = on

 These are not self-consistent.

 regards, tom lane


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] 12 hour table vacuums

2007-10-23 Thread Ron St-Pierre

Alvaro Herrera wrote:
 Ron St-Pierre wrote:
  Okay, here's our system:
    postgres 8.1.4

 Upgrade to 8.1.10

Any particular fixes in 8.1.10 that would help with this?

  Here's the table information:
  The table has 140,000 rows, 130 columns (mostly NUMERIC), 60 indexes.

 60 indexes?  You gotta be kidding.  You really have 60 columns on which
 to scan?

Really. 60 indexes. They're the most commonly requested columns for
company information (we believe). Any ideas on testing our assumptions
about that? I would like to know definitively what are the most popular
columns. Do you think that rules would be a good approach for this?
(Sorry if I'm getting way off topic here)

  vacuum_cost_delay = 200
  vacuum_cost_limit = 100

 Isn't this a bit high?  What happens if you cut the delay to, say, 10?
 (considering you've lowered the limit to half the default)

Yes, Tom pointed this out too. I'll lower it and check out the results.

Ron


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] 12 hour table vacuums

2007-10-23 Thread Joshua D. Drake
On Tue, 23 Oct 2007 10:00:05 -0700
Ron St-Pierre [EMAIL PROTECTED] wrote:

 Alvaro Herrera wrote:
  Ron St-Pierre wrote:
 

  Okay, here's our system:
postgres 8.1.4
  
 
  Upgrade to 8.1.10

 Any particular fixes in 8.1.10 that would help with this?

  Here's the table information:
  The table has 140,000 rows, 130 columns (mostly NUMERIC), 60
  indexes. 
 
  60 indexes?  You gotta be kidding.  You really have 60 columns on
  which to scan?
 

 Really. 60 indexes. They're the most commonly requested columns for 
 company information (we believe). Any ideas on testing our
 assumptions about that? I would like to know definitively what are
 the most popular columns. Do you think that rules would be a good
 approach for this? (Sorry if I'm getting way off topic here)

I suggest you:

1. Turn on stats and start looking in the stats columns to see what
indexes are actually being used.

2. Strongly review your normalization :)
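Step 1 can be sketched against the statistics views (assuming the row-level stats settings are on; 'fdata' is the table name taken from the vacuumdb command elsewhere in the thread):

```sql
-- Indexes the planner never touches show idx_scan = 0.
SELECT indexrelname, idx_scan, idx_tup_read
  FROM pg_stat_user_indexes
 WHERE relname = 'fdata'
 ORDER BY idx_scan;
```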

Joshua D. Drake



-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/





Re: [PERFORM] 12 hour table vacuums

2007-10-23 Thread Simon Riggs
On Tue, 2007-10-23 at 08:53 -0700, Ron St-Pierre wrote:
 The table gets updated only 
 about 10 times a day. 

So why are you VACUUMing it nightly? You should do this at the weekend
every 3 months...

8.1 is slower at VACUUMing indexes than later releases, so 60 indexes
are going to hurt quite a lot.

The default maintenance_work_mem is sufficient for this table.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] 12 hour table vacuums

2007-10-23 Thread Alvaro Herrera
Ron St-Pierre wrote:
 Gregory Stark wrote:

 We were running autovacuum but it interfered with the updates so we
 shut it off. 

 Was it just the I/O bandwidth? I'm surprised as your
 vacuum_cost_delay is quite high. Manual vacuum doesn't do anything
 differently from autovacuum, neither should interfere directly with
 updates except by taking away I/O bandwidth.

 I don't know what the problem was. I tried to exclude certain tables
 from autovacuuming, but it autovacuumed anyway.

Probably because of Xid wraparound issues.  Now that you're vacuuming
weekly it shouldn't be a problem.  (It's also much less of a problem in
8.2).

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] 12 hour table vacuums

2007-10-23 Thread Alvaro Herrera
Ron St-Pierre wrote:
 Alvaro Herrera wrote:
  Ron St-Pierre wrote:
   Okay, here's our system:
     postgres 8.1.4

  Upgrade to 8.1.10

 Any particular fixes in 8.1.10 that would help with this?

I don't think so, but my guess is that you really want to avoid the
autovacuum bug which makes it vacuum without FREEZE on template0, that
has caused so many problems all over the planet.

   Here's the table information:
   The table has 140,000 rows, 130 columns (mostly NUMERIC), 60 indexes.

  60 indexes?  You gotta be kidding.  You really have 60 columns on which
  to scan?

 Really. 60 indexes. They're the most commonly requested columns for company
 information (we believe). Any ideas on testing our assumptions about that?
 I would like to know definitively what are the most popular columns. Do you
 think that rules would be a good approach for this? (Sorry if I'm getting
 way off topic here)

As Josh Drake already said, you can check pg_stat* views to see which
indexes are not used.  Hard to say anything else without seeing the
definition.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org