Re: [PERFORM] [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal

2007-05-16 Thread Guillaume Cottenceau
Michael Stone mstone+postgres 'at' mathom.us writes:

 On Tue, May 15, 2007 at 06:43:50PM +0200, Guillaume Cottenceau wrote:
 patch - basically, I think the documentation under estimates (or
 sometimes misses) the benefit of VACUUM FULL for scans, and the
 needs of VACUUM FULL if the routine VACUUM hasn't been done
 properly since the database was put in production.
 
 It's also possible to overestimate the benefit of vacuum full, leading
 to people vacuum full'ing almost constantly, then complaining about
 performance due to the associated overhead. I think there have been
 more people on this list whose performance problems were caused by
 unnecessary full vacs than by those whose performance problems were
 caused by insufficient full vacs.

Come on, I don't suggest to remove several bold warnings about
it, the best one being Therefore, frequently using VACUUM FULL
can have an extremely negative effect on the performance of
concurrent database queries. My point is to add the few
additional mentions; I don't think the claims that VACUUM FULL
physically compacts the data, and might be useful in case of too
long time with infrequent VACUUM are incorrect, are they?

-- 
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal

2007-05-16 Thread Jim C. Nasby
On Wed, May 16, 2007 at 09:41:46AM +0200, Guillaume Cottenceau wrote:
 Michael Stone mstone+postgres 'at' mathom.us writes:
 
  On Tue, May 15, 2007 at 06:43:50PM +0200, Guillaume Cottenceau wrote:
  patch - basically, I think the documentation under estimates (or
  sometimes misses) the benefit of VACUUM FULL for scans, and the
  needs of VACUUM FULL if the routine VACUUM hasn't been done
  properly since the database was put in production.
  
  It's also possible to overestimate the benefit of vacuum full, leading
  to people vacuum full'ing almost constantly, then complaining about
  performance due to the associated overhead. I think there have been
  more people on this list whose performance problems were caused by
  unnecessary full vacs than by those whose performance problems were
  caused by insufficient full vacs.
 
 Come on, I don't suggest to remove several bold warnings about
 it, the best one being Therefore, frequently using VACUUM FULL
 can have an extremely negative effect on the performance of
 concurrent database queries. My point is to add the few
 additional mentions; I don't think the claims that VACUUM FULL
 physically compacts the data, and might be useful in case of too
 long time with infrequent VACUUM are incorrect, are they?

Unfortunately they are, to a degree. VACUUM FULL can create a
substantial amount of churn in the indexes, resulting in bloated
indexes. So often you have to REINDEX after you VACUUM FULL.

Long term I think we should ditch 'VACUUM FULL' altogether and create a
COMPACT command (it's very easy for users to get confused between
vacuum all the databases in the cluster or vacuum the entire
database and VACUUM FULL).
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal

2007-05-16 Thread Guillaume Cottenceau
Jim C. Nasby decibel 'at' decibel.org writes:

 On Wed, May 16, 2007 at 09:41:46AM +0200, Guillaume Cottenceau wrote:

[...]

  Come on, I don't suggest to remove several bold warnings about
  it, the best one being Therefore, frequently using VACUUM FULL
  can have an extremely negative effect on the performance of
  concurrent database queries. My point is to add the few
  additional mentions; I don't think the claims that VACUUM FULL
  physically compacts the data, and might be useful in case of too
  long time with infrequent VACUUM are incorrect, are they?
 
 Unfortunately they are, to a degree. VACUUM FULL can create a
 substantial amount of churn in the indexes, resulting in bloated
 indexes. So often you have to REINDEX after you VACUUM FULL.

Ok, VACUUM FULL does his job (it physically compacts the data and
might be useful in case of too long time with infrequent VACUUM),
but we are going to not talk about it because we often needs a
REINDEX after it? The natural conclusion would rather be to
document the fact than REINDEX is needed after VACUUM FULL, isn't
it?

-- 
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal

2007-05-16 Thread Alvaro Herrera
Guillaume Cottenceau wrote:
 Jim C. Nasby decibel 'at' decibel.org writes:
 
  On Wed, May 16, 2007 at 09:41:46AM +0200, Guillaume Cottenceau wrote:
 
 [...]
 
   Come on, I don't suggest to remove several bold warnings about
   it, the best one being Therefore, frequently using VACUUM FULL
   can have an extremely negative effect on the performance of
   concurrent database queries. My point is to add the few
   additional mentions; I don't think the claims that VACUUM FULL
   physically compacts the data, and might be useful in case of too
   long time with infrequent VACUUM are incorrect, are they?
  
  Unfortunately they are, to a degree. VACUUM FULL can create a
  substantial amount of churn in the indexes, resulting in bloated
  indexes. So often you have to REINDEX after you VACUUM FULL.
 
 Ok, VACUUM FULL does his job (it physically compacts the data and
 might be useful in case of too long time with infrequent VACUUM),
 but we are going to not talk about it because we often needs a
 REINDEX after it? The natural conclusion would rather be to
 document the fact than REINDEX is needed after VACUUM FULL, isn't
 it?

Maybe, but we should also mention that CLUSTER is a likely faster
workaround.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal

2007-05-16 Thread Jim C. Nasby
On Wed, May 16, 2007 at 12:09:26PM -0400, Alvaro Herrera wrote:
 Guillaume Cottenceau wrote:
  Jim C. Nasby decibel 'at' decibel.org writes:
  
   On Wed, May 16, 2007 at 09:41:46AM +0200, Guillaume Cottenceau wrote:
  
  [...]
  
Come on, I don't suggest to remove several bold warnings about
it, the best one being Therefore, frequently using VACUUM FULL
can have an extremely negative effect on the performance of
concurrent database queries. My point is to add the few
additional mentions; I don't think the claims that VACUUM FULL
physically compacts the data, and might be useful in case of too
long time with infrequent VACUUM are incorrect, are they?
   
   Unfortunately they are, to a degree. VACUUM FULL can create a
   substantial amount of churn in the indexes, resulting in bloated
   indexes. So often you have to REINDEX after you VACUUM FULL.
  
  Ok, VACUUM FULL does his job (it physically compacts the data and
  might be useful in case of too long time with infrequent VACUUM),
  but we are going to not talk about it because we often needs a
  REINDEX after it? The natural conclusion would rather be to
  document the fact than REINDEX is needed after VACUUM FULL, isn't
  it?
 
 Maybe, but we should also mention that CLUSTER is a likely faster
 workaround.

What this boils down to is that there should probably be a separate
subsection that deals with Oh noes! My tables are too big!
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal

2007-05-16 Thread Michael Stone

On Wed, May 16, 2007 at 12:09:26PM -0400, Alvaro Herrera wrote:

Maybe, but we should also mention that CLUSTER is a likely faster
workaround.


Unless, of course, you don't particularly care about the order of the 
items in your table; you might end up wasting vastly more time rewriting 
tables due to unnecessary clustering than for full vacuums on a table 
that doesn't need it.


Mike Stone

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal

2007-05-16 Thread Chris Browne
[EMAIL PROTECTED] (Michael Stone) writes:
 On Wed, May 16, 2007 at 12:09:26PM -0400, Alvaro Herrera wrote:
Maybe, but we should also mention that CLUSTER is a likely faster
workaround.

 Unless, of course, you don't particularly care about the order of
 the items in your table; you might end up wasting vastly more time
 rewriting tables due to unnecessary clustering than for full vacuums
 on a table that doesn't need it.

Actually, this is irrelevant.

If CLUSTER is faster than VACUUM FULL (and if it isn't, in all cases,
it *frequently* is, and probably will be, nearly always, soon), then
it's a faster workaround.
-- 
output = (cbbrowne @ linuxfinances.info)
http://cbbrowne.com/info/oses.html
What if you slept?  And what if, in your sleep, you dreamed?
 And what if, in your dream, you went to heaven and there
 plucked a strange and beautiful flower?  And what if, when
 you awoke, you had the flower in your hand?  Ah, what then?
--Coleridge

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal

2007-05-16 Thread Michael Stone

On Wed, May 16, 2007 at 03:34:42PM -0400, Chris Browne wrote:

[EMAIL PROTECTED] (Michael Stone) writes:

Unless, of course, you don't particularly care about the order of
the items in your table; you might end up wasting vastly more time
rewriting tables due to unnecessary clustering than for full vacuums
on a table that doesn't need it.


Actually, this is irrelevant.


I think it's perfectly relevant.


If CLUSTER is faster than VACUUM FULL (and if it isn't, in all cases,
it *frequently* is, and probably will be, nearly always, soon), then
it's a faster workaround.


Cluster reorders the table. If a table doesn't have any dead rows and 
you tell someone to run cluster or vacuum full, the vaccuum basically 
won't do anything and the cluster will reorder the whole table. Cluster 
is great for certain access patterns, but I've been noticing this odd 
tendency lately to treat it like a silver bullet.


Mike Stone

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal

2007-05-16 Thread Alvaro Herrera
Michael Stone wrote:
 On Wed, May 16, 2007 at 03:34:42PM -0400, Chris Browne wrote:
 [EMAIL PROTECTED] (Michael Stone) writes:
 Unless, of course, you don't particularly care about the order of
 the items in your table; you might end up wasting vastly more time
 rewriting tables due to unnecessary clustering than for full vacuums
 on a table that doesn't need it.
 
 Actually, this is irrelevant.
 
 I think it's perfectly relevant.
 
 If CLUSTER is faster than VACUUM FULL (and if it isn't, in all cases,
 it *frequently* is, and probably will be, nearly always, soon), then
 it's a faster workaround.
 
 Cluster reorders the table. If a table doesn't have any dead rows and 
 you tell someone to run cluster or vacuum full, the vaccuum basically 
 won't do anything and the cluster will reorder the whole table. Cluster 
 is great for certain access patterns, but I've been noticing this odd 
 tendency lately to treat it like a silver bullet.

Well, it's certainly not a silver bullet; you would use VACUUM (not
full) for most of your needs, and CLUSTER for the rare other cases.  Of
course you would not pick an index at random each time, but rather keep
using the same one, which would supposedly be faster.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal

2007-05-16 Thread Tom Lane
Michael Stone [EMAIL PROTECTED] writes:
 On Wed, May 16, 2007 at 03:34:42PM -0400, Chris Browne wrote:
 If CLUSTER is faster than VACUUM FULL (and if it isn't, in all cases,
 it *frequently* is, and probably will be, nearly always, soon), then
 it's a faster workaround.

 Cluster reorders the table. If a table doesn't have any dead rows and 
 you tell someone to run cluster or vacuum full, the vaccuum basically 
 won't do anything and the cluster will reorder the whole table. Cluster 
 is great for certain access patterns, but I've been noticing this odd 
 tendency lately to treat it like a silver bullet.

Sure, but VACUUM FULL looks even less like a silver bullet.

There's been talk of providing an operation that uses the same
infrastructure as CLUSTER, but doesn't make any attempt to re-order the
table: just seqscan the old heap, transfer still-live tuples into a new
heap, then rebuild indexes from scratch.  This is clearly going to be a
lot faster than a VACUUM FULL under conditions in which the latter would
have to move most of the tuples.  Heikki just fixed one of the major
objections to it (ie, CLUSTER not being MVCC-safe).  The other objection
is that peak transient disk space usage could be much higher than VACUUM
FULL's, but still for a lot of scenarios this'd be better.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal

2007-05-15 Thread Guillaume Cottenceau
Dear all,

After some time spent better understanding how the VACUUM process
works, what problems we had in production and how to improve our
maintenance policy[1], I've come up with a little documentation
patch - basically, I think the documentation under estimates (or
sometimes misses) the benefit of VACUUM FULL for scans, and the
needs of VACUUM FULL if the routine VACUUM hasn't been done
properly since the database was put in production. Find the patch
against snapshot attached (text not filled, to ease reading). It
might help others in my situation in the future.

--- doc/src/sgml/ref/vacuum.sgml	2007-02-01 00:26:04.0 +0100
+++ /tmp/vacuum.sgml	2007-05-15 18:32:14.0 +0200
@@ -164,8 +164,8 @@
para
 The optionFULL/option option is not recommended for routine use,
 but might be useful in special cases.  An example is when you have deleted
-most of the rows in a table and would like the table to physically shrink
-to occupy less disk space.  commandVACUUM FULL/command will usually
+or updated most of the rows in a table and would like the table to physically shrink
+to occupy less disk space and allow faster table scans. commandVACUUM FULL/command will usually
 shrink the table more than a plain commandVACUUM/command would.
 The optionFULL/option option does not shrink indexes; a periodic
 commandREINDEX/ is still recommended.  In fact, it is often faster
--- doc/src/sgml/maintenance.sgml	2007-05-03 17:47:48.0 +0200
+++ /tmp/maintenance.sgml	2007-05-15 18:29:29.0 +0200
@@ -157,7 +157,8 @@
 command. This uses a more aggressive algorithm for reclaiming the
 space consumed by dead row versions. Any space that is freed by
 commandVACUUM FULL/command is immediately returned to the
-operating system. Unfortunately, this variant of the
+operating system, and the table data is physically compacted on
+the disk. Unfortunately, this variant of the
 commandVACUUM/command command acquires an exclusive lock on
 each table while commandVACUUM FULL/command is processing
 it. Therefore, frequently using commandVACUUM FULL/command can
@@ -168,12 +169,15 @@
para
 The standard form of commandVACUUM/ is best used with the goal
 of maintaining a fairly level steady-state usage of disk space. If
-you need to return disk space to the operating system you can use
+you need to return disk space to the operating system, you can use
 commandVACUUM FULL/ mdash; but what's the point of releasing disk
 space that will only have to be allocated again soon?  Moderately
 frequent standard commandVACUUM/ runs are a better approach
 than infrequent commandVACUUM FULL/ runs for maintaining
-heavily-updated tables.
+heavily-updated tables. However, if some heavily-updated tables
+have gone too long with infrequent commandVACUUM/, you can
+use commandVACUUM FULL/ to get performance back (it is much
+slower to scan a table containing almost only dead rows).
/para
 
para

Ref: 
[1] http://archives.postgresql.org/pgsql-performance/2006-08/msg00419.php
http://archives.postgresql.org/pgsql-performance/2007-05/msg00112.php

-- 
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal

2007-05-15 Thread Michael Stone

On Tue, May 15, 2007 at 06:43:50PM +0200, Guillaume Cottenceau wrote:

patch - basically, I think the documentation under estimates (or
sometimes misses) the benefit of VACUUM FULL for scans, and the
needs of VACUUM FULL if the routine VACUUM hasn't been done
properly since the database was put in production.


It's also possible to overestimate the benefit of vacuum full, leading 
to people vacuum full'ing almost constantly, then complaining about 
performance due to the associated overhead. I think there have been more 
people on this list whose performance problems were caused by 
unnecessary full vacs than by those whose performance problems were 
caused by insufficient full vacs.


Mike Stone

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly