Re: [PERFORM] Question on REINDEX

2005-04-19 Thread Tambet Matiisen

 
 Josh Berkus josh@agliodbs.com writes:
  1) When is it necessary to run REINDEX or drop/create
  an index?  All I could really find in the docs is:
 
  If you need to VACUUM FULL, you need to REINDEX as well.  
 For example, 
  if you drop millions of rows from a table.
 
 That's probably a pretty good rule of thumb.  It's worth 
 noting that VACUUM FULL tends to actively bloat indexes, not 
 reduce them in size, because it has to create new index 
 entries for the rows it moves before it can delete the old 
 ones.  So if a VACUUM FULL moves many rows you are likely to 
 see the indexes get bigger not smaller.
 

Is my current understanding correct:

1) VACUUM defragments each page locally - moves free space to the end of
page.

2) VACUUM FULL defragments table globally - tries to fill up all
partially free pages and deletes all resulting empty pages.

3) Both VACUUM and VACUUM FULL do only local defragment for indexes.

4) If you want indexes to become fully defragmented, you need to
REINDEX.


If you happen to use triggers for denormalization, like I do, then you
have a lot of updates, which means that tables and indexes become quicky
cluttered with pages, which contain mostly dead tuples. If those tables
and indexes fill up shared buffers, then PostgreSQL slows down, because
it has to do a lot more IO than normal. Regular VACUUM FULL helped, but
I needed REINDEX as well, otherwise indexes grew bigger than tables
itself!

  Better to up your max_fsm_pages and do regular VACUUMs regularly and
  frequently so that you don't have to REINDEX at all.
 
 Yes, definitely.  Also consider using CLUSTER rather than 
 VACUUM FULL when you need to clean up after massive deletions 
 from a table.  It's not any less intrusive in terms of 
 locking, but it's often faster and it avoids the index bloat 
 problem (since it effectively does a REINDEX).
 

Hmm, thanks for a tip. BTW, is output of 

select count(1), sum(relpages) from pg_class where relkind in
('r','i','t')

good estimate for max_fsm_relations and max_fsm_pages?
Are these parameters used only during VACUUM or in runtime too?

  Tambet

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Question on REINDEX

2005-04-19 Thread Tom Lane
Tambet Matiisen [EMAIL PROTECTED] writes:
 Is my current understanding correct:

 1) VACUUM defragments each page locally - moves free space to the end of
 page.

 2) VACUUM FULL defragments table globally - tries to fill up all
 partially free pages and deletes all resulting empty pages.

Both versions of VACUUM do within-page defragmentation.  Also, both
versions will remove entirely-empty pages at the end of a table.
The difference is that VACUUM FULL actively attempts to make pages
at the end empty, by moving their contents into free space in earlier
pages.  Plain VACUUM never does cross-page data movement, which is
how come it doesn't need as strong a lock.

BTW, VACUUM FULL does the data movement back-to-front, and stops as soon
as it finds a tuple it cannot move down; which is a reasonable strategy
since the goal is merely to make the file shorter.  But it's entirely
likely that there will be lots of empty space left at the end.  For
instance the final state could have one 4K tuple in the last page and
up to 4K-1 free bytes in every earlier page.

 3) Both VACUUM and VACUUM FULL do only local defragment for indexes.

 4) If you want indexes to become fully defragmented, you need to
 REINDEX.

I don't think defragment is a notion that applies to indexes, at least
not in the same way as for tables.  It's true that there is no
cross-page data movement in either case.  In the last release or two
we've been able to recognize and recycle entirely-empty pages in both
btree and hash indexes, but such pages are almost never returned to the
OS; they're put on a freelist for re-use within the index, instead.

If you allow the table to grow to much more than its normal size,
ie, you allow many dead tuples to be formed, then getting back to
normal size is going to require VACUUM FULL + REINDEX (or you can use
CLUSTER or some varieties of ALTER TABLE).  This is not the recommended
maintenance process however.  Sufficiently frequent plain VACUUMs should
generally hold the free space to a tolerable level without requiring
any exclusive locking.

 Hmm, thanks for a tip. BTW, is output of 
 select count(1), sum(relpages) from pg_class where relkind in
 ('r','i','t')
 good estimate for max_fsm_relations and max_fsm_pages?

Within that one database, yes --- don't forget you must sum these
numbers across all DBs in the cluster.  Also you need some slop
in the max_fsm_pages setting because of quantization in the space
usage.  It's probably easier to let VACUUM VERBOSE do the calculation
for you.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Question on REINDEX

2005-04-19 Thread Josh Berkus
Tambet,

 Hmm, thanks for a tip. BTW, is output of

 select count(1), sum(relpages) from pg_class where relkind in
 ('r','i','t')

Well, if you do that for all databases in the cluster, it's the number you 
start with.  However, setting FSM_pages to that would be assuming that you 
excpected 100% of the rows to be replaced by UPDATES or DELETEs before you 
ran VACUUM.   I generally run VACUUM a little sooner than that.

See the end portion of:
http://www.powerpostgresql.com/PerfList

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Question on REINDEX

2005-04-19 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes:
 select count(1), sum(relpages) from pg_class where relkind in
 ('r','i','t')

 Well, if you do that for all databases in the cluster, it's the number you 
 start with.  However, setting FSM_pages to that would be assuming that you 
 excpected 100% of the rows to be replaced by UPDATES or DELETEs before you 
 ran VACUUM.   I generally run VACUUM a little sooner than that.

Not at all.  What it says is that you expect 100% of the pages to have
useful amounts of free space, which is a *much* weaker criterion.

I think you can usually get away with setting max_fsm_pages to less than
your actual disk footprint, but I'm not sure how much less.  It'd
probably depend a lot on your usage pattern --- for instance,
insert-only history tables don't need any FSM space.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Question on REINDEX

2005-04-19 Thread Josh Berkus
Tom,

 Not at all.  What it says is that you expect 100% of the pages to have
 useful amounts of free space, which is a *much* weaker criterion.

Hmmm.  Good point.   

This seems to be another instance where my rule-of-thumb was based on false 
logic but nevertheless arrived at correct numbers.  I've seldom, if ever, set 
FSM_pages above 50% of the pages in the active database ... and never run 
out.

H  actually, it seems like, if you are vacuuming regularly, you only 
*do* need to track pages that have been touched by DELETE or UPDATE.   Other 
pages would have already been vacuumed and not have any useful free space 
left.   Yes?

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Question on REINDEX

2005-04-19 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes:
 Not at all.  What it says is that you expect 100% of the pages to have
 useful amounts of free space, which is a *much* weaker criterion.

 H  actually, it seems like, if you are vacuuming regularly, you only 
 *do* need to track pages that have been touched by DELETE or UPDATE.   Other 
 pages would have already been vacuumed and not have any useful free space 
 left.   Yes?

Well, the space has to be remembered until it's reused.  On the other
hand, there's nothing that says FSM has to be aware of all the free
space available at all times --- the real criterion to avoid bloat
is that after a VACUUM, enough space is logged in FSM to satisfy all
the insertions that will happen before the next VACUUM.  So you could
have situations where free space is temporarily forgotten (for lack
of slots in FSM), but other free space gets used instead, and eventually
a later VACUUM re-finds that free space and puts it into FSM.

I think it's true that the more often you vacuum, the less FSM you need,
but this doesn't have much to do with how much free space is actually
out there on disk.  It's because you only need enough FSM to record the
free space you'll need until the next vacuum.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Question on REINDEX

2005-04-19 Thread Alvaro Herrera
On Tue, Apr 19, 2005 at 10:06:40AM -0400, Tom Lane wrote:

 BTW, VACUUM FULL does the data movement back-to-front, and stops as soon
 as it finds a tuple it cannot move down; which is a reasonable strategy
 since the goal is merely to make the file shorter.  But it's entirely
 likely that there will be lots of empty space left at the end.  For
 instance the final state could have one 4K tuple in the last page and
 up to 4K-1 free bytes in every earlier page.

Am I right in thinking that vacuum does at least two passes: one
front-to-back to find removable tuples, and other back-to-front for
movement?  Because if it doesn't work this way, it wouldn't relabel
(change Xmin/Xmax) tuples in early pages.  Or does it do something
different?

I know maintenance_work_mem is used for storing TIDs of to-be-moved
tuples for index cleanup ... how does it relate to the above?

-- 
Alvaro Herrera ([EMAIL PROTECTED])
Crear es tan difĂ­cil como ser libre (Elsa Triolet)

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Question on REINDEX

2005-04-19 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Am I right in thinking that vacuum does at least two passes: one
 front-to-back to find removable tuples, and other back-to-front for
 movement?

VACUUM FULL, yes.  VACUUM only does the first one.

 I know maintenance_work_mem is used for storing TIDs of to-be-moved
 tuples for index cleanup ... how does it relate to the above?

TIDs of to-be-deleted tuples, actually.  Movable tuples aren't stored,
they're just found on-the-fly during the back-to-front pass.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Question on REINDEX

2005-04-18 Thread Josh Berkus
Bill,

 1) When is it necessary to run REINDEX or drop/create
 an index?  All I could really find in the docs is:

If you need to VACUUM FULL, you need to REINDEX as well.  For example, if you 
drop millions of rows from a table.

 2) If reindexing is necessary, how can this be done in
 a non-obtrusive way in a production environment.  Our
 database is being updated constantly.  REINDEX locks
 client apps out while in progress.  Same with CREATE
 INDEX when we drop/create.  The table can have over
 10 million row.  Recreating the indexes seems to take
 hours.  This is too long to lock the client apps out.
 Is there any other solution?

Better to up your max_fsm_pages and do regular VACUUMs regularly and 
frequently so that you don't have to REINDEX at all.


-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Question on REINDEX

2005-04-18 Thread Bruno Wolff III
On Mon, Apr 18, 2005 at 12:21:42 -0700,
  Bill Chandler [EMAIL PROTECTED] wrote:
 
 Running PostgreSQL 7.4.2 on Solaris.
 
 1) When is it necessary to run REINDEX or drop/create
 an index?  All I could really find in the docs is:
 
 In some situations it is worthwhile to rebuild
 indexes periodically with the REINDEX command. (There
 is also contrib/reindexdb which can reindex an entire
 database.) However, PostgreSQL 7.4 has substantially
 reduced the need for this activity compared to earlier
 releases.

In pathologic cases it is possible to have a lot of empty space on a lot
of your index pages. Reindexing would change that to a smaller number.
In earlier versions, I think it was possible to have completely empty
pages and this happened for patterns of use (new values monotonically
increasing, oldest values deleted first) that were actually seen in
practice.

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Question on REINDEX

2005-04-18 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes:
 1) When is it necessary to run REINDEX or drop/create
 an index?  All I could really find in the docs is:

 If you need to VACUUM FULL, you need to REINDEX as well.  For example, if you
 drop millions of rows from a table.

That's probably a pretty good rule of thumb.  It's worth noting that
VACUUM FULL tends to actively bloat indexes, not reduce them in size,
because it has to create new index entries for the rows it moves before
it can delete the old ones.  So if a VACUUM FULL moves many rows you
are likely to see the indexes get bigger not smaller.

 Better to up your max_fsm_pages and do regular VACUUMs regularly and 
 frequently so that you don't have to REINDEX at all.

Yes, definitely.  Also consider using CLUSTER rather than VACUUM FULL
when you need to clean up after massive deletions from a table.  It's
not any less intrusive in terms of locking, but it's often faster and it
avoids the index bloat problem (since it effectively does a REINDEX).

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq