Re: [PERFORM] Question on REINDEX
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Am I right in thinking that vacuum does at least two passes: one > front-to-back to find removable tuples, and other back-to-front for > movement? VACUUM FULL, yes. VACUUM only does the first one. > I know maintenance_work_mem is used for storing TIDs of to-be-moved > tuples for index cleanup ... how does it relate to the above? TIDs of to-be-deleted tuples, actually. Movable tuples aren't stored, they're just found on-the-fly during the back-to-front pass. regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Question on REINDEX
On Tue, Apr 19, 2005 at 10:06:40AM -0400, Tom Lane wrote: > BTW, VACUUM FULL does the data movement back-to-front, and stops as soon > as it finds a tuple it cannot move down; which is a reasonable strategy > since the goal is merely to make the file shorter. But it's entirely > likely that there will be lots of empty space left at the end. For > instance the final state could have one 4K tuple in the last page and > up to 4K-1 free bytes in every earlier page. Am I right in thinking that vacuum does at least two passes: one front-to-back to find removable tuples, and other back-to-front for movement? Because if it doesn't work this way, it wouldn't relabel (change Xmin/Xmax) tuples in early pages. Or does it do something different? I know maintenance_work_mem is used for storing TIDs of to-be-moved tuples for index cleanup ... how does it relate to the above? -- Alvaro Herrera (<[EMAIL PROTECTED]>) "Crear es tan difĂcil como ser libre" (Elsa Triolet) ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Question on REINDEX
Josh Berkus writes: >> Not at all. What it says is that you expect 100% of the pages to have >> useful amounts of free space, which is a *much* weaker criterion. > H actually, it seems like, if you are vacuuming regularly, you only > *do* need to track pages that have been touched by DELETE or UPDATE. Other > pages would have already been vacuumed and not have any useful free space > left. Yes? Well, the space has to be remembered until it's reused. On the other hand, there's nothing that says FSM has to be aware of all the free space available at all times --- the real criterion to avoid bloat is that after a VACUUM, enough space is logged in FSM to satisfy all the insertions that will happen before the next VACUUM. So you could have situations where free space is temporarily forgotten (for lack of slots in FSM), but other free space gets used instead, and eventually a later VACUUM re-finds that free space and puts it into FSM. I think it's true that the more often you vacuum, the less FSM you need, but this doesn't have much to do with how much free space is actually out there on disk. It's because you only need enough FSM to record the free space you'll need until the next vacuum. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Question on REINDEX
Tom, > Not at all. What it says is that you expect 100% of the pages to have > useful amounts of free space, which is a *much* weaker criterion. Hmmm. Good point. This seems to be another instance where my rule-of-thumb was based on false logic but nevertheless arrived at correct numbers. I've seldom, if ever, set FSM_pages above 50% of the pages in the active database ... and never run out. H actually, it seems like, if you are vacuuming regularly, you only *do* need to track pages that have been touched by DELETE or UPDATE. Other pages would have already been vacuumed and not have any useful free space left. Yes? -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Question on REINDEX
Josh Berkus writes: >> select count(1), sum(relpages) from pg_class where relkind in >> ('r','i','t') > Well, if you do that for all databases in the cluster, it's the number you > start with. However, setting FSM_pages to that would be assuming that you > excpected 100% of the rows to be replaced by UPDATES or DELETEs before you > ran VACUUM. I generally run VACUUM a little sooner than that. Not at all. What it says is that you expect 100% of the pages to have useful amounts of free space, which is a *much* weaker criterion. I think you can usually get away with setting max_fsm_pages to less than your actual disk footprint, but I'm not sure how much less. It'd probably depend a lot on your usage pattern --- for instance, insert-only history tables don't need any FSM space. regards, tom lane ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Question on REINDEX
Tambet, > Hmm, thanks for a tip. BTW, is output of > > select count(1), sum(relpages) from pg_class where relkind in > ('r','i','t') Well, if you do that for all databases in the cluster, it's the number you start with. However, setting FSM_pages to that would be assuming that you excpected 100% of the rows to be replaced by UPDATES or DELETEs before you ran VACUUM. I generally run VACUUM a little sooner than that. See the end portion of: http://www.powerpostgresql.com/PerfList -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Question on REINDEX
"Tambet Matiisen" <[EMAIL PROTECTED]> writes: > Is my current understanding correct: > 1) VACUUM defragments each page locally - moves free space to the end of > page. > 2) VACUUM FULL defragments table globally - tries to fill up all > partially free pages and deletes all resulting empty pages. Both versions of VACUUM do within-page defragmentation. Also, both versions will remove entirely-empty pages at the end of a table. The difference is that VACUUM FULL actively attempts to make pages at the end empty, by moving their contents into free space in earlier pages. Plain VACUUM never does cross-page data movement, which is how come it doesn't need as strong a lock. BTW, VACUUM FULL does the data movement back-to-front, and stops as soon as it finds a tuple it cannot move down; which is a reasonable strategy since the goal is merely to make the file shorter. But it's entirely likely that there will be lots of empty space left at the end. For instance the final state could have one 4K tuple in the last page and up to 4K-1 free bytes in every earlier page. > 3) Both VACUUM and VACUUM FULL do only local defragment for indexes. > 4) If you want indexes to become fully defragmented, you need to > REINDEX. I don't think "defragment" is a notion that applies to indexes, at least not in the same way as for tables. It's true that there is no cross-page data movement in either case. In the last release or two we've been able to recognize and recycle entirely-empty pages in both btree and hash indexes, but such pages are almost never returned to the OS; they're put on a freelist for re-use within the index, instead. If you allow the table to grow to much more than its "normal" size, ie, you allow many dead tuples to be formed, then getting back to "normal" size is going to require VACUUM FULL + REINDEX (or you can use CLUSTER or some varieties of ALTER TABLE). This is not the recommended maintenance process however. Sufficiently frequent plain VACUUMs should generally hold the free space to a tolerable level without requiring any exclusive locking. > Hmm, thanks for a tip. BTW, is output of > select count(1), sum(relpages) from pg_class where relkind in > ('r','i','t') > good estimate for max_fsm_relations and max_fsm_pages? Within that one database, yes --- don't forget you must sum these numbers across all DBs in the cluster. Also you need some slop in the max_fsm_pages setting because of quantization in the space usage. It's probably easier to let VACUUM VERBOSE do the calculation for you. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Question on REINDEX
> > Josh Berkus writes: > >> 1) When is it necessary to run REINDEX or drop/create > >> an index? All I could really find in the docs is: > > > If you need to VACUUM FULL, you need to REINDEX as well. > For example, > > if you drop millions of rows from a table. > > That's probably a pretty good rule of thumb. It's worth > noting that VACUUM FULL tends to actively bloat indexes, not > reduce them in size, because it has to create new index > entries for the rows it moves before it can delete the old > ones. So if a VACUUM FULL moves many rows you are likely to > see the indexes get bigger not smaller. > Is my current understanding correct: 1) VACUUM defragments each page locally - moves free space to the end of page. 2) VACUUM FULL defragments table globally - tries to fill up all partially free pages and deletes all resulting empty pages. 3) Both VACUUM and VACUUM FULL do only local defragment for indexes. 4) If you want indexes to become fully defragmented, you need to REINDEX. If you happen to use triggers for denormalization, like I do, then you have a lot of updates, which means that tables and indexes become quicky cluttered with pages, which contain mostly dead tuples. If those tables and indexes fill up shared buffers, then PostgreSQL slows down, because it has to do a lot more IO than normal. Regular VACUUM FULL helped, but I needed REINDEX as well, otherwise indexes grew bigger than tables itself! > > Better to up your max_fsm_pages and do regular VACUUMs regularly and > > frequently so that you don't have to REINDEX at all. > > Yes, definitely. Also consider using CLUSTER rather than > VACUUM FULL when you need to clean up after massive deletions > from a table. It's not any less intrusive in terms of > locking, but it's often faster and it avoids the index bloat > problem (since it effectively does a REINDEX). > Hmm, thanks for a tip. BTW, is output of select count(1), sum(relpages) from pg_class where relkind in ('r','i','t') good estimate for max_fsm_relations and max_fsm_pages? Are these parameters used only during VACUUM or in runtime too? Tambet ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Question on REINDEX
Josh Berkus writes: >> 1) When is it necessary to run REINDEX or drop/create >> an index? All I could really find in the docs is: > If you need to VACUUM FULL, you need to REINDEX as well. For example, if you > drop millions of rows from a table. That's probably a pretty good rule of thumb. It's worth noting that VACUUM FULL tends to actively bloat indexes, not reduce them in size, because it has to create new index entries for the rows it moves before it can delete the old ones. So if a VACUUM FULL moves many rows you are likely to see the indexes get bigger not smaller. > Better to up your max_fsm_pages and do regular VACUUMs regularly and > frequently so that you don't have to REINDEX at all. Yes, definitely. Also consider using CLUSTER rather than VACUUM FULL when you need to clean up after massive deletions from a table. It's not any less intrusive in terms of locking, but it's often faster and it avoids the index bloat problem (since it effectively does a REINDEX). regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Question on REINDEX
On Mon, Apr 18, 2005 at 12:21:42 -0700, Bill Chandler <[EMAIL PROTECTED]> wrote: > > Running PostgreSQL 7.4.2 on Solaris. > > 1) When is it necessary to run REINDEX or drop/create > an index? All I could really find in the docs is: > > "In some situations it is worthwhile to rebuild > indexes periodically with the REINDEX command. (There > is also contrib/reindexdb which can reindex an entire > database.) However, PostgreSQL 7.4 has substantially > reduced the need for this activity compared to earlier > releases." In pathologic cases it is possible to have a lot of empty space on a lot of your index pages. Reindexing would change that to a smaller number. In earlier versions, I think it was possible to have completely empty pages and this happened for patterns of use (new values monotonically increasing, oldest values deleted first) that were actually seen in practice. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Question on REINDEX
Bill, > 1) When is it necessary to run REINDEX or drop/create > an index? All I could really find in the docs is: If you need to VACUUM FULL, you need to REINDEX as well. For example, if you drop millions of rows from a table. > 2) If reindexing is necessary, how can this be done in > a non-obtrusive way in a production environment. Our > database is being updated constantly. REINDEX locks > client apps out while in progress. Same with "CREATE > INDEX" when we drop/create. The table can have over > 10 million row. Recreating the indexes seems to take > hours. This is too long to lock the client apps out. > Is there any other solution? Better to up your max_fsm_pages and do regular VACUUMs regularly and frequently so that you don't have to REINDEX at all. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] Question on REINDEX
All, A couple of questions regarding REINDEX command: Running PostgreSQL 7.4.2 on Solaris. 1) When is it necessary to run REINDEX or drop/create an index? All I could really find in the docs is: "In some situations it is worthwhile to rebuild indexes periodically with the REINDEX command. (There is also contrib/reindexdb which can reindex an entire database.) However, PostgreSQL 7.4 has substantially reduced the need for this activity compared to earlier releases." What are these situations? We have a database with some large tables. Currently we reindex (actually drop/create) nightly. But as the tables have grown this has become prohibitively time-consuming. According to the above comment it may not be necessary at all. 2) If reindexing is necessary, how can this be done in a non-obtrusive way in a production environment. Our database is being updated constantly. REINDEX locks client apps out while in progress. Same with "CREATE INDEX" when we drop/create. The table can have over 10 million row. Recreating the indexes seems to take hours. This is too long to lock the client apps out. Is there any other solution? thanks, Bill __ Do you Yahoo!? Make Yahoo! your home page http://www.yahoo.com/r/hs ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly