Re: Re: [HACKERS] Why frequently updated tables are an issue
I'm thinking that this subject has some cross-overs with another topic mentioned recently: improving VACUUM performance.

Just to recap: frequently updated tables are a problem because they grow significantly, with many useless row versions in them. The usual solution is to run frequent VACUUMs on that table, though even that isn't enough for some update rates, because we need to scan the whole table rather than just the hot blocks.

The idea thrown around earlier this month was to create a "Cleaner" process that performed individual block-level VACUUMs on dirty blocks in shared_buffers, before they are written out by the bgwriter or a checkpoint. (It wasn't called that at the time, IIRC, but was described more in terms of the CTID list that would be required to manage this.) Such a Cleaner process could solve the problem of frequently updated tables by keeping the number of row versions at a reasonably low level.

There wasn't much excitement when the "speed up vacuum" idea was discussed, including from myself. Thinking about the frequent-update problem makes me want to revisit that idea now, since it has a much clearer role in solving the frequent-update problem. As I look at this now it sounds very much like the freelist problem looked a while back, with a similar-ish solution...

Frequent updates will happen on blocks within the T2 cache, since they will be heavily accessed. The more frequently accessed, the more they need cleaning. This is actually the opposite end of the cache from the bgwriter, which is interested in writing blocks that aren't frequently used, to avoid having to continually write the same blocks. As a result, I'm thinking we wouldn't want a single queue to manage the list of blocks that need both cleaning and writing. Discuss...

Another way to do this is to find a way of identifying the specific blocks that require cleaning. We could use an event to fire off the cleaning action, such as when an update causes a block split. In that case, the older block could be added to the Cleaner's cleanlist. The Cleaner then works through the cleanlist, doing a block-level vacuum. That way the Cleaner would stay a reasonable number of row versions behind the point of update, to make a VACUUM potentially useful, as well as being fired infrequently enough not to get in the way. [Since, as Alvaro has pointed out, the indexes need to be visited in addition to the main data block, this could introduce contention for index locks, because the frequent updater is probably using an index to reach the data.]

Also, picking the block-split event would mean stopping the frequent updater just at the point where significant extra costs get incurred from all the additional row versions, i.e. when a row has versions across many blocks.

I'm sure there are some flawed thoughts in all of that, but overall I think that a Cleaner process to perform block-level VACUUMs becomes more viable when it has an important problem to solve. A rough sketch of the cleanlist structure follows below.

Best Regards, Simon Riggs
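To make the cleanlist idea concrete, here is a rough, self-contained C sketch of what such a queue might look like. Everything in it is hypothetical: the names (CleanTarget, cleanlist_add, block_vacuum, cleaner_pass) are invented for illustration and are not actual PostgreSQL code, and a real implementation would live in shared memory and deal with locking.

/*
 * Hypothetical sketch of the proposed Cleaner's "cleanlist": a small
 * fixed-size queue of (relation, block) pairs awaiting a block-level
 * vacuum.  None of these names exist in PostgreSQL; this only
 * illustrates the data structure the discussion implies.
 */
#include <stdbool.h>
#include <stdio.h>

#define CLEANLIST_SIZE 64

typedef unsigned int BlockNumber;   /* stand-ins for PostgreSQL types */
typedef unsigned int Oid;

typedef struct {
    Oid         relid;              /* which relation the block belongs to */
    BlockNumber blkno;              /* which block needs cleaning */
} CleanTarget;

static CleanTarget cleanlist[CLEANLIST_SIZE];
static int head = 0, tail = 0, count = 0;

/* Called (hypothetically) when an update spills row versions to a new
 * block: remember the older block so the Cleaner can visit it. */
static bool
cleanlist_add(Oid relid, BlockNumber blkno)
{
    if (count == CLEANLIST_SIZE)
        return false;               /* list full: fall back to plain VACUUM */
    cleanlist[tail] = (CleanTarget){ relid, blkno };
    tail = (tail + 1) % CLEANLIST_SIZE;
    count++;
    return true;
}

/* Stub standing in for a block-level vacuum: reclaim dead row versions
 * in one heap block (and, per Alvaro's point, its index entries). */
static void
block_vacuum(Oid relid, BlockNumber blkno)
{
    printf("cleaning rel %u block %u\n", relid, blkno);
}

/* The Cleaner's main loop body: drain whatever has accumulated. */
static void
cleaner_pass(void)
{
    while (count > 0)
    {
        CleanTarget t = cleanlist[head];
        head = (head + 1) % CLEANLIST_SIZE;
        count--;
        block_vacuum(t.relid, t.blkno);
    }
}

int
main(void)
{
    cleanlist_add(16384, 7);        /* pretend an update dirtied block 7 */
    cleanlist_add(16384, 7);        /* duplicates are possible; a real  */
    cleanlist_add(16384, 12);       /* implementation might deduplicate */
    cleaner_pass();
    return 0;
}

A bounded list also gives a natural fallback: if updates outrun the Cleaner and the list fills, the table is simply left for a normal VACUUM pass.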
Re: [HACKERS] Why frequently updated tables are an issue
Chris, Manfred, David, "Mohawk",

>> What about a dblink style interface to a non-MVCC SQL database? I
>> think someone on this list mentioned that there are open source
>> in-memory SQL databases.
>
> Funny you should mention this. I'm working on a generalized one that
> uses PL/PerlU at

The problem with this approach (in PL/Perl or otherwise) is that the connection overhead to the external data source will cost you far more than MVCC does. The only way it could be a benefit is if you could somehow hold the connection "open" from query to query, which SPs and functions are ill-suited for, being more-or-less "stateless".

I've personally been looking at memcached for this issue. However, the overhead of installing memcached is substantial, and I've not been able to test whether the in-development PL/PerlU allows me to cache the memcached request object somehow. Andrew suggested a way, but I don't have the software installed. Handling this at the middleware layer looks increasingly attractive.

> Thanks to Josh Berkus for design ideas & implementation roadmap, and
> to Josh Drake, Andrew Dunstan, Abhijit Menon-Sen and (of course) Tom
> Lane for the infrastructure that makes this doable. :)

You're welcome ... though I'd had in mind access to legacy data rather than caching of disposable results.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [HACKERS] Why frequently updated tables are an issue
On Wed, Oct 20, 2004 at 07:10:35PM +0200, Manfred Spraul wrote:
> [EMAIL PROTECTED] wrote a few months ago:
>> PostgreSQL's behavior on these cases is poor. I don't think anyone
>> who has tried to use PG for this sort of thing will disagree, and
>> yes it is getting better. Does anyone else consider this to be a
>> problem? If so, I'm open for suggestions on what can be done.
>> I've suggested a number of things, and admittedly they have all
>> been pretty weak ideas, but they were potentially workable.
>
> What about a dblink style interface to a non-MVCC SQL database? I
> think someone on this list mentioned that there are open source
> in-memory SQL databases.

Funny you should mention this. I'm working on a generalized one that uses PL/PerlU at http://fetter.org/pg_ps/

Thanks to Josh Berkus for design ideas & implementation roadmap, and to Josh Drake, Andrew Dunstan, Abhijit Menon-Sen and (of course) Tom Lane for the infrastructure that makes this doable. :)

Cheers, D

--
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 510 893 6100    mobile: +1 415 235 3778

Remember to vote!
Re: [HACKERS] Why frequently updated tables are an issue
[EMAIL PROTECTED] wrote a few months ago:
> PostgreSQL's behavior on these cases is poor. I don't think anyone
> who has tried to use PG for this sort of thing will disagree, and
> yes it is getting better. Does anyone else consider this to be a
> problem? If so, I'm open for suggestions on what can be done.
> I've suggested a number of things, and admittedly they have all
> been pretty weak ideas, but they were potentially workable.

What about a dblink style interface to a non-MVCC SQL database? I think someone on this list mentioned that there are open source in-memory SQL databases.

--
Manfred
Re: [HACKERS] Why frequently updated tables are an issue
Is there a TODO here? No one has explained why frozen tuples are important.

---

Tom Lane wrote:
> Jan Wieck <[EMAIL PROTECTED]> writes:
> > On 6/12/2004 3:45 PM, Tom Lane wrote:
> >> I don't think it would help very much to define a bit like that --- I
> >> can't believe that very many pages would contain only frozen tuples,
> >> unless you were to adopt an aggressive policy of using VACUUM FREEZE
> >> a lot.
>
> > I thought this implies an aggressive policy of freezing everything by
> > default. But I guess there is something I am not aware of that makes
> > aggressive freezing a bad thing.
>
> Well, it means extra I/O to freeze tuples that you otherwise probably
> never would. So it's not obvious that aggressive freezing in hopes of
> saving cycles later is a win.
>
> >> It might be interesting though to have some kind of "fast vacuum" mode
> >> that doesn't worry about freezing tuples, but only reclaiming dead ones.
>
> > Wouldn't that screw the current FSM population mechanisms? Not that my
> > suggestions above wouldn't do that either :-)
>
> Yeah, that's another "wholesale" mechanism that we'd have to look at
> refining.
>
> regards, tom lane

--
Bruce Momjian                   | http://candle.pha.pa.us
[EMAIL PROTECTED]               | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup.    | Newtown Square, Pennsylvania 19073
Re: [HACKERS] Why frequently updated tables are an issue
Jan Wieck wrote:
> On 6/12/2004 3:45 PM, Tom Lane wrote:
> > Jan Wieck <[EMAIL PROTECTED]> writes:
> >> But a per relation bitmap that tells if a block is a) free of dead
> >> tuples and b) all remaining tuples in it are frozen could be used to
> >> let vacuum skip them (there can't be anything to do). The bit would
> >> get reset whenever the block is marked dirty. This would cause vacuum
> >> to look at mainly recently touched blocks, likely to be found in the
> >> buffer cache anyway, and thus dramatically reduce the amount of I/O
> >> and thereby make high-frequency vacuuming less expensive.
>
> > I don't think it would help very much to define a bit like that --- I
> > can't believe that very many pages would contain only frozen tuples,
> > unless you were to adopt an aggressive policy of using VACUUM FREEZE
> > a lot.
>
> I thought this implies an aggressive policy of freezing everything by
> default. But I guess there is something I am not aware of that makes
> aggressive freezing a bad thing.

Why are frozen tuples significant? I assume any page that has no dead tuples could be skipped by vacuum.

--
Bruce Momjian                   | http://candle.pha.pa.us
[EMAIL PROTECTED]               | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup.    | Newtown Square, Pennsylvania 19073
Re: [HACKERS] Why frequently updated tables are an issue
Jan Wieck <[EMAIL PROTECTED]> writes:
> On 6/12/2004 3:45 PM, Tom Lane wrote:
>> I don't think it would help very much to define a bit like that --- I
>> can't believe that very many pages would contain only frozen tuples,
>> unless you were to adopt an aggressive policy of using VACUUM FREEZE
>> a lot.

> I thought this implies an aggressive policy of freezing everything by
> default. But I guess there is something I am not aware of that makes
> aggressive freezing a bad thing.

Well, it means extra I/O to freeze tuples that you otherwise probably never would. So it's not obvious that aggressive freezing in hopes of saving cycles later is a win.

>> It might be interesting though to have some kind of "fast vacuum" mode
>> that doesn't worry about freezing tuples, but only reclaiming dead ones.

> Wouldn't that screw the current FSM population mechanisms? Not that my
> suggestions above wouldn't do that either :-)

Yeah, that's another "wholesale" mechanism that we'd have to look at refining.

regards, tom lane
Re: [HACKERS] Why frequently updated tables are an issue
On 6/12/2004 3:45 PM, Tom Lane wrote:
> Jan Wieck <[EMAIL PROTECTED]> writes:
>> But a per relation bitmap that tells if a block is a) free of dead
>> tuples and b) all remaining tuples in it are frozen could be used to
>> let vacuum skip them (there can't be anything to do). The bit would
>> get reset whenever the block is marked dirty. This would cause vacuum
>> to look at mainly recently touched blocks, likely to be found in the
>> buffer cache anyway, and thus dramatically reduce the amount of I/O
>> and thereby make high-frequency vacuuming less expensive.
>
> I don't think it would help very much to define a bit like that --- I
> can't believe that very many pages would contain only frozen tuples,
> unless you were to adopt an aggressive policy of using VACUUM FREEZE
> a lot.

I thought this implies an aggressive policy of freezing everything by default. But I guess there is something I am not aware of that makes aggressive freezing a bad thing.

> It might be interesting though to have some kind of "fast vacuum" mode
> that doesn't worry about freezing tuples, but only reclaiming dead ones.
> This could look at only recently touched pages, with perhaps the
> cooperation of the bgwriter to keep track of candidate pages. (You'd
> still have to do full-database scans for freezable tuples occasionally
> ... but not very often.)

Wouldn't that screw the current FSM population mechanisms? Not that my suggestions above wouldn't do that either :-)

> The main thing I don't see how to handle efficiently is getting rid of
> the index entries for dead rows. The present indexam API is designed
> for getting rid of index entries wholesale, but you'd need something
> that works better for retail removal to approach vacuuming this way.

Which is a problem so fundamentally embedded in the index implementation itself that it sounds more like a wholesale replacement of the index AM than some nifty changes.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== [EMAIL PROTECTED] #
Re: [HACKERS] Why frequently updated tables are an issue
Jan Wieck <[EMAIL PROTECTED]> writes:
> But a per relation bitmap that tells if a block is a) free of dead
> tuples and b) all remaining tuples in it are frozen could be used to
> let vacuum skip them (there can't be anything to do). The bit would
> get reset whenever the block is marked dirty. This would cause vacuum
> to look at mainly recently touched blocks, likely to be found in the
> buffer cache anyway, and thus dramatically reduce the amount of I/O
> and thereby make high-frequency vacuuming less expensive.

I don't think it would help very much to define a bit like that --- I can't believe that very many pages would contain only frozen tuples, unless you were to adopt an aggressive policy of using VACUUM FREEZE a lot.

It might be interesting though to have some kind of "fast vacuum" mode that doesn't worry about freezing tuples, but only reclaiming dead ones. This could look at only recently touched pages, with perhaps the cooperation of the bgwriter to keep track of candidate pages. (You'd still have to do full-database scans for freezable tuples occasionally ... but not very often.)

The main thing I don't see how to handle efficiently is getting rid of the index entries for dead rows. The present indexam API is designed for getting rid of index entries wholesale, but you'd need something that works better for retail removal to approach vacuuming this way.

regards, tom lane
Re: [HACKERS] Why frequently updated tables are an issue
On 6/10/2004 10:37 AM, Shridhar Daithankar wrote:
> [EMAIL PROTECTED] wrote:
>> The session table is a different issue, but has the same problems. You
>> have an active website, hundreds or thousands of hits a second, and
>> you want to manage sessions for this site. Sessions are created,
>> updated many times, and deleted. Performance degrades steadily until a
>> vacuum. Vacuum has to be run VERY frequently. Prior to lazy vacuum,
>> this was impossible.
>>
>> Both session tables and summary tables have another thing in common:
>> they are not vital data, they hold transient state information. Yea,
>> sure, data integrity is important, but if you lose these values, you
>> can either recreate them or they aren't too important. Why put that in
>> a database at all? Because, in the case of sessions especially, you
>> need to access this information for other operations. In the case of
>> summary tables, OLAP usually needs to join or include this info.
>>
>> PostgreSQL's behavior on these cases is poor. I don't think anyone who
>> has tried to use PG for this sort of thing will disagree, and yes it
>> is getting better. Does anyone else consider this to be a problem? If
>> so, I'm open for suggestions on what can be done. I've suggested a
>> number of things, and admittedly they have all been pretty weak ideas,
>> but they were potentially workable.
>
> There is another as-of-now-infeasible, and hence rejected, approach.
> Vacuum in postgresql is tied to entire relations/objects since indexes
> do not have transaction visibility information. It has been suggested
> in the past to add such visibility to the index tuple header so that
> indexes and heaps can be cleaned out of order. In such a case other
> background processes, such as the background writer and the
> soon-to-be-integrated autovacuum daemon, could vacuum pages/buffers
> rather than relations. That way the most used things would remain
> clean, and the cost of cleanup would remain outside the critical
> transaction processing path.

This is not feasible because at the time you update or delete a row you would have to visit all its index entries. The performance impact of that would be immense.

But a per relation bitmap that tells if a block is a) free of dead tuples and b) all remaining tuples in it are frozen could be used to let vacuum skip them (there can't be anything to do). The bit would get reset whenever the block is marked dirty. This would cause vacuum to look at mainly recently touched blocks, likely to be found in the buffer cache anyway, and thus dramatically reduce the amount of I/O and thereby make high-frequency vacuuming less expensive.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== [EMAIL PROTECTED] #
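As a sketch of what Jan's per-relation bitmap might look like, minus all questions of shared memory, locking, and crash safety: the names below (CleanBitmap, bitmap_mark_dirty, vacuum_relation) are invented for illustration and are not PostgreSQL code.

/*
 * Hypothetical sketch of a per-relation "clean block" bitmap: one bit
 * per block, set when the block has no dead tuples and all its tuples
 * are frozen, cleared whenever the block is dirtied.  Vacuum then
 * skips blocks whose bit is set.
 */
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef unsigned int BlockNumber;   /* stand-in for PostgreSQL's type */

typedef struct {
    unsigned char *bits;            /* 1 bit per heap block */
    BlockNumber    nblocks;
} CleanBitmap;

static CleanBitmap *
bitmap_create(BlockNumber nblocks)
{
    CleanBitmap *bm = malloc(sizeof *bm);
    bm->nblocks = nblocks;
    bm->bits = calloc((nblocks + CHAR_BIT - 1) / CHAR_BIT, 1);
    return bm;                      /* all bits 0: every block needs a look */
}

/* Vacuum finished a block and found nothing left to do there. */
static void
bitmap_set_clean(CleanBitmap *bm, BlockNumber blk)
{
    bm->bits[blk / CHAR_BIT] |= 1 << (blk % CHAR_BIT);
}

/* Any write that dirties the block invalidates its clean bit. */
static void
bitmap_mark_dirty(CleanBitmap *bm, BlockNumber blk)
{
    bm->bits[blk / CHAR_BIT] &= ~(1 << (blk % CHAR_BIT));
}

static bool
bitmap_is_clean(const CleanBitmap *bm, BlockNumber blk)
{
    return (bm->bits[blk / CHAR_BIT] >> (blk % CHAR_BIT)) & 1;
}

/* Vacuum's scan then reduces to visiting only dirty blocks, which are
 * mostly the recently touched ones still in the buffer cache. */
static void
vacuum_relation(CleanBitmap *bm)
{
    unsigned visited = 0;

    for (BlockNumber blk = 0; blk < bm->nblocks; blk++)
    {
        if (bitmap_is_clean(bm, blk))
            continue;               /* provably nothing to do: skip the I/O */
        /* ... reclaim dead tuples here, freeze the remainder ... */
        bitmap_set_clean(bm, blk);
        visited++;
    }
    printf("vacuum visited %u of %u blocks\n", visited, bm->nblocks);
}

int
main(void)
{
    CleanBitmap *bm = bitmap_create(1024);
    vacuum_relation(bm);            /* first pass visits everything */
    bitmap_mark_dirty(bm, 42);      /* an UPDATE touches block 42 */
    vacuum_relation(bm);            /* second pass visits only block 42 */
    free(bm->bits);
    free(bm);
    return 0;
}

The demo in main() shows the intended effect: the first vacuum pass visits all 1024 blocks, while the second visits only the single block an update dirtied.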
Re: [HACKERS] Why frequently updated tables are an issue
Glen Parker wrote:
>> It has been suggested in past to add such visibility to the index
>> tuple header so that index and heaps can be cleaned out of order. In
>> such a case other background ...
>
> It seems to me that the benefit of this wouldn't be all that
> impressive *when accessing the cache*, which is the problem this
> discussion is about.

I doubt it. I have seen examples on the general list where people have thousands of dead *pages* for a few hundred live tuples. If it is a problem with cache access, it will spill to disk as the problem grows.

I don't think postgresql's cache is that bad. No matter how you do it, RAM is RAM. The problem is disk bandwidth. See past discussions about the vacuum delay patch and the improvement it brought. Vacuum costs disk bandwidth and that affects performance. That remains a fact.

> Disk access would occur more commonly with large tables, which I'll
> ignore. Let's say total scan time for a query on a very dirty table is
> 100ms. It seems safe to assume that the scan time for the index would
> be *roughly* half that of the heap. If visibility could be determined
> by looking at just the index tuple, you'd cut your query scan time
> down to 50ms. When the clean table case is 7ms total scan time, the
> difference between 50 and 100ms is not much of an issue; either way,
> it's still way too high!
>
>> However increasing index footprint seems to be a tough sell.
>
> And rightly so, IMO.

Me too. Unless somebody comes up with a patch that demonstrates the improvement. Obviously people can live with the cost of mandatory vacuum, so this is not high priority. But one day it will be. OTOH, if the perceived benefit is not there, at least it is proven that it is not there.

I plan to do it when I find time. But again, I don't face the problem myself (I don't even use postgresql for anything important, for that matter) so I haven't bothered spending any time on it. As long as it is not high priority, it is going to be a tough sell. That's not unusual.

Shridhar
Re: [HACKERS] Why frequently updated tables are an issue
> It has been suggested in past to add such visibility to the index
> tuple header so that index and heaps can be cleaned out of order. In
> such a case other background ...

It seems to me that the benefit of this wouldn't be all that impressive *when accessing the cache*, which is the problem this discussion is about. Disk access would occur more commonly with large tables, which I'll ignore.

Let's say total scan time for a query on a very dirty table is 100ms. It seems safe to assume that the scan time for the index would be *roughly* half that of the heap. If visibility could be determined by looking at just the index tuple, you'd cut your query scan time down to 50ms. When the clean table case is 7ms total scan time, the difference between 50 and 100ms is not much of an issue; either way, it's still way too high!

> However increasing index footprint seems to be a tough sell.

And rightly so, IMO.

Glen Parker
Re: [HACKERS] Why frequently updated tables are an issue
[EMAIL PROTECTED] wrote:
> The session table is a different issue, but has the same problems. You
> have an active website, hundreds or thousands of hits a second, and
> you want to manage sessions for this site. Sessions are created,
> updated many times, and deleted. Performance degrades steadily until a
> vacuum. Vacuum has to be run VERY frequently. Prior to lazy vacuum,
> this was impossible.
>
> Both session tables and summary tables have another thing in common:
> they are not vital data, they hold transient state information. Yea,
> sure, data integrity is important, but if you lose these values, you
> can either recreate them or they aren't too important. Why put that in
> a database at all? Because, in the case of sessions especially, you
> need to access this information for other operations. In the case of
> summary tables, OLAP usually needs to join or include this info.
>
> PostgreSQL's behavior on these cases is poor. I don't think anyone who
> has tried to use PG for this sort of thing will disagree, and yes it
> is getting better. Does anyone else consider this to be a problem? If
> so, I'm open for suggestions on what can be done. I've suggested a
> number of things, and admittedly they have all been pretty weak ideas,
> but they were potentially workable.

There is another as-of-now-infeasible, and hence rejected, approach.

Vacuum in postgresql is tied to entire relations/objects since indexes do not have transaction visibility information. It has been suggested in the past to add such visibility to the index tuple header so that indexes and heaps can be cleaned out of order. In such a case other background processes, such as the background writer and the soon-to-be-integrated autovacuum daemon, could vacuum pages/buffers rather than relations. That way the most used things would remain clean, and the cost of cleanup would remain outside the critical transaction processing path.

However, increasing the index footprint seems to be a tough sell. Besides, FSM would need some rework to accommodate/autotune its behaviour. I am quoting from memory, so don't flame me if I misquote it. Just adding this to make the picture complete.

From a performance point of view alone, it could solve quite some problems, at least in theory.

Shridhar
Re: [HACKERS] Why frequently updated tables are an issue
On Jun 10, 2004, at 10:30 AM, [EMAIL PROTECTED] wrote:
> Prior to lazy vacuum, this was impossible.

Do you know for sure that lazy vacuum and/or autovacuum does not indeed solve or alleviate the symptoms of the general problem of very high-rate table updates?

Back to lurking!

James Robinson
Socialserve.com
[HACKERS] Why frequently updated tables are an issue
OK, the problem I am having with this whole discussion, on several fronts, is the idea of performance. If performance and consistent behavior were not *important* issues to a project, a summary table would work fine, and I could just vacuum frequently.

Currently a client needs to vacuum two summary tables at least once every two seconds. The performance of the system slowly declines with each summary update, until the next vacuum. After a vacuum, the transaction comes in at about 7ms; it increases to about 35ms~50ms, then we vacuum and we're back to 7ms. When we vacuumed every 30 seconds, it would sometimes get up to whole seconds.

There is an important issue here. Yes, MVCC is good. I agree, and no one is arguing against it in the general case; however, there are classes of problems for which MVCC, or at least PostgreSQL's implementation of it, is not the best solution. There are two basic problems which are fundamental issues I've had with PostgreSQL over the years: summary tables and session tables.

The summary tables take the place of a "select sum(col) from table" where table is very small. The amount of vacuuming required and the steady degradation of performance prior to each vacuum is a problem that could be addressed by some global variable system.

The session table is a different issue, but has the same problems. You have an active website, hundreds or thousands of hits a second, and you want to manage sessions for this site. Sessions are created, updated many times, and deleted. Performance degrades steadily until a vacuum. Vacuum has to be run VERY frequently. Prior to lazy vacuum, this was impossible.

Both session tables and summary tables have another thing in common: they are not vital data, they hold transient state information. Yea, sure, data integrity is important, but if you lose these values, you can either recreate them or they aren't too important. Why put that in a database at all? Because, in the case of sessions especially, you need to access this information for other operations. In the case of summary tables, OLAP usually needs to join or include this info.

PostgreSQL's behavior on these cases is poor. I don't think anyone who has tried to use PG for this sort of thing will disagree, and yes it is getting better. Does anyone else consider this to be a problem? If so, I'm open for suggestions on what can be done. I've suggested a number of things, and admittedly they have all been pretty weak ideas, but they were potentially workable.
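For anyone who hasn't watched this happen, the workload is easy to reproduce. The following is a minimal libpq sketch of the session-table pattern described above, with the connection string, table name, and loop counts invented for illustration: each UPDATE leaves a dead row version behind, so scan times climb until a periodic VACUUM reclaims them.

/*
 * Minimal libpq sketch of a session-table workload: one hot row
 * updated repeatedly, with a VACUUM every 1000 updates.  Build with
 * -lpq.  The DSN and table are hypothetical.
 */
#include <stdio.h>
#include <libpq-fe.h>

static void
run(PGconn *conn, const char *sql)
{
    PGresult *res = PQexec(conn, sql);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "%s: %s", sql, PQerrorMessage(conn));
    PQclear(res);
}

int
main(void)
{
    PGconn *conn = PQconnectdb("dbname=test");      /* hypothetical DSN */
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    run(conn, "CREATE TABLE sessions (id int PRIMARY KEY, hits int)");
    run(conn, "INSERT INTO sessions VALUES (1, 0)");

    /* Each iteration creates one more dead version of the same row. */
    for (int i = 0; i < 10000; i++)
    {
        run(conn, "UPDATE sessions SET hits = hits + 1 WHERE id = 1");
        if (i % 1000 == 999)
            run(conn, "VACUUM sessions");  /* reclaim the dead versions */
    }

    run(conn, "DROP TABLE sessions");
    PQfinish(conn);
    return 0;
}

Timing the UPDATEs within each 1000-update window shows the sawtooth described above: latency climbs as dead versions accumulate, then drops back after each VACUUM.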