Re: [HACKERS] [Proposal] More Vacuum Statistics
Hello, >Maybe, For DBAs, >It might be better to show vacuum progress in pg_stat_activity. >(if we'd do, add a free-style column like "progress" ?) This column might also >be able to use for other long time commands like ANALYZE, CREATE/RE INDEX and >COPY. To realize this feature, we certainly need to properly change >pgstat_report_activity, use it more and add a new track-activity parameter Very similar idea was proposed in the following http://www.postgresql.org/message-id/1284756643.25048.42.ca...@vanquo.pezone.net IIUC, problem with showing progress in pg_stat_activity is that it introduces compulsary progress calculation overhead in core for every command. As work units of each command varies, common infrastructure might not be able to represent every command progress effectively. An architecture which will display progress only on users demand for each command separately will be more efficient. So, suggestion was rather to have a detailed progress report including "remaining time" for a command on users demand. FWIW, I am working on designing an approach to report VACUUM progress stats for which I will be posting a detailed proposal. The use case is reporting progress for long running VACUUMs. The approach involves using hooks to extract VACUUM progress statistics . The progress can be displayed using psql view (ex. pg_stat_maintenance). Thank you, Rahila Syed -Original Message- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Naoya Anzai Sent: Tuesday, June 16, 2015 8:41 AM To: Tomas Vondra Cc: pgsql-hackers@postgresql.org; Akio Iwaasa; bench.cof...@gmail.com; Tom Lane; Jeff Janes; Jim Nasby; Andres Freund; Alvaro Herrera Subject: Re: [HACKERS] [Proposal] More Vacuum Statistics Hi, Thank you for comments. and Sorry for my late response. >> >> pg_stat_vacuum view >> >> >> I understand it is not good to simply add more counters in >> pg_stat_*_tables. For now, I'd like to suggest an extension which can >> confirm vacuum statistics like pg_stat_statements. >> >> Similar feature has been already provided by pg_statsinfo package. >> But it is a full-stack package for PG-stats and it needs to redesign >> pg_log and design a repository database for introduce. >> And it is not a core-extension for PostgreSQL. >> (I don't intend to hate pg_statsinfo, >> I think this package is a very convinient tool) >> >> Everyone will be able to do more easily tuning of VACUUM. >> That's all I want. > >I'm still wondering whether these stats will really make the tuning any >easier. What I do right now is looking at pg_stat_all_tables.n_deat_tup >and if it exceeds some threshold, it's a sign that vacuum may need a >bit of tuning. Sometimes it really requires tuning vacuum itself, but >more often than not it's due to something else (a large bulk delete, >autovacuum getting stuck on another table, ...). I don't see how the >new stats would make this any easier. > >Can you give some examples on how the new stats might be used (and >where the current stats are insufficient)? What use cases do you >imagine for those stats? pg_stat_vacuum can keep histories of vacuum statistics for each tables/indices into shared memory.(They are not only last vacuum. This is already able to confirm using pg_stat_all_tables.) It makes easier analysis of vacuum histories because this view can sort or aggregate or filter. My use cases for those stats are following. - examine TRANSITION of vacuum execution time on any table (you can predict the future vacuum execution time) - examine EXECUTION INTERVAL of vacuum for each table (if too frequent, it should make vacuum-threshold tuning to up) - examine REST of dead-tuples just after vacuum (if dead-tuples remain, it may be due to any idle in transaction sessions) > >It might help differentiate the autovacuum activity from the rest of >the system (e.g. there's a lot of I/O going on - how much of that is >coming from autovacuum workers?). This would however require a more >fine-grained reporting, because often the vacuums run for a very long >time, especially on very large tables (which is exactly the case when >this might be handy) - I just had a VACUUM that ran for 12 hours. These >jobs should report the stats incrementally, not just once at the very >end, because that makes it rather useless IMNSHO. +1 Certainly, VACUUM have often much execution time, I just had too. At present, we cannot predict when this vacuum finishes, what this vacuum is doing now, and whether this vacuum have any problem or not. Maybe, For DBAs, It might be better to show vacuum progress in pg_stat_activity. (if we'd do, a
Re: [HACKERS] [Proposal] More Vacuum Statistics
Hi, Thank you for comments. and Sorry for my late response. >> >> pg_stat_vacuum view >> >> >> I understand it is not good to simply add more counters in >> pg_stat_*_tables. For now, I'd like to suggest an extension >> which can confirm vacuum statistics like pg_stat_statements. >> >> Similar feature has been already provided by pg_statsinfo package. >> But it is a full-stack package for PG-stats and it needs to >> redesign pg_log and design a repository database for introduce. >> And it is not a core-extension for PostgreSQL. >> (I don't intend to hate pg_statsinfo, >> I think this package is a very convinient tool) >> >> Everyone will be able to do more easily tuning of VACUUM. >> That's all I want. > >I'm still wondering whether these stats will really make the tuning any >easier. What I do right now is looking at pg_stat_all_tables.n_deat_tup >and if it exceeds some threshold, it's a sign that vacuum may need a bit >of tuning. Sometimes it really requires tuning vacuum itself, but more >often than not it's due to something else (a large bulk delete, >autovacuum getting stuck on another table, ...). I don't see how the new >stats would make this any easier. > >Can you give some examples on how the new stats might be used (and where >the current stats are insufficient)? What use cases do you imagine for >those stats? pg_stat_vacuum can keep histories of vacuum statistics for each tables/indices into shared memory.(They are not only last vacuum. This is already able to confirm using pg_stat_all_tables.) It makes easier analysis of vacuum histories because this view can sort or aggregate or filter. My use cases for those stats are following. - examine TRANSITION of vacuum execution time on any table (you can predict the future vacuum execution time) - examine EXECUTION INTERVAL of vacuum for each table (if too frequent, it should make vacuum-threshold tuning to up) - examine REST of dead-tuples just after vacuum (if dead-tuples remain, it may be due to any idle in transaction sessions) > >It might help differentiate the autovacuum activity from the rest of the >system (e.g. there's a lot of I/O going on - how much of that is coming >from autovacuum workers?). This would however require a more >fine-grained reporting, because often the vacuums run for a very long >time, especially on very large tables (which is exactly the case when >this might be handy) - I just had a VACUUM that ran for 12 hours. These >jobs should report the stats incrementally, not just once at the very >end, because that makes it rather useless IMNSHO. +1 Certainly, VACUUM have often much execution time, I just had too. At present, we cannot predict when this vacuum finishes, what this vacuum is doing now, and whether this vacuum have any problem or not. Maybe, For DBAs, It might be better to show vacuum progress in pg_stat_activity. (if we'd do, add a free-style column like "progress" ?) This column might also be able to use for other long time commands like ANALYZE, CREATE/RE INDEX and COPY. To realize this feature, we certainly need to properly change pgstat_report_activity, use it more and add a new track-activity parameter. Regards, Anzai Naoya --- Naoya Anzai Engineering Department NEC Solution Inovetors, Ltd. E-Mail: nao-an...@xc.jp.nec.com --- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Proposal] More Vacuum Statistics
Hi, On 06/05/15 14:10, Naoya Anzai wrote: Thank you for quick feedback, and I'm sorry for slow response. All of your opinions were very helpful for me. I have confirmed Greg's Idea "Timing events". http://www.postgresql.org/message-id/509300f7.5000...@2ndquadrant.com Greg said at first, "Parsing log files for commonly needed performance data is no fun." Yes, I completely agree with him. > That looks a nice idea but I don't know why this idea has not been commited yet. Anybody knows? Most likely lack of time, I guess. I have reworked my idea since I heard dear hacker's opinions. pg_stat_vacuum view I understand it is not good to simply add more counters in pg_stat_*_tables. For now, I'd like to suggest an extension which can confirm vacuum statistics like pg_stat_statements. I don't see how you want to collect the necessary information from an extension? pg_stat_statements get most of the stats from BufferUsage structure, but vacuum keeps all this internal, AFAIK. So it'd be necessary to make this somehow public - either by creating something like BufferUsage with all the vacuum stats, or perhaps a set of callbacks (either a single log_relation_vacuum or different callbacks for tables and indexs). IMHO the callbacks are a better idea - for example because it naturally handles database-wide vacuum. The global structure makes this difficult, because you'll only see data for all the vacuumed objects (or it'd have to track per-object stats internally, somehow). VACUUM is a most important feature in PostgreSQL, but a special view for vacuum does not exist. Don't you think the fact is inconvenience? At least, I am disgruntled with that we need to parse pg_log for tune VACUUM. +1 My first design of pg_stat_vacuum view is following. (There are two views.) pg_stat_vacuum_table --- dbid schemaname relid relname elapsed page_removed page_remain page_skipped tuple_removed tuple_remain tuple_notremovable buffer_hit buffer_miss buffer_dirty avg_read avg_write vm_count vac_start vac_end is_autovacuum pg_stat_vacuum_index --- dbid shemaname relid indexrelid indexname elapsed num_index_tuples num_pages tuples_removed pages_deleted pages_free is_autovacuum At present, I think memory design of pg_stat_statements can divert into this feature.And I think this module needs to prepare following parameters like pg_stat_statements. I'm not really sure about this. Firstly, the very fist response from TL in this thread was that adding per-table counters is not a particularly good idea, as it'll bloat the statistics files. It's true you're not adding the data into the main stats files, but you effectively establish a new 'vertical partition' with one record per table/index. It might be worth the overhead, if it really brings useful functionality (especially if it's opt-in feature, like pg_stat_statements). Secondly, the main issue of this design IMHO is that it only tracks the very last vacuum run (or do I understand it wrong?). That means even if you snapshot the pg_stat_vacuum views, you'll not know how many vacuums executed in between (and the more frequently you snapshot that, the greater the overhead). The other stats counters have the same issue, but the snapshotting works a bit better because the counters are cumulative (so you can easily do deltas etc.). But that's not the case here - certainly not with the timestamps, for example. I don't think the vacuum start/end timestamps are particularly interesting, TBH - we already have them in pg_stat_all_tables anyway, including the vacuum_count etc. So I'd propose dropping the timestamps, possibly replacing them with a single 'elapsed time', and making all the counters cumulative (so that you can do snapshots and deltas). I'm also wondering whether this should track the vacuum costs (because that determines how aggressive the vacuum is, and how much work will be done in a particular time), if it was anti-wraparound vacuum, if there was also ANALYZE performed, if the autovacuum was interrupted because of user activity, etc. pg_stat_vacuum.max(integer) pg_stat_vacuum.save(boolean) pg_stat_vacuum.excluded_dbnames(text) pg_stat_vacuum.excluded_schemas(text) pg_stat_vacuum.min_duration(integer) ... and so on. To implement this feature, I have to collect each vacuum-stats every lazy_vacuum_* and I need to embed a hook function point where needed. (probably last point of lazy_vacuum_rel). Do you hesitate to add the hook only for this function? Aha! So you plan to use the callbacks. Similar feature has been already provided by pg_statsinfo package. But it is a full-stack package for PG-stats and it needs to redesign pg_log and design a repository database for introduce. And it is not a core-extension for PostgreSQL. (I don't intend to hate pg_statsinfo, I think this package is a very convinient tool) Everyone will be able to do more easily tuning o
Re: [HACKERS] [Proposal] More Vacuum Statistics
Thank you for quick feedback, and I'm sorry for slow response. All of your opinions were very helpful for me. I have confirmed Greg's Idea "Timing events". http://www.postgresql.org/message-id/509300f7.5000...@2ndquadrant.com Greg said at first, "Parsing log files for commonly needed performance data is no fun." Yes, I completely agree with him. That looks a nice idea but I don't know why this idea has not been commited yet. Anybody knows? I have reworked my idea since I heard dear hacker's opinions. pg_stat_vacuum view I understand it is not good to simply add more counters in pg_stat_*_tables. For now, I'd like to suggest an extension which can confirm vacuum statistics like pg_stat_statements. VACUUM is a most important feature in PostgreSQL, but a special view for vacuum does not exist. Don't you think the fact is inconvenience? At least, I am disgruntled with that we need to parse pg_log for tune VACUUM. My first design of pg_stat_vacuum view is following. (There are two views.) pg_stat_vacuum_table --- dbid schemaname relid relname elapsed page_removed page_remain page_skipped tuple_removed tuple_remain tuple_notremovable buffer_hit buffer_miss buffer_dirty avg_read avg_write vm_count vac_start vac_end is_autovacuum pg_stat_vacuum_index --- dbid shemaname relid indexrelid indexname elapsed num_index_tuples num_pages tuples_removed pages_deleted pages_free is_autovacuum At present, I think memory design of pg_stat_statements can divert into this feature.And I think this module needs to prepare following parameters like pg_stat_statements. pg_stat_vacuum.max(integer) pg_stat_vacuum.save(boolean) pg_stat_vacuum.excluded_dbnames(text) pg_stat_vacuum.excluded_schemas(text) pg_stat_vacuum.min_duration(integer) ... and so on. To implement this feature, I have to collect each vacuum-stats every lazy_vacuum_* and I need to embed a hook function point where needed. (probably last point of lazy_vacuum_rel). Do you hesitate to add the hook only for this function? Similar feature has been already provided by pg_statsinfo package. But it is a full-stack package for PG-stats and it needs to redesign pg_log and design a repository database for introduce. And it is not a core-extension for PostgreSQL. (I don't intend to hate pg_statsinfo, I think this package is a very convinient tool) Everyone will be able to do more easily tuning of VACUUM. That's all I want. Any comments are welcome! Best Regards, Naoya Anzai --- Naoya Anzai Engineering Department NEC Solution Inovetors, Ltd. E-Mail: nao-an...@xc.jp.nec.com --- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Proposal] More Vacuum Statistics
On 05/30/15 16:47, Tom Lane wrote: Another reason why that would be a bad place is that a pg_class update forces relcache invalidation and thereby cached-plan invalidation. You don't want that for anything except (1) DDL affecting the table or (2) change in statistics that affect plan choices. It's not random chance that relallvisible is in pg_class while some other stats are in the stats collector's stuff --- the planner looks at relallvisible but not the other stuff. We already update pg_class from autovacuum - see vac_update_relstats(). Presumably we could update the new fields in the same way, without introducing any additional cache invalidations or bloat. But I do agree pg_class really is not the right place for this for the other reasons. -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Proposal] More Vacuum Statistics
Hi, On 05/30/15 04:41, Andres Freund wrote: On 2015-05-29 21:30:57 -0500, Jim Nasby wrote: It occurs to me that there's no good reason for vacuum-derived stats to be in the stats file; it's not like users run vacuum anywhere near as often as other commands. It's stats could be kept in pg_class; we're already keeping things like relallvisible there. While it might be viable to store them somewhere but the stat files, I don't think pg_class is a good place. Its size is not any less critical than the stats files. I.e. reading it sits in several rather hot paths, and we keep tuples from it in memory in a lot of places. IMHO stat files is exactly the right place for data like this - I can't really think about other place with less overhead / impact. That of course assumes the new fields really are useful, and I do have my doubts about usefulness of this data. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Proposal] More Vacuum Statistics
Andres Freund writes: > On 2015-05-29 21:30:57 -0500, Jim Nasby wrote: >> It occurs to me that there's no good reason for vacuum-derived stats to be >> in the stats file; it's not like users run vacuum anywhere near as often as >> other commands. It's stats could be kept in pg_class; we're already keeping >> things like relallvisible there. > While it might be viable to store them somewhere but the stat files, I > don't think pg_class is a good place. Its size is not any less critical > than the stats files. I.e. reading it sits in several rather hot paths, > and we keep tuples from it in memory in a lot of places. Another reason why that would be a bad place is that a pg_class update forces relcache invalidation and thereby cached-plan invalidation. You don't want that for anything except (1) DDL affecting the table or (2) change in statistics that affect plan choices. It's not random chance that relallvisible is in pg_class while some other stats are in the stats collector's stuff --- the planner looks at relallvisible but not the other stuff. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Proposal] More Vacuum Statistics
Andres Freund wrote: > On 2015-05-29 21:30:57 -0500, Jim Nasby wrote: > > It occurs to me that there's no good reason for vacuum-derived stats to be > > in the stats file; it's not like users run vacuum anywhere near as often as > > other commands. It's stats could be kept in pg_class; we're already keeping > > things like relallvisible there. > > While it might be viable to store them somewhere but the stat files, I > don't think pg_class is a good place. Its size is not any less critical > than the stats files. I.e. reading it sits in several rather hot paths, > and we keep tuples from it in memory in a lot of places. Greg Smith had this idea about "timing events", https://www.postgresql.org/message-id/50A4BC4E.4030007%402ndQuadrant.com Sounds like this thread is related. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Proposal] More Vacuum Statistics
On 2015-05-29 21:30:57 -0500, Jim Nasby wrote: > It occurs to me that there's no good reason for vacuum-derived stats to be > in the stats file; it's not like users run vacuum anywhere near as often as > other commands. It's stats could be kept in pg_class; we're already keeping > things like relallvisible there. While it might be viable to store them somewhere but the stat files, I don't think pg_class is a good place. Its size is not any less critical than the stats files. I.e. reading it sits in several rather hot paths, and we keep tuples from it in memory in a lot of places. Greetings, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Proposal] More Vacuum Statistics
On 5/28/15 9:14 AM, Tom Lane wrote: Naoya Anzai writes: In my much experience up until now,I have an idea that we can add 2 new vacuum statistics into pg_stat_xxx_tables. Adding new stats in that way requires adding per-table counters, which bloat the statistics files (especially in database with very many tables). I do not think you've made a case for these stats being valuable enough to justify such overhead for everybody. It occurs to me that there's no good reason for vacuum-derived stats to be in the stats file; it's not like users run vacuum anywhere near as often as other commands. It's stats could be kept in pg_class; we're already keeping things like relallvisible there. As far as the first one goes, I don't even think it's especially useful. There might be value in tracking the times of the last few vacuums on a table, but knowing the time for only the latest one doesn't sound like it would prove much. So I'd be inclined to think more along the lines of scanning the postmaster log for autovacuum runtimes, instead of squeezing it into the pg_stats views. You'd also want to know how many pages were scanned, since any decent estimation would need to take table size into account. As for history, that's a problem that exists for *all* our statistics, so anyone that cares about that is going to setup some system to periodically capture the contents of pg_stat_*. A possible alternative so far as the second one goes is to add a function (perhaps in contrib/pg_freespacemap) that simply runs through a table's VM and counts the number of set bits. This would be more accurate (no risk of lost counter updates) and very possibly cheaper overall: it would take longer to find out the number when you wanted it, but you wouldn't be paying the distributed overhead of tracking it when you didn't want it. Seems like a reasonable addition to that contrib module regardless. As Jeff Janes mentioned this info is available in pg_class, but it requires an ANALYZE to update it. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Proposal] More Vacuum Statistics
On Thu, May 28, 2015 at 4:08 AM, Naoya Anzai wrote: > > 2. Page visibility rate of each table > There is no way to know how many page-bits are them of each tables stored > in their visibility maps. If we can show this information, then we will be > able to guess vacuum overhead for the table. For example, if this table is > a > very big table but page visibility rate is high, then we can advise > pg-users > that vacuum for this table will execute faster than they think by low I/O > overhead. > Furthermore, this information can also be used in order to inform pg-users > about "real" index-only-scan usability. > Isn't this already pg_class.relallvisible? Cheers, Jeff
Re: [HACKERS] [Proposal] More Vacuum Statistics
Naoya Anzai writes: > In my much experience up until now,I have an idea that we can add > 2 new vacuum statistics into pg_stat_xxx_tables. Adding new stats in that way requires adding per-table counters, which bloat the statistics files (especially in database with very many tables). I do not think you've made a case for these stats being valuable enough to justify such overhead for everybody. As far as the first one goes, I don't even think it's especially useful. There might be value in tracking the times of the last few vacuums on a table, but knowing the time for only the latest one doesn't sound like it would prove much. So I'd be inclined to think more along the lines of scanning the postmaster log for autovacuum runtimes, instead of squeezing it into the pg_stats views. A possible alternative so far as the second one goes is to add a function (perhaps in contrib/pg_freespacemap) that simply runs through a table's VM and counts the number of set bits. This would be more accurate (no risk of lost counter updates) and very possibly cheaper overall: it would take longer to find out the number when you wanted it, but you wouldn't be paying the distributed overhead of tracking it when you didn't want it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [Proposal] More Vacuum Statistics
Hello, hackers! I'm a technical support engineer of PostgreSQL. In my much experience up until now,I have an idea that we can add 2 new vacuum statistics into pg_stat_xxx_tables. Features I hope is following. MORE VACUUM STATISTICS == Design & Motivation --- 1. Show about how long did vacuum spend for one table This is a very important information for tuning vacuum & redesigning table, but to see this information, we should check pg_log files and it cannot show if log_autovacuum_min_duration=-1.We can already show vacuum end time in current statistics, so we only have to add vacuum start time. Vacuum execution time will be able to estimate by time between start and end ( it is no longer need to check pg_log file ). Furthermore, vacuum execution interval time will be also able to estimate. To implement this feature, at least we need to modify pgstat_report_vacuum. 2. Page visibility rate of each table There is no way to know how many page-bits are them of each tables stored in their visibility maps. If we can show this information, then we will be able to guess vacuum overhead for the table. For example, if this table is a very big table but page visibility rate is high, then we can advise pg-users that vacuum for this table will execute faster than they think by low I/O overhead. Furthermore, this information can also be used in order to inform pg-users about "real" index-only-scan usability. To implement this feature, at least we need to count either number of skipping or setting visible blocks at lazy_scan_heap. I/F --- Pg-users can show this information by select pg_stat_xxx_tables. pg_stat_xxx_tables --- relid schemaname relname seq_scan seq_tup_read idx_scan idx_tup_fetch n_tup_ins n_tup_upd n_tup_del n_tup_hot_upd n_live_tup n_mod_since_analyze page_visibility# add last_vacuum_start--# add last_vacuum_end# rename last_autovacuum_start--# add last_autovacuum_end# rename last_analyze last_autoanalyze vacuum_count autovacuum_count analyze_count autoanalyze_count If hackers agree with my point, I'd like to make a patch for these features. Any comments are welcome. Best Regards, Naoya Anzai --- Naoya Anzai Engineering Department NEC Solution Inovetors, Ltd. E-Mail: nao-an...@xc.jp.nec.com bench.cof...@gmail.com --- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers