Re: [HACKERS] [Proposal] More Vacuum Statistics

2015-06-16 Thread Syed, Rahila
Hello,

>Maybe, For DBAs,
>It might be better to show vacuum progress in pg_stat_activity.
>(if we'd do, add a free-style column like "progress" ?) This column might also 
>be able to use for other long time commands like ANALYZE, CREATE/RE INDEX and 
>COPY. To realize this feature, we certainly need to properly change 
>pgstat_report_activity, use it more and add a new track-activity parameter
 Very similar idea was proposed in the following
http://www.postgresql.org/message-id/1284756643.25048.42.ca...@vanquo.pezone.net

IIUC, problem with showing progress in pg_stat_activity is that it introduces 
compulsary progress calculation overhead  in core for every command.
As work units of each command varies, common infrastructure might not be able 
to represent every command progress effectively.
An architecture which will display progress only  on users demand  for each 
command separately will be more efficient.
So, suggestion was rather to have a detailed progress report including 
"remaining time" for a command on users demand.

FWIW, I am working on designing an approach to report VACUUM progress stats for 
which I will be posting a detailed proposal.
The use case is reporting progress for long running VACUUMs. The approach 
involves using hooks to extract VACUUM progress statistics .
The progress can be displayed using psql view (ex. pg_stat_maintenance).

Thank you,
Rahila Syed


-Original Message-
From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Naoya Anzai
Sent: Tuesday, June 16, 2015 8:41 AM
To: Tomas Vondra
Cc: pgsql-hackers@postgresql.org; Akio Iwaasa; bench.cof...@gmail.com; Tom 
Lane; Jeff Janes; Jim Nasby; Andres Freund; Alvaro Herrera
Subject: Re: [HACKERS] [Proposal] More Vacuum Statistics

Hi,

Thank you for comments. and Sorry for my late response.

>> 
>> pg_stat_vacuum view
>> 
>>
>> I understand it is not good to simply add more counters in 
>> pg_stat_*_tables. For now, I'd like to suggest an extension which can 
>> confirm vacuum statistics like pg_stat_statements.
>>
>> Similar feature has been already provided by pg_statsinfo package.
>> But it is a full-stack package for PG-stats and it needs to redesign 
>> pg_log and design a repository database for introduce.
>> And it is not a core-extension for PostgreSQL.
>> (I don't intend to hate pg_statsinfo,
>>   I think this package is a very convinient tool)
>>
>> Everyone will be able to do more easily tuning of VACUUM.
>> That's all I want.
>
>I'm still wondering whether these stats will really make the tuning any 
>easier. What I do right now is looking at pg_stat_all_tables.n_deat_tup 
>and if it exceeds some threshold, it's a sign that vacuum may need a 
>bit of tuning. Sometimes it really requires tuning vacuum itself, but 
>more often than not it's due to something else (a large bulk delete, 
>autovacuum getting stuck on another table, ...). I don't see how the 
>new stats would make this any easier.
>
>Can you give some examples on how the new stats might be used (and 
>where the current stats are insufficient)? What use cases do you 
>imagine for those stats?

pg_stat_vacuum can keep histories of vacuum statistics for each tables/indices 
into shared memory.(They are not only last vacuum. 
This is already able to confirm using pg_stat_all_tables.) It makes easier 
analysis of vacuum histories because this view can sort or aggregate or filter.

My use cases for those stats are following.

- examine TRANSITION of vacuum execution time on any table  (you can predict 
the future vacuum execution time)
- examine EXECUTION INTERVAL of vacuum for each table  (if too frequent, it 
should make vacuum-threshold tuning to up)
- examine REST of dead-tuples just after vacuum  (if dead-tuples remain, it may 
be due to any idle in transaction sessions)

>
>It might help differentiate the autovacuum activity from the rest of 
>the system (e.g. there's a lot of I/O going on - how much of that is 
>coming from autovacuum workers?). This would however require a more 
>fine-grained reporting, because often the vacuums run for a very long 
>time, especially on very large tables (which is exactly the case when 
>this might be handy) - I just had a VACUUM that ran for 12 hours. These 
>jobs should report the stats incrementally, not just once at the very 
>end, because that makes it rather useless IMNSHO.

+1

Certainly, VACUUM have often much execution time, I just had too.
At present, we cannot predict when this vacuum finishes, what this vacuum is 
doing now, and whether this vacuum have any problem or not.

Maybe, For DBAs,
It might be better to show vacuum progress in pg_stat_activity.
(if we'd do, a

Re: [HACKERS] [Proposal] More Vacuum Statistics

2015-06-15 Thread Naoya Anzai
Hi,

Thank you for comments. and Sorry for my late response.

>> 
>> pg_stat_vacuum view
>> 
>>
>> I understand it is not good to simply add more counters in
>> pg_stat_*_tables. For now, I'd like to suggest an extension
>> which can confirm vacuum statistics like pg_stat_statements.
>>
>> Similar feature has been already provided by pg_statsinfo package.
>> But it is a full-stack package for PG-stats and it needs to
>> redesign pg_log and design a repository database for introduce.
>> And it is not a core-extension for PostgreSQL.
>> (I don't intend to hate pg_statsinfo,
>>   I think this package is a very convinient tool)
>>
>> Everyone will be able to do more easily tuning of VACUUM.
>> That's all I want.
>
>I'm still wondering whether these stats will really make the tuning any 
>easier. What I do right now is looking at pg_stat_all_tables.n_deat_tup 
>and if it exceeds some threshold, it's a sign that vacuum may need a bit 
>of tuning. Sometimes it really requires tuning vacuum itself, but more 
>often than not it's due to something else (a large bulk delete, 
>autovacuum getting stuck on another table, ...). I don't see how the new 
>stats would make this any easier.
>
>Can you give some examples on how the new stats might be used (and where 
>the current stats are insufficient)? What use cases do you imagine for 
>those stats?

pg_stat_vacuum can keep histories of vacuum statistics for each 
tables/indices into shared memory.(They are not only last vacuum. 
This is already able to confirm using pg_stat_all_tables.) It makes 
easier analysis of vacuum histories because this view can sort or 
aggregate or filter.

My use cases for those stats are following.

- examine TRANSITION of vacuum execution time on any table
 (you can predict the future vacuum execution time)
- examine EXECUTION INTERVAL of vacuum for each table
 (if too frequent, it should make vacuum-threshold tuning to up)
- examine REST of dead-tuples just after vacuum
 (if dead-tuples remain, it may be due to any idle in transaction sessions)

>
>It might help differentiate the autovacuum activity from the rest of the 
>system (e.g. there's a lot of I/O going on - how much of that is coming 
>from autovacuum workers?). This would however require a more 
>fine-grained reporting, because often the vacuums run for a very long 
>time, especially on very large tables (which is exactly the case when 
>this might be handy) - I just had a VACUUM that ran for 12 hours. These 
>jobs should report the stats incrementally, not just once at the very 
>end, because that makes it rather useless IMNSHO.

+1

Certainly, VACUUM have often much execution time, I just had too.
At present, we cannot predict when this vacuum finishes, what this vacuum 
is doing now, and whether this vacuum have any problem or not.

Maybe, For DBAs,
It might be better to show vacuum progress in pg_stat_activity.
(if we'd do, add a free-style column like "progress" ?)
This column might also be able to use for other long time commands 
like ANALYZE, CREATE/RE INDEX and COPY. To realize this feature,
we certainly need to properly change pgstat_report_activity, 
use it more and add a new track-activity parameter.

Regards,

Anzai Naoya
---
Naoya Anzai
Engineering Department
NEC Solution Inovetors, Ltd.
E-Mail: nao-an...@xc.jp.nec.com
---



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Proposal] More Vacuum Statistics

2015-06-07 Thread Tomas Vondra

Hi,

On 06/05/15 14:10, Naoya Anzai wrote:

Thank you for quick feedback, and I'm sorry for slow response.
All of your opinions were very helpful for me.

I have confirmed Greg's Idea "Timing events".
http://www.postgresql.org/message-id/509300f7.5000...@2ndquadrant.com

Greg said at first,
"Parsing log files for commonly needed performance data is no fun."
Yes, I completely agree with him.

>

That looks a nice idea but I don't know why this idea has
not been commited yet. Anybody knows?


Most likely lack of time, I guess.



I have reworked my idea since I heard dear hacker's opinions.


pg_stat_vacuum view


I understand it is not good to simply add more counters in
pg_stat_*_tables. For now, I'd like to suggest an extension
which can confirm vacuum statistics like pg_stat_statements.


I don't see how you want to collect the necessary information from an
extension? pg_stat_statements get most of the stats from BufferUsage 
structure, but vacuum keeps all this internal, AFAIK.


So it'd be necessary to make this somehow public - either by creating 
something like BufferUsage with all the vacuum stats, or perhaps a set 
of callbacks (either a single log_relation_vacuum or different callbacks 
for tables and indexs).


IMHO the callbacks are a better idea - for example because it naturally 
handles database-wide vacuum. The global structure makes this difficult, 
because you'll only see data for all the vacuumed objects (or it'd have 
to track per-object stats internally, somehow).



VACUUM is a most important feature in PostgreSQL, but a
special view for vacuum does not exist. Don't you think
the fact is inconvenience? At least, I am disgruntled with
that we need to parse pg_log for tune VACUUM.


+1


My first design of pg_stat_vacuum view is following.
(There are two views.)

pg_stat_vacuum_table
---
dbid
schemaname
relid
relname
elapsed
page_removed
page_remain
page_skipped
tuple_removed
tuple_remain
tuple_notremovable
buffer_hit
buffer_miss
buffer_dirty
avg_read
avg_write
vm_count
vac_start
vac_end
is_autovacuum

pg_stat_vacuum_index
---
dbid
shemaname
relid
indexrelid
indexname
elapsed
num_index_tuples
num_pages
tuples_removed
pages_deleted
pages_free
is_autovacuum

At present, I think memory design of pg_stat_statements can
divert into this feature.And I think this module needs to
prepare following parameters like pg_stat_statements.


I'm not really sure about this.

Firstly, the very fist response from TL in this thread was that adding 
per-table counters is not a particularly good idea, as it'll bloat the 
statistics files. It's true you're not adding the data into the main 
stats files, but you effectively establish a new 'vertical partition' 
with one record per table/index. It might be worth the overhead, if it 
really brings useful functionality (especially if it's opt-in feature, 
like pg_stat_statements).


Secondly, the main issue of this design IMHO is that it only tracks the 
very last vacuum run (or do I understand it wrong?). That means even if 
you snapshot the pg_stat_vacuum views, you'll not know how many vacuums 
executed in between (and the more frequently you snapshot that, the 
greater the overhead). The other stats counters have the same issue, but 
the snapshotting works a bit better because the counters are cumulative 
(so you can easily do deltas etc.). But that's not the case here - 
certainly not with the timestamps, for example.


I don't think the vacuum start/end timestamps are particularly 
interesting, TBH - we already have them in pg_stat_all_tables anyway, 
including the vacuum_count etc. So I'd propose dropping the timestamps, 
possibly replacing them with a single 'elapsed time', and making all the 
counters cumulative (so that you can do snapshots and deltas).


I'm also wondering whether this should track the vacuum costs (because 
that determines how aggressive the vacuum is, and how much work will be 
done in a particular time), if it was anti-wraparound vacuum, if there 
was also ANALYZE performed, if the autovacuum was interrupted because of 
user activity, etc.



pg_stat_vacuum.max(integer)
pg_stat_vacuum.save(boolean)
pg_stat_vacuum.excluded_dbnames(text)
pg_stat_vacuum.excluded_schemas(text)
pg_stat_vacuum.min_duration(integer)
... and so on.

To implement this feature, I have to collect each vacuum-stats
every lazy_vacuum_* and I need to embed a hook function point
where needed. (probably last point of lazy_vacuum_rel).
Do you hesitate to add the hook only for this function?


Aha! So you plan to use the callbacks.



Similar feature has been already provided by pg_statsinfo package.
But it is a full-stack package for PG-stats and it needs to
redesign pg_log and design a repository database for introduce.
And it is not a core-extension for PostgreSQL.
(I don't intend to hate pg_statsinfo,
  I think this package is a very convinient tool)

Everyone will be able to do more easily tuning o

Re: [HACKERS] [Proposal] More Vacuum Statistics

2015-06-06 Thread Naoya Anzai
Thank you for quick feedback, and I'm sorry for slow response.
All of your opinions were very helpful for me.

I have confirmed Greg's Idea "Timing events".
http://www.postgresql.org/message-id/509300f7.5000...@2ndquadrant.com

Greg said at first,
"Parsing log files for commonly needed performance data is no fun."
Yes, I completely agree with him.

That looks a nice idea but I don't know why this idea has 
not been commited yet. Anybody knows?

I have reworked my idea since I heard dear hacker's opinions.


pg_stat_vacuum view


I understand it is not good to simply add more counters in 
pg_stat_*_tables. For now, I'd like to suggest an extension 
which can confirm vacuum statistics like pg_stat_statements.

VACUUM is a most important feature in PostgreSQL, but a 
special view for vacuum does not exist. Don't you think 
the fact is inconvenience? At least, I am disgruntled with 
that we need to parse pg_log for tune VACUUM.

My first design of pg_stat_vacuum view is following. 
(There are two views.)

pg_stat_vacuum_table
---
dbid
schemaname
relid
relname
elapsed
page_removed
page_remain
page_skipped
tuple_removed
tuple_remain
tuple_notremovable
buffer_hit
buffer_miss
buffer_dirty
avg_read
avg_write
vm_count
vac_start
vac_end
is_autovacuum

pg_stat_vacuum_index
---
dbid
shemaname
relid
indexrelid
indexname
elapsed
num_index_tuples
num_pages
tuples_removed
pages_deleted
pages_free
is_autovacuum

At present, I think memory design of pg_stat_statements can 
divert into this feature.And I think this module needs to 
prepare following parameters like pg_stat_statements.

pg_stat_vacuum.max(integer)
pg_stat_vacuum.save(boolean)
pg_stat_vacuum.excluded_dbnames(text)
pg_stat_vacuum.excluded_schemas(text)
pg_stat_vacuum.min_duration(integer)
... and so on.

To implement this feature, I have to collect each vacuum-stats 
every lazy_vacuum_* and I need to embed a hook function point 
where needed. (probably last point of lazy_vacuum_rel).
Do you hesitate to add the hook only for this function?

Similar feature has been already provided by pg_statsinfo package.
But it is a full-stack package for PG-stats and it needs to 
redesign pg_log and design a repository database for introduce.
And it is not a core-extension for PostgreSQL.
(I don't intend to hate pg_statsinfo, 
 I think this package is a very convinient tool)

Everyone will be able to do more easily tuning of VACUUM.
That's all I want.

Any comments are welcome!

Best Regards,

Naoya Anzai

---
Naoya Anzai
Engineering Department
NEC Solution Inovetors, Ltd.
E-Mail: nao-an...@xc.jp.nec.com
---


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Proposal] More Vacuum Statistics

2015-05-30 Thread Tomas Vondra



On 05/30/15 16:47, Tom Lane wrote:



Another reason why that would be a bad place is that a pg_class update
forces relcache invalidation and thereby cached-plan invalidation.
You don't want that for anything except (1) DDL affecting the table or
(2) change in statistics that affect plan choices.  It's not random
chance that relallvisible is in pg_class while some other stats are in
the stats collector's stuff --- the planner looks at relallvisible
but not the other stuff.


We already update pg_class from autovacuum - see vac_update_relstats(). 
Presumably we could update the new fields in the same way, without 
introducing any additional cache invalidations or bloat.


But I do agree pg_class really is not the right place for this for the 
other reasons.


--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Proposal] More Vacuum Statistics

2015-05-30 Thread Tomas Vondra

Hi,

On 05/30/15 04:41, Andres Freund wrote:

On 2015-05-29 21:30:57 -0500, Jim Nasby wrote:

It occurs to me that there's no good reason for vacuum-derived stats to be
in the stats file; it's not like users run vacuum anywhere near as often as
other commands. It's stats could be kept in pg_class; we're already keeping
things like relallvisible there.


While it might be viable to store them somewhere but the stat files,
I don't think pg_class is a good place. Its size is not any less
critical than the stats files. I.e. reading it sits in several rather
hot paths, and we keep tuples from it in memory in a lot of places.


IMHO stat files is exactly the right place for data like this - I can't 
really think about other place with less overhead / impact. That of 
course assumes the new fields really are useful, and I do have my doubts 
about usefulness of this data.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Proposal] More Vacuum Statistics

2015-05-30 Thread Tom Lane
Andres Freund  writes:
> On 2015-05-29 21:30:57 -0500, Jim Nasby wrote:
>> It occurs to me that there's no good reason for vacuum-derived stats to be
>> in the stats file; it's not like users run vacuum anywhere near as often as
>> other commands. It's stats could be kept in pg_class; we're already keeping
>> things like relallvisible there.

> While it might be viable to store them somewhere but the stat files, I
> don't think pg_class is a good place. Its size is not any less critical
> than the stats files. I.e. reading it sits in several rather hot paths,
> and we keep tuples from it in memory in a lot of places.

Another reason why that would be a bad place is that a pg_class update
forces relcache invalidation and thereby cached-plan invalidation.
You don't want that for anything except (1) DDL affecting the table or
(2) change in statistics that affect plan choices.  It's not random
chance that relallvisible is in pg_class while some other stats are in
the stats collector's stuff --- the planner looks at relallvisible
but not the other stuff.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Proposal] More Vacuum Statistics

2015-05-29 Thread Alvaro Herrera
Andres Freund wrote:
> On 2015-05-29 21:30:57 -0500, Jim Nasby wrote:
> > It occurs to me that there's no good reason for vacuum-derived stats to be
> > in the stats file; it's not like users run vacuum anywhere near as often as
> > other commands. It's stats could be kept in pg_class; we're already keeping
> > things like relallvisible there.
> 
> While it might be viable to store them somewhere but the stat files, I
> don't think pg_class is a good place. Its size is not any less critical
> than the stats files. I.e. reading it sits in several rather hot paths,
> and we keep tuples from it in memory in a lot of places.

Greg Smith had this idea about "timing events",
https://www.postgresql.org/message-id/50A4BC4E.4030007%402ndQuadrant.com
Sounds like this thread is related.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Proposal] More Vacuum Statistics

2015-05-29 Thread Andres Freund
On 2015-05-29 21:30:57 -0500, Jim Nasby wrote:
> It occurs to me that there's no good reason for vacuum-derived stats to be
> in the stats file; it's not like users run vacuum anywhere near as often as
> other commands. It's stats could be kept in pg_class; we're already keeping
> things like relallvisible there.

While it might be viable to store them somewhere but the stat files, I
don't think pg_class is a good place. Its size is not any less critical
than the stats files. I.e. reading it sits in several rather hot paths,
and we keep tuples from it in memory in a lot of places.

Greetings,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Proposal] More Vacuum Statistics

2015-05-29 Thread Jim Nasby

On 5/28/15 9:14 AM, Tom Lane wrote:

Naoya Anzai  writes:

In my much experience up until now,I have an idea that we can add
2 new vacuum statistics into pg_stat_xxx_tables.


Adding new stats in that way requires adding per-table counters, which
bloat the statistics files (especially in database with very many tables).
I do not think you've made a case for these stats being valuable enough
to justify such overhead for everybody.


It occurs to me that there's no good reason for vacuum-derived stats to 
be in the stats file; it's not like users run vacuum anywhere near as 
often as other commands. It's stats could be kept in pg_class; we're 
already keeping things like relallvisible there.



As far as the first one goes, I don't even think it's especially useful.
There might be value in tracking the times of the last few vacuums on a
table, but knowing the time for only the latest one doesn't sound like it
would prove much.  So I'd be inclined to think more along the lines of
scanning the postmaster log for autovacuum runtimes, instead of squeezing
it into the pg_stats views.


You'd also want to know how many pages were scanned, since any decent 
estimation would need to take table size into account.


As for history, that's a problem that exists for *all* our statistics, 
so anyone that cares about that is going to setup some system to 
periodically capture the contents of pg_stat_*.



A possible alternative so far as the second one goes is to add a function
(perhaps in contrib/pg_freespacemap) that simply runs through a table's
VM and counts the number of set bits.  This would be more accurate (no
risk of lost counter updates) and very possibly cheaper overall: it would
take longer to find out the number when you wanted it, but you wouldn't be
paying the distributed overhead of tracking it when you didn't want it.


Seems like a reasonable addition to that contrib module regardless. As 
Jeff Janes mentioned this info is available in pg_class, but it requires 
an ANALYZE to update it.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Proposal] More Vacuum Statistics

2015-05-29 Thread Jeff Janes
On Thu, May 28, 2015 at 4:08 AM, Naoya Anzai 
wrote:

>
> 2. Page visibility rate of each table
> There is no way to know how many page-bits are them of each tables stored
> in their visibility maps. If we can show this information, then we will be
> able to guess vacuum overhead for the table. For example, if this table is
> a
> very big table but page visibility rate is high, then we can advise
> pg-users
> that vacuum for this table will execute faster than they think by low I/O
> overhead.
> Furthermore, this information can also be used in order to inform pg-users
> about "real" index-only-scan usability.
>

Isn't this already pg_class.relallvisible?


Cheers,

Jeff


Re: [HACKERS] [Proposal] More Vacuum Statistics

2015-05-28 Thread Tom Lane
Naoya Anzai  writes:
> In my much experience up until now,I have an idea that we can add 
> 2 new vacuum statistics into pg_stat_xxx_tables.

Adding new stats in that way requires adding per-table counters, which
bloat the statistics files (especially in database with very many tables).
I do not think you've made a case for these stats being valuable enough
to justify such overhead for everybody.

As far as the first one goes, I don't even think it's especially useful.
There might be value in tracking the times of the last few vacuums on a
table, but knowing the time for only the latest one doesn't sound like it
would prove much.  So I'd be inclined to think more along the lines of
scanning the postmaster log for autovacuum runtimes, instead of squeezing
it into the pg_stats views.

A possible alternative so far as the second one goes is to add a function
(perhaps in contrib/pg_freespacemap) that simply runs through a table's
VM and counts the number of set bits.  This would be more accurate (no
risk of lost counter updates) and very possibly cheaper overall: it would
take longer to find out the number when you wanted it, but you wouldn't be
paying the distributed overhead of tracking it when you didn't want it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [Proposal] More Vacuum Statistics

2015-05-28 Thread Naoya Anzai
Hello, hackers!

I'm a technical support engineer of PostgreSQL.

In my much experience up until now,I have an idea that we can add 
2 new vacuum statistics into pg_stat_xxx_tables.
Features I hope is following.

MORE VACUUM STATISTICS
==

Design & Motivation
---
1. Show about how long did vacuum spend for one table
This is a very important information for tuning vacuum & redesigning table, 
but to see this information, we should check pg_log files and it cannot 
show if log_autovacuum_min_duration=-1.We can already show vacuum end 
time in current statistics, so we only have to add vacuum start time.
Vacuum execution time will be able to estimate by time between start 
and end ( it is no longer need to check pg_log file ).
Furthermore, vacuum execution interval time will be also able to estimate.

To implement this feature, at least we need to modify pgstat_report_vacuum.

2. Page visibility rate of each table
There is no way to know how many page-bits are them of each tables stored 
in their visibility maps. If we can show this information, then we will be 
able to guess vacuum overhead for the table. For example, if this table is a 
very big table but page visibility rate is high, then we can advise pg-users 
that vacuum for this table will execute faster than they think by low I/O 
overhead.
Furthermore, this information can also be used in order to inform pg-users 
about "real" index-only-scan usability.

To implement this feature, at least we need to count either number of 
skipping or setting visible blocks at lazy_scan_heap.

I/F
---
Pg-users can show this information by select pg_stat_xxx_tables.

pg_stat_xxx_tables
---
relid
schemaname
relname
seq_scan
seq_tup_read
idx_scan
idx_tup_fetch
n_tup_ins
n_tup_upd
n_tup_del
n_tup_hot_upd
n_live_tup
n_mod_since_analyze
page_visibility# add
last_vacuum_start--# add
last_vacuum_end# rename
last_autovacuum_start--# add
last_autovacuum_end# rename
last_analyze
last_autoanalyze
vacuum_count
autovacuum_count
analyze_count
autoanalyze_count

If hackers agree with my point,
I'd like to make a patch for these features.

Any comments are welcome.

Best Regards,

Naoya Anzai
---
Naoya Anzai
Engineering Department
NEC Solution Inovetors, Ltd.
E-Mail: nao-an...@xc.jp.nec.com
bench.cof...@gmail.com
---


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers