Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-25 Thread Rahila Syed
>Oops.  I forgot to credit you in the commit message.  Sorry about that.
:-(
No problem :). Thanks for the commit.

Thank you,
Rahila Syed


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-24 Thread Amit Langote
On 2016/03/24 22:01, Robert Haas wrote:
> On Thu, Mar 24, 2016 at 8:45 AM, Rahila Syed  wrote:
>>
>> -   values[i+3] =
>> UInt32GetDatum(beentry->st_progress_param[i]);
>> +   values[i+3] =
>> Int64GetDatum(beentry->st_progress_param[i]);
>>
>>
>> Attached patch fixes this.
> 
> Uggh, what a stupid mistake on my part.
> 
> Committed.  Thanks for the patch.

Thanks Rahila and Robert.

- Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-24 Thread Robert Haas
On Thu, Mar 24, 2016 at 9:01 AM, Robert Haas  wrote:
> On Thu, Mar 24, 2016 at 8:45 AM, Rahila Syed  wrote:
>> Server crash was reported on running  vacuum progress checker view on 32-bit
>> machine.
>> Please find attached a fix for the same.
>>
>> Crash was because 32 bit machine considers int8 as being passed by reference
>> while creating the tuple descriptor. At the time of filling the tuple store,
>> the code (heap_fill_tuple) checks this tuple descriptor before inserting the
>> value into the tuple store. It finds the attribute type pass by reference
>> and hence it treats the value as a pointer when it is not and thus it fails
>> at the time of memcpy.
>>
>> This happens because appropriate conversion function is not employed while
>> storing the value of that particular attribute into the values array before
>> copying it into tuple store.
>>
>> -   values[i+3] =
>> UInt32GetDatum(beentry->st_progress_param[i]);
>> +   values[i+3] =
>> Int64GetDatum(beentry->st_progress_param[i]);
>>
>>
>> Attached patch fixes this.
>
> Uggh, what a stupid mistake on my part.
>
> Committed.  Thanks for the patch.

Oops.  I forgot to credit you in the commit message.  Sorry about that.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-24 Thread Robert Haas
On Thu, Mar 24, 2016 at 8:45 AM, Rahila Syed  wrote:
> Server crash was reported on running  vacuum progress checker view on 32-bit
> machine.
> Please find attached a fix for the same.
>
> Crash was because 32 bit machine considers int8 as being passed by reference
> while creating the tuple descriptor. At the time of filling the tuple store,
> the code (heap_fill_tuple) checks this tuple descriptor before inserting the
> value into the tuple store. It finds the attribute type pass by reference
> and hence it treats the value as a pointer when it is not and thus it fails
> at the time of memcpy.
>
> This happens because appropriate conversion function is not employed while
> storing the value of that particular attribute into the values array before
> copying it into tuple store.
>
> -   values[i+3] =
> UInt32GetDatum(beentry->st_progress_param[i]);
> +   values[i+3] =
> Int64GetDatum(beentry->st_progress_param[i]);
>
>
> Attached patch fixes this.

Uggh, what a stupid mistake on my part.

Committed.  Thanks for the patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-24 Thread Rahila Syed
Hello,

Server crash was reported on running  vacuum progress checker view on
32-bit machine.
Please find attached a fix for the same.

Crash was because 32 bit machine considers int8 as being passed by
reference while creating the tuple descriptor. At the time of filling the
tuple store, the code (heap_fill_tuple) checks this tuple descriptor before
inserting the value into the tuple store. It finds the attribute type pass
by reference and hence it treats the value as a pointer when it is not and
thus it fails at the time of memcpy.

This happens because appropriate conversion function is not employed while
storing the value of that particular attribute into the values array before
copying it into tuple store.

-   values[i+3] =
UInt32GetDatum(beentry->st_progress_param[i]);
+   values[i+3] =
Int64GetDatum(beentry->st_progress_param[i]);


Attached patch fixes this.

Thank you,
Rahila Syed

On Wed, Mar 16, 2016 at 11:30 PM, Robert Haas  wrote:

> On Wed, Mar 16, 2016 at 6:44 AM, Rahila Syed 
> wrote:
> >>Sorta.  Committed after renaming what you called heap blocks vacuumed
> >>back to heap blocks scanned, adding heap blocks vacuumed, removing the
> >>overall progress meter which I don't believe will be anything close to
> >>accurate, fixing some stylistic stuff, arranging to update multiple
> >>counters automatically where it could otherwise produce confusion,
> >>moving the new view near similar ones in the file, reformatting it to
> >>follow the style of the rest of the file, exposing the counter
> >>#defines via a header file in case extensions want to use them, and
> >>overhauling and substantially expanding the documentation
> >
> > We have following lines,
> >
> > /* report that everything is scanned and vacuumed */
> > pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED,
> > blkno);
> > pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED,
> > blkno);
> >
> >
> > which appear before final vacuum cycle happens for any remaining dead
> tuples
> > which may span few pages if I am not mistaken.
> >
> > IMO, reporting final count of heap_blks_scanned is correct here, but
> > reporting final heap_blks_vacuumed can happen after the final VACUUM
> cycle
> > for more accuracy.
>
> You are quite right.  Good catch.  Fixed that, and applied Vinayak's
> patch too, and fixed another mistake I saw while I was at it.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 0f6f891..64c4cc4 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -612,7 +612,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 		{
 			values[2] = ObjectIdGetDatum(beentry->st_progress_command_target);
 			for(i = 0; i < PGSTAT_NUM_PROGRESS_PARAM; i++)
-values[i+3] = UInt32GetDatum(beentry->st_progress_param[i]);
+values[i+3] = Int64GetDatum(beentry->st_progress_param[i]);
 		}
 		else
 		{

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-20 Thread Robert Haas
On Wed, Mar 16, 2016 at 6:44 AM, Rahila Syed  wrote:
>>Sorta.  Committed after renaming what you called heap blocks vacuumed
>>back to heap blocks scanned, adding heap blocks vacuumed, removing the
>>overall progress meter which I don't believe will be anything close to
>>accurate, fixing some stylistic stuff, arranging to update multiple
>>counters automatically where it could otherwise produce confusion,
>>moving the new view near similar ones in the file, reformatting it to
>>follow the style of the rest of the file, exposing the counter
>>#defines via a header file in case extensions want to use them, and
>>overhauling and substantially expanding the documentation
>
> We have following lines,
>
> /* report that everything is scanned and vacuumed */
> pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED,
> blkno);
> pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED,
> blkno);
>
>
> which appear before final vacuum cycle happens for any remaining dead tuples
> which may span few pages if I am not mistaken.
>
> IMO, reporting final count of heap_blks_scanned is correct here, but
> reporting final heap_blks_vacuumed can happen after the final VACUUM cycle
> for more accuracy.

You are quite right.  Good catch.  Fixed that, and applied Vinayak's
patch too, and fixed another mistake I saw while I was at it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-16 Thread Rahila Syed
>Sorta.  Committed after renaming what you called heap blocks vacuumed
>back to heap blocks scanned, adding heap blocks vacuumed, removing the
>overall progress meter which I don't believe will be anything close to
>accurate, fixing some stylistic stuff, arranging to update multiple
>counters automatically where it could otherwise produce confusion,
>moving the new view near similar ones in the file, reformatting it to
>follow the style of the rest of the file, exposing the counter
>#defines via a header file in case extensions want to use them, and
>overhauling and substantially expanding the documentation

We have following lines,

/* report that everything is scanned and vacuumed */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED,
blkno);
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED,
blkno);


which appear before final vacuum cycle happens for any remaining dead
tuples which may span few pages if I am not mistaken.

IMO, reporting final count of heap_blks_scanned is correct here, but
reporting final heap_blks_vacuumed can happen after the final VACUUM cycle
for more accuracy.

Thank you,
Rahila Syed


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-16 Thread pokurev
Hi,
> -Original Message-
> From: Robert Haas [mailto:robertmh...@gmail.com]
> Sent: Wednesday, March 16, 2016 2:34 AM
> To: Amit Langote <langote_amit...@lab.ntt.co.jp>
> Cc: Amit Langote <amitlangot...@gmail.com>; SPS ポクレ ヴィナヤック(
> 三技術) <poku...@pm.nttdata.co.jp>; Kyotaro HORIGUCHI
> <horiguchi.kyot...@lab.ntt.co.jp>; pgsql-hackers@postgresql.org; SPS 坂野
> 昌平(三技術) <ban...@nttdata.co.jp>
> Subject: Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
> 
> On Tue, Mar 15, 2016 at 1:16 AM, Amit Langote
> <langote_amit...@lab.ntt.co.jp> wrote:
> > On 2016/03/15 3:41, Robert Haas wrote:
> >> On Sat, Mar 12, 2016 at 7:49 AM, Amit Langote
> <amitlangot...@gmail.com> wrote:
> >>> Instead, the attached patch adds a IndexBulkDeleteProgressCallback
> >>> which AMs should call for every block that's read (say, right before
> >>> a call to ReadBufferExtended) as part of a given vacuum run. The
> >>> callback with help of some bookkeeping state can count each block
> >>> and report to pgstat_progress API. Now, I am not sure if all AMs
> >>> read 1..N blocks for every vacuum or if it's possible that some
> >>> blocks are read more than once in single vacuum, etc.  IOW, some
> >>> AM's processing may be non-linear and counting blocks 1..N (where N
> >>> is reported total index blocks) may not be possible.  However, this
> >>> is the best I could think of as doing what we are trying to do here.
> >>> Maybe index AM experts can chime in on that.
> >>>
> >>> Thoughts?
> >>
> >> Well, I think you need to study the index AMs and figure this out.
> >
> > OK.  I tried to put calls to the callback in appropriate places, but
> > couldn't get the resulting progress numbers to look sane.  So I ended
> > up concluding that any attempt to do so is futile unless I analyze
> > each AM's vacuum code carefully to be able to determine in advance the
> > max bound on the count of blocks that the callback will report.
> > Anyway, as you suggest, we can improve it later.
> 
> I don't think there is any way to bound that, because new blocks can get
> added to the index concurrently, and we might end up needing to scan
> them.  Reporting the number of blocks scanned can still be useful, though -
> any chance you can just implement that part of it?
> 
> >> But I think for starters you should write a patch that reports the 
> >> following:
> >>
> >> 1. phase
> >> 2. number of heap blocks scanned
> >> 3. number of heap blocks vacuumed
> >> 4. number of completed index vac cycles 5. number of dead tuples
> >> collected since the last index vac cycle 6. number of dead tuples
> >> that we can store before needing to perform an index vac cycle
> >>
> >> All of that should be pretty straightforward, and then we'd have
> >> something we can ship.  We can add the detailed index reporting
> >> later, when we get to it, perhaps for 9.7.
> >
> > OK, I agree with this plan.  Attached updated patch implements this.
> 
> Sorta.  Committed after renaming what you called heap blocks vacuumed
> back to heap blocks scanned, adding heap blocks vacuumed, removing the
> overall progress meter which I don't believe will be anything close to
> accurate, fixing some stylistic stuff, arranging to update multiple counters
> automatically where it could otherwise produce confusion, moving the new
> view near similar ones in the file, reformatting it to follow the style of the
> rest of the file, exposing the counter #defines via a header file in case
> extensions want to use them, and overhauling and substantially expanding
> the documentation.

Thank you for committing this feature.
There is one minor bug.
s/ pgstat_progress_update_params/ pgstat_progress_update_multi_param/g
 Attached patch fixes a minor bug.

Regards,
Vinayak Pokale


pgstat_progress_function-typo.patch
Description: pgstat_progress_function-typo.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-15 Thread Amit Langote
On 2016/03/16 2:33, Robert Haas wrote:
> On Tue, Mar 15, 2016 at 1:16 AM, Amit Langote
>  wrote:
>> On 2016/03/15 3:41, Robert Haas wrote:
>>> Well, I think you need to study the index AMs and figure this out.
>>
>> OK.  I tried to put calls to the callback in appropriate places, but
>> couldn't get the resulting progress numbers to look sane.  So I ended up
>> concluding that any attempt to do so is futile unless I analyze each AM's
>> vacuum code carefully to be able to determine in advance the max bound on
>> the count of blocks that the callback will report.  Anyway, as you
>> suggest, we can improve it later.
> 
> I don't think there is any way to bound that, because new blocks can
> get added to the index concurrently, and we might end up needing to
> scan them.  Reporting the number of blocks scanned can still be
> useful, though - any chance you can just implement that part of it?

Do you mean the changes I made to index bulk delete API for this purpose
in last few versions of this patch?  With it, I have observed that
reported scanned blocks count (that is incremented for every
ReadBufferExtended() call across a index vacuum cycle using the new
callback) can be non-deterministic.  Would there be an accompanying
index_blocks_total (pg_indexes_size()-based) in the view, as well?

>>> But I think for starters you should write a patch that reports the 
>>> following:
>>>
>>> 1. phase
>>> 2. number of heap blocks scanned
>>> 3. number of heap blocks vacuumed
>>> 4. number of completed index vac cycles
>>> 5. number of dead tuples collected since the last index vac cycle
>>> 6. number of dead tuples that we can store before needing to perform
>>> an index vac cycle
>>>
>>> All of that should be pretty straightforward, and then we'd have
>>> something we can ship.  We can add the detailed index reporting later,
>>> when we get to it, perhaps for 9.7.
>>
>> OK, I agree with this plan.  Attached updated patch implements this.
> 
> Sorta.  Committed after renaming what you called heap blocks vacuumed
> back to heap blocks scanned, adding heap blocks vacuumed, removing the
> overall progress meter which I don't believe will be anything close to
> accurate, fixing some stylistic stuff, arranging to update multiple
> counters automatically where it could otherwise produce confusion,
> moving the new view near similar ones in the file, reformatting it to
> follow the style of the rest of the file, exposing the counter
> #defines via a header file in case extensions want to use them, and
> overhauling and substantially expanding the documentation.

Thanks a lot for committing this.  The committed version is much better.
Some of the things therein should really have been part of the final
*submitted* patch; I will try to improve in that area in my submissions
henceforth.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-15 Thread Robert Haas
On Tue, Mar 15, 2016 at 1:16 AM, Amit Langote
 wrote:
> On 2016/03/15 3:41, Robert Haas wrote:
>> On Sat, Mar 12, 2016 at 7:49 AM, Amit Langote  
>> wrote:
>>> Instead, the attached patch adds a IndexBulkDeleteProgressCallback
>>> which AMs should call for every block that's read (say, right before a
>>> call to ReadBufferExtended) as part of a given vacuum run. The
>>> callback with help of some bookkeeping state can count each block and
>>> report to pgstat_progress API. Now, I am not sure if all AMs read 1..N
>>> blocks for every vacuum or if it's possible that some blocks are read
>>> more than once in single vacuum, etc.  IOW, some AM's processing may
>>> be non-linear and counting blocks 1..N (where N is reported total
>>> index blocks) may not be possible.  However, this is the best I could
>>> think of as doing what we are trying to do here. Maybe index AM
>>> experts can chime in on that.
>>>
>>> Thoughts?
>>
>> Well, I think you need to study the index AMs and figure this out.
>
> OK.  I tried to put calls to the callback in appropriate places, but
> couldn't get the resulting progress numbers to look sane.  So I ended up
> concluding that any attempt to do so is futile unless I analyze each AM's
> vacuum code carefully to be able to determine in advance the max bound on
> the count of blocks that the callback will report.  Anyway, as you
> suggest, we can improve it later.

I don't think there is any way to bound that, because new blocks can
get added to the index concurrently, and we might end up needing to
scan them.  Reporting the number of blocks scanned can still be
useful, though - any chance you can just implement that part of it?

>> But I think for starters you should write a patch that reports the following:
>>
>> 1. phase
>> 2. number of heap blocks scanned
>> 3. number of heap blocks vacuumed
>> 4. number of completed index vac cycles
>> 5. number of dead tuples collected since the last index vac cycle
>> 6. number of dead tuples that we can store before needing to perform
>> an index vac cycle
>>
>> All of that should be pretty straightforward, and then we'd have
>> something we can ship.  We can add the detailed index reporting later,
>> when we get to it, perhaps for 9.7.
>
> OK, I agree with this plan.  Attached updated patch implements this.

Sorta.  Committed after renaming what you called heap blocks vacuumed
back to heap blocks scanned, adding heap blocks vacuumed, removing the
overall progress meter which I don't believe will be anything close to
accurate, fixing some stylistic stuff, arranging to update multiple
counters automatically where it could otherwise produce confusion,
moving the new view near similar ones in the file, reformatting it to
follow the style of the rest of the file, exposing the counter
#defines via a header file in case extensions want to use them, and
overhauling and substantially expanding the documentation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-14 Thread Amit Langote
On 2016/03/15 3:41, Robert Haas wrote:
> On Sat, Mar 12, 2016 at 7:49 AM, Amit Langote  wrote:
>> Instead, the attached patch adds a IndexBulkDeleteProgressCallback
>> which AMs should call for every block that's read (say, right before a
>> call to ReadBufferExtended) as part of a given vacuum run. The
>> callback with help of some bookkeeping state can count each block and
>> report to pgstat_progress API. Now, I am not sure if all AMs read 1..N
>> blocks for every vacuum or if it's possible that some blocks are read
>> more than once in single vacuum, etc.  IOW, some AM's processing may
>> be non-linear and counting blocks 1..N (where N is reported total
>> index blocks) may not be possible.  However, this is the best I could
>> think of as doing what we are trying to do here. Maybe index AM
>> experts can chime in on that.
>>
>> Thoughts?
> 
> Well, I think you need to study the index AMs and figure this out.

OK.  I tried to put calls to the callback in appropriate places, but
couldn't get the resulting progress numbers to look sane.  So I ended up
concluding that any attempt to do so is futile unless I analyze each AM's
vacuum code carefully to be able to determine in advance the max bound on
the count of blocks that the callback will report.  Anyway, as you
suggest, we can improve it later.

> But I think for starters you should write a patch that reports the following:
> 
> 1. phase
> 2. number of heap blocks scanned
> 3. number of heap blocks vacuumed
> 4. number of completed index vac cycles
> 5. number of dead tuples collected since the last index vac cycle
> 6. number of dead tuples that we can store before needing to perform
> an index vac cycle
> 
> All of that should be pretty straightforward, and then we'd have
> something we can ship.  We can add the detailed index reporting later,
> when we get to it, perhaps for 9.7.

OK, I agree with this plan.  Attached updated patch implements this.

Thanks,
Amit
>From 0f830273cb2afb528b8b9ed2dcbaf136d4c3e64f Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 15 Mar 2016 10:14:33 +0900
Subject: [PATCH] WIP: Implement progress reporting for VACUUM command.

This basically utilizes the pgstat_progress* API to report a handful of
paramters to indicate its progress.  lazy_vacuum_rel() has already been
taught in b6fb6471 to report command start and end so that it's listed
in pg_stat_get_progress_info('VACUUM'). This commit makes lazy_scan_heap
to report following parameters: vacuum phase, total number of heap blocks,
number of heap blocks vacuumed, dead tuples found since last index vacuum
cycle, dead tuple slots left to fill until the next index vacuum run.
Following phases are identified and reported whenever one changes to
another: 'scanning heap', 'vacuuming indexes', 'vacuuming heap', and
finally 'cleanup'.

A view named pg_stat_progress_vacuum has been added with the following
columns: pid (int), datid (oid), datname (name), relid (oid),
vacuum_phase (text), heap_blks_total, heap_blks_vacuumed,
index_vacuum_count, dead_tup_found, dead_tup_slots (all bigint),
percent_done (numeric).
---
 doc/src/sgml/monitoring.sgml |  114 ++
 src/backend/catalog/system_views.sql |   25 
 src/backend/commands/vacuumlazy.c|   68 
 src/test/regress/expected/rules.out  |   22 +++
 4 files changed, 229 insertions(+), 0 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index ec5328e..e5ffa10 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -507,6 +507,12 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   yet included in pg_stat_user_functions).
  
 
+ 
+  pg_stat_progress_vacuumpg_stat_progress_vacuum
+  One row for each backend (including autovacuum worker processes) running
+  VACUUM, showing current progress in terms of heap pages it
+  has finished processing.
+ 
 

   
@@ -2204,6 +2210,114 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
controls exactly which functions are tracked.
   
 
+  
+   pg_stat_progress_vacuum View
+   
+
+
+  Column
+  Type
+  Description
+ 
+
+
+   
+
+ pid
+ integer
+ Process ID of backend
+
+
+ datid
+ oid
+ OID of the database this backend is connected to
+
+
+ datname
+ name
+ Name of the database this backend is connected to
+
+
+ relid
+ oid
+ OID of the table being vacuumed
+
+
+ vacuum_phase
+ text
+ Current processing phase of vacuum.
+   Possible values are:
+   
+
+ 
+  scanning heap
+ 
+
+
+ 
+  vacuuming indexes
+ 
+
+
+ 
+  vacuuming heap
+ 
+
+
+ 
+  

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-14 Thread Robert Haas
On Sat, Mar 12, 2016 at 7:49 AM, Amit Langote  wrote:
> On Fri, Mar 11, 2016 at 2:31 PM, Amit Langote
>  wrote:
>> On 2016/03/11 13:16, Robert Haas wrote:
>>> On Thu, Mar 10, 2016 at 9:04 PM, Amit Langote
>>>  wrote:
 So, from what I understand here, we should not put total count of index
 pages into st_progress_param; rather, have the client (reading
 pg_stat_progress_vacuum) derive it using pg_indexes_size() (?), as and
 when necessary.  However, only server is able to tell the current position
 within an index vacuuming round (or how many pages into a given index
 vacuuming round), so report that using some not-yet-existent mechanism.
>>>
>>> Isn't that mechanism what you are trying to create in 0003?
>>
>> Right, 0003 should hopefully become that mechanism.
>
> About 0003:
>
> Earlier, it was trying to report vacuumed index block count using
> lazy_tid_reaped() callback for which I had added a index_blkno
> argument to IndexBulkDeleteCallback. Turns out it's not such a good
> place to do what we are trying to do.  This callback is called for
> every heap pointer in an index. Not all index pages contain heap
> pointers, which means the existing callback does not allow to count
> all the index blocks that AM would read to finish a given index vacuum
> run.
>
> Instead, the attached patch adds a IndexBulkDeleteProgressCallback
> which AMs should call for every block that's read (say, right before a
> call to ReadBufferExtended) as part of a given vacuum run. The
> callback with help of some bookkeeping state can count each block and
> report to pgstat_progress API. Now, I am not sure if all AMs read 1..N
> blocks for every vacuum or if it's possible that some blocks are read
> more than once in single vacuum, etc.  IOW, some AM's processing may
> be non-linear and counting blocks 1..N (where N is reported total
> index blocks) may not be possible.  However, this is the best I could
> think of as doing what we are trying to do here. Maybe index AM
> experts can chime in on that.
>
> Thoughts?

Well, I think you need to study the index AMs and figure this out.

But I think for starters you should write a patch that reports the following:

1. phase
2. number of heap blocks scanned
3. number of heap blocks vacuumed
4. number of completed index vac cycles
5. number of dead tuples collected since the last index vac cycle
6. number of dead tuples that we can store before needing to perform
an index vac cycle

All of that should be pretty straightforward, and then we'd have
something we can ship.  We can add the detailed index reporting later,
when we get to it, perhaps for 9.7.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-14 Thread Amit Langote
Hi,

Thanks for taking a look at the patch.

On Mon, Mar 14, 2016 at 6:55 PM, Rahila Syed  wrote:
> Hello,
>
> While I am still looking at this WIP patch, I had one suggestion.
>
> Instead of making changes in the index AM API can we have a call to update
> the shared state using pgstat_progress* API
>
> directly from specific index level code?
>
> Like  pgstat_count_index_scan(rel) call from _bt_first does. Though this
> function basically updates local structures and sends the count to stat
> collector via messages we can have a function which will instead modify the
> shared state using the progress API committed recently.

I chose the callback approach because we need to count the index
blocks within the context of a given vacuum run.  For example, as
proposed, progress_callback_state (in this case, a pointer to the
LVRelStats struct for a given vacuum run) keeps the block count for a
given index vacuum run.  It is reset when next index vacuuming round
starts.  Also, remember that the count is across all indexes.

If we call pgstat_progress API directly from within AM, what I just
described above seems difficult to achieve modularly. But maybe, I'm
missing something.

Aside from whether we should use one of the above two methods, I think
we also have to figure out, for each AM, how to count correctly
considering non-linearity (tree traversal, recursion and such) of most
AMs' vacuum scans.

Thanks,
Amit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-14 Thread Rahila Syed
Hello,

While I am still looking at this WIP patch, I had one suggestion.

Instead of making changes in the index AM API can we have a call to update
the shared state using pgstat_progress* API

directly from specific index level code?

Like  pgstat_count_index_scan(rel) call from _bt_first does. Though this
function basically updates local structures and sends the count to stat
collector via messages we can have a function which will instead modify the
shared state using the progress API committed recently.

Thank you,

Rahila Syed


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-12 Thread Jim Nasby

On 3/10/16 7:48 AM, Robert Haas wrote:

I think the problem is that you can't show the name of a non-global
SQL object (such as a relation) unless the object is in the current
database.  Many of the views in the first group are database-local
views, while things like pg_locks span all databases.  We can show the
datid/relid always, but if we have a relname column it will have to be
NULL unless the datid is our database.


I would prefer that if the object is in another database we at least 
display the OID. That way, if you're logging this info you can go back 
later and figure out what was going on.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-12 Thread Amit Langote
On Fri, Mar 11, 2016 at 2:31 PM, Amit Langote
 wrote:
> On 2016/03/11 13:16, Robert Haas wrote:
>> On Thu, Mar 10, 2016 at 9:04 PM, Amit Langote
>>  wrote:
>>> So, from what I understand here, we should not put total count of index
>>> pages into st_progress_param; rather, have the client (reading
>>> pg_stat_progress_vacuum) derive it using pg_indexes_size() (?), as and
>>> when necessary.  However, only server is able to tell the current position
>>> within an index vacuuming round (or how many pages into a given index
>>> vacuuming round), so report that using some not-yet-existent mechanism.
>>
>> Isn't that mechanism what you are trying to create in 0003?
>
> Right, 0003 should hopefully become that mechanism.

About 0003:

Earlier, it was trying to report vacuumed index block count using
lazy_tid_reaped() callback for which I had added a index_blkno
argument to IndexBulkDeleteCallback. Turns out it's not such a good
place to do what we are trying to do.  This callback is called for
every heap pointer in an index. Not all index pages contain heap
pointers, which means the existing callback does not allow to count
all the index blocks that AM would read to finish a given index vacuum
run.

Instead, the attached patch adds a IndexBulkDeleteProgressCallback
which AMs should call for every block that's read (say, right before a
call to ReadBufferExtended) as part of a given vacuum run. The
callback with help of some bookkeeping state can count each block and
report to pgstat_progress API. Now, I am not sure if all AMs read 1..N
blocks for every vacuum or if it's possible that some blocks are read
more than once in single vacuum, etc.  IOW, some AM's processing may
be non-linear and counting blocks 1..N (where N is reported total
index blocks) may not be possible.  However, this is the best I could
think of as doing what we are trying to do here. Maybe index AM
experts can chime in on that.

Thoughts?

Thanks,
Amit


0001-WIP-Implement-progress-reporting-for-VACUUM-command-v11.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-10 Thread Amit Langote
On 2016/03/11 13:16, Robert Haas wrote:
> On Thu, Mar 10, 2016 at 9:04 PM, Amit Langote
>  wrote:
>> So, from what I understand here, we should not put total count of index
>> pages into st_progress_param; rather, have the client (reading
>> pg_stat_progress_vacuum) derive it using pg_indexes_size() (?), as and
>> when necessary.  However, only server is able to tell the current position
>> within an index vacuuming round (or how many pages into a given index
>> vacuuming round), so report that using some not-yet-existent mechanism.
> 
> Isn't that mechanism what you are trying to create in 0003?

Right, 0003 should hopefully become that mechanism.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-10 Thread Robert Haas
On Thu, Mar 10, 2016 at 9:04 PM, Amit Langote
 wrote:
> So, from what I understand here, we should not put total count of index
> pages into st_progress_param; rather, have the client (reading
> pg_stat_progress_vacuum) derive it using pg_indexes_size() (?), as and
> when necessary.  However, only server is able to tell the current position
> within an index vacuuming round (or how many pages into a given index
> vacuuming round), so report that using some not-yet-existent mechanism.

Isn't that mechanism what you are trying to create in 0003?  But
otherwise, yes, you've accurate summarized what I think we should do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-10 Thread Amit Langote
On 2016/03/10 23:29, Robert Haas wrote:
> On Thu, Mar 10, 2016 at 3:08 AM, Amit Langote
>  wrote:
>> Hi Vinayak,
>>
>> Thanks for the quick review!
> 
> Committed 0001 earlier this morning.

Thanks!

> On 0002:
> 
> +   /* total_index_blks */
> +   current_index_blks = (BlockNumber *) palloc(nindexes *
> sizeof(BlockNumber));
> +   total_index_blks = 0;
> +   for (i = 0; i < nindexes; i++)
> +   {
> +   BlockNumber nblocks =
> RelationGetNumberOfBlocks(Irel[i]);
> +
> +   current_index_blks[i] = nblocks;
> +   total_index_blks += nblocks;
> +   }
> +   pgstat_progress_update_param(PROG_PARAM_VAC_IDX_BLKS, 
> total_index_blks);
> 
> I think this is a bad idea.  The value calculated here isn't
> necessarily accurate, because the number of index blocks can change
> between the time this is calculated and the time the indexes are
> actually vacuumed.  If a client just wants the length of the indexes
> in round figures, that's already SQL-visible, and there's little
> reason to make VACUUM do it all the time whether anyone is looking at
> the progress information or not.  Note that I'm not complaining about
> the fact that you exposed the heap block count, because in that case
> you are exposing the actual value that VACUUM is using to guide its
> work.  The client can get the *current* length of the relation, but
> the value you are exposing gives you the number of blocks *this
> particular VACUUM intends to scan*.  That has some incremental value -
> but the index information doesn't have the same thing going for it.

So, from what I understand here, we should not put total count of index
pages into st_progress_param; rather, have the client (reading
pg_stat_progress_vacuum) derive it using pg_indexes_size() (?), as and
when necessary.  However, only server is able to tell the current position
within an index vacuuming round (or how many pages into a given index
vacuuming round), so report that using some not-yet-existent mechanism.

> On 0003:
> 
> I think you should make this work for all AMs, not just btree, and
> then consolidate it with 0002.

OK.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-10 Thread Robert Haas
On Thu, Mar 10, 2016 at 3:08 AM, Amit Langote
 wrote:
> Hi Vinayak,
>
> Thanks for the quick review!

Committed 0001 earlier this morning.

On 0002:

+   /* total_index_blks */
+   current_index_blks = (BlockNumber *) palloc(nindexes *
sizeof(BlockNumber));
+   total_index_blks = 0;
+   for (i = 0; i < nindexes; i++)
+   {
+   BlockNumber nblocks =
RelationGetNumberOfBlocks(Irel[i]);
+
+   current_index_blks[i] = nblocks;
+   total_index_blks += nblocks;
+   }
+   pgstat_progress_update_param(PROG_PARAM_VAC_IDX_BLKS, total_index_blks);

I think this is a bad idea.  The value calculated here isn't
necessarily accurate, because the number of index blocks can change
between the time this is calculated and the time the indexes are
actually vacuumed.  If a client just wants the length of the indexes
in round figures, that's already SQL-visible, and there's little
reason to make VACUUM do it all the time whether anyone is looking at
the progress information or not.  Note that I'm not complaining about
the fact that you exposed the heap block count, because in that case
you are exposing the actual value that VACUUM is using to guide its
work.  The client can get the *current* length of the relation, but
the value you are exposing gives you the number of blocks *this
particular VACUUM intends to scan*.  That has some incremental value -
but the index information doesn't have the same thing going for it.

On 0003:

I think you should make this work for all AMs, not just btree, and
then consolidate it with 0002.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-10 Thread Robert Haas
On Thu, Mar 10, 2016 at 6:10 AM, Kyotaro HORIGUCHI
 wrote:
> So, I have looked into system_views.sql and picked up what
> catalogs/views shows objects in such way, that is, showing both
> object id and its name.
>
> Show by name: pg_policies, pg_rules, pg_tablespg_matviews,
>   pg_indexes, pg_stats, pg_prepared_xacts, pg_seclabels,
>   pg_stat(io)_*_tables/indexes.schemaname
>   pg_stat_*_functions.schemaname
>
> Show by oid : pg_locks, pg_user_mappings.umid
>
> Both: pg_stat(io)_*_tables/indexes.relid/relname, 
> indexrelid/indexname;
>   pg_stat_activity.datid/datname, usesysid/usename
>   pg_stat_activity.datid/datname, usesysid/usename
>   pg_replication_slots.datoid/database
>   pg_stat_database(_conflicts).datid/datname
>   pg_stat_*_functions.funcid/funcname
>   pg_user_mappings.srvid/srvname,umuser/usename
>
> It's surprising to see this result for me. The nature of this
> view is near to pg_stat* views so it is proper to show *both of
> database and relation* in both of oid and name.
>
> Thoughts?

I think the problem is that you can't show the name of a non-global
SQL object (such as a relation) unless the object is in the current
database.  Many of the views in the first group are database-local
views, while things like pg_locks span all databases.  We can show the
datid/relid always, but if we have a relname column it will have to be
NULL unless the datid is our database.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-10 Thread Kyotaro HORIGUCHI
Hi,

At Thu, 10 Mar 2016 08:21:36 +,  wrote in 
<8e09c2fe530d4008aa0019e38c1d5...@mp-msgss-mbx007.msg.nttdata.co.jp>
> > > So maybe we can add datname as separate column in
> > pg_stat_progress_vacuum, I think it's not required only datid is sufficient.
> > > Any comment?
> > 
> > Why do you think showing the name may be unacceptable?  Wouldn't that
> > be a little more user-friendly?  Though maybe, we can follow the
> > pg_stat_activity style and have both instead, as you suggest.  Attached
> > updated version does that.
> +1
> I think reporting both (datid and datname) is more user-friendly.
> Thank you.

I don't like showing both oid and name and only "user friendry"
doesn't seem to justify adding redundant columns in-a-sense.

So, I have looked into system_views.sql and picked up what
catalogs/views shows objects in such way, that is, showing both
object id and its name.

Show by name: pg_policies, pg_rules, pg_tablespg_matviews,
  pg_indexes, pg_stats, pg_prepared_xacts, pg_seclabels,
  pg_stat(io)_*_tables/indexes.schemaname
  pg_stat_*_functions.schemaname

Show by oid : pg_locks, pg_user_mappings.umid

Both: pg_stat(io)_*_tables/indexes.relid/relname, indexrelid/indexname;
  pg_stat_activity.datid/datname, usesysid/usename
  pg_stat_activity.datid/datname, usesysid/usename
  pg_replication_slots.datoid/database
  pg_stat_database(_conflicts).datid/datname
  pg_stat_*_functions.funcid/funcname
  pg_user_mappings.srvid/srvname,umuser/usename

It's surprising to see this result for me. The nature of this
view is near to pg_stat* views so it is proper to show *both of
database and relation* in both of oid and name.

Thoughts?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-10 Thread pokurev
Hi Amit,

Thank you for updating the patch.
> -Original Message-
> From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp]
> Sent: Thursday, March 10, 2016 5:09 PM
> To: SPS ポクレ ヴィナヤック(三技術) <poku...@pm.nttdata.co.jp>;
> robertmh...@gmail.com
> Cc: horiguchi.kyot...@lab.ntt.co.jp; amitlangot...@gmail.com; pgsql-
> hack...@postgresql.org; SPS 坂野 昌平(三技術) <ban...@nttdata.co.jp>
> Subject: Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
> 
> 
> Hi Vinayak,
> 
> Thanks for the quick review!
> 
> On 2016/03/10 16:22, poku...@pm.nttdata.co.jp wrote:
> >> On 2016/03/10 14:29, Amit Langote wrote:
> >> Updated patches attached.
> > In 0002-
> 
> [ snip ]
> 
> > I think we need to use datid instead of datname.
> > Robert added datid in pg_stat_get_progress_info() and we are using that
> function here.
> > +values[1] = ObjectIdGetDatum(beentry->st_databaseid);
> 
> [ snip ]
> 
> > So I think it's better to report datid not datname.
> > The definition of view is simply like:
> > +CREATE VIEW pg_stat_progress_vacuum AS
> > +SELECT
> > +S.pid AS pid,
> > +S.datid AS datid,
> > +S.relid AS relid,
> > +CASE S.param1
> > +WHEN 1 THEN 'scanning heap'
> > +WHEN 2 THEN 'vacuuming indexes'
> > +WHEN 3 THEN 'vacuuming heap'
> > +WHEN 4 THEN 'cleanup'
> > +ELSE 'unknown phase'
> > +END AS processing_phase,
> > +S.param2 AS total_heap_blocks,
> > +S.param3 AS current_heap_block,
> > +S.param4 AS total_index_blocks,
> > +S.param5 AS index_blocks_done,
> > +S.param6 AS index_vacuum_count,
> > +CASE S.param2
> > +WHEN 0 THEN round(100.0, 2)
> > +   ELSE round((S.param3 + 1) * 100.0 / S.param2, 2)
> > +END AS percent_done
> > +FROM pg_stat_get_progress_info('VACUUM') AS S;
> >
> > So maybe we can add datname as separate column in
> pg_stat_progress_vacuum, I think it's not required only datid is sufficient.
> > Any comment?
> 
> Why do you think showing the name may be unacceptable?  Wouldn't that
> be a little more user-friendly?  Though maybe, we can follow the
> pg_stat_activity style and have both instead, as you suggest.  Attached
> updated version does that.
+1
I think reporting both (datid and datname) is more user-friendly.
Thank you.

Regards,
Vinayak Pokale

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-10 Thread Amit Langote

Hi Vinayak,

Thanks for the quick review!

On 2016/03/10 16:22, poku...@pm.nttdata.co.jp wrote:
>> On 2016/03/10 14:29, Amit Langote wrote:
>> Updated patches attached.
> In 0002-

[ snip ]

> I think we need to use datid instead of datname.
> Robert added datid in pg_stat_get_progress_info() and we are using that 
> function here.
> +values[1] = ObjectIdGetDatum(beentry->st_databaseid);

[ snip ]

> So I think it's better to report datid not datname.
> The definition of view is simply like:
> +CREATE VIEW pg_stat_progress_vacuum AS
> +SELECT
> +S.pid AS pid,
> +S.datid AS datid,
> +S.relid AS relid,
> +CASE S.param1
> +WHEN 1 THEN 'scanning heap'
> +WHEN 2 THEN 'vacuuming indexes'
> +WHEN 3 THEN 'vacuuming heap'
> +WHEN 4 THEN 'cleanup'
> +ELSE 'unknown phase'
> +END AS processing_phase,
> +S.param2 AS total_heap_blocks,
> +S.param3 AS current_heap_block,
> +S.param4 AS total_index_blocks,
> +S.param5 AS index_blocks_done,
> +S.param6 AS index_vacuum_count,
> +CASE S.param2
> +WHEN 0 THEN round(100.0, 2)
> + ELSE round((S.param3 + 1) * 100.0 / S.param2, 2)
> +END AS percent_done
> +FROM pg_stat_get_progress_info('VACUUM') AS S;
> 
> So maybe we can add datname as separate column in pg_stat_progress_vacuum, I 
> think it's not required only datid is sufficient.
> Any comment?

Why do you think showing the name may be unacceptable?  Wouldn't that be a
little more user-friendly?  Though maybe, we can follow the
pg_stat_activity style and have both instead, as you suggest.  Attached
updated version does that.

Thanks,
Amit
>From 7cb702c7fae9fceef3048a82522390844c5a67cc Mon Sep 17 00:00:00 2001
From: amit 
Date: Thu, 10 Mar 2016 11:25:20 +0900
Subject: [PATCH 1/3] Some minor fixes in commit b6fb6471.

---
 src/backend/postmaster/pgstat.c |7 ---
 src/backend/utils/adt/pgstatfuncs.c |2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index ce5da3e..4424cb8 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -2861,8 +2861,8 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 /*---
  * pgstat_progress_start_command() -
  *
- * Set st_command in own backend entry.  Also, zero-initialize
- * st_progress_param array.
+ * Set st_progress_command (and st_progress_command_target) in own backend
+ * entry.  Also, zero-initialize st_progress_param array.
  *---
  */
 void
@@ -2904,7 +2904,8 @@ pgstat_progress_update_param(int index, int64 val)
 /*---
  * pgstat_progress_end_command() -
  *
- * Update index'th member in st_progress_param[] of own backend entry.
+ * Reset st_progress_command (and st_progress_command_target) in own backend
+ * entry.  This signals the end of the command.
  *---
  */
 void
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 0c790ff..2fb51fa 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -614,7 +614,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 		else
 		{
 			nulls[2] = true;
-			for (i = 1; i < PGSTAT_NUM_PROGRESS_PARAM + 1; i++)
+			for (i = 0; i < PGSTAT_NUM_PROGRESS_PARAM; i++)
 nulls[i+3] = true;
 		}
 
-- 
1.7.1

>From 9e3ac989d8583c6ba5e74945602aeaacfaee127f Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 7 Mar 2016 14:38:34 +0900
Subject: [PATCH 2/3] WIP: Implement progress reporting for VACUUM command.

This basically utilizes the pgstat_progress* API to report a handful of
paramters to indicate its progress.  lazy_vacuum_rel() and lazy_scan_heap()
have been altered to report command start, command target table, and the
following parameters: processing phase, number of heap blocks, number of
index blocks (all indexes), current heap block number in the main scan loop
(whenever changes), index blocks vacuumed (once per finished index vacuum),
and number of index vacuum passes (every time when all indexes are vacuumed).
Following processing phases are identified and reported whenever one changes
to another: 'scanning heap', 'vacuuming indexes', 'vacuuming heap', and
'cleanup'.

TODO: find a way to report index pages vacuumed in a more granular manner than
the current report per index vacuumed.

A view named pg_stat_progress_vacuum has been added that shows these values.
---
 doc/src/sgml/monitoring.sgml |  111 ++
 src/backend/catalog/system_views.sql |   25 
 src/backend/commands/vacuumlazy.c|   73 ++-
 src/test/regress/expected/rules.out  |   23 +++
 4 files changed, 231 insertions(+), 1 deletions(-)

diff --git 

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-09 Thread pokurev
Hi Amit,

Thank you for updating the patch.

> -Original Message-
> From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp]
> Sent: Thursday, March 10, 2016 3:36 PM
> To: Robert Haas <robertmh...@gmail.com>
> Cc: Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp>; Amit Langote
> <amitlangot...@gmail.com>; SPS ポクレ ヴィナヤック(三技術)
> <poku...@pm.nttdata.co.jp>; pgsql-hackers@postgresql.org; SPS 坂野 昌
> 平(三技術) <ban...@nttdata.co.jp>
> Subject: Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
> 
> On 2016/03/10 14:29, Amit Langote wrote:
> > I rebased remainder patches (attached).
> >
> > 0001 is a small patch to fix issues reported by Tomas and Vinayak.
> > 0002 and 0003 are WIP patches to implement progress reporting for
> vacuum.
> 
> Oops, in 0002, I wrongly joined with pg_class in the definition of
> pg_stat_progress_vacuum to output the schema-qualified name of the table
> being vacuumed.  That means we need to connect to the correct database,
> which is undesirable. Updated version fixes that (shows database name and
> relid).  You may also have noticed that I said pg_stat_progress_vacuum, not
> pg_stat_vacuum_progress (IMHO, the former is a better name).
> 
> Updated patches attached.
In 0002-
+CREATE VIEW pg_stat_progress_vacuum AS
+SELECT
+S.pid AS pid,
+D.datname AS database,
+S.relid AS relid,
.
.
.
.
+FROM pg_database D, pg_stat_get_progress_info('VACUUM') AS S
+WHERE S.datid = D.oid;
I think we need to use datid instead of datname.
Robert added datid in pg_stat_get_progress_info() and we are using that 
function here.
+values[1] = ObjectIdGetDatum(beentry->st_databaseid);

+DATA(insert OID = 3318 (  pg_stat_get_progress_info   PGNSP PGUID 12 1 
100 0 0 f f f f f t s r 1 0 2249 "25" 
"{25,23,26,26,20,20,20,20,20,20,20,20,20,20}" "{i,o,o,o,o,o,o,o,o,o,o,o,o,o}" 
"{cmdtype,pid,datid,relid,param1,param2,param3,param4,param5,param6,param7,param8,param9,param10}"
 _null_ _null_ pg_stat_get_progress_info _null_ _null_ _null_ ));

So I think it's better to report datid not datname.
The definition of view is simply like:
+CREATE VIEW pg_stat_progress_vacuum AS
+SELECT
+S.pid AS pid,
+S.datid AS datid,
+S.relid AS relid,
+CASE S.param1
+WHEN 1 THEN 'scanning heap'
+WHEN 2 THEN 'vacuuming indexes'
+WHEN 3 THEN 'vacuuming heap'
+WHEN 4 THEN 'cleanup'
+ELSE 'unknown phase'
+END AS processing_phase,
+S.param2 AS total_heap_blocks,
+S.param3 AS current_heap_block,
+S.param4 AS total_index_blocks,
+S.param5 AS index_blocks_done,
+S.param6 AS index_vacuum_count,
+CASE S.param2
+WHEN 0 THEN round(100.0, 2)
+   ELSE round((S.param3 + 1) * 100.0 / S.param2, 2)
+END AS percent_done
+FROM pg_stat_get_progress_info('VACUUM') AS S;

In the pg_stat_activity view, datid and datname are the separate columns. So 
maybe we can add datname as separate column in pg_stat_progress_vacuum, but I 
think it's not required only datid is sufficient.
Any comment?

Regards,
Vinayak Pokale

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-09 Thread Amit Langote
On 2016/03/10 14:29, Amit Langote wrote:
> I rebased remainder patches (attached).
> 
> 0001 is a small patch to fix issues reported by Tomas and Vinayak.  0002
> and 0003 are WIP patches to implement progress reporting for vacuum.

Oops, in 0002, I wrongly joined with pg_class in the definition of
pg_stat_progress_vacuum to output the schema-qualified name of the table
being vacuumed.  That means we need to connect to the correct database,
which is undesirable. Updated version fixes that (shows database name and
relid).  You may also have noticed that I said pg_stat_progress_vacuum,
not pg_stat_vacuum_progress (IMHO, the former is a better name).

Updated patches attached.

Thanks,
Amit
>From 7cb702c7fae9fceef3048a82522390844c5a67cc Mon Sep 17 00:00:00 2001
From: amit 
Date: Thu, 10 Mar 2016 11:25:20 +0900
Subject: [PATCH 1/3] Some minor fixes in commit b6fb6471.

---
 src/backend/postmaster/pgstat.c |7 ---
 src/backend/utils/adt/pgstatfuncs.c |2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index ce5da3e..4424cb8 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -2861,8 +2861,8 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 /*---
  * pgstat_progress_start_command() -
  *
- * Set st_command in own backend entry.  Also, zero-initialize
- * st_progress_param array.
+ * Set st_progress_command (and st_progress_command_target) in own backend
+ * entry.  Also, zero-initialize st_progress_param array.
  *---
  */
 void
@@ -2904,7 +2904,8 @@ pgstat_progress_update_param(int index, int64 val)
 /*---
  * pgstat_progress_end_command() -
  *
- * Update index'th member in st_progress_param[] of own backend entry.
+ * Reset st_progress_command (and st_progress_command_target) in own backend
+ * entry.  This signals the end of the command.
  *---
  */
 void
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 0c790ff..2fb51fa 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -614,7 +614,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 		else
 		{
 			nulls[2] = true;
-			for (i = 1; i < PGSTAT_NUM_PROGRESS_PARAM + 1; i++)
+			for (i = 0; i < PGSTAT_NUM_PROGRESS_PARAM; i++)
 nulls[i+3] = true;
 		}
 
-- 
1.7.1

>From 3ca85c000ec2cd6148d0b3adb35aefa7e29ab23d Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 7 Mar 2016 14:38:34 +0900
Subject: [PATCH 2/3] WIP: Implement progress reporting for VACUUM command.

This basically utilizes the pgstat_progress* API to report a handful of
paramters to indicate its progress.  lazy_vacuum_rel() and lazy_scan_heap()
have been altered to report command start, command target table, and the
following parameters: processing phase, number of heap blocks, number of
index blocks (all indexes), current heap block number in the main scan loop
(whenever changes), index blocks vacuumed (once per finished index vacuum),
and number of index vacuum passes (every time when all indexes are vacuumed).
Following processing phases are identified and reported whenever one changes
to another: 'scanning heap', 'vacuuming indexes', 'vacuuming heap', and
'cleanup'.

TODO: find a way to report index pages vacuumed in a more granular manner than
the current report per index vacuumed.

A view named pg_stat_progress_vacuum has been added that shows these values.
---
 doc/src/sgml/monitoring.sgml |  106 ++
 src/backend/catalog/system_views.sql |   24 
 src/backend/commands/vacuumlazy.c|   73 +++-
 src/test/regress/expected/rules.out  |   22 +++
 4 files changed, 224 insertions(+), 1 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 85459d0..082f94c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -507,6 +507,12 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   yet included in pg_stat_user_functions).
  
 
+ 
+  pg_stat_progress_vacuumpg_stat_progress_vacuum
+  One row for each backend (including autovacuum worker processes) running
+  VACUUM, showing current progress in terms of heap pages it
+  has finished processing.
+ 
 

   
@@ -1822,6 +1828,106 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
controls exactly which functions are tracked.
   
 
+  
+   pg_stat_progress_vacuum View
+   
+
+
+  Column
+  Type
+  Description
+ 
+
+
+   
+
+ pid
+ integer
+ Process ID of backend
+
+
+ database
+ name
+ Name of the database this backend is connected to
+
+
+ relid
+ Oid
+ OID of the table being vacuumed
+
+
+ processing_phase
+ 

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-09 Thread Amit Langote
On 2016/03/10 2:16, Robert Haas wrote:
> On Wed, Mar 9, 2016 at 2:37 AM, Amit Langote
>  wrote:
>> On 2016/03/09 10:11, Amit Langote wrote:
>>> The attached revision addresses above and one of Horiguchi-san's comments
>>> in his email yesterday.
>>
>> I fixed one more issue in 0002 per Horiguchi-san's comment.  Sorry about
>> so many versions.
> 
> I've committed 0001 with heavy revisions.  Just because we don't need
> an SQL-visible function to clear the command progress doesn't mean we
> don't need to clear it at all; rather, it has to happen automatically.
> I also did a bunch of identifier renaming, added datid to the view
> output, adjusted the comments, and so on.  Please rebase the remainder
> of the series.  Thanks.

Great, thanks a lot for the review and committing in much better shape!

I rebased remainder patches (attached).

0001 is a small patch to fix issues reported by Tomas and Vinayak.  0002
and 0003 are WIP patches to implement progress reporting for vacuum.

Thanks,
Amit
>From 7cb702c7fae9fceef3048a82522390844c5a67cc Mon Sep 17 00:00:00 2001
From: amit 
Date: Thu, 10 Mar 2016 11:25:20 +0900
Subject: [PATCH 1/3] Some minor fixes in commit b6fb6471.

---
 src/backend/postmaster/pgstat.c |7 ---
 src/backend/utils/adt/pgstatfuncs.c |2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index ce5da3e..4424cb8 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -2861,8 +2861,8 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 /*---
  * pgstat_progress_start_command() -
  *
- * Set st_command in own backend entry.  Also, zero-initialize
- * st_progress_param array.
+ * Set st_progress_command (and st_progress_command_target) in own backend
+ * entry.  Also, zero-initialize st_progress_param array.
  *---
  */
 void
@@ -2904,7 +2904,8 @@ pgstat_progress_update_param(int index, int64 val)
 /*---
  * pgstat_progress_end_command() -
  *
- * Update index'th member in st_progress_param[] of own backend entry.
+ * Reset st_progress_command (and st_progress_command_target) in own backend
+ * entry.  This signals the end of the command.
  *---
  */
 void
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 0c790ff..2fb51fa 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -614,7 +614,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 		else
 		{
 			nulls[2] = true;
-			for (i = 1; i < PGSTAT_NUM_PROGRESS_PARAM + 1; i++)
+			for (i = 0; i < PGSTAT_NUM_PROGRESS_PARAM; i++)
 nulls[i+3] = true;
 		}
 
-- 
1.7.1

>From 4567da933b25a9e23fe1a72a6994d3a9b7bc1ea4 Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 7 Mar 2016 14:38:34 +0900
Subject: [PATCH 2/3] WIP: Implement progress reporting for VACUUM command.

This basically utilizes the pgstat_progress* API to report a handful of
paramters to indicate its progress.  lazy_vacuum_rel() and lazy_scan_heap()
have been altered to report command start, command target table, and the
following parameters: processing phase, number of heap blocks, number of
index blocks (all indexes), current heap block number in the main scan loop
(whenever changes), index blocks vacuumed (once per finished index vacuum),
and number of index vacuum passes (every time when all indexes are vacuumed).
Following processing phases are identified and reported whenever one changes
to another: 'scanning heap', 'vacuuming indexes', 'vacuuming heap', and
'cleanup'.

TODO: find a way to report index pages vacuumed in a more granular manner than
the current report per index vacuumed.

A view named pg_stat_vacuum_progress has been added that shows these values.
---
 doc/src/sgml/monitoring.sgml |  106 ++
 src/backend/catalog/system_views.sql |   24 
 src/backend/commands/vacuumlazy.c|   73 +++-
 src/test/regress/expected/rules.out  |   24 
 4 files changed, 226 insertions(+), 1 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 85459d0..544f959 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -507,6 +507,12 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   yet included in pg_stat_user_functions).
  
 
+ 
+  pg_stat_progress_vacuumpg_stat_progress_vacuum
+  One row for each backend (including autovacuum worker processes) running
+  VACUUM, showing current progress in terms of heap pages it
+  has finished processing.
+ 
 

   
@@ -1822,6 +1828,106 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
controls exactly which functions are tracked.
   
 
+  
+   pg_stat_progress_vacuum View
+   
+

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-09 Thread pokurev
Hi,
Thank you very much for committing this feature.
> -Original Message-
> From: Robert Haas [mailto:robertmh...@gmail.com]
> Sent: Thursday, March 10, 2016 2:17 AM
> To: Amit Langote <langote_amit...@lab.ntt.co.jp>
> Cc: Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp>; Amit Langote
> <amitlangot...@gmail.com>; SPS ポクレ ヴィナヤック(三技術)
> <poku...@pm.nttdata.co.jp>; pgsql-hackers@postgresql.org; SPS 坂野 昌
> 平(三技術) <ban...@nttdata.co.jp>
> Subject: Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
> 
> On Wed, Mar 9, 2016 at 2:37 AM, Amit Langote
> <langote_amit...@lab.ntt.co.jp> wrote:
> > On 2016/03/09 10:11, Amit Langote wrote:
> >> The attached revision addresses above and one of Horiguchi-san's
> >> comments in his email yesterday.
> >
> > I fixed one more issue in 0002 per Horiguchi-san's comment.  Sorry
> > about so many versions.
> 
> I've committed 0001 with heavy revisions.  Just because we don't need an
> SQL-visible function to clear the command progress doesn't mean we don't
> need to clear it at all; rather, it has to happen automatically.
> I also did a bunch of identifier renaming, added datid to the view output,
> adjusted the comments, and so on.  Please rebase the remainder of the
> series.  Thanks.
Some minor typos need to fix.
+/*---
 + * pgstat_progress_start_command() -
 + *
 + * Set st_command in own backend entry.  Also, zero-initialize
 + * st_progress_param array.
 + *---
 + */
In the description we need to use st_progress_command instead of st_command.

+/*---
 + * pgstat_progress_end_command() -
 + *
 + * Update index'th member in st_progress_param[] of own backend entry.
 + *---
 + */
Here also need to change the description.

Regards,
Vinayak Pokale

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-09 Thread Tomas Vondra
Hi,


On Wed, 2016-03-09 at 12:16 -0500, Robert Haas wrote:
> On Wed, Mar 9, 2016 at 2:37 AM, Amit Langote
>  wrote:
> > On 2016/03/09 10:11, Amit Langote wrote:
> >> The attached revision addresses above and one of Horiguchi-san's comments
> >> in his email yesterday.
> >
> > I fixed one more issue in 0002 per Horiguchi-san's comment.  Sorry about
> > so many versions.
> 
> I've committed 0001 with heavy revisions.  Just because we don't need
> an SQL-visible function to clear the command progress doesn't mean we
> don't need to clear it at all; rather, it has to happen automatically.
> I also did a bunch of identifier renaming, added datid to the view
> output, adjusted the comments, and so on.  Please rebase the remainder
> of the series.  Thanks.

I'm pretty sure this piece of code ends up accessing subscripts above
array bounds (and gcc 4.6.4 complains about that):

#define PG_STAT_GET_PROGRESS_COLS PGSTAT_NUM_PROGRESS_PARAM + 3

...

boolnulls[PG_STAT_GET_PROGRESS_COLS];

...

nulls[2] = true;
for (i = 1; i < PGSTAT_NUM_PROGRESS_PARAM + 1; i++)
nulls[i+3] = true;

Now let's say PARAM=10, which means COLS=13. The last index accessed by
the loop will be i=10, which means we'll do this:

nulls[13] = true;

which is above bounds.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-09 Thread Robert Haas
On Wed, Mar 9, 2016 at 2:37 AM, Amit Langote
 wrote:
> On 2016/03/09 10:11, Amit Langote wrote:
>> The attached revision addresses above and one of Horiguchi-san's comments
>> in his email yesterday.
>
> I fixed one more issue in 0002 per Horiguchi-san's comment.  Sorry about
> so many versions.

I've committed 0001 with heavy revisions.  Just because we don't need
an SQL-visible function to clear the command progress doesn't mean we
don't need to clear it at all; rather, it has to happen automatically.
I also did a bunch of identifier renaming, added datid to the view
output, adjusted the comments, and so on.  Please rebase the remainder
of the series.  Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-08 Thread pokurev
Hi Amit,

> -Original Message-
> From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp]
> Sent: Wednesday, March 09, 2016 4:29 PM
> To: Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp>
> Cc: robertmh...@gmail.com; amitlangot...@gmail.com; SPS ポクレ ヴィ
> ナヤック(三技術) <poku...@pm.nttdata.co.jp>; pgsql-
> hack...@postgresql.org; SPS 坂野 昌平(三技術) <ban...@nttdata.co.jp>
> Subject: Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
> 
> > On 2016/03/08 18:19, Kyotaro HORIGUCHI wrote:
> >> +WHEN 0 THEN 100::numeric(5, 2)
> >> +ELSE ((S.param3 + 1)::numeric / S.param2 *
> 100)::numeric(5, 2)
> >>
> >> This usage of numeric seems overkill to me.
> >
> > Hmm, how could this rather be written?
> 
> OK, agreed about the overkill. Following might be better:
> 
> + WHEN 0 THEN round(100.0, 2)
> + ELSE round((S.param3 + 1) * 100.0 / S.param2, 2)
+1

> Will update that patch.
> 
> Thanks,
> Amit
> 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-08 Thread Amit Langote
On 2016/03/09 10:11, Amit Langote wrote:
> The attached revision addresses above and one of Horiguchi-san's comments
> in his email yesterday.

I fixed one more issue in 0002 per Horiguchi-san's comment.  Sorry about
so many versions.

Thanks,
Amit
>From 9473230af72e0a0e3b60a8ddf1922698f7f17145 Mon Sep 17 00:00:00 2001
From: Amit 
Date: Sun, 28 Feb 2016 01:50:07 +0900
Subject: [PATCH 1/3] Provide a way for utility commands to report progress

Commands can update a set of parameters in shared memory using:

  pgstat_progress_update_param(param_index, param_value)

Up to 10 independent 64-bit integer values can be published by commands.
In addition to those, a command should always report its BackendCommandType
and the OID of the relation it is about to begin processing at the beginning
of the processing using:

  pgstat_progress_start_command(cmdtype)
  pgstat_progress_set_command_target(relid)

A view can be defined in system_views.sql that outputs the values returned
by pg_stat_get_progress_info(cmdtype), where 'cmdtype' is numeric value as
mentioned above.  Each such view has columns corresponding to the counters
published by respective commands.
---
 src/backend/postmaster/pgstat.c |   60 +++
 src/backend/utils/adt/pgstatfuncs.c |   91 +++
 src/include/catalog/pg_proc.h   |2 +
 src/include/pgstat.h|   25 ++
 4 files changed, 178 insertions(+), 0 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index da768c6..a7d0133 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -2731,6 +2731,7 @@ pgstat_bestart(void)
 	beentry->st_clienthostname[NAMEDATALEN - 1] = '\0';
 	beentry->st_appname[NAMEDATALEN - 1] = '\0';
 	beentry->st_activity[pgstat_track_activity_query_size - 1] = '\0';
+	beentry->st_command = COMMAND_INVALID;
 
 	pgstat_increment_changecount_after(beentry);
 
@@ -2851,6 +2852,65 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	pgstat_increment_changecount_after(beentry);
 }
 
+/*---
+ * pgstat_progress_update_param() -
+ *
+ * Update index'th member in st_progress_param[] of own backend entry.
+ *---
+ */
+void
+pgstat_progress_update_param(int index, int64 val)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	if(!beentry)
+		return;
+
+	if (!pgstat_track_activities)
+		return;
+
+	Assert(index >= 0 && index < PGSTAT_NUM_PROGRESS_PARAM);
+
+	pgstat_increment_changecount_before(beentry);
+	beentry->st_progress_param[index] = val;
+	pgstat_increment_changecount_after(beentry);
+}
+
+/*---
+ * pgstat_progress_start_command()-
+ *
+ * Set st_command in own backend entry.  Also, zero-initialize
+ * st_progress_param array.
+ *---
+ */
+void
+pgstat_progress_start_command(BackendCommandType cmdtype)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	pgstat_increment_changecount_before(beentry);
+	beentry->st_command = cmdtype;
+	MemSet(>st_progress_param, 0,
+		   sizeof(beentry->st_progress_param));
+	pgstat_increment_changecount_after(beentry);
+}
+
+/*---
+ * pgstat_progress_set_command_target()-
+ *
+ * Set st_command_target in own backend entry.
+ *---
+ */
+void
+pgstat_progress_set_command_target(Oid relid)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	pgstat_increment_changecount_before(beentry);
+	beentry->st_command_target = relid;
+	pgstat_increment_changecount_after(beentry);
+}
+
 /* --
  * pgstat_report_appname() -
  *
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 1b22fcc..a12310d 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -64,6 +64,7 @@ extern Datum pg_stat_get_backend_xact_start(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_backend_start(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_backend_client_addr(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_backend_client_port(PG_FUNCTION_ARGS);
+extern Datum pg_stat_get_progress_info(PG_FUNCTION_ARGS);
 
 extern Datum pg_stat_get_db_numbackends(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_db_xact_commit(PG_FUNCTION_ARGS);
@@ -525,6 +526,96 @@ pg_stat_get_backend_idset(PG_FUNCTION_ARGS)
 }
 
 /*
+ * Returns progress parameter values of backends running a given command
+ */
+Datum
+pg_stat_get_progress_info(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_PROGRESS_COLS	PGSTAT_NUM_PROGRESS_PARAM + 2
+	int			num_backends = pgstat_fetch_stat_numbackends();
+	int			curr_backend;
+	int32		cmdtype = PG_GETARG_INT32(0);
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-08 Thread Amit Langote
> On 2016/03/08 18:19, Kyotaro HORIGUCHI wrote:
>> +  WHEN 0 THEN 100::numeric(5, 2)
>> +  ELSE ((S.param3 + 1)::numeric / S.param2 * 
>> 100)::numeric(5, 2)
>>
>> This usage of numeric seems overkill to me.
> 
> Hmm, how could this rather be written?

OK, agreed about the overkill. Following might be better:

+ WHEN 0 THEN round(100.0, 2)
+ ELSE round((S.param3 + 1) * 100.0 / S.param2, 2)

Will update that patch.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-08 Thread Amit Langote

Horiguchi-san,

Thanks for the review!

On 2016/03/08 18:19, Kyotaro HORIGUCHI wrote:
>> Updated versions attached.
>>
>> * changed st_progress_param to int64 and so did the argument of
>> pgstat_progress_update_param().  Likewise changed param1..param10 of
>> pg_stat_get_progress_info()'s output columns to bigint.
>>
>> * Added back the Oid field st_command_target and corresponding function
>> pgstat_progress_set_command_target(Oid).
> 
> + beentry->st_command = COMMAND_INVALID;
> + MemSet(>st_progress_param, 0, 
> sizeof(beentry->st_progress_param));
> 
> The MemSet seems useless since it gets the same initialization on
> setting st_command.

Right, every backend start should not have to pay that price.  Fixed in
the latest version.

> 
> + /*
> +  * Report values for only those backends which are running the 
> given
> +  * command.  XXX - privilege check is maybe dubious.
> +  */
> + if (!beentry ||
> + beentry->st_command != cmdtype ||
> + !has_privs_of_role(GetUserId(), beentry->st_userid))
> + continue;
> 
> We can simplly ignore unpriviledged backends, or all zeroz or
> nulls to signal that the caller has no priviledge.

As suggested by Robert, used pg_stat_get_activity() style.  In this case,
show 'pid' to all but the rest only to role members.

> 0002
> 
> +   FROM pg_stat_get_progress_info(1) AS S;
> 
> Ah... This magick number seems quite bad.. The function should
> take the command type in maybe string type.
> 
> +   FROM pg_stat_get_progress_info('lazy vacuum') AS S;
> 
> Using an array of the names would be acceptable, maybe.
> 
> | char *progress_command_names[] = {'lazy vacuum', NULL};

Hm, I think the way it is *may* be OK the way it is, but...

As done in the patch, the way we identify commands is with the enum
BackendCommandType:

+typedef enum BackendCommandType
+{
+COMMAND_INVALID = 0,
+COMMAND_LAZY_VACUUM
+} BackendCommandType;

Perhaps we could create a struct:

typedef struct PgStatProgressCommand
{
char *cmd_name;
BackendCommandTypecmd_type;
} PgStatProgressCommand;

static const struct PgStatProgressCommand commands[] = {
{"vacuum", COMMAND_LAZY_VACUUM},
{NULL, COMMAND_INVALID}
};

> However the numbers for the phases ('scanning heap' and so..) is
> acceptable for me for reasons uncertain to me, it also could be
> represented in names but is might be rahter bothersome..

In initial versions of the patch, it used to be char * that was passed for
identifying phases.  But, then we got rid of char * progress parameters
altogether. So, there are no longer any text columns in
pg_stat_get_progress_info()'s result.  It may not work out well in long
run to forever not have those (your recent example comes to mind).

> 
> +   WHEN 0 THEN 100::numeric(5, 2)
> +   ELSE ((S.param3 + 1)::numeric / S.param2 * 
> 100)::numeric(5, 2)
> 
> This usage of numeric seems overkill to me.

Hmm, how could this rather be written?

>> * I attempted to implement a method to report index blocks done from
>> lazy_tid_reaped() albeit with some limitations. Patch 0003 is that
>> attempt.  In summary, I modified the index bulk delete callback interface
>> to receive a BlockNumber argument index_blkno:

[ snip ]

>> way of traversing the index pages. I dared only touch the btree case to
>> make it pass current block number to the callback. It finishes with
>> index_blks_done << total_index_blks since I guess the callback is called
>> only on the leaf pages. Any ideas?
> 
> To the contrary, I suppose it counts one index page more than
> once for the cases of uncorrelated heaps. index_blks_vacuumd can
> exceed RelationGetNumberOfBlocks() in extreme cases. If I'm not
> missing something, it stands on a quite fragile graound.

Yeah, the method is not entirely foolproof yet.

>> * I am also tempted to add num_dead_tuples and dead_tuples_vacuumed to add
>> granularity to 'vacuuming heap' phase but didn't in this version. Should we?
> 
> How do you think they are to be used?

I just realized there are objections to some columns be counters for pages
and others counting tuples.  So, I guess I withdraw.  I am just worried
that 'vacuuming heap' phase may take arbitrarily long if dead tuples array
is big.  Since we were thinking of adding more granularity to 'vacuuming
indexes' phase, I thought we should do for the former too.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-08 Thread Amit Langote
On 2016/03/09 9:22, Kyotaro HORIGUCHI wrote:
>> On Wed, Mar 9, 2016 at 12:24 AM, Robert Haas  wrote:
>>> This patch has been worked on by so many people and reviewed by so
>>> many people that I can't keep track of who should be credited when it
>>> gets committed.  Could someone provide a list of author(s) and
>>> reviewer(s)?
>>
>> Original authors are Rahila Syed and Vinayak Pokale.
>>
>> I have been reviewing this for last few CFs. I sent in last few
>> revisions as well.
> 
> The owner of this is Vinayak and, ah, I forgot to add myself as a
> reviewer. I have also reviewed this for last few CFs. 
> 
> So, as looking into CF app, it seems not so inconsistent with the
> persons who appears in this thread for thses three CFs.
> 
> Authors: Vinayak Pokale, Rahila Syed, Amit Langote
> Reviewers: Amit Langote, Kyotaro Horiguchi
> 
> Is there anyone who shold be added in this list?

Jim Nasby, Thom Brown, Masahiko Sawada, Fujii Masao, Masanori Oyama and of
course, Robert himself.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-08 Thread Amit Langote
On 2016/03/09 0:24, Robert Haas wrote:
> On Tue, Mar 8, 2016 at 3:02 AM, Amit Langote
>  wrote:
>> Updated versions attached.
>>
>> * changed st_progress_param to int64 and so did the argument of
>> pgstat_progress_update_param().  Likewise changed param1..param10 of
>> pg_stat_get_progress_info()'s output columns to bigint.
>>
>> * Added back the Oid field st_command_target and corresponding function
>> pgstat_progress_set_command_target(Oid).
> 
> What the heck do we have an SQL-visible pg_stat_reset_local_progress()
> for?  Surely if we ever need that, it's a bug.

OK, now I am not sure what I was thinking adding that function. Removed.

> I think pgstat_progress_update_param() should Assert(index >= 0 &&
> index < N_PROGRESS_PARAM).  But I'd rename N_PROGRESS_PARAM to
> PGSTAT_NUM_PROGRESS_PARAM.

Agreed, done.

> Regarding "XXX - privilege check is maybe dubious" - I think the
> privilege check here should match pg_stat_activity.  If it does,
> there's nothing dubious about that IMHO.

OK, done.  So, it shows pid column to all, while rest of the values -
relid, param1..param10 are only shown to role members.  Unlike
pg_stat_activity, there is no text column to stash a "" message into, so all that's done is to output null values.

The attached revision addresses above and one of Horiguchi-san's comments
in his email yesterday.

Thanks a lot for the review.

Thanks,
Amit
>From 9473230af72e0a0e3b60a8ddf1922698f7f17145 Mon Sep 17 00:00:00 2001
From: Amit 
Date: Sun, 28 Feb 2016 01:50:07 +0900
Subject: [PATCH 1/3] Provide a way for utility commands to report progress

Commands can update a set of parameters in shared memory using:

  pgstat_progress_update_param(param_index, param_value)

Up to 10 independent 64-bit integer values can be published by commands.
In addition to those, a command should always report its BackendCommandType
and the OID of the relation it is about to begin processing at the beginning
of the processing using:

  pgstat_progress_start_command(cmdtype)
  pgstat_progress_set_command_target(relid)

A view can be defined in system_views.sql that outputs the values returned
by pg_stat_get_progress_info(cmdtype), where 'cmdtype' is numeric value as
mentioned above.  Each such view has columns corresponding to the counters
published by respective commands.
---
 src/backend/postmaster/pgstat.c |   60 +++
 src/backend/utils/adt/pgstatfuncs.c |   91 +++
 src/include/catalog/pg_proc.h   |2 +
 src/include/pgstat.h|   25 ++
 4 files changed, 178 insertions(+), 0 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index da768c6..a7d0133 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -2731,6 +2731,7 @@ pgstat_bestart(void)
 	beentry->st_clienthostname[NAMEDATALEN - 1] = '\0';
 	beentry->st_appname[NAMEDATALEN - 1] = '\0';
 	beentry->st_activity[pgstat_track_activity_query_size - 1] = '\0';
+	beentry->st_command = COMMAND_INVALID;
 
 	pgstat_increment_changecount_after(beentry);
 
@@ -2851,6 +2852,65 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	pgstat_increment_changecount_after(beentry);
 }
 
+/*---
+ * pgstat_progress_update_param() -
+ *
+ * Update index'th member in st_progress_param[] of own backend entry.
+ *---
+ */
+void
+pgstat_progress_update_param(int index, int64 val)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	if(!beentry)
+		return;
+
+	if (!pgstat_track_activities)
+		return;
+
+	Assert(index >= 0 && index < PGSTAT_NUM_PROGRESS_PARAM);
+
+	pgstat_increment_changecount_before(beentry);
+	beentry->st_progress_param[index] = val;
+	pgstat_increment_changecount_after(beentry);
+}
+
+/*---
+ * pgstat_progress_start_command()-
+ *
+ * Set st_command in own backend entry.  Also, zero-initialize
+ * st_progress_param array.
+ *---
+ */
+void
+pgstat_progress_start_command(BackendCommandType cmdtype)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	pgstat_increment_changecount_before(beentry);
+	beentry->st_command = cmdtype;
+	MemSet(>st_progress_param, 0,
+		   sizeof(beentry->st_progress_param));
+	pgstat_increment_changecount_after(beentry);
+}
+
+/*---
+ * pgstat_progress_set_command_target()-
+ *
+ * Set st_command_target in own backend entry.
+ *---
+ */
+void
+pgstat_progress_set_command_target(Oid relid)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	pgstat_increment_changecount_before(beentry);
+	beentry->st_command_target = relid;
+	pgstat_increment_changecount_after(beentry);
+}
+
 /* --
  * pgstat_report_appname() -
  *
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 1b22fcc..a12310d 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -64,6 +64,7 @@ extern Datum 

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-08 Thread Kyotaro HORIGUCHI
At Wed, 9 Mar 2016 01:16:26 +0900, Amit Langote  wrote 
in 
> On Wed, Mar 9, 2016 at 12:24 AM, Robert Haas  wrote:
> > This patch has been worked on by so many people and reviewed by so
> > many people that I can't keep track of who should be credited when it
> > gets committed.  Could someone provide a list of author(s) and
> > reviewer(s)?
> 
> Original authors are Rahila Syed and Vinayak Pokale.
> 
> I have been reviewing this for last few CFs. I sent in last few
> revisions as well.

The owner of this is Vinayak and, ah, I forgot to add myself as a
reviewer. I have also reviewed this for last few CFs. 

So, as looking into CF app, it seems not so inconsistent with the
persons who appears in this thread for thses three CFs.

Authors: Vinayak Pokale, Rahila Syed, Amit Langote
Reviewers: Amit Langote, Kyotaro Horiguchi

Is there anyone who shold be added in this list?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-08 Thread Amit Langote
On Wed, Mar 9, 2016 at 12:24 AM, Robert Haas  wrote:
> This patch has been worked on by so many people and reviewed by so
> many people that I can't keep track of who should be credited when it
> gets committed.  Could someone provide a list of author(s) and
> reviewer(s)?

Original authors are Rahila Syed and Vinayak Pokale.

I have been reviewing this for last few CFs. I sent in last few
revisions as well.

Thanks,
Amit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-08 Thread Robert Haas
On Tue, Mar 8, 2016 at 3:02 AM, Amit Langote
 wrote:
> Updated versions attached.
>
> * changed st_progress_param to int64 and so did the argument of
> pgstat_progress_update_param().  Likewise changed param1..param10 of
> pg_stat_get_progress_info()'s output columns to bigint.
>
> * Added back the Oid field st_command_target and corresponding function
> pgstat_progress_set_command_target(Oid).

What the heck do we have an SQL-visible pg_stat_reset_local_progress()
for?  Surely if we ever need that, it's a bug.

I think pgstat_progress_update_param() should Assert(index >= 0 &&
index < N_PROGRESS_PARAM).  But I'd rename N_PROGRESS_PARAM to
PGSTAT_NUM_PROGRESS_PARAM.

Regarding "XXX - privilege check is maybe dubious" - I think the
privilege check here should match pg_stat_activity.  If it does,
there's nothing dubious about that IMHO.

This patch has been worked on by so many people and reviewed by so
many people that I can't keep track of who should be credited when it
gets committed.  Could someone provide a list of author(s) and
reviewer(s)?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-08 Thread Kyotaro HORIGUCHI
You're so quick.

At Tue, 8 Mar 2016 17:02:24 +0900, Amit Langote  
wrote in <56de8710.4070...@lab.ntt.co.jp>
> On 2016/03/07 23:48, Robert Haas wrote:
> >> I don't like to treat the target object id differently from other
> >> parameters. It could not be needed at all, or could be needed two
> >> or more in contrast. Although oids are not guaranteed to fit
> >> uint32, we have already stored BlockNumber there.
> > 
> > Well...
> > 
> > There's not much point in deciding that the parameters are uint32,
> > because we don't have that type at the SQL level.
> > pgstat_progress_update_param() really ought to take either int32 or
> > int64 as an argument, because that's what we can actually handle from
> > SQL, and it seems pretty clear that int64 is better since otherwise we
> > can't fit, among other things, a block number.
> > 
> > Given that, I tend to think that treating the command target specially
> > and passing that as an OID is reasonable.  We're not going to be able
> > to pass variable-sized arrays through this mechanism, ever, because
> > our shared memory segment doesn't work like that.  And it seems to me
> > that nearly every command somebody might want to report progress on
> > will touch, basically, one relation a a time.  So I don't see the harm
> > in hardcoding that idea into the facility.

We'd concatenate two int32s into int64s but widening each
parameters to int64 would be preferable. Additional 4 bytes by
the defalut number of maxbackends 100 by 10 parameters = 4kb, 4MB
for 1000 backends is not so big for modern machines?

> Updated versions attached.
> 
> * changed st_progress_param to int64 and so did the argument of
> pgstat_progress_update_param().  Likewise changed param1..param10 of
> pg_stat_get_progress_info()'s output columns to bigint.
> 
> * Added back the Oid field st_command_target and corresponding function
> pgstat_progress_set_command_target(Oid).

+   beentry->st_command = COMMAND_INVALID;
+   MemSet(>st_progress_param, 0, 
sizeof(beentry->st_progress_param));

The MemSet seems useless since it gets the same initialization on
setting st_command.

+   /*
+* Report values for only those backends which are running the 
given
+* command.  XXX - privilege check is maybe dubious.
+*/
+   if (!beentry ||
+   beentry->st_command != cmdtype ||
+   !has_privs_of_role(GetUserId(), beentry->st_userid))
+   continue;

We can simplly ignore unpriviledged backends, or all zeroz or
nulls to signal that the caller has no priviledge.

0002

+   FROM pg_stat_get_progress_info(1) AS S;

Ah... This magick number seems quite bad.. The function should
take the command type in maybe string type.

+   FROM pg_stat_get_progress_info('lazy vacuum') AS S;

Using an array of the names would be acceptable, maybe.

| char *progress_command_names[] = {'lazy vacuum', NULL};

However the numbers for the phases ('scanning heap' and so..) is
acceptable for me for reasons uncertain to me, it also could be
represented in names but is might be rahter bothersome..

+ WHEN 0 THEN 100::numeric(5, 2)
+ ELSE ((S.param3 + 1)::numeric / S.param2 * 
100)::numeric(5, 2)

This usage of numeric seems overkill to me.



> * I attempted to implement a method to report index blocks done from
> lazy_tid_reaped() albeit with some limitations. Patch 0003 is that
> attempt.  In summary, I modified the index bulk delete callback interface
> to receive a BlockNumber argument index_blkno:
> 
>  /* Typedef for callback function to determine if a tuple is bulk-deletable */
> -typedef bool (*IndexBulkDeleteCallback) (ItemPointer itemptr, void *state);
> +typedef bool (*IndexBulkDeleteCallback) (ItemPointer itemptr,
> + BlockNumber index_blkno,
> + void *state);
> 
> Then added 2 more fields to LVRelStats:
> 
> @@ -143,6 +143,8 @@ typedef struct LVRelStats
>  int num_index_scans;
>  TransactionId latestRemovedXid;
>  boollock_waiter_detected;
> +BlockNumber last_index_blkno;
> +BlockNumber index_blks_vacuumed;
> 
> Then in lazy_tid_reaped(), if the index block number received in the
> index_blkno argument has changed from the previous call, increment the
> count of index blocks processed and
> pgstat_report_update_param(index_blks_done). I wonder if we should reset
> the the saved block number and the count for every index vacuumed by
> lazy_vacuum_index(). Right now, total_index_blks counts all indexes and
> counting blocks using the rough method mentioned above is sensible only
> for one index at time.  Actually, the method has different kinds of
> problems to deal with anyway. For example, with a btree index, one can
> expect that the final count does not match total_index_blks obtained using
> 

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-08 Thread Amit Langote
On 2016/03/07 23:48, Robert Haas wrote:
> On Sun, Mar 6, 2016 at 11:02 PM, Kyotaro HORIGUCHI
>  wrote:
>> The 0001-P.. adds the following interface functions.
>>
>> +extern void pgstat_progress_set_command(BackendCommandType cmdtype);
>> +extern void pgstat_progress_set_command_target(Oid objid);
>> +extern void pgstat_progress_update_param(int index, uint32 val);
>> +extern void pgstat_reset_local_progress(void);
>> +extern int pgstat_progress_get_num_param(BackendCommandType cmdtype);
>>
>> I don't like to treat the target object id differently from other
>> parameters. It could not be needed at all, or could be needed two
>> or more in contrast. Although oids are not guaranteed to fit
>> uint32, we have already stored BlockNumber there.
> 
> Well...
> 
> There's not much point in deciding that the parameters are uint32,
> because we don't have that type at the SQL level.
> pgstat_progress_update_param() really ought to take either int32 or
> int64 as an argument, because that's what we can actually handle from
> SQL, and it seems pretty clear that int64 is better since otherwise we
> can't fit, among other things, a block number.
> 
> Given that, I tend to think that treating the command target specially
> and passing that as an OID is reasonable.  We're not going to be able
> to pass variable-sized arrays through this mechanism, ever, because
> our shared memory segment doesn't work like that.  And it seems to me
> that nearly every command somebody might want to report progress on
> will touch, basically, one relation a a time.  So I don't see the harm
> in hardcoding that idea into the facility.

Updated versions attached.

* changed st_progress_param to int64 and so did the argument of
pgstat_progress_update_param().  Likewise changed param1..param10 of
pg_stat_get_progress_info()'s output columns to bigint.

* Added back the Oid field st_command_target and corresponding function
pgstat_progress_set_command_target(Oid).

* I attempted to implement a method to report index blocks done from
lazy_tid_reaped() albeit with some limitations. Patch 0003 is that
attempt.  In summary, I modified the index bulk delete callback interface
to receive a BlockNumber argument index_blkno:

 /* Typedef for callback function to determine if a tuple is bulk-deletable */
-typedef bool (*IndexBulkDeleteCallback) (ItemPointer itemptr, void *state);
+typedef bool (*IndexBulkDeleteCallback) (ItemPointer itemptr,
+ BlockNumber index_blkno,
+ void *state);

Then added 2 more fields to LVRelStats:

@@ -143,6 +143,8 @@ typedef struct LVRelStats
 int num_index_scans;
 TransactionId latestRemovedXid;
 boollock_waiter_detected;
+BlockNumber last_index_blkno;
+BlockNumber index_blks_vacuumed;

Then in lazy_tid_reaped(), if the index block number received in the
index_blkno argument has changed from the previous call, increment the
count of index blocks processed and
pgstat_report_update_param(index_blks_done). I wonder if we should reset
the the saved block number and the count for every index vacuumed by
lazy_vacuum_index(). Right now, total_index_blks counts all indexes and
counting blocks using the rough method mentioned above is sensible only
for one index at time.  Actually, the method has different kinds of
problems to deal with anyway. For example, with a btree index, one can
expect that the final count does not match total_index_blks obtained using
RelationGetNumberOfBlocks().  Moreover, each AM has its own idiosyncratic
way of traversing the index pages. I dared only touch the btree case to
make it pass current block number to the callback. It finishes with
index_blks_done << total_index_blks since I guess the callback is called
only on the leaf pages. Any ideas?

* I am also tempted to add num_dead_tuples and dead_tuples_vacuumed to add
granularity to 'vacuuming heap' phase but didn't in this version. Should we?

Thanks,
Amit
>From 1608f3bd8153045da9cb3db269597b92042a4d9c Mon Sep 17 00:00:00 2001
From: Amit 
Date: Sun, 28 Feb 2016 01:50:07 +0900
Subject: [PATCH 1/3] Provide a way for utility commands to report progress

Commands can update a set of parameters in shared memory using:

  pgstat_progress_update_param(param_index, param_value)

Up to 10 independent 64-bit integer values can be published by commands.
In addition to those, a command should always report its BackendCommandType
and the OID of the relation it is about to begin processing at the beginning
of the processing using:

  pgstat_progress_start_command(cmdtype)
  pgstat_progress_set_command_target(relid)

A view can be defined in system_views.sql that outputs the values returned
by pg_stat_get_progress_info(cmdtype), where 'cmdtype' is numeric value as
mentioned above.  Each such view has columns corresponding to the counters
published by respective commands.

There is a 

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-07 Thread Robert Haas
On Sun, Mar 6, 2016 at 11:02 PM, Kyotaro HORIGUCHI
 wrote:
> At Sat, 5 Mar 2016 16:41:29 +0900, Amit Langote  
> wrote in 
>> On Sat, Mar 5, 2016 at 4:24 PM, Amit Langote  wrote:
>> > So, I took the Vinayak's latest patch and rewrote it a little
>> ...
>> > I broke it into two:
>> >
>> > 0001-Provide-a-way-for-utility-commands-to-report-progres.patch
>> > 0002-Implement-progress-reporting-for-VACUUM-command.patch
>>
>> Oops, unamended commit messages in those patches are misleading.  So,
>> please find attached corrected versions.
>
> The 0001-P.. adds the following interface functions.
>
> +extern void pgstat_progress_set_command(BackendCommandType cmdtype);
> +extern void pgstat_progress_set_command_target(Oid objid);
> +extern void pgstat_progress_update_param(int index, uint32 val);
> +extern void pgstat_reset_local_progress(void);
> +extern int pgstat_progress_get_num_param(BackendCommandType cmdtype);
>
> I don't like to treat the target object id differently from other
> parameters. It could not be needed at all, or could be needed two
> or more in contrast. Although oids are not guaranteed to fit
> uint32, we have already stored BlockNumber there.

Well...

There's not much point in deciding that the parameters are uint32,
because we don't have that type at the SQL level.
pgstat_progress_update_param() really ought to take either int32 or
int64 as an argument, because that's what we can actually handle from
SQL, and it seems pretty clear that int64 is better since otherwise we
can't fit, among other things, a block number.

Given that, I tend to think that treating the command target specially
and passing that as an OID is reasonable.  We're not going to be able
to pass variable-sized arrays through this mechanism, ever, because
our shared memory segment doesn't work like that.  And it seems to me
that nearly every command somebody might want to report progress on
will touch, basically, one relation a a time.  So I don't see the harm
in hardcoding that idea into the facility.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-07 Thread Amit Langote
On 2016/03/07 19:11, Amit Langote wrote:
> we should re-introduce[1] a fixed-size char st_progress_message[] field.

Sorry, that [1] does not refer to anything, just a leftover from my draft.
 I thought I had a link handy for an email where some sort of
justification was given as to why st_progress_message field was removed
from the patch.  I couldn't find it.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-07 Thread Amit Langote

Horiguchi-san,

Thanks for a quick reply, :)

On 2016/03/07 18:18, Kyotaro HORIGUCHI wrote:
> At Mon, 7 Mar 2016 16:16:30 +0900, Amit Langote wrote:
>> On 2016/03/07 13:02, Kyotaro HORIGUCHI wrote:
>>> The 0001-P.. adds the following interface functions.
>>>
>>> I don't like to treat the target object id differently from other
>>> parameters. It could not be needed at all, or could be needed two
>>> or more in contrast. Although oids are not guaranteed to fit
>>> uint32, we have already stored BlockNumber there.
>>
>> I thought giving cmdtype and objid each its own slot would make things a
>> little bit clearer than stuffing them into st_progress_param[0] and
>> st_progress_param[1], respectively.  Is that what you are suggesting?
>> Although as I've don, a separate field st_command_objid may be a bit too 
>> much.
> 
> I mentioned only of object id as you seem to take me. The command
> type is essential unlike the target object ids. It is needed by
> all statistics views of this kind to filter required backends.

Yep.

>>> # I think that integer arrays might be needed to be passed as a
>>> # parameter, but it would be the another issue.
>>
>> Didn't really think about it.  Maybe we should consider a scenario that
>> would require it.
> 
> Imagine to provide a statictics of a vacuum commnad as a
> whole. It will vacuum several relations at once so the view could
> be like the following.
> 
> select * from pg_stat_vacuum_command;
> - [ Record 1 ]
> worker_pid : 3243
> command: vacuum full
> rels_scheduled : {16387, 16390, 16393}
> rels_finished  : {16384}
> status : Processing 16384, awiting for a lock.
> ..
> 
> This needs arrays if we want this but it would be another issue
> as I said.

Oh, I see. This does call for at least some consideration of how to
support variable size parameter values.

By the way, looking at the "status" message in your example, it doesn't
seem like a candidate for evaluation in a CASE..WHEN expression?  Maybe,
we should re-introduce[1] a fixed-size char st_progress_message[] field.
Since, ISTM, such a command's internal code is in better position to
compute that kind of message string.  IIRC, a reason that was given to not
have such a field was, among other things, the copy overhead of message
strings.  But commands like the one in your example, could afford that
much overhead since the frequency of message change would be less and less
compared with the time elapsed between the changes anyway.

>> I think the fixed number of parameters in the form of a fixed-size array
>> is because st_progress_param[] is part of a shared memory structure as
>> discussed before.  Although such interface has been roughly modeled on how
>> pg_statistic catalog and pg_stats view or get_attstatsslot() function
>> work, shared memory structures take the place of the catalog, so there are
>> some restrictions (fixed size array being one).
> 
> It depends on how easy we take it to widen the parameter slots in
> shared memory:p Anyway I don't stick that since it doesn't make
> a siginificant difference.

Your above example makes me wonder how we can provide for it.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-07 Thread Kyotaro HORIGUCHI
Hi, Amit.

At Mon, 7 Mar 2016 16:16:30 +0900, Amit Langote  
wrote in <56dd2ace.5050...@lab.ntt.co.jp>
> 
> Horiguchi-san,
> 
> Thanks a lot for taking a look!
> 
> On 2016/03/07 13:02, Kyotaro HORIGUCHI wrote:
> > At Sat, 5 Mar 2016 16:41:29 +0900, Amit Langote wrote:
> >> On Sat, Mar 5, 2016 at 4:24 PM, Amit Langote  
> >> wrote:
> >>> So, I took the Vinayak's latest patch and rewrote it a little
> >> ...
> >>> I broke it into two:
> >>>
> >>> 0001-Provide-a-way-for-utility-commands-to-report-progres.patch
> >>> 0002-Implement-progress-reporting-for-VACUUM-command.patch
> >>
> >> Oops, unamended commit messages in those patches are misleading.  So,
> >> please find attached corrected versions.
> > 
> > The 0001-P.. adds the following interface functions.
> > 
> > +extern void pgstat_progress_set_command(BackendCommandType cmdtype);
> > +extern void pgstat_progress_set_command_target(Oid objid);
> > +extern void pgstat_progress_update_param(int index, uint32 val);
> > +extern void pgstat_reset_local_progress(void);
> > +extern int pgstat_progress_get_num_param(BackendCommandType cmdtype);
> > 
> > I don't like to treat the target object id differently from other
> > parameters. It could not be needed at all, or could be needed two
> > or more in contrast. Although oids are not guaranteed to fit
> > uint32, we have already stored BlockNumber there.
> 
> I thought giving cmdtype and objid each its own slot would make things a
> little bit clearer than stuffing them into st_progress_param[0] and
> st_progress_param[1], respectively.  Is that what you are suggesting?
> Although as I've don, a separate field st_command_objid may be a bit too much.

I mentioned only of object id as you seem to take me. The command
type is essential unlike the target object ids. It is needed by
all statistics views of this kind to filter required backends.

> If they are not special fields, I think we don't need special interface
> functions *set_command() and *set_command_target().  But I am still
> inclined toward keeping the former.
> 
> > # I think that integer arrays might be needed to be passed as a
> > # parameter, but it would be the another issue.
> 
> Didn't really think about it.  Maybe we should consider a scenario that
> would require it.

Imagine to provide a statictics of a vacuum commnad as a
whole. It will vacuum several relations at once so the view could
be like the following.

select * from pg_stat_vacuum_command;
- [ Record 1 ]
worker_pid : 3243
command: vacuum full
rels_scheduled : {16387, 16390, 16393}
rels_finished  : {16384}
status : Processing 16384, awiting for a lock.
..

This needs arrays if we want this but it would be another issue
as I said.


> > pg_stat_get_progress_info returns a tuple with 10 integer columns
> > (plus an object id). The reason why I suggested use of an integer
> > array is that it allows the API to serve arbitrary number of
> > parmeters without a modification of API, and array indexes are
> > coloreless than any concrete names. Howerver I don't stick to
> > that if we agree that it is ok to have fixed number of paremters.
> 
> I think the fixed number of parameters in the form of a fixed-size array
> is because st_progress_param[] is part of a shared memory structure as
> discussed before.  Although such interface has been roughly modeled on how
> pg_statistic catalog and pg_stats view or get_attstatsslot() function
> work, shared memory structures take the place of the catalog, so there are
> some restrictions (fixed size array being one).

It depends on how easy we take it to widen the parameter slots in
shared memory:p Anyway I don't stick that since it doesn't make
a siginificant difference.

> Regarding index into st_progress_param[], pgstat.c/pgstatfuncs.c should
> not bother what it is.  As exemplified in patch 0002, individual index
> numbers can be defined as macros by individual command modules (suggested
> by Robert recently) with certain convention for readability such as the
> following in lazyvacuum.c:
> 
> #define PROG_PAR_VAC_RELID 0
> #define PROG_PAR_VAC_PHASE_ID  1
> #define PROG_PAR_VAC_HEAP_BLKS 2
> #define PROG_PAR_VAC_CUR_HEAP_BLK  3
> ... so on.
> 
> Then, to report a changed parameter:
> 
> pgstat_progress_update_param(PROG_PAR_VAC_PHASE_ID, LV_PHASE_SCAN_HEAP);
> ...
> pgstat_progress_update_param(PROG_PAR_VAC_CUR_HEAP_BLK, blkno);

Yeah, it seems fine for me.

> by the way, following is proargnames[] for pg_stat_get_progress_info():
> 
> cmdtype integer,
> OUT pid integer,
> OUT param1 integer,
> OUT param2 integer,
> ...
> OUT param10 integer
> 
> So, it is a responsibility of a command specific progress view definition
> that it interprets values of param1..param10 appropriately.  In fact, the
> implementer of the progress reporting for a command determines what goes
> into which slot of st_progress_param[], 

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-06 Thread Amit Langote

Horiguchi-san,

Thanks a lot for taking a look!

On 2016/03/07 13:02, Kyotaro HORIGUCHI wrote:
> At Sat, 5 Mar 2016 16:41:29 +0900, Amit Langote wrote:
>> On Sat, Mar 5, 2016 at 4:24 PM, Amit Langote  wrote:
>>> So, I took the Vinayak's latest patch and rewrote it a little
>> ...
>>> I broke it into two:
>>>
>>> 0001-Provide-a-way-for-utility-commands-to-report-progres.patch
>>> 0002-Implement-progress-reporting-for-VACUUM-command.patch
>>
>> Oops, unamended commit messages in those patches are misleading.  So,
>> please find attached corrected versions.
> 
> The 0001-P.. adds the following interface functions.
> 
> +extern void pgstat_progress_set_command(BackendCommandType cmdtype);
> +extern void pgstat_progress_set_command_target(Oid objid);
> +extern void pgstat_progress_update_param(int index, uint32 val);
> +extern void pgstat_reset_local_progress(void);
> +extern int   pgstat_progress_get_num_param(BackendCommandType cmdtype);
> 
> I don't like to treat the target object id differently from other
> parameters. It could not be needed at all, or could be needed two
> or more in contrast. Although oids are not guaranteed to fit
> uint32, we have already stored BlockNumber there.

I thought giving cmdtype and objid each its own slot would make things a
little bit clearer than stuffing them into st_progress_param[0] and
st_progress_param[1], respectively.  Is that what you are suggesting?
Although as I've don, a separate field st_command_objid may be a bit too much.

If they are not special fields, I think we don't need special interface
functions *set_command() and *set_command_target().  But I am still
inclined toward keeping the former.

> # I think that integer arrays might be needed to be passed as a
> # parameter, but it would be the another issue.

Didn't really think about it.  Maybe we should consider a scenario that
would require it.

> pg_stat_get_progress_info returns a tuple with 10 integer columns
> (plus an object id). The reason why I suggested use of an integer
> array is that it allows the API to serve arbitrary number of
> parmeters without a modification of API, and array indexes are
> coloreless than any concrete names. Howerver I don't stick to
> that if we agree that it is ok to have fixed number of paremters.

I think the fixed number of parameters in the form of a fixed-size array
is because st_progress_param[] is part of a shared memory structure as
discussed before.  Although such interface has been roughly modeled on how
pg_statistic catalog and pg_stats view or get_attstatsslot() function
work, shared memory structures take the place of the catalog, so there are
some restrictions (fixed size array being one).

Regarding index into st_progress_param[], pgstat.c/pgstatfuncs.c should
not bother what it is.  As exemplified in patch 0002, individual index
numbers can be defined as macros by individual command modules (suggested
by Robert recently) with certain convention for readability such as the
following in lazyvacuum.c:

#define PROG_PAR_VAC_RELID 0
#define PROG_PAR_VAC_PHASE_ID  1
#define PROG_PAR_VAC_HEAP_BLKS 2
#define PROG_PAR_VAC_CUR_HEAP_BLK  3
... so on.

Then, to report a changed parameter:

pgstat_progress_update_param(PROG_PAR_VAC_PHASE_ID, LV_PHASE_SCAN_HEAP);
...
pgstat_progress_update_param(PROG_PAR_VAC_CUR_HEAP_BLK, blkno);

by the way, following is proargnames[] for pg_stat_get_progress_info():

cmdtype integer,
OUT pid integer,
OUT param1 integer,
OUT param2 integer,
...
OUT param10 integer

So, it is a responsibility of a command specific progress view definition
that it interprets values of param1..param10 appropriately.  In fact, the
implementer of the progress reporting for a command determines what goes
into which slot of st_progress_param[], to begin with.

> pgstat_progress_get_num_param looks not good in the aspect of
> genericity. I'd like to define it as an integer array by idexed
> by the command type if it is needed. However it seems to me to be
> enough that pg_stat_get_progress_info always returns 10 integers
> regardless of what the numbers are for. The user sql function,
> pg_stat_vacuum_progress as the first user, knows how many numbers
> should be read for its work. It reads zeroes safely even if it
> reads more than what the producer side offered (unless it tries
> to divide something with it).

Thinking a bit, perhaps we don't need num_param(cmdtpye) function or array
at all as you seem to suggest.  It serves no useful purpose now that I see
it. pg_stat_get_progress_info() should simply copy
st_progress_param[0...PG_STAT_GET_PROGRESS_COLS-1] to the result and view
definer knows what's what.

Attached updated patches which incorporate above mentioned changes.  If
Vinayak has something else in mind about anything, he can weigh in.

Thanks,
Amit
>From 52a398f5104cd50f8bfcc4fd1fbb5bb102eddbf5 Mon Sep 17 00:00:00 2001
From: Amit 

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-06 Thread Kyotaro HORIGUCHI
Hi, Thank you for the patch.

At Sat, 5 Mar 2016 16:41:29 +0900, Amit Langote  wrote 
in 
> On Sat, Mar 5, 2016 at 4:24 PM, Amit Langote  wrote:
> > So, I took the Vinayak's latest patch and rewrote it a little
> ...
> > I broke it into two:
> >
> > 0001-Provide-a-way-for-utility-commands-to-report-progres.patch
> > 0002-Implement-progress-reporting-for-VACUUM-command.patch
> 
> Oops, unamended commit messages in those patches are misleading.  So,
> please find attached corrected versions.

The 0001-P.. adds the following interface functions.

+extern void pgstat_progress_set_command(BackendCommandType cmdtype);
+extern void pgstat_progress_set_command_target(Oid objid);
+extern void pgstat_progress_update_param(int index, uint32 val);
+extern void pgstat_reset_local_progress(void);
+extern int pgstat_progress_get_num_param(BackendCommandType cmdtype);

I don't like to treat the target object id differently from other
parameters. It could not be needed at all, or could be needed two
or more in contrast. Although oids are not guaranteed to fit
uint32, we have already stored BlockNumber there.

# I think that integer arrays might be needed to be passed as a
# parameter, but it would be the another issue.

pg_stat_get_progress_info returns a tuple with 10 integer columns
(plus an object id). The reason why I suggested use of an integer
array is that it allows the API to serve arbitrary number of
parmeters without a modification of API, and array indexes are
coloreless than any concrete names. Howerver I don't stick to
that if we agree that it is ok to have fixed number of paremters.

pgstat_progress_get_num_param looks not good in the aspect of
genericity. I'd like to define it as an integer array by idexed
by the command type if it is needed. However it seems to me to be
enough that pg_stat_get_progress_info always returns 10 integers
regardless of what the numbers are for. The user sql function,
pg_stat_vacuum_progress as the first user, knows how many numbers
should be read for its work. It reads zeroes safely even if it
reads more than what the producer side offered (unless it tries
to divide something with it).


What do you think about this?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-06 Thread pokurev
Hi Amit,

Thank you for updating the patch. I am testing it and I will try to improve it.

Regards,
Vinayak
> -Original Message-
> From: Amit Langote [mailto:amitlangot...@gmail.com]
> Sent: Saturday, March 05, 2016 4:41 PM
> To: Robert Haas <robertmh...@gmail.com>
> Cc: SPS ポクレ ヴィナヤック(三技術) <poku...@pm.nttdata.co.jp>;
> Amit Langote <langote_amit...@lab.ntt.co.jp>; Kyotaro HORIGUCHI
> <horiguchi.kyot...@lab.ntt.co.jp>; pgsql-hackers@postgresql.org; SPS 坂野
> 昌平(三技術) <ban...@nttdata.co.jp>
> Subject: Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
> 
> On Sat, Mar 5, 2016 at 4:24 PM, Amit Langote <amitlangot...@gmail.com>
> wrote:
> > So, I took the Vinayak's latest patch and rewrote it a little
> ...
> > I broke it into two:
> >
> > 0001-Provide-a-way-for-utility-commands-to-report-progres.patch
> > 0002-Implement-progress-reporting-for-VACUUM-command.patch
> 
> Oops, unamended commit messages in those patches are misleading.  So,
> please find attached corrected versions.
> 
> Thanks,
> Amit

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-04 Thread Amit Langote
On Sat, Mar 5, 2016 at 4:24 PM, Amit Langote  wrote:
> So, I took the Vinayak's latest patch and rewrote it a little
...
> I broke it into two:
>
> 0001-Provide-a-way-for-utility-commands-to-report-progres.patch
> 0002-Implement-progress-reporting-for-VACUUM-command.patch

Oops, unamended commit messages in those patches are misleading.  So,
please find attached corrected versions.

Thanks,
Amit


0001-Provide-a-way-for-utility-commands-to-report-progres-v2.patch
Description: Binary data


0002-Implement-progress-reporting-for-VACUUM-command-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-04 Thread Amit Langote
On Sat, Mar 5, 2016 at 7:11 AM, Robert Haas  wrote:
> On Fri, Feb 26, 2016 at 3:28 AM,   wrote:
>> Thank you for your comments.
>> Please find attached patch addressing following comments.
>
> I'm positive I've said this at least once before while reviewing this
> patch, and I think more than once: we should be trying to build a
> general progress-reporting facility here with vacuum as the first
> user.  Therefore, for example, pg_stat_get_progress_info's output
> columns should have generic names, not names specific to VACUUM.
> pg_stat_vacuum_progress can alias them to a vacuum-specific name.  See
> for example the relationship between pg_stats and pg_statistic.
>
> I think VACUUM should have three phases, not two.  lazy_vacuum_index()
> and lazy_vacuum_heap() are lumped together right now, but I think they
> shouldn't be.
>
> Please create named constants for the first argument to
> pgstat_report_progress_update_counter(), maybe with names like
> PROGRESS_VACUUM_WHATEVER.
>
> +   /* Update current block number of the relation */
> +   pgstat_report_progress_update_counter(2, blkno + 1);
>
> Why + 1?
>
> I thought we had a plan to update the counter of scanned index pages
> after each index page was vacuumed by the AM.  Doing it only after
> vacuuming the entire index is much less granular and generally less
> useful.   See 
> http://www.postgresql.org/message-id/56500356.4070...@bluetreble.com
>
> +   if (blkno == nblocks - 1 &&
> vacrelstats->num_dead_tuples == 0 && nindexes != 0
> +   && vacrelstats->num_index_scans == 0)
> +   total_index_pages = 0;
>
> I'm not sure what this is trying to do, perhaps because there is no
> comment explaining it.  Whatever the intent, I suspect that such a
> complex test is likely to be fragile.  Perhaps there is a better way?

So, I took the Vinayak's latest patch and rewrote it a little while
maintaining the original idea but modifying code to some degree.  Hope
original author(s) are okay with it.  Vinayak, do see if the rewritten
patch is alright and improve it anyway you want.

I broke it into two:

0001-Provide-a-way-for-utility-commands-to-report-progres.patch
0002-Implement-progress-reporting-for-VACUUM-command.patch

The code review comments received recently (including mine) have been
incorporated.

However, I didn't implement the report-per-index-page-vacuumed bit but
should be easy to code once the details are finalized (questions like
whether it requires modifying any existing interfaces, etc).

Thanks,
Amit


0001-Provide-a-way-for-utility-commands-to-report-progres.patch
Description: Binary data


0002-Implement-progress-reporting-for-VACUUM-command.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-04 Thread Robert Haas
On Fri, Feb 26, 2016 at 3:28 AM,   wrote:
> Thank you for your comments.
> Please find attached patch addressing following comments.
>
>>As I might have written upthread, transferring the whole string
>>as a progress message is useless at least in this scenario. Since
>>they are a set of fixed messages, each of them can be represented
>>by an identifier, an integer number. I don't see a reason for
>>sending the whole of a string beyond a backend.
> Agreed. I used following macros.
> #define VACUUM_PHASE_SCAN_HEAP  1
> #define VACUUM_PHASE_VACUUM_INDEX_HEAP  2
>
>>I guess num_index_scans could better be reported after all the indexes are
>>done, that is, after the for loop ends.
> Agreed.  I have corrected it.
>
>> CREATE VIEW pg_stat_vacuum_progress AS
>>   SELECT S.s[1] as pid,
>>  S.s[2] as relid,
>>  CASE S.s[3]
>>WHEN 1 THEN 'Scanning Heap'
>>WHEN 2 THEN 'Vacuuming Index and Heap'
>>ELSE 'Unknown phase'
>>  END,
>>
>>   FROM pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as S;
>>
>> # The name of the function could be other than *_command_progress.
> The name of function is updated as pg_stat_get_progress_info() and also 
> updated the function.
> Updated the pg_stat_vacuum_progress view as suggested.

I'm positive I've said this at least once before while reviewing this
patch, and I think more than once: we should be trying to build a
general progress-reporting facility here with vacuum as the first
user.  Therefore, for example, pg_stat_get_progress_info's output
columns should have generic names, not names specific to VACUUM.
pg_stat_vacuum_progress can alias them to a vacuum-specific name.  See
for example the relationship between pg_stats and pg_statistic.

I think VACUUM should have three phases, not two.  lazy_vacuum_index()
and lazy_vacuum_heap() are lumped together right now, but I think they
shouldn't be.

Please create named constants for the first argument to
pgstat_report_progress_update_counter(), maybe with names like
PROGRESS_VACUUM_WHATEVER.

+   /* Update current block number of the relation */
+   pgstat_report_progress_update_counter(2, blkno + 1);

Why + 1?

I thought we had a plan to update the counter of scanned index pages
after each index page was vacuumed by the AM.  Doing it only after
vacuuming the entire index is much less granular and generally less
useful.   See 
http://www.postgresql.org/message-id/56500356.4070...@bluetreble.com

+   if (blkno == nblocks - 1 &&
vacrelstats->num_dead_tuples == 0 && nindexes != 0
+   && vacrelstats->num_index_scans == 0)
+   total_index_pages = 0;

I'm not sure what this is trying to do, perhaps because there is no
comment explaining it.  Whatever the intent, I suspect that such a
complex test is likely to be fragile.  Perhaps there is a better way?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-01 Thread Kyotaro HORIGUCHI
Hello, thank for testing this.

At Sat, 27 Feb 2016 17:19:05 +, 大山真実  wrote 
in 
> Hi!
> 
> I'm interesting this patch and tested it. I found two strange thing.
> 
> * Incorrect counting
> 
> Reproduce:
>   1. Client1 execute "VACUUM"
>   2. Client2 execute "VACUUM"
>   3. Client3 execute "SELECT * FROM pg_stat_vacuum_progress".
>  pid  | relid | phase | total_heap_blks | current_heap_blkno |
> total_index_pages | scanned_index_pages | index_scan_count |
> percent_complete
> --+---+---+-++---+-+--+--
>  9267 | 16551 | Scanning Heap |  164151 |316 |
> 27422 |   7 |1 |0
>  9764 | 16554 | Scanning Heap |   2 |  2 |
> 2 |   27422 |1 |  100
> (2 rows)
> 
>   Client2 is waiting for Clinet1 "VACUUM" but percent_complete of Client2
> "VACUUM" is 100.
> * Not end VACUUM ANALYZE in spite of "percent_complete=100"

The inidividual record is telling about *one* relation now under
vacuuming (or just after the processing), not about all relations
to be vacuumed as a whole. It is the specification of this patch
for now. However it cannot tell how long the invoker should wait
for the vauum to end, it seems to be way difficult to calculate
statistics against the all relations to be processed.

Anyway other status messages such as "Waiting for " would be
necessary.

>   Client_1 execute "VACUUM ANALYZE", then Client_2 execute "SELECT * FROM
> pg_stat_vacuum_progress".
> 
>  pid  | relid | phase | total_heap_blks | current_heap_blkno |
> total_index_pages | scanned_index_pages | index_scan_count |
> percent_complete
> --+---+---+-++---+-+--+--
>  9277 | 16551 | Scanning Heap |  163935 | 163935 |
> 27422 |   7 |1 |  100
> (1 row
> 
>   percent_complete is 100 but Client_1 "VACUUM ANALYZE" do not response yet.
> 
>   Of course, Client_1 is executing analyze after vacuum. But it seem to me
> that this confuses users.
>   If percent_complete becomes 100 that row should be deleted quickly.

Maybe some works other than vacuuming pages is performing or
waiting a lock to be acquired. If it is a matter of progress, it
should be counted in the progress, but not for something like
waiting for a lock. It is a matter of status messages.

> Regards,
> Masanori Ohyama
> NTT Open Source Software Center
> 
> 2016年2月27日(土) 13:54 Vinayak Pokale :
> 
> > Hello,
> >
> > On Fri, Feb 26, 2016 at 6:19 PM, Amit Langote <
> > langote_amit...@lab.ntt.co.jp> wrote:
> >
> >>
> >> Hi Vinayak,
> >>
> >> Thanks for updating the patch! A quick comment:
> >>
> >> On 2016/02/26 17:28, poku...@pm.nttdata.co.jp wrote:
> >> >> CREATE VIEW pg_stat_vacuum_progress AS
> >> >>   SELECT S.s[1] as pid,
> >> >>  S.s[2] as relid,
> >> >>  CASE S.s[3]
> >> >>WHEN 1 THEN 'Scanning Heap'
> >> >>WHEN 2 THEN 'Vacuuming Index and Heap'
> >> >>ELSE 'Unknown phase'
> >> >>  END,
> >> >>
> >> >>   FROM pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as S;
> >> >>
> >> >> # The name of the function could be other than *_command_progress.
> >> > The name of function is updated as pg_stat_get_progress_info() and also
> >> updated the function.
> >> > Updated the pg_stat_vacuum_progress view as suggested.
> >>
> >> So, pg_stat_get_progress_info() now accepts a parameter to distinguish
> >> different commands.  I see the following in its definition:
> >>
> >> +   /*  Report values for only those backends which are
> >> running VACUUM
> >> command */
> >> +   if (cmdtype == COMMAND_LAZY_VACUUM)
> >> +   {
> >> +   /*Progress can only be viewed by role member.*/
> >> +   if (has_privs_of_role(GetUserId(),
> >> beentry->st_userid))
> >> +   {
> >> +   values[2] =
> >> UInt32GetDatum(beentry->st_progress_param[0]);
> >> +   values[3] =
> >> UInt32GetDatum(beentry->st_progress_param[1]);
> >> +   values[4] =
> >> UInt32GetDatum(beentry->st_progress_param[2]);
> >> +   values[5] =
> >> UInt32GetDatum(beentry->st_progress_param[3]);
> >> +   values[6] =
> >> UInt32GetDatum(beentry->st_progress_param[4]);
> >> +   values[7] =
> >> UInt32GetDatum(beentry->st_progress_param[5]);
> >> +   if (beentry->st_progress_param[1] 

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-02-27 Thread 大山真実
Hi!

I'm interesting this patch and tested it. I found two strange thing.

* Incorrect counting

Reproduce:
  1. Client1 execute "VACUUM"
  2. Client2 execute "VACUUM"
  3. Client3 execute "SELECT * FROM pg_stat_vacuum_progress".
 pid  | relid | phase | total_heap_blks | current_heap_blkno |
total_index_pages | scanned_index_pages | index_scan_count |
percent_complete
--+---+---+-++---+-+--+--
 9267 | 16551 | Scanning Heap |  164151 |316 |
27422 |   7 |1 |0
 9764 | 16554 | Scanning Heap |   2 |  2 |
2 |   27422 |1 |  100
(2 rows)

  Client2 is waiting for Clinet1 "VACUUM" but percent_complete of Client2
"VACUUM" is 100.

* Not end VACUUM ANALYZE in spite of "percent_complete=100"

  Client_1 execute "VACUUM ANALYZE", then Client_2 execute "SELECT * FROM
pg_stat_vacuum_progress".

 pid  | relid | phase | total_heap_blks | current_heap_blkno |
total_index_pages | scanned_index_pages | index_scan_count |
percent_complete
--+---+---+-++---+-+--+--
 9277 | 16551 | Scanning Heap |  163935 | 163935 |
27422 |   7 |1 |  100
(1 row

  percent_complete is 100 but Client_1 "VACUUM ANALYZE" do not response yet.

  Of course, Client_1 is executing analyze after vacuum. But it seem to me
that this confuses users.
  If percent_complete becomes 100 that row should be deleted quickly.

Regards,
Masanori Ohyama
NTT Open Source Software Center

2016年2月27日(土) 13:54 Vinayak Pokale :

> Hello,
>
> On Fri, Feb 26, 2016 at 6:19 PM, Amit Langote <
> langote_amit...@lab.ntt.co.jp> wrote:
>
>>
>> Hi Vinayak,
>>
>> Thanks for updating the patch! A quick comment:
>>
>> On 2016/02/26 17:28, poku...@pm.nttdata.co.jp wrote:
>> >> CREATE VIEW pg_stat_vacuum_progress AS
>> >>   SELECT S.s[1] as pid,
>> >>  S.s[2] as relid,
>> >>  CASE S.s[3]
>> >>WHEN 1 THEN 'Scanning Heap'
>> >>WHEN 2 THEN 'Vacuuming Index and Heap'
>> >>ELSE 'Unknown phase'
>> >>  END,
>> >>
>> >>   FROM pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as S;
>> >>
>> >> # The name of the function could be other than *_command_progress.
>> > The name of function is updated as pg_stat_get_progress_info() and also
>> updated the function.
>> > Updated the pg_stat_vacuum_progress view as suggested.
>>
>> So, pg_stat_get_progress_info() now accepts a parameter to distinguish
>> different commands.  I see the following in its definition:
>>
>> +   /*  Report values for only those backends which are
>> running VACUUM
>> command */
>> +   if (cmdtype == COMMAND_LAZY_VACUUM)
>> +   {
>> +   /*Progress can only be viewed by role member.*/
>> +   if (has_privs_of_role(GetUserId(),
>> beentry->st_userid))
>> +   {
>> +   values[2] =
>> UInt32GetDatum(beentry->st_progress_param[0]);
>> +   values[3] =
>> UInt32GetDatum(beentry->st_progress_param[1]);
>> +   values[4] =
>> UInt32GetDatum(beentry->st_progress_param[2]);
>> +   values[5] =
>> UInt32GetDatum(beentry->st_progress_param[3]);
>> +   values[6] =
>> UInt32GetDatum(beentry->st_progress_param[4]);
>> +   values[7] =
>> UInt32GetDatum(beentry->st_progress_param[5]);
>> +   if (beentry->st_progress_param[1] != 0)
>> +   values[8] =
>> Float8GetDatum(beentry->st_progress_param[2] * 100 /
>> beentry->st_progress_param[1]);
>> +   else
>> +   nulls[8] = true;
>> +   }
>> +   else
>> +   {
>> +   nulls[2] = true;
>> +   nulls[3] = true;
>> +   nulls[4] = true;
>> +   nulls[5] = true;
>> +   nulls[6] = true;
>> +   nulls[7] = true;
>> +   nulls[8] = true;
>> +   }
>> +   }
>>
>> How about doing this in a separate function which takes the command id as
>> parameter and returns an array of values and the number of values (per
>> command id). pg_stat_get_progress_info() then creates values[] and nulls[]
>> arrays from that and returns that as result set.  It will be a cleaner
>> separation of 

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-02-26 Thread Vinayak Pokale
Hello,

On Fri, Feb 26, 2016 at 6:19 PM, Amit Langote  wrote:

>
> Hi Vinayak,
>
> Thanks for updating the patch! A quick comment:
>
> On 2016/02/26 17:28, poku...@pm.nttdata.co.jp wrote:
> >> CREATE VIEW pg_stat_vacuum_progress AS
> >>   SELECT S.s[1] as pid,
> >>  S.s[2] as relid,
> >>  CASE S.s[3]
> >>WHEN 1 THEN 'Scanning Heap'
> >>WHEN 2 THEN 'Vacuuming Index and Heap'
> >>ELSE 'Unknown phase'
> >>  END,
> >>
> >>   FROM pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as S;
> >>
> >> # The name of the function could be other than *_command_progress.
> > The name of function is updated as pg_stat_get_progress_info() and also
> updated the function.
> > Updated the pg_stat_vacuum_progress view as suggested.
>
> So, pg_stat_get_progress_info() now accepts a parameter to distinguish
> different commands.  I see the following in its definition:
>
> +   /*  Report values for only those backends which are
> running VACUUM
> command */
> +   if (cmdtype == COMMAND_LAZY_VACUUM)
> +   {
> +   /*Progress can only be viewed by role member.*/
> +   if (has_privs_of_role(GetUserId(),
> beentry->st_userid))
> +   {
> +   values[2] =
> UInt32GetDatum(beentry->st_progress_param[0]);
> +   values[3] =
> UInt32GetDatum(beentry->st_progress_param[1]);
> +   values[4] =
> UInt32GetDatum(beentry->st_progress_param[2]);
> +   values[5] =
> UInt32GetDatum(beentry->st_progress_param[3]);
> +   values[6] =
> UInt32GetDatum(beentry->st_progress_param[4]);
> +   values[7] =
> UInt32GetDatum(beentry->st_progress_param[5]);
> +   if (beentry->st_progress_param[1] != 0)
> +   values[8] =
> Float8GetDatum(beentry->st_progress_param[2] * 100 /
> beentry->st_progress_param[1]);
> +   else
> +   nulls[8] = true;
> +   }
> +   else
> +   {
> +   nulls[2] = true;
> +   nulls[3] = true;
> +   nulls[4] = true;
> +   nulls[5] = true;
> +   nulls[6] = true;
> +   nulls[7] = true;
> +   nulls[8] = true;
> +   }
> +   }
>
> How about doing this in a separate function which takes the command id as
> parameter and returns an array of values and the number of values (per
> command id). pg_stat_get_progress_info() then creates values[] and nulls[]
> arrays from that and returns that as result set.  It will be a cleaner
> separation of activities, perhaps.
>
> +1

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-02-26 Thread Amit Langote

Hi Vinayak,

Thanks for updating the patch! A quick comment:

On 2016/02/26 17:28, poku...@pm.nttdata.co.jp wrote:
>> CREATE VIEW pg_stat_vacuum_progress AS 
>>   SELECT S.s[1] as pid, 
>>  S.s[2] as relid, 
>>  CASE S.s[3] 
>>WHEN 1 THEN 'Scanning Heap' 
>>WHEN 2 THEN 'Vacuuming Index and Heap' 
>>ELSE 'Unknown phase' 
>>  END, 
>> 
>>   FROM pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as S; 
>>
>> # The name of the function could be other than *_command_progress.
> The name of function is updated as pg_stat_get_progress_info() and also 
> updated the function.
> Updated the pg_stat_vacuum_progress view as suggested.

So, pg_stat_get_progress_info() now accepts a parameter to distinguish
different commands.  I see the following in its definition:

+   /*  Report values for only those backends which are running 
VACUUM
command */
+   if (cmdtype == COMMAND_LAZY_VACUUM)
+   {
+   /*Progress can only be viewed by role member.*/
+   if (has_privs_of_role(GetUserId(), beentry->st_userid))
+   {
+   values[2] = 
UInt32GetDatum(beentry->st_progress_param[0]);
+   values[3] = 
UInt32GetDatum(beentry->st_progress_param[1]);
+   values[4] = 
UInt32GetDatum(beentry->st_progress_param[2]);
+   values[5] = 
UInt32GetDatum(beentry->st_progress_param[3]);
+   values[6] = 
UInt32GetDatum(beentry->st_progress_param[4]);
+   values[7] = 
UInt32GetDatum(beentry->st_progress_param[5]);
+   if (beentry->st_progress_param[1] != 0)
+   values[8] = 
Float8GetDatum(beentry->st_progress_param[2] * 100 /
beentry->st_progress_param[1]);
+   else
+   nulls[8] = true;
+   }
+   else
+   {
+   nulls[2] = true;
+   nulls[3] = true;
+   nulls[4] = true;
+   nulls[5] = true;
+   nulls[6] = true;
+   nulls[7] = true;
+   nulls[8] = true;
+   }
+   }

How about doing this in a separate function which takes the command id as
parameter and returns an array of values and the number of values (per
command id). pg_stat_get_progress_info() then creates values[] and nulls[]
arrays from that and returns that as result set.  It will be a cleaner
separation of activities, perhaps.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-02-26 Thread pokurev
Hello,

Thank you for your comments.
Please find attached patch addressing following comments.

>As I might have written upthread, transferring the whole string 
>as a progress message is useless at least in this scenario. Since 
>they are a set of fixed messages, each of them can be represented 
>by an identifier, an integer number. I don't see a reason for 
>sending the whole of a string beyond a backend. 
Agreed. I used following macros.
#define VACUUM_PHASE_SCAN_HEAP  1 
#define VACUUM_PHASE_VACUUM_INDEX_HEAP  2

>I guess num_index_scans could better be reported after all the indexes are 
>done, that is, after the for loop ends.
Agreed.  I have corrected it.

> CREATE VIEW pg_stat_vacuum_progress AS 
>   SELECT S.s[1] as pid, 
>  S.s[2] as relid, 
>  CASE S.s[3] 
>WHEN 1 THEN 'Scanning Heap' 
>WHEN 2 THEN 'Vacuuming Index and Heap' 
>ELSE 'Unknown phase' 
>  END, 
> 
>   FROM pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as S; 
> 
> # The name of the function could be other than *_command_progress.
The name of function is updated as pg_stat_get_progress_info() and also updated 
the function.
Updated the pg_stat_vacuum_progress view as suggested.

Regards,
Vinayak


Vacuum_progress_checker_v12.patch
Description: Vacuum_progress_checker_v12.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-02-16 Thread Amit Langote

Hi,

On 2016/02/16 18:25, Kyotaro HORIGUCHI wrote:
> At Tue, 16 Feb 2016 10:39:27 +0900, Amit Langote wrote:
>> On 2016/02/15 20:21, Kyotaro HORIGUCHI wrote:
>>> CREATE FUNCTION
>>>  pg_stat_get_command_progress(IN cmdtype integer)
>>>  RETURNS SETOF integer[] as $$
>>>
>>> SELECT * from pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as x
>>>  x
>>> -_
>>> {1233, 16233, 1, }
>>> {3244, 16236, 2, }
>>> 
>>
>> I am not sure what we would pass as argument to the (SQL) function
>> pg_stat_get_command_progress() in the system view definition for
>> individual commands - what is PROGRESS_COMMAND_VACUUM exactly? Would
>> string literals like "vacuum", "cluster", etc. to represent command names
>> work?
> 
> Sorry, it is a symbol to tell pg_stat_get_command_progress() to
> return stats numbers of backends running VACUUM. It should have
> been COMMAND_LAZY_VACUUM for this patch. If we want progress of
> CREATE INDEX, it would be COMMAND_CREATE_INDEX.

Oh I see:

CREATE VIEW pg_stat_vacuum_prgress AS
  SELECT * from pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as x

is actually:

CREATE VIEW pg_stat_vacuum_prgress AS
  SELECT * from pg_stat_get_command_progress(1) as x

where PROGRESS_COMMAND_VACUUM is 1 in backend code (macro, enum,
whatever).  I was confused because we never say relkind = RELKIND_INDEX in
SQL queries, :)

>> That said, there is discussion upthread about more precise reporting on
>> index vacuuming by utilizing the lazy_tid_reaped() (the index bulk delete
>> callback) as a place where we can report what index block number we are
>> at.  I think that would mean the current IndexBulkDeleteCallback signature
>> is insufficient, which is the following:
>>
>> /* Typedef for callback function to determine if a tuple is bulk-deletable */
>> typedef bool (*IndexBulkDeleteCallback) (ItemPointer itemptr, void *state);
>>
>> One more parameter would be necessary:
>>
>> typedef bool (*IndexBulkDeleteCallback) (ItemPointer itemptr, BlockNumber
>> current_index_blkno, void *state);
> 
> It could work for btree but doesn't for, for example,
> gin. ginbulkdelete finds the next page in the following way.
> 
>>  blkno = GinPageGetOpaque(page)->rightlink;
> 
> We should use another value to fagure the progress. If the
> callback is called centainly the same or around the same number
> of times with the total page numbers, the callback should just
> increment a static counter for processed pages.
> 
>> That would also require changing all the am specific vacuumpage routines
>> (like btvacuumpage) to also pass the new argument. Needless to say, some
>> bookkeeping information would also need to be kept in LVRelStats (the
>> "state" in above signature).
>>
>> Am I missing something?
> 
> So, maybe missing the case of other than btree..

More or less, the callback is called maxoffset number of times for all
index pages containing pointers to heap tuples. Robert said upthread that
counting in granularity lower than pages may not be useful:

"Let's report blocks, not tuples. The reason is that
pg_class.reltuples is only an estimate and might be wildly wrong on
occasion, but the length of the relation in blocks can be known with
certainty."

With the existing interface of the callback, it's difficult to keep the
count of pages, hence a proposal to enhance the interface. Also, now I
wonder whether scanned_index_pages will always converge to whatever
total_index_pages we get from RelationGetNumberOfBlocks(index), because
callback is not called for *every* index page and tends to differ per
index method (am). Thanks for pointing me to confirm so.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-02-16 Thread Kyotaro HORIGUCHI
Hello,

At Tue, 16 Feb 2016 10:39:27 +0900, Amit Langote 
 wrote in <56c27dcf.7020...@lab.ntt.co.jp>
> 
> Hello,
> 
> On 2016/02/15 20:21, Kyotaro HORIGUCHI wrote:
> > At Mon, 8 Feb 2016 11:37:17 +0900, Amit Langote wrote:
> >> On 2016/02/05 17:15, poku...@pm.nttdata.co.jp wrote:
> >>> Please find attached updated patch.
> 
> [ ... ]
> 
> >>
> >> Instead of passing the array of char *'s, why not just pass a single char
> >> *, because that's what it's doing - updating a single message. So,
> >> something like:
> > 
> > As I might have written upthread, transferring the whole string
> > as a progress message is useless at least in this scenario. Since
> > they are a set of fixed messages, each of them can be represented
> > by an identifier, an integer number. I don't see a reason for
> > sending the whole of a string beyond a backend.
> 
> This tends to make sense. Perhaps, they could be macros:
> 
> #define VACUUM_PHASE_SCAN_HEAP1
> #define VACUUM_PHASE_VACUUM_INDEX_HEAP2

Exactly. Or an enum.

> > Next, the function pg_stat_get_command_progress() has a somewhat
> > generic name, but it seems to reuturn the data only for the
> > backends with beentry->st_command = COMMAND_LAZY_VACUUM and has
> > the column names specific for vucuum like process. If the
> > function is intended to be generic, it might be better to return
> > a set of integer[] for given type. Otherwise it should have a
> > name represents its objective.
> 
> Agreed.
> 
> > 
> > CREATE FUNCTION
> >  pg_stat_get_command_progress(IN cmdtype integer)
> >  RETURNS SETOF integer[] as $$
> > 
> > SELECT * from pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as x
> >  x
> > -_
> > {1233, 16233, 1, }
> > {3244, 16236, 2, }
> > 
> 
> I am not sure what we would pass as argument to the (SQL) function
> pg_stat_get_command_progress() in the system view definition for
> individual commands - what is PROGRESS_COMMAND_VACUUM exactly? Would
> string literals like "vacuum", "cluster", etc. to represent command names
> work?

Sorry, it is a symbol to tell pg_stat_get_command_progress() to
return stats numbers of backends running VACUUM. It should have
been COMMAND_LAZY_VACUUM for this patch. If we want progress of
CREATE INDEX, it would be COMMAND_CREATE_INDEX.

> > 
> > CREATE VIEW pg_stat_vacuum_progress AS
> >   SELECT S.s[1] as pid,
> >  S.s[2] as relid,
> >  CASE S.s[3] 
> >WHEN 1 THEN 'Scanning Heap'
> >WHEN 2 THEN 'Vacuuming Index and Heap'
> >ELSE 'Unknown phase'
> >  END,
> >
> >   FROM pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as S;
> > 
> > # The name of the function could be other than *_command_progress.
> > 
> > Any thoughts or opinions?
> 
> How about pg_stat_get_progress_info()?

I think it's good.

> >> + snprintf(progress_message[0], PROGRESS_MESSAGE_LENGTH, "%s", 
> >> phase2);
> >> + pgstat_report_progress_update_message(0, progress_message);
> >>   /* Remove index entries */
> >>   for (i = 0; i < nindexes; i++)
> >> + {
> >>   lazy_vacuum_index(Irel[i],
> >> [i],
> >> vacrelstats);
> >> + scanned_index_pages += 
> >> RelationGetNumberOfBlocks(Irel[i]);
> >> + /* Update the scanned index pages and number of 
> >> index scan */
> >> + pgstat_report_progress_update_counter(3, 
> >> scanned_index_pages);
> >> + pgstat_report_progress_update_counter(4, 
> >> vacrelstats->num_index_scans
> >> + 1);
> >> + }
> >>   /* Remove tuples from heap */
> >>   lazy_vacuum_heap(onerel, vacrelstats);
> >>   vacrelstats->num_index_scans++;
> >> + scanned_index_pages = 0;
> >>
> >> I guess num_index_scans could better be reported after all the indexes are
> >> done, that is, after the for loop ends.
> > 
> > Precise reporting would be valuable if vacuuming indexes takes a
> > long time. It seems to me to be fine as it is since updating of
> > stat counters wouldn't add any significant overhead.
> 
> Sorry, my comment may be a bit unclear. vacrelstats->num_index_scans
> doesn't count individual indexes vacuumed but rather the number of times
> "all" the indexes of a table are vacuumed, IOW, the number of times the
> vacuum phase runs. Purpose of counter #4 there seems to be to report the
> latter. OTOH, reporting scanned_index_pages per index as is done in the
> patch is alright.

I got it. Sorry for my misreading. Yes, you're
right. index_scan_count can take atmost 1 by the code. That's
odd.

> That said, there is discussion upthread about more precise reporting on
> index vacuuming by utilizing the lazy_tid_reaped() (the index bulk delete
> 

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-02-15 Thread Amit Langote

Hello,

On 2016/02/15 20:21, Kyotaro HORIGUCHI wrote:
> At Mon, 8 Feb 2016 11:37:17 +0900, Amit Langote wrote:
>> On 2016/02/05 17:15, poku...@pm.nttdata.co.jp wrote:
>>> Please find attached updated patch.

[ ... ]

>>
>> Instead of passing the array of char *'s, why not just pass a single char
>> *, because that's what it's doing - updating a single message. So,
>> something like:
> 
> As I might have written upthread, transferring the whole string
> as a progress message is useless at least in this scenario. Since
> they are a set of fixed messages, each of them can be represented
> by an identifier, an integer number. I don't see a reason for
> sending the whole of a string beyond a backend.

This tends to make sense. Perhaps, they could be macros:

#define VACUUM_PHASE_SCAN_HEAP  1
#define VACUUM_PHASE_VACUUM_INDEX_HEAP  2

> Next, the function pg_stat_get_command_progress() has a somewhat
> generic name, but it seems to reuturn the data only for the
> backends with beentry->st_command = COMMAND_LAZY_VACUUM and has
> the column names specific for vucuum like process. If the
> function is intended to be generic, it might be better to return
> a set of integer[] for given type. Otherwise it should have a
> name represents its objective.

Agreed.

> 
> CREATE FUNCTION
>  pg_stat_get_command_progress(IN cmdtype integer)
>  RETURNS SETOF integer[] as $$
> 
> SELECT * from pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as x
>  x
> -_
> {1233, 16233, 1, }
> {3244, 16236, 2, }
> 

I am not sure what we would pass as argument to the (SQL) function
pg_stat_get_command_progress() in the system view definition for
individual commands - what is PROGRESS_COMMAND_VACUUM exactly? Would
string literals like "vacuum", "cluster", etc. to represent command names
work?

> 
> CREATE VIEW pg_stat_vacuum_progress AS
>   SELECT S.s[1] as pid,
>  S.s[2] as relid,
>  CASE S.s[3] 
>WHEN 1 THEN 'Scanning Heap'
>WHEN 2 THEN 'Vacuuming Index and Heap'
>ELSE 'Unknown phase'
>  END,
>
>   FROM pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as S;
> 
> # The name of the function could be other than *_command_progress.
> 
> Any thoughts or opinions?

How about pg_stat_get_progress_info()?

>> + snprintf(progress_message[0], PROGRESS_MESSAGE_LENGTH, "%s", 
>> phase2);
>> + pgstat_report_progress_update_message(0, progress_message);
>>   /* Remove index entries */
>>   for (i = 0; i < nindexes; i++)
>> + {
>>   lazy_vacuum_index(Irel[i],
>> [i],
>> vacrelstats);
>> + scanned_index_pages += 
>> RelationGetNumberOfBlocks(Irel[i]);
>> + /* Update the scanned index pages and number of index 
>> scan */
>> + pgstat_report_progress_update_counter(3, 
>> scanned_index_pages);
>> + pgstat_report_progress_update_counter(4, 
>> vacrelstats->num_index_scans
>> + 1);
>> + }
>>   /* Remove tuples from heap */
>>   lazy_vacuum_heap(onerel, vacrelstats);
>>   vacrelstats->num_index_scans++;
>> + scanned_index_pages = 0;
>>
>> I guess num_index_scans could better be reported after all the indexes are
>> done, that is, after the for loop ends.
> 
> Precise reporting would be valuable if vacuuming indexes takes a
> long time. It seems to me to be fine as it is since updating of
> stat counters wouldn't add any significant overhead.

Sorry, my comment may be a bit unclear. vacrelstats->num_index_scans
doesn't count individual indexes vacuumed but rather the number of times
"all" the indexes of a table are vacuumed, IOW, the number of times the
vacuum phase runs. Purpose of counter #4 there seems to be to report the
latter. OTOH, reporting scanned_index_pages per index as is done in the
patch is alright.

That said, there is discussion upthread about more precise reporting on
index vacuuming by utilizing the lazy_tid_reaped() (the index bulk delete
callback) as a place where we can report what index block number we are
at.  I think that would mean the current IndexBulkDeleteCallback signature
is insufficient, which is the following:

/* Typedef for callback function to determine if a tuple is bulk-deletable */
typedef bool (*IndexBulkDeleteCallback) (ItemPointer itemptr, void *state);

One more parameter would be necessary:

typedef bool (*IndexBulkDeleteCallback) (ItemPointer itemptr, BlockNumber
current_index_blkno, void *state);

That would also require changing all the am specific vacuumpage routines
(like btvacuumpage) to also pass the new argument. Needless to say, some
bookkeeping information would also need to be kept in LVRelStats (the
"state" in above signature).

Am I missing something?


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-02-15 Thread Kyotaro HORIGUCHI
Hello,

At Mon, 8 Feb 2016 11:37:17 +0900, Amit Langote  
wrote in <56b7ff5d.7030...@lab.ntt.co.jp>
> 
> Hi Vinayak,
> 
> Thanks for updating the patch, a couple of comments:
> 
> On 2016/02/05 17:15, poku...@pm.nttdata.co.jp wrote:
> > Hello,
> > 
> > Please find attached updated patch.
> >> The point of having pgstat_report_progress_update_counter() is so that 
> >> you can efficiently update a single counter without having to update 
> >> everything, when only one counter has changed.  But here you are 
> >> calling this function a whole bunch of times in a row, which 
> >> completely misses the point - if you are updating all the counters, 
> >> it's more efficient to use an interface that does them all at once 
> >> instead of one at a time.
> > 
> > The pgstat_report_progress_update_counter() is called at appropriate places 
> > in the attached patch.
> 
> + charprogress_message[N_PROGRESS_PARAM][PROGRESS_MESSAGE_LENGTH];
> 
> [ ... ]
> 
> + snprintf(progress_message[0], PROGRESS_MESSAGE_LENGTH, "%s", phase1);
> + pgstat_report_progress_update_message(0, progress_message);
> 
> [ ... ]
> 
> + snprintf(progress_message[0], PROGRESS_MESSAGE_LENGTH, 
> "%s", phase2);
> + pgstat_report_progress_update_message(0, 
> progress_message);
> 
> Instead of passing the array of char *'s, why not just pass a single char
> *, because that's what it's doing - updating a single message. So,
> something like:
> 
> + char progress_message[PROGRESS_MESSAGE_LENGTH];
> 
> [ ... ]
> 
> + snprintf(progress_message, PROGRESS_MESSAGE_LENGTH, "%s", phase1);
> + pgstat_report_progress_update_message(0, progress_message);
> 
> [ ... ]
> 
> + snprintf(progress_message, PROGRESS_MESSAGE_LENGTH, "%s", phase2);
> + pgstat_report_progress_update_message(0, progress_message);
> 
> And also:
> 
> +/*---
> + * pgstat_report_progress_update_message()-
> + *
> + *Called to update phase of VACUUM progress
> + *---
> + */
> +void
> +pgstat_report_progress_update_message(int index, char *msg)
> +{
> 
> [ ... ]
> 
> + pgstat_increment_changecount_before(beentry);
> + strncpy((char *)beentry->st_progress_message[index], msg,
> PROGRESS_MESSAGE_LENGTH);
> + pgstat_increment_changecount_after(beentry);


As I might have written upthread, transferring the whole string
as a progress message is useless at least in this scenario. Since
they are a set of fixed messages, each of them can be represented
by an identifier, an integer number. I don't see a reason for
sending the whole of a string beyond a backend.

Next, the function pg_stat_get_command_progress() has a somewhat
generic name, but it seems to reuturn the data only for the
backends with beentry->st_command = COMMAND_LAZY_VACUUM and has
the column names specific for vucuum like process. If the
function is intended to be generic, it might be better to return
a set of integer[] for given type. Otherwise it should have a
name represents its objective.

CREATE FUNCTION
 pg_stat_get_command_progress(IN cmdtype integer)
 RETURNS SETOF integer[] as $$

SELECT * from pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as x
 x
-_
{1233, 16233, 1, }
{3244, 16236, 2, }


CREATE VIEW pg_stat_vacuum_progress AS
  SELECT S.s[1] as pid,
 S.s[2] as relid,
 CASE S.s[3] 
   WHEN 1 THEN 'Scanning Heap'
   WHEN 2 THEN 'Vacuuming Index and Heap'
   ELSE 'Unknown phase'
 END,
   
  FROM pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as S;

# The name of the function could be other than *_command_progress.

Any thoughts or opinions?


> One more comment:
> 
> @@ -1120,14 +1157,23 @@ lazy_scan_heap(Relation onerel, LVRelStats
> *vacrelstats,
>   /* Log cleanup info before we touch indexes */
>   vacuum_log_cleanup_info(onerel, vacrelstats);
> 
> + snprintf(progress_message[0], PROGRESS_MESSAGE_LENGTH, "%s", 
> phase2);
> + pgstat_report_progress_update_message(0, progress_message);
>   /* Remove index entries */
>   for (i = 0; i < nindexes; i++)
> + {
>   lazy_vacuum_index(Irel[i],
> [i],
> vacrelstats);
> + scanned_index_pages += 
> RelationGetNumberOfBlocks(Irel[i]);
> + /* Update the scanned index pages and number of index 
> scan */
> + pgstat_report_progress_update_counter(3, 
> scanned_index_pages);
> + pgstat_report_progress_update_counter(4, 
> vacrelstats->num_index_scans
> + 1);
> + }
>   /* Remove tuples from heap */
>   lazy_vacuum_heap(onerel, vacrelstats);
>   vacrelstats->num_index_scans++;
> + scanned_index_pages = 0;
> 
> I guess 

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-02-08 Thread Alvaro Herrera
Since things are clearly still moving here, I closed it as
returned-with-feedback.  Please submit to the next CF so that we don't
lose it.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-02-07 Thread Amit Langote

Hi Vinayak,

Thanks for updating the patch, a couple of comments:

On 2016/02/05 17:15, poku...@pm.nttdata.co.jp wrote:
> Hello,
> 
> Please find attached updated patch.
>> The point of having pgstat_report_progress_update_counter() is so that 
>> you can efficiently update a single counter without having to update 
>> everything, when only one counter has changed.  But here you are 
>> calling this function a whole bunch of times in a row, which 
>> completely misses the point - if you are updating all the counters, 
>> it's more efficient to use an interface that does them all at once 
>> instead of one at a time.
> 
> The pgstat_report_progress_update_counter() is called at appropriate places 
> in the attached patch.

+   charprogress_message[N_PROGRESS_PARAM][PROGRESS_MESSAGE_LENGTH];

[ ... ]

+   snprintf(progress_message[0], PROGRESS_MESSAGE_LENGTH, "%s", phase1);
+   pgstat_report_progress_update_message(0, progress_message);

[ ... ]

+   snprintf(progress_message[0], PROGRESS_MESSAGE_LENGTH, 
"%s", phase2);
+   pgstat_report_progress_update_message(0, 
progress_message);

Instead of passing the array of char *'s, why not just pass a single char
*, because that's what it's doing - updating a single message. So,
something like:

+ char progress_message[PROGRESS_MESSAGE_LENGTH];

[ ... ]

+ snprintf(progress_message, PROGRESS_MESSAGE_LENGTH, "%s", phase1);
+ pgstat_report_progress_update_message(0, progress_message);

[ ... ]

+ snprintf(progress_message, PROGRESS_MESSAGE_LENGTH, "%s", phase2);
+ pgstat_report_progress_update_message(0, progress_message);

And also:

+/*---
+ * pgstat_report_progress_update_message()-
+ *
+ *Called to update phase of VACUUM progress
+ *---
+ */
+void
+pgstat_report_progress_update_message(int index, char *msg)
+{

[ ... ]

+   pgstat_increment_changecount_before(beentry);
+   strncpy((char *)beentry->st_progress_message[index], msg,
PROGRESS_MESSAGE_LENGTH);
+   pgstat_increment_changecount_after(beentry);


One more comment:

@@ -1120,14 +1157,23 @@ lazy_scan_heap(Relation onerel, LVRelStats
*vacrelstats,
/* Log cleanup info before we touch indexes */
vacuum_log_cleanup_info(onerel, vacrelstats);

+   snprintf(progress_message[0], PROGRESS_MESSAGE_LENGTH, "%s", 
phase2);
+   pgstat_report_progress_update_message(0, progress_message);
/* Remove index entries */
for (i = 0; i < nindexes; i++)
+   {
lazy_vacuum_index(Irel[i],
  [i],
  vacrelstats);
+   scanned_index_pages += 
RelationGetNumberOfBlocks(Irel[i]);
+   /* Update the scanned index pages and number of index 
scan */
+   pgstat_report_progress_update_counter(3, 
scanned_index_pages);
+   pgstat_report_progress_update_counter(4, 
vacrelstats->num_index_scans
+ 1);
+   }
/* Remove tuples from heap */
lazy_vacuum_heap(onerel, vacrelstats);
vacrelstats->num_index_scans++;
+   scanned_index_pages = 0;

I guess num_index_scans could better be reported after all the indexes are
done, that is, after the for loop ends.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-02-05 Thread pokurev
Hello,

Please find attached updated patch.
>The point of having pgstat_report_progress_update_counter() is so that 
>you can efficiently update a single counter without having to update 
>everything, when only one counter has changed.  But here you are 
>calling this function a whole bunch of times in a row, which 
>completely misses the point - if you are updating all the counters, 
>it's more efficient to use an interface that does them all at once 
>instead of one at a time.

The pgstat_report_progress_update_counter() is called at appropriate places in 
the attached patch.

>So I've spent a fair amount of time debugging really-long-running 
>VACUUM processes with customers, and generally what I really want to 
>know is:
 What block number are we at? <<<

Agreed. The attached patch reported current block number.

Regards,
Vinayak


Vacuum_progress_checker_v11.patch
Description: Vacuum_progress_checker_v11.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-31 Thread Amit Langote
On 2016/01/29 21:02, Rahila Syed wrote:
>> Okay, I agree that reporting just the current blkno is simple and good
>> enough. How about numbers of what we're going to report as the "Vacuuming
>> Index and Heap" phase? I guess we can still keep the scanned_index_pages
>> and index_scan_count So we have:
>> +CREATE VIEW pg_stat_vacuum_progress AS
>> +   SELECT
>> +  S.pid,
>> +  S.relid,
>> +  S.phase,
>> +  S.total_heap_blks,
>> +  S.current_heap_blkno,
>> +  S.total_index_pages,
>> +  S.scanned_index_pages,
>> +  S.index_scan_count
>> +  S.percent_complete,
>> +   FROM pg_stat_get_vacuum_progress() AS S;
>> I guess it won't remain pg_stat_get_"vacuum"_progress(
>> ), though.
> 
> Apart from these, as suggested in [1] , finer grained reporting from index
> vacuuming phase can provide better insight. Currently we report number of
> blocks processed once at the end of vacuuming of each index.
> IIUC, what was suggested in [1] was instrumenting lazy_tid_reaped with a
> counter to count number of index tuples processed so far as lazy_tid_reaped
> is called for every index tuple to see if it matches any of the dead tuple
> tids.

Agreed. Although, as Robert already suggested, I too would vote for
counting pages, not tuples. I think there was an independent patch doing
something of that sort somewhere upthread.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-29 Thread Robert Haas
On Fri, Jan 29, 2016 at 7:02 AM, Rahila Syed  wrote:
> Apart from these, as suggested in [1] , finer grained reporting from index
> vacuuming phase can provide better insight. Currently we report number of
> blocks processed once at the end of vacuuming of each index.
> IIUC, what was suggested in [1] was instrumenting lazy_tid_reaped with a
> counter to count number of index tuples processed so far as lazy_tid_reaped
> is called for every index tuple to see if it matches any of the dead tuple
> tids.
>
> So additional parameters for each index can be,
> scanned_index_tuples
> total_index_tuples (from pg_class.reltuples entry)

Let's report blocks, not tuples.  The reason is that
pg_class.reltuples is only an estimate and might be wildly wrong on
occasion, but the length of the relation in blocks can be known with
certainty.

But other than that I agree with this.  Fine-grained is key.  If it's
not fine grained, then people really won't be able to tell what's
going on when VACUUM doesn't finish in a timely fashion.  And the
whole point is we want to be able to know that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-29 Thread Rahila Syed
>Okay, I agree that reporting just the current blkno is simple and good
>enough. How about numbers of what we're going to report as the "Vacuuming
>Index and Heap" phase? I guess we can still keep the scanned_index_pages
>and index_scan_count So we have:
>+CREATE VIEW pg_stat_vacuum_progress AS
>+   SELECT
>+  S.pid,
>+  S.relid,
>+  S.phase,
>+  S.total_heap_blks,
>+  S.current_heap_blkno,
>+  S.total_index_pages,
>+  S.scanned_index_pages,
>+  S.index_scan_count
>+  S.percent_complete,
>+   FROM pg_stat_get_vacuum_progress() AS S;
>I guess it won't remain pg_stat_get_"vacuum"_progress(
>), though.

Apart from these, as suggested in [1] , finer grained reporting from index
vacuuming phase can provide better insight. Currently we report number of
blocks processed once at the end of vacuuming of each index.
IIUC, what was suggested in [1] was instrumenting lazy_tid_reaped with a
counter to count number of index tuples processed so far as lazy_tid_reaped
is called for every index tuple to see if it matches any of the dead tuple
tids.

So additional parameters for each index can be,
scanned_index_tuples
total_index_tuples (from pg_class.reltuples entry)

Thank you,
Rahila Syed

[1]. http://www.postgresql.org/message-id/56500356.4070...@bluetreble.com


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-28 Thread Amit Langote

On 2016/01/28 23:53, Robert Haas wrote:
> On Thu, Jan 28, 2016 at 8:41 AM, Amit Langote  wrote:
>> Or keep scanned_heap_pages as is and add a skipped_pages (or
>> skipped_heap_pages). I guess the latter would be updated not only for
>> all visible skipped pages but also pin skipped pages. That is,
>> updating its counter right after vacrelstats->pinskipped_pages++ which
>> there are a couple of instances of. Likewise a good (and only?) time
>> to update the former's counter would be right after
>> vacrelstats->scanned_pages++. Although, I see at least one place where
>> both are incremented so maybe I'm not entirely correct about the last
>> two sentences.
> 
> So I've spent a fair amount of time debugging really-long-running
> VACUUM processes with customers, and generally what I really want to
> know is:
> 
 What block number are we at? <<<
> 
> Because, if I know that, and I can see how fast that's increasing,
> then I can estimate whether the VACUUM is going to end in a reasonable
> period of time or not.  So my preference is to not bother breaking out
> skipped pages, but just report the block number and call it good.  I
> will defer to a strong consensus on something else, but reporting the
> block number has the advantage of being dead simple and, in my
> experience, that would answer the question that I typically have.

Okay, I agree that reporting just the current blkno is simple and good
enough. How about numbers of what we're going to report as the "Vacuuming
Index and Heap" phase? I guess we can still keep the scanned_index_pages
and index_scan_count. So we have:

+CREATE VIEW pg_stat_vacuum_progress AS
+   SELECT
+  S.pid,
+  S.relid,
+  S.phase,
+  S.total_heap_blks,
+  S.current_heap_blkno,
+  S.total_index_pages,
+  S.scanned_index_pages,
+  S.index_scan_count
+  S.percent_complete,
+   FROM pg_stat_get_vacuum_progress() AS S;

I guess it won't remain pg_stat_get_"vacuum"_progress(), though.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-28 Thread Rahila Syed
>+if(!scan_all)
>+scanned_heap_pages = scanned_heap_pages +
>next_not_all_visible_block;

>I don't want to be too much of a stickler for details here, but it
>seems to me that this is an outright lie.

Initially the scanned_heap_pages were meant to report just the scanned
pages and skipped pages were not added to the count.  Instead the skipped
pages were deduced from number of total heap pages to be scanned to make
the number of scanned pages eventually add up to total heap pages.   As per
comments received later total heap pages were kept constant and skipped
pages count was added to scanned pages to make the count add up to total
heap pages at the end of scan. That said, as suggested, scanned_heap_pages
should be renamed to current_heap_page to report current blkno in
lazy_scan_heap loop which will add up to total heap pages(nblocks) at the
end of scan. And scanned_heap_pages can be reported as a separate number
which wont contain skipped pages.


>+/*
>+ * Reporting vacuum progress to statistics collector
>+ */

>This patch doesn't report anything to the statistics collector, nor should
it.
Yes. This was incorrectly added initially by referring to similar
pgstat_report interface functions.

Thank you,
Rahila Syed

On Thu, Jan 28, 2016 at 2:27 AM, Robert Haas  wrote:

> On Tue, Jan 26, 2016 at 11:37 PM, Vinayak Pokale 
> wrote:
> > Hi,
> >
> > Please find attached updated patch with an updated interface.
>
> Well, this isn't right.  You've got this sort of thing:
>
> +scanned_index_pages += RelationGetNumberOfBlocks(Irel[i]);
> +/* Report progress to the statistics collector */
> +pgstat_report_progress_update_message(0, progress_message);
> +pgstat_report_progress_update_counter(1, scanned_heap_pages);
> +pgstat_report_progress_update_counter(3, scanned_index_pages);
> +pgstat_report_progress_update_counter(4,
> vacrelstats->num_index_scans + 1);
>
> The point of having pgstat_report_progress_update_counter() is so that
> you can efficiently update a single counter without having to update
> everything, when only one counter has changed.  But here you are
> calling this function a whole bunch of times in a row, which
> completely misses the point - if you are updating all the counters,
> it's more efficient to use an interface that does them all at once
> instead of one at a time.
>
> But there's a second problem here, too, which is that I think you've
> got this code in the wrong place.  The second point of the
> pgstat_report_progress_update_counter interface is that this should be
> cheap enough that we can do it every time the counter changes.  That's
> not what you are doing here.  You're updating the counters at various
> points in the code and just pushing new values for all of them
> regardless of which ones have changed.  I think you should find a way
> that you can update the value immediately at the exact moment it
> changes.  If that seems like too much of a performance hit we can talk
> about it, but I think the value of this feature will be greatly
> weakened if users can't count on it to be fully and continuously up to
> the moment.  When something gets stuck, you want to know where it's
> stuck, not approximately kinda where it's stuck.
>
> +if(!scan_all)
> +scanned_heap_pages = scanned_heap_pages +
> next_not_all_visible_block;
>
> I don't want to be too much of a stickler for details here, but it
> seems to me that this is an outright lie.  The number of scanned pages
> does not go up when we decide to skip some pages, because scanning and
> skipping aren't the same thing.  If we're only going to report one
> number here, it needs to be called something like "current heap page",
> and then you can just report blkno at the top of each iteration of
> lazy_scan_heap's main loop.  If you want to report the numbers of
> scanned and skipped pages separately that'd be OK too, but you can't
> call it the number of scanned pages and then actually report a value
> that is not that.
>
> +/*
> + * Reporting vacuum progress to statistics collector
> + */
>
> This patch doesn't report anything to the statistics collector, nor should
> it.
>
> Instead of making the SQL-visible function
> pg_stat_get_vacuum_progress(), I think it should be something more
> generic like pg_stat_get_command_progress().  Maybe VACUUM will be the
> only command that reports into that feature for right now, but I'd
> hope for us to change that pretty soon after we get the first patch
> committed.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-28 Thread Amit Langote
Hi,

On Thu, Jan 28, 2016 at 7:38 PM, Rahila Syed  wrote:
>>+if(!scan_all)
>>+scanned_heap_pages = scanned_heap_pages +
>>next_not_all_visible_block;
>
>>I don't want to be too much of a stickler for details here, but it
>>seems to me that this is an outright lie.
>
> Initially the scanned_heap_pages were meant to report just the scanned pages
> and skipped pages were not added to the count.  Instead the skipped pages
> were deduced from number of total heap pages to be scanned to make the
> number of scanned pages eventually add up to total heap pages.   As per
> comments received later total heap pages were kept constant and skipped
> pages count was added to scanned pages to make the count add up to total
> heap pages at the end of scan. That said, as suggested, scanned_heap_pages
> should be renamed to current_heap_page to report current blkno in
> lazy_scan_heap loop which will add up to total heap pages(nblocks) at the
> end of scan. And scanned_heap_pages can be reported as a separate number
> which wont contain skipped pages.

Or keep scanned_heap_pages as is and add a skipped_pages (or
skipped_heap_pages). I guess the latter would be updated not only for
all visible skipped pages but also pin skipped pages. That is,
updating its counter right after vacrelstats->pinskipped_pages++ which
there are a couple of instances of. Likewise a good (and only?) time
to update the former's counter would be right after
vacrelstats->scanned_pages++. Although, I see at least one place where
both are incremented so maybe I'm not entirely correct about the last
two sentences.

Thanks,
Amit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-28 Thread Robert Haas
On Thu, Jan 28, 2016 at 8:41 AM, Amit Langote  wrote:
> Or keep scanned_heap_pages as is and add a skipped_pages (or
> skipped_heap_pages). I guess the latter would be updated not only for
> all visible skipped pages but also pin skipped pages. That is,
> updating its counter right after vacrelstats->pinskipped_pages++ which
> there are a couple of instances of. Likewise a good (and only?) time
> to update the former's counter would be right after
> vacrelstats->scanned_pages++. Although, I see at least one place where
> both are incremented so maybe I'm not entirely correct about the last
> two sentences.

So I've spent a fair amount of time debugging really-long-running
VACUUM processes with customers, and generally what I really want to
know is:

>>> What block number are we at? <<<

Because, if I know that, and I can see how fast that's increasing,
then I can estimate whether the VACUUM is going to end in a reasonable
period of time or not.  So my preference is to not bother breaking out
skipped pages, but just report the block number and call it good.  I
will defer to a strong consensus on something else, but reporting the
block number has the advantage of being dead simple and, in my
experience, that would answer the question that I typically have.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-27 Thread Robert Haas
On Tue, Jan 26, 2016 at 11:37 PM, Vinayak Pokale  wrote:
> Hi,
>
> Please find attached updated patch with an updated interface.

Well, this isn't right.  You've got this sort of thing:

+scanned_index_pages += RelationGetNumberOfBlocks(Irel[i]);
+/* Report progress to the statistics collector */
+pgstat_report_progress_update_message(0, progress_message);
+pgstat_report_progress_update_counter(1, scanned_heap_pages);
+pgstat_report_progress_update_counter(3, scanned_index_pages);
+pgstat_report_progress_update_counter(4,
vacrelstats->num_index_scans + 1);

The point of having pgstat_report_progress_update_counter() is so that
you can efficiently update a single counter without having to update
everything, when only one counter has changed.  But here you are
calling this function a whole bunch of times in a row, which
completely misses the point - if you are updating all the counters,
it's more efficient to use an interface that does them all at once
instead of one at a time.

But there's a second problem here, too, which is that I think you've
got this code in the wrong place.  The second point of the
pgstat_report_progress_update_counter interface is that this should be
cheap enough that we can do it every time the counter changes.  That's
not what you are doing here.  You're updating the counters at various
points in the code and just pushing new values for all of them
regardless of which ones have changed.  I think you should find a way
that you can update the value immediately at the exact moment it
changes.  If that seems like too much of a performance hit we can talk
about it, but I think the value of this feature will be greatly
weakened if users can't count on it to be fully and continuously up to
the moment.  When something gets stuck, you want to know where it's
stuck, not approximately kinda where it's stuck.

+if(!scan_all)
+scanned_heap_pages = scanned_heap_pages +
next_not_all_visible_block;

I don't want to be too much of a stickler for details here, but it
seems to me that this is an outright lie.  The number of scanned pages
does not go up when we decide to skip some pages, because scanning and
skipping aren't the same thing.  If we're only going to report one
number here, it needs to be called something like "current heap page",
and then you can just report blkno at the top of each iteration of
lazy_scan_heap's main loop.  If you want to report the numbers of
scanned and skipped pages separately that'd be OK too, but you can't
call it the number of scanned pages and then actually report a value
that is not that.

+/*
+ * Reporting vacuum progress to statistics collector
+ */

This patch doesn't report anything to the statistics collector, nor should it.

Instead of making the SQL-visible function
pg_stat_get_vacuum_progress(), I think it should be something more
generic like pg_stat_get_command_progress().  Maybe VACUUM will be the
only command that reports into that feature for right now, but I'd
hope for us to change that pretty soon after we get the first patch
committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-26 Thread Vinayak Pokale
Hi,

Please find attached updated patch with an updated interface.

On Jan 26, 2016 11:22 AM, "Vinayak Pokale"  wrote:
>
> Hi Amit,
>
> Thank you for reviewing the patch.
>
> On Jan 26, 2016 9:51 AM, "Amit Langote" 
wrote:
> >
> >
> > Hi Vinayak,
> >
> > On 2016/01/25 20:58, Vinayak Pokale wrote:
> > > Hi,
> > >
> > > Please find attached updated patch with an updated interface.
> > >
> >
> > Thanks for updating the patch.
> >
> > > I added the below interface to update the
> > > scanned_heap_pages,scanned_index_pages and index_scan_count only.
> > > void pgstat_report_progress_scanned_pages(int num_of_int, uint32
> > > *progress_scanned_pages_param)
> >
> > I think it's still the same interface with the names changed. IIRC, what
> > was suggested was to provide a way to not have to pass the entire array
> > for the update of a single member of it. Just pass the index of the
> > updated member and its new value. Maybe, something like:
> >
> > void pgstat_progress_update_counter(int index, uint32 newval);
> >
> > The above function would presumably update the value of
> > beentry.st_progress_counter[index] or something like that.

Following interface functions are added:

/*
 * index: in the array of uint32 counters in the beentry
 * counter: new value of the (index)th counter
 */
void
pgstat_report_progress_update_counter(int index, uint32 counter)

/*
called to updatet the VACUUM progress phase.
msg: new value of (index)th message
*/
void
pgstat_report_progress_update_message(int index, char
msg[N_PROGRESS_PARAM][PROGRESS_MESSAGE_LENGTH])

Regards,
Vinayak


Vacuum_progress_checker_v10.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-25 Thread Amit Langote

Hi Vinayak,

On 2016/01/25 20:58, Vinayak Pokale wrote:
> Hi,
> 
> Please find attached updated patch with an updated interface.
> 

Thanks for updating the patch.

> I added the below interface to update the
> scanned_heap_pages,scanned_index_pages and index_scan_count only.
> void pgstat_report_progress_scanned_pages(int num_of_int, uint32
> *progress_scanned_pages_param)

I think it's still the same interface with the names changed. IIRC, what
was suggested was to provide a way to not have to pass the entire array
for the update of a single member of it. Just pass the index of the
updated member and its new value. Maybe, something like:

void pgstat_progress_update_counter(int index, uint32 newval);

The above function would presumably update the value of
beentry.st_progress_counter[index] or something like that.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-25 Thread Vinayak Pokale
Hi,

Please find attached updated patch with an updated interface.

I added the below interface to update the
scanned_heap_pages,scanned_index_pages and index_scan_count only.
void pgstat_report_progress_scanned_pages(int num_of_int, uint32
*progress_scanned_pages_param)

Other interface functions which are called at the beginning:
void pgstat_report_progress_set_command_target(Oid relid)

Regards,
Vinayak

On Wed, Jan 13, 2016 at 3:16 PM, Amit Langote  wrote:

> On 2016/01/12 11:28, Vinayak Pokale wrote:
> > On Jan 12, 2016 11:22 AM, "Amit Langote" 
> > wrote:
> >>
> >> On 2016/01/12 10:30, Amit Langote wrote:
> >>> I'm slightly concerned that the latest patch doesn't incorporate any
> >>> revisions to the original pgstat interface per Robert's comments in
> [1].
> >>
> >> I meant to say "originally proposed pgstat interface on this thread".
> >
> > Yes.
> > Robert's comments related to pgstat interface needs to be address.
> > I will update it.
>
> So, I updated the patch status to "Waiting on Author".
>
> Thanks,
> Amit
>
>
>


Vacuum_progress_checker_v9.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-25 Thread Vinayak Pokale
Hi Amit,

Thank you for reviewing the patch.
On Jan 26, 2016 9:51 AM, "Amit Langote" 
wrote:
>
>
> Hi Vinayak,
>
> On 2016/01/25 20:58, Vinayak Pokale wrote:
> > Hi,
> >
> > Please find attached updated patch with an updated interface.
> >
>
> Thanks for updating the patch.
>
> > I added the below interface to update the
> > scanned_heap_pages,scanned_index_pages and index_scan_count only.
> > void pgstat_report_progress_scanned_pages(int num_of_int, uint32
> > *progress_scanned_pages_param)
>
> I think it's still the same interface with the names changed. IIRC, what
> was suggested was to provide a way to not have to pass the entire array
> for the update of a single member of it. Just pass the index of the
> updated member and its new value. Maybe, something like:
>
> void pgstat_progress_update_counter(int index, uint32 newval);
>
> The above function would presumably update the value of
> beentry.st_progress_counter[index] or something like that.

Understood. I will update the patch.

Regards,
Vinayak


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-12 Thread Amit Langote
On 2016/01/12 11:28, Vinayak Pokale wrote:
> On Jan 12, 2016 11:22 AM, "Amit Langote" 
> wrote:
>>
>> On 2016/01/12 10:30, Amit Langote wrote:
>>> I'm slightly concerned that the latest patch doesn't incorporate any
>>> revisions to the original pgstat interface per Robert's comments in [1].
>>
>> I meant to say "originally proposed pgstat interface on this thread".
> 
> Yes.
> Robert's comments related to pgstat interface needs to be address.
> I will update it.

So, I updated the patch status to "Waiting on Author".

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-11 Thread Amit Langote
On 2016/01/08 21:20, Rahila Syed wrote:
>> I suspect you need to create a new CF entry for this patch in CF 2016-01.
> 
> Unless I am missing something, there seems to be no entry for this patch
> into CF 2016-01 page: https://commitfest.postgresql.org/8/.
> Regrettably, we have exceeded the deadline to add the patch into this
> commitfest. Is there still some way to add it to the commitfest 2016-01? As
> this feature has received lot of feedback in previous commitfest , adding
> it to this commitfest will surely help in progressing it in order to make
> it ready for PostgreSQL 9.6.

I see that the patch has been added to the CF.

I'm slightly concerned that the latest patch doesn't incorporate any
revisions to the original pgstat interface per Robert's comments in [1].

Thanks,
Amit

[1]
http://www.postgresql.org/message-id/ca+tgmoz5q4n4t0c0_-xktencewoabfdktoppt8nujjjv5oh...@mail.gmail.com




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-11 Thread Vinayak Pokale
On Jan 12, 2016 11:22 AM, "Amit Langote" 
wrote:
>
> On 2016/01/12 10:30, Amit Langote wrote:
> > I'm slightly concerned that the latest patch doesn't incorporate any
> > revisions to the original pgstat interface per Robert's comments in [1].
>
> I meant to say "originally proposed pgstat interface on this thread".

Yes.
Robert's comments related to pgstat interface needs to be address.
I will update it.

Regards,
Vinayak Pokale


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-11 Thread Amit Langote
On 2016/01/12 10:30, Amit Langote wrote:
> I'm slightly concerned that the latest patch doesn't incorporate any
> revisions to the original pgstat interface per Robert's comments in [1].

I meant to say "originally proposed pgstat interface on this thread".

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-11 Thread Vinayak Pokale
Hi Sudhir,

On Jan 7, 2016 3:02 AM, "Sudhir Lonkar-2"  wrote:
>
> Hello,
> >Please find attached patch addressing following comments
> I have tested this patch.
> It is showing empty (null) in phase column of pg_stat_vacuum_progress,
when
> I switched to a unprivileged user.
> In the previous patch, it is showing  in phase
> column.
Yes. I will update the patch.
Regards,
Vinayak Pokale


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-08 Thread Rahila Syed
>I suspect you need to create a new CF entry for this patch in CF 2016-01.

Unless I am missing something, there seems to be no entry for this patch
into CF 2016-01 page: https://commitfest.postgresql.org/8/.
Regrettably, we have exceeded the deadline to add the patch into this
commitfest. Is there still some way to add it to the commitfest 2016-01? As
this feature has received lot of feedback in previous commitfest , adding
it to this commitfest will surely help in progressing it in order to make
it ready for PostgreSQL 9.6.

Thank you,
Rahila Syed


On Mon, Dec 28, 2015 at 6:01 AM, Amit Langote  wrote:

>
> Hi Vinayak,
>
> On 2015/12/25 21:46, Vinayak Pokale wrote:
> > Hi,
> > Please find attached patch addressing following comments.
> >
> >> The relation OID should be reported and not its name. In case of a
> >> relation rename that would not be cool for tracking, and most users
> >> are surely going to join with other system tables using it.
> > The relation OID is reported instead of relation name.
> > The following interface function is called at the beginning to report the
> > relation OID once.
> > void pgstat_report_command_target(Oid relid)
> >
> >> Regarding pg_stat_get_vacuum_progress(): I think a backend can simply be
> >> skipped if (!has_privs_of_role(GetUserId(), beentry->st_userid)) (cannot
> >> put that in plain English, :))
> > Updated in the attached patch.
> >
> > In the previous patch, ACTIVITY_IS_VACUUM is set unnecessarily for
> > VACOPT_FULL and they are not covered by lazy_scan_heap().
> > I have updated it in attached patch and also renamed ACTIVITY_IS_VACUUM
> to
> > COMMAND_LAZY_VACUUM.
> >
> > Added documentation for view.
> > Some more comments need to be addressed.
>
> I suspect you need to create a new CF entry for this patch in CF 2016-01.
>
> Thanks,
> Amit
>
>
>


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-06 Thread Sudhir Lonkar-2
Hello,
>Please find attached patch addressing following comments
I have tested this patch.
It is showing empty (null) in phase column of pg_stat_vacuum_progress, when
I switched to a unprivileged user.
In the previous patch, it is showing  in phase
column.

Thanks and Regards,
Sudhir Lonkar



--
View this message in context: 
http://postgresql.nabble.com/PROPOSAL-VACUUM-Progress-Checker-tp5855849p5880544.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-27 Thread Amit Langote

Hi Vinayak,

On 2015/12/25 21:46, Vinayak Pokale wrote:
> Hi,
> Please find attached patch addressing following comments.
> 
>> The relation OID should be reported and not its name. In case of a
>> relation rename that would not be cool for tracking, and most users
>> are surely going to join with other system tables using it.
> The relation OID is reported instead of relation name.
> The following interface function is called at the beginning to report the
> relation OID once.
> void pgstat_report_command_target(Oid relid)
> 
>> Regarding pg_stat_get_vacuum_progress(): I think a backend can simply be
>> skipped if (!has_privs_of_role(GetUserId(), beentry->st_userid)) (cannot
>> put that in plain English, :))
> Updated in the attached patch.
> 
> In the previous patch, ACTIVITY_IS_VACUUM is set unnecessarily for
> VACOPT_FULL and they are not covered by lazy_scan_heap().
> I have updated it in attached patch and also renamed ACTIVITY_IS_VACUUM to
> COMMAND_LAZY_VACUUM.
> 
> Added documentation for view.
> Some more comments need to be addressed.

I suspect you need to create a new CF entry for this patch in CF 2016-01.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-25 Thread Vinayak Pokale
Hi,
Please find attached patch addressing following comments.

>The relation OID should be reported and not its name. In case of a
>relation rename that would not be cool for tracking, and most users
>are surely going to join with other system tables using it.
The relation OID is reported instead of relation name.
The following interface function is called at the beginning to report the
relation OID once.
void pgstat_report_command_target(Oid relid)

>Regarding pg_stat_get_vacuum_progress(): I think a backend can simply be
>skipped if (!has_privs_of_role(GetUserId(), beentry->st_userid)) (cannot
>put that in plain English, :))
Updated in the attached patch.

In the previous patch, ACTIVITY_IS_VACUUM is set unnecessarily for
VACOPT_FULL and they are not covered by lazy_scan_heap().
I have updated it in attached patch and also renamed ACTIVITY_IS_VACUUM to
COMMAND_LAZY_VACUUM.

Added documentation for view.
Some more comments need to be addressed.

Regards,

Vinayak Pokale

On Sat, Dec 12, 2015 at 2:07 AM, Robert Haas  wrote:

> On Fri, Dec 11, 2015 at 1:25 AM, Michael Paquier
>  wrote:
> >> For another thing, there are definitely going to be
> >> some people that want the detailed information - and I can practically
> >> guarantee that if we don't make it available, at least one person will
> >> write a tool that tries to reverse-engineer the detailed progress
> >> information from whatever we do report.
> >
> > OK, so this justifies the fact of having detailed information, but
> > does it justify the fact of having precise and accurate data? ISTM
> > that having detailed information and precise information are two
> > different things. The level of details is defined depending on how
> > verbose we want the information to be, and the list you are giving
> > would fulfill this requirement nicely for VACUUM. The level of
> > precision/accuracy at which this information is provided though
> > depends at which frequency we want to send this information. For
> > long-running VACUUM it does not seem necessary to update the fields of
> > the progress tracker each time a counter needs to be incremented.
> > CLUSTER has been mentioned as well as a potential target for the
> > progress facility, but it seems that it enters as well in the category
> > of things that would need to be reported on a slower frequency pace
> > than "each-time-a-counter-is-incremented".
> >
> > My impression is just based on the needs of VACUUM and CLUSTER.
> > Perhaps I am lacking imagination regarding the potential use cases of
> > the progress facility though in cases where we'd want to provide
> > extremely precise progress information :)
> > It just seems to me that this is not a requirement for VACUUM or
> > CLUSTER. That's all.
>
> It's not a hard requirement, but it should be quite easy to do without
> adding any significant overhead.  All you need to do is something
> like:
>
> foo->changecount++;
> pg_write_barrier();
> foo->count_of_blocks++;
> pg_write_barrier();
> foo->changecount++;
>
> I suspect that's plenty cheap enough to do for every block.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Vacuum_progress_checker_v8.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-11 Thread Robert Haas
On Fri, Dec 11, 2015 at 1:25 AM, Michael Paquier
 wrote:
>> For another thing, there are definitely going to be
>> some people that want the detailed information - and I can practically
>> guarantee that if we don't make it available, at least one person will
>> write a tool that tries to reverse-engineer the detailed progress
>> information from whatever we do report.
>
> OK, so this justifies the fact of having detailed information, but
> does it justify the fact of having precise and accurate data? ISTM
> that having detailed information and precise information are two
> different things. The level of details is defined depending on how
> verbose we want the information to be, and the list you are giving
> would fulfill this requirement nicely for VACUUM. The level of
> precision/accuracy at which this information is provided though
> depends at which frequency we want to send this information. For
> long-running VACUUM it does not seem necessary to update the fields of
> the progress tracker each time a counter needs to be incremented.
> CLUSTER has been mentioned as well as a potential target for the
> progress facility, but it seems that it enters as well in the category
> of things that would need to be reported on a slower frequency pace
> than "each-time-a-counter-is-incremented".
>
> My impression is just based on the needs of VACUUM and CLUSTER.
> Perhaps I am lacking imagination regarding the potential use cases of
> the progress facility though in cases where we'd want to provide
> extremely precise progress information :)
> It just seems to me that this is not a requirement for VACUUM or
> CLUSTER. That's all.

It's not a hard requirement, but it should be quite easy to do without
adding any significant overhead.  All you need to do is something
like:

foo->changecount++;
pg_write_barrier();
foo->count_of_blocks++;
pg_write_barrier();
foo->changecount++;

I suspect that's plenty cheap enough to do for every block.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-10 Thread Amit Langote
On 2015/12/10 20:46, Michael Paquier wrote:
> On Thu, Dec 10, 2015 at 7:23 PM, Amit Langote
>  wrote:
>> AIUI, the counts published via stats collector are updated asynchronously
>> w.r.t. operations they count and mostly as aggregate figures. For example,
>> PgStat_StatTabEntry.blocks_fetched. IOW, we never see
>> pg_statio_all_tables.heap_blks_read updating as a scan reads blocks. Maybe
>> that helps keep traffic to pgstat collector to sane levels. But that is
>> not to mean that I think controlling stats collector levels was the only
>> design consideration behind how such counters are published.
>>
>> In case of reporting counters as progress info, it seems we might have to
>> send too many PgStat_Msg's, for example, for every block we finish
>> processing during vacuum. That kind of message traffic may swamp the
>> collector. Then we need to see the updated counters from other counters in
>> near real-time though that may be possible with suitable (build?)
>> configuration.
> 
> As far as I understand it, the basic reason why this patch exists is
> to allow a DBA to have a hint of the progress of a VACUUM that may be
> taking minutes, or say hours, which is something we don't have now. So
> it seems perfectly fine to me to report this information
> asynchronously with a bit of lag. Why would we need so much precision
> in the report?

Sorry, I didn't mean to overstate this requirement. I agree precise
real-time reporting of progress info is not such a stringent requirement
from the patch. The point regarding whether we should storm the collector
with progress info messages still holds, IMHO.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-10 Thread Amit Langote
On 2015/12/10 15:28, Michael Paquier wrote:
> On Thu, Dec 10, 2015 at 4:40 AM, Robert Haas  wrote:
>> It's going to be *really* important that this facility provides a
>> lightweight way of updating progress, so I think this whole API is
>> badly designed.  VACUUM, for example, is going to want a way to update
>> the individual counter for the number of pages it's scanned after
>> every page.  It should not have to pass all of the other information
>> that is part of a complete report.  It should just be able to say
>> pgstat_report_progress_update_counter(1, pages_scanned) or something
>> of this sort.  Don't marshal all of the data and then make
>> pgstat_report_progress figure out what's changed.  Provide a series of
>> narrow APIs where the caller tells you exactly what they want to
>> update, and you update only that.  It's fine to have a
>> pgstat_report_progress() function to update everything at once, for
>> the use at the beginning of a command, but the incremental updates
>> within the command should do something lighter-weight.
> 
> [first time looking really at the patch and catching up with this thread]
> 
> Agreed. As far as I can guess from the code, those should be as
> followed to bloat pgstat message queue a minimum:
> 
> +   pgstat_report_activity_flag(ACTIVITY_IS_VACUUM);
> /*
>  * Loop to process each selected relation.
>  */
> Defining a new routine for this purpose is a bit surprising. Cannot we
> just use pgstat_report_activity with a new BackendState STATE_INVACUUM
> or similar if we pursue the progress tracking approach?

ISTM, PgBackendStatus.st_state is normally manipulated at quite different
sites (mostly tcop) than where a backend would be able to report that a
command like VACUUM is running. Also, the value 'active' of
pg_stat_activity.state has an established meaning as Horiguchi-san seems
to point out in his reply. IMHO, this patch shouldn't affect such meaning
of a pg_stat_activity field.


> A couple of comments:
> - The relation OID should be reported and not its name. In case of a
> relation rename that would not be cool for tracking, and most users
> are surely going to join with other system tables using it.

+1

> - The progress tracking facility adds a whole level of complexity for
> very little gain, and IMO this should *not* be part of PgBackendStatus
> because in most cases its data finishes wasted. We don't expect
> backends to run frequently such progress reports, do we? My opinion on
> the matter if that we should define a different collector data for
> vacuum, with something like PgStat_StatVacuumEntry, then have on top
> of it a couple of routines dedicated at feeding up data with it when
> some work is done on a vacuum job.

I assume your comment here means we should use stats collector to the
track/publish progress info, is that right?

AIUI, the counts published via stats collector are updated asynchronously
w.r.t. operations they count and mostly as aggregate figures. For example,
PgStat_StatTabEntry.blocks_fetched. IOW, we never see
pg_statio_all_tables.heap_blks_read updating as a scan reads blocks. Maybe
that helps keep traffic to pgstat collector to sane levels. But that is
not to mean that I think controlling stats collector levels was the only
design consideration behind how such counters are published.

In case of reporting counters as progress info, it seems we might have to
send too many PgStat_Msg's, for example, for every block we finish
processing during vacuum. That kind of message traffic may swamp the
collector. Then we need to see the updated counters from other counters in
near real-time though that may be possible with suitable (build?)
configuration.

Moreover, the counters reported as progress info seem to be of different
nature than those published via the stats collector. I would think of it
in terms of the distinction between track_activities and track_counts. I
find these lines on "The Statistics Collector" page somewhat related:

"PostgreSQL also supports reporting dynamic information about exactly what
is going on in the system right now, such as the exact command currently
being executed by other server processes, and which other connections
exist in the system. This facility is independent of the collector process."

Then,

"When using the statistics to monitor collected data, it is important to
realize that the information does not update instantaneously. Each
individual server process transmits new statistical counts to the
collector just before going idle; so a query or transaction still in
progress does not affect the displayed totals. Also, the collector itself
emits a new report at most once per PGSTAT_STAT_INTERVAL milliseconds (500
ms unless altered while building the server). So the displayed information
lags behind actual activity. However, current-query information collected
by track_activities is always up-to-date."


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-10 Thread Michael Paquier
On Thu, Dec 10, 2015 at 7:23 PM, Amit Langote
 wrote:
> On 2015/12/10 15:28, Michael Paquier wrote:
>> - The progress tracking facility adds a whole level of complexity for
>> very little gain, and IMO this should *not* be part of PgBackendStatus
>> because in most cases its data finishes wasted. We don't expect
>> backends to run frequently such progress reports, do we? My opinion on
>> the matter if that we should define a different collector data for
>> vacuum, with something like PgStat_StatVacuumEntry, then have on top
>> of it a couple of routines dedicated at feeding up data with it when
>> some work is done on a vacuum job.
>
> I assume your comment here means we should use stats collector to the
> track/publish progress info, is that right?

Yep.

> AIUI, the counts published via stats collector are updated asynchronously
> w.r.t. operations they count and mostly as aggregate figures. For example,
> PgStat_StatTabEntry.blocks_fetched. IOW, we never see
> pg_statio_all_tables.heap_blks_read updating as a scan reads blocks. Maybe
> that helps keep traffic to pgstat collector to sane levels. But that is
> not to mean that I think controlling stats collector levels was the only
> design consideration behind how such counters are published.
>
> In case of reporting counters as progress info, it seems we might have to
> send too many PgStat_Msg's, for example, for every block we finish
> processing during vacuum. That kind of message traffic may swamp the
> collector. Then we need to see the updated counters from other counters in
> near real-time though that may be possible with suitable (build?)
> configuration.

As far as I understand it, the basic reason why this patch exists is
to allow a DBA to have a hint of the progress of a VACUUM that may be
taking minutes, or say hours, which is something we don't have now. So
it seems perfectly fine to me to report this information
asynchronously with a bit of lag. Why would we need so much precision
in the report?

>> In short, it seems to me that this patch needs a rework, and should be
>> returned with feedback. Other opinions?
>
> Yeah, some more thought needs to be put into design of the general
> reporting interface. Then we also need to pay attention to another
> important aspect of this patch - lazy vacuum instrumentation.

This patch has received a lot of feedback, and it is not in a
committable state, so I marked it as "Returned with feedback" for this
CF.
Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-10 Thread Robert Haas
On Thu, Dec 10, 2015 at 9:49 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> Oh, please, no.  Gosh, this is supposed to be a lightweight facility!
>> Just have a chunk of shared memory and write the data in there.  If
>> you try to feed this through the stats collector you're going to
>> increase the overhead by 100x or more, and there's no benefit.  We've
>> got to do relation stats that way because there's no a priori bound on
>> the number of relations, so we can't just preallocate enough shared
>> memory for all of them.  But there's no similar restriction here: the
>> number of backends IS fixed at startup time.  As long as we limit the
>> amount of progress information that a backend can supply to some fixed
>> length, which IMHO we definitely should, there's no need to add the
>> expense of funneling this through the stats collector.
>
> I agree with this, and I'd further add that if we don't have a
> fixed-length progress state, we've overdesigned the facility entirely.
> People won't be able to make sense of anything that acts much more
> complicated than "0% .. 100% done".  So you need to find a way of
> approximating progress of a given command in terms more or less
> like that, even if it's a pretty crude approximation.

That I don't agree with.  Even for something like VACUUM, it's pretty
hard to approximate overall progress - because, for example, normally
we'll only have 1 index scan per index, but we might have multiple
index scans or none if maintenance_work_mem is too small or if there
aren't any dead tuples after all.  I don't want our progress reporting
facility to end up with this reputation:

https://xkcd.com/612/

This point has already been discussed rather extensively upthread, but
to reiterate, I think it's much better to report slightly more
detailed information and let the user figure out what to do with it.
For example, for a VACUUM, I think we should report something like
this:

1. The number of heap pages scanned thus far.
2. The number of dead tuples found thus far.
3. The number of dead tuples we can store before we run out of
maintenance_work_mem.
4. The number of index pages processed by the current index vac cycle
(or a sentinel value if none is in progress).
5. The number of heap pages for which the "second heap pass" has been completed.

Now, if the user wants to flatten this out to a progress meter, they
can write an SQL expression which does that easily enough, folding the
sizes of the table and its indices and whatever assumptions they want
to make about what will happen down the road.  If we all agree on how
that should be done, it can even ship as a built-in view.  But I
*don't* think we should build those assumptions into the core progress
reporting facility.  For one thing, that would make updating the
progress meter considerably more expensive - you'd have to recompute a
new completion percentage instead of just saying "heap pages processed
went up by one".  For another thing, there are definitely going to be
some people that want the detailed information - and I can practically
guarantee that if we don't make it available, at least one person will
write a tool that tries to reverse-engineer the detailed progress
information from whatever we do report.

Heck, I might do it myself.  If I find a long-running vacuum on a
customer system that doesn't seem to be making progress, knowing that
it's 69% complete and that the completion percentage isn't rising
doesn't help me much.  What I want to know is are we stuck in a heap
vacuum phase, or an index vacuum phase, or a second heap pass.  Are we
making progress so slowly that 69% takes forever to get to 70%, or are
we making absolutely no progress at all?  I think if we don't report a
healthy amount of detail here people will still frequently have to
resort to what I do now, which is ask the customer to install strace
and attach it to the vacuum process.  For many customers, that's not
so easy; and it never inspires any confidence.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-10 Thread Kyotaro HORIGUCHI
Sorry, I misunderstood the meaning of PgStat_*.

At Fri, 11 Dec 2015 09:41:04 +0900, Amit Langote 
 wrote in <566a1ba0.70...@lab.ntt.co.jp>
> > As far as I understand it, the basic reason why this patch exists is
> > to allow a DBA to have a hint of the progress of a VACUUM that may be
> > taking minutes, or say hours, which is something we don't have now. So
> > it seems perfectly fine to me to report this information
> > asynchronously with a bit of lag. Why would we need so much precision
> > in the report?
> 
> Sorry, I didn't mean to overstate this requirement. I agree precise
> real-time reporting of progress info is not such a stringent requirement
> from the patch. The point regarding whether we should storm the collector
> with progress info messages still holds, IMHO.

Taking a few seconds interval between each messages would be
sufficient. I personaly think that gettimeofday() per processing
every buffer (or few buffers) is not so heavy-weight but I
suppose there's not such a consensus here. However,
IsCheckpointOnSchedule does that per writing one buffer.

vacuum_delay_point() seems to be a reasonable point to check the
interval and send stats since it would be designed to be called
with the interval also appropriate for this purpose.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-10 Thread Amit Langote
On 2015/12/11 14:41, Kyotaro HORIGUCHI wrote:
> Sorry, I misunderstood the meaning of PgStat_*.

I should've just said "messages to the stats collector" instead of
"PgStat_Msg's".

> 
> At Fri, 11 Dec 2015 09:41:04 +0900, Amit Langote 
>  wrote
>>> As far as I understand it, the basic reason why this patch exists is
>>> to allow a DBA to have a hint of the progress of a VACUUM that may be
>>> taking minutes, or say hours, which is something we don't have now. So
>>> it seems perfectly fine to me to report this information
>>> asynchronously with a bit of lag. Why would we need so much precision
>>> in the report?
>>
>> Sorry, I didn't mean to overstate this requirement. I agree precise
>> real-time reporting of progress info is not such a stringent requirement
>> from the patch. The point regarding whether we should storm the collector
>> with progress info messages still holds, IMHO.
> 
> Taking a few seconds interval between each messages would be
> sufficient. I personaly think that gettimeofday() per processing
> every buffer (or few buffers) is not so heavy-weight but I
> suppose there's not such a consensus here. However,
> IsCheckpointOnSchedule does that per writing one buffer.
> 
> vacuum_delay_point() seems to be a reasonable point to check the
> interval and send stats since it would be designed to be called
> with the interval also appropriate for this purpose.

Interesting, vacuum_delay_point() may be worth considering.

It seems though that, overall, PgBackendStatus approach may be more suited
for progress tracking. Let's see what the author thinks.

Thanks,
Amit





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-10 Thread Michael Paquier
On Fri, Dec 11, 2015 at 12:59 AM, Robert Haas  wrote:
> On Thu, Dec 10, 2015 at 9:49 AM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> Oh, please, no.  Gosh, this is supposed to be a lightweight facility!
>>> Just have a chunk of shared memory and write the data in there.  If
>>> you try to feed this through the stats collector you're going to
>>> increase the overhead by 100x or more, and there's no benefit.  We've
>>> got to do relation stats that way because there's no a priori bound on
>>> the number of relations, so we can't just preallocate enough shared
>>> memory for all of them.  But there's no similar restriction here: the
>>> number of backends IS fixed at startup time.  As long as we limit the
>>> amount of progress information that a backend can supply to some fixed
>>> length, which IMHO we definitely should, there's no need to add the
>>> expense of funneling this through the stats collector.
>>
>> I agree with this, and I'd further add that if we don't have a
>> fixed-length progress state, we've overdesigned the facility entirely.
>> People won't be able to make sense of anything that acts much more
>> complicated than "0% .. 100% done".  So you need to find a way of
>> approximating progress of a given command in terms more or less
>> like that, even if it's a pretty crude approximation.

Check. My opinion is based on the fact that most of the backends are
not going to use the progress facility at all, and we actually do not
need a high level of precision for VACUUM reports: we could simply
send messages with a certain delay between two messages. And it looks
like a waste to allocate that for all the backends. But I am going to
withdraw here, two committers is by far too much pressure.

> That I don't agree with.  Even for something like VACUUM, it's pretty
> hard to approximate overall progress - because, for example, normally
> we'll only have 1 index scan per index, but we might have multiple
> index scans or none if maintenance_work_mem is too small or if there
> aren't any dead tuples after all.  I don't want our progress reporting
> facility to end up with this reputation:
>
> https://xkcd.com/612/

This brings memories. Who has never faced that...

> This point has already been discussed rather extensively upthread, but
> to reiterate, I think it's much better to report slightly more
> detailed information and let the user figure out what to do with it.
> For example, for a VACUUM, I think we should report something like
> this:
> 1. The number of heap pages scanned thus far.
> 2. The number of dead tuples found thus far.
> 3. The number of dead tuples we can store before we run out of
> maintenance_work_mem.
> 4. The number of index pages processed by the current index vac cycle
> (or a sentinel value if none is in progress).
> 5. The number of heap pages for which the "second heap pass" has been 
> completed
> Now, if the user wants to flatten this out to a progress meter, they
> can write an SQL expression which does that easily enough, folding the
> sizes of the table and its indices and whatever assumptions they want
> to make about what will happen down the road.  If we all agree on how
> that should be done, it can even ship as a built-in view.  But I
> *don't* think we should build those assumptions into the core progress
> reporting facility.  For one thing, that would make updating the
> progress meter considerably more expensive - you'd have to recompute a
> new completion percentage instead of just saying "heap pages processed
> went up by one".

This stuff I agree. Having global counters, and have user compute any
kind of percentage or progress bar is definitely the way to go.

> For another thing, there are definitely going to be
> some people that want the detailed information - and I can practically
> guarantee that if we don't make it available, at least one person will
> write a tool that tries to reverse-engineer the detailed progress
> information from whatever we do report.

OK, so this justifies the fact of having detailed information, but
does it justify the fact of having precise and accurate data? ISTM
that having detailed information and precise information are two
different things. The level of details is defined depending on how
verbose we want the information to be, and the list you are giving
would fulfill this requirement nicely for VACUUM. The level of
precision/accuracy at which this information is provided though
depends at which frequency we want to send this information. For
long-running VACUUM it does not seem necessary to update the fields of
the progress tracker each time a counter needs to be incremented.
CLUSTER has been mentioned as well as a potential target for the
progress facility, but it seems that it enters as well in the category
of things that would need to be reported on a slower frequency pace
than "each-time-a-counter-is-incremented".

My impression is just based 

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-10 Thread Robert Haas
On Thu, Dec 10, 2015 at 6:46 AM, Michael Paquier
 wrote:
> On Thu, Dec 10, 2015 at 7:23 PM, Amit Langote
>  wrote:
>> On 2015/12/10 15:28, Michael Paquier wrote:
>>> - The progress tracking facility adds a whole level of complexity for
>>> very little gain, and IMO this should *not* be part of PgBackendStatus
>>> because in most cases its data finishes wasted. We don't expect
>>> backends to run frequently such progress reports, do we? My opinion on
>>> the matter if that we should define a different collector data for
>>> vacuum, with something like PgStat_StatVacuumEntry, then have on top
>>> of it a couple of routines dedicated at feeding up data with it when
>>> some work is done on a vacuum job.
>>
>> I assume your comment here means we should use stats collector to the
>> track/publish progress info, is that right?
>
> Yep.

Oh, please, no.  Gosh, this is supposed to be a lightweight facility!
Just have a chunk of shared memory and write the data in there.  If
you try to feed this through the stats collector you're going to
increase the overhead by 100x or more, and there's no benefit.  We've
got to do relation stats that way because there's no a priori bound on
the number of relations, so we can't just preallocate enough shared
memory for all of them.  But there's no similar restriction here: the
number of backends IS fixed at startup time.  As long as we limit the
amount of progress information that a backend can supply to some fixed
length, which IMHO we definitely should, there's no need to add the
expense of funneling this through the stats collector.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-10 Thread Tom Lane
Robert Haas  writes:
> Oh, please, no.  Gosh, this is supposed to be a lightweight facility!
> Just have a chunk of shared memory and write the data in there.  If
> you try to feed this through the stats collector you're going to
> increase the overhead by 100x or more, and there's no benefit.  We've
> got to do relation stats that way because there's no a priori bound on
> the number of relations, so we can't just preallocate enough shared
> memory for all of them.  But there's no similar restriction here: the
> number of backends IS fixed at startup time.  As long as we limit the
> amount of progress information that a backend can supply to some fixed
> length, which IMHO we definitely should, there's no need to add the
> expense of funneling this through the stats collector.

I agree with this, and I'd further add that if we don't have a
fixed-length progress state, we've overdesigned the facility entirely.
People won't be able to make sense of anything that acts much more
complicated than "0% .. 100% done".  So you need to find a way of
approximating progress of a given command in terms more or less
like that, even if it's a pretty crude approximation.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-09 Thread Michael Paquier
On Thu, Dec 10, 2015 at 4:40 AM, Robert Haas  wrote:
> On Mon, Dec 7, 2015 at 2:41 AM, Amit Langote  
> wrote:
>> Hm, I guess progress messages would not change that much over the course
>> of a long-running command. How about we pass only the array(s) that we
>> change and NULL for others:
>>
>> With the following prototype for pgstat_report_progress:
>>
>> void pgstat_report_progress(uint32 *uint32_param, int num_uint32_param,
>>bool *message_param[], int num_message_param);
>>
>> If we just wanted to change, say scanned_heap_pages, then we report that
>> using:
>>
>> uint32_param[1] = scanned_heap_pages;
>> pgstat_report_progress(uint32_param, 3, NULL, 0);
>>
>> Also, pgstat_report_progress() should check each of its parameters for
>> NULL before iterating over to copy. So in most reports over the course of
>> a command, the message_param would be NULL and hence not copied.
>
> It's going to be *really* important that this facility provides a
> lightweight way of updating progress, so I think this whole API is
> badly designed.  VACUUM, for example, is going to want a way to update
> the individual counter for the number of pages it's scanned after
> every page.  It should not have to pass all of the other information
> that is part of a complete report.  It should just be able to say
> pgstat_report_progress_update_counter(1, pages_scanned) or something
> of this sort.  Don't marshal all of the data and then make
> pgstat_report_progress figure out what's changed.  Provide a series of
> narrow APIs where the caller tells you exactly what they want to
> update, and you update only that.  It's fine to have a
> pgstat_report_progress() function to update everything at once, for
> the use at the beginning of a command, but the incremental updates
> within the command should do something lighter-weight.

[first time looking really at the patch and catching up with this thread]

Agreed. As far as I can guess from the code, those should be as
followed to bloat pgstat message queue a minimum:

+   pgstat_report_activity_flag(ACTIVITY_IS_VACUUM);
/*
 * Loop to process each selected relation.
 */
Defining a new routine for this purpose is a bit surprising. Cannot we
just use pgstat_report_activity with a new BackendState STATE_INVACUUM
or similar if we pursue the progress tracking approach?

A couple of comments:
- The relation OID should be reported and not its name. In case of a
relation rename that would not be cool for tracking, and most users
are surely going to join with other system tables using it.
- The progress tracking facility adds a whole level of complexity for
very little gain, and IMO this should *not* be part of PgBackendStatus
because in most cases its data finishes wasted. We don't expect
backends to run frequently such progress reports, do we? My opinion on
the matter if that we should define a different collector data for
vacuum, with something like PgStat_StatVacuumEntry, then have on top
of it a couple of routines dedicated at feeding up data with it when
some work is done on a vacuum job.

In short, it seems to me that this patch needs a rework, and should be
returned with feedback. Other opinions?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-09 Thread Amit Langote
On 2015/12/10 4:40, Robert Haas wrote:
> It's going to be *really* important that this facility provides a
> lightweight way of updating progress, so I think this whole API is
> badly designed.  VACUUM, for example, is going to want a way to update
> the individual counter for the number of pages it's scanned after
> every page.  It should not have to pass all of the other information
> that is part of a complete report.  It should just be able to say
> pgstat_report_progress_update_counter(1, pages_scanned) or something
> of this sort.  Don't marshal all of the data and then make
> pgstat_report_progress figure out what's changed.  Provide a series of
> narrow APIs where the caller tells you exactly what they want to
> update, and you update only that.  It's fine to have a
> pgstat_report_progress() function to update everything at once, for
> the use at the beginning of a command, but the incremental updates
> within the command should do something lighter-weight.

How about something like the following:

/*
 * index: in the array of uint32 counters in the beentry
 * counter: new value of the (index+1)th counter
 */
void pgstat_report_progress_update_counter(int index, uint32 counter);

/*
 * msg: new value of (index+1)the message (with trailing null byte)
 */
void pgstat_report_progress_update_message(int index, const char *msg);

Actually updating a counter or message would look like:

pgstat_increment_changecount_before(beentry);
// update the counter or message at index in beentry->st_progress_*
pgstat_increment_changecount_after(beentry);

Other interface functions which are called at the beginning:

void pgstat_report_progress_set_command(int commandId);
void pgstat_report_progress_set_command_target(const char *target_name);
or
void pgstat_report_progress_set_command_target(Oid target_oid);

And then a SQL-level,
void pgstat_reset_local_progress();

Which simply sets beentry->st_command to some invalid value which signals
a progress view function to ignore this backend.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-09 Thread Kyotaro HORIGUCHI
Hello,

At Thu, 10 Dec 2015 15:28:14 +0900, Michael Paquier  
wrote in 

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-09 Thread Robert Haas
On Mon, Dec 7, 2015 at 2:41 AM, Amit Langote
 wrote:
> On 2015/12/03 19:05, Kyotaro HORIGUCHI wrote:
>> At Thu, 3 Dec 2015 16:24:32 +0900, Amit Langote 
>>  wrote
>>> By the way, there are some non-st_progress_* fields that I was talking
>>> about in my previous message. For example, st_activity_flag (which I have
>>> suggested to rename to st_command instead). It needs to be set once at the
>>> beginning of the command processing and need not be touched again. I think
>>> it may be a better idea to do the same for table name or OID. It also
>>> won't change over the duration of the command execution. So, we could set
>>> it once at the beginning where that becomes known. Currently in the patch,
>>> it's reported in st_progress_message[0] by every pgstat_report_progress()
>>> invocation. So, the table name will be strcpy()'d to shared memory for
>>> every scanned block that's reported.
>>
>> If we don't have dedicate reporting functions for each phase
>> (that is, static data phase and progress phase), the one function
>> pgstat_report_progress does that by having some instruction from
>> the caller and it would be a bitfield.
>>
>> Reading the flags for making decision whether every field to copy
>> or not and branching by that seems too much for int32 (or maybe
>> 64?) fields. Alhtough it would be in effective when we have
>> progress_message fields, I don't think it is a good idea without
>> having progress_message.
>>
>>> pgstat_report_progress(uint32 *param1,  uint16 param1_bitmap,
>>>char param2[][..], uint16 param2_bitmap)
>>> {
>>> ...
>>>   for(i = 0; i < 16 ; i++)
>>>   {
>>>   if (param1_bitmap & (1 << i))
>>>beentry->st_progress_param[i] = param1[i];
>>>   if (param2_bitmap & (1 << i))
>>>strcpy(beentry->..., param2[i]);
>>>   }
>>
>> Thoughts?
>
> Hm, I guess progress messages would not change that much over the course
> of a long-running command. How about we pass only the array(s) that we
> change and NULL for others:
>
> With the following prototype for pgstat_report_progress:
>
> void pgstat_report_progress(uint32 *uint32_param, int num_uint32_param,
>bool *message_param[], int num_message_param);
>
> If we just wanted to change, say scanned_heap_pages, then we report that
> using:
>
> uint32_param[1] = scanned_heap_pages;
> pgstat_report_progress(uint32_param, 3, NULL, 0);
>
> Also, pgstat_report_progress() should check each of its parameters for
> NULL before iterating over to copy. So in most reports over the course of
> a command, the message_param would be NULL and hence not copied.

It's going to be *really* important that this facility provides a
lightweight way of updating progress, so I think this whole API is
badly designed.  VACUUM, for example, is going to want a way to update
the individual counter for the number of pages it's scanned after
every page.  It should not have to pass all of the other information
that is part of a complete report.  It should just be able to say
pgstat_report_progress_update_counter(1, pages_scanned) or something
of this sort.  Don't marshal all of the data and then make
pgstat_report_progress figure out what's changed.  Provide a series of
narrow APIs where the caller tells you exactly what they want to
update, and you update only that.  It's fine to have a
pgstat_report_progress() function to update everything at once, for
the use at the beginning of a command, but the incremental updates
within the command should do something lighter-weight.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-06 Thread Amit Langote

Hi,

On 2015/12/03 19:05, Kyotaro HORIGUCHI wrote:
> At Thu, 3 Dec 2015 16:24:32 +0900, Amit Langote 
>  wrote
>> By the way, there are some non-st_progress_* fields that I was talking
>> about in my previous message. For example, st_activity_flag (which I have
>> suggested to rename to st_command instead). It needs to be set once at the
>> beginning of the command processing and need not be touched again. I think
>> it may be a better idea to do the same for table name or OID. It also
>> won't change over the duration of the command execution. So, we could set
>> it once at the beginning where that becomes known. Currently in the patch,
>> it's reported in st_progress_message[0] by every pgstat_report_progress()
>> invocation. So, the table name will be strcpy()'d to shared memory for
>> every scanned block that's reported.
> 
> If we don't have dedicate reporting functions for each phase
> (that is, static data phase and progress phase), the one function
> pgstat_report_progress does that by having some instruction from
> the caller and it would be a bitfield.
> 
> Reading the flags for making decision whether every field to copy
> or not and branching by that seems too much for int32 (or maybe
> 64?) fields. Alhtough it would be in effective when we have
> progress_message fields, I don't think it is a good idea without
> having progress_message.
> 
>> pgstat_report_progress(uint32 *param1,  uint16 param1_bitmap,
>>char param2[][..], uint16 param2_bitmap)
>> {
>> ...
>>   for(i = 0; i < 16 ; i++)
>>   {
>>   if (param1_bitmap & (1 << i))
>>beentry->st_progress_param[i] = param1[i];
>>   if (param2_bitmap & (1 << i))
>>strcpy(beentry->..., param2[i]);
>>   }
> 
> Thoughts?

Hm, I guess progress messages would not change that much over the course
of a long-running command. How about we pass only the array(s) that we
change and NULL for others:

With the following prototype for pgstat_report_progress:

void pgstat_report_progress(uint32 *uint32_param, int num_uint32_param,
   bool *message_param[], int num_message_param);

If we just wanted to change, say scanned_heap_pages, then we report that
using:

uint32_param[1] = scanned_heap_pages;
pgstat_report_progress(uint32_param, 3, NULL, 0);

Also, pgstat_report_progress() should check each of its parameters for
NULL before iterating over to copy. So in most reports over the course of
a command, the message_param would be NULL and hence not copied.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-03 Thread Kyotaro HORIGUCHI
Hello,

At Thu, 3 Dec 2015 16:24:32 +0900, Amit Langote  
wrote in <565fee30.8010...@lab.ntt.co.jp>
> > Agreed. The patch already separates integer values and texts.
> > And re-reviewing the patch, there's no fields necessary to be
> > passed as string.
> > 
> > total_heap_pages, scanned_heap_pages, total_index_pages,
> > scanned_index_pages, vacrelstats->num_index_scans are currently
> > in int32.
> > 
> > Phase can be in integer, and schema_name and relname can be
> > represented by one OID, uint32.
> 
> AIUI, st_progress_message (strings) are to be used to share certain
> messages as progress information. I think the latest vacuum-progress patch
> uses it to report which phase lazy_scan_heap() is in, for example,
> "Scanning heap" for phase 1 of its processing and "Vacuuming index and
> heap" for phase 2. Those values are shown to the user in a text column
> named "phase" of the pg_stat_vacuum_progress view. That said, reporting
> phase as an integer value may also be worth a consideration. Some other
> command might choose to do that.
> 
> > Finally, *NO* text field is needed at least this usage. So
> > progress_message is totally useless regardless of other usages
> > unknown to us.
> 
> I think it may be okay at this point to add just those st_progress_*
> fields which are required by lazy vacuum progress reporting. If someone
> comes up with instrumentation ideas for some other command, they could
> post patches to add more st_progress_* fields and to implement
> instrumentation and a progress view for that command. This is essentially
> what Robert said in [1] in relation to my suggestion of taking out
> st_progress_param_float from this patch.

Yes. After taking a detour, though.

> By the way, there are some non-st_progress_* fields that I was talking
> about in my previous message. For example, st_activity_flag (which I have
> suggested to rename to st_command instead). It needs to be set once at the
> beginning of the command processing and need not be touched again. I think
> it may be a better idea to do the same for table name or OID. It also
> won't change over the duration of the command execution. So, we could set
> it once at the beginning where that becomes known. Currently in the patch,
> it's reported in st_progress_message[0] by every pgstat_report_progress()
> invocation. So, the table name will be strcpy()'d to shared memory for
> every scanned block that's reported.

If we don't have dedicate reporting functions for each phase
(that is, static data phase and progress phase), the one function
pgstat_report_progress does that by having some instruction from
the caller and it would be a bitfield.

Reading the flags for making decision whether every field to copy
or not and branching by that seems too much for int32 (or maybe
64?) fields. Alhtough it would be in effective when we have
progress_message fields, I don't think it is a good idea without
having progress_message.

> pgstat_report_progress(uint32 *param1,  uint16 param1_bitmap,
>char param2[][..], uint16 param2_bitmap)
> {
> ...
>   for(i = 0; i < 16 ; i++)
>   {
>   if (param1_bitmap & (1 << i))
>beentry->st_progress_param[i] = param1[i];
>   if (param2_bitmap & (1 << i))
>strcpy(beentry->..., param2[i]);
>   }

Thoughts?


> Thanks,
> Amit
> 
> [1]
> http://www.postgresql.org/message-id/CA+TgmoYdZk9nPDtS+_kOt4S6fDRQLW+1jnJBmG0pkRT4ynxO=q...@mail.gmail.com

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   3   >