subject:"monitoring CREATE INDEX \[CONCURRENTLY\]"

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-04-07 Thread Peter Eisentraut

On 2019-04-06 06:40, Alvaro Herrera wrote:
> On 2019-Apr-05, Peter Eisentraut wrote:
> 
>> I've reworded the phases a bit.  There was a bit of a mixup of waiting
>> for snapshots and waiting for lockers.  Perhaps not so important from a
>> user's perspective, but at least now it's more consistent with the
>> source code comments.
> 
> No disagreement with that.  Looks reasonable.
> 
> I didn't test the patch, but it seems OK in a quick once-over.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-04-05 Thread Alvaro Herrera

On 2019-Apr-05, Peter Eisentraut wrote:

> I've reworded the phases a bit.  There was a bit of a mixup of waiting
> for snapshots and waiting for lockers.  Perhaps not so important from a
> user's perspective, but at least now it's more consistent with the
> source code comments.

No disagreement with that.  Looks reasonable.

I didn't test the patch, but it seems OK in a quick once-over.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-04-05 Thread Peter Eisentraut

On 2019-04-05 17:01, Alvaro Herrera wrote:
> Users are going to wonder why the other phases don't appear to complete
> for a long time :-)  Keep in mind that the "waiting" phases are very
> confusing to users.  I suggest we just define additional phase numbers
> for those phases, then switch the "false" argument to
> WaitForLockersMultiple to "true", and it should work :-)  Doc-wise, list
> all the phases in the same docbook table, indicate that REINDEX is also
> covered, and document in an easier-to-follow fashion which phases each
> command goes through.

Done in the attached patch.

I've reworded the phases a bit.  There was a bit of a mixup of waiting
for snapshots and waiting for lockers.  Perhaps not so important from a
user's perspective, but at least now it's more consistent with the
source code comments.

> Yeah, I think that's simple enough -- the CLUSTER one already does that,
> I think.

Added that.

> Another thing for REINDEX TABLE is that we should add a count
> of indexes to process, and how many are done.

Reasonable, but maybe a bit too much for the last moment.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 22c0254b76be05a2235ed84268d189fa6785e844 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 5 Apr 2019 22:27:14 +0200
Subject: [PATCH v2] Report progress of REINDEX operations

This uses the same infrastructure that the CREATE INDEX progress
reporting uses.
---
 doc/src/sgml/monitoring.sgml | 52 +---
 src/backend/catalog/index.c  | 10 ++
 src/backend/catalog/system_views.sql | 11 +++---
 src/backend/commands/indexcmds.c | 39 ++---
 src/include/commands/progress.h  |  3 ++
 src/test/regress/expected/rules.out  | 11 +++---
 6 files changed, 102 insertions(+), 24 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b946e13fdc..4eb43f2de9 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -346,7 +346,7 @@ Dynamic Statistics Views
 
  
   
pg_stat_progress_create_indexpg_stat_progress_create_index
-  One row for each backend running CREATE INDEX, 
showing
+  One row for each backend running CREATE INDEX 
or REINDEX, showing
   current progress.
   See .
  
@@ -3477,7 +3477,7 @@ Progress Reporting
   CREATE INDEX Progress Reporting
 
   
-   Whenever CREATE INDEX is running, the
+   Whenever CREATE INDEX or REINDEX is 
running, the
pg_stat_progress_create_index view will contain
one row for each backend that is currently creating indexes.  The tables
below describe the information that will be reported and provide information
@@ -3516,6 +3516,12 @@ 
pg_stat_progress_create_index View
   oid
   OID of the table on which the index is being created.
  
+ 
+  index_relid
+  oid
+  OID of the index being created or reindexed.  During a
+  non-concurrent CREATE INDEX, this is 0.
+ 
  
   phase
   text
@@ -3605,15 +3611,15 @@ CREATE INDEX phases
  
   initializing
   
-   CREATE INDEX is preparing to create the index.  This
+   CREATE INDEX or REINDEX is 
preparing to create the index.  This
phase is expected to be very brief.
   
  
  
-  waiting for old snapshots
+  waiting for writers before build
   
-   CREATE INDEX CONCURRENTLY is waiting for transactions
-   that can potentially see the table to release their snapshots.
+   CREATE INDEX CONCURRENTLY or REINDEX 
CONCURRENTLY is waiting for transactions
+   with write locks that can potentially see the table to finish.
This phase is skipped when not in concurrent mode.
Columns lockers_total, 
lockers_done
and current_locker_pid contain the progress 
@@ -3632,10 +3638,10 @@ CREATE INDEX phases
   
  
  
-  waiting for writer snapshots
+  waiting for writers before validation
   
-   CREATE INDEX CONCURRENTLY is waiting for transactions
-   that can potentially write into the table to release their snapshots.
+   CREATE INDEX CONCURRENTLY or REINDEX 
CONCURRENTLY is waiting for transactions
+   with write locks that can potentially write into the table to finish.
This phase is skipped when not in concurrent mode.
Columns lockers_total, 
lockers_done
and current_locker_pid contain the progress 
@@ -3670,9 +3676,9 @@ CREATE INDEX phases
   
  
  
-  waiting for reader snapshots
+  waiting for old snapshots
   
-   CREATE INDEX CONCURRENTLY is waiting for transactions
+   CREATE INDEX CONCURRENTLY or REINDEX 
CONCURRENTLY is waiting for transactions
that can potentially see the table to release their snapshots.  This
phase is skipped when not in concurrent mode.
Columns lockers_total, 
lockers_done
@@ -3680,6 +3686,28

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-04-05 Thread Alvaro Herrera

On 2019-Apr-05, Peter Eisentraut wrote:

> It seems we can easily extend this to also monitor REINDEX
> [CONCURRENTLY].  Attached is a quick patch.

That's much easier than I was expecting.  I think we should endeavor to
get this done for pg12.

> For the concurrently part, we currently don't have any phases defined
> for the index swap and drop, but maybe we can just skip that initially.
> What happens if we don't have those?

Users are going to wonder why the other phases don't appear to complete
for a long time :-)  Keep in mind that the "waiting" phases are very
confusing to users.  I suggest we just define additional phase numbers
for those phases, then switch the "false" argument to
WaitForLockersMultiple to "true", and it should work :-)  Doc-wise, list
all the phases in the same docbook table, indicate that REINDEX is also
covered, and document in an easier-to-follow fashion which phases each
command goes through.

> It might be nice to have a column in the view not only for the table OID
> but also the index OID.  That is obviously not so useful for CREATE
> INDEX but more useful for REINDEX.  I haven't looked into how adding
> that would work.

Yeah, I think that's simple enough -- the CLUSTER one already does that,
I think.  Another thing for REINDEX TABLE is that we should add a count
of indexes to process, and how many are done.

I was wondering about reporting the command being run.  In the progress_cluster
view we have a "command" column, which says either CLUSTER or VACUUM FULL.
I didn't add one for CREATE INDEX vs. CONCURRENTLY because it seemed to
me that joining to pg_stat_activity ought to be sufficient.  If we agree
with that reasoning, then we should remove the column from the CLUSTER
view, I think.  If not, we should add one to the create_index view.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-04-05 Thread Peter Eisentraut

It seems we can easily extend this to also monitor REINDEX
[CONCURRENTLY].  Attached is a quick patch.

For the concurrently part, we currently don't have any phases defined
for the index swap and drop, but maybe we can just skip that initially.
What happens if we don't have those?

It might be nice to have a column in the view not only for the table OID
but also the index OID.  That is obviously not so useful for CREATE
INDEX but more useful for REINDEX.  I haven't looked into how adding
that would work.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From cae47efe485c3c6115c1970b9c922ded1353d270 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 5 Apr 2019 16:41:27 +0200
Subject: [PATCH] Report progress of REINDEX operations

This uses the same infrastructure that the CREATE INDEX progress
reporting uses.
---
 src/backend/catalog/index.c  |  8 
 src/backend/commands/indexcmds.c | 19 +--
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 2ed7fdb021..686b3c8260 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3286,12 +3286,18 @@ reindex_index(Oid indexId, bool skip_constraint_checks, 
char persistence,
heapId = IndexGetRelation(indexId, false);
heapRelation = table_open(heapId, ShareLock);
 
+   pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX,
+ heapId);
+
/*
 * Open the target index relation and get an exclusive lock on it, to
 * ensure that no one else is touching this particular index.
 */
iRel = index_open(indexId, AccessExclusiveLock);
 
+   pgstat_progress_update_param(PROGRESS_CREATEIDX_ACCESS_METHOD_OID,
+
iRel->rd_rel->relam);
+
/*
 * The case of reindexing partitioned tables and indexes is handled
 * differently by upper layers, so this case shouldn't arise.
@@ -3442,6 +3448,8 @@ reindex_index(Oid indexId, bool skip_constraint_checks, 
char persistence,
 errdetail_internal("%s",

pg_rusage_show(;
 
+   pgstat_progress_end_command();
+
/* Close rels, but keep locks */
index_close(iRel, NoLock);
table_close(heapRelation, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 348d543297..7efb4c9e15 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -2873,6 +2873,11 @@ ReindexRelationConcurrently(Oid relationOid, int options)
heapRel = table_open(indexRel->rd_index->indrelid,
 
ShareUpdateExclusiveLock);
 
+   pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX,
+ 
RelationGetRelid(heapRel));
+   
pgstat_progress_update_param(PROGRESS_CREATEIDX_ACCESS_METHOD_OID,
+
indexRel->rd_rel->relam);
+
/* Choose a temporary relation name for the new index */
concurrentName = ChooseRelationName(get_rel_name(indexId),

NULL,
@@ -2967,7 +2972,9 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 * DefineIndex() for more details.
 */
 
-   WaitForLockersMultiple(lockTags, ShareLock, false);
+   pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+
PROGRESS_CREATEIDX_PHASE_WAIT_1);
+   WaitForLockersMultiple(lockTags, ShareLock, true);
CommitTransactionCommand();
 
forboth(lc, indexIds, lc2, newIndexIds)
@@ -3009,7 +3016,9 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 * for more details.
 */
 
-   WaitForLockersMultiple(lockTags, ShareLock, false);
+   pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+
PROGRESS_CREATEIDX_PHASE_WAIT_2);
+   WaitForLockersMultiple(lockTags, ShareLock, true);
CommitTransactionCommand();
 
foreach(lc, newIndexIds)
@@ -3057,6 +3066,8 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 * before the reference snap was taken, we have to wait out any
 * transactions that might have older snapshots.
 */
+   pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-04-01 Thread Alvaro Herrera

I did this (I should stop c'ing this silly little setup code sometime):

create table t (a int) partition by hash (a);
create table t1 partition of t for values with (modulus 10, remainder 0);
create table t2 partition of t for values with (modulus 10, remainder 1);
create table t3 partition of t for values with (modulus 10, remainder 2);
create table t4 partition of t for values with (modulus 10, remainder 3);
create table t5 partition of t for values with (modulus 10, remainder 4);
create table t6 partition of t for values with (modulus 10, remainder 5);
create table t7 partition of t for values with (modulus 10, remainder 6);
create table t8 partition of t for values with (modulus 10, remainder 7);
create table t9 partition of t for values with (modulus 10, remainder 8);
create table t10 partition of t for values with (modulus 10, remainder 9);
insert into t select * from generate_series(1, 100 * 1000 * 1000);

Here's a complete report for CIC on a partition (since partitioned tables don't
support CIC anyway):

select relid::regclass, phase,
   format('lockers: %s/%s (%s)', lockers_done, lockers_total, 
current_locker_pid) as lockers,
   format('blocks: %s/%s', blocks_done, blocks_total) as blocks,
   format('tuples: %s/%s', tuples_done, tuples_total) as tuples,
   format('partitions: %s/%s', partitions_done, partitions_total) as 
partitions
from pg_stat_progress_create_index
\watch 0,1

   lun 01 abr 2019 19:02:31 -03 (cada 0,1s)

 relid | phase  | lockers  |  blocks  | 
  tuples|   partitions
---++--+--+-+-
 t2| building index: scanning table | lockers: 0/0 (0) | blocks: 86/44254 | 
tuples: 0/0 | partitions: 0/0
 t2| building index: scanning table | lockers: 0/0 (0) | blocks: 4639/44254 
| tuples: 0/0 | partitions: 0/0
 t2| building index: scanning table | lockers: 0/0 (0) | blocks: 
10890/44254 | tuples: 0/0 | partitions: 0/0
 t2| building index: scanning table | lockers: 0/0 (0) | blocks: 
17841/44254 | tuples: 0/0 | partitions: 0/0
 t2| building index: scanning table | lockers: 0/0 (0) | blocks: 
22899/44254 | tuples: 0/0 | partitions: 0/0
 t2| building index: scanning table | lockers: 0/0 (0) | blocks: 
29668/44254 | tuples: 0/0 | partitions: 0/0
 t2| building index: scanning table | lockers: 0/0 (0) | blocks: 
35531/44254 | tuples: 0/0 | partitions: 0/0
 t2| building index: scanning table | lockers: 0/0 (0) | blocks: 
41375/44254 | tuples: 0/0 | partitions: 0/0
 t2| building index: scanning table | lockers: 0/0 (0) | blocks: 
44254/44254 | tuples: 0/0 | partitions: 0/0
 t2| building index: scanning table | lockers: 0/0 (0) | blocks: 
44254/44254 | tuples: 0/0 | partitions: 0/0
 t2| building index: scanning table | lockers: 0/0 (0) | blocks: 
44254/44254 | tuples: 0/0 | partitions: 0/0
 t2| building index: loading tuples in tree | lockers: 0/0 (0) | blocks: 
0/0 | tuples: 291737/10001366 | partitions: 0/0
 t2| building index: loading tuples in tree | lockers: 0/0 (0) | blocks: 
0/0 | tuples: 1652429/10001366 | partitions: 0/0
 t2| building index: loading tuples in tree | lockers: 0/0 (0) | blocks: 
0/0 | tuples: 2984365/10001366 | partitions: 0/0
 t2| building index: loading tuples in tree | lockers: 0/0 (0) | blocks: 
0/0 | tuples: 4139066/10001366 | partitions: 0/0
 t2| building index: loading tuples in tree | lockers: 0/0 (0) | blocks: 
0/0 | tuples: 5463784/10001366 | partitions: 0/0
 t2| building index: loading tuples in tree | lockers: 0/0 (0) | blocks: 
0/0 | tuples: 6699498/10001366 | partitions: 0/0
 t2| building index: loading tuples in tree | lockers: 0/0 (0) | blocks: 
0/0 | tuples: 7947694/10001366 | partitions: 0/0
 t2| building index: loading tuples in tree | lockers: 0/0 (0) | blocks: 
0/0 | tuples: 933/10001366 | partitions: 0/0
 t2| index validation: scan index | lockers: 0/0 (0) | blocks: 2542/27426 | 
tuples: 0/0 | partitions: 0/0
 t2| index validation: scan index | lockers: 0/0 (0) | blocks: 7667/27426 | 
tuples: 0/0 | partitions: 0/0
 t2| index validation: scan index | lockers: 0/0 (0) | blocks: 15334/27426 
| tuples: 0/0 | partitions: 0/0
 t2| index validation: scan index | lockers: 0/0 (0) | blocks: 23001/27426 
| tuples: 0/0 | partitions: 0/0
 t2| index validation: sort index scan results | lockers: 0/0 (0) | blocks: 
0/0 | tuples: 0/0 | partitions: 0/0
 t2| index validation: scan heap | lockers: 0/0 (0) | blocks: 2586/44254 | 
tuples: 0/0 | partitions: 0/0
 t2| index validation: scan heap | lockers: 0/0 (0) | blocks: 8180/44254 | 
tuples: 0/0 | partitions: 0/0
 t2| index validation: scan heap | lockers: 0/0 (0) | blocks: 13807/44254 | 
tuples: 0/0 | partitions: 0/0
 t2| index validation: scan heap | lockers: 0/0 (0) | blocks: 19365/44254 | 
tuples: 0/0 | partitions:

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-04-01 Thread Alvaro Herrera

On 2019-Apr-02, Rahila Syed wrote:

> 1.  FWIW, below results for CIC show that blocks_done does not become equal
> to blocks_total at the end of the phase or it processes last 800 blocks so
> fast that
> the update is not visible in less than 1 secs interval.

Yeah, I noticed this too and decided it's not fixable, nor it's
desirable to spend a lot of effort in getting it perfectly accurate -- I
mean, we could introduce locking or sleeping to get the results we want,
but do we really want to make the index building process slower just to
report those block numbers.

Anyway, I think this effect is caused by parallel btree building: those
final blocks are scanned by a worker process, and the leader didn't get
the latest block number scanned.  If you set
max_parallel_maintenance_workers to 0, the effect disappears.

(I used \watch 0.01 to see even faster progress updates; even in that
case the final batch of block numbers is not seen in the updates.  The
btree build code is stupidly fast.)

> 2. However in case of partitioned tables, the following difference in
> blocks_done versus blocks_total at the end of phase is notably high for the
> first partition . Subsequent partitions show negligible difference.
> Partition 1:
> Mon Mar 25 14:27:57 IST 2019
>   pid  | datid | datname  | relid |   phase|
> lockers_total | lockers_done | current_locker_pid | blocks_total |
> blocks_done | tuples_total | tuples_done | partitions_total |
> partitions_done
> ---+---+--+---++---+--++--+-+--+-+--+-
>  10630 | 13533 | postgres | 16394 | building index: table scan
> | 0 |0 |  0 |   381342 |
> 221233 |0 |   0 |3 |   0
> (1 row)

Hmm, in my tests with partitioned tables, I never noticed such a large
discrepancy.  I'm going to have another look.  800 blocks scanned by
workers I can believe, but 16 sounds a bit too much.

> 3. Sorry for nitpicking, I think following phase name can be made more
> consistent with the others.
> The non-am specific phase for scanning a table is named as scan heap while
> am-specific one is named as table scan.
> Can we use heap for am-specific one as well since heap is used elsewhere in
> progress reporting too?

Hmm, I'd rather go the other way and use "table" everywhere rather than
heap, since we've been getting a lot of stuff done for table AMs.

> 4. -   scan = table_beginscan_parallel(btspool->heap,
> ParallelTableScanFromBTShared(btshared));
> +   scan = table_beginscan_parallel(btspool->heap,
> +
> ParallelTableScanFromBTShared(btshared));
> 
> Is this change required?

Yes, for my OCD.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-04-01 Thread Rahila Syed

Hi,

On Mon, 1 Apr 2019 at 21:40, Alvaro Herrera 
wrote:

> Hi Rahila, thanks for reviewing.
>
> On 2019-Mar-25, Rahila Syed wrote:
>
> > Please see few comments below:
> >
> > 1. Makecheck fails currently as view definition of expected rules.out
> does
> > not reflect latest changes in progress metrics numbering.
>
> Ouch ... fixed.
>
> > 2. +  
> > I think there is a typo here 's/partitioned/partitioned table/'
>
> Ah, so there is.  Fixed.
>
> > 3.
> > I think parallel scan equivalent bpscan->phs_nblocks along with
> > hscan->rs_nblocks should be used similar to startblock computation above.
>
> Hmm, yeah.  I think the way things are setup currently it doesn't matter
> much, but it seems fragile to rely on that.
>
>
Thank you for incorporating the review comments.


> I also moved the reporting of total blocks to scan in the initial table
> scan so that it occurs wholly in heapam_index_build_range_scan; I had
> originally put that code in _bt_spools_heapscan, but that was a
> modularity violation I think.  (It didn't make a practical difference,
> but it made no sense to have the two cases report the number in wildly
> different locations.)  Also added a final nblocks metric update after
> the scan is done.
>
> I think this patch is done.
>

I tested the v8 patch by running plain CREATE INDEX, CIC, and for
partitioned tables
and the results are as expected.  Please see few observations below.

1.  FWIW, below results for CIC show that blocks_done does not become equal
to blocks_total at the end of the phase or it processes last 800 blocks so
fast that
the update is not visible in less than 1 secs interval.

*Mon Mar 25 11:06:31 IST 2019*
  pid  | datid | datname  | relid |   phase|
lockers_total | lockers_done | current_locker_pid | blocks_total |
blocks_done | tuples_total | tuples_done | partitions_total |
partitions_done
---+---+--+---++---+--++--+-+--+-+--+-
 10630 | 13533 | postgres | 16384 | building index: table scan
| 0 |0 |  0 |  1293366 |
1292535 |0 |   0 |0 |   0
(1 row)

*Mon Mar 25 11:06:31 IST 2019*
  pid  | datid | datname  | relid |  phase
| lockers_total | lockers_done | current_locker_pid | blocks_total |
blocks_done | tuples_total | tuples_done | partitions_total |
partitions_done
---+---+--+---+-+---+--++--+-+--+-+--+-
 10630 | 13533 | postgres | 16384 | building index: sorting tuples, spool 1
| 0 |0 |  0 |0
|   0 |2 |   0 |0
|   0
(1 row)

2. However in case of partitioned tables, the following difference in
blocks_done versus blocks_total at the end of phase is notably high for the
first partition . Subsequent partitions show negligible difference.
Partition 1:
Mon Mar 25 14:27:57 IST 2019
  pid  | datid | datname  | relid |   phase|
lockers_total | lockers_done | current_locker_pid | blocks_total |
blocks_done | tuples_total | tuples_done | partitions_total |
partitions_done
---+---+--+---++---+--++--+-+--+-+--+-
 10630 | 13533 | postgres | 16394 | building index: table scan
| 0 |0 |  0 |   381342 |
221233 |0 |   0 |3 |   0
(1 row)

Mon Mar 25 14:27:57 IST 2019
  pid  | datid | datname  | relid |  phase
| lockers_total | lockers_done | current_locker_pid | blocks_total |
blocks_done | tuples_total | tuples_done | partitions_total |
partitions_done
---+---+--+---+-+---+--++--+-+--+-+--+-
 10630 | 13533 | postgres | 16394 | building index: sorting tuples, spool 1
| 0 |0 |  0 |0
|   0 | 4999 |   0 |3
|   0

The partitions are equal in size and the other two partitions have
blocks_done and blocks_total to be approx. 221233. The blocks_total for
partition 1 is reported higher.

3. Sorry for nitpicking, I think following phase name can be made more
consistent with the others.
The non-am specific phase for scanning a table is named as scan heap while
am-specific one is named as table scan.
Can we use heap for am-specific one

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-04-01 Thread Alvaro Herrera

Hi Rahila, thanks for reviewing.

On 2019-Mar-25, Rahila Syed wrote:

> Please see few comments below:
> 
> 1. Makecheck fails currently as view definition of expected rules.out does
> not reflect latest changes in progress metrics numbering.

Ouch ... fixed.

> 2. +  
> I think there is a typo here 's/partitioned/partitioned table/'

Ah, so there is.  Fixed.

> 3.
> I think parallel scan equivalent bpscan->phs_nblocks along with
> hscan->rs_nblocks should be used similar to startblock computation above.

Hmm, yeah.  I think the way things are setup currently it doesn't matter
much, but it seems fragile to rely on that.

I also moved the reporting of total blocks to scan in the initial table
scan so that it occurs wholly in heapam_index_build_range_scan; I had
originally put that code in _bt_spools_heapscan, but that was a
modularity violation I think.  (It didn't make a practical difference,
but it made no sense to have the two cases report the number in wildly
different locations.)  Also added a final nblocks metric update after
the scan is done.

I think this patch is done.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From d916a5daf8059fcd1ab122fba6641dac747837a8 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 25 Mar 2019 13:29:34 -0300
Subject: [PATCH v9 1/2] Report progress of CREATE INDEX operations

---
 contrib/amcheck/verify_nbtree.c  |   2 +-
 contrib/bloom/blinsert.c |   2 +-
 contrib/bloom/blutils.c  |   1 +
 doc/src/sgml/indexam.sgml|  13 ++
 doc/src/sgml/monitoring.sgml | 224 ++-
 src/backend/access/brin/brin.c   |   5 +-
 src/backend/access/gin/gininsert.c   |   2 +-
 src/backend/access/gin/ginutil.c |   1 +
 src/backend/access/gist/gist.c   |   1 +
 src/backend/access/gist/gistbuild.c  |   2 +-
 src/backend/access/hash/hash.c   |   3 +-
 src/backend/access/heap/heapam_handler.c | 107 +++
 src/backend/access/nbtree/nbtree.c   |   9 +
 src/backend/access/nbtree/nbtsort.c  |  56 +-
 src/backend/access/nbtree/nbtutils.c |  24 +++
 src/backend/access/spgist/spginsert.c|   2 +-
 src/backend/access/spgist/spgutils.c |   1 +
 src/backend/catalog/index.c  |  67 ++-
 src/backend/catalog/system_views.sql |  27 +++
 src/backend/commands/indexcmds.c |  72 +++-
 src/backend/storage/ipc/standby.c|   2 +-
 src/backend/storage/lmgr/lmgr.c  |  46 -
 src/backend/storage/lmgr/lock.c  |   7 +-
 src/backend/utils/adt/amutils.c  |  23 +++
 src/backend/utils/adt/pgstatfuncs.c  |   2 +
 src/include/access/amapi.h   |   4 +
 src/include/access/genam.h   |   1 +
 src/include/access/nbtree.h  |  11 ++
 src/include/access/tableam.h |  10 +
 src/include/catalog/pg_proc.dat  |  10 +-
 src/include/commands/progress.h  |  37 +++-
 src/include/pgstat.h |   5 +-
 src/include/storage/lmgr.h   |   4 +-
 src/include/storage/lock.h   |   2 +-
 src/test/regress/expected/rules.out  |  30 ++-
 35 files changed, 769 insertions(+), 46 deletions(-)

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index 9d5b2e5be67..591e0a3e46a 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -566,7 +566,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
 			 RelationGetRelationName(state->rel),
 			 RelationGetRelationName(state->heaprel));
 
-		table_index_build_scan(state->heaprel, state->rel, indexinfo, true,
+		table_index_build_scan(state->heaprel, state->rel, indexinfo, true, false,
 			   bt_tuple_present_callback, (void *) state, scan);
 
 		ereport(DEBUG1,
diff --git a/contrib/bloom/blinsert.c b/contrib/bloom/blinsert.c
index 1b8df7e1e84..48f35d39990 100644
--- a/contrib/bloom/blinsert.c
+++ b/contrib/bloom/blinsert.c
@@ -142,7 +142,7 @@ blbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	initCachedPage();
 
 	/* Do the heap scan */
-	reltuples = table_index_build_scan(heap, index, indexInfo, true,
+	reltuples = table_index_build_scan(heap, index, indexInfo, true, false,
 	   bloomBuildCallback, (void *) ,
 	   NULL);
 
diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index d078dfbd469..ee3bd562748 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -132,6 +132,7 @@ blhandler(PG_FUNCTION_ARGS)
 	amroutine->amcostestimate = blcostestimate;
 	amroutine->amoptions = bloptions;
 	amroutine->amproperty = NULL;
+	amroutine->ambuildphasename = NULL;
 	amroutine->amvalidate = blvalidate;
 	amroutine->ambeginscan = blbeginscan;
 	amroutine->amrescan = blrescan;
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index b56d3b3daa1..ff8290da9ff

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-31 Thread Rahila Syed

Hi Alvaro,

Please see few comments below:

1. Makecheck fails currently as view definition of expected rules.out does
not reflect latest changes in progress metrics numbering.

2. +  
+   When creating an index on a partitioned, this column is set to the
+   total number of partitions on which the index is to be created.
+  
+ 
+ 
+  partitions_done
+  bigint
+  
+   When creating an index on a partitioned, this column is set to the

I think there is a typo here 's/partitioned/partitioned table/'

3.
+   if (hscan->rs_base.rs_parallel != NULL)
+   {
+   ParallelBlockTableScanDesc bpscan;
+
+   bpscan = (ParallelBlockTableScanDesc)
hscan->rs_base.rs_parallel;
+   startblock = bpscan->phs_startblock;
+   }
+   else
+   startblock = hscan->rs_startblock;
+
+   /*
+* Might have wrapped around the end of the relation, if startblock
was
+* not zero.
+*/
+   if (hscan->rs_cblock > startblock)
+   blocks_done = hscan->rs_cblock - startblock;
+   else
+   blocks_done = hscan->rs_nblocks - startblock +
+   hscan->rs_cblock;
+
+   return blocks_done;

I think parallel scan equivalent bpscan->phs_nblocks along with
hscan->rs_nblocks should be used similar to startblock computation above.

Thank you,
Rahila Syed

On Fri, 29 Mar 2019 at 23:46, Alvaro Herrera 
wrote:

> On 2019-Mar-29, Alvaro Herrera wrote:
>
> > So, CLUSTER and ALTER TABLE rewrites only do non-concurrent index
> > builds; and REINDEX can reuse pretty much the same wait-for metrics
> > columns as CIC.  So I think it's okay if I move only the metrics that
> > conflict for index_build.
>
> The attached version does it that way.  I had to enlarge the param set a
> bit more.  (I suspect those extra columns will be useful to reindex.)
> Also, rebased for recent conflicting changes.
>
>
> I think we should consider a new column of an array type, where we could
> put things like the list of PIDs to be waited for, the list of OIDs of
> index to rebuild, or the list of partitions to build the index on.
>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


-- 
Rahila Syed
Performance Engineer
2ndQuadrant
http://www.2ndQuadrant.com 
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-29 Thread Alvaro Herrera

On 2019-Mar-29, Robert Haas wrote:

> On Fri, Mar 29, 2019 at 3:28 PM Alvaro Herrera  
> wrote:

> > Maybe we can consider using dynamic shmem for that, and include a
> > pointer to it in the fixed-size chunk.  (It's a bit too late to be
> > writing this code, mind; I'm just proposing this for a future
> > improvement.)
> 
> Sounds expensive.  We don't want to spend a lot of energy pushing out
> progress reports which, often enough, nobody will ever examine.  I
> designed the current system as I did to make it cheap.

Well, I'm not proposing this for things that would change more than once
or a very limited number of times during one command; certainly not once
per tuple or per block like other metrics do.  The examples I mentioned
are once per command (eg., list of OIDs of partitions to process) or
list of PIDs to wait for, which we wouldn't modify it once set for each
waiting cycle (three times for CREATE INDEX CONCURRENTLY, five times for
REINDEX CONCURRENTLY).

> Adding DSM in there would open up lots of exciting new failure
> possibilities and significantly increase the overhead.  And probably
> add quite a bit of code complexity, too.

Yeah, that's true.

Anyway, I'm not intending to tackle this for the time being.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-29 Thread Robert Haas

On Fri, Mar 29, 2019 at 3:28 PM Alvaro Herrera  wrote:
> On 2019-Mar-29, Robert Haas wrote:
> > On Fri, Mar 29, 2019 at 2:16 PM Alvaro Herrera  
> > wrote:
> > > I think we should consider a new column of an array type, where we could
> > > put things like the list of PIDs to be waited for, the list of OIDs of
> > > index to rebuild, or the list of partitions to build the index on.
> >
> > This has to work with a fixed-size chunk of shared memory.
>
> Bah, of course.
>
> Maybe we can consider using dynamic shmem for that, and include a
> pointer to it in the fixed-size chunk.  (It's a bit too late to be
> writing this code, mind; I'm just proposing this for a future
> improvement.)

Sounds expensive.  We don't want to spend a lot of energy pushing out
progress reports which, often enough, nobody will ever examine.  I
designed the current system as I did to make it cheap.  Adding DSM in
there would open up lots of exciting new failure possibilities and
significantly increase the overhead.  And probably add quite a bit of
code complexity, too.

There's probably room for an elaborate progress-reporting facility in
PostgreSQL that can even handle arbitrary stuff like queries.  But I
think it might look a lot different from this one, which is designed
and intended to handle simple cases.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-29 Thread Alvaro Herrera

On 2019-Mar-29, Robert Haas wrote:

> On Fri, Mar 29, 2019 at 2:16 PM Alvaro Herrera  
> wrote:
> > I think we should consider a new column of an array type, where we could
> > put things like the list of PIDs to be waited for, the list of OIDs of
> > index to rebuild, or the list of partitions to build the index on.
> 
> This has to work with a fixed-size chunk of shared memory.

Bah, of course.

Maybe we can consider using dynamic shmem for that, and include a
pointer to it in the fixed-size chunk.  (It's a bit too late to be
writing this code, mind; I'm just proposing this for a future
improvement.)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-29 Thread Robert Haas

On Fri, Mar 29, 2019 at 2:16 PM Alvaro Herrera  wrote:
> I think we should consider a new column of an array type, where we could
> put things like the list of PIDs to be waited for, the list of OIDs of
> index to rebuild, or the list of partitions to build the index on.

This has to work with a fixed-size chunk of shared memory.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-29 Thread Alvaro Herrera

On 2019-Mar-29, Alvaro Herrera wrote:

> So, CLUSTER and ALTER TABLE rewrites only do non-concurrent index
> builds; and REINDEX can reuse pretty much the same wait-for metrics
> columns as CIC.  So I think it's okay if I move only the metrics that
> conflict for index_build.

The attached version does it that way.  I had to enlarge the param set a
bit more.  (I suspect those extra columns will be useful to reindex.)
Also, rebased for recent conflicting changes.


I think we should consider a new column of an array type, where we could
put things like the list of PIDs to be waited for, the list of OIDs of
index to rebuild, or the list of partitions to build the index on.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From c36cc4e62067ead6eb8266602046f090689ccf76 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 25 Mar 2019 13:29:34 -0300
Subject: [PATCH v8 1/2] Report progress of CREATE INDEX operations

---
 contrib/amcheck/verify_nbtree.c  |   2 +-
 contrib/bloom/blinsert.c |   2 +-
 contrib/bloom/blutils.c  |   1 +
 doc/src/sgml/indexam.sgml|  13 ++
 doc/src/sgml/monitoring.sgml | 224 ++-
 src/backend/access/brin/brin.c   |   5 +-
 src/backend/access/gin/gininsert.c   |   2 +-
 src/backend/access/gin/ginutil.c |   1 +
 src/backend/access/gist/gist.c   |   1 +
 src/backend/access/gist/gistbuild.c  |   2 +-
 src/backend/access/hash/hash.c   |   3 +-
 src/backend/access/heap/heapam_handler.c |  73 
 src/backend/access/nbtree/nbtree.c   |   9 +
 src/backend/access/nbtree/nbtsort.c  |  61 +-
 src/backend/access/nbtree/nbtutils.c |  24 +++
 src/backend/access/spgist/spginsert.c|   2 +-
 src/backend/access/spgist/spgutils.c |   1 +
 src/backend/catalog/index.c  |  67 ++-
 src/backend/catalog/system_views.sql |  27 +++
 src/backend/commands/indexcmds.c |  72 +++-
 src/backend/storage/ipc/standby.c|   2 +-
 src/backend/storage/lmgr/lmgr.c  |  46 -
 src/backend/storage/lmgr/lock.c  |   7 +-
 src/backend/utils/adt/amutils.c  |  23 +++
 src/backend/utils/adt/pgstatfuncs.c  |   2 +
 src/include/access/amapi.h   |   4 +
 src/include/access/genam.h   |   1 +
 src/include/access/nbtree.h  |  11 ++
 src/include/access/tableam.h |  10 +
 src/include/catalog/pg_proc.dat  |  10 +-
 src/include/commands/progress.h  |  37 +++-
 src/include/pgstat.h |   5 +-
 src/include/storage/lmgr.h   |   4 +-
 src/include/storage/lock.h   |   2 +-
 src/test/regress/expected/rules.out  |  30 ++-
 35 files changed, 740 insertions(+), 46 deletions(-)

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index 9ecb1999e34..b55178328c8 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -566,7 +566,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
 			 RelationGetRelationName(state->rel),
 			 RelationGetRelationName(state->heaprel));
 
-		table_index_build_scan(state->heaprel, state->rel, indexinfo, true,
+		table_index_build_scan(state->heaprel, state->rel, indexinfo, true, false,
 			   bt_tuple_present_callback, (void *) state, scan);
 
 		ereport(DEBUG1,
diff --git a/contrib/bloom/blinsert.c b/contrib/bloom/blinsert.c
index 1b8df7e1e84..48f35d39990 100644
--- a/contrib/bloom/blinsert.c
+++ b/contrib/bloom/blinsert.c
@@ -142,7 +142,7 @@ blbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	initCachedPage();
 
 	/* Do the heap scan */
-	reltuples = table_index_build_scan(heap, index, indexInfo, true,
+	reltuples = table_index_build_scan(heap, index, indexInfo, true, false,
 	   bloomBuildCallback, (void *) ,
 	   NULL);
 
diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index d078dfbd469..ee3bd562748 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -132,6 +132,7 @@ blhandler(PG_FUNCTION_ARGS)
 	amroutine->amcostestimate = blcostestimate;
 	amroutine->amoptions = bloptions;
 	amroutine->amproperty = NULL;
+	amroutine->ambuildphasename = NULL;
 	amroutine->amvalidate = blvalidate;
 	amroutine->ambeginscan = blbeginscan;
 	amroutine->amrescan = blrescan;
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index b56d3b3daa1..ff8290da9ff 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -127,6 +127,7 @@ typedef struct IndexAmRoutine
 amcostestimate_function amcostestimate;
 amoptions_function amoptions;
 amproperty_function amproperty; /* can be NULL */
+ambuildphasename_function ambuildphasename;   /* can be NULL */
 amvalidate_function amvalidate;
 ambeginscan_function ambeginscan;

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-29 Thread Alvaro Herrera

On 2019-Mar-29, Andres Freund wrote:

> Hi,
> 
> On 2019-03-29 12:02:18 -0300, Alvaro Herrera wrote:
> > I just noticed that the CLUSTER calls index_build, which my patch
> > modifies to include additional progress metrics; this means that during
> > the index build phase, the metrics set by CLUSTER will be trashed by the
> > ones my patch introduces.
> 
> Yea, it really seems that the index build infrastructure needs to
> support cooperating with the caller's progress reporting. For CLUSTER,
> REINDEX, ALTER TABLE rewrites etc, they all would likely want to have
> insight into the index build while also having their own progress.

So, CLUSTER and ALTER TABLE rewrites only do non-concurrent index
builds; and REINDEX can reuse pretty much the same wait-for metrics
columns as CIC.  So I think it's okay if I move only the metrics that
conflict for index_build.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-29 Thread Alvaro Herrera

On 2019-Mar-29, Alvaro Herrera wrote:

> I just noticed that the CLUSTER calls index_build, which my patch
> modifies to include additional progress metrics; this means that during
> the index build phase, the metrics set by CLUSTER will be trashed by the
> ones my patch introduces.

Indeed:

  pid  | datid | datname  | relid | command |phase| 
cluster_index_relid | heap_tuples_scanned | heap_tuples_written | 
heap_blks_total | heap_blks_scanned | index_rebuild_count 
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap | 
  16387 |  162402 |  162402 |   0 | 
0 |   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap | 
  16387 |  460362 |  460362 |   0 | 
0 |   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap | 
  16387 |  754004 |  754004 |   0 | 
0 |   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap | 
  16387 | 1047058 | 1047058 |   0 | 
0 |   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap | 
  16387 | 1356296 | 1356296 |   0 | 
0 |   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap | 
  16387 | 1645321 | 1645321 |   0 | 
0 |   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap | 
  16387 | 1939920 | 1939920 |   0 | 
0 |   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap | 
  16387 | 2227450 | 2227450 |   0 | 
0 |   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap | 
  16387 | 2526116 | 2526116 |   0 | 
0 |   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap | 
  16387 | 2828468 | 2828468 |   0 | 
0 |   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap | 
  16387 | 3142982 | 3142982 |   0 | 
0 |   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap | 
  16387 | 3451494 | 3451494 |   0 | 
0 |   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap | 
  16387 | 3769799 | 3769799 |   0 | 
0 |   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap | 
  16387 | 4077513 | 4077513 |   0 | 
0 |   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap | 
  16387 | 4383255 | 4383255 |   0 | 
0 |   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap | 
  16387 | 4700286 | 4700286 |   0 | 
0 |   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap | 
  16387 | 5015468 | 5015468 |   0 | 
0 |   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap | 
  16387 | 5324951 | 5324951 |   0 | 
0 |   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap | 
  16387 | 5628172 | 5628172 |   0 | 
0 |   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap | 
  16387 | 5940862 | 5940862 |   0 | 
0 |   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap | 
  16387 | 6253778 | 6253778 |   0 | 
0 |   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap | 
  16387 | 6560474 | 6560474 |   0 | 
0 |   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap | 
  16387 | 6881248 | 6881248 |   0 | 
0 |   0
 28457 | 12750 | alvherre | 16384 | CLUSTER |

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-29 Thread Andres Freund

Hi,

On 2019-03-29 12:02:18 -0300, Alvaro Herrera wrote:
> I just noticed that the CLUSTER calls index_build, which my patch
> modifies to include additional progress metrics; this means that during
> the index build phase, the metrics set by CLUSTER will be trashed by the
> ones my patch introduces.

Yea, it really seems that the index build infrastructure needs to
support cooperating with the caller's progress reporting. For CLUSTER,
REINDEX, ALTER TABLE rewrites etc, they all would likely want to have
insight into the index build while also having their own progress.

Greetings,

Andres Freund

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-29 Thread Alvaro Herrera

I just noticed that the CLUSTER calls index_build, which my patch
modifies to include additional progress metrics; this means that during
the index build phase, the metrics set by CLUSTER will be trashed by the
ones my patch introduces.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-28 Thread Robert Haas

On Thu, Mar 28, 2019 at 12:07 PM Simon Riggs  wrote:
> Seems more like our own labelling of the phases is responsible for that, 
> rather than it being a specific problem. The numbering should reflect the 
> ordinal executed step number. So if a VACUUM has required two sets of index 
> scanning, the heap scan phase (normally phase 3) should be labelled phase 6 
> when it occurs the second time, rather than "phase 3 again, doh" which 
> clearly doesn't work.

That would not be too simple to do with the infrastructure we have
available, I think.  Also, Alvaro's showed phase names like '3 of 8',
but if you regarded each set of index scans as a separate phase rather
than a repetition of a phase that had already happened, you wouldn't
know whether there were going to be 8 phases in total or some other
number, because you don't know how many times you're going to scan the
indexes.

I suggest that it makes sense to leave the phase numbers out of this
commit.  If someone wants to make a proposal for adding phase numbers
to the various commands that now support progress reporting as a
separate commit, then that can be discussed separately.

BTW, if we are going to do that, it might be best to put then in
separate view columns rather than making them part of the phase names.
Somebody might want to look for those phase names using SQL or
client-side logic and not have the logic get broken if we renumber the
phases.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-28 Thread Simon Riggs

On Thu, 28 Mar 2019 at 15:39, Alvaro Herrera 
wrote:

> On 2019-Mar-28, Simon Riggs wrote:
>
> > On Thu, 28 Mar 2019 at 14:56, Alvaro Herrera 
> > wrote:
> >
> > > I have not reinstated phase numbers; I have Rahila's positive vote for
> > > them.  Do I hear any more votes on this issue?
> >
> > If there is a specific technical issue, I'd like to understand that more.
>
> There's no technical issue -- that's pretty straightforward.  Earlier
> versions of the patch had them, and removing them only meant editing
> strings in a couple of places.
>
> > If it is just a usability preference, then I say we should have numbers.
> >
> > Numbering is natural for people. If we say "It's currently doing phase
> > XYZ", they will say "Is that the 3rd phase?", we'll say "No, actually the
> > 5th", and then they will say "Why didn't you just number them?"
>
> There are eight phases.  If you run normal CREATE INDEX (not concurrent)
> then you get phases 1, then 3, done.  If you run CIC you get phases from
> 1 to 8.  Phase 3 "building index" has arbitrary subphases (they depend
> on AM) in both cases.
>

Maybe the AM won't know, but I don't think that matters. It's still useful
to know the difference between Phase 3.3 and Phase 3.33 and Phase 7.

The description only helps you if you understand what it means. If your AM
replies something many users wouldn't understand like "сортировка" or
"constructing triples", we still want to know where that step fits in the
overall sequence of steps.

> I think the lack of phase numbering comes from the fact that the first
> command we did (VACUUM) sometimes jumps backwards in phase numbers, so
> it would be a bit absurd from users's POV.
>

Seems more like our own labelling of the phases is responsible for that,
rather than it being a specific problem. The numbering should reflect the
ordinal executed step number. So if a VACUUM has required two sets of index
scanning, the heap scan phase (normally phase 3) should be labelled phase 6
when it occurs the second time, rather than "phase 3 again, doh" which
clearly doesn't work.

By the time VACUUM moves to its 2nd phase, which is normally thought of as
"Phase2 Index Scanning", we know how much of the table has been scanned, so
we really should be able to calculate how many more phases will be needed.
We also know how many AM sub-phases will be called for that step.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-28 Thread Alvaro Herrera

On 2019-Mar-28, Simon Riggs wrote:

> On Thu, 28 Mar 2019 at 14:56, Alvaro Herrera 
> wrote:
> 
> > I have not reinstated phase numbers; I have Rahila's positive vote for
> > them.  Do I hear any more votes on this issue?
> 
> If there is a specific technical issue, I'd like to understand that more.

There's no technical issue -- that's pretty straightforward.  Earlier
versions of the patch had them, and removing them only meant editing
strings in a couple of places.

> If it is just a usability preference, then I say we should have numbers.
> 
> Numbering is natural for people. If we say "It's currently doing phase
> XYZ", they will say "Is that the 3rd phase?", we'll say "No, actually the
> 5th", and then they will say "Why didn't you just number them?"

There are eight phases.  If you run normal CREATE INDEX (not concurrent)
then you get phases 1, then 3, done.  If you run CIC you get phases from
1 to 8.  Phase 3 "building index" has arbitrary subphases (they depend
on AM) in both cases.

I think the lack of phase numbering comes from the fact that the first
command we did (VACUUM) sometimes jumps backwards in phase numbers, so
it would be a bit absurd from users's POV.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-28 Thread Simon Riggs

On Thu, 28 Mar 2019 at 14:56, Alvaro Herrera 
wrote:

> I have not reinstated phase numbers; I have Rahila's positive vote for
> them.  Do I hear any more votes on this issue?
>

If there is a specific technical issue, I'd like to understand that more.
If it is just a usability preference, then I say we should have numbers.

Numbering is natural for people. If we say "It's currently doing phase
XYZ", they will say "Is that the 3rd phase?", we'll say "No, actually the
5th", and then they will say "Why didn't you just number them?"

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-28 Thread Alvaro Herrera

On 2019-Mar-10, David Fetter wrote:

> Would it be a very large lift to report progress for the rest of the
> index types we support?

Patch v7 I just posted does that.  Please give it a look and let me know
what you think.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-28 Thread Alvaro Herrera

Here's v7.  This is rebased on top of yesterday's tableam commit
reworking the index build API (thanks Rahila for letting me know it had
already rot).  No changes otherwise.  Got rid of 0001 because the
tableam changes made that unnecessary.  (Each new table AM will have to
include its own progress reporting for index builds in its
index_build_range_scan method, which is sensible.)

Patch 0003 now takes care of all the AMs.  This supports the index build
phase as well as the index-validate-heapscan for CONCURRENTLY builds;
the indexscan scan there is not reported, which is just a small portion
of the index build so I don't feel bad about that; it can probably be
added with just a two-line patch on each AM's ambulkdelete method as a
subsequent patch.

I have not reinstated phase numbers; I have Rahila's positive vote for
them.  Do I hear any more votes on this issue?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 12ebc1e6bd165e6d6c1dad83d2e458dd76367e78 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 25 Mar 2019 13:29:34 -0300
Subject: [PATCH v7 1/2] Report progress of CREATE INDEX operations

---
 contrib/amcheck/verify_nbtree.c  |   2 +-
 contrib/bloom/blinsert.c |   2 +-
 contrib/bloom/blutils.c  |   1 +
 doc/src/sgml/indexam.sgml|  13 ++
 doc/src/sgml/monitoring.sgml | 224 ++-
 src/backend/access/brin/brin.c   |   5 +-
 src/backend/access/gin/gininsert.c   |   2 +-
 src/backend/access/gin/ginutil.c |   1 +
 src/backend/access/gist/gist.c   |   1 +
 src/backend/access/gist/gistbuild.c  |   2 +-
 src/backend/access/hash/hash.c   |   3 +-
 src/backend/access/heap/heapam_handler.c |  75 
 src/backend/access/nbtree/nbtree.c   |   9 +
 src/backend/access/nbtree/nbtsort.c  |  61 +-
 src/backend/access/nbtree/nbtutils.c |  24 +++
 src/backend/access/spgist/spginsert.c|   2 +-
 src/backend/access/spgist/spgutils.c |   1 +
 src/backend/catalog/index.c  |  67 ++-
 src/backend/catalog/system_views.sql |  27 +++
 src/backend/commands/indexcmds.c |  52 +-
 src/backend/storage/ipc/standby.c|   2 +-
 src/backend/storage/lmgr/lmgr.c  |  46 -
 src/backend/storage/lmgr/lock.c  |   7 +-
 src/backend/utils/adt/amutils.c  |  23 +++
 src/backend/utils/adt/pgstatfuncs.c  |   2 +
 src/include/access/amapi.h   |   4 +
 src/include/access/genam.h   |   1 +
 src/include/access/nbtree.h  |  11 ++
 src/include/access/tableam.h |  10 +
 src/include/catalog/pg_proc.dat  |  10 +-
 src/include/commands/progress.h  |  35 
 src/include/pgstat.h |   5 +-
 src/include/storage/lmgr.h   |   4 +-
 src/include/storage/lock.h   |   2 +-
 src/test/regress/expected/rules.out  |  30 ++-
 35 files changed, 728 insertions(+), 38 deletions(-)

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index 9ecb1999e34..b55178328c8 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -566,7 +566,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
 			 RelationGetRelationName(state->rel),
 			 RelationGetRelationName(state->heaprel));
 
-		table_index_build_scan(state->heaprel, state->rel, indexinfo, true,
+		table_index_build_scan(state->heaprel, state->rel, indexinfo, true, false,
 			   bt_tuple_present_callback, (void *) state, scan);
 
 		ereport(DEBUG1,
diff --git a/contrib/bloom/blinsert.c b/contrib/bloom/blinsert.c
index 1b8df7e1e84..48f35d39990 100644
--- a/contrib/bloom/blinsert.c
+++ b/contrib/bloom/blinsert.c
@@ -142,7 +142,7 @@ blbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	initCachedPage();
 
 	/* Do the heap scan */
-	reltuples = table_index_build_scan(heap, index, indexInfo, true,
+	reltuples = table_index_build_scan(heap, index, indexInfo, true, false,
 	   bloomBuildCallback, (void *) ,
 	   NULL);
 
diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index d078dfbd469..ee3bd562748 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -132,6 +132,7 @@ blhandler(PG_FUNCTION_ARGS)
 	amroutine->amcostestimate = blcostestimate;
 	amroutine->amoptions = bloptions;
 	amroutine->amproperty = NULL;
+	amroutine->ambuildphasename = NULL;
 	amroutine->amvalidate = blvalidate;
 	amroutine->ambeginscan = blbeginscan;
 	amroutine->amrescan = blrescan;
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index b56d3b3daa1..ff8290da9ff 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -127,6 +127,7 @@ typedef struct IndexAmRoutine
 amcostestimate_function amcostestimate;
 amoptions_function amoptions;

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-27 Thread Rahila Syed

On Mon, 25 Mar 2019 at 22:23, Alvaro Herrera 
wrote:

> Here's v6 of this patch.  I have rebased on top of today's CLUSTER
> monitoring, as well as on table AM commits.  The latter caused a bit of
> trouble, as now the number of blocks processed by a scan is not as easy
> to get as before; I added a new entry point heapscan_get_blocks_done on
> heapam.c to help with that.  (I suppose this will need some fixups later
> on.)
>
> I removed the "M of N" phase labels that Robert didn't like; those were
> suggested by Rahila and upvoted by Amit L.  I'm of two minds about
> those.  If you care about those and want them back, please speak up.
>
> I see value in reporting those numbers because it gives user insight into
where
we are at in the whole process without having to refer to documentation or
code.
Besides here also we are reporting facts as we follow for other metrics.

I agree that it will be most effective if the phases are carried out in
succession
which is not the case every time and its relevance varies for each command
as mentioned upthread by Alvaro and Robert. But I feel as long as we have in
the documentation that some phases overlap, some are mutually exclusive
hence
may be skipped etc. reporting `phase number versus total phases` does
provide
valuable information.
We are able to give user a whole picture in addition to reporting progress
within phases.

Thank you,
-- 
Rahila Syed
Performance Engineer
2ndQuadrant
http://www.2ndQuadrant.com 
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-25 Thread Alvaro Herrera

Hi

On 2019-Mar-25, Andres Freund wrote:

> I've not followed this thread at all, so I might just be out of my depth
> here. From my POV, a later patch in the yet-to-be-applied patchqueue
> moves the main part of cluster below the table AM (as there's enough low
> level details, e.g. dealing with HOT). Therefore I don't have a problem
> having heap's implementation interrogate the scan in a heap specific
> manner.
> 
> Is that the angle you were wondering about? If not, any chance to point
> out more precisely what to look at?
>
> Obviously out of pure laziness, I'd prefer this to go in after my move
> of index creation scans & cluster below tableam.h. But admittedly,
> managing my exhaustion isn't the the sole goal of the project

Well, this is create index rather than cluster, but yes this conflicts
pretty heavily with patch 0008 in your v21 at
20190324031630.nt7numguo5ojq...@alap3.anarazel.de.  I wonder if I should
rather push and help merge your 0008, or wait until you push and deal
with it afterwards.  I'd rather do the former, I think.

Anyway I was thinking about the conceptual angle -- the progress
monitoring stuff is doing block-based reporting.  I think even if we get
a non-block-based heap, we can still report the number of physical
blocks already processed by the scan, which is what the index build
monitoring is interested in showing the user.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-25 Thread Andres Freund

Hi,

On 2019-03-25 23:11:00 -0300, Alvaro Herrera wrote:
> On 2019-Mar-25, Alvaro Herrera wrote:
> 
> > Here's v6 of this patch.  I have rebased on top of today's CLUSTER
> > monitoring, as well as on table AM commits.  The latter caused a bit of
> > trouble, as now the number of blocks processed by a scan is not as easy
> > to get as before; I added a new entry point heapscan_get_blocks_done on
> > heapam.c to help with that.  (I suppose this will need some fixups later
> > on.)
> 
> Andres, I suppose you have something to say about patch 0001 here?

I've not followed this thread at all, so I might just be out of my depth
here. From my POV, a later patch in the yet-to-be-applied patchqueue
moves the main part of cluster below the table AM (as there's enough low
level details, e.g. dealing with HOT). Therefore I don't have a problem
having heap's implementation interrogate the scan in a heap specific
manner.

Is that the angle you were wondering about? If not, any chance to point
out more precisely what to look at?

Obviously out of pure laziness, I'd prefer this to go in after my move
of index creation scans & cluster below tableam.h. But admittedly,
managing my exhaustion isn't the the sole goal of the project

Greetings,

Andres Freund

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-25 Thread Alvaro Herrera

On 2019-Mar-25, Alvaro Herrera wrote:

> Here's v6 of this patch.  I have rebased on top of today's CLUSTER
> monitoring, as well as on table AM commits.  The latter caused a bit of
> trouble, as now the number of blocks processed by a scan is not as easy
> to get as before; I added a new entry point heapscan_get_blocks_done on
> heapam.c to help with that.  (I suppose this will need some fixups later
> on.)

Andres, I suppose you have something to say about patch 0001 here?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-25 Thread Amit Langote

On 2019/03/26 1:53, Alvaro Herrera wrote:
> Here's v6 of this patch.  I have rebased on top of today's CLUSTER
> monitoring, as well as on table AM commits.  The latter caused a bit of
> trouble, as now the number of blocks processed by a scan is not as easy
> to get as before; I added a new entry point heapscan_get_blocks_done on
> heapam.c to help with that.  (I suppose this will need some fixups later
> on.)
> 
> I removed the "M of N" phase labels that Robert didn't like; those were
> suggested by Rahila and upvoted by Amit L.  I'm of two minds about
> those.  If you care about those and want them back, please speak up.

On second thought, I'm neutral on it too.

Thanks,
Amit

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-25 Thread Alvaro Herrera

Here's v6 of this patch.  I have rebased on top of today's CLUSTER
monitoring, as well as on table AM commits.  The latter caused a bit of
trouble, as now the number of blocks processed by a scan is not as easy
to get as before; I added a new entry point heapscan_get_blocks_done on
heapam.c to help with that.  (I suppose this will need some fixups later
on.)

I removed the "M of N" phase labels that Robert didn't like; those were
suggested by Rahila and upvoted by Amit L.  I'm of two minds about
those.  If you care about those and want them back, please speak up.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From ccf8a84f785c96ac7ae01e17b4fb7999183fed9e Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 25 Mar 2019 13:29:34 -0300
Subject: [PATCH v6 1/3] implement heapscan_get_blocks_done

---
 src/backend/access/heap/heapam.c | 34 
 src/include/access/heapam.h  |  1 +
 2 files changed, 35 insertions(+)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fa747be73ab..a7f6c09dcd0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1053,6 +1053,40 @@ heapgettup_pagemode(HeapScanDesc scan,
 	}
 }
 
+/*
+ * Return the number of blocks that have been read by this scan since
+ * starting.  This is not 100% accurate, because in a parallel scan the
+ * counter can be moving concurrently.
+ */
+BlockNumber
+heapscan_get_blocks_done(TableScanDesc sscan)
+{
+	HeapScanDesc	hscan = (HeapScanDesc) sscan;
+	BlockNumber		startblock;
+	BlockNumber		blocks_done;
+
+	if (hscan->rs_base.rs_parallel != NULL)
+	{
+		ParallelBlockTableScanDesc bpscan;
+
+		bpscan = (ParallelBlockTableScanDesc) hscan->rs_base.rs_parallel;
+		startblock = bpscan->phs_startblock;
+	}
+	else
+		startblock = hscan->rs_startblock;
+
+	/*
+	 * Might have wrapped around the end of the relation, if startblock was
+	 * not zero.
+	 */
+	if (hscan->rs_cblock > startblock)
+		blocks_done = hscan->rs_cblock - startblock;
+	else
+		blocks_done = hscan->rs_nblocks - startblock +
+			hscan->rs_cblock;
+
+	return blocks_done;
+}
 
 #if defined(DISABLE_COMPLEX_MACRO)
 /*
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e72f9787517..d4f537b3791 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -118,6 +118,7 @@ extern TableScanDesc heap_beginscan(Relation relation, Snapshot snapshot,
 extern void heap_setscanlimits(TableScanDesc scan, BlockNumber startBlk,
    BlockNumber endBlk);
 extern void heapgetpage(TableScanDesc scan, BlockNumber page);
+extern BlockNumber heapscan_get_blocks_done(TableScanDesc sscan);
 extern void heap_rescan(TableScanDesc scan, ScanKey key, bool set_params,
 			bool allow_strat, bool allow_sync, bool allow_pagemode);
 extern void heap_rescan_set_params(TableScanDesc scan, ScanKey key,
-- 
2.17.1

>From 7adb314a8be321084c7f7255cefc53eaa8c2ab0e Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 2 Jan 2019 16:14:39 -0300
Subject: [PATCH v6 2/3] Report progress of CREATE INDEX operations

---
 contrib/amcheck/verify_nbtree.c   |   2 +-
 contrib/bloom/blinsert.c  |   2 +-
 contrib/bloom/blutils.c   |   1 +
 doc/src/sgml/indexam.sgml |  13 ++
 doc/src/sgml/monitoring.sgml  | 224 +-
 src/backend/access/brin/brin.c|   5 +-
 src/backend/access/gin/gininsert.c|   2 +-
 src/backend/access/gin/ginutil.c  |   1 +
 src/backend/access/gist/gist.c|   1 +
 src/backend/access/gist/gistbuild.c   |   2 +-
 src/backend/access/hash/hash.c|   3 +-
 src/backend/access/nbtree/nbtree.c|   9 ++
 src/backend/access/nbtree/nbtsort.c   |  58 ++-
 src/backend/access/nbtree/nbtutils.c  |  24 +++
 src/backend/access/spgist/spginsert.c |   2 +-
 src/backend/access/spgist/spgutils.c  |   1 +
 src/backend/catalog/index.c   | 118 +-
 src/backend/catalog/system_views.sql  |  27 
 src/backend/commands/indexcmds.c  |  52 +-
 src/backend/storage/ipc/standby.c |   2 +-
 src/backend/storage/lmgr/lmgr.c   |  46 +-
 src/backend/storage/lmgr/lock.c   |   7 +-
 src/backend/utils/adt/amutils.c   |  23 +++
 src/backend/utils/adt/pgstatfuncs.c   |   2 +
 src/include/access/amapi.h|   4 +
 src/include/access/genam.h|   1 +
 src/include/access/nbtree.h   |  11 ++
 src/include/catalog/index.h   |   2 +
 src/include/catalog/pg_proc.dat   |  10 +-
 src/include/commands/progress.h   |  35 
 src/include/pgstat.h  |   5 +-
 src/include/storage/lmgr.h|   4 +-
 src/include/storage/lock.h|   2 +-
 src/test/regress/expected/rules.out   |  30 +++-
 34 files changed, 693 insertions(+), 38 deletions(-)

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-11 Thread Alvaro Herrera

On 2019-Mar-11, Robert Haas wrote:

> On Mon, Mar 11, 2019 at 3:43 PM Alvaro Herrera  
> wrote:
> > > Huh.  Well, that's another option, but then what do we do if the
> > > number of phases is not a constant?
> >
> > Well, why do we care?  "Some phases might be skipped".
> 
> It seems pretty confusing.  I mean, in the case of the CLUSTER patch,
> you're either going to seq-scan the table or index-scan the table.
> Those are (at last check) reported using different phase numbers, but
> they are mutually exclusive.  Generally, if you are going to do either
> foo -> bar -> baz -> quux or foo -> bletch -> quux, how many phases
> are there total?  5?  4?

Hmm.  Your argument is not entirely devoid of merit, but I'm not 100%
convinced either.

So, in CLUSTER, the phases in the middle section are exclusive of one
another.  You do bar and baz, or you do bletch.  But you never do bar
and bletch, or bletch and baz, or bar on isolation or baz on isolation.
Furthermore, the counting of phases depends on internal system state
(optimizer output), not on the user's input.

In CREATE INDEX, it's not exactly the same.  You either have a
complicated 8-phase system (CREATE INDEX CONCURRENTLY) or just a
two-phase system.  The phases for the second case are a strict subset of
the cases in the first case.  Whether to use one or the other phase
sequence is entirely up to the user.

On the other hand, the subphase numbers vary per AM (but I expect
they're always the same for any one AM.)

To me, it's not a big deal, but if we don't put the number in the phase
name, then we force users to keep the reference documentation open every
time they create an index.

I'm not wed to anything in this area, TBH.  My first patch had no phase
numbers and I added them because of reviewer feedback.  I do agree we
should be consistent ... but on the other hand, each case is a bit
different: consider VACUUM, which goes back to phase 2 after doing phase
3 for a while.  You don't have that behavior for either CLUSTER or
CREATE INDEX.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-11 Thread Robert Haas

On Mon, Mar 11, 2019 at 3:43 PM Alvaro Herrera  wrote:
> > Huh.  Well, that's another option, but then what do we do if the
> > number of phases is not a constant?
>
> Well, why do we care?  "Some phases might be skipped".

It seems pretty confusing.  I mean, in the case of the CLUSTER patch,
you're either going to seq-scan the table or index-scan the table.
Those are (at last check) reported using different phase numbers, but
they are mutually exclusive.  Generally, if you are going to do either
foo -> bar -> baz -> quux or foo -> bletch -> quux, how many phases
are there total?  5?  4?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-11 Thread Alvaro Herrera

On 2019-Mar-11, Robert Haas wrote:

> On Mon, Mar 11, 2019 at 3:26 PM Alvaro Herrera  
> wrote:
> > Oh.  That's easily removed.  Though I have to say that other people said
> > that they liked it so much that they would have liked to have it in the
> > original VACUUM one too 
> > (5ba2b281-9c84-772a-cf37-17780d782...@lab.ntt.co.jp).
> 
> Huh.  Well, that's another option, but then what do we do if the
> number of phases is not a constant?

Well, why do we care?  "Some phases might be skipped".

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-11 Thread Robert Haas

On Mon, Mar 11, 2019 at 3:26 PM Alvaro Herrera  wrote:
> Oh.  That's easily removed.  Though I have to say that other people said
> that they liked it so much that they would have liked to have it in the
> original VACUUM one too (5ba2b281-9c84-772a-cf37-17780d782...@lab.ntt.co.jp).

Huh.  Well, that's another option, but then what do we do if the
number of phases is not a constant?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-11 Thread Alvaro Herrera

On 2019-Mar-11, Robert Haas wrote:

> On Mon, Mar 11, 2019 at 3:18 PM Alvaro Herrera  
> wrote:
> > On 2019-Mar-11, Robert Haas wrote:
> > > I don't think that I much like this (3 of 8) and (2 of 5) stuff.  It's
> > > inconsistent with what we've got already and it doesn't add much.
> > > Someone who wants to know which phase it is can look at the underlying
> > > numbers directly instead of going through the view, but most people
> > > probably won't care, and given that the phases may be of dramatically
> > > unequal length, I don't think it's adding much.
> > >
> > > Another reason why I think this is a bad idea is that there may be
> > > some operations where we don't transit all the phases in all cases;
> > > the pending patch for CLUSTER progress reporting works like that.
> >
> > What part of it don't you like?  Is it the fact that we have phase
> > numbers in the phase name?  Is it the fact that we count total phases?
> > Is it that we have two numbers being current (phase + subphase)?
> 
> that you have phase numbers in the phase name

Oh.  That's easily removed.  Though I have to say that other people said
that they liked it so much that they would have liked to have it in the
original VACUUM one too (5ba2b281-9c84-772a-cf37-17780d782...@lab.ntt.co.jp).

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-11 Thread Robert Haas

On Mon, Mar 11, 2019 at 3:18 PM Alvaro Herrera  wrote:
> On 2019-Mar-11, Robert Haas wrote:
> > I don't think that I much like this (3 of 8) and (2 of 5) stuff.  It's
> > inconsistent with what we've got already and it doesn't add much.
> > Someone who wants to know which phase it is can look at the underlying
> > numbers directly instead of going through the view, but most people
> > probably won't care, and given that the phases may be of dramatically
> > unequal length, I don't think it's adding much.
> >
> > Another reason why I think this is a bad idea is that there may be
> > some operations where we don't transit all the phases in all cases;
> > the pending patch for CLUSTER progress reporting works like that.
>
> What part of it don't you like?  Is it the fact that we have phase
> numbers in the phase name?  Is it the fact that we count total phases?
> Is it that we have two numbers being current (phase + subphase)?

that you have phase numbers in the phase name

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-11 Thread Alvaro Herrera

On 2019-Mar-11, Robert Haas wrote:

> I don't think that I much like this (3 of 8) and (2 of 5) stuff.  It's
> inconsistent with what we've got already and it doesn't add much.
> Someone who wants to know which phase it is can look at the underlying
> numbers directly instead of going through the view, but most people
> probably won't care, and given that the phases may be of dramatically
> unequal length, I don't think it's adding much.
> 
> Another reason why I think this is a bad idea is that there may be
> some operations where we don't transit all the phases in all cases;
> the pending patch for CLUSTER progress reporting works like that.

What part of it don't you like?  Is it the fact that we have phase
numbers in the phase name?  Is it the fact that we count total phases?
Is it that we have two numbers being current (phase + subphase)?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-11 Thread Robert Haas

On Mon, Mar 11, 2019 at 8:41 AM Alvaro Herrera  wrote:
> > I wonder how is the phase 'building index(3 of 8): initializing(1/5)' when
> > the blocks_done count is increasing. Shouldn't it have
> > changed to reflect  PROGRESS_BTREE_PHASE_INDEXBUILD_HEAPSCAN as building
> > index(3 of 8): table scan(2/5) ?
> > Although I think it has been rectified in the latest patch as I now get
> > 'table scan' phase in output as I do CREATE INDEX on table with 100
> > records
>
> Yeah, this was a bug that I fixed in v5.  (It was a misunderstanding
> about how parallel scanning is set up, IIRC).  For v5, I tested both
> parallel and non-parallel builds, with and without sync seqscans, and
> everything seemed to behave correctly.
>
>
> Thanks for looking!  I intend to post a new version later this week.

I don't think that I much like this (3 of 8) and (2 of 5) stuff.  It's
inconsistent with what we've got already and it doesn't add much.
Someone who wants to know which phase it is can look at the underlying
numbers directly instead of going through the view, but most people
probably won't care, and given that the phases may be of dramatically
unequal length, I don't think it's adding much.

Another reason why I think this is a bad idea is that there may be
some operations where we don't transit all the phases in all cases;
the pending patch for CLUSTER progress reporting works like that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-11 Thread Rahila Syed

Hi Alvaro,

On Tue, 5 Mar 2019 at 08:32, Alvaro Herrera 
wrote:

> Hi Rahila,
>
> Thanks for looking.
>
> On 2019-Mar-04, Rahila wrote:
>
> > 1. Thank you for incorporating review comments.
> > Can you please rebase the latest 0001-Report-progress-of-
> > CREATE-INDEX-operations.patch on master? Currently it does not apply on
> > 754b90f657bd54b482524b73726dae4a9165031c
>
> Hmm, rebased to current master.  Didn't conflict at all when rebasing,
> so it's strange that it didn't apply.

Thanks for updating the patch. Sorry, I think it wasn't that the patch
needed rebasing but
I failed to apply it correctly last time. I can apply it now.

> +extern char *btbuildphasename(int64 phasenum);

1. I think int64 is too large a datatype for phasenum.
Also int32 is used for phasenum in  pg_indexam_progress_phasename().
Can we have it as int8?

2.

>   if ((previous_blkno == InvalidBlockNumber) ||

+   (scan->rs_cblock != previous_blkno))

+   {

+
> pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_DONE,

+
> scan->rs_cblock);

+   previous_blkno = scan->rs_cblock;

+   }

. In validate_index_heapscan, we dont calculate blocks_done similar to
IndexBuildHeapScan i.e
block_done += scan->rs_cblock - previous_blkno which IMO is more accurate.
Reporting scan->rs_cblock as blocks_done might give slightly inaccurate
results as we are
still processing that block.

3. There is no if(progress) check in validate_index()/
validate_index_heapscan() code. Wont it be a problem if it is called from
other
index methods which dont support reporting progress at the moment?

4.  Just to clarify my understanding can you please see below comment

Quoting your following comment in cluster command progress monitor thread
while referring to progress reporting from IndexBuildHeapScan,

"One, err, small issue with that idea is that we need the param numbers
not to conflict for any "progress update providers" that are to be used
simultaneously by any command."

Does that mean that we can't have any other INDEX progress monitoring, use
PROGRESS_SCAN_BLOCKS_TOTAL and PROGRESS_SCAN_BLOCKS_DONE
parameter numbers to report anything but the metrics they report now.

5.

> 15:56:44.682 | building index (3 of 8): initializing (1/5)|
>442478 |  442399 |0 |   0 |0 |
>  0

15:56:44.694 | building index (3 of 8): initializing (1/5)|
>442478 |  442399 |0 |   0 |0 |
>  0

15:56:44.705 | building index (3 of 8): sorting tuples, spool 1 (3/5) |
>442478 |  442399 |1 |   0 |0 |
>  0

15:56:44.716 | building index (3 of 8): sorting tuples, spool 1 (3/5) |
>442478 |  442399 |1 |   0 |0 |
>  0

I wonder how is the phase 'building index(3 of 8): initializing(1/5)' when
the blocks_done count is increasing. Shouldn't it have
changed to reflect  PROGRESS_BTREE_PHASE_INDEXBUILD_HEAPSCAN as building
index(3 of 8): table scan(2/5) ?
Although I think it has been rectified in the latest patch as I now get
'table scan' phase in output as I do CREATE INDEX on table with 100
records

Thank you,
.--
Rahila Syed
Performance Engineer
2ndQuadrant
http://www.2ndQuadrant.com 
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-11 Thread Alvaro Herrera

Hi Rahila

On 2019-Mar-11, Rahila Syed wrote:

> On Tue, 5 Mar 2019 at 08:32, Alvaro Herrera 
> wrote:

> > +extern char *btbuildphasename(int64 phasenum);
> 
> 1. I think int64 is too large a datatype for phasenum.
> Also int32 is used for phasenum in  pg_indexam_progress_phasename().
> Can we have it as int8?

It does look strange, I agree, and the first code I wrote had it using a
smaller type.  However, I later realized that since the value comes
directly from pg_stat_get_progress_info(), which returns int8 values, it
was pointless to only accept a small fraction of the possible values for
no good reason, so I widened it to int64 as you see now.

> 2.
> . In validate_index_heapscan, we dont calculate blocks_done similar to
> IndexBuildHeapScan i.e
> block_done += scan->rs_cblock - previous_blkno which IMO is more accurate.
> Reporting scan->rs_cblock as blocks_done might give slightly inaccurate
> results as we are
> still processing that block.

Thanks for pointing out that there's an off-by-one bug there (should be
cblock-1).  However, IndexBuildHeapScan uses more complicated code
because it needs to cover for two additional things that
validate_index_heapscan doesn't: parallel heapscans and synchronized
seqscans.  We could do that, I just saw no point in it.

> 3. There is no if(progress) check in validate_index()/
> validate_index_heapscan() code. Wont it be a problem if it is called from
> other index methods which dont support reporting progress at the
> moment?

Good question.  I'll have a look.  Most likely, I'll end up having
things so that building an index using an unsupported index AM reports
progress based on IndexBuildHeapScan / validate_index /
validate_index_heapscan ... which might mean I should remove the
'progress' parameter from there and have them report unconditionally.

> 4.  Just to clarify my understanding can you please see below comment
> 
> Quoting your following comment in cluster command progress monitor thread
> while referring to progress reporting from IndexBuildHeapScan,
> 
> "One, err, small issue with that idea is that we need the param numbers
> not to conflict for any "progress update providers" that are to be used
> simultaneously by any command."
> 
> Does that mean that we can't have any other INDEX progress monitoring, use
> PROGRESS_SCAN_BLOCKS_TOTAL and PROGRESS_SCAN_BLOCKS_DONE
> parameter numbers to report anything but the metrics they report now.

What I mean is that the literal parameter numbers defined as
PROGRESS_SCAN_BLOCKS_DONE/TOTAL may not be used for other parameters by
commands that call IndexBuildHeapScan, if those other parameters are
used by the same commands simultaneously with IndexBuildHeapScan.  So
those parameter numbers become "reserved".

> 5.
> 
> > 15:56:44.682 | building index (3 of 8): initializing (1/5)|
> >442478 |  442399 |0 |   0 |0 |
> 
> I wonder how is the phase 'building index(3 of 8): initializing(1/5)' when
> the blocks_done count is increasing. Shouldn't it have
> changed to reflect  PROGRESS_BTREE_PHASE_INDEXBUILD_HEAPSCAN as building
> index(3 of 8): table scan(2/5) ?
> Although I think it has been rectified in the latest patch as I now get
> 'table scan' phase in output as I do CREATE INDEX on table with 100
> records

Yeah, this was a bug that I fixed in v5.  (It was a misunderstanding
about how parallel scanning is set up, IIRC).  For v5, I tested both
parallel and non-parallel builds, with and without sync seqscans, and
everything seemed to behave correctly.

Thanks for looking!  I intend to post a new version later this week.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-10 Thread David Fetter

On Mon, Mar 04, 2019 at 05:46:07PM -0300, Alvaro Herrera wrote:
> Hi Rahila,
> 
> Thanks for looking.
> 
> On 2019-Mar-04, Rahila wrote:
> 
> > 1. Thank you for incorporating review comments.
> > Can you please rebase the latest 0001-Report-progress-of-
> > CREATE-INDEX-operations.patch on master? Currently it does not apply on
> > 754b90f657bd54b482524b73726dae4a9165031c
> 
> Hmm, rebased to current master.  Didn't conflict at all when rebasing,
> so it's strange that it didn't apply.
> 
> > >   15:56:44.694 | building index (3 of 8): initializing (1/5)| 
> > >   442478 |  442399 |0 |   0 |
> > > 0 |   0
> > >   15:56:44.705 | building index (3 of 8): sorting tuples, spool 1 (3/5) | 
> > >   442478 |  442399 |1 |   0 |
> > > 0 |   0
> > >   15:56:44.716 | building index (3 of 8): sorting tuples, spool 1 (3/5) | 
> > >   442478 |  442399 |1 |   0 |
> > > 0 |   0
> > >   15:56:44.727 | building index (3 of 8): final btree sort & load (5/5) | 
> > >   442478 |  442399 |1 |   79057 |
> > > 0 |   0
> > >   15:56:44.738 | building index (3 of 8): final btree sort & load (5/5) | 
> > >   442478 |  442399 |1 |  217018 |
> > > 0 |   0
> > >   15:56:44.75  | building index (3 of 8): final btree sort & load (5/5) | 
> > >   442478 |  442399 |1 |  353804 |
> > > 0 |   0
> > 2. In the above report, even though we are reporting progress in terms of
> > tuples_done for final btree sort & load phase we have not cleared
> > the blocks_done entry from previous phases. I think this can be confusing as
> > the blocks_done does not correspond to the tuples_done in the final btree
> > sort & load phase.
> 
> Good point.  Done in the attached version, wherein I also added comments
> to explain the IndexBuildHeapScan API change.  I didn't change the hash
> AM implementation here.

Would it be a very large lift to report progress for the rest of the
index types we support?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-04 Thread Alvaro Herrera

Hi Rahila,

Thanks for looking.

On 2019-Mar-04, Rahila wrote:

> 1. Thank you for incorporating review comments.
> Can you please rebase the latest 0001-Report-progress-of-
> CREATE-INDEX-operations.patch on master? Currently it does not apply on
> 754b90f657bd54b482524b73726dae4a9165031c

Hmm, rebased to current master.  Didn't conflict at all when rebasing,
so it's strange that it didn't apply.

> >   15:56:44.694 | building index (3 of 8): initializing (1/5)|   
> > 442478 |  442399 |0 |   0 |0 |  
> >  0
> >   15:56:44.705 | building index (3 of 8): sorting tuples, spool 1 (3/5) |   
> > 442478 |  442399 |1 |   0 |0 |  
> >  0
> >   15:56:44.716 | building index (3 of 8): sorting tuples, spool 1 (3/5) |   
> > 442478 |  442399 |1 |   0 |0 |  
> >  0
> >   15:56:44.727 | building index (3 of 8): final btree sort & load (5/5) |   
> > 442478 |  442399 |1 |   79057 |0 |  
> >  0
> >   15:56:44.738 | building index (3 of 8): final btree sort & load (5/5) |   
> > 442478 |  442399 |1 |  217018 |0 |  
> >  0
> >   15:56:44.75  | building index (3 of 8): final btree sort & load (5/5) |   
> > 442478 |  442399 |1 |  353804 |0 |  
> >  0
> 2. In the above report, even though we are reporting progress in terms of
> tuples_done for final btree sort & load phase we have not cleared
> the blocks_done entry from previous phases. I think this can be confusing as
> the blocks_done does not correspond to the tuples_done in the final btree
> sort & load phase.

Good point.  Done in the attached version, wherein I also added comments
to explain the IndexBuildHeapScan API change.  I didn't change the hash
AM implementation here.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 52770a9255950b4da74579d7da212246d29a268b Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 2 Jan 2019 16:14:39 -0300
Subject: [PATCH v5 1/2] Report progress of CREATE INDEX operations

---
 contrib/amcheck/verify_nbtree.c   |   2 +-
 contrib/bloom/blinsert.c  |   2 +-
 contrib/bloom/blutils.c   |   1 +
 doc/src/sgml/indexam.sgml |  13 ++
 doc/src/sgml/monitoring.sgml  | 227 +-
 src/backend/access/brin/brin.c|   5 +-
 src/backend/access/gin/gininsert.c|   2 +-
 src/backend/access/gin/ginutil.c  |   1 +
 src/backend/access/gist/gist.c|   1 +
 src/backend/access/gist/gistbuild.c   |   2 +-
 src/backend/access/hash/hash.c|   3 +-
 src/backend/access/nbtree/nbtree.c|   9 +
 src/backend/access/nbtree/nbtsort.c   |  57 ++-
 src/backend/access/nbtree/nbtutils.c  |  24 +++
 src/backend/access/spgist/spginsert.c |   2 +-
 src/backend/access/spgist/spgutils.c  |   1 +
 src/backend/catalog/index.c   | 147 -
 src/backend/catalog/system_views.sql  |  27 +++
 src/backend/commands/indexcmds.c  |  52 +-
 src/backend/storage/ipc/standby.c |   2 +-
 src/backend/storage/lmgr/lmgr.c   |  46 +-
 src/backend/storage/lmgr/lock.c   |   7 +-
 src/backend/utils/adt/amutils.c   |  23 +++
 src/backend/utils/adt/pgstatfuncs.c   |   2 +
 src/include/access/amapi.h|   4 +
 src/include/access/genam.h|   1 +
 src/include/access/nbtree.h   |  11 ++
 src/include/catalog/index.h   |   2 +
 src/include/catalog/pg_proc.dat   |  10 +-
 src/include/commands/progress.h   |  36 
 src/include/pgstat.h  |   5 +-
 src/include/storage/lmgr.h|   4 +-
 src/include/storage/lock.h|   2 +-
 33 files changed, 695 insertions(+), 38 deletions(-)

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index 964200a7678..99f6ed6bc44 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -534,7 +534,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool readonly,
 			 RelationGetRelationName(state->rel),
 			 RelationGetRelationName(state->heaprel));
 
-		IndexBuildHeapScan(state->heaprel, state->rel, indexinfo, true,
+		IndexBuildHeapScan(state->heaprel, state->rel, indexinfo, true, false,
 		   bt_tuple_present_callback, (void *) state, scan);
 
 		ereport(DEBUG1,
diff --git a/contrib/bloom/blinsert.c b/contrib/bloom/blinsert.c
index e43fbe0005f..947ee74881f 100644
--- a/contrib/bloom/blinsert.c
+++ b/contrib/bloom/blinsert.c
@@ -141,7 +141,7 @@ blbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	initCachedPage();
 
 	/* Do the heap scan */
-	reltuples = IndexBuildHeapScan(heap, index, indexInfo, true,
+	reltuples = IndexBuildHeapScan(heap, index,

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-04 Thread Rahila


Hi Alvaro,

Resending the email as earlier one didn't get sent on pgsql-hackers.

On 2/23/19 3:24 AM, Alvaro Herrera wrote:

On 2019-Feb-13, Amit Langote wrote:


Doesn't the name amphasename sound a bit too generic, given that it can
only describe the phases of ambuild?  Maybe ambuildphase?

Hmm, yeah, maybe it does.  I renamed it "ambuildphasename", since it's
not about reporting the phase itself -- it's about translating the phase
number to the string that's reported to the user.

The attached patch does it that way.  Also, when an index build uses an
AM that doesn't support progress reporting, it no longer reports a NULL
phase name while building.  I also changed it to report the progress of
phase 7 (heap scan validation) using block numbers rather than tuple
counts.  I also tweaked the strings reported in the view.  They're
clearer now IMO.

One slight annoyance is that when parallel workers are used, the last
block number reported in phase 3/subphase 2 (IndexBuildHeapScan stuff)
is not necessarily accurate, since the tail of the table could well be
scanned by a worker that's not the leader, and we only report in the
leader when it gets a new block.

When the AM does not support progress reporting, everything stays as
zeros during the index build phase.  It's easy to notice how slow hash
indexes are to build compared to btrees this way!  Maybe it'd be
better fallback on reporting block numbers in IndexBuildHeapScan when
this happens.  Thoughts?

I added docs to the monitoring section -- that's the bulkiest part of
the patch.


1. Thank you for incorporating review comments.
Can you please rebase the latest 0001-Report-progress-of-
CREATE-INDEX-operations.patch on master? Currently it does not apply on 
754b90f657bd54b482524b73726dae4a9165031c

  15:56:44.694 | building index (3 of 8): initializing (1/5)|   
442478 |  442399 |0 |   0 |0 |  
 0
  15:56:44.705 | building index (3 of 8): sorting tuples, spool 1 (3/5) |   
442478 |  442399 |1 |   0 |0 |  
 0
  15:56:44.716 | building index (3 of 8): sorting tuples, spool 1 (3/5) |   
442478 |  442399 |1 |   0 |0 |  
 0
  15:56:44.727 | building index (3 of 8): final btree sort & load (5/5) |   
442478 |  442399 |1 |   79057 |0 |  
 0
  15:56:44.738 | building index (3 of 8): final btree sort & load (5/5) |   
442478 |  442399 |1 |  217018 |0 |  
 0
  15:56:44.75  | building index (3 of 8): final btree sort & load (5/5) |   
442478 |  442399 |1 |  353804 |0 |  
 0
  
2. In the above report, even though we are reporting progress in terms 
of tuples_done for final btree sort & load phase we have not cleared
the blocks_done entry from previous phases. I think this can be 
confusing as the blocks_done does not correspond to the tuples_done in 
the final btree sort & load phase.


Thank you,
Rahila Syed

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-04 Thread Rahila Syed

Hi Alvaro,


> On 2019-Feb-13, Amit Langote wrote:
>
> > Doesn't the name amphasename sound a bit too generic, given that it can
> > only describe the phases of ambuild?  Maybe ambuildphase?
>
> Hmm, yeah, maybe it does.  I renamed it "ambuildphasename", since it's
> not about reporting the phase itself -- it's about translating the phase
> number to the string that's reported to the user.
>
> The attached patch does it that way.  Also, when an index build uses an
> AM that doesn't support progress reporting, it no longer reports a NULL
> phase name while building.  I also changed it to report the progress of
> phase 7 (heap scan validation) using block numbers rather than tuple
> counts.  I also tweaked the strings reported in the view.  They're
> clearer now IMO.
>
> One slight annoyance is that when parallel workers are used, the last
> block number reported in phase 3/subphase 2 (IndexBuildHeapScan stuff)
> is not necessarily accurate, since the tail of the table could well be
> scanned by a worker that's not the leader, and we only report in the
> leader when it gets a new block.
>
> When the AM does not support progress reporting, everything stays as
> zeros during the index build phase.  It's easy to notice how slow hash
> indexes are to build compared to btrees this way!  Maybe it'd be
> better fallback on reporting block numbers in IndexBuildHeapScan when
> this happens.  Thoughts?
>
> I added docs to the monitoring section -- that's the bulkiest part of
> the patch.
>

1. Thank you for incorporating review comments.
Can you please rebase the latest
0001-Report-progress-of-CREATE-INDEX-operations.patch on master? Currently
it does not apply on 754b90f657bd54b482524b73726dae4a9165031c


>  15:56:44.694 | building index (3 of 8): initializing (1/5)|
>  442478 |  442399 |0 |   0 |0
> |   0
>  15:56:44.705 | building index (3 of 8): sorting tuples, spool 1 (3/5) |
>  442478 |  442399 |1 |   0 |0
> |   0
>  15:56:44.716 | building index (3 of 8): sorting tuples, spool 1 (3/5) |
>  442478 |  442399 |1 |   0 |0
> |   0
>  15:56:44.727 | building index (3 of 8): final btree sort & load (5/5) |
>  442478 |  442399 |1 |   79057 |0
> |   0
>

2. In the above report, even though we are reporting progress in terms of
tuples_done for final btree sort & load phase we have not cleared
the blocks_done entry from previous phases. I think this can be confusing
as the blocks_done does not correspond to the tuples_done in the current
phase.


-- 
Rahila Syed
Performance Engineer
2ndQuadrant
http://www.2ndQuadrant.com 
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-02-22 Thread Alvaro Herrera

Hmm, looks like a very bare-bones support for hash indexes does not
require a lot of code, and gives a clear picture (you can sit all night
watching the numbers go up, instead of biting your fingernails wondering
if it'll be completed by dawn.)  This part isn't 100% done -- it we
would better to have ambuildphasename support.

(I'm a bit confused about phase 5 not reporting anything for hash
indexes in CIC, though.  That's part is supposed to be AM agnostic.)

I think it was a mistake to define the progress constants in one header
file commands/progress.h and the associated functions in pgstat.h.  I
think it would be better to move the function decls to
commands/progress.h.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index fc7db5d6a13..cf7ec655044 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -22,9 +22,11 @@
 #include "access/hash_xlog.h"
 #include "access/relscan.h"
 #include "catalog/index.h"
+#include "commands/progress.h"
 #include "commands/vacuum.h"
 #include "miscadmin.h"
 #include "optimizer/plancat.h"
+#include "pgstat.h"
 #include "utils/builtins.h"
 #include "utils/index_selfuncs.h"
 #include "utils/rel.h"
@@ -160,8 +162,10 @@ hashbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	buildstate.heapRel = heap;
 
 	/* do the heap scan */
-	reltuples = IndexBuildHeapScan(heap, index, indexInfo, true, false,
+	reltuples = IndexBuildHeapScan(heap, index, indexInfo, true, true,
    hashbuildCallback, (void *) , NULL);
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_TUPLES_TOTAL,
+ buildstate.indtuples);
 
 	if (buildstate.spool)
 	{
diff --git a/src/backend/access/hash/hashsort.c b/src/backend/access/hash/hashsort.c
index 8c55436b193..00a57470a77 100644
--- a/src/backend/access/hash/hashsort.c
+++ b/src/backend/access/hash/hashsort.c
@@ -26,7 +26,9 @@
 #include "postgres.h"
 
 #include "access/hash.h"
+#include "commands/progress.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "utils/tuplesort.h"
 
 
@@ -116,6 +118,7 @@ void
 _h_indexbuild(HSpool *hspool, Relation heapRel)
 {
 	IndexTuple	itup;
+	long		tups_done = 0;
 #ifdef USE_ASSERT_CHECKING
 	uint32		hashkey = 0;
 #endif
@@ -141,5 +144,8 @@ _h_indexbuild(HSpool *hspool, Relation heapRel)
 #endif
 
 		_hash_doinsert(hspool->index, itup, heapRel);
+
+		pgstat_progress_update_param(PROGRESS_CREATEIDX_TUPLES_DONE,
+	 ++tups_done);
 	}
 }

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-02-22 Thread Alvaro Herrera

On 2019-Feb-13, Amit Langote wrote:

> Doesn't the name amphasename sound a bit too generic, given that it can
> only describe the phases of ambuild?  Maybe ambuildphase?

Hmm, yeah, maybe it does.  I renamed it "ambuildphasename", since it's
not about reporting the phase itself -- it's about translating the phase
number to the string that's reported to the user.

The attached patch does it that way.  Also, when an index build uses an
AM that doesn't support progress reporting, it no longer reports a NULL
phase name while building.  I also changed it to report the progress of
phase 7 (heap scan validation) using block numbers rather than tuple
counts.  I also tweaked the strings reported in the view.  They're
clearer now IMO.

One slight annoyance is that when parallel workers are used, the last
block number reported in phase 3/subphase 2 (IndexBuildHeapScan stuff)
is not necessarily accurate, since the tail of the table could well be
scanned by a worker that's not the leader, and we only report in the
leader when it gets a new block.

When the AM does not support progress reporting, everything stays as
zeros during the index build phase.  It's easy to notice how slow hash
indexes are to build compared to btrees this way!  Maybe it'd be
better fallback on reporting block numbers in IndexBuildHeapScan when
this happens.  Thoughts?

I added docs to the monitoring section -- that's the bulkiest part of
the patch.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From e8eae6d3d5af45ecb45171745a4af374013baffe Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 2 Jan 2019 16:14:39 -0300
Subject: [PATCH] Report progress of CREATE INDEX operations

---
 contrib/amcheck/verify_nbtree.c   |   2 +-
 contrib/bloom/blinsert.c  |   2 +-
 contrib/bloom/blutils.c   |   1 +
 doc/src/sgml/indexam.sgml |  13 ++
 doc/src/sgml/monitoring.sgml  | 227 +-
 src/backend/access/brin/brin.c|   5 +-
 src/backend/access/gin/gininsert.c|   2 +-
 src/backend/access/gin/ginutil.c  |   1 +
 src/backend/access/gist/gist.c|   1 +
 src/backend/access/gist/gistbuild.c   |   2 +-
 src/backend/access/hash/hash.c|   3 +-
 src/backend/access/nbtree/nbtree.c|   9 +
 src/backend/access/nbtree/nbtsort.c   |  43 -
 src/backend/access/nbtree/nbtutils.c  |  24 +++
 src/backend/access/spgist/spginsert.c |   2 +-
 src/backend/access/spgist/spgutils.c  |   1 +
 src/backend/catalog/index.c   | 134 ++-
 src/backend/catalog/system_views.sql  |  27 +++
 src/backend/commands/indexcmds.c  |  52 +-
 src/backend/storage/ipc/standby.c |   2 +-
 src/backend/storage/lmgr/lmgr.c   |  46 +-
 src/backend/storage/lmgr/lock.c   |   7 +-
 src/backend/utils/adt/amutils.c   |  23 +++
 src/backend/utils/adt/pgstatfuncs.c   |   2 +
 src/include/access/amapi.h|   4 +
 src/include/access/genam.h|   1 +
 src/include/access/nbtree.h   |  11 ++
 src/include/catalog/index.h   |   2 +
 src/include/catalog/pg_proc.dat   |  10 +-
 src/include/commands/progress.h   |  36 
 src/include/pgstat.h  |   5 +-
 src/include/storage/lmgr.h|   4 +-
 src/include/storage/lock.h|   2 +-
 33 files changed, 668 insertions(+), 38 deletions(-)

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index 964200a7678..99f6ed6bc44 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -534,7 +534,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool readonly,
 			 RelationGetRelationName(state->rel),
 			 RelationGetRelationName(state->heaprel));
 
-		IndexBuildHeapScan(state->heaprel, state->rel, indexinfo, true,
+		IndexBuildHeapScan(state->heaprel, state->rel, indexinfo, true, false,
 		   bt_tuple_present_callback, (void *) state, scan);
 
 		ereport(DEBUG1,
diff --git a/contrib/bloom/blinsert.c b/contrib/bloom/blinsert.c
index e43fbe0005f..947ee74881f 100644
--- a/contrib/bloom/blinsert.c
+++ b/contrib/bloom/blinsert.c
@@ -141,7 +141,7 @@ blbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	initCachedPage();
 
 	/* Do the heap scan */
-	reltuples = IndexBuildHeapScan(heap, index, indexInfo, true,
+	reltuples = IndexBuildHeapScan(heap, index, indexInfo, true, false,
    bloomBuildCallback, (void *) ,
    NULL);
 
diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index 64583765787..697a37a384b 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -132,6 +132,7 @@ blhandler(PG_FUNCTION_ARGS)
 	amroutine->amcostestimate = blcostestimate;
 	amroutine->amoptions = bloptions;
 	amroutine->amproperty = NULL;
+	amroutine->ambuildphasename = NULL;
 	amroutine->amvalidate = blvalidate;
 	amroutine->ambeginscan = blbeginscan;
 	amroutine->amrescan =

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-02-12 Thread Tatsuro Yamada


On 2019/02/13 4:16, Alvaro Herrera wrote:

I added metrics for the validate_index phases; patch attached.  This is
still a bit raw, but it looks much better now.  Here's a sample
concurrent index build on a 100M tuple table.  There are no concurrent


Great!

+   s.pid AS pid, S.datid AS datid, D.datname AS datname,
+   S.relid AS relid,
+   CASE s.param2 WHEN 0 THEN 'initializing (phase 1 of 8)'
+ WHEN 1 THEN 'waiting for lockers 1 
(phase 2 of 8)'
+ WHEN 2 THEN 'building index (3 of 8): 
' ||
+   
pg_indexam_progress_phasename(s.param1::oid, s.param3)

It would be better to replace "s" with "S".

s/s.pid/S.pid/
s/s.param2/S.param2/
s/s.param1::oid, s.param3/S.param1::oid, S.param3/

These are not important comments but for readability. :)

Thanks,
Tatsuro Yamada

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-02-12 Thread Amit Langote

Hi Alvaro,

On 2019/02/12 12:18, Alvaro Herrera wrote:
> I ended up defining phases for the index_build phase in the AM itself;
> the code reports a phase number using the regular API, and then the
> pgstat_progress view obtains the name of each phase using a support
> method.

diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 05102724ead..189179a5667 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -127,6 +127,7 @@ typedef struct IndexAmRoutine
 amcostestimate_function amcostestimate;
 amoptions_function amoptions;
 amproperty_function amproperty; /* can be NULL */
+amphasename_function amphasename;   /* can be NULL */

Doesn't the name amphasename sound a bit too generic, given that it can
only describe the phases of ambuild?  Maybe ambuildphase?

Thanks,
Amit

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-02-12 Thread Amit Langote

On 2019/02/13 11:59, Simon Riggs wrote:
> On Wed, 13 Feb 2019 at 00:46, Alvaro Herrera 
> wrote:
> 
>> Here's a sample
>> concurrent index build on a 100M tuple table.
> 
> 
> Cool

+1

Looking at the "([phase] x of x)" in the sample output, I thought
pg_stat_progress_vacuum's output should've had it too (not a concern of
Alvaro's patch though.)

Thanks,
Amit

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-02-12 Thread Simon Riggs

On Wed, 13 Feb 2019 at 00:46, Alvaro Herrera 
wrote:

> Here's a sample
> concurrent index build on a 100M tuple table.


Cool


> There are no concurrent
> transactions, so phases that wait for lockers don't appear.


Would prefer to see them, so we know they are zero.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-02-11 Thread Alvaro Herrera

Hi Rahila, Pavan,

Thanks for the review.  I incorporated some fixes per your comments.
More fixes are needed still.  That said, the patch in attachment gives
good insight into how I think this should turn out.

> > index_build
> > ---

> OK.
> I think the main phases in which index_build for most AMs can be divided is
> as follows:
> [...]

I ended up defining phases for the index_build phase in the AM itself;
the code reports a phase number using the regular API, and then the
pgstat_progress view obtains the name of each phase using a support
method.

For btree, I ended up not reporting anything about the sort per se; we
just scan the heap (reporting block numbers, which is easy because we
know the heap size in advance) and count heap tuples while scanning;
once that's done, we know how many tuples we need to write out to the
index, so that total becomes the next stage's target total.  That's
simpler. (It is wrong for partial indexes currently, though.)

So for btree we have this:

/*
 *  btphasename() -- Return name of phase, for index build progress report
 */
char *
btphasename(int64 phasenum)
{
switch (phasenum)
{
case PROGRESS_CREATEIDX_SUBPHASE_INITIALIZE:
return "initializing (1/5)";
case PROGRESS_BTREE_PHASE_INDEXBUILD_HEAPSCAN:
return "table scan (2/5)";
case PROGRESS_BTREE_PHASE_PERFORMSORT_1:
return "sorting tuples, spool 1 (3/5)";
case PROGRESS_BTREE_PHASE_PERFORMSORT_2:
return "sorting tuples, spool 2 (4/5)";
case PROGRESS_BTREE_PHASE_LEAF_LOAD:
return "final btree sort & load (5/5)";
default:
return NULL;
}
}

Now this is a bit strange, because the report looks like this:

  phase | building index (3 of 8): initializing (1/5)
  [...]
  blocks total  | 442478
  blocks done   | 3267

So for phase 3, we always have phase and subphase counters in the phase
name.  However, I don't think there's any clean way to know from the
very beginning how many subphases are there going to be, and it would be
even more confusing to have the total "of N" number vary each time
depending on the access method.  So this leaves the phase counter going
from 1 to 8, and then for phase 3 you have a second part that goes from
1 to N.

Anyway, at some point it completes the heap scan, and the phase changes
to this:

  phase | building index (3 of 8): heap scan(2/5)

I think I should take Rahila's suggestion to report the number of
workers running (but I'm not sure I see the point in reporting number of
workers planned).

The heap scan quickly gives way to the next one,

  phase | building index (3 of 8): final btree sort & load (5/5)
  [...]
  tuples total  | 1
  tuples done   | 58103713

Finally,
  phase | validating index scan (phase 5 of 8)
and
  phase | validate index heapscan (phase 7 of 8)

and then it's over.

Now, it's clear that I haven't yet nailed all the progress updates
correctly, because there are some stalls where nothing seems to be
happening.  The final index_validate phases have been ignored so far.

> 1. In the above code, I think it will be useful if we report phases as
> 'Initializing phase 1 of n'
> 'Waiting for lockers phase 2 of n' etc. i.e reporting total number of
> phases as well.

Great idea, done.

> 2. IIUC, the above code in WaitForLockersMultiple can be written under
> if(progress) condition like rest of the progress checking code in that
> function
> and pass NULL for count otherwise.

Yep, good point.

> 3. Will it be useful to report pid's of the backend's
> for the transactions which CREATE INDEX concurrently is waiting for?
> I think it can be used to debug long running transactions.

Done.

> 4. Should we have additional statistics update phase before
> index_update_stats
> like PROGRESS_VACUUM_PHASE_FINAL_CLEANUP?

Not sure about this one ... it's supposed to be a fairly quick
operation.

> 5. I think it may be useful to report number of parallel workers requested
> and number of workers
> actually running index build in case of btree.

True, I haven't done this one.  I'll add it next.

> 6. Also, this can be reported as an additional validation phase for
> exclusion constraint
> in index_build function as it involves scanning all live tuples of heap
> relation.
> 
>  /*
>  * If it's for an exclusion constraint, make a second pass over the
> heap
>  * to verify that the constraint is satisfied.  We must not do this
> until
>  * the index is fully valid.  (Broken HOT chains shouldn't matter,
> though;
>  * see comments for IndexCheckExclusion.)
>  */
> if (indexInfo->ii_ExclusionOps != NULL)
> IndexCheckExclusion(heapRelation,

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-02-01 Thread Michael Paquier

Hi all,

Based on the latest emails exchanged, the patch got some feedback from
Pavan which has not been addressed.  So I am marking the patch as
returned with feedback.
--
Michael


signature.asc
Description: PGP signature

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-01-24 Thread Alvaro Herrera

On 2019-Jan-09, Pavan Deolasee wrote:

> Looks like it's missing the validate_index blocks-scanned report, which is
> not AM-specific and your original proposal does mention that.

True. That phase is actually 3 phases, which would be reported
separately:

  5. index_bulk_delete() scan
  6. performsort
  7. validate_index_heapscan

> I thought a bit about index_build part. If most AMs follow a somewhat
> standard phases while building an index, it might be simpler to define
> those phases and have AM agnostic code report those phases.

Well, think about btrees.  We first scan the whole table putting all
tuples in a spool (two spools actually), then we tell the spools to get
sorted, then we extract tuples from the spools, and finally we read the
spool to produce the leaf pages.  If we just report the table scan, the
reports gets to 100% complete in that phase and then waits for a very
long time during which nothing seems to happen.  That's not cool.

I'm adding a new AM support routine which turns the subphase number into
a textual description, so that we don't have to hardcode phase names in
the agnostic code.

> Can we have more descriptive text for waiters? Such as "waiting for
> existing writers", "waiting for intermediate writers" and "waiting for
> old readers".  Not sure if I got those correct, something of that sort
> will definitely give more insight into what the transaction is waiting
> for.

Can do.

> Can we actually also report the list of transactions the command is waiting
> on?
> That could be useful to the user if CIC appears to be stuck too long.

I'm afraid this is not possible, because the progress report interface
doesn't allow for something like that.

> IMHO we should just use full term INDEX instead of IDX, such as
> PROGRESS_CREATE_INDEX_PARTITIONS_TOTAL. It's already a long name, so couple
> of extra characters won't make a difference.

Really?  I though it was clear enough and it's *three* characters saved
even.

> I think we should clear the PROGRESS_WAITFOR_TOTAL and PROGRESS_WAITFOR_DONE
> when the wait phase is over, to avoid any confusion. For example, I noticed
> that the counters from WAIT_1 are reported as-is if WAIT_2 had no lockers.

Yes, absolutely.

> +GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *ocount)
> 
> Could that out variable be named something differently? "countp" or
> something like that. I did not check if there is some practice that we
> follow, but I remember suffixing with "p" rather than prefixing with
> "o" (for out I assume)

Sure.

> +/* Progress parameters for CREATE INDEX */
> +#define PROGRESS_CREATEIDX_PHASE 0
> +/* 1 and 2 reserved for "waitfor" metrics */
> +#define PROGRESS_CREATEIDX_PARTITIONS_TOTAL 3
> +#define PROGRESS_CREATEIDX_PARTITIONS_DONE 4
> +
> 
> Is there a reason to leave those reserve placeholders, only to fill them a
> few lines down?

Yes -- those are going to be used by other reports that also use similar
metrics, such as the CLUSTER report.

I'm running out of columns to put the numbers into :-(  Right now I have

1. phase
2. subphase (for index_build)
3. lockers total (to wait for)
4. lockers done
5. blocks total
6. blocks done
7. tapes total
8. tapes done
9. partitions total
10. partitions done

The "tapes total/done" bit is about reporting the performsort steps; I'm
not really sure yet it'll be tapes, but I hope it can be done with two
numbers.  Anyway, in btree build I have these subphases

1. spool heapscan using IndexBuildHeapScan
2. _bt_parallel_heapscan stuff  (only one of these in a build)
3. bt_leafbuild, performsort spool 1
4. bt_leafbuild, performsort spool 2
5. bt_load

and I don't have columns to put the metrics for the btload thing,
assuming I figure out what those are.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-01-18 Thread Rahila Syed

Hi Alvaro,

The WIP patch needs a rebase. Please see few in-line comments.

On Fri, Dec 21, 2018 at 3:30 AM Alvaro Herrera 
wrote:

> Monitoring progress of CREATE INDEX [CONCURRENTLY] is sure to be welcome,
> so here's a proposal.
>
> There are three distinct interesting cases.  One is straight CREATE
> INDEX of a standalone table; then we have CREATE INDEX CONCURRENTLY;
> finally, CREATE INDEX on a partitioned table.  Note that there's no
> CONCURRENTLY for partitioned tables.
>
> A non-concurrent build is a very straightforward: we call create_index,
> which does index_build, done.  See below for how to report for
> index_build, which is the interesting part.  I propose not to report
> anything else than that for non-concurrent build.  There's some
> preparatory work that's identical than for CIC (see below).  Like
> VACUUM, it seems a bit pointless to report an initial phase that's
> almost immediate, so I propose we just don't report anything until the
> actual index building starts.


 Aren't we reporting this initial preparatory work in the form of
'initializing' phase that you
have in current patch? IIUC, there are no metrics but the name of the phase.

CREATE INDEX CONCURRENTLY does these things first, which we would not
> report (this is just like VACUUM, which only starts reporting once it
> starts scanning blocks):
>
> a. lock rel.  No metrics to report.
> b. other prep; includes lots of catalog access.  Unlikely to lock, but
>not impossible.  No metrics to report.
> c. create_index.  CIC skips index_build here, so there's no reason to
>report it either.
>
> We would start reporting at this point, with these phases:
>
> 1. WaitForLockers 1.  Report how many xacts do we need to wait for, how
>many are done.
> 2. index_build.  See below.
> 3. WaitForLockers 2.  Report how many xacts do we need to wait for, how
>many are done.
> 4. validate_index.  Scans the whole rel again.  Report number of blocks
>scanned.
> 5. wait for virtual XIDs.  Like WaitForLockers: report how many xacts we
>need to wait for, how many are done.
>
> We're done.
>
> (Alternatively, we could have an initial "prep" phase for a/b/c for the
> concurrent case and a/b for non-concurrent.  I'm just not sure it's
> useful.)



> index_build
> ---
>
> The actual index building is an AM-specific undertaking, and we report
> its progress separately from the AM-agnostic code.  That is, each AM has
> freedom to define its own list of phases and counters, separate from the
> generic code.  This avoids the need to provide a new AM method or invoke
> callbacks.  So when you see that CREATE_INDEX_PHASE is either "index
> build" you'll have a separate BTREE_CREATE_PHASE value set to either
> "scanning heap" or "sorting" or "building upper layers"; equivalently
> for other AMs.
>
> OK.
I think the main phases in which index_build for most AMs can be divided is
as follows:
1. Scanning heap tuples for building index which is common for all AMs
2. Forming index tuples which is AM-specific
3. Writing tuples into the index which is AM-specific.
Out of these, metrics for phase 1 can be heap_tuples_scanned /
total_heap_tuples and
for phase 3, it can be index_tuples_computed/ total_index_tuples.
I am not sure about metrics for phase 2 especially for Btree which involves
reporting progress of sorting.

Partitioned indexes
> ---
>
> For partitioned indexes, we only have the index build phase, but we
> repeat it for each partition.  In addition to the index_build metrics
> described above, we should report how many partitions we need to handle
> in total and how many partitions are already done.  (I'm avoiding
> getting in the trouble of reporting *which* partition we're currently
> handling and have already handled.)
>
> Thoughts?
>
> +CREATE VIEW pg_stat_progress_create_index AS
+ SELECT
+ s.pid AS pid, S.datid AS datid, D.datname AS datname,
+ S.relid AS relid,
+ CASE s.param1 WHEN 0 THEN 'initializing'
+   WHEN 1 THEN 'waiting for lockers 1'
+   WHEN 2 THEN 'building index'
+   WHEN 3 THEN 'waiting for lockers 2'
+   WHEN 4 THEN 'validating index'
+   WHEN 5 THEN 'waiting for lockers 3'
+   END as phase,
+ S.param2 AS procs_to_wait_for,
+ S.param3 AS procs_waited_for,
+ S.param4 AS partitions_to_build,
+ S.param5 AS partitions_built
+ FROM pg_stat_get_progress_info('CREATE INDEX') AS S
+ LEFT JOIN pg_database D ON S.datid = D.oid;
+

1. In the above code, I think it will be useful if we report phases as
'Initializing phase 1 of n'
'Waiting for lockers phase 2 of n' etc. i.e reporting total number of
phases as well.

+   holders = lappend(holders,
+ GetLockConflicts(locktag,
lockmode, ));
+   total += count;
2. IIUC, the above code in WaitForLockersMultiple can be written under
if(progress) condition like rest of the progress checking code in that
function
and pass NULL for count otherwise.

3. Will it be useful to report pid's of

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-01-08 Thread Pavan Deolasee

On Tue, Jan 1, 2019 at 6:09 AM Alvaro Herrera 
wrote:

> For discussion, here's an preliminary patch.  This is just a first
> skeleton; needs to grow a lot of flesh yet, per my previous proposal.
> As far as the generic CREATE INDEX stuff goes, I think this is complete;
> it's missing the AM-specific bits.
>

Looks like it's missing the validate_index blocks-scanned report, which is
not AM-specific and your original proposal does mention that.

I thought a bit about index_build part. If most AMs follow a somewhat
standard phases while building an index, it might be simpler to define
those phases and have AM agnostic code report those phases. Or may be just
report the most significant information, instead of reporting each
sub-phase of index_build. I think the most important progress to know would
be how far the heap is scanned for to-be-indexed tuples. AFAICS all AMs
use IndexBuildHeapScan() to scan the heap. Can we simply do some reporting
from that routine? Like number of blocks scanned against the total number
of blocks requested?

Some minor comments on the patch, though I suspect you might be already
updating the patch since you marked it as WIP.

+CREATE VIEW pg_stat_progress_create_index AS
+ SELECT
+ s.pid AS pid, S.datid AS datid, D.datname AS datname,
+ S.relid AS relid,
+ CASE s.param1 WHEN 0 THEN 'initializing'
+   WHEN 1 THEN 'waiting for lockers 1'
+   WHEN 2 THEN 'building index'
+   WHEN 3 THEN 'waiting for lockers 2'
+   WHEN 4 THEN 'validating index'
+   WHEN 5 THEN 'waiting for lockers 3'

Can we have more descriptive text for waiters? Such as "waiting for existing
writers", "waiting for intermediate writers" and "waiting for old readers".
Not
sure if I got those correct, something of that sort will definitely give
more
insight into what the transaction is waiting for.

Can we actually also report the list of transactions the command is waiting
on?
That could be useful to the user if CIC appears to be stuck too long.

+ pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+ nparts);
+

IMHO we should just use full term INDEX instead of IDX, such as
PROGRESS_CREATE_INDEX_PARTITIONS_TOTAL. It's already a long name, so couple
of
extra characters won't make a difference. I did not see much precedence to
shortern to IDX for INDEX elsewhere in the code (though we tend to do that
for
variable names etc).

@@ -1282,6 +1318,9 @@ DefineIndex(Oid relationId,
  old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
_old_snapshots);
+ pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+ PROGRESS_CREATEIDX_PHASE_WAIT_3);
+ pgstat_progress_update_param(PROGRESS_WAITFOR_TOTAL, n_old_snapshots);

I think we should clear the PROGRESS_WAITFOR_TOTAL and PROGRESS_WAITFOR_DONE
when the wait phase is over, to avoid any confusion. For example, I noticed
that the counters from WAIT_1 are reported as-is if WAIT_2 had no lockers.

Shouldn't PROGRESS_WAITFOR_DONE be updated when we skip a snapshot in the
code below?
if (!VirtualTransactionIdIsValid(old_snapshots[i]))
continue;   /* found uninteresting in previous cycle */

@@ -2817,7 +2818,7 @@ FastPathGetRelationLockEntry(LOCALLOCK *locallock)
  * uses of the result.
  */
 VirtualTransactionId *
-GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode)
+GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *ocount)

Could that out variable be named something differently? "countp" or
something
like that. I did not check if there is some practice that we follow, but I
remember suffixing with "p" rather than prefixing with "o" (for out I
assume)

+
+/* Progress parameters for CREATE INDEX */
+#define PROGRESS_CREATEIDX_PHASE 0
+/* 1 and 2 reserved for "waitfor" metrics */
+#define PROGRESS_CREATEIDX_PARTITIONS_TOTAL 3
+#define PROGRESS_CREATEIDX_PARTITIONS_DONE 4
+

Is there a reason to leave those reserve placeholders, only to fill them a
few
lines down?

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: monitoring CREATE INDEX [CONCURRENTLY]

2018-12-31 Thread Alvaro Herrera

For discussion, here's an preliminary patch.  This is just a first
skeleton; needs to grow a lot of flesh yet, per my previous proposal.
As far as the generic CREATE INDEX stuff goes, I think this is complete;
it's missing the AM-specific bits.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index c2ad944e04..f4cb28c6d6 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1590,7 +1590,7 @@ index_drop(Oid indexId, bool concurrent)
 		 * to acquire an exclusive lock on our table.  The lock code will
 		 * detect deadlock and error out properly.
 		 */
-		WaitForLockers(heaplocktag, AccessExclusiveLock);
+		WaitForLockers(heaplocktag, AccessExclusiveLock, true);
 
 		/*
 		 * No more predicate locks will be acquired on this index, and we're
@@ -1634,7 +1634,7 @@ index_drop(Oid indexId, bool concurrent)
 		 * Wait till every transaction that saw the old index state has
 		 * finished.
 		 */
-		WaitForLockers(heaplocktag, AccessExclusiveLock);
+		WaitForLockers(heaplocktag, AccessExclusiveLock, true);
 
 		/*
 		 * Re-open relations to allow us to complete our actions.
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5253837b54..921a84eb45 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -904,6 +904,24 @@ CREATE VIEW pg_stat_progress_vacuum AS
 FROM pg_stat_get_progress_info('VACUUM') AS S
 		LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_create_index AS
+	SELECT
+		s.pid AS pid, S.datid AS datid, D.datname AS datname,
+		S.relid AS relid,
+		CASE s.param1 WHEN 0 THEN 'initializing'
+	  WHEN 1 THEN 'waiting for lockers 1'
+	  WHEN 2 THEN 'building index'
+	  WHEN 3 THEN 'waiting for lockers 2'
+	  WHEN 4 THEN 'validating index'
+	  WHEN 5 THEN 'waiting for lockers 3'
+	  END as phase,
+		S.param2 AS procs_to_wait_for,
+		S.param3 AS procs_waited_for,
+		S.param4 AS partitions_to_build,
+		S.param5 AS partitions_built
+	FROM pg_stat_get_progress_info('CREATE INDEX') AS S
+		LEFT JOIN pg_database D ON S.datid = D.oid;
+
 CREATE VIEW pg_user_mappings AS
 SELECT
 U.oid   AS umid,
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 816c73a36a..03321c2dc4 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -35,6 +35,7 @@
 #include "commands/dbcommands.h"
 #include "commands/defrem.h"
 #include "commands/event_trigger.h"
+#include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/tablespace.h"
 #include "mb/pg_wchar.h"
@@ -47,6 +48,7 @@
 #include "parser/parse_coerce.h"
 #include "parser/parse_func.h"
 #include "parser/parse_oper.h"
+#include "pgstat.h"
 #include "rewrite/rewriteManip.h"
 #include "storage/lmgr.h"
 #include "storage/proc.h"
@@ -370,6 +372,15 @@ DefineIndex(Oid relationId,
 	Snapshot	snapshot;
 	int			i;
 
+
+	/*
+	 * Start progress report.  If we're building a partition, this was already
+	 * done.
+	 */
+	if (!OidIsValid(parentIndexId))
+		pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX,
+	  relationId);
+
 	/*
 	 * count key attributes in index
 	 */
@@ -866,6 +877,11 @@ DefineIndex(Oid relationId,
 	if (!OidIsValid(indexRelationId))
 	{
 		heap_close(rel, NoLock);
+
+		/* If this is the top-level index, we're done */
+		if (!OidIsValid(parentIndexId))
+			pgstat_progress_end_command();
+
 		return address;
 	}
 
@@ -891,6 +907,9 @@ DefineIndex(Oid relationId,
 			TupleDesc	parentDesc;
 			Oid		   *opfamOids;
 
+			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+		 nparts);
+
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
 
 			parentDesc = CreateTupleDescCopy(RelationGetDescr(rel));
@@ -1040,6 +1059,8 @@ DefineIndex(Oid relationId,
 skip_build, quiet);
 }
 
+pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
+			 i + 1);
 pfree(attmap);
 			}
 
@@ -1074,6 +1095,8 @@ DefineIndex(Oid relationId,
 		 * Indexes on partitioned tables are not themselves built, so we're
 		 * done here.
 		 */
+		if (!OidIsValid(parentIndexId))
+			pgstat_progress_end_command();
 		return address;
 	}
 
@@ -1081,6 +1104,11 @@ DefineIndex(Oid relationId,
 	{
 		/* Close the heap and we're done, in the non-concurrent case */
 		heap_close(rel, NoLock);
+
+		/* If this is the top-level index, we're done. */
+		if (!OidIsValid(parentIndexId))
+			pgstat_progress_end_command();
+
 		return address;
 	}
 
@@ -1132,7 +1160,9 @@ DefineIndex(Oid relationId,
 	 * exclusive lock on our table.  The lock code will detect deadlock and
 	 * error out properly.
 	 */
-	WaitForLockers(heaplocktag, ShareLock);
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+

monitoring CREATE INDEX [CONCURRENTLY]

2018-12-20 Thread Alvaro Herrera

Monitoring progress of CREATE INDEX [CONCURRENTLY] is sure to be welcome,
so here's a proposal.

There are three distinct interesting cases.  One is straight CREATE
INDEX of a standalone table; then we have CREATE INDEX CONCURRENTLY;
finally, CREATE INDEX on a partitioned table.  Note that there's no
CONCURRENTLY for partitioned tables.

A non-concurrent build is a very straightforward: we call create_index,
which does index_build, done.  See below for how to report for
index_build, which is the interesting part.  I propose not to report
anything else than that for non-concurrent build.  There's some
preparatory work that's identical than for CIC (see below).  Like
VACUUM, it seems a bit pointless to report an initial phase that's
almost immediate, so I propose we just don't report anything until the
actual index building starts.

CREATE INDEX CONCURRENTLY does these things first, which we would not
report (this is just like VACUUM, which only starts reporting once it
starts scanning blocks):

a. lock rel.  No metrics to report.
b. other prep; includes lots of catalog access.  Unlikely to lock, but
   not impossible.  No metrics to report.
c. create_index.  CIC skips index_build here, so there's no reason to
   report it either.

We would start reporting at this point, with these phases:

1. WaitForLockers 1.  Report how many xacts do we need to wait for, how
   many are done.
2. index_build.  See below.
3. WaitForLockers 2.  Report how many xacts do we need to wait for, how
   many are done.
4. validate_index.  Scans the whole rel again.  Report number of blocks
   scanned.
5. wait for virtual XIDs.  Like WaitForLockers: report how many xacts we
   need to wait for, how many are done.

We're done.

(Alternatively, we could have an initial "prep" phase for a/b/c for the
concurrent case and a/b for non-concurrent.  I'm just not sure it's
useful.)

index_build
---

The actual index building is an AM-specific undertaking, and we report
its progress separately from the AM-agnostic code.  That is, each AM has
freedom to define its own list of phases and counters, separate from the
generic code.  This avoids the need to provide a new AM method or invoke
callbacks.  So when you see that CREATE_INDEX_PHASE is either "index
build" you'll have a separate BTREE_CREATE_PHASE value set to either
"scanning heap" or "sorting" or "building upper layers"; equivalently
for other AMs.

Partitioned indexes
---

For partitioned indexes, we only have the index build phase, but we
repeat it for each partition.  In addition to the index_build metrics
described above, we should report how many partitions we need to handle
in total and how many partitions are already done.  (I'm avoiding
getting in the trouble of reporting *which* partition we're currently
handling and have already handled.)

Thoughts?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

58 matches

Mail list logo