Re: Progress report of CREATE INDEX for nested partitioned tables

2023-03-26 Thread Justin Pryzby
On Sat, Mar 25, 2023 at 03:43:32PM -0400, Tom Lane wrote:
> I pushed 0001 with some cosmetic changes (for instance, trying to
> make the style of the doc entries for partitions_total/partitions_done
> match the rest of their table).

Thanks.

> I'm not touching 0002 or 0003, because I think they're fundamentally
> a bad idea.  Progress reporting is inherently inexact, because it's

Nobody could disagree that it's inexact.  The assertions are for minimal
sanity tests and consistency.  Like if "total" is set multiple times (as
in this patch), or if a progress value goes backwards.  Anyway the
assertions exposed two other issues that would need to be fixed before
the assertions themselves could be proposed.

-- 
Justin




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-03-25 Thread Tom Lane
I pushed 0001 with some cosmetic changes (for instance, trying to
make the style of the doc entries for partitions_total/partitions_done
match the rest of their table).

I'm not touching 0002 or 0003, because I think they're fundamentally
a bad idea.  Progress reporting is inherently inexact, because it's
so hard to predict the amount of work to be done in advance -- have
you ever seen a system anywhere whose progress bars reliably advance
at a uniform rate?  I think adding assertions that the estimates are
error-free is just going to cause headaches.  As an example, I added
a comment pointing out that the current fix won't crash and burn if
the caller failed to lock all the child tables in advance: the
find_all_inheritors call should be safe anyway, so the worst consequence
would be an imprecise partitions_total estimate.  But that argument
falls down if we're going to add assertions that partitions_total
isn't in error.

regards, tom lane




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-03-25 Thread Tom Lane
Justin Pryzby  writes:
> On Sat, Mar 25, 2023 at 11:55:13AM -0400, Tom Lane wrote:
>> That's why I wanted list_length() not list_length() - 1.  We are
>> doing *something* at the top partitioned table, it just doesn't
>> involve a table scan, so I don't find this totally unreasonable.
>> If you agree we are doing work at intermediate partitioned tables,
>> how are we not doing work at the top one?

> What you're proposing would redefine the meaning of
> PARTITIONS_DONE/TOTAL, even in the absence of intermediate partitioned
> tables.  Which might be okay, but the scope of this thread/patch was to
> fix the behavior involving intermediate partitioned tables.

I'm a little skeptical of that argument, because this patch is already
redefining the meaning of PARTITIONS_TOTAL.  The fact that the existing
documentation is vague enough to be read either way doesn't make it not
a change.

Still, in the interests of getting something done I'll drop the issue.

regards, tom lane




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-03-25 Thread Justin Pryzby
On Sat, Mar 25, 2023 at 11:55:13AM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > On Thu, Mar 23, 2023 at 04:35:46PM -0400, Tom Lane wrote:
> >> Furthermore, as things stand it's not hard
> >> for PARTITIONS_TOTAL to be zero --- there's at least one such case
> >> in the regression tests --- and that seems just weird to me.
> 
> > I don't know why it'd seem weird.  postgres doesn't create partitions
> > automatically, so by default there are none.  If we create a table but
> > never load any data, it'll have no partitions.
> 
> My problem with it is that it's not clear how to tell "no partitioned
> index creation in progress" from "partitioned index creation in progress,
> but total = 0".  Maybe there's some out-of-band way to tell that in the
> stats reporting system, but still it's a weird corner case.
> 
> > Also, the TOTAL=0 case
> > won't go away just because we start counting intermediate partitions.
> 
> That's why I wanted list_length() not list_length() - 1.  We are
> doing *something* at the top partitioned table, it just doesn't
> involve a table scan, so I don't find this totally unreasonable.
> If you agree we are doing work at intermediate partitioned tables,
> how are we not doing work at the top one?

What you're proposing would redefine the meaning of
PARTITIONS_DONE/TOTAL, even in the absence of intermediate partitioned
tables.  Which might be okay, but the scope of this thread/patch was to
fix the behavior involving intermediate partitioned tables.

It's somewhat weird to me that find_all_inheritors(rel) returns the rel
itself.  But it's an internal function, and evidently that's what's
needed/desirable to do, so that's fine.

However, "PARTITIONS_TOTAL" has a certain user-facing definition, and
"Number of partitions" is easier to explain than "Number of partitions
plus the rel itself", and IMO an easier definition is a better one.

Your complaint seems similar to something I've said a few times before:
it's weird to expose macroscopic progress reporting of partitioned
tables in the same view and in the same *row* as microscopic progress of
its partitions.  But changing that is a job for another patch.  I won't
be opposed to it if someone were to propose a patch to remove
partitions_{done,total}.  See also:
https://www.postgresql.org/message-id/flat/YCy5ZMt8xAyoOMmv%40paquier.xyz#b20d1be226a93dacd3fd40b402315105

-- 
Justin




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-03-25 Thread Tom Lane
Justin Pryzby  writes:
> On Thu, Mar 23, 2023 at 04:35:46PM -0400, Tom Lane wrote:
>> Furthermore, as things stand it's not hard
>> for PARTITIONS_TOTAL to be zero --- there's at least one such case
>> in the regression tests --- and that seems just weird to me.

> I don't know why it'd seem weird.  postgres doesn't create partitions
> automatically, so by default there are none.  If we create a table but
> never load any data, it'll have no partitions.

My problem with it is that it's not clear how to tell "no partitioned
index creation in progress" from "partitioned index creation in progress,
but total = 0".  Maybe there's some out-of-band way to tell that in the
stats reporting system, but still it's a weird corner case.

> Also, the TOTAL=0 case
> won't go away just because we start counting intermediate partitions.

That's why I wanted list_length() not list_length() - 1.  We are
doing *something* at the top partitioned table, it just doesn't
involve a table scan, so I don't find this totally unreasonable.
If you agree we are doing work at intermediate partitioned tables,
how are we not doing work at the top one?

regards, tom lane




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-03-24 Thread Justin Pryzby
On Thu, Mar 23, 2023 at 04:35:46PM -0400, Tom Lane wrote:
> So I'm still pretty desperately unhappy with count_leaf_partitions.
> I don't like expending significant cost purely for progress tracking
> purposes, I don't like the undocumented assumption that NoLock is
> safe, and what's more, if it is safe then we've already traversed
> the inheritance tree to lock everything so in principle we could
> have the count already.  However, it does seem like getting that
> knowledge from point A to point B would be a mess in most places.
> 
> One thing we could do to reduce the cost (and improve the safety)
> is to forget the idea of checking the relkinds and just set the
> PARTITIONS_TOTAL count to list_length() of the find_all_inheritors
> result.

Actually list_length() minus 1 ...

> We've already agreed that it's okay if the PARTITIONS_DONE
> count never reaches PARTITIONS_TOTAL, so this would just be taking
> that idea further.  (Or we could increment PARTITIONS_DONE for
> non-leaf partitions when we visit them, thus making that TOTAL
> more nearly correct.)

Yes, I think that's actually more correct.  If TOTAL is set without
regard to relkind, then DONE ought to be set the same way.

I updated the documentation to indicate that the counters include the
intermediate partitioned rels, but I wonder if it's better to say
nothing and leave that undefined.

> Furthermore, as things stand it's not hard
> for PARTITIONS_TOTAL to be zero --- there's at least one such case
> in the regression tests --- and that seems just weird to me.

I don't know why it'd seem weird.  postgres doesn't create partitions
automatically, so by default there are none.  If we create a table but
never load any data, it'll have no partitions.  Also, the TOTAL=0 case
won't go away just because we start counting intermediate partitions.

> By the by, this is awful code:
> 
> + if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))
> 
> Consult the definition of RELKIND_HAS_STORAGE to see why.
> But I want to get rid of that rather than fixing it.

Good point, but I'd burden-shift the blame to RELKIND_HAS_STORAGE().

BTW, I promoted myself to a co-author of the patch.  My interest here is
to resolve this hoping to allow the CIC patch to progress.

-- 
Justin
>From b48837a934ef1381d26ef62bbaf66f5e52b0cd6d Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev 
Date: Tue, 31 Jan 2023 19:13:07 +0400
Subject: [PATCH 1/3] fix CREATE INDEX progress report with nested partitions

The progress reporting was added in v12 (ab0dfc961) but the original
patch didn't seem to consider the possibility of nested partitioning.

When called recursively, DefineIndex() would clobber the number of
completed partitions, and it was possible to end up with the TOTAL
counter greater than the DONE counter.

This clarifies/re-defines that the progress reporting counts both direct
and indirect children, as well as intermediate partitioned tables:

- The TOTAL counter is set once at the start of the command.
- For indexes which are newly-built, the recursively-called
DefineIndex() increments the DONE counter.
- For pre-existing indexes which are ATTACHed rather than built,
DefineIndex() increments the DONE counter, unless the attached index is
partitioned, in which case progress report is not updated.

Author: Ilya Gladyshev, Justin Pryzby
Reviewed-By: Tomas Vondra, Dean Rasheed, Alvaro Herrera, Matthias van de Meent, Tom Lane
Discussion: https://www.postgresql.org/message-id/flat/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel%40gmail.com
---
 doc/src/sgml/monitoring.sgml  | 10 +++-
 src/backend/bootstrap/bootparse.y |  2 +
 src/backend/commands/indexcmds.c  | 60 +--
 src/backend/commands/tablecmds.c  |  4 +-
 src/backend/tcop/utility.c|  6 +-
 src/backend/utils/activity/backend_progress.c | 28 +
 src/include/commands/defrem.h |  1 +
 src/include/utils/backend_progress.h  |  1 +
 8 files changed, 103 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 7ab4424bf13..cf88dca53f3 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6929,7 +6929,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
   
   
When creating an index on a partitioned table, this column is set to
-   the total number of partitions on which the index is to be created.
+   the total number of partitions on which the index is to be created or attached.
+   In the case of intermediate partitioned tables, this includes both
+   direct and indirect partitions, and includes the intermediate
+   partitioned tables themselves.
This field is 0 during a REINDEX.
   
  
@@ -6940,7 +6943,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
   
   
When creating an index on a partitioned table, this column is set to
-   the number of 

Re: Progress report of CREATE INDEX for nested partitioned tables

2023-03-23 Thread Tom Lane
So I'm still pretty desperately unhappy with count_leaf_partitions.
I don't like expending significant cost purely for progress tracking
purposes, I don't like the undocumented assumption that NoLock is
safe, and what's more, if it is safe then we've already traversed
the inheritance tree to lock everything so in principle we could
have the count already.  However, it does seem like getting that
knowledge from point A to point B would be a mess in most places.

One thing we could do to reduce the cost (and improve the safety)
is to forget the idea of checking the relkinds and just set the
PARTITIONS_TOTAL count to list_length() of the find_all_inheritors
result.  We've already agreed that it's okay if the PARTITIONS_DONE
count never reaches PARTITIONS_TOTAL, so this would just be taking
that idea further.  (Or we could increment PARTITIONS_DONE for
non-leaf partitions when we visit them, thus making that TOTAL
more nearly correct.)  Furthermore, as things stand it's not hard
for PARTITIONS_TOTAL to be zero --- there's at least one such case
in the regression tests --- and that seems just weird to me.

By the by, this is awful code:

+   if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))

Consult the definition of RELKIND_HAS_STORAGE to see why.
But I want to get rid of that rather than fixing it.

regards, tom lane




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-03-21 Thread Justin Pryzby
On Thu, Mar 16, 2023 at 07:04:16PM +0400, Ilya Gladyshev wrote:
> > 16 марта 2023 г., в 04:07, Justin Pryzby  написал(а):
> > 
> > On Tue, Mar 14, 2023 at 06:58:14PM +0400, Ilya Gladyshev wrote:
> >>> The only change from the current patch is (3).  (1) still calls
> >>> count_leaf_partitions(), but only once.  I'd prefer that to rearranging
> >>> the progress reporting to set the TOTAL in ProcessUtilitySlow().
> >> 
> >> As for reusing TOTAL calculated outside of DefineIndex, as I can see, 
> >> ProcessUtilitySlow is not the only call site for DefineIndex (although, I 
> >> don’t know whether all of them need progress tracking), for instance, 
> >> there is ALTER TABLE that calls DefineIndex to create index for 
> >> constraints. So I feel like rearranging progress reporting will result in 
> >> unnecessary code duplication in those call sites, so passing in an 
> >> optional parameter seems to be easier here, if we are going to optimize 
> >> it, after all. Especially if back-patching is a non-issue.
> > 
> > Yeah.  See attached.  I don't like duplicating the loop.  Is this really
> > the right direction to go ?
> > 
> > I haven't verified if the child tables are locked in all the paths which
> > would call count_leaf_partitions().  But why is it important to lock
> > them for this?  If they weren't locked before, that'd be a pre-existing
> > problem...
> > <0001-fix-CREATE-INDEX-progress-report-with-nested-partiti.patch>
> 
> I’m not sure what the general policy on locking is, but I have checked ALTER 
> TABLE ADD INDEX, and the all the partitions seem to be locked on the first 
> entry to DefineIndex there. All other call sites pass in the parentIndexId, 
> which means the progress tracking machinery will not be initialized, so I 
> think, we don’t need to do locking in count_leaf_partitions(). 

> The approach in the patch looks good to me. Some nitpicks on the patch: 
> 1. There’s an unnecessary second call to get_rel_relkind in 
> ProcessUtilitySlow, we can just use what’s in the variable relkind.
> 2. We can also combine else and if to have one less nested level like that:
> + else if (!RELKIND_HAS_PARTITIONS(child_relkind))
> 
> 3. There was a part of the comment saying "If the index was built by calling 
> DefineIndex() recursively, the called function is responsible for updating 
> the progress report for built indexes.", I think it is still useful to have 
> it there.

Thanks, I addressed (1) and (3).  (2) is deliberate, to allow a place to
put the comment which is not specific to the !HAS_PARTITIONS case.

-- 
Justin
>From 90300bc4ca08088de3a0015ff5c1a251537feacc Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev 
Date: Tue, 31 Jan 2023 19:13:07 +0400
Subject: [PATCH 1/3] fix CREATE INDEX progress report with nested partitions

The progress reporting was added in v12 (ab0dfc961) but the original
patch didn't seem to consider the possibility of nested partitioning.

When called recursively, DefineIndex() would clobber the number of
completed partitions, and it was possible to end up with the TOTAL
counter greater than the DONE counter.

This clarifies/re-defines that the progress reporting counts both direct
and indirect children, but doesn't count intermediate partitioned tables:

- The TOTAL counter is set once at the start of the command.
- For indexes which are newly-built, the recursively-called
DefineIndex() increments the DONE counter.
- For pre-existing indexes which are ATTACHed rather than built,
DefineIndex() increments the DONE counter, unless the attached index is
partitioned, in which case progress report is not updated.

Author: Ilya Gladyshev
Reviewed-By: Justin Pryzby, Tomas Vondra, Dean Rasheed, Alvaro Herrera, Matthias van de Meent
Discussion: https://www.postgresql.org/message-id/flat/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel%40gmail.com
---
 doc/src/sgml/monitoring.sgml  | 10 ++-
 src/backend/bootstrap/bootparse.y |  2 +
 src/backend/commands/indexcmds.c  | 74 +--
 src/backend/commands/tablecmds.c  |  4 +-
 src/backend/tcop/utility.c|  8 +-
 src/backend/utils/activity/backend_progress.c | 28 +++
 src/include/commands/defrem.h |  1 +
 src/include/utils/backend_progress.h  |  1 +
 8 files changed, 119 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 6249bb50d02..3f891b75541 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6901,7 +6901,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
   
   
When creating an index on a partitioned table, this column is set to
-   the total number of partitions on which the index is to be created.
+   the total number of partitions on which the index is to be created or attached.
+   In the case of intermediate partitioned tables, this includes both
+   direct and indirect 

Re: Progress report of CREATE INDEX for nested partitioned tables

2023-03-16 Thread Ilya Gladyshev



> 16 марта 2023 г., в 04:07, Justin Pryzby  написал(а):
> 
> On Tue, Mar 14, 2023 at 06:58:14PM +0400, Ilya Gladyshev wrote:
>>> The only change from the current patch is (3).  (1) still calls
>>> count_leaf_partitions(), but only once.  I'd prefer that to rearranging
>>> the progress reporting to set the TOTAL in ProcessUtilitySlow().
>> 
>> As for reusing TOTAL calculated outside of DefineIndex, as I can see, 
>> ProcessUtilitySlow is not the only call site for DefineIndex (although, I 
>> don’t know whether all of them need progress tracking), for instance, there 
>> is ALTER TABLE that calls DefineIndex to create index for constraints. So I 
>> feel like rearranging progress reporting will result in unnecessary code 
>> duplication in those call sites, so passing in an optional parameter seems 
>> to be easier here, if we are going to optimize it, after all. Especially if 
>> back-patching is a non-issue.
> 
> Yeah.  See attached.  I don't like duplicating the loop.  Is this really
> the right direction to go ?
> 
> I haven't verified if the child tables are locked in all the paths which
> would call count_leaf_partitions().  But why is it important to lock
> them for this?  If they weren't locked before, that'd be a pre-existing
> problem...
> <0001-fix-CREATE-INDEX-progress-report-with-nested-partiti.patch>

I’m not sure what the general policy on locking is, but I have checked ALTER 
TABLE ADD INDEX, and the all the partitions seem to be locked on the first 
entry to DefineIndex there. All other call sites pass in the parentIndexId, 
which means the progress tracking machinery will not be initialized, so I 
think, we don’t need to do locking in count_leaf_partitions(). 

The approach in the patch looks good to me. Some nitpicks on the patch: 
1. There’s an unnecessary second call to get_rel_relkind in ProcessUtilitySlow, 
we can just use what’s in the variable relkind.
2. We can also combine else and if to have one less nested level like that:
+   else if (!RELKIND_HAS_PARTITIONS(child_relkind))

3. There was a part of the comment saying "If the index was built by calling 
DefineIndex() recursively, the called function is responsible for updating the 
progress report for built indexes.", I think it is still useful to have it 
there.






Re: Progress report of CREATE INDEX for nested partitioned tables

2023-03-15 Thread Justin Pryzby
On Tue, Mar 14, 2023 at 06:58:14PM +0400, Ilya Gladyshev wrote:
> > The only change from the current patch is (3).  (1) still calls
> > count_leaf_partitions(), but only once.  I'd prefer that to rearranging
> > the progress reporting to set the TOTAL in ProcessUtilitySlow().
>
> As for reusing TOTAL calculated outside of DefineIndex, as I can see, 
> ProcessUtilitySlow is not the only call site for DefineIndex (although, I 
> don’t know whether all of them need progress tracking), for instance, there 
> is ALTER TABLE that calls DefineIndex to create index for constraints. So I 
> feel like rearranging progress reporting will result in unnecessary code 
> duplication in those call sites, so passing in an optional parameter seems to 
> be easier here, if we are going to optimize it, after all. Especially if 
> back-patching is a non-issue.

Yeah.  See attached.  I don't like duplicating the loop.  Is this really
the right direction to go ?

I haven't verified if the child tables are locked in all the paths which
would call count_leaf_partitions().  But why is it important to lock
them for this?  If they weren't locked before, that'd be a pre-existing
problem...
>From 1e05b055d1bd21265b68265ddef5e9654629087c Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev 
Date: Tue, 31 Jan 2023 19:13:07 +0400
Subject: [PATCH] fix CREATE INDEX progress report with nested partitions

The progress reporting was added in v12 (ab0dfc961) but the original
patch didn't seem to consider the possibility of nested partitioning.

When called recursively, DefineIndex() would clobber the number of
completed partitions, and it was possible to end up with the TOTAL
counter greater than the DONE counter.

This clarifies/re-defines that the progress reporting counts both direct
and indirect children, but doesn't count intermediate partitioned tables:

- The TOTAL counter is set once at the start of the command.
- For indexes which are newly-built, the recursively-called
DefineIndex() increments the DONE counter.
- For pre-existing indexes which are ATTACHed rather than built,
DefineIndex() increments the DONE counter, unless the attached index is
partitioned, in which case progress report is not updated.

Author: Ilya Gladyshev
Reviewed-By: Justin Pryzby, Tomas Vondra, Dean Rasheed, Alvaro Herrera, Matthias van de Meent
Discussion: https://www.postgresql.org/message-id/flat/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel%40gmail.com
---
 doc/src/sgml/monitoring.sgml  | 10 ++-
 src/backend/bootstrap/bootparse.y |  2 +
 src/backend/commands/indexcmds.c  | 70 +--
 src/backend/commands/tablecmds.c  |  4 +-
 src/backend/tcop/utility.c|  8 ++-
 src/backend/utils/activity/backend_progress.c | 28 
 src/include/commands/defrem.h |  1 +
 src/include/utils/backend_progress.h  |  1 +
 8 files changed, 115 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 6249bb50d02..3f891b75541 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6901,7 +6901,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
   
   
When creating an index on a partitioned table, this column is set to
-   the total number of partitions on which the index is to be created.
+   the total number of partitions on which the index is to be created or attached.
+   In the case of intermediate partitioned tables, this includes both
+   direct and indirect partitions, but excludes the intermediate
+   partitioned tables themselves.
This field is 0 during a REINDEX.
   
  
@@ -6912,7 +6915,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
   
   
When creating an index on a partitioned table, this column is set to
-   the number of partitions on which the index has been created.
+   the number of partitions on which the index has been created or attached.
+   In the case of intermediate partitioned tables, this includes both
+   direct and indirect partitions, but excludes the intermediate
+   partitioned tables themselves.
This field is 0 during a REINDEX.
   
  
diff --git a/src/backend/bootstrap/bootparse.y b/src/backend/bootstrap/bootparse.y
index 86804bb598e..81a1b7bfec3 100644
--- a/src/backend/bootstrap/bootparse.y
+++ b/src/backend/bootstrap/bootparse.y
@@ -306,6 +306,7 @@ Boot_DeclareIndexStmt:
 $4,
 InvalidOid,
 InvalidOid,
+-1,
 false,
 false,
 false,
@@ -358,6 +359,7 @@ Boot_DeclareUniqueIndexStmt:
 $5,
 InvalidOid,
 InvalidOid,
+-1,
 false,
 false,
 false,
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 16ec0b114e6..62a8f0d6aa2 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -130,6 

Re: Progress report of CREATE INDEX for nested partitioned tables

2023-03-14 Thread Ilya Gladyshev



> 14 марта 2023 г., в 18:34, Justin Pryzby  написал(а):
> 
> On Mon, Mar 13, 2023 at 10:42:59AM -0400, Tom Lane wrote:
>> Justin Pryzby  writes:
>>> On Sun, Mar 12, 2023 at 06:25:13PM -0400, Tom Lane wrote:
 I agree that adding such a field to IndexStmt would be a very bad idea.
 However, adding another parameter to DefineIndex doesn't seem like a
 problem.
>> 
>>> It's a problem since this is a bug and it's desirable to backpatch a
>>> fix, right ?
>> 
>> I do not think this is important enough to justify a back-patch.
> 
> That's fine with me, but it comes as a surprise, and it might invalidate
> earlier discussions, which were working under the constraint of
> maintaining a compatible ABI.
> 
>>> Incrementing by 0 sounds terrible, since someone who has intermediate
>>> partitioned tables is likely to always see 0% done.
>> 
>> How so?  The counter will increase after there's some actual work done,
>> ie building an index.  If there's no indexes to build then it hardly
>> matters, because the command will complete in very little time.
> 
> I misunderstood your idea as suggesting to skip progress reporting for
> *every* intermediate partitioned index, and its leaves.  But I guess
> what you meant is to skip progress reporting when ATTACHing an
> intermediate partitioned index. That seems okay, since 1) intermediate
> partitioned tables are probably rare, and 2) ATTACH is fast, so the
> effect is indistinguisable from querying the progress report a few
> moments later.

+1 that counting attached partitioned indexes as 0 is fine. 

> The idea would be for:
> 1) TOTAL to show the number of direct and indirect leaf partitions;
> 2) update progress while building direct or indirect indexes;
> 3) ATTACHing intermediate partitioned tables to increment by 0;
> 4) ATTACHing a direct child should continue to increment by 1,
> since that common case already works as expected and shouldn't be
> changed.
> 
> The only change from the current patch is (3).  (1) still calls
> count_leaf_partitions(), but only once.  I'd prefer that to rearranging
> the progress reporting to set the TOTAL in ProcessUtilitySlow().
> 
> -- 
> Justin

As for reusing TOTAL calculated outside of DefineIndex, as I can see, 
ProcessUtilitySlow is not the only call site for DefineIndex (although, I don’t 
know whether all of them need progress tracking), for instance, there is ALTER 
TABLE that calls DefineIndex to create index for constraints. So I feel like 
rearranging progress reporting will result in unnecessary code duplication in 
those call sites, so passing in an optional parameter seems to be easier here, 
if we are going to optimize it, after all. Especially if back-patching is a 
non-issue.





Re: Progress report of CREATE INDEX for nested partitioned tables

2023-03-14 Thread Tom Lane
Justin Pryzby  writes:
> The idea would be for:
> 1) TOTAL to show the number of direct and indirect leaf partitions;
> 2) update progress while building direct or indirect indexes;
> 3) ATTACHing intermediate partitioned tables to increment by 0;
> 4) ATTACHing a direct child should continue to increment by 1,
> since that common case already works as expected and shouldn't be
> changed.

OK.

> The only change from the current patch is (3).  (1) still calls
> count_leaf_partitions(), but only once.  I'd prefer that to rearranging
> the progress reporting to set the TOTAL in ProcessUtilitySlow().

I don't agree with that.  find_all_inheritors is fairly expensive
and it seems completely silly to do it twice just to avoid adding
a parameter to DefineIndex.

regards, tom lane




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-03-14 Thread Justin Pryzby
On Mon, Mar 13, 2023 at 10:42:59AM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > On Sun, Mar 12, 2023 at 06:25:13PM -0400, Tom Lane wrote:
> >> I agree that adding such a field to IndexStmt would be a very bad idea.
> >> However, adding another parameter to DefineIndex doesn't seem like a
> >> problem.
> 
> > It's a problem since this is a bug and it's desirable to backpatch a
> > fix, right ?
> 
> I do not think this is important enough to justify a back-patch.

That's fine with me, but it comes as a surprise, and it might invalidate
earlier discussions, which were working under the constraint of
maintaining a compatible ABI.

> > Incrementing by 0 sounds terrible, since someone who has intermediate
> > partitioned tables is likely to always see 0% done.
> 
> How so?  The counter will increase after there's some actual work done,
> ie building an index.  If there's no indexes to build then it hardly
> matters, because the command will complete in very little time.

I misunderstood your idea as suggesting to skip progress reporting for
*every* intermediate partitioned index, and its leaves.  But I guess
what you meant is to skip progress reporting when ATTACHing an
intermediate partitioned index. That seems okay, since 1) intermediate
partitioned tables are probably rare, and 2) ATTACH is fast, so the
effect is indistinguisable from querying the progress report a few
moments later.

The idea would be for:
1) TOTAL to show the number of direct and indirect leaf partitions;
2) update progress while building direct or indirect indexes;
3) ATTACHing intermediate partitioned tables to increment by 0;
4) ATTACHing a direct child should continue to increment by 1,
since that common case already works as expected and shouldn't be
changed.

The only change from the current patch is (3).  (1) still calls
count_leaf_partitions(), but only once.  I'd prefer that to rearranging
the progress reporting to set the TOTAL in ProcessUtilitySlow().

-- 
Justin




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-03-13 Thread Tom Lane
Justin Pryzby  writes:
> On Sun, Mar 12, 2023 at 06:25:13PM -0400, Tom Lane wrote:
>> I agree that adding such a field to IndexStmt would be a very bad idea.
>> However, adding another parameter to DefineIndex doesn't seem like a
>> problem.

> It's a problem since this is a bug and it's desirable to backpatch a
> fix, right ?

I do not think this is important enough to justify a back-patch.

> Incrementing by 0 sounds terrible, since someone who has intermediate
> partitioned tables is likely to always see 0% done.

How so?  The counter will increase after there's some actual work done,
ie building an index.  If there's no indexes to build then it hardly
matters, because the command will complete in very little time.

> And incrementing PARTITIONS_DONE by 1 could lead to bogus progress
> reporting with "N_done > N_Total" if an intermediate partitioned table
> had no leaf partitions at all.

Well, we could fix that if we made TOTAL be the total number of
descendants rather than just the leaves ;-).  But I think not
incrementing is probably better.

regards, tom lane




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-03-13 Thread Justin Pryzby
On Sun, Mar 12, 2023 at 06:25:13PM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > On Sun, Mar 12, 2023 at 04:14:06PM -0400, Tom Lane wrote:
> >> Hm.  Could we get rid of count_leaf_partitions by doing the work in
> >> ProcessUtilitySlow?  Or at least passing that OID list forward instead
> >> of recomputing it?
> 
> > count_leaf_partitions() is called in two places:
> 
> > Once to get PROGRESS_CREATEIDX_PARTITIONS_TOTAL.  It'd be easy enough to
> > pass an integer total via IndexStmt (but I think we wanted to avoid
> > adding anything there, since it's not a part of the statement).
> 
> I agree that adding such a field to IndexStmt would be a very bad idea.
> However, adding another parameter to DefineIndex doesn't seem like a
> problem.

It's a problem since this is a bug and it's desirable to backpatch a
fix, right ?

> Or could we just call pgstat_progress_update_param directly from
> ProcessUtilitySlow, after counting the partitions in the existing loop?

That'd be fine if it was only needed for TOTAL, but it doesn't handle
the 2nd call to count_leaf_partitions().

On Sun, Mar 12, 2023 at 08:15:28PM -0400, Tom Lane wrote:
> I wrote:
> > Justin Pryzby  writes:
> >> count_leaf_partitions() is also called for sub-partitions, in the case
> >> that a matching "partitioned index" already exists, and the progress
> >> report needs to be incremented by the number of leaves for which indexes
> >> were ATTACHED.
> 
> > Can't you increment progress by one at the point where the actual attach
> > happens?
> 
> Oh, never mind; now I realize that the point is that you didn't ever
> iterate over those leaf indexes.
> 
> However, consider a thought experiment: assume for whatever reason that
> all the actual index builds happen first, then all the cases where you
> succeed in attaching a sub-partitioned index happen at the end of the
> command.  In that case, the percentage-done indicator would go from
> some-number to 100% more or less instantly.
> 
> What if we simply do nothing at sub-partitioned indexes?  Or if that's
> slightly too radical, just increase the PARTITIONS_DONE counter by 1?
> That would look indistinguishable from the case where all the attaches
> happen at the end.

Incrementing by 0 sounds terrible, since someone who has intermediate
partitioned tables is likely to always see 0% done.  (It's true that
intermediate partitioned tables don't seem to have been considered by
the original patch, and it's indisputable that progress reporting
currently misbehaves in that case).

And incrementing PARTITIONS_DONE by 1 could lead to bogus progress
reporting with "N_done > N_Total" if an intermediate partitioned table
had no leaf partitions at all.  That's one of the problems this thread
is trying to fix (the other being "total changing in the middle of the
command").

Maybe your idea is usable though, since indirect partitioned indexes
*can* be counted correctly during recursion.  What's hard to fix is the
case that an index is both *partitioned* and *attached*.  Maybe it's
okay to count that case as 0.  The consequence is that the command would
end before the progress report got to 100%.

The other option seems to be to define the progress report to count only
*direct* children.
https://www.postgresql.org/message-id/20221213044331.GJ27893%40telsasoft.com

> The reason I find this annoying is that the non-optional nature of the
> progress reporting mechanism was sold on the basis that it would add
> only negligible overhead.  Adding extra pass(es) over pg_inherits
> breaks that promise.  Maybe it's cheap enough to not matter in the
> big scheme of things, but we should not be having to make arguments
> like that one.

If someone is running a DDL command involving nested partitions, I'm not
so concerned about the cost of additional scans of pg_inherits.  They
either have enough data to justify partitioning partitions, or they're
doing something silly.

-- 
Justin




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-03-12 Thread Tom Lane
I wrote:
> Justin Pryzby  writes:
>> count_leaf_partitions() is also called for sub-partitions, in the case
>> that a matching "partitioned index" already exists, and the progress
>> report needs to be incremented by the number of leaves for which indexes
>> were ATTACHED.

> Can't you increment progress by one at the point where the actual attach
> happens?

Oh, never mind; now I realize that the point is that you didn't ever
iterate over those leaf indexes.

However, consider a thought experiment: assume for whatever reason that
all the actual index builds happen first, then all the cases where you
succeed in attaching a sub-partitioned index happen at the end of the
command.  In that case, the percentage-done indicator would go from
some-number to 100% more or less instantly.

What if we simply do nothing at sub-partitioned indexes?  Or if that's
slightly too radical, just increase the PARTITIONS_DONE counter by 1?
That would look indistinguishable from the case where all the attaches
happen at the end.

regards, tom lane




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-03-12 Thread Tom Lane
Justin Pryzby  writes:
> On Sun, Mar 12, 2023 at 04:14:06PM -0400, Tom Lane wrote:
>> Hm.  Could we get rid of count_leaf_partitions by doing the work in
>> ProcessUtilitySlow?  Or at least passing that OID list forward instead
>> of recomputing it?

> count_leaf_partitions() is called in two places:

> Once to get PROGRESS_CREATEIDX_PARTITIONS_TOTAL.  It'd be easy enough to
> pass an integer total via IndexStmt (but I think we wanted to avoid
> adding anything there, since it's not a part of the statement).

I agree that adding such a field to IndexStmt would be a very bad idea.
However, adding another parameter to DefineIndex doesn't seem like a
problem.  Or could we just call pgstat_progress_update_param directly from
ProcessUtilitySlow, after counting the partitions in the existing loop?

> count_leaf_partitions() is also called for sub-partitions, in the case
> that a matching "partitioned index" already exists, and the progress
> report needs to be incremented by the number of leaves for which indexes
> were ATTACHED.

Can't you increment progress by one at the point where the actual attach
happens?

I also wonder whether leaving non-leaf partitions out of the total
is making things more complicated rather than simpler ...

> We'd need a mapping from OID => npartitions (or to
> compile some data structure of all the partitioned partitions).  I guess
> CreateIndex() could call CreatePartitionDirectory().  But it looks like
> that would be *more* expensive.

The reason I find this annoying is that the non-optional nature of the
progress reporting mechanism was sold on the basis that it would add
only negligible overhead.  Adding extra pass(es) over pg_inherits
breaks that promise.  Maybe it's cheap enough to not matter in the
big scheme of things, but we should not be having to make arguments
like that one.

regards, tom lane




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-03-12 Thread Justin Pryzby
On Sun, Mar 12, 2023 at 04:14:06PM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > On Fri, Mar 10, 2023 at 03:36:10PM -0500, Tom Lane wrote:
> >> I took a look through this.  It seems like basically a good solution,
> >> but the count_leaf_partitions() function is bothering me, for two
> >> reasons:
> 
> > ... find_all_inheritors() will also have been called by
> > ProcessUtilitySlow().  Maybe it's sufficient to mention that ?
> 
> Hm.  Could we get rid of count_leaf_partitions by doing the work in
> ProcessUtilitySlow?  Or at least passing that OID list forward instead
> of recomputing it?

count_leaf_partitions() is called in two places:

Once to get PROGRESS_CREATEIDX_PARTITIONS_TOTAL.  It'd be easy enough to
pass an integer total via IndexStmt (but I think we wanted to avoid
adding anything there, since it's not a part of the statement).

count_leaf_partitions() is also called for sub-partitions, in the case
that a matching "partitioned index" already exists, and the progress
report needs to be incremented by the number of leaves for which indexes
were ATTACHED.  We'd need a mapping from OID => npartitions (or to
compile some data structure of all the partitioned partitions).  I guess
CreateIndex() could call CreatePartitionDirectory().  But it looks like
that would be *more* expensive.

-- 
Justin




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-03-12 Thread Tom Lane
Justin Pryzby  writes:
> On Fri, Mar 10, 2023 at 03:36:10PM -0500, Tom Lane wrote:
>> I took a look through this.  It seems like basically a good solution,
>> but the count_leaf_partitions() function is bothering me, for two
>> reasons:

> ... find_all_inheritors() will also have been called by
> ProcessUtilitySlow().  Maybe it's sufficient to mention that ?

Hm.  Could we get rid of count_leaf_partitions by doing the work in
ProcessUtilitySlow?  Or at least passing that OID list forward instead
of recomputing it?

regards, tom lane




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-03-12 Thread Justin Pryzby
On Fri, Mar 10, 2023 at 03:36:10PM -0500, Tom Lane wrote:
> Justin Pryzby  writes:
> > Update to address a compiler warning in the supplementary patches adding
> > assertions.
> 
> I took a look through this.  It seems like basically a good solution,
> but the count_leaf_partitions() function is bothering me, for two
> reasons:
> 
> 1. It seems like a pretty expensive thing to do.  Don't we have the
> info at hand somewhere already?

I don't know where that would be.  We need the list of both direct *and*
indirect partitions.  See:
https://www.postgresql.org/message-id/5073D187-4200-4A2D-BAC0-91C657E3C22E%40gmail.com

If it would help to avoid the concern, then I might consider proposing
not to call get_rel_relkind() ...

> 2. Is it really safe to do find_all_inheritors with NoLock?  If so,
> a comment explaining why would be good.

In both cases (both for the parent and for case of a partitioned child
with pre-existing indexes being ATTACHed), the table itself is already
locked by DefineIndex():

lockmode = concurrent ? ShareUpdateExclusiveLock : ShareLock;
rel = table_open(relationId, lockmode);

and
childrel = table_open(childRelid, lockmode);
...
table_close(childrel, NoLock);

And, find_all_inheritors() will also have been called by
ProcessUtilitySlow().  Maybe it's sufficient to mention that ?

-- 
Justin




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-03-10 Thread Tom Lane
Justin Pryzby  writes:
> Update to address a compiler warning in the supplementary patches adding
> assertions.

I took a look through this.  It seems like basically a good solution,
but the count_leaf_partitions() function is bothering me, for two
reasons:

1. It seems like a pretty expensive thing to do.  Don't we have the
info at hand somewhere already?

2. Is it really safe to do find_all_inheritors with NoLock?  If so,
a comment explaining why would be good.

regards, tom lane




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-02-16 Thread Justin Pryzby
On Wed, Feb 08, 2023 at 04:40:49PM -0600, Justin Pryzby wrote:
> This squishes together 001/2 as the main patch.
> I believe it's ready.

Update to address a compiler warning in the supplementary patches adding
assertions.
>From 71427bf7cd9927af04513ba3fe99e481a8ba1f61 Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev 
Date: Tue, 31 Jan 2023 19:13:07 +0400
Subject: [PATCH 1/3] fix CREATE INDEX progress report with nested partitions

The progress reporting was added in v12 (ab0dfc961) but the original
patch didn't seem to consider the possibility of nested partitioning.

When called recursively, DefineIndex() would clobber the number of
completed partitions, and it was possible to end up with the TOTAL
counter greater than the DONE counter.

This clarifies/re-defines that the progress reporting counts both direct
and indirect children, but doesn't count intermediate partitioned tables:

- The TOTAL counter is set once at the start of the command.
- For indexes which are newly-built, the recursively-called
DefineIndex() increments the DONE counter.
- For pre-existing indexes which are ATTACHed rather than built,
DefineIndex() increments the DONE counter, and if the attached index is
partitioned, the counter is incremented to account for each of its leaf
partitions.

Author: Ilya Gladyshev
Reviewed-By: Justin Pryzby, Tomas Vondra, Dean Rasheed, Alvaro Herrera, Matthias van de Meent
Discussion: https://www.postgresql.org/message-id/flat/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel%40gmail.com
---
 doc/src/sgml/monitoring.sgml  | 10 ++-
 src/backend/commands/indexcmds.c  | 70 +--
 src/backend/utils/activity/backend_progress.c | 28 
 src/include/utils/backend_progress.h  |  1 +
 4 files changed, 102 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index dca50707ad4..945d64ed2fa 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6896,7 +6896,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
   
   
When creating an index on a partitioned table, this column is set to
-   the total number of partitions on which the index is to be created.
+   the total number of partitions on which the index is to be created or attached.
+   In the case of intermediate partitioned tables, this includes both
+   direct and indirect partitions, but excludes the intermediate
+   partitioned tables themselves.
This field is 0 during a REINDEX.
   
  
@@ -6907,7 +6910,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
   
   
When creating an index on a partitioned table, this column is set to
-   the number of partitions on which the index has been created.
+   the number of partitions on which the index has been created or attached.
+   In the case of intermediate partitioned tables, this includes both
+   direct and indirect partitions, but excludes the intermediate
+   partitioned tables themselves.
This field is 0 during a REINDEX.
   
  
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 16ec0b114e6..147265ddcca 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -130,6 +130,30 @@ typedef struct ReindexErrorInfo
 	char		relkind;
 } ReindexErrorInfo;
 
+
+/*
+ * Count the number of direct and indirect leaf partitions.  Note that this
+ * also excludes foreign tables.
+ */
+static int
+count_leaf_partitions(Oid relid)
+{
+	int			nleaves = 0;
+	List	   *childs = find_all_inheritors(relid, NoLock, NULL);
+	ListCell   *lc;
+
+	foreach(lc, childs)
+	{
+		Oid			partrelid = lfirst_oid(lc);
+
+		if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))
+			nleaves++;
+	}
+
+	list_free(childs);
+	return nleaves;
+}
+
 /*
  * CheckIndexCompatible
  *		Determine whether an existing index definition is compatible with a
@@ -1219,8 +1243,18 @@ DefineIndex(Oid relationId,
 			Relation	parentIndex;
 			TupleDesc	parentDesc;
 
-			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
-		 nparts);
+			/*
+			 * Set the total number of partitions at the start of the command;
+			 * don't update it when being called recursively.
+			 */
+			if (!OidIsValid(parentIndexId))
+			{
+int			total_parts;
+
+total_parts = count_leaf_partitions(relationId);
+pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+			 total_parts);
+			}
 
 			/* Make a local copy of partdesc->oids[], just for safety */
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
@@ -1250,6 +1284,7 @@ DefineIndex(Oid relationId,
 			{
 Oid			childRelid = part_oids[i];
 Relation	childrel;
+char		child_relkind;
 Oid			child_save_userid;
 int			child_save_sec_context;
 int			child_save_nestlevel;
@@ -1259,6 +1294,7 @@ DefineIndex(Oid relationId,
 bool		found = false;
 
 childrel = 

Re: Progress report of CREATE INDEX for nested partitioned tables

2023-02-08 Thread Justin Pryzby
On Thu, Feb 02, 2023 at 09:18:07AM -0600, Justin Pryzby wrote:
> On Wed, Feb 01, 2023 at 07:24:48PM +0100, Matthias van de Meent wrote:
> > On Wed, 1 Feb 2023 at 18:51, Ilya Gladyshev  
> > wrote:
> > > 1 февр. 2023 г., в 20:27, Matthias van de Meent 
> > >  написал(а):
> > >
> > >> In HEAD we set TOTAL to whatever number partitioned table we're
> > >> currently processing has - regardless of whether we're the top level
> > >> statement.
> > >> With the patch we instead add the number of child relations to that
> > >> count, for which REL_HAS_STORAGE(child) -- or at least, in the v3
> > >> posted by Ilya. Approximately immediately after updating that count we
> > >> recurse to the child relations, and that only returns once it is done
> > >> creating the indexes, so both TOTAL and DONE go up as we process more
> > >> partitions in the hierarchy.
> > >
> > > The TOTAL in the patch is set only when processing the top-level parent 
> > > and it is not updated when we recurse, so yes, it is constant. From v3:
> > 
> > Ugh, I misread the patch, more specifically count_leaf_partitions and
> > the !OidIsValid(parentIndexId) condition changes.
> > 
> > You are correct, sorry for the noise.
> 
> That suggests that the comments could've been more clear.  I added a
> comment suggested by Tomas and adjusted some others and wrote a commit
> message.  I even ran pgindent for about the 3rd time ever.
> 
> 002 are my changes as a separate patch, which you could apply to your
> local branch.
> 
> And 003/4 are assertions that I wrote to demonstrate the problem and the
> verify the fixes, but not being proposed for commit.

That was probably a confusing way to present it - I should've sent the
relative diff as a .txt rather than as patch 002.

This squishes together 001/2 as the main patch.
I believe it's ready.

-- 
Justin
>From ed643acace062cad15cd6ea9dc76a663de9c97d4 Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev 
Date: Tue, 31 Jan 2023 19:13:07 +0400
Subject: [PATCH 1/3] fix CREATE INDEX progress report with nested partitions

The progress reporting was added in v12 (ab0dfc961) but the original
patch didn't seem to consider the possibility of nested partitioning.

When called recursively, DefineIndex() would clobber the number of
completed partitions, and it was possible to end up with the TOTAL
counter greater than the DONE counter.

This clarifies/re-defines that the progress reporting counts both direct
and indirect children, but doesn't count intermediate partitioned tables:

- The TOTAL counter is set once at the start of the command.
- For indexes which are newly-built, the recursively-called
DefineIndex() increments the DONE counter.
- For pre-existing indexes which are ATTACHed rather than built,
DefineIndex() increments the DONE counter, and if the attached index is
partitioned, the counter is incremented to account for each of its leaf
partitions.

Author: Ilya Gladyshev
Reviewed-By: Justin Pryzby, Tomas Vondra, Dean Rasheed, Alvaro Herrera, Matthias van de Meent
Discussion: https://www.postgresql.org/message-id/flat/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel%40gmail.com
---
 doc/src/sgml/monitoring.sgml  | 10 ++-
 src/backend/commands/indexcmds.c  | 70 +--
 src/backend/utils/activity/backend_progress.c | 28 
 src/include/utils/backend_progress.h  |  1 +
 4 files changed, 102 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1756f1a4b67..fa139dcece7 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6601,7 +6601,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
   
   
When creating an index on a partitioned table, this column is set to
-   the total number of partitions on which the index is to be created.
+   the total number of partitions on which the index is to be created or attached.
+   In the case of intermediate partitioned tables, this includes both
+   direct and indirect partitions, but excludes the intermediate
+   partitioned tables themselves.
This field is 0 during a REINDEX.
   
  
@@ -6612,7 +6615,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
   
   
When creating an index on a partitioned table, this column is set to
-   the number of partitions on which the index has been created.
+   the number of partitions on which the index has been created or attached.
+   In the case of intermediate partitioned tables, this includes both
+   direct and indirect partitions, but excludes the intermediate
+   partitioned tables themselves.
This field is 0 during a REINDEX.
   
  
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 16ec0b114e6..84c84c41acc 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -130,6 +130,30 @@ typedef struct ReindexErrorInfo
 	char		relkind;

Re: Progress report of CREATE INDEX for nested partitioned tables

2023-02-02 Thread Justin Pryzby
On Wed, Feb 01, 2023 at 07:24:48PM +0100, Matthias van de Meent wrote:
> On Wed, 1 Feb 2023 at 18:51, Ilya Gladyshev  
> wrote:
> >
> > 1 февр. 2023 г., в 20:27, Matthias van de Meent 
> >  написал(а):
> >
> >> In HEAD we set TOTAL to whatever number partitioned table we're
> >> currently processing has - regardless of whether we're the top level
> >> statement.
> >> With the patch we instead add the number of child relations to that
> >> count, for which REL_HAS_STORAGE(child) -- or at least, in the v3
> >> posted by Ilya. Approximately immediately after updating that count we
> >> recurse to the child relations, and that only returns once it is done
> >> creating the indexes, so both TOTAL and DONE go up as we process more
> >> partitions in the hierarchy.
> >
> >
> > The TOTAL in the patch is set only when processing the top-level parent and 
> > it is not updated when we recurse, so yes, it is constant. From v3:
> 
> Ugh, I misread the patch, more specifically count_leaf_partitions and
> the !OidIsValid(parentIndexId) condition changes.
> 
> You are correct, sorry for the noise.

That suggests that the comments could've been more clear.  I added a
comment suggested by Tomas and adjusted some others and wrote a commit
message.  I even ran pgindent for about the 3rd time ever.

002 are my changes as a separate patch, which you could apply to your
local branch.

And 003/4 are assertions that I wrote to demonstrate the problem and the
verify the fixes, but not being proposed for commit.

-- 
Justin
>From 375961e18aaa7ed7b2ebee972ad07c7a38099ef4 Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev 
Date: Tue, 31 Jan 2023 19:13:07 +0400
Subject: [PATCH 1/4] create index progress increment

---
 doc/src/sgml/monitoring.sgml  |  4 +-
 src/backend/commands/indexcmds.c  | 64 +--
 src/backend/utils/activity/backend_progress.c | 25 
 src/include/utils/backend_progress.h  |  1 +
 4 files changed, 87 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1756f1a4b67..a911900271c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6601,7 +6601,7 @@ FROM pg_stat_get_backend_idset() AS backendid;
   
   
When creating an index on a partitioned table, this column is set to
-   the total number of partitions on which the index is to be created.
+   the total number of leaf partitions on which the index is to be created or attached.
This field is 0 during a REINDEX.
   
  
@@ -6612,7 +6612,7 @@ FROM pg_stat_get_backend_idset() AS backendid;
   
   
When creating an index on a partitioned table, this column is set to
-   the number of partitions on which the index has been created.
+   the number of leaf partitions on which the index has been created or attached.
This field is 0 during a REINDEX.
   
  
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 16ec0b114e6..936b4e3c1db 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -130,6 +130,30 @@ typedef struct ReindexErrorInfo
 	char		relkind;
 } ReindexErrorInfo;
 
+
+/*
+ * Count the number of direct and indirect leaf partitions, excluding foreign
+ * tables.
+ */
+static int
+count_leaf_partitions(Oid relid)
+{
+	int			nleaves = 0;
+	List	   *childs = find_all_inheritors(relid, NoLock, NULL);
+	ListCell   *lc;
+
+	foreach(lc, childs)
+	{
+		Oid	partrelid = lfirst_oid(lc);
+
+		if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))
+			nleaves++;
+	}
+
+	list_free(childs);
+	return nleaves;
+}
+
 /*
  * CheckIndexCompatible
  *		Determine whether an existing index definition is compatible with a
@@ -1219,8 +1243,14 @@ DefineIndex(Oid relationId,
 			Relation	parentIndex;
 			TupleDesc	parentDesc;
 
-			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
-		 nparts);
+			if (!OidIsValid(parentIndexId))
+			{
+int total_parts;
+
+total_parts = count_leaf_partitions(relationId);
+pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+			 total_parts);
+			}
 
 			/* Make a local copy of partdesc->oids[], just for safety */
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
@@ -1250,6 +1280,7 @@ DefineIndex(Oid relationId,
 			{
 Oid			childRelid = part_oids[i];
 Relation	childrel;
+char		child_relkind;
 Oid			child_save_userid;
 int			child_save_sec_context;
 int			child_save_nestlevel;
@@ -1259,6 +1290,7 @@ DefineIndex(Oid relationId,
 bool		found = false;
 
 childrel = table_open(childRelid, lockmode);
+child_relkind = RelationGetForm(childrel)->relkind;
 
 GetUserIdAndSecContext(_save_userid,
 	   _save_sec_context);
@@ -1431,9 +1463,25 @@ DefineIndex(Oid relationId,
 	SetUserIdAndSecContext(child_save_userid,
 		   child_save_sec_context);
 }
+else
+{
+	

Re: Progress report of CREATE INDEX for nested partitioned tables

2023-02-01 Thread Matthias van de Meent
On Wed, 1 Feb 2023 at 18:51, Ilya Gladyshev  wrote:
>
> 1 февр. 2023 г., в 20:27, Matthias van de Meent 
>  написал(а):
>
>> In HEAD we set TOTAL to whatever number partitioned table we're
>> currently processing has - regardless of whether we're the top level
>> statement.
>> With the patch we instead add the number of child relations to that
>> count, for which REL_HAS_STORAGE(child) -- or at least, in the v3
>> posted by Ilya. Approximately immediately after updating that count we
>> recurse to the child relations, and that only returns once it is done
>> creating the indexes, so both TOTAL and DONE go up as we process more
>> partitions in the hierarchy.
>
>
> The TOTAL in the patch is set only when processing the top-level parent and 
> it is not updated when we recurse, so yes, it is constant. From v3:

Ugh, I misread the patch, more specifically count_leaf_partitions and
the !OidIsValid(parentIndexId) condition changes.

You are correct, sorry for the noise.

Kind regards,

Matthias van de Meent




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-02-01 Thread Ilya Gladyshev


> 1 февр. 2023 г., в 20:27, Matthias van de Meent 
>  написал(а):
> 
> On Wed, 1 Feb 2023 at 16:53, Justin Pryzby  > wrote:
>> 
>> On Wed, Feb 01, 2023 at 04:21:35PM +0100, Matthias van de Meent wrote:
>>> On Wed, 1 Feb 2023 at 15:21, Ilya Gladyshev  
>>> wrote:
> 1 февр. 2023 г., в 16:01, Alvaro Herrera  
> написал(а):
> Hmm, count_leaf_partitions has to scan pg_inherits and do a syscache
> lookup for every single element therein ... this sounds slow.
> 
> In one of the callsites, we already have the partition descriptor
> available.  We could just scan partdesc->is_leaf[] and add one for each
> 'true' value we see there.
 
 The problem is that partdesc contains only direct children of the table 
 and we need all the children down the inheritance tree to count the total 
 number of leaf partitions in the first callsite.
 
> In the other callsite, we had the table open just a few lines before the
> place you call count_leaf_partitions.  Maybe we can rejigger things by
> examining its state before closing it: if relkind is not partitioned we
> know leaf_partitions=0, and only if partitioned we count leaf partitions.
> I think that would save some work.  I also wonder if it's worth writing
> a bespoke function for counting leaf partitions rather than relying on
> find_all_inheritors.
 
 Sure, added this condition to avoid the extra work here.
 
>>> 
   When creating an index on a partitioned table, this column is set to
 -   the total number of partitions on which the index is to be created.
 +   the total number of leaf partitions on which the index is to be 
 created or attached.
>>> 
>>> I think we should also add a note about the (now) non-constant nature
>>> of the value, something along the lines of "This value is updated as
>>> we're processing and discovering partitioned tables in the partition
>>> hierarchy".
>> 
>> But the TOTAL is constant, right ?  Updating the total when being called
>> recursively is the problem these patches fix.
> 
> If that's the case, then I'm not seeing the 'fix' part of the patch. I
> thought this patch was fixing the provably incorrect TOTAL value where
> DONE > TOTAL due to the recursive operation overwriting the DONE/TOTAL
> values instead of updating them.
> 
> In HEAD we set TOTAL to whatever number partitioned table we're
> currently processing has - regardless of whether we're the top level
> statement.
> With the patch we instead add the number of child relations to that
> count, for which REL_HAS_STORAGE(child) -- or at least, in the v3
> posted by Ilya. Approximately immediately after updating that count we
> recurse to the child relations, and that only returns once it is done
> creating the indexes, so both TOTAL and DONE go up as we process more
> partitions in the hierarchy.

The TOTAL in the patch is set only when processing the top-level parent and it 
is not updated when we recurse, so yes, it is constant. From v3:

@@ -1219,8 +1243,14 @@ DefineIndex(Oid relationId,
RelationparentIndex;
TupleDesc   parentDesc;
 
-   
pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
-   
 nparts);
+   if (!OidIsValid(parentIndexId))
+   {
+   int total_parts;
+
+   total_parts = count_leaf_partitions(relationId);
+   
pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+   
 total_parts);
+   }


It is set to the total number of children on all levels of the hierarchy, not 
just the current one, so the total value doesn’t need to be updated later, 
because it is set to the correct value from the very beginning. 

It is the DONE counter that is updated, and when we attach an index of a 
partition that is itself a partitioned table (like a2 in your example, if it 
already had an index created), it will be updated by the number of children of 
the partition.

@@ -1431,9 +1463,25 @@ DefineIndex(Oid relationId,

SetUserIdAndSecContext(child_save_userid,

   child_save_sec_context);
}
+   else
+   {
+   int attached_parts = 1;
+
+   if 
(RELKIND_HAS_PARTITIONS(child_relkind))
+   attached_parts = 
count_leaf_partitions(childRelid);
+
+   /*
+ 

Re: Progress report of CREATE INDEX for nested partitioned tables

2023-02-01 Thread Matthias van de Meent
On Wed, 1 Feb 2023 at 16:53, Justin Pryzby  wrote:
>
> On Wed, Feb 01, 2023 at 04:21:35PM +0100, Matthias van de Meent wrote:
> > On Wed, 1 Feb 2023 at 15:21, Ilya Gladyshev  
> > wrote:
> > > > 1 февр. 2023 г., в 16:01, Alvaro Herrera  
> > > > написал(а):
> > > > Hmm, count_leaf_partitions has to scan pg_inherits and do a syscache
> > > > lookup for every single element therein ... this sounds slow.
> > > >
> > > > In one of the callsites, we already have the partition descriptor
> > > > available.  We could just scan partdesc->is_leaf[] and add one for each
> > > > 'true' value we see there.
> > >
> > > The problem is that partdesc contains only direct children of the table 
> > > and we need all the children down the inheritance tree to count the total 
> > > number of leaf partitions in the first callsite.
> > >
> > > > In the other callsite, we had the table open just a few lines before the
> > > > place you call count_leaf_partitions.  Maybe we can rejigger things by
> > > > examining its state before closing it: if relkind is not partitioned we
> > > > know leaf_partitions=0, and only if partitioned we count leaf 
> > > > partitions.
> > > > I think that would save some work.  I also wonder if it's worth writing
> > > > a bespoke function for counting leaf partitions rather than relying on
> > > > find_all_inheritors.
> > >
> > > Sure, added this condition to avoid the extra work here.
> > >
> >
> > >When creating an index on a partitioned table, this column is set 
> > > to
> > > -   the total number of partitions on which the index is to be 
> > > created.
> > > +   the total number of leaf partitions on which the index is to be 
> > > created or attached.
> >
> > I think we should also add a note about the (now) non-constant nature
> > of the value, something along the lines of "This value is updated as
> > we're processing and discovering partitioned tables in the partition
> > hierarchy".
>
> But the TOTAL is constant, right ?  Updating the total when being called
> recursively is the problem these patches fix.

If that's the case, then I'm not seeing the 'fix' part of the patch. I
thought this patch was fixing the provably incorrect TOTAL value where
DONE > TOTAL due to the recursive operation overwriting the DONE/TOTAL
values instead of updating them.

In HEAD we set TOTAL to whatever number partitioned table we're
currently processing has - regardless of whether we're the top level
statement.
With the patch we instead add the number of child relations to that
count, for which REL_HAS_STORAGE(child) -- or at least, in the v3
posted by Ilya. Approximately immediately after updating that count we
recurse to the child relations, and that only returns once it is done
creating the indexes, so both TOTAL and DONE go up as we process more
partitions in the hierarchy.

An example hierarchy:
CREATE TABLE parent (a, b) partition by list (a);
CREATE TABLE a1
PARTITION OF parent FOR VALUES IN (1)
PARTITION BY LIST (b);
CREATE TABLE a1bd
PARTITION OF a1 DEFAULT;

CREATE TABLE a2
PARTITION OF parent FOR VALUES IN (2)
PARTITION BY LIST (b);
CREATE TABLE a2bd
PARTITION OF a2 DEFAULT;

INSERT INTO parent (a, b) SELECT * from generate_series(1, 2) a(a)
cross join generate_series(1, 10),b(b);
CREATE INDEX ON parent(a,b);

This will only discover that a2bd will need to be indexed after a1bd
is done (or vice versa, depending on which order a1 and a2 are
processed in the ).

Kind regards,

Matthias van de Meent




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-02-01 Thread Justin Pryzby
On Wed, Feb 01, 2023 at 04:21:35PM +0100, Matthias van de Meent wrote:
> On Wed, 1 Feb 2023 at 15:21, Ilya Gladyshev  
> wrote:
> > > 1 февр. 2023 г., в 16:01, Alvaro Herrera  
> > > написал(а):
> > > Hmm, count_leaf_partitions has to scan pg_inherits and do a syscache
> > > lookup for every single element therein ... this sounds slow.
> > >
> > > In one of the callsites, we already have the partition descriptor
> > > available.  We could just scan partdesc->is_leaf[] and add one for each
> > > 'true' value we see there.
> >
> > The problem is that partdesc contains only direct children of the table and 
> > we need all the children down the inheritance tree to count the total 
> > number of leaf partitions in the first callsite.
> >
> > > In the other callsite, we had the table open just a few lines before the
> > > place you call count_leaf_partitions.  Maybe we can rejigger things by
> > > examining its state before closing it: if relkind is not partitioned we
> > > know leaf_partitions=0, and only if partitioned we count leaf partitions.
> > > I think that would save some work.  I also wonder if it's worth writing
> > > a bespoke function for counting leaf partitions rather than relying on
> > > find_all_inheritors.
> >
> > Sure, added this condition to avoid the extra work here.
> >
> 
> >When creating an index on a partitioned table, this column is set to
> > -   the total number of partitions on which the index is to be created.
> > +   the total number of leaf partitions on which the index is to be 
> > created or attached.
> 
> I think we should also add a note about the (now) non-constant nature
> of the value, something along the lines of "This value is updated as
> we're processing and discovering partitioned tables in the partition
> hierarchy".

But the TOTAL is constant, right ?  Updating the total when being called
recursively is the problem these patches fix.

-- 
Justin




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-02-01 Thread Matthias van de Meent
On Wed, 1 Feb 2023 at 15:21, Ilya Gladyshev  wrote:
>
> > 1 февр. 2023 г., в 16:01, Alvaro Herrera  
> > написал(а):
> >
> > Hmm, count_leaf_partitions has to scan pg_inherits and do a syscache
> > lookup for every single element therein ... this sounds slow.
> >
> > In one of the callsites, we already have the partition descriptor
> > available.  We could just scan partdesc->is_leaf[] and add one for each
> > 'true' value we see there.
>
> The problem is that partdesc contains only direct children of the table and 
> we need all the children down the inheritance tree to count the total number 
> of leaf partitions in the first callsite.
>
> > In the other callsite, we had the table open just a few lines before the
> > place you call count_leaf_partitions.  Maybe we can rejigger things by
> > examining its state before closing it: if relkind is not partitioned we
> > know leaf_partitions=0, and only if partitioned we count leaf partitions.
> > I think that would save some work.  I also wonder if it's worth writing
> > a bespoke function for counting leaf partitions rather than relying on
> > find_all_inheritors.
>
> Sure, added this condition to avoid the extra work here.
>

>When creating an index on a partitioned table, this column is set to
> -   the total number of partitions on which the index is to be created.
> +   the total number of leaf partitions on which the index is to be 
> created or attached.

I think we should also add a note about the (now) non-constant nature
of the value, something along the lines of "This value is updated as
we're processing and discovering partitioned tables in the partition
hierarchy".

Kind regards,

Matthias van de Meent




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-02-01 Thread Ilya Gladyshev

> 1 февр. 2023 г., в 16:01, Alvaro Herrera  написал(а):
> 
> Hmm, count_leaf_partitions has to scan pg_inherits and do a syscache
> lookup for every single element therein ... this sounds slow. 
> 
> In one of the callsites, we already have the partition descriptor
> available.  We could just scan partdesc->is_leaf[] and add one for each
> 'true' value we see there.

The problem is that partdesc contains only direct children of the table and we 
need all the children down the inheritance tree to count the total number of 
leaf partitions in the first callsite.

> In the other callsite, we had the table open just a few lines before the
> place you call count_leaf_partitions.  Maybe we can rejigger things by
> examining its state before closing it: if relkind is not partitioned we
> know leaf_partitions=0, and only if partitioned we count leaf partitions.
> I think that would save some work.  I also wonder if it's worth writing
> a bespoke function for counting leaf partitions rather than relying on
> find_all_inheritors.

Sure, added this condition to avoid the extra work here.


v3-0001-create-index-progress-increment.patch
Description: Binary data





Re: Progress report of CREATE INDEX for nested partitioned tables

2023-02-01 Thread Alvaro Herrera
Hmm, count_leaf_partitions has to scan pg_inherits and do a syscache
lookup for every single element therein ... this sounds slow. 

In one of the callsites, we already have the partition descriptor
available.  We could just scan partdesc->is_leaf[] and add one for each
'true' value we see there.

In the other callsite, we had the table open just a few lines before the
place you call count_leaf_partitions.  Maybe we can rejigger things by
examining its state before closing it: if relkind is not partitioned we
know leaf_partitions=0, and only if partitioned we count leaf partitions.
I think that would save some work.  I also wonder if it's worth writing
a bespoke function for counting leaf partitions rather than relying on
find_all_inheritors.

I think there's probably not much point optimizing it further than that.
If there was, then we could think about creating a data representation
that we can build for the entire partitioning hierarchy in a single pass
with the count of leaf partitions that sit below each specific non-leaf;
but I think that's just over-engineering.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"After a quick R of TFM, all I can say is HOLY CR** THAT IS COOL! PostgreSQL was
amazing when I first started using it at 7.2, and I'm continually astounded by
learning new features and techniques made available by the continuing work of
the development team."
Berend Tober, http://archives.postgresql.org/pgsql-hackers/2007-08/msg01009.php




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-01-31 Thread Ilya Gladyshev


> 1 февр. 2023 г., в 08:29, Justin Pryzby  написал(а):
> 
> On Tue, Jan 31, 2023 at 07:32:20PM +0400, Ilya Gladyshev wrote:
>>> 17 янв. 2023 г., в 23:44, Tomas Vondra  
>>> написал(а):
>>> Do we actually need the new parts_done field? I mean, we already do
>>> track the value - at PROGRESS_CREATEIDX_PARTITIONS_DONE index in the
>>> st_progress_param array. Can't we simply read it from there? Then we
>>> would not have ABI issues with the new field added to IndexStmt.
>> 
>> I think it’s a good approach and it could be useful outside of scope of this 
>> patch too. So I have attached a patch, that introduces 
>> pgstat_progress_incr_param function for this purpose. There’s one thing I am 
>> not sure about, IIUC, we can assume that the only process that can write 
>> into MyBEEntry of the current backend is the current backend itself, 
>> therefore looping to get consistent reads from this array is not required. 
>> Please correct me, if I am wrong here.
> 
> Thanks for the updated patch.
> 
> I think you're right - pgstat_begin_read_activity() is used for cases
> involving other backends.  It ought to be safe for a process to read its
> own status bits, since we know they're not also being written.
> 
> You changed DefineIndex() to update progress for the leaf indexes' when
> called recursively.  The caller updates the progress for "attached"
> indexes, but not created ones.  That allows providing fine-granularity
> progress updates when using intermediate partitions, right ?  (Rather
> than updating the progress by more than one at a time in the case of
> intermediate partitioning).
> 
> If my understanding is right, that's subtle, and adds a bit of
> complexity to the current code, so could use careful commentary.  I
> suggest:
> 
> * If the index was attached, update progress for all its direct and
> * indirect leaf indexes all at once.  If the index was built by calling
> * DefineIndex() recursively, the called function is responsible for
> * updating the progress report for built indexes.
> 
> ...
> 
> * If this is the top-level index, we're done. When called recursively
> * for child tables, the done partition counter is incremented now,
> * rather than in the caller.

Yes, you are correct about the intended behavior, I added your comments to the 
patch.

> I guess you know that there were compiler warnings (related to your
> question).
> https://cirrus-ci.com/task/6571212386598912
> 
> pgstat_progress_incr_param() could call pgstat_progress_update_param()
> rather than using its own Assert() and WRITE_ACTIVITY calls.  I'm not
> sure which I prefer, though.
> 
> Also, there are whitespace/tab/style issues in
> pgstat_progress_incr_param().
> 
> -- 
> Justin

Thank you for the review, I fixed the aforementioned issues in the v2.


v2-0001-create-index-progress-increment.patch
Description: Binary data



Re: Progress report of CREATE INDEX for nested partitioned tables

2023-01-31 Thread Justin Pryzby
On Tue, Jan 31, 2023 at 07:32:20PM +0400, Ilya Gladyshev wrote:
> > 17 янв. 2023 г., в 23:44, Tomas Vondra  
> > написал(а):
> > Do we actually need the new parts_done field? I mean, we already do
> > track the value - at PROGRESS_CREATEIDX_PARTITIONS_DONE index in the
> > st_progress_param array. Can't we simply read it from there? Then we
> > would not have ABI issues with the new field added to IndexStmt.
> 
> I think it’s a good approach and it could be useful outside of scope of this 
> patch too. So I have attached a patch, that introduces 
> pgstat_progress_incr_param function for this purpose. There’s one thing I am 
> not sure about, IIUC, we can assume that the only process that can write into 
> MyBEEntry of the current backend is the current backend itself, therefore 
> looping to get consistent reads from this array is not required. Please 
> correct me, if I am wrong here.

Thanks for the updated patch.

I think you're right - pgstat_begin_read_activity() is used for cases
involving other backends.  It ought to be safe for a process to read its
own status bits, since we know they're not also being written.

You changed DefineIndex() to update progress for the leaf indexes' when
called recursively.  The caller updates the progress for "attached"
indexes, but not created ones.  That allows providing fine-granularity
progress updates when using intermediate partitions, right ?  (Rather
than updating the progress by more than one at a time in the case of
intermediate partitioning).

If my understanding is right, that's subtle, and adds a bit of
complexity to the current code, so could use careful commentary.  I
suggest:

* If the index was attached, update progress for all its direct and
* indirect leaf indexes all at once.  If the index was built by calling
* DefineIndex() recursively, the called function is responsible for
* updating the progress report for built indexes.

...

* If this is the top-level index, we're done. When called recursively
* for child tables, the done partition counter is incremented now,
* rather than in the caller.

I guess you know that there were compiler warnings (related to your
question).
https://cirrus-ci.com/task/6571212386598912

pgstat_progress_incr_param() could call pgstat_progress_update_param()
rather than using its own Assert() and WRITE_ACTIVITY calls.  I'm not
sure which I prefer, though.

Also, there are whitespace/tab/style issues in
pgstat_progress_incr_param().

-- 
Justin




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-01-31 Thread Ilya Gladyshev

> 17 янв. 2023 г., в 23:44, Tomas Vondra  
> написал(а):
> Do we actually need the new parts_done field? I mean, we already do
> track the value - at PROGRESS_CREATEIDX_PARTITIONS_DONE index in the
> st_progress_param array. Can't we simply read it from there? Then we
> would not have ABI issues with the new field added to IndexStmt.

I think it’s a good approach and it could be useful outside of scope of this 
patch too. So I have attached a patch, that introduces 
pgstat_progress_incr_param function for this purpose. There’s one thing I am 
not sure about, IIUC, we can assume that the only process that can write into 
MyBEEntry of the current backend is the current backend itself, therefore 
looping to get consistent reads from this array is not required. Please correct 
me, if I am wrong here.



0001-create-index-progress-increment.patch
Description: Binary data


Re: Progress report of CREATE INDEX for nested partitioned tables

2023-01-25 Thread Dean Rasheed
On Wed, 18 Jan 2023 at 15:25, Justin Pryzby  wrote:
>
> TBH, I think the best approach is what I did in:
> 0001-report-top-parent-progress-for-CREATE-INDEX.txt
>
> That's a minimal patch, ideal for backpatching.
>
> ..which defines/clarifies that the progress reporting is only for
> *direct* children.  That avoids the need to change any data structures,
> and it's what was probably intended by the original patch, which doesn't
> seem to have considered intermediate partitioned tables.
>
> I think it'd be fine to re-define that in some future release, to allow
> showing indirect children (probably only "leaves", and not intermediate
> partitioned tables).  Or "total_bytes" or other global progress.
>

Hmm. My expectation as a user is that partitions_total includes both
direct and indirect (leaf) child partitions, that it is set just once
at the start of the process, and that partitions_done increases from
zero to partitions_total as the index-build proceeds. I think that
should be achievable with a minimally invasive patch that doesn't
change any data structures.

I agree with all the review comments Tomas posted. In particular, this
shouldn't need any changes to IndexStmt. I think the best approach
would be to just add a new function to backend_progress.c that offsets
a specified progress parameter by a specified amount, so that you can
just increment partitions_done by one or more, at the appropriate
points.

Regards,
Dean




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-01-18 Thread Justin Pryzby
TBH, I think the best approach is what I did in:
0001-report-top-parent-progress-for-CREATE-INDEX.txt

That's a minimal patch, ideal for backpatching.

..which defines/clarifies that the progress reporting is only for
*direct* children.  That avoids the need to change any data structures,
and it's what was probably intended by the original patch, which doesn't
seem to have considered intermediate partitioned tables.

I think it'd be fine to re-define that in some future release, to allow
showing indirect children (probably only "leaves", and not intermediate
partitioned tables).  Or "total_bytes" or other global progress.

-- 
Justin




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-01-17 Thread Tomas Vondra



On 1/17/23 21:59, Justin Pryzby wrote:
> On Tue, Jan 17, 2023 at 08:44:36PM +0100, Tomas Vondra wrote:
>> On 1/9/23 09:44, Ilya Gladyshev wrote:
>>> On Sun, 2023-01-08 at 10:48 -0600, Justin Pryzby wrote:
 On Sat, Dec 17, 2022 at 08:30:02AM -0600, Justin Pryzby wrote:
> ...
>
> @committers: Is it okay to add nparts_done to IndexStmt ?

 Any hint about this ?

>>
>> AFAIK fields added at the end of a struct is seen as acceptable from the
>> ABI point of view. It's not risk-free, but we did that multiple times
>> when fixing bugs, IIRC.
> 
> My question isn't whether it's okay to add a field at the end of a
> struct in general, but rather whether it's acceptable to add this field
> at the end of this struct, and not because it's unsafe to do in a minor
> release, but whether someone is going to say that it's an abuse of the
> data structure.
> 

Ah, you mean whether it's the right place for the parameter?

I don't think it is, really. IndexStmt is meant to be a description of
the CREATE INDEX statement, not something that includes info about how
it's processed. But it's the only struct we pass to the DefineIndex for
child indexes, so the only alternatives I can think of is a global
variable and the (existing) param array field.

Nevertheless, ABI compatibility is still relevant for backbranches.


>> Do we actually need the new parts_done field? I mean, we already do
>> track the value - at PROGRESS_CREATEIDX_PARTITIONS_DONE index in the
>> st_progress_param array. Can't we simply read it from there? Then we
>> would not have ABI issues with the new field added to IndexStmt.
> 
> Good idea to try.
> 

OK

>> As for the earlier discussion about the "correct" behavior for leaf vs.
>> non-leaf partitions and whether to calculate partitions in advance:
>>
>> * I agree it's desirable to count partitions in advance, instead of
>> adding incrementally. The view is meant to provide "overview" of the
>> CREATE INDEX progress, and imagine you get
>>
>>   partitions_total partitions_done
>> 10   9
>>
>> so that you believe you're ~90% done. But then it jumps to the next
>> child and now you get
>>
>>   partitions_total partitions_done
>> 20  10
>>
>> which makes the view a bit useless for it's primary purpose, IMHO.
> 
> To be clear, that's the current, buggy behavior, and this thread is
> about fixing it.  The proposed patches all ought to avoid that.
> 
> But the bug isn't caused by not "calculating partitions in advance".
> Rather, the issue is that currently, the "total" is overwritten while
> recursing.
> 

You're right the issue us about overwriting the total - not sure what I
was thinking about when writing this. I guess I got distracted by the
discussion about "preliminary planning" etc. Sorry for the confusion.

> That's a separate question from whether indirect partitions are counted
> or not.
> 
>> I wonder if we could improve this to track the size of partitions for
>> total/done? That'd make leaf/non-leaf distinction unnecessary, because
>> non-leaf partitions have size 0.
> 
> Maybe, but it's out of scope for this patch.
> 

+1, it was just an idea for future.


-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-01-17 Thread Justin Pryzby
On Tue, Jan 17, 2023 at 08:44:36PM +0100, Tomas Vondra wrote:
> On 1/9/23 09:44, Ilya Gladyshev wrote:
> > On Sun, 2023-01-08 at 10:48 -0600, Justin Pryzby wrote:
> >> On Sat, Dec 17, 2022 at 08:30:02AM -0600, Justin Pryzby wrote:
> >>> ...
> >>>
> >>> @committers: Is it okay to add nparts_done to IndexStmt ?
> >>
> >> Any hint about this ?
> >>
> 
> AFAIK fields added at the end of a struct is seen as acceptable from the
> ABI point of view. It's not risk-free, but we did that multiple times
> when fixing bugs, IIRC.

My question isn't whether it's okay to add a field at the end of a
struct in general, but rather whether it's acceptable to add this field
at the end of this struct, and not because it's unsafe to do in a minor
release, but whether someone is going to say that it's an abuse of the
data structure.

> Do we actually need the new parts_done field? I mean, we already do
> track the value - at PROGRESS_CREATEIDX_PARTITIONS_DONE index in the
> st_progress_param array. Can't we simply read it from there? Then we
> would not have ABI issues with the new field added to IndexStmt.

Good idea to try.

> As for the earlier discussion about the "correct" behavior for leaf vs.
> non-leaf partitions and whether to calculate partitions in advance:
> 
> * I agree it's desirable to count partitions in advance, instead of
> adding incrementally. The view is meant to provide "overview" of the
> CREATE INDEX progress, and imagine you get
> 
>   partitions_total partitions_done
> 10   9
> 
> so that you believe you're ~90% done. But then it jumps to the next
> child and now you get
> 
>   partitions_total partitions_done
> 20  10
> 
> which makes the view a bit useless for it's primary purpose, IMHO.

To be clear, that's the current, buggy behavior, and this thread is
about fixing it.  The proposed patches all ought to avoid that.

But the bug isn't caused by not "calculating partitions in advance".
Rather, the issue is that currently, the "total" is overwritten while
recursing.

That's a separate question from whether indirect partitions are counted
or not.

> I wonder if we could improve this to track the size of partitions for
> total/done? That'd make leaf/non-leaf distinction unnecessary, because
> non-leaf partitions have size 0.

Maybe, but it's out of scope for this patch.

Thanks for looking.

-- 
Justin




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-01-17 Thread Tomas Vondra
On 1/9/23 09:44, Ilya Gladyshev wrote:
> On Sun, 2023-01-08 at 10:48 -0600, Justin Pryzby wrote:
>> On Sat, Dec 17, 2022 at 08:30:02AM -0600, Justin Pryzby wrote:
>>> ...
>>>
>>> @committers: Is it okay to add nparts_done to IndexStmt ?
>>
>> Any hint about this ?
>>

AFAIK fields added at the end of a struct is seen as acceptable from the
ABI point of view. It's not risk-free, but we did that multiple times
when fixing bugs, IIRC.

The primary risk is old extensions (built on older minor version)
running on new server, getting confused by new fields (and implied
shifts in the structs). But fields at the end should be safe - the
extension simply ignores the stuff at the end. The one problem would be
arrays of structs, because even a field at the end changes the array
stride. But I don't think we do that with IndexStmt ...

Of course, if the "old" extension itself allocates the struct and passes
it to core code, that might still be an issue, because it'll allocate a
smaller struct, and core might see bogus data at the end.

On the other hand, new extensions on old server may get confused too,
because it may try setting a field that does not exist.

So ultimately it's about weighing risks vs. benefits - evaluating
whether fixing the issue is actually worth it.

The question is if/how many such extensions messing with IndexStmt in
this way actually exist. That is, allocate IndexStmt (or array of it). I
haven't found any, but maybe some extensions for index or partition
management do it? Not sure.

But ...

Do we actually need the new parts_done field? I mean, we already do
track the value - at PROGRESS_CREATEIDX_PARTITIONS_DONE index in the
st_progress_param array. Can't we simply read it from there? Then we
would not have ABI issues with the new field added to IndexStmt.

>> This should be resolved before the "CIC on partitioned tables" patch,
>> which I think is otherwise done.
> 
> I suggest that we move on with the IndexStmt patch and see what the
> committers have to say about it. I have brushed the patch up a bit,
> fixing TODOs and adding docs as per our discussion above.
> 

I did take a look at the patch, so here are my 2c:

1) num_leaf_partitions says it's "excluding foreign tables" but then it
uses RELKIND_HAS_STORAGE() which excludes various others relkinds, e.g.
partitioned tables etc. Minor, but perhaps a bit confusing.

2) I'd probably say count_leaf_partitions() instead.

3) The new part in DefineIndex counting leaf partitions should have a
comment before

if (!OidIsValid(parentIndexId))
{ ... }

4) It's a bit confusing that one of the branches in DefineIndex just
sets stmt->parts_done without calling pgstat_progress_update_param
(while the other one does both). AFAICS the call is not needed because
we already updated it during the recursive DefineIndex call, but maybe
the comment should mention that?


As for the earlier discussion about the "correct" behavior for leaf vs.
non-leaf partitions and whether to calculate partitions in advance:

* I agree it's desirable to count partitions in advance, instead of
adding incrementally. The view is meant to provide "overview" of the
CREATE INDEX progress, and imagine you get

  partitions_total partitions_done
10   9

so that you believe you're ~90% done. But then it jumps to the next
child and now you get

  partitions_total partitions_done
20  10

which makes the view a bit useless for it's primary purpose, IMHO.


* I don't care very much about leaf vs. non-leaf partitions. If we
exclude non-leaf ones, fine with me. But the number of non-leaf ones
should be much smaller than leaf ones, and if the partition already has
a matching index that distorts the tracking too. Furthermore the
partitions may have different size etc. so the progress is only
approximate anyway.

I wonder if we could improve this to track the size of partitions for
total/done? That'd make leaf/non-leaf distinction unnecessary, because
non-leaf partitions have size 0.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-01-09 Thread Ilya Gladyshev
On Sun, 2023-01-08 at 10:48 -0600, Justin Pryzby wrote:
> On Sat, Dec 17, 2022 at 08:30:02AM -0600, Justin Pryzby wrote:
> > We have the common problem of too many patches.
> > 
> > https://www.postgresql.org/message-id/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel%40gmail.com
> > This changes the progress reporting to show indirect children as
> > "total", and adds a global variable to track recursion into
> > DefineIndex(), allowing it to be incremented without the value
> > being
> > lost to the caller.
> > 
> > https://www.postgresql.org/message-id/20221211063334.GB27893%40telsasoft.com
> > This also counts indirect children, but only increments the
> > progress
> > reporting in the parent.  This has the disadvantage that when
> > intermediate partitions are in use, the done_partitions counter
> > will
> > "jump" from (say) 20 to 30 without ever hitting 21-29.
> > 
> > https://www.postgresql.org/message-id/20221213044331.GJ27893%40telsasoft.com
> > This has two alternate patches:
> > - One patch changes to only update progress reporting of *direct*
> >   children.  This is minimal, but discourages any future plan to
> > track
> >   progress involving intermediate partitions with finer
> > granularity.
> > - A alternative patch adds IndexStmt.nparts_done, and allows
> > reporting
> >   fine-grained progress involving intermediate partitions.
> > 
> > https://www.postgresql.org/message-id/flat/039564d234fc3d014c555a7ee98be69a9e724836.ca...@gmail.com
> > This also reports progress of intermediate children.  The first
> > patch
> > does it by adding an argument to DefineIndex() (which isn't okay to
> > backpatch).  And an alternate patch does it by adding to IndexStmt.
> > 
> > @committers: Is it okay to add nparts_done to IndexStmt ?
> 
> Any hint about this ?
> 
> This should be resolved before the "CIC on partitioned tables" patch,
> which I think is otherwise done.

I suggest that we move on with the IndexStmt patch and see what the
committers have to say about it. I have brushed the patch up a bit,
fixing TODOs and adding docs as per our discussion above.

From 490d8afa7cb952e5b3947d81d85bf9690499e8a3 Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev 
Date: Tue, 13 Dec 2022 22:04:45 +0400
Subject: [PATCH v2] parts_done via IndexStmt

---
 doc/src/sgml/monitoring.sgml   |  4 +--
 src/backend/bootstrap/bootparse.y  |  2 ++
 src/backend/commands/indexcmds.c   | 57 +++---
 src/backend/parser/parse_utilcmd.c |  2 ++
 src/include/nodes/parsenodes.h |  1 +
 5 files changed, 59 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index cf220c3bcb..2c32a92364 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6565,7 +6565,7 @@ FROM pg_stat_get_backend_idset() AS backendid;
   
   
When creating an index on a partitioned table, this column is set to
-   the total number of partitions on which the index is to be created.
+   the total number of leaf partitions on which the index is to be created or attached.
This field is 0 during a REINDEX.
   
  
@@ -6576,7 +6576,7 @@ FROM pg_stat_get_backend_idset() AS backendid;
   
   
When creating an index on a partitioned table, this column is set to
-   the number of partitions on which the index has been created.
+   the number of leaf partitions on which the index has been created or attached.
This field is 0 during a REINDEX.
   
  
diff --git a/src/backend/bootstrap/bootparse.y b/src/backend/bootstrap/bootparse.y
index 86804bb598..dad9c38d90 100644
--- a/src/backend/bootstrap/bootparse.y
+++ b/src/backend/bootstrap/bootparse.y
@@ -296,6 +296,7 @@ Boot_DeclareIndexStmt:
 	stmt->concurrent = false;
 	stmt->if_not_exists = false;
 	stmt->reset_default_tblspc = false;
+	stmt->parts_done = -1;
 
 	/* locks and races need not concern us in bootstrap mode */
 	relationId = RangeVarGetRelid(stmt->relation, NoLock,
@@ -348,6 +349,7 @@ Boot_DeclareUniqueIndexStmt:
 	stmt->concurrent = false;
 	stmt->if_not_exists = false;
 	stmt->reset_default_tblspc = false;
+	stmt->parts_done = -1;
 
 	/* locks and races need not concern us in bootstrap mode */
 	relationId = RangeVarGetRelid(stmt->relation, NoLock,
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 8afc006f89..be4d1c884e 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -130,6 +130,29 @@ typedef struct ReindexErrorInfo
 	char		relkind;
 } ReindexErrorInfo;
 
+
+/*
+ * Count the number of direct and indirect leaf partitions, excluding foreign
+ * tables.
+ */
+static int
+num_leaf_partitions(Oid relid)
+{
+	int		nleaves = 0;
+	List	*childs = find_all_inheritors(relid, NoLock, NULL);
+	ListCell *lc;
+
+	foreach(lc, childs)
+	{
+		Oid	partrelid = lfirst_oid(lc);
+		if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))
+	

Re: Progress report of CREATE INDEX for nested partitioned tables

2023-01-08 Thread Justin Pryzby
On Sat, Dec 17, 2022 at 08:30:02AM -0600, Justin Pryzby wrote:
> We have the common problem of too many patches.
> 
> https://www.postgresql.org/message-id/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel%40gmail.com
> This changes the progress reporting to show indirect children as
> "total", and adds a global variable to track recursion into
> DefineIndex(), allowing it to be incremented without the value being
> lost to the caller.
> 
> https://www.postgresql.org/message-id/20221211063334.GB27893%40telsasoft.com
> This also counts indirect children, but only increments the progress
> reporting in the parent.  This has the disadvantage that when
> intermediate partitions are in use, the done_partitions counter will
> "jump" from (say) 20 to 30 without ever hitting 21-29.
> 
> https://www.postgresql.org/message-id/20221213044331.GJ27893%40telsasoft.com
> This has two alternate patches:
> - One patch changes to only update progress reporting of *direct*
>   children.  This is minimal, but discourages any future plan to track
>   progress involving intermediate partitions with finer granularity.
> - A alternative patch adds IndexStmt.nparts_done, and allows reporting
>   fine-grained progress involving intermediate partitions.
> 
> https://www.postgresql.org/message-id/flat/039564d234fc3d014c555a7ee98be69a9e724836.ca...@gmail.com
> This also reports progress of intermediate children.  The first patch
> does it by adding an argument to DefineIndex() (which isn't okay to
> backpatch).  And an alternate patch does it by adding to IndexStmt.
> 
> @committers: Is it okay to add nparts_done to IndexStmt ?

Any hint about this ?

This should be resolved before the "CIC on partitioned tables" patch,
which I think is otherwise done.




Re: Progress report of CREATE INDEX for nested partitioned tables

2022-12-17 Thread Justin Pryzby
On Tue, Dec 13, 2022 at 10:18:58PM +0400, Ilya Gladyshev wrote:
> > > I actually think that the progress view would be better off without
> > > the total number of partitions, 
> > 
> > Just curious - why ?
> 
> We don't really know how many indexes we are going to create, unless we
> have some kind of preliminary "planning" stage where we acumulate all
> the relations that will need to have indexes created (rather than
> attached). And if someone wants the total, it can be calculated
> manually without this view, it's less user-friendly, but if we can't do
> it well, I would leave it up to the user.

Thanks.  One other reason is that the partitions (and sub-partitions)
may not be equally sized.  Also, I've said before that it's weird to
report macroscopic progress about the number of partitions finihed in
the same place as reporting microscopic details like the number of
blocks done of the relation currently being processed.

> > I have another proposal: since the original patch 3.5 years ago
> > didn't
> > consider or account for sub-partitions, let's not start counting them
> > now.  It was never defined whether they were included or not (and I
> > guess that they're not common) so we can take this opportunity to
> > clarify the definition.
> 
> I have had this thought initially, but then I thought that it's not
> what I would want, if I was to track progress of multi-level
> partitioned tables (but yeah, I guess it's pretty uncommon). In this
> respect, I like your initial counter-proposal more, because it leaves
> us room to improve this in the future. Otherwise, if we commit to
> reporting only top-level partitions now, I'm not sure we will have the
> opportunity to change this.

We have the common problem of too many patches.

https://www.postgresql.org/message-id/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel%40gmail.com
This changes the progress reporting to show indirect children as
"total", and adds a global variable to track recursion into
DefineIndex(), allowing it to be incremented without the value being
lost to the caller.

https://www.postgresql.org/message-id/20221211063334.GB27893%40telsasoft.com
This also counts indirect children, but only increments the progress
reporting in the parent.  This has the disadvantage that when
intermediate partitions are in use, the done_partitions counter will
"jump" from (say) 20 to 30 without ever hitting 21-29.

https://www.postgresql.org/message-id/20221213044331.GJ27893%40telsasoft.com
This has two alternate patches:
- One patch changes to only update progress reporting of *direct*
  children.  This is minimal, but discourages any future plan to track
  progress involving intermediate partitions with finer granularity.
- A alternative patch adds IndexStmt.nparts_done, and allows reporting
  fine-grained progress involving intermediate partitions.

https://www.postgresql.org/message-id/flat/039564d234fc3d014c555a7ee98be69a9e724836.ca...@gmail.com
This also reports progress of intermediate children.  The first patch
does it by adding an argument to DefineIndex() (which isn't okay to
backpatch).  And an alternate patch does it by adding to IndexStmt.

@committers: Is it okay to add nparts_done to IndexStmt ?

-- 
Justin




Re: Progress report of CREATE INDEX for nested partitioned tables

2022-12-13 Thread Ilya Gladyshev
On Mon, 2022-12-12 at 22:43 -0600, Justin Pryzby wrote:
> On Mon, Dec 12, 2022 at 11:39:23PM +0400, Ilya Gladyshev wrote:
> > 
> > > Could you check what I've written as a counter-proposal ?
> > 
> > I think that this might be a good solution to start with, it gives
> > us the opportunity to improve the granularity later without any
> > surprising changes for the end user. We could use this patch for
> > previous versions and make more granular output in the latest. What
> > do you think?
> 
> Somehow, it hadn't occured to me that my patch "lost granularity" by
> incrementing the progress bar by more than one...  Shoot.
> 
> > I actually think that the progress view would be better off without
> > the total number of partitions, 
> 
> Just curious - why ?

We don't really know how many indexes we are going to create, unless we
have some kind of preliminary "planning" stage where we acumulate all
the relations that will need to have indexes created (rather than
attached). And if someone wants the total, it can be calculated
manually without this view, it's less user-friendly, but if we can't do
it well, I would leave it up to the user.

> 
> > With this in mind, I think your proposal to exclude catalog-only
> > indexes sounds reasonable to me, but I feel like the docs are off
> > in this case, because the attached indexes are not created, but we
> > pretend like they are in this metric, so we should fix one or the
> > other.
> 
> I agree that the docs should indicate whether we're counting "all
> partitions", "direct partitions", and whether or not that includes
> partitioned partitions, or just leaf partitions.

Agree. I think that docs should also be explicit about the attached
indexes, if we decide to count them in as "created".

> I have another proposal: since the original patch 3.5 years ago
> didn't
> consider or account for sub-partitions, let's not start counting them
> now.  It was never defined whether they were included or not (and I
> guess that they're not common) so we can take this opportunity to
> clarify the definition.

I have had this thought initially, but then I thought that it's not
what I would want, if I was to track progress of multi-level
partitioned tables (but yeah, I guess it's pretty uncommon). In this
respect, I like your initial counter-proposal more, because it leaves
us room to improve this in the future. Otherwise, if we commit to
reporting only top-level partitions now, I'm not sure we will have the
opportunity to change this.


> Alternately, if it's okay to add nparts_done to the IndexStmt, then
> that's easy.

Yeah, or we could add another argument to DefineIndex. I don't know if
it's ok, or which option is better here in terms of compatibility and
interface-wise, so I have tried both of them, see the attached patches.



From 4f55526f8a534d6b2f27b6b17cadaee0370e4cbc Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev 
Date: Tue, 13 Dec 2022 19:42:53 +0400
Subject: [PATCH] parts_done via DefineIndex args

---
 src/backend/commands/indexcmds.c | 52 +++-
 src/backend/commands/tablecmds.c |  7 +++--
 src/backend/tcop/utility.c   |  3 +-
 src/include/commands/defrem.h|  3 +-
 4 files changed, 53 insertions(+), 12 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index b5b860c3ab..9fa9109b9e 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -129,6 +129,29 @@ typedef struct ReindexErrorInfo
charrelkind;
 } ReindexErrorInfo;
 
+
+/*
+ * Count the number of direct and indirect leaf partitions, excluding foreign
+ * tables.
+ */
+static int
+num_leaf_partitions(Oid relid)
+{
+   int nleaves = 0;
+   List*childs = find_all_inheritors(relid, NoLock, NULL);
+   ListCell *lc;
+
+   foreach(lc, childs)
+   {
+   Oid partrelid = lfirst_oid(lc);
+   if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))
+   nleaves++;
+   }
+
+   list_free(childs);
+   return nleaves;
+}
+
 /*
  * CheckIndexCompatible
  * Determine whether an existing index definition is compatible 
with a
@@ -530,7 +553,8 @@ DefineIndex(Oid relationId,
bool check_rights,
bool check_not_in_use,
bool skip_build,
-   bool quiet)
+   bool quiet,
+   int *parts_done)
 {
boolconcurrent;
char   *indexRelationName;
@@ -568,6 +592,7 @@ DefineIndex(Oid relationId,
Oid root_save_userid;
int root_save_sec_context;
int root_save_nestlevel;
+   int root_parts_done;
 
root_save_nestlevel = NewGUCNestLevel();
 
@@ -1218,8 +1243,17 @@ DefineIndex(Oid relationId,
RelationparentIndex;

Re: Progress report of CREATE INDEX for nested partitioned tables

2022-12-12 Thread Justin Pryzby
On Mon, Dec 12, 2022 at 11:39:23PM +0400, Ilya Gladyshev wrote:
> 
> > Could you check what I've written as a counter-proposal ?
> 
> I think that this might be a good solution to start with, it gives us the 
> opportunity to improve the granularity later without any surprising changes 
> for the end user. We could use this patch for previous versions and make more 
> granular output in the latest. What do you think?

Somehow, it hadn't occured to me that my patch "lost granularity" by
incrementing the progress bar by more than one...  Shoot.

> I actually think that the progress view would be better off without the total 
> number of partitions, 

Just curious - why ?

> With this in mind, I think your proposal to exclude catalog-only indexes 
> sounds reasonable to me, but I feel like the docs are off in this case, 
> because the attached indexes are not created, but we pretend like they are in 
> this metric, so we should fix one or the other.

I agree that the docs should indicate whether we're counting "all
partitions", "direct partitions", and whether or not that includes
partitioned partitions, or just leaf partitions.

I have another proposal: since the original patch 3.5 years ago didn't
consider or account for sub-partitions, let's not start counting them
now.  It was never defined whether they were included or not (and I
guess that they're not common) so we can take this opportunity to
clarify the definition.

Alternately, if it's okay to add nparts_done to the IndexStmt, then
that's easy.

-- 
Justin
>From 2e93bd37ca3c8add1f8e3e44a9f3906c332b83f2 Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev 
Date: Fri, 9 Dec 2022 23:17:29 +0400
Subject: [PATCH] report top parent progress for CREATE INDEX

! add asserts, avoid global var
! do not count intermediate/nested/sub-partitions
---
 src/backend/commands/indexcmds.c  | 30 ++-
 src/backend/utils/activity/backend_progress.c | 79 +++
 src/include/nodes/parsenodes.h|  2 +
 3 files changed, 107 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index b5b860c3abf..6e6bba9d3a9 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1218,8 +1218,28 @@ DefineIndex(Oid relationId,
 			Relation	parentIndex;
 			TupleDesc	parentDesc;
 
-			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
-		 nparts);
+			if (!OidIsValid(parentIndexId))
+			{
+int			nleaves = 0;
+List		*childs;
+ListCell	*lc;
+
+/*
+ * Count the number of physical children, excluding foreign
+ * tables, intermediate partitioned tables, as well as the
+ * partitioned index itself.
+ */
+childs = find_all_inheritors(relationId, NoLock, NULL);
+foreach(lc, childs)
+{
+	Oid		partrelid = lfirst_oid(lc);
+	if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))
+		nleaves++;
+}
+
+pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+			 nleaves);
+			}
 
 			/* Make a local copy of partdesc->oids[], just for safety */
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
@@ -1431,8 +1451,10 @@ DefineIndex(Oid relationId,
 		   child_save_sec_context);
 }
 
-pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
-			 i + 1);
+if (RELKIND_HAS_STORAGE(get_rel_relkind(childRelid)))
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
+ ++stmt->nparts_done);
+
 free_attrmap(attmap);
 			}
 
diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c
index f29199725b7..c661ad94782 100644
--- a/src/backend/utils/activity/backend_progress.c
+++ b/src/backend/utils/activity/backend_progress.c
@@ -10,6 +10,7 @@
  */
 #include "postgres.h"
 
+#include "commands/progress.h"
 #include "port/atomics.h"		/* for memory barriers */
 #include "utils/backend_progress.h"
 #include "utils/backend_status.h"
@@ -37,6 +38,82 @@ pgstat_progress_start_command(ProgressCommandType cmdtype, Oid relid)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+static void
+pgstat_progress_asserts(void)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+	volatile int64		*a = beentry->st_progress_param;
+
+	/*
+	 * If the command fails due to interrupt or error, the values may be
+	 * less than rather than equal to expected, final value.
+	 */
+
+	switch (beentry->st_progress_command)
+	{
+	case PROGRESS_COMMAND_VACUUM:
+		Assert(a[PROGRESS_VACUUM_HEAP_BLKS_SCANNED] <=
+a[PROGRESS_VACUUM_TOTAL_HEAP_BLKS]);
+		Assert(a[PROGRESS_VACUUM_HEAP_BLKS_VACUUMED] <=
+a[PROGRESS_VACUUM_TOTAL_HEAP_BLKS]);
+		Assert(a[PROGRESS_VACUUM_NUM_DEAD_TUPLES] <=
+a[PROGRESS_VACUUM_MAX_DEAD_TUPLES]);
+		break;
+
+	case PROGRESS_COMMAND_ANALYZE:
+		Assert(a[PROGRESS_ANALYZE_BLOCKS_DONE] <=
+a[PROGRESS_ANALYZE_BLOCKS_TOTAL]);
+		/* Extended stats can be skipped */
+		

Re: Progress report of CREATE INDEX for nested partitioned tables

2022-12-12 Thread Ilya Gladyshev

> Could you check what I've written as a counter-proposal ?

I think that this might be a good solution to start with, it gives us the 
opportunity to improve the granularity later without any surprising changes for 
the end user. We could use this patch for previous versions and make more 
granular output in the latest. What do you think?

> As long as we're changing partitions_done to include nested
> sub-partitions, it seems to me like we should exclude intermediate
> "catalog-only" partitioned indexes, and count only physical leaf
> partitions.  Should it alo exclude any children with matching indexes,
> which will also be catalog-only changes?  Probably not.
> 
> The docs say:
> |When creating an index on a partitioned table, this column is set to the
> |total number of partitions on which the index is to be created. This
> |field is 0 during a REINDEX.

I agree with you on catalog-only partitioned indexes, but I think that in the 
perfect world we should exclude all the relations where the index isn’t 
actually created, so that means excluding attached indexes as well. However, 
IMO doing it this way will require too much of a code rewrite for quite a minor 
feature (but we could do it, ofc). I actually think that the progress view 
would be better off without the total number of partitions, but I’m not sure we 
have this option now. With this in mind, I think your proposal to exclude 
catalog-only indexes sounds reasonable to me, but I feel like the docs are off 
in this case, because the attached indexes are not created, but we pretend like 
they are in this metric, so we should fix one or the other.

> 
>> I changed current behaviour to report the total number of partitions in
>> the inheritance tree and fixed recursion in the attached patch. I used
>> a static variable to keep the counter to avoid ABI breakage of
>> DefineIndex, so that we could backpatch this to previous versions.
> 
> I wrote a bunch of assertions for this, which seems to have uncovered an
> similar issue with COPY progress reporting, dating to 8a4f618e7.  I'm
> not sure the assertions are okay.  I imagine they may break other
> extensions, as with file_fdw.
> 
> -- 
> Justin
> <0001-fix-progress-reporting-of-nested-partitioned-indexes.patch>



Re: Progress report of CREATE INDEX for nested partitioned tables

2022-12-10 Thread Justin Pryzby
On Sat, Dec 10, 2022 at 12:18:32PM +0400, Ilya Gladyshev wrote:
> Hi,
> 
> I have noticed that progress reporting for CREATE INDEX of partitioned
> tables seems to be working poorly for nested partitioned tables. In
> particular, it overwrites total and done partitions count when it
> recurses down to child partitioned tables and it only reports top-level
> partitions. So it's not hard to see something like this during CREATE
> INDEX now:
> 
> postgres=# select partitions_total, partitions_done from
> pg_stat_progress_create_index ;
>  partitions_total | partitions_done 
> --+-
> 1 |   2
> (1 row)

Yeah.  I didn't verify, but it looks like this is a bug going to back to
v12.  As you said, when called recursively, DefineIndex() clobbers the
number of completed partitions.

Maybe DefineIndex() could flatten the list of partitions.  But I don't
think that can work easily with iteration rather than recursion.

Could you check what I've written as a counter-proposal ?

As long as we're changing partitions_done to include nested
sub-partitions, it seems to me like we should exclude intermediate
"catalog-only" partitioned indexes, and count only physical leaf
partitions.  Should it alo exclude any children with matching indexes,
which will also be catalog-only changes?  Probably not.

The docs say:
|When creating an index on a partitioned table, this column is set to the
|total number of partitions on which the index is to be created. This
|field is 0 during a REINDEX.

> I changed current behaviour to report the total number of partitions in
> the inheritance tree and fixed recursion in the attached patch. I used
> a static variable to keep the counter to avoid ABI breakage of
> DefineIndex, so that we could backpatch this to previous versions.

I wrote a bunch of assertions for this, which seems to have uncovered an
similar issue with COPY progress reporting, dating to 8a4f618e7.  I'm
not sure the assertions are okay.  I imagine they may break other
extensions, as with file_fdw.

-- 
Justin
>From b8077babf9a101f9d1bf41dd1ad866d2ea38b603 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 10 Dec 2022 16:17:50 -0600
Subject: [PATCH] fix progress reporting of nested, partitioned indexes..

---
 src/backend/commands/indexcmds.c  | 51 +++--
 src/backend/utils/activity/backend_progress.c | 72 +++
 2 files changed, 118 insertions(+), 5 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index b5b860c3abf..6caa1f94ed7 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -482,6 +482,26 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 	}
 }
 
+/*
+ * Count the number of direct and indirect leaf partitions, excluding foreign
+ * tables.
+ */
+static int
+num_leaf_partitions(Oid relid)
+{
+	int		nleaves = 0;
+	List	*childs = find_all_inheritors(relid, NoLock, NULL);
+	ListCell *lc;
+
+	foreach(lc, childs)
+	{
+		Oid	partrelid = lfirst_oid(lc);
+		if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))
+			nleaves++;
+	}
+
+	return nleaves;
+}
 
 /*
  * DefineIndex
@@ -1212,14 +1232,22 @@ DefineIndex(Oid relationId,
 		partdesc = RelationGetPartitionDesc(rel, true);
 		if ((!stmt->relation || stmt->relation->inh) && partdesc->nparts > 0)
 		{
-			int			nparts = partdesc->nparts;
+			int			nparts = partdesc->nparts; /* only direct children */
+			int			nparts_done = 0; /* direct and indirect children */
 			Oid		   *part_oids = palloc_array(Oid, nparts);
 			bool		invalidate_parent = false;
 			Relation	parentIndex;
 			TupleDesc	parentDesc;
 
-			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
-		 nparts);
+			if (!OidIsValid(parentIndexId))
+			{
+/*
+ * Report the number of leaf partitions (excluding foreign
+ * tables), not just direct children.
+ */
+pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+			 num_leaf_partitions(relationId));
+			}
 
 			/* Make a local copy of partdesc->oids[], just for safety */
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
@@ -1431,8 +1459,21 @@ DefineIndex(Oid relationId,
 		   child_save_sec_context);
 }
 
-pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
-			 i + 1);
+if (!OidIsValid(parentIndexId))
+{
+	/*
+	 * Report progress only when processing top-level indexes.
+	 * When recursing, the called functions wouldn't know the
+	 * current number of partitions which were processed.
+	 * After recursing, increment by the number of children.
+	 * This works also for physical/leaf partitions.
+	 */
+	nparts_done += num_leaf_partitions(childRelid);
+
+	pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
+ nparts_done);
+}
+
 free_attrmap(attmap);
 			}
 
diff --git 

Progress report of CREATE INDEX for nested partitioned tables

2022-12-10 Thread Ilya Gladyshev
Hi,

I have noticed that progress reporting for CREATE INDEX of partitioned
tables seems to be working poorly for nested partitioned tables. In
particular, it overwrites total and done partitions count when it
recurses down to child partitioned tables and it only reports top-level
partitions. So it's not hard to see something like this during CREATE
INDEX now:

postgres=# select partitions_total, partitions_done from
pg_stat_progress_create_index ;
 partitions_total | partitions_done 
--+-
1 |   2
(1 row)


I changed current behaviour to report the total number of partitions in
the inheritance tree and fixed recursion in the attached patch. I used
a static variable to keep the counter to avoid ABI breakage of
DefineIndex, so that we could backpatch this to previous versions.

Thanks,
Ilya Gladyshev
From 342caa3d4ce273b14a6998c20c32b862d2d4c890 Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev 
Date: Fri, 9 Dec 2022 23:17:29 +0400
Subject: [PATCH] report top parent progress for CREATE INDEX

---
 src/backend/commands/indexcmds.c | 36 ++--
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index b5b860c3ab..80557dad8d 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -129,6 +129,12 @@ typedef struct ReindexErrorInfo
 	char		relkind;
 } ReindexErrorInfo;
 
+/*
+ * Used to report the number of partitions,
+ * that were processed during index creation.
+ */
+static int create_index_parts_done;
+
 /*
  * CheckIndexCompatible
  *		Determine whether an existing index definition is compatible with a
@@ -1218,8 +1224,18 @@ DefineIndex(Oid relationId,
 			Relation	parentIndex;
 			TupleDesc	parentDesc;
 
-			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
-		 nparts);
+			if (!OidIsValid(parentIndexId))
+			{
+List *inhs;
+
+/* Reset counter of done partitions when processing root index */
+create_index_parts_done = 0;
+inhs = find_all_inheritors(relationId, NoLock, NULL);
+/* inheritance tree size without root itself */
+pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+			 list_length(inhs) - 1);
+list_free(inhs);
+			}
 
 			/* Make a local copy of partdesc->oids[], just for safety */
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
@@ -1431,8 +1447,6 @@ DefineIndex(Oid relationId,
 		   child_save_sec_context);
 }
 
-pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
-			 i + 1);
 free_attrmap(attmap);
 			}
 
@@ -1465,13 +1479,17 @@ DefineIndex(Oid relationId,
 
 		/*
 		 * Indexes on partitioned tables are not themselves built, so we're
-		 * done here.
+		 * done here. If it is a child partitioned table, increment done parts counter.
 		 */
 		AtEOXact_GUC(false, root_save_nestlevel);
 		SetUserIdAndSecContext(root_save_userid, root_save_sec_context);
 		table_close(rel, NoLock);
 		if (!OidIsValid(parentIndexId))
 			pgstat_progress_end_command();
+		else
+			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
+		 ++create_index_parts_done);
+
 		return address;
 	}
 
@@ -1483,9 +1501,15 @@ DefineIndex(Oid relationId,
 		/* Close the heap and we're done, in the non-concurrent case */
 		table_close(rel, NoLock);
 
-		/* If this is the top-level index, we're done. */
+		/*
+		 * If this is the top-level index, we're done, otherwise, increment
+		 * done partition counter.
+		 */
 		if (!OidIsValid(parentIndexId))
 			pgstat_progress_end_command();
+		else
+			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
+		 ++create_index_parts_done);
 
 		return address;
 	}
-- 
2.30.2