Re: error context for vacuum to include block number

2020-04-01 Thread Alvaro Herrera
On 2020-Apr-01, Andres Freund wrote:

> On 2020-04-01 07:54:45 +0530, Amit Kapila wrote:
> > Pushed. I think we are done here.  The patch is marked as committed in
> > CF.  Thank you!
> 
> Awesome! Thanks for all your work on this, all. This'll make it a lot
> easier to debug errors during autovacuum.

Seconded!

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-04-01 Thread Andres Freund
On 2020-04-01 07:54:45 +0530, Amit Kapila wrote:
> Pushed. I think we are done here.  The patch is marked as committed in
> CF.  Thank you!

Awesome! Thanks for all your work on this, all. This'll make it a lot
easier to debug errors during autovacuum.




Re: error context for vacuum to include block number

2020-03-31 Thread Amit Kapila
On Tue, Mar 31, 2020 at 8:53 AM Justin Pryzby  wrote:
>
> On Tue, Mar 31, 2020 at 07:50:45AM +0530, Amit Kapila wrote:
> > One thing I have noticed is that there is some saving by using
> > vacrelstats->relnamespace as that avoids sys cache lookup.  OTOH,
> > using vacrelstats->relname doesn't save much, but maybe for the sake
> > of consistency, we can use it.
>
> Mostly I wrote that to avoid repeatedly calling functions/macro with long 
> name.
> I consider it a minor cleanup.  I think we should put them to use.  The
> LVRelStats describes them as not being specifically for the error context.
>

Pushed. I think we are done here.  The patch is marked as committed in
CF.  Thank you!


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-30 Thread Justin Pryzby
On Tue, Mar 31, 2020 at 07:50:45AM +0530, Amit Kapila wrote:
> On Mon, Mar 30, 2020 at 9:56 PM Justin Pryzby  wrote:
> >
> > On Mon, Mar 30, 2020 at 02:31:53PM +0530, Amit Kapila wrote:
> > > The v37-0003-Avoid-some-calls-to-RelationGetRelationName.patch looks
> > > good to me.  I have added the commit message in the patch.
> >
> > I realized the 0003 patch has an error in lazy_vacuum_index; it should be:
> >
> > -   RelationGetRelationName(indrel),
> > +   vacrelstats->indname,
> >
> 
> Hmm, it is like that in the patch I have sent yesterday.  Are you
> referring to the patch I have sent yesterday or some older version?

Oh good.  That was a recent fix I made, and I was afraid I'd never sent it, and
not sure if you'd used it.  Looks like it was fixed since v36...  As you can
see, I'm losing track of my branches.  It will be nice to finally put this to
rest.

> One thing I have noticed is that there is some saving by using
> vacrelstats->relnamespace as that avoids sys cache lookup.  OTOH,
> using vacrelstats->relname doesn't save much, but maybe for the sake
> of consistency, we can use it.

Mostly I wrote that to avoid repeatedly calling functions/macro with long name.
I consider it a minor cleanup.  I think we should put them to use.  The
LVRelStats describes them as not being specifically for the error context.

-- 
Justin




Re: error context for vacuum to include block number

2020-03-30 Thread Amit Kapila
On Mon, Mar 30, 2020 at 9:56 PM Justin Pryzby  wrote:
>
> On Mon, Mar 30, 2020 at 02:31:53PM +0530, Amit Kapila wrote:
> > The v37-0003-Avoid-some-calls-to-RelationGetRelationName.patch looks
> > good to me.  I have added the commit message in the patch.
>
> I realized the 0003 patch has an error in lazy_vacuum_index; it should be:
>
> -   RelationGetRelationName(indrel),
> +   vacrelstats->indname,
>

Hmm, it is like that in the patch I have sent yesterday.  Are you
referring to the patch I have sent yesterday or some older version?
One thing I have noticed is that there is some saving by using
vacrelstats->relnamespace as that avoids sys cache lookup.  OTOH,
using vacrelstats->relname doesn't save much, but maybe for the sake
of consistency, we can use it.

> That was maybe due to originally using a separate errinfo for each phase, with
> one "char *relname" and no "char *indrel".
>
> > I don't think the above change is correct.  How will vacrelstats have
> > correct values when vacuum_one_index is called via parallel workers
> > (via parallel_vacuum_main)?
>
> You're right: parallel main's vacrelstats was added by this patchset and only
> the error context fields were initialized.  I fixed it up in the attached by
> also setting vacrelstats->new_rel_tuples and old_live_tuples.  It's not clear
> if this is worth it just to save an argument to two functions?
>

Right, it is not clear to me whether that is an improvement, so I
suggest let's leave that patch for now.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-30 Thread Justin Pryzby
On Mon, Mar 30, 2020 at 02:31:53PM +0530, Amit Kapila wrote:
> Now that the main patch is committed, I have reviewed the other two patches.

Thanks for that

On Mon, Mar 30, 2020 at 02:31:53PM +0530, Amit Kapila wrote:
> The v37-0003-Avoid-some-calls-to-RelationGetRelationName.patch looks
> good to me.  I have added the commit message in the patch.

I realized the 0003 patch has an error in lazy_vacuum_index; it should be:

-   RelationGetRelationName(indrel),
+   vacrelstats->indname,

That was maybe due to originally using a separate errinfo for each phase, with
one "char *relname" and no "char *indrel".

> I don't think the above change is correct.  How will vacrelstats have
> correct values when vacuum_one_index is called via parallel workers
> (via parallel_vacuum_main)?

You're right: parallel main's vacrelstats was added by this patchset and only
the error context fields were initialized.  I fixed it up in the attached by
also setting vacrelstats->new_rel_tuples and old_live_tuples.  It's not clear
if this is worth it just to save an argument to two functions?

-- 
Justin
>From 85672d7f071c91f3ec9190be7feb293f0e49cf8a Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 26 Feb 2020 19:22:55 -0600
Subject: [PATCH v39 1/2] Avoid some calls to RelationGetRelationName

---
 src/backend/access/heap/vacuumlazy.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0d2e724a7d..803e7660f7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -655,8 +655,8 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
 			}
 			appendStringInfo(, msgfmt,
 			 get_database_name(MyDatabaseId),
-			 get_namespace_name(RelationGetNamespace(onerel)),
-			 RelationGetRelationName(onerel),
+			 vacrelstats->relnamespace,
+			 vacrelstats->relname,
 			 vacrelstats->num_index_scans);
 			appendStringInfo(, _("pages: %u removed, %u remain, %u skipped due to pins, %u skipped frozen\n"),
 			 vacrelstats->pages_removed,
@@ -827,7 +827,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			if (params->nworkers > 0)
 ereport(WARNING,
 		(errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
-RelationGetRelationName(onerel;
+vacrelstats->relname)));
 		}
 		else
 			lps = begin_parallel_vacuum(RelationGetRelid(onerel), Irel,
@@ -1722,7 +1722,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	if (vacuumed_pages)
 		ereport(elevel,
 (errmsg("\"%s\": removed %.0f row versions in %u pages",
-		RelationGetRelationName(onerel),
+		vacrelstats->relname,
 		tups_vacuumed, vacuumed_pages)));
 
 	/*
@@ -1751,7 +1751,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 
 	ereport(elevel,
 			(errmsg("\"%s\": found %.0f removable, %.0f nonremovable row versions in %u out of %u pages",
-	RelationGetRelationName(onerel),
+	vacrelstats->relname,
 	tups_vacuumed, num_tuples,
 	vacrelstats->scanned_pages, nblocks),
 			 errdetail_internal("%s", buf.data)));
@@ -1883,7 +1883,7 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 
 	ereport(elevel,
 			(errmsg("\"%s\": removed %d row versions in %d pages",
-	RelationGetRelationName(onerel),
+	vacrelstats->relname,
 	tupindex, npages),
 			 errdetail_internal("%s", pg_rusage_show(;
 
@@ -2431,7 +2431,7 @@ lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
 
 	ereport(elevel,
 			(errmsg(msg,
-	RelationGetRelationName(indrel),
+	vacrelstats->indname,
 	dead_tuples->num_tuples),
 			 errdetail_internal("%s", pg_rusage_show(;
 
@@ -2602,7 +2602,7 @@ lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats)
 vacrelstats->lock_waiter_detected = true;
 ereport(elevel,
 		(errmsg("\"%s\": stopping truncate due to conflicting lock request",
-RelationGetRelationName(onerel;
+vacrelstats->relname)));
 return;
 			}
 
@@ -2668,7 +2668,7 @@ lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats)
 
 		ereport(elevel,
 (errmsg("\"%s\": truncated %u to %u pages",
-		RelationGetRelationName(onerel),
+		vacrelstats->relname,
 		old_rel_pages, new_rel_pages),
  errdetail_internal("%s",
 	pg_rusage_show(;
@@ -2733,7 +2733,7 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats)
 {
 	ereport(elevel,
 			(errmsg("\"%s\": suspending truncate due to conflicting lock request",
-	RelationGetRelationName(onerel;
+	vacrelstats->relname)));
 
 	vacrelstats->lock_waiter_detected = true;
 	return blkno;
-- 
2.17.0

>From e69a3feb5dcf3860a34771c826cd3a1bdfbf9c83 Mon Sep 17 

Re: error context for vacuum to include block number

2020-03-30 Thread Amit Kapila
On Fri, Mar 27, 2020 at 5:03 AM Justin Pryzby  wrote:
>

Now that the main patch is committed, I have reviewed the other two patches.

v37-0002-Drop-reltuples
1.
@@ -2289,11 +2289,10 @@ vacuum_one_index(Relation indrel,
IndexBulkDeleteResult **stats,

  /* Do vacuum or cleanup of the index */
  if (lvshared->for_cleanup)
- lazy_cleanup_index(indrel, stats, lvshared->reltuples,
-lvshared->estimated_count, vacrelstats);
+ lazy_cleanup_index(indrel, stats, vacrelstats);
  else
  lazy_vacuum_index(indrel, stats, dead_tuples,
-   lvshared->reltuples, vacrelstats);
+   vacrelstats);

I don't think the above change is correct.  How will vacrelstats have
correct values when vacuum_one_index is called via parallel workers
(via parallel_vacuum_main)?

The v37-0003-Avoid-some-calls-to-RelationGetRelationName.patch looks
good to me.  I have added the commit message in the patch.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v38-0001-Avoid-some-calls-to-RelationGetRelationName-and-.patch
Description: Binary data


Re: error context for vacuum to include block number

2020-03-29 Thread Amit Kapila
On Sun, Mar 29, 2020 at 11:35 AM Amit Kapila  wrote:
>
> On Sun, Mar 29, 2020 at 9:04 AM Masahiko Sawada
>  wrote:
> >
> > On Sat, 28 Mar 2020 at 13:23, Amit Kapila  wrote:
> > >
> > >
> > > Please find attached the updated patch with all the changes discussed.
> > > Let me know if I have missed anything?
> > >
> >
> > Thank you for updating the patch! Looks good to me.
> >
>
> Okay, I will push this tomorrow.
>

Pushed.  I see one buildfarm failure [1] but that doesn't seem to be
related to this patch.

[1] - 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura=2020-03-30%2002%3A20%3A03

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-29 Thread Amit Kapila
On Sun, Mar 29, 2020 at 9:04 AM Masahiko Sawada
 wrote:
>
> On Sat, 28 Mar 2020 at 13:23, Amit Kapila  wrote:
> >
> >
> > Please find attached the updated patch with all the changes discussed.
> > Let me know if I have missed anything?
> >
>
> Thank you for updating the patch! Looks good to me.
>

Okay, I will push this tomorrow.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-28 Thread Masahiko Sawada
On Sat, 28 Mar 2020 at 13:23, Amit Kapila  wrote:
>
> On Sat, Mar 28, 2020 at 7:04 AM Justin Pryzby  wrote:
> >
> > On Sat, Mar 28, 2020 at 06:59:10AM +0530, Amit Kapila wrote:
> > > On Sat, Mar 28, 2020 at 6:46 AM Justin Pryzby  
> > > wrote:
> > > >
> > > > On Sat, Mar 28, 2020 at 06:28:38AM +0530, Amit Kapila wrote:
> > > > > > Hm, but I caused a crash *without* adding CHECK_FOR_INTERRUPTS, just
> > > > > > kill+sleep.  The kill() could come from running 
> > > > > > pg_cancel_backend().  And the
> > > > > > sleep() just encourages a context switch, which can happen at any 
> > > > > > time.
> > > > >
> > > > > pg_sleep internally uses CHECK_FOR_INTERRUPTS() due to which it would
> > > > > have accepted the signal sent via pg_cancel_backend().  Can you try
> > > > > your scenario by temporarily removing CHECK_FOR_INTERRUPTS from
> > > > > pg_sleep() or maybe better by using OS Sleep call?
> > > >
> > > > Ah, that explains it.  Right, I'm not able to induce a crash with 
> > > > usleep().
> > > >
> > > > Do you want me to resend a patch without that change ?  I feel like 
> > > > continuing
> > > > to trade patches is more likely to introduce new errors or lose someone 
> > > > else's
> > > > changes than to make much progress.  The patch has been through enough
> > > > iterations and it's very easy to miss an issue if I try to eyeball it.
> > >
> > > I can do it but we have to agree on the other two points (a) I still
> > > feel that switching to the truncate phase should be done at the place
> > > from where we are calling lazy_truncate_heap and (b)
> > > lazy_cleanup_index should switch back the error phase after calling
> > > index_vacuum_cleanup.  I have explained my reasoning for these points
> > > a few emails back.
> >
> > I have no objection to either.  It was intuitive to me to do it how I
> > originally wrote it but I'm not wedded to it.
> >
>
> Please find attached the updated patch with all the changes discussed.
> Let me know if I have missed anything?
>

Thank you for updating the patch! Looks good to me.

Regards,

-- 
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-03-27 Thread Amit Kapila
On Sat, Mar 28, 2020 at 7:04 AM Justin Pryzby  wrote:
>
> On Sat, Mar 28, 2020 at 06:59:10AM +0530, Amit Kapila wrote:
> > On Sat, Mar 28, 2020 at 6:46 AM Justin Pryzby  wrote:
> > >
> > > On Sat, Mar 28, 2020 at 06:28:38AM +0530, Amit Kapila wrote:
> > > > > Hm, but I caused a crash *without* adding CHECK_FOR_INTERRUPTS, just
> > > > > kill+sleep.  The kill() could come from running pg_cancel_backend().  
> > > > > And the
> > > > > sleep() just encourages a context switch, which can happen at any 
> > > > > time.
> > > >
> > > > pg_sleep internally uses CHECK_FOR_INTERRUPTS() due to which it would
> > > > have accepted the signal sent via pg_cancel_backend().  Can you try
> > > > your scenario by temporarily removing CHECK_FOR_INTERRUPTS from
> > > > pg_sleep() or maybe better by using OS Sleep call?
> > >
> > > Ah, that explains it.  Right, I'm not able to induce a crash with 
> > > usleep().
> > >
> > > Do you want me to resend a patch without that change ?  I feel like 
> > > continuing
> > > to trade patches is more likely to introduce new errors or lose someone 
> > > else's
> > > changes than to make much progress.  The patch has been through enough
> > > iterations and it's very easy to miss an issue if I try to eyeball it.
> >
> > I can do it but we have to agree on the other two points (a) I still
> > feel that switching to the truncate phase should be done at the place
> > from where we are calling lazy_truncate_heap and (b)
> > lazy_cleanup_index should switch back the error phase after calling
> > index_vacuum_cleanup.  I have explained my reasoning for these points
> > a few emails back.
>
> I have no objection to either.  It was intuitive to me to do it how I
> originally wrote it but I'm not wedded to it.
>

Please find attached the updated patch with all the changes discussed.
Let me know if I have missed anything?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v38-0001-Introduce-vacuum-errcontext-to-display-additiona.patch
Description: Binary data


Re: error context for vacuum to include block number

2020-03-27 Thread Justin Pryzby
On Sat, Mar 28, 2020 at 06:59:10AM +0530, Amit Kapila wrote:
> On Sat, Mar 28, 2020 at 6:46 AM Justin Pryzby  wrote:
> >
> > On Sat, Mar 28, 2020 at 06:28:38AM +0530, Amit Kapila wrote:
> > > > Hm, but I caused a crash *without* adding CHECK_FOR_INTERRUPTS, just
> > > > kill+sleep.  The kill() could come from running pg_cancel_backend().  
> > > > And the
> > > > sleep() just encourages a context switch, which can happen at any time.
> > >
> > > pg_sleep internally uses CHECK_FOR_INTERRUPTS() due to which it would
> > > have accepted the signal sent via pg_cancel_backend().  Can you try
> > > your scenario by temporarily removing CHECK_FOR_INTERRUPTS from
> > > pg_sleep() or maybe better by using OS Sleep call?
> >
> > Ah, that explains it.  Right, I'm not able to induce a crash with usleep().
> >
> > Do you want me to resend a patch without that change ?  I feel like 
> > continuing
> > to trade patches is more likely to introduce new errors or lose someone 
> > else's
> > changes than to make much progress.  The patch has been through enough
> > iterations and it's very easy to miss an issue if I try to eyeball it.
> 
> I can do it but we have to agree on the other two points (a) I still
> feel that switching to the truncate phase should be done at the place
> from where we are calling lazy_truncate_heap and (b)
> lazy_cleanup_index should switch back the error phase after calling
> index_vacuum_cleanup.  I have explained my reasoning for these points
> a few emails back.

I have no objection to either.  It was intuitive to me to do it how I
originally wrote it but I'm not wedded to it.

-- 
Justin




Re: error context for vacuum to include block number

2020-03-27 Thread Amit Kapila
On Sat, Mar 28, 2020 at 6:46 AM Justin Pryzby  wrote:
>
> On Sat, Mar 28, 2020 at 06:28:38AM +0530, Amit Kapila wrote:
> > > Hm, but I caused a crash *without* adding CHECK_FOR_INTERRUPTS, just
> > > kill+sleep.  The kill() could come from running pg_cancel_backend().  And 
> > > the
> > > sleep() just encourages a context switch, which can happen at any time.
> >
> > pg_sleep internally uses CHECK_FOR_INTERRUPTS() due to which it would
> > have accepted the signal sent via pg_cancel_backend().  Can you try
> > your scenario by temporarily removing CHECK_FOR_INTERRUPTS from
> > pg_sleep() or maybe better by using OS Sleep call?
>
> Ah, that explains it.  Right, I'm not able to induce a crash with usleep().
>
> Do you want me to resend a patch without that change ?  I feel like continuing
> to trade patches is more likely to introduce new errors or lose someone else's
> changes than to make much progress.  The patch has been through enough
> iterations and it's very easy to miss an issue if I try to eyeball it.
>

I can do it but we have to agree on the other two points (a) I still
feel that switching to the truncate phase should be done at the place
from where we are calling lazy_truncate_heap and (b)
lazy_cleanup_index should switch back the error phase after calling
index_vacuum_cleanup.  I have explained my reasoning for these points
a few emails back.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-27 Thread Justin Pryzby
On Sat, Mar 28, 2020 at 06:28:38AM +0530, Amit Kapila wrote:
> > Hm, but I caused a crash *without* adding CHECK_FOR_INTERRUPTS, just
> > kill+sleep.  The kill() could come from running pg_cancel_backend().  And 
> > the
> > sleep() just encourages a context switch, which can happen at any time.
> 
> pg_sleep internally uses CHECK_FOR_INTERRUPTS() due to which it would
> have accepted the signal sent via pg_cancel_backend().  Can you try
> your scenario by temporarily removing CHECK_FOR_INTERRUPTS from
> pg_sleep() or maybe better by using OS Sleep call?

Ah, that explains it.  Right, I'm not able to induce a crash with usleep().

Do you want me to resend a patch without that change ?  I feel like continuing
to trade patches is more likely to introduce new errors or lose someone else's
changes than to make much progress.  The patch has been through enough
iterations and it's very easy to miss an issue if I try to eyeball it.

-- 
Justin




Re: error context for vacuum to include block number

2020-03-27 Thread Amit Kapila
On Sat, Mar 28, 2020 at 12:34 AM Justin Pryzby  wrote:
>
> On Fri, Mar 27, 2020 at 11:50:30AM +0530, Amit Kapila wrote:
> > > > The crash scenario I'm trying to avoid would be like statement_timeout 
> > > > or other
> > > > asynchronous event occurring between two non-atomic operations.
> > > >
> > > +if (errinfo->phase==VACUUM_ERRCB_PHASE_VACUUM_INDEX && 
> > > errinfo->indname==NULL)
> > > +{
> > > +kill(getpid(), SIGINT);
> > > +pg_sleep(1); // that's needed since signals are delivered asynchronously
> > > +}
> > > I'm not sure if those are possible outside of "induced" errors.  Maybe the
> > > function is essentially atomic due to no CHECK_FOR_INTERRUPTS or similar?
> >
> > Yes, this is exactly the point.  I think unless you have
> > CHECK_FOR_INTERRUPTS in that function, the problems you are trying to
> > think won't happen.
>
> Hm, but I caused a crash *without* adding CHECK_FOR_INTERRUPTS, just
> kill+sleep.  The kill() could come from running pg_cancel_backend().  And the
> sleep() just encourages a context switch, which can happen at any time.
>

pg_sleep internally uses CHECK_FOR_INTERRUPTS() due to which it would
have accepted the signal sent via pg_cancel_backend().  Can you try
your scenario by temporarily removing CHECK_FOR_INTERRUPTS from
pg_sleep() or maybe better by using OS Sleep call?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-27 Thread Justin Pryzby
On Fri, Mar 27, 2020 at 11:50:30AM +0530, Amit Kapila wrote:
> > > The crash scenario I'm trying to avoid would be like statement_timeout or 
> > > other
> > > asynchronous event occurring between two non-atomic operations.
> > >
> > +if (errinfo->phase==VACUUM_ERRCB_PHASE_VACUUM_INDEX && 
> > errinfo->indname==NULL)
> > +{
> > +kill(getpid(), SIGINT);
> > +pg_sleep(1); // that's needed since signals are delivered asynchronously
> > +}
> > I'm not sure if those are possible outside of "induced" errors.  Maybe the
> > function is essentially atomic due to no CHECK_FOR_INTERRUPTS or similar?
> 
> Yes, this is exactly the point.  I think unless you have
> CHECK_FOR_INTERRUPTS in that function, the problems you are trying to
> think won't happen.

Hm, but I caused a crash *without* adding CHECK_FOR_INTERRUPTS, just
kill+sleep.  The kill() could come from running pg_cancel_backend().  And the
sleep() just encourages a context switch, which can happen at any time.  I'm
not convinced that the function couldn't be interrupted by a signal.

-- 
Justin




Re: error context for vacuum to include block number

2020-03-27 Thread Amit Kapila
On Fri, Mar 27, 2020 at 1:29 PM Masahiko Sawada
 wrote:
>
> On Fri, 27 Mar 2020 at 07:17, Justin Pryzby  wrote:
> >
> > On Thu, Mar 26, 2020 at 10:04:57AM -0500, Justin Pryzby wrote:
> > > Does that address your comment ?
> >
> > I hope so.
>
> Thank you for updating the patch. I'm concerned a bit about overhead
> of frequently updating and reverting the callback arguments in
> lazy_vacuum_page(). We call that function every time when we vacuum a
> page, but if the table has an index, we actually don't need to update
> the callback arguments in that function. But I hope it's negligible
> since all operation will be performed on memory.
>

Right, it will be a few instructions.  I think if there is any
overhead of this, we can easily avoid that by (a) adding a check in
update_vacuum_error_cbarg which tells if the phase is getting changed
or not and if it is not changed, then return, (b) pass additional in
lazy_vacuum_page() to indicate whether we need to change the phase,
(c) just invoke update_vacuum_error_cbarg() in the caller.   The
current way appears to be a bit neat than these options, so not sure
if there is an advantage in changing it.  Anyway, if we see any
problem with that it is trivial to change it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-27 Thread Masahiko Sawada
On Fri, 27 Mar 2020 at 07:17, Justin Pryzby  wrote:
>
> On Thu, Mar 26, 2020 at 10:04:57AM -0500, Justin Pryzby wrote:
> > Does that address your comment ?
>
> I hope so.

Thank you for updating the patch. I'm concerned a bit about overhead
of frequently updating and reverting the callback arguments in
lazy_vacuum_page(). We call that function every time when we vacuum a
page, but if the table has an index, we actually don't need to update
the callback arguments in that function. But I hope it's negligible
since all operation will be performed on memory.

Regards,

--
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-03-27 Thread Amit Kapila
On Fri, Mar 27, 2020 at 11:46 AM Justin Pryzby  wrote:
>
> On Thu, Mar 26, 2020 at 11:44:24PM -0500, Justin Pryzby wrote:
> > On Fri, Mar 27, 2020 at 09:49:29AM +0530, Amit Kapila wrote:
> > > On Fri, Mar 27, 2020 at 3:47 AM Justin Pryzby  
> > > wrote:
> > > >
> > > > > Hm, I was just wondering what happens if an error happens *during*
> > > > > update_vacuum_error_cbarg().  It seems like if we set
> > > > > errcbarg->phase=VACUUM_INDEX before setting 
> > > > > errcbarg->indname=indname, then an
> > > > > error would cause a crash.
> > > > >
> > >
> > > Can't that be avoided if you check if cbarg->indname is non-null in
> > > vacuum_error_callback as we are already doing for
> > > VACUUM_ERRCB_PHASE_TRUNCATE?
> > >
> > > > >  And if we pfree and set indname before phase, it'd
> > > > > be a problem when going from an index phase to non-index phase.
> > >
> > > How is it possible that we move to the non-index phase without
> > > clearing indname as we always revert back the old phase information?
> >
> > The crash scenario I'm trying to avoid would be like statement_timeout or 
> > other
> > asynchronous event occurring between two non-atomic operations.
> >
> > I said that there's an issue no matter what order we set indname/phase;
> > If we wrote:
> > |cbarg->indname = indname;
> > |cbarg->phase = phase;
> > ..and hit a timeout (or similar) between setting indname=NULL but before
> > setting phase=VACUUM_INDEX, then we can crash due to null pointer.
> >
> > But if we write:
> > |cbarg->phase = phase;
> > |if (cbarg->indname) {pfree(cbarg->indname);}
> > |cbarg->indname = indname ? pstrdup(indname) : NULL;
> > ..then we can still crash if we timeout between freeing cbarg->indname and
> > setting it to null, due to acccessing a pfreed allocation.
>
> If "phase" is updated before "indname", I'm able to induce a synthetic crash
> like this:
>
> +if (errinfo->phase==VACUUM_ERRCB_PHASE_VACUUM_INDEX && 
> errinfo->indname==NULL)
> +{
> +kill(getpid(), SIGINT);
> +pg_sleep(1); // that's needed since signals are delivered asynchronously
> +}
>
> And another crash if we do this after pfree but before setting indname.
>
> +if (errinfo->phase==VACUUM_ERRCB_PHASE_VACUUM_INDEX && 
> errinfo->indname!=NULL)
> +{
> +kill(getpid(), SIGINT);
> +pg_sleep(1);
> +}
>
> I'm not sure if those are possible outside of "induced" errors.  Maybe the
> function is essentially atomic due to no CHECK_FOR_INTERRUPTS or similar?
>

Yes, this is exactly the point.  I think unless you have
CHECK_FOR_INTERRUPTS in that function, the problems you are trying to
think won't happen.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-27 Thread Justin Pryzby
On Thu, Mar 26, 2020 at 11:44:24PM -0500, Justin Pryzby wrote:
> On Fri, Mar 27, 2020 at 09:49:29AM +0530, Amit Kapila wrote:
> > On Fri, Mar 27, 2020 at 3:47 AM Justin Pryzby  wrote:
> > >
> > > > Hm, I was just wondering what happens if an error happens *during*
> > > > update_vacuum_error_cbarg().  It seems like if we set
> > > > errcbarg->phase=VACUUM_INDEX before setting errcbarg->indname=indname, 
> > > > then an
> > > > error would cause a crash.
> > > >
> > 
> > Can't that be avoided if you check if cbarg->indname is non-null in
> > vacuum_error_callback as we are already doing for
> > VACUUM_ERRCB_PHASE_TRUNCATE?
> > 
> > > >  And if we pfree and set indname before phase, it'd
> > > > be a problem when going from an index phase to non-index phase.
> > 
> > How is it possible that we move to the non-index phase without
> > clearing indname as we always revert back the old phase information?
> 
> The crash scenario I'm trying to avoid would be like statement_timeout or 
> other
> asynchronous event occurring between two non-atomic operations.
> 
> I said that there's an issue no matter what order we set indname/phase;
> If we wrote:
> |cbarg->indname = indname;
> |cbarg->phase = phase;
> ..and hit a timeout (or similar) between setting indname=NULL but before
> setting phase=VACUUM_INDEX, then we can crash due to null pointer.
> 
> But if we write:
> |cbarg->phase = phase;
> |if (cbarg->indname) {pfree(cbarg->indname);}
> |cbarg->indname = indname ? pstrdup(indname) : NULL;
> ..then we can still crash if we timeout between freeing cbarg->indname and
> setting it to null, due to acccessing a pfreed allocation.

If "phase" is updated before "indname", I'm able to induce a synthetic crash
like this:

+if (errinfo->phase==VACUUM_ERRCB_PHASE_VACUUM_INDEX && errinfo->indname==NULL) 
+{
+kill(getpid(), SIGINT);
+pg_sleep(1); // that's needed since signals are delivered asynchronously
+}

And another crash if we do this after pfree but before setting indname.

+if (errinfo->phase==VACUUM_ERRCB_PHASE_VACUUM_INDEX && errinfo->indname!=NULL)
+{
+kill(getpid(), SIGINT);
+pg_sleep(1);
+}

I'm not sure if those are possible outside of "induced" errors.  Maybe the
function is essentially atomic due to no CHECK_FOR_INTERRUPTS or similar?

-- 
Justin




Re: error context for vacuum to include block number

2020-03-26 Thread Justin Pryzby
On Fri, Mar 27, 2020 at 09:49:29AM +0530, Amit Kapila wrote:
> On Fri, Mar 27, 2020 at 3:47 AM Justin Pryzby  wrote:
> >
> > > Hm, I was just wondering what happens if an error happens *during*
> > > update_vacuum_error_cbarg().  It seems like if we set
> > > errcbarg->phase=VACUUM_INDEX before setting errcbarg->indname=indname, 
> > > then an
> > > error would cause a crash.
> > >
> 
> Can't that be avoided if you check if cbarg->indname is non-null in
> vacuum_error_callback as we are already doing for
> VACUUM_ERRCB_PHASE_TRUNCATE?
> 
> > >  And if we pfree and set indname before phase, it'd
> > > be a problem when going from an index phase to non-index phase.
> 
> How is it possible that we move to the non-index phase without
> clearing indname as we always revert back the old phase information?

The crash scenario I'm trying to avoid would be like statement_timeout or other
asynchronous event occurring between two non-atomic operations.

I said that there's an issue no matter what order we set indname/phase;
If we wrote:
|cbarg->indname = indname;
|cbarg->phase = phase;
..and hit a timeout (or similar) between setting indname=NULL but before
setting phase=VACUUM_INDEX, then we can crash due to null pointer.

But if we write:
|cbarg->phase = phase;
|if (cbarg->indname) {pfree(cbarg->indname);}
|cbarg->indname = indname ? pstrdup(indname) : NULL;
..then we can still crash if we timeout between freeing cbarg->indname and
setting it to null, due to acccessing a pfreed allocation.

> > >  So maybe we
> > > have to set errcbarg->phase=VACUUM_ERRCB_PHASE_UNKNOWN while in the 
> > > function,
> > > and errcbarg->phase=phase last.
> 
> I find that a bit ad-hoc, if possible, let's try to avoid it.

I think we can do what you suggesting, if the callback checks if 
(cbarg->indname!=NULL).

We'd have to write:
// Must set indname *before* updating phase, in case an error occurs before
// phase is set, to avoid crashing if we're going from an index phase to a
// non-index phase (which should not read indname).  Must not free indname
// until it's set to null.
char *tmp = cbarg->indname;
cbarg->indname = indname ? pstrdup(indname) : NULL;
cbarg->phase = phase;
if (tmp){pfree(tmp);}

Do you think that's better ?

-- 
Justin




Re: error context for vacuum to include block number

2020-03-26 Thread Amit Kapila
On Fri, Mar 27, 2020 at 3:47 AM Justin Pryzby  wrote:
>
>
> > Hm, I was just wondering what happens if an error happens *during*
> > update_vacuum_error_cbarg().  It seems like if we set
> > errcbarg->phase=VACUUM_INDEX before setting errcbarg->indname=indname, then 
> > an
> > error would cause a crash.
> >

Can't that be avoided if you check if cbarg->indname is non-null in
vacuum_error_callback as we are already doing for
VACUUM_ERRCB_PHASE_TRUNCATE?

> >  And if we pfree and set indname before phase, it'd
> > be a problem when going from an index phase to non-index phase.

How is it possible that we move to the non-index phase without
clearing indname as we always revert back the old phase information?
I think it is possible only if we don't clear indname after the phase
is over.

> >  So maybe we
> > have to set errcbarg->phase=VACUUM_ERRCB_PHASE_UNKNOWN while in the 
> > function,
> > and errcbarg->phase=phase last.

I find that a bit ad-hoc, if possible, let's try to avoid it.

>
> And addressed that.
>
> Also, I realized that lazy_cleanup_index has an early "return", so the "Revert
> back" was ineffective.
>

We can call it immediately after index_vacuum_cleanup to avoid that.

>  We talked about how that wasn't needed, since we never
> go back to a previous phase.  Amit wanted to keep it there for consistency, 
> but
> I'd prefer to put any extra effort into calling out the special treatment
> needed/given to lazy_vacuum_heap/index, rather than making everything
> "consistent".
>

Apart from being consistent, the point was it doesn't seem good that
API being called to assume that there is nothing more the caller can
do.  It might be problematic if we later want to enhance or add
something to the caller.

> Amit: I also moved the TRUNCATE_HEAP bit back to truncate_heap(),

There is no problem with it.  We can do it either way and I have also
considered it the way you have done but decide to keep in the caller
because of the previous point I mentioned (not sure if it a good idea
that API being called can assume that there is nothing more the caller
can do after this).

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-26 Thread Alvaro Herrera
On 2020-Mar-26, Justin Pryzby wrote:

> On Thu, Mar 26, 2020 at 07:49:51PM -0300, Alvaro Herrera wrote:

> > BTW I'm pretty sure this "revert back" phrasing is not good English --
> > you should just use "revert".  Maybe get some native speaker's opinion
> > on it.
> 
> I'm a native speaker; "revert back" might be called redundant but I think it's
> common usage.

Oh, okay.

> > And speaking of language, I find the particle "cbarg" rather very ugly,
> 
> I renamed it since it was kind of opaque looking.  It's in all the same 
> places,
> so equally infectious; but I hope you like it better.

I like it much better, thanks :-)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-03-26 Thread Justin Pryzby
On Thu, Mar 26, 2020 at 07:49:51PM -0300, Alvaro Herrera wrote:
> > > ... So once we've "reverted back", 1) the pointer is null; and, 2)
> > > the callback function doesn't access it for the previous/reverted
> > > phase anyway.
> 
> BTW I'm pretty sure this "revert back" phrasing is not good English --
> you should just use "revert".  Maybe get some native speaker's opinion
> on it.

I'm a native speaker; "revert back" might be called redundant but I think it's
common usage.

> And speaking of language, I find the particle "cbarg" rather very ugly,
> and it's *everywhere* -- function name, function argument, local
> variable, enum values, enum name.  It even spread to the typedefs.list
> file!  Is this a new virus???  Put some soap in it!  Can't we use "info"
> or "state" or something similar, less infectious, instead?

I renamed it since it was kind of opaque looking.  It's in all the same places,
so equally infectious; but I hope you like it better.

Cheers,
-- 
Justin
>From 4d33dc1c125690b48dc0028c63a663db53ac0311 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v37 1/3] Introduce vacuum errcontext to display additional
 information.

The additional information displayed will be block number for error
occurring while processing heap and index name for error occurring
while processing the index.

This will help us in diagnosing the problems that occur during a vacuum.
For ex. due to corruption (either caused by bad hardware or by some bug)
if we get some error while vacuuming, it can help us identify the block
in heap and or additional index information.

It sets up an error context callback to display additional information
with the error.  During different phases of vacuum (heap scan, heap
vacuum, index vacuum, index clean up, heap truncate), we update the error
context callback to display appropriate information.  We can extend it to
a bit more granular level like adding the phases for FSM operations or for
prefetching the blocks while truncating. However, I felt that it requires
adding many more error callback function calls and can make the code a bit
complex, so left those for now.

Author: Justin Pryzby, with few changes by Amit Kapila
Reviewed-by: Alvaro Herrera, Amit Kapila, Andres Freund, Michael Paquier
and Sawada Masahiko
Discussion: https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 240 ---
 src/tools/pgindent/typedefs.list |   1 +
 2 files changed, 216 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43efc32..973f411caf 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -144,6 +144,17 @@
  */
 #define ParallelVacuumIsActive(lps) PointerIsValid(lps)
 
+/* Phases of vacuum during which we report error context. */
+typedef enum
+{
+	VACUUM_ERRCB_PHASE_UNKNOWN,
+	VACUUM_ERRCB_PHASE_SCAN_HEAP,
+	VACUUM_ERRCB_PHASE_VACUUM_INDEX,
+	VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+	VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
+	VACUUM_ERRCB_PHASE_TRUNCATE
+} VacErrPhase;
+
 /*
  * LVDeadTuples stores the dead tuple TIDs collected during the heap scan.
  * This is allocated in the DSM segment in parallel mode and in local memory
@@ -270,6 +281,8 @@ typedef struct LVParallelState
 
 typedef struct LVRelStats
 {
+	char	   *relnamespace;
+	char	   *relname;
 	/* useindex = true means two-pass strategy; false means one-pass */
 	bool		useindex;
 	/* Overall statistics about rel */
@@ -290,8 +303,12 @@ typedef struct LVRelStats
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
-} LVRelStats;
 
+	/* Used for error callback */
+	char	   *indname;
+	BlockNumber blkno;			/* used only for heap operations */
+	VacErrPhase phase;
+} LVRelStats;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -314,10 +331,10 @@ static void lazy_vacuum_all_indexes(Relation onerel, Relation *Irel,
 	LVRelStats *vacrelstats, LVParallelState *lps,
 	int nindexes);
 static void lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
-			  LVDeadTuples *dead_tuples, double reltuples);
+			  LVDeadTuples *dead_tuples, double reltuples, LVRelStats *vacrelstats);
 static void lazy_cleanup_index(Relation indrel,
 			   IndexBulkDeleteResult **stats,
-			   double reltuples, bool estimated_count);
+			   double reltuples, bool estimated_count, LVRelStats *vacrelstats);
 static int	lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 			 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
 static bool should_attempt_truncation(VacuumParams *params,
@@ -337,13 +354,13 @@ static void lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult *
 		 int nindexes);
 static void parallel_vacuum_index(Relation *Irel, 

Re: error context for vacuum to include block number

2020-03-26 Thread Alvaro Herrera
On 2020-Mar-26, Justin Pryzby wrote:

> On Thu, Mar 26, 2020 at 10:04:57AM -0500, Justin Pryzby wrote:

> > And I think you're right: we only save state when the calling function has a
> > indname=NULL, so we never "put back" a non-NULL indname.  We go from having 
> > a
> > indname=NULL at lazy_scan_heap to not not-NULL at lazy_vacuum_index, and 
> > never
> > the other way around.
> 
> I removed the free_oldindname argument.

Hah, I was wondering about that free_oldindname business this morning as
well.

> > ... So once we've "reverted back", 1) the pointer is null; and, 2)
> > the callback function doesn't access it for the previous/reverted
> > phase anyway.

BTW I'm pretty sure this "revert back" phrasing is not good English --
you should just use "revert".  Maybe get some native speaker's opinion
on it.

And speaking of language, I find the particle "cbarg" rather very ugly,
and it's *everywhere* -- function name, function argument, local
variable, enum values, enum name.  It even spread to the typedefs.list
file!  Is this a new virus???  Put some soap in it!  Can't we use "info"
or "state" or something similar, less infectious, instead?

Thanks

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-03-26 Thread Justin Pryzby
On Thu, Mar 26, 2020 at 10:04:57AM -0500, Justin Pryzby wrote:
> Does that address your comment ?

I hope so.

> > I'm not sure why "free_oldindname" is necessary. Since we initialize
> > vacrelstats->indname with NULL and revert the callback arguments at
> > the end of functions that needs update them, vacrelstats->indname is
> > NULL at the beginning of lazy_vacuum_index() and lazy_cleanup_index().
> > And we make a copy of index name in update_vacuum_error_cbarg(). So I
> > think we can pfree the old index name if errcbarg->indname is not NULL.
> 
> We want to avoid doing this:
>  olderrcbarg = *vacrelstats // saves a pointer
>  update_vacuum_error_cbarg(... NULL); // frees the pointer and sets indname 
> to NULL
>  update_vacuum_error_cbarg(... olderrcbarg.oldindnam) // puts back the 
> pointer, which has been freed
>  // hit an error, and the callback accesses the pfreed pointer
> 
> I think that's only an issue for lazy_vacuum_index().
> 
> And I think you're right: we only save state when the calling function has a
> indname=NULL, so we never "put back" a non-NULL indname.  We go from having a
> indname=NULL at lazy_scan_heap to not not-NULL at lazy_vacuum_index, and never
> the other way around.  So once we've "reverted back", 1) the pointer is null;
> and, 2) the callback function doesn't access it for the previous/reverted 
> phase
> anyway.

I removed the free_oldindname argument.

> Hm, I was just wondering what happens if an error happens *during*
> update_vacuum_error_cbarg().  It seems like if we set
> errcbarg->phase=VACUUM_INDEX before setting errcbarg->indname=indname, then an
> error would cause a crash.  And if we pfree and set indname before phase, it'd
> be a problem when going from an index phase to non-index phase.  So maybe we
> have to set errcbarg->phase=VACUUM_ERRCB_PHASE_UNKNOWN while in the function,
> and errcbarg->phase=phase last.

And addressed that.

Also, I realized that lazy_cleanup_index has an early "return", so the "Revert
back" was ineffective.  We talked about how that wasn't needed, since we never
go back to a previous phase.  Amit wanted to keep it there for consistency, but
I'd prefer to put any extra effort into calling out the special treatment
needed/given to lazy_vacuum_heap/index, rather than making everything
"consistent".

Amit: I also moved the TRUNCATE_HEAP bit back to truncate_heap(), since 1) it's
odd if we don't have anything in truncate_heap() about error reporting except
for "vacrelstats->blkno = blkno"; and, 2) it's nice to set the err callback arg
right after pgstat_progress, and outside of any loop.  In previous versions, it
was within the loop, because it closely wrapped RelationTruncate() and
count_nondeletable_pages() - a previous version used separate phases.

-- 
Justin
>From bfc574979f85c0f0722d182ae8ae03097fb5f9c4 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v36 1/3] Introduce vacuum errcontext to display additional
 information.

The additional information displayed will be block number for error
occurring while processing heap and index name for error occurring
while processing the index.

This will help us in diagnosing the problems that occur during a vacuum.
For ex. due to corruption (either caused by bad hardware or by some bug)
if we get some error while vacuuming, it can help us identify the block
in heap and or additional index information.

It sets up an error context callback to display additional information
with the error.  During different phases of vacuum (heap scan, heap
vacuum, index vacuum, index clean up, heap truncate), we update the error
context callback to display appropriate information.  We can extend it to
a bit more granular level like adding the phases for FSM operations or for
prefetching the blocks while truncating. However, I felt that it requires
adding many more error callback function calls and can make the code a bit
complex, so left those for now.

Author: Justin Pryzby, with few changes by Amit Kapila
Reviewed-by: Alvaro Herrera, Amit Kapila, Andres Freund, Michael Paquier
and Sawada Masahiko
Discussion: https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 240 ---
 src/tools/pgindent/typedefs.list |   1 +
 2 files changed, 216 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43efc32..e98e6b45d3 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -144,6 +144,17 @@
  */
 #define ParallelVacuumIsActive(lps) PointerIsValid(lps)
 
+/* Phases of vacuum during which we report error context. */
+typedef enum
+{
+	VACUUM_ERRCB_PHASE_UNKNOWN,
+	VACUUM_ERRCB_PHASE_SCAN_HEAP,
+	VACUUM_ERRCB_PHASE_VACUUM_INDEX,
+	VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+	VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
+	VACUUM_ERRCB_PHASE_TRUNCATE
+} VacErrCbPhase;
+
 /*
  * 

Re: error context for vacuum to include block number

2020-03-26 Thread Justin Pryzby
On Thu, Mar 26, 2020 at 08:56:54PM +0900, Masahiko Sawada wrote:
> 1.
> @@ -1844,9 +1914,15 @@ lazy_vacuum_page(Relation onerel, BlockNumber
> blkno, Buffer buffer,
> int uncnt = 0;
> TransactionId visibility_cutoff_xid;
> boolall_frozen;
> +   LVRelStats  olderrcbarg;
> 
> pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
> 
> +   /* Update error traceback information */
> +   olderrcbarg = *vacrelstats;
> +   update_vacuum_error_cbarg(vacrelstats, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
> + blkno, NULL, false);
> 
> Since we update vacrelstats->blkno during in the loop in
> lazy_vacuum_heap() we unnecessarily update blkno twice to the same
> value. Also I think we don't need to revert back the callback
> arguments in lazy_vacuum_page(). Perhaps we can either remove the
> change of lazy_vacuum_page() or move the code updating
> vacrelstats->blkno to the beginning of lazy_vacuum_page(). I prefer
> the latter.

We want the error callback to be in place during lazy_scan_heap, since it
calls ReadBufferExtended().

We can't remove the change in lazy_vacuum_page, since it's also called from
lazy_scan_heap, if there are no indexes.

We want lazy_vacuum_page to "revert back" since we go from "scanning heap" to
"vacuuming heap".  lazy_vacuum_page was the motivation for saving and restoring
the called arguments, otherwise lazy_scan_heap() would have to clean up after
the function it called, which was unclean.  Now, every function cleans up after
itself.

Does that address your comment ?

> +static void
> +update_vacuum_error_cbarg(LVRelStats *errcbarg, int phase, BlockNumber blkno,
> + char *indname, bool free_oldindname)
> 
> I'm not sure why "free_oldindname" is necessary. Since we initialize
> vacrelstats->indname with NULL and revert the callback arguments at
> the end of functions that needs update them, vacrelstats->indname is
> NULL at the beginning of lazy_vacuum_index() and lazy_cleanup_index().
> And we make a copy of index name in update_vacuum_error_cbarg(). So I
> think we can pfree the old index name if errcbarg->indname is not NULL.

We want to avoid doing this:
 olderrcbarg = *vacrelstats // saves a pointer
 update_vacuum_error_cbarg(... NULL); // frees the pointer and sets indname to 
NULL
 update_vacuum_error_cbarg(... olderrcbarg.oldindnam) // puts back the pointer, 
which has been freed
 // hit an error, and the callback accesses the pfreed pointer

I think that's only an issue for lazy_vacuum_index().

And I think you're right: we only save state when the calling function has a
indname=NULL, so we never "put back" a non-NULL indname.  We go from having a
indname=NULL at lazy_scan_heap to not not-NULL at lazy_vacuum_index, and never
the other way around.  So once we've "reverted back", 1) the pointer is null;
and, 2) the callback function doesn't access it for the previous/reverted phase
anyway.

Hm, I was just wondering what happens if an error happens *during*
update_vacuum_error_cbarg().  It seems like if we set
errcbarg->phase=VACUUM_INDEX before setting errcbarg->indname=indname, then an
error would cause a crash.  And if we pfree and set indname before phase, it'd
be a problem when going from an index phase to non-index phase.  So maybe we
have to set errcbarg->phase=VACUUM_ERRCB_PHASE_UNKNOWN while in the function,
and errcbarg->phase=phase last.

-- 
Justin




Re: error context for vacuum to include block number

2020-03-26 Thread Masahiko Sawada
On Thu, 26 Mar 2020 at 15:34, Amit Kapila  wrote:
>
> On Thu, Mar 26, 2020 at 12:03 PM Amit Kapila  wrote:
> >
> > On Thu, Mar 26, 2020 at 10:11 AM Justin Pryzby  wrote:
> > >
> > > Seems fine.  Rather than saying "different phases" I, would say:
> > > "The index vacuum and heap vacuum phases may be called multiple times in 
> > > the
> > > middle of the heap scan phase."
> > >
> >
> > Okay, I have slightly adjusted the wording as per your suggestion.
> >
> > > But actually I think the concern is not that we unnecessarily "Revert 
> > > back to
> > > the old phase" but that we do it in a *loop*.  Which I agree doesn't make
> > > sense, to go back and forth between "scanning heap" and "truncating".
> > >
> >
> > Fair point.  I have moved the change to the truncate phase at the
> > caller of lazy_heap_truncate() which should address this concern.
> > Sawada-San, does this address your concern?
> >
>
> Forgot to attach the patch, doing now.

Thank you for updating the patch! The changes around
lazy_truncate_heap() looks good to me.

I have two comments;

1.
@@ -1844,9 +1914,15 @@ lazy_vacuum_page(Relation onerel, BlockNumber
blkno, Buffer buffer,
int uncnt = 0;
TransactionId visibility_cutoff_xid;
boolall_frozen;
+   LVRelStats  olderrcbarg;

pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);

+   /* Update error traceback information */
+   olderrcbarg = *vacrelstats;
+   update_vacuum_error_cbarg(vacrelstats, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+ blkno, NULL, false);

Since we update vacrelstats->blkno during in the loop in
lazy_vacuum_heap() we unnecessarily update blkno twice to the same
value. Also I think we don't need to revert back the callback
arguments in lazy_vacuum_page(). Perhaps we can either remove the
change of lazy_vacuum_page() or move the code updating
vacrelstats->blkno to the beginning of lazy_vacuum_page(). I prefer
the latter.

2.
+/*
+ * Update vacuum error callback for the current phase, block, and index.
+ *
+ * free_oldindname is true if the previous "indname" should be freed.
It must be
+ * false if the caller has copied the old LVRelStats, to avoid keeping a
+ * pointer to a freed allocation.  In which case, the caller should call again
+ * with free_oldindname as true to avoid a leak.
+ */
+static void
+update_vacuum_error_cbarg(LVRelStats *errcbarg, int phase, BlockNumber blkno,
+ char *indname, bool free_oldindname)

I'm not sure why "free_oldindname" is necessary. Since we initialize
vacrelstats->indname with NULL and revert the callback arguments at
the end of functions that needs update them, vacrelstats->indname is
NULL at the beginning of lazy_vacuum_index() and lazy_cleanup_index().
And we make a copy of index name in update_vacuum_error_cbarg(). So I
think we can pfree the old index name if errcbarg->indname is not NULL.


Regards,

--
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-03-26 Thread Amit Kapila
On Thu, Mar 26, 2020 at 12:03 PM Amit Kapila  wrote:
>
> On Thu, Mar 26, 2020 at 10:11 AM Justin Pryzby  wrote:
> >
> > Seems fine.  Rather than saying "different phases" I, would say:
> > "The index vacuum and heap vacuum phases may be called multiple times in the
> > middle of the heap scan phase."
> >
>
> Okay, I have slightly adjusted the wording as per your suggestion.
>
> > But actually I think the concern is not that we unnecessarily "Revert back 
> > to
> > the old phase" but that we do it in a *loop*.  Which I agree doesn't make
> > sense, to go back and forth between "scanning heap" and "truncating".
> >
>
> Fair point.  I have moved the change to the truncate phase at the
> caller of lazy_heap_truncate() which should address this concern.
> Sawada-San, does this address your concern?
>

Forgot to attach the patch, doing now.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v35-0001-Introduce-vacuum-errcontext-to-display-additiona.patch
Description: Binary data


Re: error context for vacuum to include block number

2020-03-26 Thread Amit Kapila
On Thu, Mar 26, 2020 at 10:11 AM Justin Pryzby  wrote:
>
> Seems fine.  Rather than saying "different phases" I, would say:
> "The index vacuum and heap vacuum phases may be called multiple times in the
> middle of the heap scan phase."
>

Okay, I have slightly adjusted the wording as per your suggestion.

> But actually I think the concern is not that we unnecessarily "Revert back to
> the old phase" but that we do it in a *loop*.  Which I agree doesn't make
> sense, to go back and forth between "scanning heap" and "truncating".
>

Fair point.  I have moved the change to the truncate phase at the
caller of lazy_heap_truncate() which should address this concern.
Sawada-San, does this address your concern?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-25 Thread Justin Pryzby
On Thu, Mar 26, 2020 at 09:50:53AM +0530, Amit Kapila wrote:
> > > after count_nondeletable_pages, and then revert it back to
> > > VACUUM_ERRCB_PHASE_SCAN_HEAP phase and the number of blocks of
> > > relation before truncation, after RelationTruncate(). It can be
> > > repeated until no more truncating can be done. Why do we need to
> > > revert back to the scan heap phase? If we can use
> > > vacrelstats->nonempty_pages in the error context message as the
> > > remaining blocks after truncation I think we can update callback
> > > arguments once at the beginning of lazy_truncate_heap() and don't
> > > revert to the previous phase, and pop the error context after exiting.
> >
> > Perhaps.  We need to "revert back" for the vacuum phases, which can be 
> > called
> > multiple times, but we don't need to do that here.
> 
> Yeah, but I think it would be better if are consistent because we have
> no control what the caller of the function intends to do after
> finishing the current phase.  I think we can add some comments where
> we set up the context (in heap_vacuum_rel) like below so that the idea
> is more clear.
> 
> "The idea is to set up an error context callback to display additional
> information with any error during vacuum.  During different phases of
> vacuum (heap scan, heap vacuum, index vacuum, index clean up, heap
> truncate), we update the error context callback to display appropriate
> information.
> 
> Note that different phases of vacuum overlap with each other, so once
> a particular phase is over, we need to revert back to the old phase to
> keep the phase information up-to-date."

Seems fine.  Rather than saying "different phases" I, would say:
"The index vacuum and heap vacuum phases may be called multiple times in the
middle of the heap scan phase."

But actually I think the concern is not that we unnecessarily "Revert back to
the old phase" but that we do it in a *loop*.  Which I agree doesn't make
sense, to go back and forth between "scanning heap" and "truncating".  So I
think we should either remove the "revert back", or otherwise put it
after/outside the "while" loop, and change the "return" paths to use "break".

-- 
Justin




Re: error context for vacuum to include block number

2020-03-25 Thread Amit Kapila
On Wed, Mar 25, 2020 at 6:11 PM Justin Pryzby  wrote:
>
> On Wed, Mar 25, 2020 at 09:27:44PM +0900, Masahiko Sawada wrote:
> > On Wed, 25 Mar 2020 at 20:24, Amit Kapila  wrote:
> > >
> > > On Wed, Mar 25, 2020 at 3:42 PM Justin Pryzby  
> > > wrote:
> > > >
> > > > Attached patch addressing these.
> > > >
> > >
> > > Thanks, you forgot to remove the below declaration which I have
> > > removed in attached.
> > >
> > > @@ -724,20 +758,20 @@ lazy_scan_heap(Relation onerel, VacuumParams
> > > *params, LVRelStats *vacrelstats,
> > >   PROGRESS_VACUUM_MAX_DEAD_TUPLES
> > >   };
> > >   int64 initprog_val[3];
> > > + ErrorContextCallback errcallback;
> > >
> > > Apart from this, I have ran pgindent and now I think it is in good
> > > shape.  Do you have any other comments?  Sawada-San, can you also
> > > check the attached patch and let me know if you have any additional
> > > comments.
> > >
> >
> > Thank you for updating the patch! I have a question about the following 
> > code:
> >
> > +/* Update error traceback information */
> > +olderrcbarg = *vacrelstats;
> > +update_vacuum_error_cbarg(vacrelstats, VACUUM_ERRCB_PHASE_TRUNCATE,
> > +  vacrelstats->nonempty_pages, NULL, 
> > false);
> > +
> >  /*
> >   * Scan backwards from the end to verify that the end pages 
> > actually
> >   * contain no tuples.  This is *necessary*, not optional, because
> >   * other backends could have added tuples to these pages whilst we
> >   * were vacuuming.
> >   */
> >  new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
> > +vacrelstats->blkno = new_rel_pages;
> >
> >  if (new_rel_pages >= old_rel_pages)
> >  {
> >  /* can't do anything after all */
> >  UnlockRelation(onerel, AccessExclusiveLock);
> >  return;
> >  }
> >
> >  /*
> >   * Okay to truncate.
> >   */
> >  RelationTruncate(onerel, new_rel_pages);
> >
> > +/* Revert back to the old phase information for error traceback */
> > +update_vacuum_error_cbarg(vacrelstats,
> > +  olderrcbarg.phase,
> > +  olderrcbarg.blkno,
> > +  olderrcbarg.indname,
> > +  true);
> >
> > vacrelstats->nonempty_pages is the last non-empty block while
> > new_rel_pages, the result of count_nondeletable_pages(), is the number
> > of blocks that we can truncate to in this attempt. Therefore
> > vacrelstats->nonempty_pages <= new_rel_pages. This means that we set a
> > lower block number to arguments and then set a higher block number
> > after count_nondeletable_pages, and then revert it back to
> > VACUUM_ERRCB_PHASE_SCAN_HEAP phase and the number of blocks of
> > relation before truncation, after RelationTruncate(). It can be
> > repeated until no more truncating can be done. Why do we need to
> > revert back to the scan heap phase? If we can use
> > vacrelstats->nonempty_pages in the error context message as the
> > remaining blocks after truncation I think we can update callback
> > arguments once at the beginning of lazy_truncate_heap() and don't
> > revert to the previous phase, and pop the error context after exiting.
>
> Perhaps.  We need to "revert back" for the vacuum phases, which can be called
> multiple times, but we don't need to do that here.
>

Yeah, but I think it would be better if are consistent because we have
no control what the caller of the function intends to do after
finishing the current phase.  I think we can add some comments where
we set up the context (in heap_vacuum_rel) like below so that the idea
is more clear.

"The idea is to set up an error context callback to display additional
information with any error during vacuum.  During different phases of
vacuum (heap scan, heap vacuum, index vacuum, index clean up, heap
truncate), we update the error context callback to display appropriate
information.

Note that different phases of vacuum overlap with each other, so once
a particular phase is over, we need to revert back to the old phase to
keep the phase information up-to-date."

What do you think?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-25 Thread Justin Pryzby
On Wed, Mar 25, 2020 at 09:27:44PM +0900, Masahiko Sawada wrote:
> On Wed, 25 Mar 2020 at 20:24, Amit Kapila  wrote:
> >
> > On Wed, Mar 25, 2020 at 3:42 PM Justin Pryzby  wrote:
> > >
> > > Attached patch addressing these.
> > >
> >
> > Thanks, you forgot to remove the below declaration which I have
> > removed in attached.
> >
> > @@ -724,20 +758,20 @@ lazy_scan_heap(Relation onerel, VacuumParams
> > *params, LVRelStats *vacrelstats,
> >   PROGRESS_VACUUM_MAX_DEAD_TUPLES
> >   };
> >   int64 initprog_val[3];
> > + ErrorContextCallback errcallback;
> >
> > Apart from this, I have ran pgindent and now I think it is in good
> > shape.  Do you have any other comments?  Sawada-San, can you also
> > check the attached patch and let me know if you have any additional
> > comments.
> >
> 
> Thank you for updating the patch! I have a question about the following code:
> 
> +/* Update error traceback information */
> +olderrcbarg = *vacrelstats;
> +update_vacuum_error_cbarg(vacrelstats, VACUUM_ERRCB_PHASE_TRUNCATE,
> +  vacrelstats->nonempty_pages, NULL, false);
> +
>  /*
>   * Scan backwards from the end to verify that the end pages actually
>   * contain no tuples.  This is *necessary*, not optional, because
>   * other backends could have added tuples to these pages whilst we
>   * were vacuuming.
>   */
>  new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
> +vacrelstats->blkno = new_rel_pages;
> 
>  if (new_rel_pages >= old_rel_pages)
>  {
>  /* can't do anything after all */
>  UnlockRelation(onerel, AccessExclusiveLock);
>  return;
>  }
> 
>  /*
>   * Okay to truncate.
>   */
>  RelationTruncate(onerel, new_rel_pages);
> 
> +/* Revert back to the old phase information for error traceback */
> +update_vacuum_error_cbarg(vacrelstats,
> +  olderrcbarg.phase,
> +  olderrcbarg.blkno,
> +  olderrcbarg.indname,
> +  true);
> 
> vacrelstats->nonempty_pages is the last non-empty block while
> new_rel_pages, the result of count_nondeletable_pages(), is the number
> of blocks that we can truncate to in this attempt. Therefore
> vacrelstats->nonempty_pages <= new_rel_pages. This means that we set a
> lower block number to arguments and then set a higher block number
> after count_nondeletable_pages, and then revert it back to
> VACUUM_ERRCB_PHASE_SCAN_HEAP phase and the number of blocks of
> relation before truncation, after RelationTruncate(). It can be
> repeated until no more truncating can be done. Why do we need to
> revert back to the scan heap phase? If we can use
> vacrelstats->nonempty_pages in the error context message as the
> remaining blocks after truncation I think we can update callback
> arguments once at the beginning of lazy_truncate_heap() and don't
> revert to the previous phase, and pop the error context after exiting.

Perhaps.  We need to "revert back" for the vacuum phases, which can be called
multiple times, but we don't need to do that here.

In the future, if we decided to add something for final cleanup phase (say),
it's fine (and maybe better) to exit truncate_heap() without resetting the
argument, and we'd immediately set it to CLEANUP.

I think the same thing applies to lazy_cleanup_index, too.  It can be called
from a parallel worker, but we never "go back" to a heap scan.

-- 
Justin




Re: error context for vacuum to include block number

2020-03-25 Thread Masahiko Sawada
On Wed, 25 Mar 2020 at 20:24, Amit Kapila  wrote:
>
> On Wed, Mar 25, 2020 at 3:42 PM Justin Pryzby  wrote:
> >
> > Attached patch addressing these.
> >
>
> Thanks, you forgot to remove the below declaration which I have
> removed in attached.
>
> @@ -724,20 +758,20 @@ lazy_scan_heap(Relation onerel, VacuumParams
> *params, LVRelStats *vacrelstats,
>   PROGRESS_VACUUM_MAX_DEAD_TUPLES
>   };
>   int64 initprog_val[3];
> + ErrorContextCallback errcallback;
>
> Apart from this, I have ran pgindent and now I think it is in good
> shape.  Do you have any other comments?  Sawada-San, can you also
> check the attached patch and let me know if you have any additional
> comments.
>

Thank you for updating the patch! I have a question about the following code:

+/* Update error traceback information */
+olderrcbarg = *vacrelstats;
+update_vacuum_error_cbarg(vacrelstats, VACUUM_ERRCB_PHASE_TRUNCATE,
+  vacrelstats->nonempty_pages, NULL, false);
+
 /*
  * Scan backwards from the end to verify that the end pages actually
  * contain no tuples.  This is *necessary*, not optional, because
  * other backends could have added tuples to these pages whilst we
  * were vacuuming.
  */
 new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
+vacrelstats->blkno = new_rel_pages;

 if (new_rel_pages >= old_rel_pages)
 {
 /* can't do anything after all */
 UnlockRelation(onerel, AccessExclusiveLock);
 return;
 }

 /*
  * Okay to truncate.
  */
 RelationTruncate(onerel, new_rel_pages);

+/* Revert back to the old phase information for error traceback */
+update_vacuum_error_cbarg(vacrelstats,
+  olderrcbarg.phase,
+  olderrcbarg.blkno,
+  olderrcbarg.indname,
+  true);

vacrelstats->nonempty_pages is the last non-empty block while
new_rel_pages, the result of count_nondeletable_pages(), is the number
of blocks that we can truncate to in this attempt. Therefore
vacrelstats->nonempty_pages <= new_rel_pages. This means that we set a
lower block number to arguments and then set a higher block number
after count_nondeletable_pages, and then revert it back to
VACUUM_ERRCB_PHASE_SCAN_HEAP phase and the number of blocks of
relation before truncation, after RelationTruncate(). It can be
repeated until no more truncating can be done. Why do we need to
revert back to the scan heap phase? If we can use
vacrelstats->nonempty_pages in the error context message as the
remaining blocks after truncation I think we can update callback
arguments once at the beginning of lazy_truncate_heap() and don't
revert to the previous phase, and pop the error context after exiting.

Regards,

--
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-03-25 Thread Justin Pryzby
On Wed, Mar 25, 2020 at 04:54:41PM +0530, Amit Kapila wrote:
> On Wed, Mar 25, 2020 at 3:42 PM Justin Pryzby  wrote:
> >
> > Attached patch addressing these.
> >
> 
> Thanks, you forgot to remove the below declaration which I have
> removed in attached.

Yes I saw..

> Apart from this, I have ran pgindent and now I think it is in good
> shape.  Do you have any other comments?  Sawada-San, can you also

I did just notice/remember while testing trucate that autovacuum does this:

src/backend/postmaster/autovacuum.c:
errcontext("automatic vacuum of table \"%s.%s.%s\"",

And that appears to be interacting correctly.  For example if you add an
elog(ERROR) and run UPDATE/DELETE, and wait autovacuum_naptime, then it shows
both contexts.

-- 
Justin




Re: error context for vacuum to include block number

2020-03-25 Thread Amit Kapila
On Wed, Mar 25, 2020 at 3:42 PM Justin Pryzby  wrote:
>
> Attached patch addressing these.
>

Thanks, you forgot to remove the below declaration which I have
removed in attached.

@@ -724,20 +758,20 @@ lazy_scan_heap(Relation onerel, VacuumParams
*params, LVRelStats *vacrelstats,
  PROGRESS_VACUUM_MAX_DEAD_TUPLES
  };
  int64 initprog_val[3];
+ ErrorContextCallback errcallback;

Apart from this, I have ran pgindent and now I think it is in good
shape.  Do you have any other comments?  Sawada-San, can you also
check the attached patch and let me know if you have any additional
comments.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v34-0001-Introduce-vacuum-errcontext-to-display-additiona.patch
Description: Binary data


Re: error context for vacuum to include block number

2020-03-25 Thread Justin Pryzby
On Wed, Mar 25, 2020 at 09:16:46AM +0530, Amit Kapila wrote:
> I think by mistake you have re-introduced this chunk of code.  We
> don't need this as we have done it in the caller.

Yes, sorry.

I used too much of git-am and git-rebase to make sure I didn't lose your
changes and instead reintroduced them.

On Wed, Mar 25, 2020 at 02:16:35PM +0900, Masahiko Sawada wrote:
> > > Won't the latest patch by Justin will fix this as he has updated the
> > > block count after count_nondeletable_pages?  Apart from that, I feel
> >
> > The issue is if the error happens *during* count_nondeletable_pages().
> > We don't want it to say "truncating relation to 100 blocks".
> 
> Right.
> 
> > > the first call to update_vacuum_error_cbarg in lazy_truncate_heap
> > > should have input parameter as vacrelstats->nonempty_pages instead of
> > > new_rel_pages to indicate the remaining pages after truncation?
> >
> > Yea, I think that addresses the issue.

Attached patch addressing these.

-- 
Justin
>From b3e112c3d02982b4c050a43c53e47a868879c561 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v33 1/3] Introduce vacuum errcontext to display additional
 information.

The additional information displayed will be block number for error
occurring while processing heap and index name for error occurring
while processing the index.

This will help us in diagnosing the problems that occur during a vacuum.
For ex. due to corruption (either caused by bad hardware or by some bug)
if we get some error while vacuuming, it can help us identify the block
in heap and or additional index information.

It sets up an error context callback to display additional information
with the error.  During different phases of vacuum (heap scan, heap
vacuum, index vacuum, index clean up, heap truncate), we update the error
context callback to display appropriate information.  We can extend it to
a bit more granular level like adding the phases for FSM operations or for
prefetching the blocks while truncating. However, I felt that it requires
adding many more error callback function calls and can make the code a bit
complex, so left those for now.

Author: Justin Pryzby, with few changes by Amit Kapila
Reviewed-by: Alvaro Herrera, Amit Kapila, Andres Freund, Michael Paquier
and Sawada Masahiko
Discussion: https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 250 ---
 src/tools/pgindent/typedefs.list |   1 +
 2 files changed, 226 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43efc32..c68d5952c8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -144,6 +144,17 @@
  */
 #define ParallelVacuumIsActive(lps) PointerIsValid(lps)
 
+/* Phases of vacuum during which we report error context. */
+typedef enum
+{
+	VACUUM_ERRCB_PHASE_UNKNOWN,
+	VACUUM_ERRCB_PHASE_SCAN_HEAP,
+	VACUUM_ERRCB_PHASE_VACUUM_INDEX,
+	VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+	VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
+	VACUUM_ERRCB_PHASE_TRUNCATE
+} VacErrCbPhase;
+
 /*
  * LVDeadTuples stores the dead tuple TIDs collected during the heap scan.
  * This is allocated in the DSM segment in parallel mode and in local memory
@@ -270,6 +281,8 @@ typedef struct LVParallelState
 
 typedef struct LVRelStats
 {
+	char	   *relnamespace;
+	char	   *relname;
 	/* useindex = true means two-pass strategy; false means one-pass */
 	bool		useindex;
 	/* Overall statistics about rel */
@@ -290,8 +303,12 @@ typedef struct LVRelStats
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
-} LVRelStats;
 
+	/* Used for error callback */
+	char	   *indname;
+	BlockNumber blkno;			/* used only for heap operations */
+	VacErrCbPhase	phase;
+} LVRelStats;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -314,10 +331,10 @@ static void lazy_vacuum_all_indexes(Relation onerel, Relation *Irel,
 	LVRelStats *vacrelstats, LVParallelState *lps,
 	int nindexes);
 static void lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
-			  LVDeadTuples *dead_tuples, double reltuples);
+			  LVDeadTuples *dead_tuples, double reltuples, LVRelStats *vacrelstats);
 static void lazy_cleanup_index(Relation indrel,
 			   IndexBulkDeleteResult **stats,
-			   double reltuples, bool estimated_count);
+			   double reltuples, bool estimated_count, LVRelStats *vacrelstats);
 static int	lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 			 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
 static bool should_attempt_truncation(VacuumParams *params,
@@ -337,13 +354,13 @@ static void lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult *
 		 int nindexes);
 static void parallel_vacuum_index(Relation 

Re: error context for vacuum to include block number

2020-03-24 Thread Masahiko Sawada
On Wed, 25 Mar 2020 at 14:08, Justin Pryzby  wrote:
>
> On Wed, Mar 25, 2020 at 10:22:21AM +0530, Amit Kapila wrote:
> > On Wed, Mar 25, 2020 at 10:05 AM Masahiko Sawada
> >  wrote:
> > >
> > > On Wed, 25 Mar 2020 at 12:44, Amit Kapila  wrote:
> > > >
> > > > On Tue, Mar 24, 2020 at 7:51 PM Masahiko Sawada
> > > >  wrote:
> > > > >
> > > > >
> > > > > I got the point. But if we set the error context before that, I think
> > > > > we need to change the error context message. The error context message
> > > > > of heap truncation phase is "while truncating relation \"%s.%s\" to %u
> > > > > blocks", but cbarg->blkno will be the number of blocks of the current
> > > > > relation.
> > > > >
> > > > >case VACUUM_ERRCB_PHASE_TRUNCATE:
> > > > >if (BlockNumberIsValid(cbarg->blkno))
> > > > >errcontext("while truncating relation \"%s.%s\" to %u 
> > > > > blocks",
> > > > >   cbarg->relnamespace, cbarg->relname, 
> > > > > cbarg->blkno);
> > > > >break;
> > > > >
> > > >
> > > > Do you mean to say that actually we are just prefetching or reading
> > > > the pages in count_nondeletable_pages() but the message doesn't have
> > > > any such indication?  If not that, what problem do you see with the
> > > > message?   What is your suggestion?
> > >
> > > I meant that with the patch, suppose that the table has 100 blocks and
> > > we're truncating it to 50 blocks in RelationTruncate(), the error
> > > context message will be "while truncating relation "aaa.bbb" to 100
> > > blocks", which is not correct. I think it should be "while truncating
> > > relation "aaa.bbb" to 50 blocks". We can know the relation can be
> > > truncated to 50 blocks by the result of count_nondeletable_pages(). So
> > > if we update the arguments before it we will use the number of blocks
> > > of relation before truncation.
> > >
> >
> > Won't the latest patch by Justin will fix this as he has updated the
> > block count after count_nondeletable_pages?  Apart from that, I feel
>
> The issue is if the error happens *during* count_nondeletable_pages().
> We don't want it to say "truncating relation to 100 blocks".

Right.

>
> > the first call to update_vacuum_error_cbarg in lazy_truncate_heap
> > should have input parameter as vacrelstats->nonempty_pages instead of
> > new_rel_pages to indicate the remaining pages after truncation?
>
> Yea, I think that addresses the issue.

+1

Regards,

-- 
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-03-24 Thread Justin Pryzby
On Wed, Mar 25, 2020 at 10:22:21AM +0530, Amit Kapila wrote:
> On Wed, Mar 25, 2020 at 10:05 AM Masahiko Sawada
>  wrote:
> >
> > On Wed, 25 Mar 2020 at 12:44, Amit Kapila  wrote:
> > >
> > > On Tue, Mar 24, 2020 at 7:51 PM Masahiko Sawada
> > >  wrote:
> > > >
> > > >
> > > > I got the point. But if we set the error context before that, I think
> > > > we need to change the error context message. The error context message
> > > > of heap truncation phase is "while truncating relation \"%s.%s\" to %u
> > > > blocks", but cbarg->blkno will be the number of blocks of the current
> > > > relation.
> > > >
> > > >case VACUUM_ERRCB_PHASE_TRUNCATE:
> > > >if (BlockNumberIsValid(cbarg->blkno))
> > > >errcontext("while truncating relation \"%s.%s\" to %u 
> > > > blocks",
> > > >   cbarg->relnamespace, cbarg->relname, 
> > > > cbarg->blkno);
> > > >break;
> > > >
> > >
> > > Do you mean to say that actually we are just prefetching or reading
> > > the pages in count_nondeletable_pages() but the message doesn't have
> > > any such indication?  If not that, what problem do you see with the
> > > message?   What is your suggestion?
> >
> > I meant that with the patch, suppose that the table has 100 blocks and
> > we're truncating it to 50 blocks in RelationTruncate(), the error
> > context message will be "while truncating relation "aaa.bbb" to 100
> > blocks", which is not correct. I think it should be "while truncating
> > relation "aaa.bbb" to 50 blocks". We can know the relation can be
> > truncated to 50 blocks by the result of count_nondeletable_pages(). So
> > if we update the arguments before it we will use the number of blocks
> > of relation before truncation.
> >
> 
> Won't the latest patch by Justin will fix this as he has updated the
> block count after count_nondeletable_pages?  Apart from that, I feel

The issue is if the error happens *during* count_nondeletable_pages().
We don't want it to say "truncating relation to 100 blocks".

> the first call to update_vacuum_error_cbarg in lazy_truncate_heap
> should have input parameter as vacrelstats->nonempty_pages instead of
> new_rel_pages to indicate the remaining pages after truncation?

Yea, I think that addresses the issue.

-- 
Justin




Re: error context for vacuum to include block number

2020-03-24 Thread Amit Kapila
On Wed, Mar 25, 2020 at 10:05 AM Masahiko Sawada
 wrote:
>
> On Wed, 25 Mar 2020 at 12:44, Amit Kapila  wrote:
> >
> > On Tue, Mar 24, 2020 at 7:51 PM Masahiko Sawada
> >  wrote:
> > >
> > >
> > > I got the point. But if we set the error context before that, I think
> > > we need to change the error context message. The error context message
> > > of heap truncation phase is "while truncating relation \"%s.%s\" to %u
> > > blocks", but cbarg->blkno will be the number of blocks of the current
> > > relation.
> > >
> > >case VACUUM_ERRCB_PHASE_TRUNCATE:
> > >if (BlockNumberIsValid(cbarg->blkno))
> > >errcontext("while truncating relation \"%s.%s\" to %u 
> > > blocks",
> > >   cbarg->relnamespace, cbarg->relname, 
> > > cbarg->blkno);
> > >break;
> > >
> >
> > Do you mean to say that actually we are just prefetching or reading
> > the pages in count_nondeletable_pages() but the message doesn't have
> > any such indication?  If not that, what problem do you see with the
> > message?   What is your suggestion?
>
> I meant that with the patch, suppose that the table has 100 blocks and
> we're truncating it to 50 blocks in RelationTruncate(), the error
> context message will be "while truncating relation "aaa.bbb" to 100
> blocks", which is not correct. I think it should be "while truncating
> relation "aaa.bbb" to 50 blocks". We can know the relation can be
> truncated to 50 blocks by the result of count_nondeletable_pages(). So
> if we update the arguments before it we will use the number of blocks
> of relation before truncation.
>

Won't the latest patch by Justin will fix this as he has updated the
block count after count_nondeletable_pages?  Apart from that, I feel
the first call to update_vacuum_error_cbarg in lazy_truncate_heap
should have input parameter as vacrelstats->nonempty_pages instead of
new_rel_pages to indicate the remaining pages after truncation?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-24 Thread Justin Pryzby
On Wed, Mar 25, 2020 at 01:34:43PM +0900, Masahiko Sawada wrote:
> I meant that with the patch, suppose that the table has 100 blocks and
> we're truncating it to 50 blocks in RelationTruncate(), the error
> context message will be "while truncating relation "aaa.bbb" to 100
> blocks", which is not correct.

> I think it should be "while truncating
> relation "aaa.bbb" to 50 blocks". We can know the relation can be
> truncated to 50 blocks by the result of count_nondeletable_pages(). So
> if we update the arguments before it we will use the number of blocks
> of relation before truncation.

Hm, yea, at that point it's:
|new_rel_pages = RelationGetNumberOfBlocks(onerel);
..so we can do better.

> My suggestion is either that we change the error message to, for
> example, "while truncating relation "aaa.bbb" having 100 blocks", or
> that we change the patch so that we can use "50 blocks" in the error
> context message.

We could do:

update_vacuum_error_cbarg(vacrelstats,
  VACUUM_ERRCB_PHASE_TRUNCATE,
  InvalidBlockNumber, NULL, false);

new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
vacrelstats->blkno = new_rel_pages;

...

case VACUUM_ERRCB_PHASE_TRUNCATE:
if (BlockNumberIsValid(cbarg->blkno))
errcontext("while truncating relation \"%s.%s\" to %u blocks",
   cbarg->relnamespace, cbarg->relname, cbarg->blkno);
else
/* Error happened before/during count_nondeletable_pages() */
errcontext("while truncating relation \"%s.%s\"",
   cbarg->relnamespace, cbarg->relname);
break;

-- 
Justin




Re: error context for vacuum to include block number

2020-03-24 Thread Masahiko Sawada
On Wed, 25 Mar 2020 at 12:44, Amit Kapila  wrote:
>
> On Tue, Mar 24, 2020 at 7:51 PM Masahiko Sawada
>  wrote:
> >
> > On Tue, 24 Mar 2020 at 22:37, Amit Kapila  wrote:
> > >
> > > On Tue, Mar 24, 2020 at 6:18 PM Masahiko Sawada
> > >  wrote:
> > > >
> > > >
> > > > I've read through the latest version patch (v31), here are my comments:
> > > >
> > > > 1.
> > > > +/* Update error traceback information */
> > > > +olderrcbarg = *vacrelstats;
> > > > +update_vacuum_error_cbarg(vacrelstats,
> > > > +  VACUUM_ERRCB_PHASE_TRUNCATE,
> > > > new_rel_pages, NULL,
> > > > +  false);
> > > > +
> > > >  /*
> > > >   * Scan backwards from the end to verify that the end pages 
> > > > actually
> > > >   * contain no tuples.  This is *necessary*, not optional, 
> > > > because
> > > >   * other backends could have added tuples to these pages 
> > > > whilst we
> > > >   * were vacuuming.
> > > >   */
> > > >  new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
> > > >
> > > > We need to set the error context after setting new_rel_pages.
> > > >
> > >
> > > We want to cover the errors raised in count_nondeletable_pages().  In
> > > an earlier version of the patch, we had TRUNCATE_PREFETCH phase which
> > > use to cover those errors, but that was not good as we were
> > > setting/resetting it multiple times and it was not clear such a
> > > separate phase would add any value.
> >
> > I got the point. But if we set the error context before that, I think
> > we need to change the error context message. The error context message
> > of heap truncation phase is "while truncating relation \"%s.%s\" to %u
> > blocks", but cbarg->blkno will be the number of blocks of the current
> > relation.
> >
> >case VACUUM_ERRCB_PHASE_TRUNCATE:
> >if (BlockNumberIsValid(cbarg->blkno))
> >errcontext("while truncating relation \"%s.%s\" to %u 
> > blocks",
> >   cbarg->relnamespace, cbarg->relname, 
> > cbarg->blkno);
> >break;
> >
>
> Do you mean to say that actually we are just prefetching or reading
> the pages in count_nondeletable_pages() but the message doesn't have
> any such indication?  If not that, what problem do you see with the
> message?   What is your suggestion?

I meant that with the patch, suppose that the table has 100 blocks and
we're truncating it to 50 blocks in RelationTruncate(), the error
context message will be "while truncating relation "aaa.bbb" to 100
blocks", which is not correct. I think it should be "while truncating
relation "aaa.bbb" to 50 blocks". We can know the relation can be
truncated to 50 blocks by the result of count_nondeletable_pages(). So
if we update the arguments before it we will use the number of blocks
of relation before truncation.

My suggestion is either that we change the error message to, for
example, "while truncating relation "aaa.bbb" having 100 blocks", or
that we change the patch so that we can use "50 blocks" in the error
context message.

Regards,

-- 
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-03-24 Thread Amit Kapila
On Wed, Mar 25, 2020 at 8:49 AM Justin Pryzby  wrote:
>
> On Tue, Mar 24, 2020 at 09:47:30PM +0900, Masahiko Sawada wrote:
> > We need to set the error context after setting new_rel_pages.
>
> Done
>
> > The type name ErrCbPhase seems to be very generic name, how about
> > VacErrCbPhase or VacuumErrCbPhase?
>
> Done.
>
> Thanks for your review comments.
>

@@ -870,6 +904,12 @@ lazy_scan_heap(Relation onerel, VacuumParams
*params, LVRelStats *vacrelstats,
  else
  skipping_blocks = false;

+ /* Setup error traceback support for ereport() */
+ errcallback.callback = vacuum_error_callback;
+ errcallback.arg = vacrelstats;
+ errcallback.previous = error_context_stack;
+ error_context_stack = 

I think by mistake you have re-introduced this chunk of code.  We
don't need this as we have done it in the caller.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-24 Thread Amit Kapila
On Tue, Mar 24, 2020 at 7:51 PM Masahiko Sawada
 wrote:
>
> On Tue, 24 Mar 2020 at 22:37, Amit Kapila  wrote:
> >
> > On Tue, Mar 24, 2020 at 6:18 PM Masahiko Sawada
> >  wrote:
> > >
> > >
> > > I've read through the latest version patch (v31), here are my comments:
> > >
> > > 1.
> > > +/* Update error traceback information */
> > > +olderrcbarg = *vacrelstats;
> > > +update_vacuum_error_cbarg(vacrelstats,
> > > +  VACUUM_ERRCB_PHASE_TRUNCATE,
> > > new_rel_pages, NULL,
> > > +  false);
> > > +
> > >  /*
> > >   * Scan backwards from the end to verify that the end pages 
> > > actually
> > >   * contain no tuples.  This is *necessary*, not optional, because
> > >   * other backends could have added tuples to these pages whilst 
> > > we
> > >   * were vacuuming.
> > >   */
> > >  new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
> > >
> > > We need to set the error context after setting new_rel_pages.
> > >
> >
> > We want to cover the errors raised in count_nondeletable_pages().  In
> > an earlier version of the patch, we had TRUNCATE_PREFETCH phase which
> > use to cover those errors, but that was not good as we were
> > setting/resetting it multiple times and it was not clear such a
> > separate phase would add any value.
>
> I got the point. But if we set the error context before that, I think
> we need to change the error context message. The error context message
> of heap truncation phase is "while truncating relation \"%s.%s\" to %u
> blocks", but cbarg->blkno will be the number of blocks of the current
> relation.
>
>case VACUUM_ERRCB_PHASE_TRUNCATE:
>if (BlockNumberIsValid(cbarg->blkno))
>errcontext("while truncating relation \"%s.%s\" to %u blocks",
>   cbarg->relnamespace, cbarg->relname, cbarg->blkno);
>break;
>

Do you mean to say that actually we are just prefetching or reading
the pages in count_nondeletable_pages() but the message doesn't have
any such indication?  If not that, what problem do you see with the
message?   What is your suggestion?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-24 Thread Justin Pryzby
On Tue, Mar 24, 2020 at 09:47:30PM +0900, Masahiko Sawada wrote:
> We need to set the error context after setting new_rel_pages.

Done

> The type name ErrCbPhase seems to be very generic name, how about
> VacErrCbPhase or VacuumErrCbPhase?

Done.

Thanks for your review comments.

-- 
Justin
>From 26c57039135896ebf29b96c172d35d869ed1ce69 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v32 1/3] Introduce vacuum errcontext to display additional
 information.

The additional information displayed will be block number for error
occurring while processing heap and index name for error occurring
while processing the index.

This will help us in diagnosing the problems that occur during a vacuum.
For ex. due to corruption (either caused by bad hardware or by some bug)
if we get some error while vacuuming, it can help us identify the block
in heap and or additional index information.

It sets up an error context callback to display additional information
with the error.  During different phases of vacuum (heap scan, heap
vacuum, index vacuum, index clean up, heap truncate), we update the error
context callback to display appropriate information.  We can extend it to
a bit more granular level like adding the phases for FSM operations or for
prefetching the blocks while truncating. However, I felt that it requires
adding many more error callback function calls and can make the code a bit
complex, so left those for now.

Author: Justin Pryzby, with few changes by Amit Kapila
Reviewed-by: Alvaro Herrera, Amit Kapila, Andres Freund, Michael Paquier
and Sawada Masahiko
Discussion: https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 260 ---
 src/tools/pgindent/typedefs.list |   1 +
 2 files changed, 236 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43efc32..cbea791968 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -144,6 +144,17 @@
  */
 #define ParallelVacuumIsActive(lps) PointerIsValid(lps)
 
+/* Phases of vacuum during which we report error context. */
+typedef enum
+{
+	VACUUM_ERRCB_PHASE_UNKNOWN,
+	VACUUM_ERRCB_PHASE_SCAN_HEAP,
+	VACUUM_ERRCB_PHASE_VACUUM_INDEX,
+	VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+	VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
+	VACUUM_ERRCB_PHASE_TRUNCATE
+} VacErrCbPhase;
+
 /*
  * LVDeadTuples stores the dead tuple TIDs collected during the heap scan.
  * This is allocated in the DSM segment in parallel mode and in local memory
@@ -270,6 +281,8 @@ typedef struct LVParallelState
 
 typedef struct LVRelStats
 {
+	char	   *relnamespace;
+	char	   *relname;
 	/* useindex = true means two-pass strategy; false means one-pass */
 	bool		useindex;
 	/* Overall statistics about rel */
@@ -290,8 +303,12 @@ typedef struct LVRelStats
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
-} LVRelStats;
 
+	/* Used for error callback */
+	char	   *indname;
+	BlockNumber blkno;			/* used only for heap operations */
+	VacErrCbPhase	phase;
+} LVRelStats;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -314,10 +331,10 @@ static void lazy_vacuum_all_indexes(Relation onerel, Relation *Irel,
 	LVRelStats *vacrelstats, LVParallelState *lps,
 	int nindexes);
 static void lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
-			  LVDeadTuples *dead_tuples, double reltuples);
+			  LVDeadTuples *dead_tuples, double reltuples, LVRelStats *vacrelstats);
 static void lazy_cleanup_index(Relation indrel,
 			   IndexBulkDeleteResult **stats,
-			   double reltuples, bool estimated_count);
+			   double reltuples, bool estimated_count, LVRelStats *vacrelstats);
 static int	lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 			 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
 static bool should_attempt_truncation(VacuumParams *params,
@@ -337,13 +354,13 @@ static void lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult *
 		 int nindexes);
 static void parallel_vacuum_index(Relation *Irel, IndexBulkDeleteResult **stats,
   LVShared *lvshared, LVDeadTuples *dead_tuples,
-  int nindexes);
+  int nindexes, LVRelStats *vacrelstats);
 static void vacuum_indexes_leader(Relation *Irel, IndexBulkDeleteResult **stats,
   LVRelStats *vacrelstats, LVParallelState *lps,
   int nindexes);
 static void vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,
 			 LVShared *lvshared, LVSharedIndStats *shared_indstats,
-			 LVDeadTuples *dead_tuples);
+			 LVDeadTuples *dead_tuples, LVRelStats *vacrelstats);
 static void lazy_cleanup_all_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
 	 LVRelStats *vacrelstats, 

Re: error context for vacuum to include block number

2020-03-24 Thread Masahiko Sawada
On Tue, 24 Mar 2020 at 22:37, Amit Kapila  wrote:
>
> On Tue, Mar 24, 2020 at 6:18 PM Masahiko Sawada
>  wrote:
> >
> >
> > I've read through the latest version patch (v31), here are my comments:
> >
> > 1.
> > +/* Update error traceback information */
> > +olderrcbarg = *vacrelstats;
> > +update_vacuum_error_cbarg(vacrelstats,
> > +  VACUUM_ERRCB_PHASE_TRUNCATE,
> > new_rel_pages, NULL,
> > +  false);
> > +
> >  /*
> >   * Scan backwards from the end to verify that the end pages 
> > actually
> >   * contain no tuples.  This is *necessary*, not optional, because
> >   * other backends could have added tuples to these pages whilst we
> >   * were vacuuming.
> >   */
> >  new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
> >
> > We need to set the error context after setting new_rel_pages.
> >
>
> We want to cover the errors raised in count_nondeletable_pages().  In
> an earlier version of the patch, we had TRUNCATE_PREFETCH phase which
> use to cover those errors, but that was not good as we were
> setting/resetting it multiple times and it was not clear such a
> separate phase would add any value.

I got the point. But if we set the error context before that, I think
we need to change the error context message. The error context message
of heap truncation phase is "while truncating relation \"%s.%s\" to %u
blocks", but cbarg->blkno will be the number of blocks of the current
relation.

   case VACUUM_ERRCB_PHASE_TRUNCATE:
   if (BlockNumberIsValid(cbarg->blkno))
   errcontext("while truncating relation \"%s.%s\" to %u blocks",
  cbarg->relnamespace, cbarg->relname, cbarg->blkno);
   break;

>
> > 2.
> > +   vacrelstats->relnamespace =
> > get_namespace_name(RelationGetNamespace(onerel));
> > +   vacrelstats->relname = pstrdup(RelationGetRelationName(onerel));
> >
> > I think we can pfree these two variables to avoid a memory leak during
> > vacuum on multiple relations.
> >
>
> Yeah, I had also thought about it but I noticed that we are not
> freeing for vacrelstats.  Also, I think the memory is allocated in
> TopTransactionContext which should be freed via
> CommitTransactionCommand before vacuuming of the next relation, so not
> sure if there is much value in freeing those variables.

Right, thank you for explanation. I was concerned memory leak because
relation name and schema name could be large compared to vacrelstats
but I agree to free them at commit time.

>
> > 3.
> > +/* Phases of vacuum during which we report error context. */
> > +typedef enum
> > +{
> > +   VACUUM_ERRCB_PHASE_UNKNOWN,
> > +   VACUUM_ERRCB_PHASE_SCAN_HEAP,
> > +   VACUUM_ERRCB_PHASE_VACUUM_INDEX,
> > +   VACUUM_ERRCB_PHASE_VACUUM_HEAP,
> > +   VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
> > +   VACUUM_ERRCB_PHASE_TRUNCATE
> > +} ErrCbPhase;
> >
> > Comparing to the vacuum progress phases, there is not a phase for
> > final cleanup which includes updating the relation stats. Is there any
> > reason why we don't have that phase for ErrCbPhase?
> >
>
> I think we can add it if we want, but it was not clear to me if that is 
> helpful.

Yeah, me too. The most likely place where an error happens is
vac_update_relstats but the error message seems to be enough.

Regards,

--
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-03-24 Thread Amit Kapila
On Tue, Mar 24, 2020 at 7:18 PM Justin Pryzby  wrote:
>
> On Tue, Mar 24, 2020 at 07:07:03PM +0530, Amit Kapila wrote:
> > On Tue, Mar 24, 2020 at 6:18 PM Masahiko Sawada 
> >  wrote:
> > > 1.
> > > +/* Update error traceback information */
> > > +olderrcbarg = *vacrelstats;
> > > +update_vacuum_error_cbarg(vacrelstats,
> > > +  VACUUM_ERRCB_PHASE_TRUNCATE,
> > > new_rel_pages, NULL,
> > > +  false);
> > > +
> > >  /*
> > >   * Scan backwards from the end to verify that the end pages 
> > > actually
> > >   * contain no tuples.  This is *necessary*, not optional, because
> > >   * other backends could have added tuples to these pages whilst 
> > > we
> > >   * were vacuuming.
> > >   */
> > >  new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
> > >
> > > We need to set the error context after setting new_rel_pages.
> >
> > We want to cover the errors raised in count_nondeletable_pages().  In
> > an earlier version of the patch, we had TRUNCATE_PREFETCH phase which
> > use to cover those errors, but that was not good as we were
> > setting/resetting it multiple times and it was not clear such a
> > separate phase would add any value.
>
> I insisted on covering count_nondeletable_pages since it calls ReadBuffer(),
> but I think we need to at least set vacrelsats->blkno = new_rel_pages, since 
> it
> may be different, right ?
>

yeah, that makes sense.

> > > 2.
> > > +   vacrelstats->relnamespace =
> > > get_namespace_name(RelationGetNamespace(onerel));
> > > +   vacrelstats->relname = pstrdup(RelationGetRelationName(onerel));
> > >
> > > I think we can pfree these two variables to avoid a memory leak during
> > > vacuum on multiple relations.
> >
> > Yeah, I had also thought about it but I noticed that we are not
> > freeing for vacrelstats.  Also, I think the memory is allocated in
> > TopTransactionContext which should be freed via
> > CommitTransactionCommand before vacuuming of the next relation, so not
> > sure if there is much value in freeing those variables.
>
> One small reason to free them is that (as Tom mentioned upthread) it's good to
> ensure that those variables are their own allocation, and not depending on
> being able to access relcache or anything else during an unexpected error.
>

That is a good reason to allocate them separately but not for doing
retail free especially when the caller of the function will free the
context in which that is allocated.  I think Sawada-San's concern was
that it will leak memory across the vacuum of multiple relations but
that is not the case here.  Won't it look odd if we are freeing memory
for members of vacrelstats but not for vacrelstats itself?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-24 Thread Justin Pryzby
On Tue, Mar 24, 2020 at 07:07:03PM +0530, Amit Kapila wrote:
> On Tue, Mar 24, 2020 at 6:18 PM Masahiko Sawada 
>  wrote:
> > 1.
> > +/* Update error traceback information */
> > +olderrcbarg = *vacrelstats;
> > +update_vacuum_error_cbarg(vacrelstats,
> > +  VACUUM_ERRCB_PHASE_TRUNCATE,
> > new_rel_pages, NULL,
> > +  false);
> > +
> >  /*
> >   * Scan backwards from the end to verify that the end pages 
> > actually
> >   * contain no tuples.  This is *necessary*, not optional, because
> >   * other backends could have added tuples to these pages whilst we
> >   * were vacuuming.
> >   */
> >  new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
> >
> > We need to set the error context after setting new_rel_pages.
> 
> We want to cover the errors raised in count_nondeletable_pages().  In
> an earlier version of the patch, we had TRUNCATE_PREFETCH phase which
> use to cover those errors, but that was not good as we were
> setting/resetting it multiple times and it was not clear such a
> separate phase would add any value.

I insisted on covering count_nondeletable_pages since it calls ReadBuffer(),
but I think we need to at least set vacrelsats->blkno = new_rel_pages, since it
may be different, right ?

> > 2.
> > +   vacrelstats->relnamespace =
> > get_namespace_name(RelationGetNamespace(onerel));
> > +   vacrelstats->relname = pstrdup(RelationGetRelationName(onerel));
> >
> > I think we can pfree these two variables to avoid a memory leak during
> > vacuum on multiple relations.
> 
> Yeah, I had also thought about it but I noticed that we are not
> freeing for vacrelstats.  Also, I think the memory is allocated in
> TopTransactionContext which should be freed via
> CommitTransactionCommand before vacuuming of the next relation, so not
> sure if there is much value in freeing those variables.

One small reason to free them is that (as Tom mentioned upthread) it's good to
ensure that those variables are their own allocation, and not depending on
being able to access relcache or anything else during an unexpected error.

-- 
Justin




Re: error context for vacuum to include block number

2020-03-24 Thread Amit Kapila
On Tue, Mar 24, 2020 at 6:18 PM Masahiko Sawada
 wrote:
>
>
> I've read through the latest version patch (v31), here are my comments:
>
> 1.
> +/* Update error traceback information */
> +olderrcbarg = *vacrelstats;
> +update_vacuum_error_cbarg(vacrelstats,
> +  VACUUM_ERRCB_PHASE_TRUNCATE,
> new_rel_pages, NULL,
> +  false);
> +
>  /*
>   * Scan backwards from the end to verify that the end pages actually
>   * contain no tuples.  This is *necessary*, not optional, because
>   * other backends could have added tuples to these pages whilst we
>   * were vacuuming.
>   */
>  new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
>
> We need to set the error context after setting new_rel_pages.
>

We want to cover the errors raised in count_nondeletable_pages().  In
an earlier version of the patch, we had TRUNCATE_PREFETCH phase which
use to cover those errors, but that was not good as we were
setting/resetting it multiple times and it was not clear such a
separate phase would add any value.

> 2.
> +   vacrelstats->relnamespace =
> get_namespace_name(RelationGetNamespace(onerel));
> +   vacrelstats->relname = pstrdup(RelationGetRelationName(onerel));
>
> I think we can pfree these two variables to avoid a memory leak during
> vacuum on multiple relations.
>

Yeah, I had also thought about it but I noticed that we are not
freeing for vacrelstats.  Also, I think the memory is allocated in
TopTransactionContext which should be freed via
CommitTransactionCommand before vacuuming of the next relation, so not
sure if there is much value in freeing those variables.

> 3.
> +/* Phases of vacuum during which we report error context. */
> +typedef enum
> +{
> +   VACUUM_ERRCB_PHASE_UNKNOWN,
> +   VACUUM_ERRCB_PHASE_SCAN_HEAP,
> +   VACUUM_ERRCB_PHASE_VACUUM_INDEX,
> +   VACUUM_ERRCB_PHASE_VACUUM_HEAP,
> +   VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
> +   VACUUM_ERRCB_PHASE_TRUNCATE
> +} ErrCbPhase;
>
> Comparing to the vacuum progress phases, there is not a phase for
> final cleanup which includes updating the relation stats. Is there any
> reason why we don't have that phase for ErrCbPhase?
>

I think we can add it if we want, but it was not clear to me if that is helpful.

> The type name ErrCbPhase seems to be very generic name, how about
> VacErrCbPhase or VacuumErrCbPhase?
>

It sounds like a better name.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-24 Thread Masahiko Sawada
On Tue, 24 Mar 2020 at 18:19, Amit Kapila  wrote:
>
> On Tue, Mar 24, 2020 at 2:37 PM Masahiko Sawada
>  wrote:
> >
> > On Tue, 24 Mar 2020 at 13:53, Amit Kapila  wrote:
> > >
> > > On Tue, Mar 24, 2020 at 9:46 AM Justin Pryzby  
> > > wrote:
> > > >
> > > > On Mon, Mar 23, 2020 at 02:25:14PM +0530, Amit Kapila wrote:
> > > > > > Yea, and it would be misleading if we reported "while scanning 
> > > > > > block..of
> > > > > > relation" if we actually failed while writing its FSM.
> > > > > >
> > > > > > My previous patches did this:
> > > > > >
> > > > > > +   case VACUUM_ERRCB_PHASE_VACUUM_FSM:
> > > > > > +   errcontext("while vacuuming free space map 
> > > > > > of relation \"%s.%s\"",
> > > > > > +   cbarg->relnamespace, 
> > > > > > cbarg->relname);
> > > > > > +   break;
> > > > > >
> > > > >
> > > > > In what kind of errors will this help?
> > > >
> > > > If there's an I/O error on an _fsm file, for one.
> > > >
> > >
> > > If there is a read or write failure, then we give error like below
> > > which already has required information.
> > > ereport(ERROR,
> > > (errcode_for_file_access(),
> > > errmsg("could not read block %u in file \"%s\": %m",
> > > blocknum, FilePathName(v->mdfd_vfd;
> >
> > Yeah, you're right. We, however, cannot see that the error happened
> > while recording freespace or while vacuuming freespace map but perhaps
> > we can see the situation by seeing the error message in most cases. So
> > I'm okay with the current set of phases.
> >
> > I'll also test the current version of patch today.
> >
>
> okay, thanks.

I've read through the latest version patch (v31), here are my comments:

1.
+/* Update error traceback information */
+olderrcbarg = *vacrelstats;
+update_vacuum_error_cbarg(vacrelstats,
+  VACUUM_ERRCB_PHASE_TRUNCATE,
new_rel_pages, NULL,
+  false);
+
 /*
  * Scan backwards from the end to verify that the end pages actually
  * contain no tuples.  This is *necessary*, not optional, because
  * other backends could have added tuples to these pages whilst we
  * were vacuuming.
  */
 new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);

We need to set the error context after setting new_rel_pages.

2.
+   vacrelstats->relnamespace =
get_namespace_name(RelationGetNamespace(onerel));
+   vacrelstats->relname = pstrdup(RelationGetRelationName(onerel));

I think we can pfree these two variables to avoid a memory leak during
vacuum on multiple relations.

3.
+/* Phases of vacuum during which we report error context. */
+typedef enum
+{
+   VACUUM_ERRCB_PHASE_UNKNOWN,
+   VACUUM_ERRCB_PHASE_SCAN_HEAP,
+   VACUUM_ERRCB_PHASE_VACUUM_INDEX,
+   VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+   VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
+   VACUUM_ERRCB_PHASE_TRUNCATE
+} ErrCbPhase;

Comparing to the vacuum progress phases, there is not a phase for
final cleanup which includes updating the relation stats. Is there any
reason why we don't have that phase for ErrCbPhase?

The type name ErrCbPhase seems to be very generic name, how about
VacErrCbPhase or VacuumErrCbPhase?

Regards,

-- 
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-03-24 Thread Amit Kapila
On Tue, Mar 24, 2020 at 2:37 PM Masahiko Sawada
 wrote:
>
> On Tue, 24 Mar 2020 at 13:53, Amit Kapila  wrote:
> >
> > On Tue, Mar 24, 2020 at 9:46 AM Justin Pryzby  wrote:
> > >
> > > On Mon, Mar 23, 2020 at 02:25:14PM +0530, Amit Kapila wrote:
> > > > > Yea, and it would be misleading if we reported "while scanning 
> > > > > block..of
> > > > > relation" if we actually failed while writing its FSM.
> > > > >
> > > > > My previous patches did this:
> > > > >
> > > > > +   case VACUUM_ERRCB_PHASE_VACUUM_FSM:
> > > > > +   errcontext("while vacuuming free space map of 
> > > > > relation \"%s.%s\"",
> > > > > +   cbarg->relnamespace, 
> > > > > cbarg->relname);
> > > > > +   break;
> > > > >
> > > >
> > > > In what kind of errors will this help?
> > >
> > > If there's an I/O error on an _fsm file, for one.
> > >
> >
> > If there is a read or write failure, then we give error like below
> > which already has required information.
> > ereport(ERROR,
> > (errcode_for_file_access(),
> > errmsg("could not read block %u in file \"%s\": %m",
> > blocknum, FilePathName(v->mdfd_vfd;
>
> Yeah, you're right. We, however, cannot see that the error happened
> while recording freespace or while vacuuming freespace map but perhaps
> we can see the situation by seeing the error message in most cases. So
> I'm okay with the current set of phases.
>
> I'll also test the current version of patch today.
>

okay, thanks.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-24 Thread Masahiko Sawada
On Tue, 24 Mar 2020 at 13:53, Amit Kapila  wrote:
>
> On Tue, Mar 24, 2020 at 9:46 AM Justin Pryzby  wrote:
> >
> > On Mon, Mar 23, 2020 at 02:25:14PM +0530, Amit Kapila wrote:
> > > > Yea, and it would be misleading if we reported "while scanning block..of
> > > > relation" if we actually failed while writing its FSM.
> > > >
> > > > My previous patches did this:
> > > >
> > > > +   case VACUUM_ERRCB_PHASE_VACUUM_FSM:
> > > > +   errcontext("while vacuuming free space map of 
> > > > relation \"%s.%s\"",
> > > > +   cbarg->relnamespace, 
> > > > cbarg->relname);
> > > > +   break;
> > > >
> > >
> > > In what kind of errors will this help?
> >
> > If there's an I/O error on an _fsm file, for one.
> >
>
> If there is a read or write failure, then we give error like below
> which already has required information.
> ereport(ERROR,
> (errcode_for_file_access(),
> errmsg("could not read block %u in file \"%s\": %m",
> blocknum, FilePathName(v->mdfd_vfd;

Yeah, you're right. We, however, cannot see that the error happened
while recording freespace or while vacuuming freespace map but perhaps
we can see the situation by seeing the error message in most cases. So
I'm okay with the current set of phases.

I'll also test the current version of patch today.

Regards,

-- 
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-03-23 Thread Amit Kapila
On Tue, Mar 24, 2020 at 9:46 AM Justin Pryzby  wrote:
>
> On Mon, Mar 23, 2020 at 02:25:14PM +0530, Amit Kapila wrote:
> > > Yea, and it would be misleading if we reported "while scanning block..of
> > > relation" if we actually failed while writing its FSM.
> > >
> > > My previous patches did this:
> > >
> > > +   case VACUUM_ERRCB_PHASE_VACUUM_FSM:
> > > +   errcontext("while vacuuming free space map of 
> > > relation \"%s.%s\"",
> > > +   cbarg->relnamespace, 
> > > cbarg->relname);
> > > +   break;
> > >
> >
> > In what kind of errors will this help?
>
> If there's an I/O error on an _fsm file, for one.
>

If there is a read or write failure, then we give error like below
which already has required information.
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not read block %u in file \"%s\": %m",
blocknum, FilePathName(v->mdfd_vfd;

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-23 Thread Amit Kapila
On Sat, Mar 21, 2020 at 1:33 PM Justin Pryzby  wrote:
>
> On Sat, Mar 21, 2020 at 01:00:03PM +0530, Amit Kapila wrote:
> > I have addressed your comments in the attached patch.  Today, while
> > testing error messages from various phases, I noticed that the patch
> > fails to display error context if the error occurs during the truncate
> > phase.  The reason was that we had popped the error stack in
> > lazy_scan_heap due to which it never calls the callback.  I think we
> > need to set up callback at a higher level as is done in the attached
> > patch.  I have done the testing by inducing errors in various phases
> > and it prints the required information.  Let me know what you think of
> > the attached?
>
> Thanks.  My tests with TRUNCATE were probably back when we had multiple
> push/pop cycles of local error callbacks.
>
> This passes my tests.
>

Today, I have done some additional testing with parallel workers and
it seems to display the appropriate errors.  See below:

postgres=# create table t1(c1 int, c2 char(500), c3 char(500));
CREATE TABLE
postgres=# insert into t1 values(generate_series(1,30),'', '');
INSERT 0 30
postgres=# delete from t1 where c1 > 20;
DELETE 10
postgres=# vacuum t1;
ERROR:  Error induced during index vacuum
CONTEXT:  while vacuuming index "idx_t1_c3" of relation "public.t1"
parallel worker
while vacuuming index "idx_t1_c2" of relation "public.t1"

Here, you can see that the index names displayed in two messages are
different, basically, the leader backend got the error generated in
worker when it was vacuuming the other index.

I have used the attached patch to induce error.

I think the patch is in good shape now and I am happy with it.  We can
think of proceeding with this unless we want the further enhancement
for FSM which I am not sure is a good idea.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


induce_error_index.patch
Description: Binary data


Re: error context for vacuum to include block number

2020-03-23 Thread Amit Kapila
On Mon, Mar 23, 2020 at 1:46 PM Justin Pryzby  wrote:
>
> On Mon, Mar 23, 2020 at 04:39:54PM +0900, Masahiko Sawada wrote:
> > I've already commented on earlier patch but I personally think we'd be
> > better to report freespace map vacuum as a separate phase. The
> > progress report of vacuum command is used to know the progress but
> > this error context would be useful for diagnostic of failure such as
> > disk corruption. For visibility map, I think the visibility map bit
> > that are processed during vacuum is exactly corresponding to the block
> > number but since freespace map vacuum processes the range of blocks
> > I've sometimes had trouble with identifying the cause of the problem.
>

What extra information we can print that can help?  The main problem I
see is that we need to sprinkle errorcallback update function at many
more places.  We can think of writing a wrapper function for FSM calls
used in a vacuum, but I think those can be used only for vacuum.

> Yea, and it would be misleading if we reported "while scanning block..of
> relation" if we actually failed while writing its FSM.
>
> My previous patches did this:
>
> +   case VACUUM_ERRCB_PHASE_VACUUM_FSM:
> +   errcontext("while vacuuming free space map of 
> relation \"%s.%s\"",
> +   cbarg->relnamespace, cbarg->relname);
> +   break;
>

In what kind of errors will this help?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-23 Thread Justin Pryzby
On Mon, Mar 23, 2020 at 04:39:54PM +0900, Masahiko Sawada wrote:
> I've already commented on earlier patch but I personally think we'd be
> better to report freespace map vacuum as a separate phase. The
> progress report of vacuum command is used to know the progress but
> this error context would be useful for diagnostic of failure such as
> disk corruption. For visibility map, I think the visibility map bit
> that are processed during vacuum is exactly corresponding to the block
> number but since freespace map vacuum processes the range of blocks
> I've sometimes had trouble with identifying the cause of the problem.

Yea, and it would be misleading if we reported "while scanning block..of
relation" if we actually failed while writing its FSM.

My previous patches did this:

+   case VACUUM_ERRCB_PHASE_VACUUM_FSM: 

   
+   errcontext("while vacuuming free space map of relation 
\"%s.%s\"", 

+   cbarg->relnamespace, cbarg->relname);   

   
+   break;  

   

Are you suggesting it should report the start (or end?) block number ?

-- 
Justin




Re: error context for vacuum to include block number

2020-03-23 Thread Masahiko Sawada
On Sat, 21 Mar 2020 at 16:30, Amit Kapila  wrote:
>
> On Fri, Mar 20, 2020 at 8:24 PM Justin Pryzby  wrote:
> >
> > On Fri, Mar 20, 2020 at 04:58:08PM +0530, Amit Kapila wrote:
> > > See, how the attached looks?  I have written a commit message as well,
> > > see if I have missed anyone is from the credit list?
> >
> > Thanks for looking again.
> >
> > Couple tweaks:
> >
>
> I have addressed your comments in the attached patch.  Today, while
> testing error messages from various phases, I noticed that the patch
> fails to display error context if the error occurs during the truncate
> phase.  The reason was that we had popped the error stack in
> lazy_scan_heap due to which it never calls the callback.  I think we
> need to set up callback at a higher level as is done in the attached
> patch.  I have done the testing by inducing errors in various phases
> and it prints the required information.  Let me know what you think of
> the attached?

I've looked at the current version patch.

+/* Phases of vacuum during which we report error context. */
+typedef enum
+{
+   VACUUM_ERRCB_PHASE_UNKNOWN,
+   VACUUM_ERRCB_PHASE_SCAN_HEAP,
+   VACUUM_ERRCB_PHASE_VACUUM_INDEX,
+   VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+   VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
+   VACUUM_ERRCB_PHASE_TRUNCATE
+} ErrCbPhase;

I've already commented on earlier patch but I personally think we'd be
better to report freespace map vacuum as a separate phase. The
progress report of vacuum command is used to know the progress but
this error context would be useful for diagnostic of failure such as
disk corruption. For visibility map, I think the visibility map bit
that are processed during vacuum is exactly corresponding to the block
number but since freespace map vacuum processes the range of blocks
I've sometimes had trouble with identifying the cause of the problem.
What do you think?

Regards,

-- 
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-03-21 Thread Justin Pryzby
On Sat, Mar 21, 2020 at 01:00:03PM +0530, Amit Kapila wrote:
> I have addressed your comments in the attached patch.  Today, while
> testing error messages from various phases, I noticed that the patch
> fails to display error context if the error occurs during the truncate
> phase.  The reason was that we had popped the error stack in
> lazy_scan_heap due to which it never calls the callback.  I think we
> need to set up callback at a higher level as is done in the attached
> patch.  I have done the testing by inducing errors in various phases
> and it prints the required information.  Let me know what you think of
> the attached?

Thanks.  My tests with TRUNCATE were probably back when we had multiple
push/pop cycles of local error callbacks.

This passes my tests.

-- 
Justin




Re: error context for vacuum to include block number

2020-03-21 Thread Amit Kapila
On Fri, Mar 20, 2020 at 8:24 PM Justin Pryzby  wrote:
>
> On Fri, Mar 20, 2020 at 04:58:08PM +0530, Amit Kapila wrote:
> > See, how the attached looks?  I have written a commit message as well,
> > see if I have missed anyone is from the credit list?
>
> Thanks for looking again.
>
> Couple tweaks:
>

I have addressed your comments in the attached patch.  Today, while
testing error messages from various phases, I noticed that the patch
fails to display error context if the error occurs during the truncate
phase.  The reason was that we had popped the error stack in
lazy_scan_heap due to which it never calls the callback.  I think we
need to set up callback at a higher level as is done in the attached
patch.  I have done the testing by inducing errors in various phases
and it prints the required information.  Let me know what you think of
the attached?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v31-0001-Introduce-vacuum-errcontext-to-display-additiona.patch
Description: Binary data


Re: error context for vacuum to include block number

2020-03-20 Thread Justin Pryzby
On Fri, Mar 20, 2020 at 04:58:08PM +0530, Amit Kapila wrote:
> See, how the attached looks?  I have written a commit message as well,
> see if I have missed anyone is from the credit list?

Thanks for looking again.

Couple tweaks:

+/* Phases of vacuum during which an error can occur. */

Can you say: "during which we report error context"
Otherwise it sounds like we've somehow precluded errors from happening anywhere
else, which I don't think we can claim.

In the commit messsage:
|The additional information displayed will be block number for errors
|occurred while processing heap and index name for errors occurred
|while processing the index.

=> error occurring

|This will help us in diagnosing the problems that occurred during a
|vacuum.  For ex. due to corruption if we get some error while vacuuming,

=> problems that occur

Maybe it should say that this will help both 1) admins who have corruption due
to hardware (say); and, 2) developer's with corruption due to a bug.

-- 
Justin




Re: error context for vacuum to include block number

2020-03-20 Thread Amit Kapila
On Fri, Mar 20, 2020 at 12:21 PM Justin Pryzby  wrote:
>
> On Fri, Mar 20, 2020 at 11:24:25AM +0530, Amit Kapila wrote:
> > On Fri, Mar 20, 2020 at 5:59 AM Justin Pryzby  wrote:
> > That makes sense.  I have a few more comments:
> >
> > 1.
> > + VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
> > +} errcb_phase;
> >
> > Why do you need a comma after the last element in the above enum?
>
> It's not needed but a common convention to avoid needing a two-line patch in
> order to add a line at the end, like:
>
> - foo
> + foo,
> + bar
>

I don't think this is required and we don't have this at other places,
so I removed it.   Apart from that, I made a few additional changes
(a) moved the typedef to a different palace as it was looking odd
in-between other struct defines, (b) renamed the enum ErrCbPhase as
that suits more to nearby other trypedefs (c) added/edited comments at
few places, (d) ran pgindent.

See, how the attached looks?  I have written a commit message as well,
see if I have missed anyone is from the credit list?

>
> > 8. Can we think of some easy way to add tests for this patch?
>
> Is it possible to make an corrupted index which errors during scan during
> regress tests ?
>

I don't think so.

For now, let's focus on the main patch.  Once that is committed, we
can look into the other code rearrangement/cleanup patches.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v30-0001-Introduce-vacuum-errcontext-to-display-additiona.patch
Description: Binary data


Re: error context for vacuum to include block number

2020-03-20 Thread Justin Pryzby
On Fri, Mar 20, 2020 at 11:24:25AM +0530, Amit Kapila wrote:
> On Fri, Mar 20, 2020 at 5:59 AM Justin Pryzby  wrote:
> That makes sense.  I have a few more comments:
> 
> 1.
> + VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
> +} errcb_phase;
> 
> Why do you need a comma after the last element in the above enum?

It's not needed but a common convention to avoid needing a two-line patch in
order to add a line at the end, like:

- foo
+ foo,
+ bar

> 2. update_vacuum_error_cbarg(vacrelstats, VACUUM_ERRCB_PHASE_SCAN_HEAP, 
> InvalidBlockNumber, NULL);
> 
> Why do we need to call update_vacuum_error_cbarg at the above place
> after we have added a new one inside for.. loop?

If we're going to update the error_context_stack global point to our callback,
without our vacrelstats arg, it'd better be initialized.  I changed to do
vacrelstats->phase = UNKNOWN after its allocation in heap_vacuum_rel().
That matches parallel_vacuum_main().

> 4. At this and similar places, change the comment to something like:
> "Reset the old phase information for error traceback".

I did this:
/* Revert back to the old phase information for error traceback */

> 5. Subject: [PATCH v28 3/5] Drop reltuples
> 
> Is this patch directly related to the main patch (vacuum errcontext to
> show block being processed) or is it an independent improvement of
> code?

It's a cleanup after implementing the new feature.  I left it as a separate
patch to make review easier of the essential patch and of the cleanup.  
See here:
https://www.postgresql.org/message-id/CA%2Bfd4k4JA3YkP6-HUqHOqu6cTGqqZUhBfsMqQ4WXkD0Y8uotUg%40mail.gmail.com

> 6. [PATCH v28 4/5] add callback for truncation
> 
> + VACUUM_ERRCB_PHASE_TRUNCATE,
> + VACUUM_ERRCB_PHASE_TRUNCATE_PREFETCH,
> 
> Do we really need separate phases for truncate and truncate_prefetch?

The context is that there was a request to add err context for (yet another)
phase, TRUNCATE.  But I insisted on adding it to prefetch, too, since it does
ReadBuffer.  But there was an objection that the error might be misleading if
it said "while truncating" but it was actually "prefetching to truncate".

> 7. Is there a reason to keep the truncate phase patch separate from
> the main patch? If not, let's merge them.

They were separate since it's the most-recently added part, and (as now)
there's still discussion about it.

> 8. Can we think of some easy way to add tests for this patch?

Is it possible to make an corrupted index which errors during scan during
regress tests ?

Thanks for looking.

-- 
Justin
>From 941bd2292bd1f3ed926aa20eb572906a6a36df3a Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v29 1/3] vacuum errcontext to show block being processed

Discussion:
https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 245 ---
 src/tools/pgindent/typedefs.list |   1 +
 2 files changed, 221 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43efc32..531f8471db 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -268,8 +268,20 @@ typedef struct LVParallelState
 	int			nindexes_parallel_condcleanup;
 } LVParallelState;
 
+typedef enum
+{
+	VACUUM_ERRCB_PHASE_UNKNOWN,
+	VACUUM_ERRCB_PHASE_SCAN_HEAP,
+	VACUUM_ERRCB_PHASE_VACUUM_INDEX,
+	VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+	VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
+	VACUUM_ERRCB_PHASE_TRUNCATE,
+} errcb_phase;
+
 typedef struct LVRelStats
 {
+	char	   *relnamespace;
+	char	   *relname;
 	/* useindex = true means two-pass strategy; false means one-pass */
 	bool		useindex;
 	/* Overall statistics about rel */
@@ -290,8 +302,12 @@ typedef struct LVRelStats
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
-} LVRelStats;
 
+	/* Used for error callback: */
+	char	   *indname;
+	BlockNumber blkno;			/* used only for heap operations */
+	errcb_phase phase;
+} LVRelStats;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -314,10 +330,10 @@ static void lazy_vacuum_all_indexes(Relation onerel, Relation *Irel,
 	LVRelStats *vacrelstats, LVParallelState *lps,
 	int nindexes);
 static void lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
-			  LVDeadTuples *dead_tuples, double reltuples);
+			  LVDeadTuples *dead_tuples, double reltuples, LVRelStats *vacrelstats);
 static void lazy_cleanup_index(Relation indrel,
 			   IndexBulkDeleteResult **stats,
-			   double reltuples, bool estimated_count);
+			   double reltuples, bool estimated_count, LVRelStats *vacrelstats);
 static int	lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 			 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
 static bool should_attempt_truncation(VacuumParams *params,
@@ -337,13 +353,13 @@ 

Re: error context for vacuum to include block number

2020-03-19 Thread Amit Kapila
On Fri, Mar 20, 2020 at 5:59 AM Justin Pryzby  wrote:
>
> On Thu, Mar 19, 2020 at 03:29:31PM -0500, Justin Pryzby wrote:
> > I was going to suggest that we could do that by passing in a pointer to a 
> > local
> > variable "LVRelStats olderrcbarg", like:
> > |update_vacuum_error_cbarg(vacrelstats, 
> > VACUUM_ERRCB_PHASE_SCAN_HEAP,
> > |  blkno, NULL, );
> >
> > and then later call:
> > |update_vacuum_error_cbarg(vacrelstats, olderrcbarg.phase,
> > |   olderrcbarg.blkno,
> > |   olderrcbarg.indname,
> > |   NULL);
> >
> > I implemented it in a separate patch, but it may be a bad idea, due to 
> > freeing
> > indname.  To exercise it, I tried to cause a crash by changing "else if
> > (errcbarg->indname)" to "if" without else, but wasn't able to cause a crash,
> > probably just due to having a narrow timing window.
>
> I realized it was better for the caller to just assign the struct on its own.
>

That makes sense.  I have a few more comments:

1.
+ VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
+} errcb_phase;

Why do you need a comma after the last element in the above enum?

2.
+ /* Setup error traceback support for ereport() */
+ update_vacuum_error_cbarg(vacrelstats, VACUUM_ERRCB_PHASE_SCAN_HEAP,
+ InvalidBlockNumber, NULL);
+ errcallback.callback = vacuum_error_callback;
+ errcallback.arg = vacrelstats;
+ errcallback.previous = error_context_stack;
+ error_context_stack = 

Why do we need to call update_vacuum_error_cbarg at the above place
after we have added a new one inside for.. loop?

3.
+ * free_oldindex is true if the previous "indname" should be freed.  It must be
+ * false if the caller has copied the old LVRelSTats,

/LVRelSTats/LVRelStats

4.
/* Clear the error traceback phase */
  update_vacuum_error_cbarg(vacrelstats,
-   VACUUM_ERRCB_PHASE_UNKNOWN, InvalidBlockNumber,
-   NULL);
+   olderrcbarg.phase,
+   olderrcbarg.blkno,
+   olderrcbarg.indname,
+   true);

At this and similar places, change the comment to something like:
"Reset the old phase information for error traceback".

5.
Subject: [PATCH v28 3/5] Drop reltuples

---
 src/backend/access/heap/vacuumlazy.c | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

Is this patch directly related to the main patch (vacuum errcontext to
show block being processed) or is it an independent improvement of
code?

6.
[PATCH v28 4/5] add callback for truncation

+ VACUUM_ERRCB_PHASE_TRUNCATE,
+ VACUUM_ERRCB_PHASE_TRUNCATE_PREFETCH,

Do we really need separate phases for truncate and truncate_prefetch?
We have only one phase for a progress update, similarly, I think
having one phase for error reporting should be sufficient.  It will
also reduce the number of places where we need to call
update_vacuum_error_cbarg.  I think we can set
VACUUM_ERRCB_PHASE_TRUNCATE before count_nondeletable_pages and reset
it at the place you are doing right now in the patch.

7. Is there a reason to keep the truncate phase patch separate from
the main patch? If not, let's merge them.

8. Can we think of some easy way to add tests for this patch?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-19 Thread Justin Pryzby
On Thu, Mar 19, 2020 at 03:29:31PM -0500, Justin Pryzby wrote:
> I was going to suggest that we could do that by passing in a pointer to a 
> local
> variable "LVRelStats olderrcbarg", like:
> |update_vacuum_error_cbarg(vacrelstats, VACUUM_ERRCB_PHASE_SCAN_HEAP,
> |  blkno, NULL, );
> 
> and then later call:
> |update_vacuum_error_cbarg(vacrelstats, olderrcbarg.phase,
> |   olderrcbarg.blkno,
> |   olderrcbarg.indname,
> |   NULL);
> 
> I implemented it in a separate patch, but it may be a bad idea, due to freeing
> indname.  To exercise it, I tried to cause a crash by changing "else if
> (errcbarg->indname)" to "if" without else, but wasn't able to cause a crash,
> probably just due to having a narrow timing window.

I realized it was better for the caller to just assign the struct on its own.

Which gives me an excuse for resending patch, which is needed since I spent too
much time testing this that I failed to update the tip commit for the new
argument.

> It's probably a good idea to pass the indname rather than the relation in any
> case.

I included that with 0001.
I also fixed the argument name in the prototype (Relation rel vs indrel).

And removed these, which were the whole motivation behind saving the values.
|Set the error context while continuing heap scan

-- 
Justin
>From a77a85638404f061d43f06bf3a566f5aa1c5d5f4 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v28 1/5] vacuum errcontext to show block being processed

Discussion:
https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 209 +++
 src/tools/pgindent/typedefs.list |   1 +
 2 files changed, 185 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43efc32..768a69120b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -268,8 +268,19 @@ typedef struct LVParallelState
 	int			nindexes_parallel_condcleanup;
 } LVParallelState;
 
+typedef enum
+{
+	VACUUM_ERRCB_PHASE_UNKNOWN,
+	VACUUM_ERRCB_PHASE_SCAN_HEAP,
+	VACUUM_ERRCB_PHASE_VACUUM_INDEX,
+	VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+	VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
+} errcb_phase;
+
 typedef struct LVRelStats
 {
+	char	   *relnamespace;
+	char	   *relname;
 	/* useindex = true means two-pass strategy; false means one-pass */
 	bool		useindex;
 	/* Overall statistics about rel */
@@ -290,8 +301,12 @@ typedef struct LVRelStats
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
-} LVRelStats;
 
+	/* Used for error callback: */
+	char	   *indname;
+	BlockNumber blkno;			/* used only for heap operations */
+	errcb_phase phase;
+} LVRelStats;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -314,10 +329,10 @@ static void lazy_vacuum_all_indexes(Relation onerel, Relation *Irel,
 	LVRelStats *vacrelstats, LVParallelState *lps,
 	int nindexes);
 static void lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
-			  LVDeadTuples *dead_tuples, double reltuples);
+			  LVDeadTuples *dead_tuples, double reltuples, LVRelStats *vacrelstats);
 static void lazy_cleanup_index(Relation indrel,
 			   IndexBulkDeleteResult **stats,
-			   double reltuples, bool estimated_count);
+			   double reltuples, bool estimated_count, LVRelStats *vacrelstats);
 static int	lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 			 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
 static bool should_attempt_truncation(VacuumParams *params,
@@ -337,13 +352,13 @@ static void lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult *
 		 int nindexes);
 static void parallel_vacuum_index(Relation *Irel, IndexBulkDeleteResult **stats,
   LVShared *lvshared, LVDeadTuples *dead_tuples,
-  int nindexes);
+  int nindexes, LVRelStats *vacrelstats);
 static void vacuum_indexes_leader(Relation *Irel, IndexBulkDeleteResult **stats,
   LVRelStats *vacrelstats, LVParallelState *lps,
   int nindexes);
 static void vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,
 			 LVShared *lvshared, LVSharedIndStats *shared_indstats,
-			 LVDeadTuples *dead_tuples);
+			 LVDeadTuples *dead_tuples, LVRelStats *vacrelstats);
 static void lazy_cleanup_all_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
 	 LVRelStats *vacrelstats, LVParallelState *lps,
 	 int nindexes);
@@ -361,6 +376,9 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 

Re: error context for vacuum to include block number

2020-03-19 Thread Justin Pryzby
On Thu, Mar 19, 2020 at 03:18:32PM +0530, Amit Kapila wrote:
> > You're right.  PHASE_SCAN_HEAP was set, but only inside a conditional.
> 
> I think if we do it inside for loop, then we don't need to set it
> conditionally at multiple places.  I have changed like that in the
> attached patch, see if that makes sense to you.

Yes, makes sense, and it's right near pgstat_progress_update_param, which is
nice.

> > Both those issues are due to a change in the most recent patch.  In the
> > previous patch, the PHASE_VACUUM_HEAP was set only by lazy_vacuum_heap(), 
> > and I
> > moved it recently to vacuum_page.  But it needs to be copied, as you point 
> > out.
> >
> > That's unfortunate due to a lack of symmetry: lazy_vacuum_page does its own
> > progress update, which suggests to me that it should also set its own error
> > callback.  It'd be nicer if EITHER the calling functions did that 
> > (scan_heap()
> > and vacuum_heap()) or if it was sufficient for the called function
> > (vacuum_page()) to do it.
> 
> Right, but adding in callers will spread at multiple places.
> 
> I have made a few additional changes in the attached. (a) Removed
> VACUUM_ERRCB_PHASE_VACUUM_FSM as I think we have to add it at many
> places, you seem to have added for FreeSpaceMapVacuumRange() but not
> for RecordPageWithFreeSpace(), (b) Reset the phase to
> VACUUM_ERRCB_PHASE_UNKNOWN after finishing the work for a particular
> phase, so that the new phase shouldn't continue in the callers.
> 
> I have another idea to make (b) better.  How about if a call to
> update_vacuum_error_cbarg returns information of old phase (blkno,
> phase, and indname) along with what it is doing now and then once the
> work for the current phase is over it can reset it back with old phase
> information?   This way the callee after finishing the new phase work
> would be able to reset back to the old phase.  This will work
> something similar to our MemoryContextSwitchTo.

I was going to suggest that we could do that by passing in a pointer to a local
variable "LVRelStats olderrcbarg", like:
|update_vacuum_error_cbarg(vacrelstats, VACUUM_ERRCB_PHASE_SCAN_HEAP,
|  blkno, NULL, );

and then later call:
|update_vacuum_error_cbarg(vacrelstats, olderrcbarg.phase,
|   olderrcbarg.blkno,
|   olderrcbarg.indname,
|   NULL);

I implemented it in a separate patch, but it may be a bad idea, due to freeing
indname.  To exercise it, I tried to cause a crash by changing "else if
(errcbarg->indname)" to "if" without else, but wasn't able to cause a crash,
probably just due to having a narrow timing window.

As written, we only pfree indname if we do actually "reset" the cbarg, which is
in the two routines handling indexes.  It's probably a good idea to pass the
indname rather than the relation in any case.

I rebased the rest of my patches on top of yours.

-- 
Justin
>From a1ef4498cf93a9971be5c1683ceb62879ab9bd17 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v27 1/5] vacuum errcontext to show block being processed

Discussion:
https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 215 +++
 src/tools/pgindent/typedefs.list |   1 +
 2 files changed, 191 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43efc32..92bac9a24d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -268,8 +268,19 @@ typedef struct LVParallelState
 	int			nindexes_parallel_condcleanup;
 } LVParallelState;
 
+typedef enum
+{
+	VACUUM_ERRCB_PHASE_UNKNOWN,
+	VACUUM_ERRCB_PHASE_SCAN_HEAP,
+	VACUUM_ERRCB_PHASE_VACUUM_INDEX,
+	VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+	VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
+} errcb_phase;
+
 typedef struct LVRelStats
 {
+	char	   *relnamespace;
+	char	   *relname;
 	/* useindex = true means two-pass strategy; false means one-pass */
 	bool		useindex;
 	/* Overall statistics about rel */
@@ -290,8 +301,12 @@ typedef struct LVRelStats
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
-} LVRelStats;
 
+	/* Used for error callback: */
+	char	   *indname;
+	BlockNumber blkno;			/* used only for heap operations */
+	errcb_phase phase;
+} LVRelStats;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -314,10 +329,10 @@ static void lazy_vacuum_all_indexes(Relation onerel, Relation *Irel,
 	LVRelStats *vacrelstats, LVParallelState *lps,
 	int nindexes);
 static void lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
-			  LVDeadTuples *dead_tuples, double reltuples);
+			  LVDeadTuples *dead_tuples, double reltuples, LVRelStats *vacrelstats);
 static 

Re: error context for vacuum to include block number

2020-03-19 Thread Amit Kapila
On Thu, Mar 19, 2020 at 9:38 AM Justin Pryzby  wrote:
>
> On Thu, Mar 19, 2020 at 08:20:51AM +0530, Amit Kapila wrote:
>
> > > Few other comments:
> > > 1. The error in lazy_vacuum_heap can either have phase
> > > VACUUM_ERRCB_PHASE_INDEX_* or VACUUM_ERRCB_PHASE_VACUUM_HEAP depending
> > > on when it occurs.  If it occurs the first time it enters that
> > > function before a call to lazy_vacuum_page, it will use phase
> > > VACUUM_ERRCB_PHASE_INDEX_*, otherwise, it would use
> > > VACUUM_ERRCB_PHASE_VACUUM_HEAP.  The reason is lazy_vacuum_index or
> > > lazy_cleanup_index won't reset the phase after leaving that function.
>
> I think you mean that lazy_vacuum_heap() calls ReadBuffer itself, so needs to
> be in phase VACUUM_HEAP even before it calls vacuum_page().
>

Right.

> > > 2. Also once we set phase as VACUUM_ERRCB_PHASE_VACUUM_HEAP via
> > > lazy_vacuum_page, it doesn't seem to be reset to
> > > VACUUM_ERRCB_PHASE_SCAN_HEAP even when we do scanning of the heap.  I
> > > think you need to set phase VACUUM_ERRCB_PHASE_SCAN_HEAP inside loop.
>
> You're right.  PHASE_SCAN_HEAP was set, but only inside a conditional.
>

I think if we do it inside for loop, then we don't need to set it
conditionally at multiple places.  I have changed like that in the
attached patch, see if that makes sense to you.

> Both those issues are due to a change in the most recent patch.  In the
> previous patch, the PHASE_VACUUM_HEAP was set only by lazy_vacuum_heap(), and 
> I
> moved it recently to vacuum_page.  But it needs to be copied, as you point 
> out.
>
> That's unfortunate due to a lack of symmetry: lazy_vacuum_page does its own
> progress update, which suggests to me that it should also set its own error
> callback.  It'd be nicer if EITHER the calling functions did that (scan_heap()
> and vacuum_heap()) or if it was sufficient for the called function
> (vacuum_page()) to do it.
>

Right, but adding in callers will spread at multiple places.

I have made a few additional changes in the attached. (a) Removed
VACUUM_ERRCB_PHASE_VACUUM_FSM as I think we have to add it at many
places, you seem to have added for FreeSpaceMapVacuumRange() but not
for RecordPageWithFreeSpace(), (b) Reset the phase to
VACUUM_ERRCB_PHASE_UNKNOWN after finishing the work for a particular
phase, so that the new phase shouldn't continue in the callers.

I have another idea to make (b) better.  How about if a call to
update_vacuum_error_cbarg returns information of old phase (blkno,
phase, and indname) along with what it is doing now and then once the
work for the current phase is over it can reset it back with old phase
information?   This way the callee after finishing the new phase work
would be able to reset back to the old phase.  This will work
something similar to our MemoryContextSwitchTo.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v26-0001-vacuum-errcontext-to-show-block-being-processed.patch
Description: Binary data


Re: error context for vacuum to include block number

2020-03-18 Thread Justin Pryzby
On Thu, Mar 19, 2020 at 08:20:51AM +0530, Amit Kapila wrote:
> On Tue, Mar 17, 2020 at 11:51 AM Amit Kapila  wrote:
> > On Tue, Mar 17, 2020 at 9:52 AM Amit Kapila  wrote:
> > >
> > > Another minor point, don't we need to remove the error stack by doing
> > > "error_context_stack = errcallback.previous;" in parallel_vacuum_main?

It's a good idea, thanks.

> > Few other comments:
> > 1. The error in lazy_vacuum_heap can either have phase
> > VACUUM_ERRCB_PHASE_INDEX_* or VACUUM_ERRCB_PHASE_VACUUM_HEAP depending
> > on when it occurs.  If it occurs the first time it enters that
> > function before a call to lazy_vacuum_page, it will use phase
> > VACUUM_ERRCB_PHASE_INDEX_*, otherwise, it would use
> > VACUUM_ERRCB_PHASE_VACUUM_HEAP.  The reason is lazy_vacuum_index or
> > lazy_cleanup_index won't reset the phase after leaving that function.

I think you mean that lazy_vacuum_heap() calls ReadBuffer itself, so needs to
be in phase VACUUM_HEAP even before it calls vacuum_page().

> > 2. Also once we set phase as VACUUM_ERRCB_PHASE_VACUUM_HEAP via
> > lazy_vacuum_page, it doesn't seem to be reset to
> > VACUUM_ERRCB_PHASE_SCAN_HEAP even when we do scanning of the heap.  I
> > think you need to set phase VACUUM_ERRCB_PHASE_SCAN_HEAP inside loop.

You're right.  PHASE_SCAN_HEAP was set, but only inside a conditional.

Both those issues are due to a change in the most recent patch.  In the
previous patch, the PHASE_VACUUM_HEAP was set only by lazy_vacuum_heap(), and I
moved it recently to vacuum_page.  But it needs to be copied, as you point out.

That's unfortunate due to a lack of symmetry: lazy_vacuum_page does its own
progress update, which suggests to me that it should also set its own error
callback.  It'd be nicer if EITHER the calling functions did that (scan_heap()
and vacuum_heap()) or if it was sufficient for the called function
(vacuum_page()) to do it.  

> > I think we need to be a bit more careful in setting/resetting the
> > phase information correctly so that it doesn't display the wrong info
> > in the context in an error message.
> 
> Justin, are you planning to work on the pending comments?  If you
> want, I can try to fix some of these.  We have less time left for this
> CF, so we need to do things a bit quicker.

Thanks for reminding.

-- 
Justin
>From bc30a83b96ffcd55420d602eeadbd397fec31fd0 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v25 1/4] vacuum errcontext to show block being processed

Discussion:
https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 202 +++
 1 file changed, 177 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43efc32..9f0efa5aad 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -268,8 +268,19 @@ typedef struct LVParallelState
 	int			nindexes_parallel_condcleanup;
 } LVParallelState;
 
+typedef enum {
+	VACUUM_ERRCB_PHASE_UNKNOWN,
+	VACUUM_ERRCB_PHASE_SCAN_HEAP,
+	VACUUM_ERRCB_PHASE_VACUUM_INDEX,
+	VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+	VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
+	VACUUM_ERRCB_PHASE_VACUUM_FSM,
+} errcb_phase;
+
 typedef struct LVRelStats
 {
+	char 		*relnamespace;
+	char		*relname;
 	/* useindex = true means two-pass strategy; false means one-pass */
 	bool		useindex;
 	/* Overall statistics about rel */
@@ -290,8 +301,12 @@ typedef struct LVRelStats
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
-} LVRelStats;
 
+	/* Used for error callback: */
+	char 		*indname;
+	BlockNumber blkno;	/* used only for heap operations */
+	errcb_phase	phase;
+} LVRelStats;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -314,10 +329,10 @@ static void lazy_vacuum_all_indexes(Relation onerel, Relation *Irel,
 	LVRelStats *vacrelstats, LVParallelState *lps,
 	int nindexes);
 static void lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
-			  LVDeadTuples *dead_tuples, double reltuples);
+			  LVDeadTuples *dead_tuples, double reltuples, LVRelStats *vacrelstats);
 static void lazy_cleanup_index(Relation indrel,
 			   IndexBulkDeleteResult **stats,
-			   double reltuples, bool estimated_count);
+			   double reltuples, bool estimated_count, LVRelStats *vacrelstats);
 static int	lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 			 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
 static bool should_attempt_truncation(VacuumParams *params,
@@ -337,13 +352,13 @@ static void lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult *
 		 int nindexes);
 static void parallel_vacuum_index(Relation *Irel, IndexBulkDeleteResult **stats,
   LVShared *lvshared, LVDeadTuples *dead_tuples,
-  int nindexes);
+

Re: error context for vacuum to include block number

2020-03-18 Thread Amit Kapila
On Tue, Mar 17, 2020 at 11:51 AM Amit Kapila  wrote:
>
> On Tue, Mar 17, 2020 at 9:52 AM Amit Kapila  wrote:
> >
> >
> > Another minor point, don't we need to remove the error stack by doing
> > "error_context_stack = errcallback.previous;" in parallel_vacuum_main?
> >
>
> Few other comments:
> 1. The error in lazy_vacuum_heap can either have phase
> VACUUM_ERRCB_PHASE_INDEX_* or VACUUM_ERRCB_PHASE_VACUUM_HEAP depending
> on when it occurs.  If it occurs the first time it enters that
> function before a call to lazy_vacuum_page, it will use phase
> VACUUM_ERRCB_PHASE_INDEX_*, otherwise, it would use
> VACUUM_ERRCB_PHASE_VACUUM_HEAP.  The reason is lazy_vacuum_index or
> lazy_cleanup_index won't reset the phase after leaving that function.
>
> 2. Also once we set phase as VACUUM_ERRCB_PHASE_VACUUM_HEAP via
> lazy_vacuum_page, it doesn't seem to be reset to
> VACUUM_ERRCB_PHASE_SCAN_HEAP even when we do scanning of the heap.  I
> think you need to set phase VACUUM_ERRCB_PHASE_SCAN_HEAP inside loop.
>
> I think we need to be a bit more careful in setting/resetting the
> phase information correctly so that it doesn't display the wrong info
> in the context in an error message.
>

Justin, are you planning to work on the pending comments?  If you
want, I can try to fix some of these.  We have less time left for this
CF, so we need to do things a bit quicker.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-17 Thread Amit Kapila
On Tue, Mar 17, 2020 at 9:52 AM Amit Kapila  wrote:
>
>
> Another minor point, don't we need to remove the error stack by doing
> "error_context_stack = errcallback.previous;" in parallel_vacuum_main?
>

Few other comments:
1. The error in lazy_vacuum_heap can either have phase
VACUUM_ERRCB_PHASE_INDEX_* or VACUUM_ERRCB_PHASE_VACUUM_HEAP depending
on when it occurs.  If it occurs the first time it enters that
function before a call to lazy_vacuum_page, it will use phase
VACUUM_ERRCB_PHASE_INDEX_*, otherwise, it would use
VACUUM_ERRCB_PHASE_VACUUM_HEAP.  The reason is lazy_vacuum_index or
lazy_cleanup_index won't reset the phase after leaving that function.

2. Also once we set phase as VACUUM_ERRCB_PHASE_VACUUM_HEAP via
lazy_vacuum_page, it doesn't seem to be reset to
VACUUM_ERRCB_PHASE_SCAN_HEAP even when we do scanning of the heap.  I
think you need to set phase VACUUM_ERRCB_PHASE_SCAN_HEAP inside loop.

I think we need to be a bit more careful in setting/resetting the
phase information correctly so that it doesn't display the wrong info
in the context in an error message.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-16 Thread Amit Kapila
On Tue, Mar 17, 2020 at 9:21 AM Justin Pryzby  wrote:
>
> On Tue, Mar 03, 2020 at 10:05:42PM +0900, Masahiko Sawada wrote:
> > I was concerned about fsm vacuum; vacuum error context might show heap
> > scan while actually doing fsm vacuum. But perhaps we can update
> > callback args for that. That would be helpful for user to distinguish
> > that the problem seems to be either in heap vacuum or in fsm vacuum.
>
> On Tue, Mar 03, 2020 at 04:49:00PM -0300, Alvaro Herrera wrote:
> > Your use of the progress-report enum now has two warts -- the "-1"
> > value, and this one,
> >
> > > +#define PROGRESS_VACUUM_PHASE_VACUUM_FSM   7 /* For error 
> > > reporting only */
> >
> > I'd rather you define a new enum, in lazyvacuum.c.
>
> On Mon, Mar 16, 2020 at 11:44:25AM +0530, Amit Kapila wrote:
> > > On Wed, Mar 04, 2020 at 04:21:06PM +0900, Masahiko Sawada wrote:
> > > > Thank you for updating the patch. But we have two more places where we
> > > > do fsm vacuum.
> > >
> > > Oops, thanks.
> ...
> > it is not clear whether it is a good idea to keep a phase like
> > VACUUM_ERRCB_PHASE_VACUUM_FSM as it has added additional updates in
> > multiple places in the code.
>
> I think you're suggesting to rip out VACUUM_ERRCB_PHASE_VACUUM_FSM, and allow
> reporting any errors there with an error context like "while scanning heap".
>

Right, because that is what we do for progress updates.

> An alternative in the three places using VACUUM_ERRCB_PHASE_VACUUM_FSM is to
> set:
>
> |phase = VACUUM_ERRCB_PHASE_UNKNOWN;
>
> to avoid reporting any error context until another phase is set.
>

Right, that is an alternative, but not sure if it is worth adding
additional code.  I am trying to see if we can get this functionality
without adding code at too many places primarily because the code in
this area is already complex, so adding more things can make it
difficult to understand.

Another minor point, don't we need to remove the error stack by doing
"error_context_stack = errcallback.previous;" in parallel_vacuum_main?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-16 Thread Justin Pryzby
On Tue, Mar 03, 2020 at 10:05:42PM +0900, Masahiko Sawada wrote:
> I was concerned about fsm vacuum; vacuum error context might show heap
> scan while actually doing fsm vacuum. But perhaps we can update
> callback args for that. That would be helpful for user to distinguish
> that the problem seems to be either in heap vacuum or in fsm vacuum.

On Tue, Mar 03, 2020 at 04:49:00PM -0300, Alvaro Herrera wrote:
> Your use of the progress-report enum now has two warts -- the "-1"
> value, and this one,
> 
> > +#define PROGRESS_VACUUM_PHASE_VACUUM_FSM   7 /* For error 
> > reporting only */
> 
> I'd rather you define a new enum, in lazyvacuum.c.

On Mon, Mar 16, 2020 at 11:44:25AM +0530, Amit Kapila wrote:
> > On Wed, Mar 04, 2020 at 04:21:06PM +0900, Masahiko Sawada wrote:
> > > Thank you for updating the patch. But we have two more places where we
> > > do fsm vacuum.
> >
> > Oops, thanks.
...
> it is not clear whether it is a good idea to keep a phase like
> VACUUM_ERRCB_PHASE_VACUUM_FSM as it has added additional updates in
> multiple places in the code.

I think you're suggesting to rip out VACUUM_ERRCB_PHASE_VACUUM_FSM, and allow
reporting any errors there with an error context like "while scanning heap".

An alternative in the three places using VACUUM_ERRCB_PHASE_VACUUM_FSM is to
set:

|phase = VACUUM_ERRCB_PHASE_UNKNOWN;

to avoid reporting any error context until another phase is set.

-- 
Justin




Re: error context for vacuum to include block number

2020-03-16 Thread Alvaro Herrera
On 2020-Mar-16, Amit Kapila wrote:

> 2.
> + /* Setup error traceback support for ereport() */
> + update_vacuum_error_cbarg(vacrelstats, VACUUM_ERRCB_PHASE_SCAN_HEAP,
> + InvalidBlockNumber, NULL);
> + errcallback.callback = vacuum_error_callback;
> + errcallback.arg = vacrelstats;
> + errcallback.previous = error_context_stack;
> + error_context_stack = 
> ..
> ..
> + /* Init vacrelstats for use as error callback by parallel worker: */
> + vacrelstats.relnamespace = get_namespace_name(RelationGetNamespace(onerel));
> + vacrelstats.relname = pstrdup(RelationGetRelationName(onerel));
> + vacrelstats.indname = NULL;
> + vacrelstats.phase = VACUUM_ERRCB_PHASE_UNKNOWN; /* Not yet processing */
> +
> + /* Setup error traceback support for ereport() */
> + errcallback.callback = vacuum_error_callback;
> + errcallback.arg = 
> + errcallback.previous = error_context_stack;
> + error_context_stack = 
> +
> 
> I think the code can be bit simplified if we have a function
> setup_vacuum_error_ctx which takes necessary parameters and fill the
> required vacrelstats params, setup errcallback.  Then we can use
> update_vacuum_error_cbarg at required places.

Heh, he had that and I took it away -- it looked unnatural.  I thought
changing error_context_stack inside such a function, then resetting it
back to "previous" outside the function, was too leaky an abstraction.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-03-16 Thread Amit Kapila
On Thu, Mar 5, 2020 at 3:22 AM Justin Pryzby  wrote:
>
> On Wed, Mar 04, 2020 at 04:21:06PM +0900, Masahiko Sawada wrote:
> > Thank you for updating the patch. But we have two more places where we
> > do fsm vacuum.
>
> Oops, thanks.
>
> I realized that vacuum_page is called not only from lazy_vacuum_heap, but also
> directly from lazy_scan_heap, which failed to update errcbarg.  So I changed 
> to
> update errcbarg in vacuum_page.
>
> What about these other calls ?  I think granularity of individual function
> calls requires a debugger, but is it fine issue if errors here are attributed
> to (say) "scanning heap" ?
>
> GetRecordedFreeSpace
> heap_*_freeze_tuple
> heap_page_prune
> HeapTupleSatisfiesVacuum
> LockBufferForCleanup
> MarkBufferDirty
> Page*AllVisible
> PageGetHeapFreeSpace
> RecordPageWithFreeSpace
> visibilitymap_*
> VM_ALL_FROZEN
>

I think we can keep granularity the same as we have for progress
update functionality which means "scanning heap" is fine.  On similar
lines, it is not clear whether it is a good idea to keep a phase like
VACUUM_ERRCB_PHASE_VACUUM_FSM as it has added additional updates in
multiple places in the code.

Few other comments:
1.
+ /* Init vacrelstats for use as error callback by parallel worker: */
+ vacrelstats.relnamespace = get_namespace_name(RelationGetNamespace(onerel));

It looks a bit odd that the comment is ended with semicolon (:), is
there any reason for same?

2.
+ /* Setup error traceback support for ereport() */
+ update_vacuum_error_cbarg(vacrelstats, VACUUM_ERRCB_PHASE_SCAN_HEAP,
+ InvalidBlockNumber, NULL);
+ errcallback.callback = vacuum_error_callback;
+ errcallback.arg = vacrelstats;
+ errcallback.previous = error_context_stack;
+ error_context_stack = 
..
..
+ /* Init vacrelstats for use as error callback by parallel worker: */
+ vacrelstats.relnamespace = get_namespace_name(RelationGetNamespace(onerel));
+ vacrelstats.relname = pstrdup(RelationGetRelationName(onerel));
+ vacrelstats.indname = NULL;
+ vacrelstats.phase = VACUUM_ERRCB_PHASE_UNKNOWN; /* Not yet processing */
+
+ /* Setup error traceback support for ereport() */
+ errcallback.callback = vacuum_error_callback;
+ errcallback.arg = 
+ errcallback.previous = error_context_stack;
+ error_context_stack = 
+

I think the code can be bit simplified if we have a function
setup_vacuum_error_ctx which takes necessary parameters and fill the
required vacrelstats params, setup errcallback.  Then we can use
update_vacuum_error_cbarg at required places.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-04 Thread Justin Pryzby
On Tue, Mar 03, 2020 at 04:49:00PM -0300, Alvaro Herrera wrote:
> On 2020-Mar-03, Justin Pryzby wrote:
> > On Thu, Feb 27, 2020 at 09:09:42PM -0300, Alvaro Herrera wrote:
> > > > +   case PROGRESS_VACUUM_PHASE_VACUUM_HEAP:
> > > > +   if (BlockNumberIsValid(cbarg->blkno))
> > > > +   errcontext("while vacuuming block %u of 
> > > > relation \"%s.%s\"",
> > > 
> > > I think you should still call errcontext() when blkno is invalid.
> > 
> > In my experience while testing, the conditional avoids lots of CONTEXT noise
> > from interrupted autovacuum, at least.  I couldn't easily reproduce it with 
> > the
> > current patch, though, maybe due to less pushing and popping.
> 
> I think you're saying that the code had the bug that too many lines were
> reported because of excessive stack pushes, and you worked around it by
> making the errcontext() be conditional; and that now the bug is fixed by
> avoiding the push/pop games -- which explains why you can no longer
> reproduce it.  I don't see why you want to keep the no-longer-needed
> workaround.

No - the issue I observed from autovacuum ("while scanning block number
4294967295") was unrelated to showing multiple context lines (that issue I only
saw in the v22 patch, and was related to vacuum_one_index being used by both
leader and parallel workers).

The locations showing a block number first set vacrelstats->blkno to
InvalidBlockNumber, and then later update the vacrelstats->blkno from a loop
variable.  I think today's v24 patch makes it harder to hit the window where
it's set to InvalidBlockNumber, by moving VACUUM_HEAP context into
vacuum_page(), but I don't think we should rely on an absence of
CHECK_FOR_INTERRUPTS() to avoid misleading noise context.  

-- 
Justin




Re: error context for vacuum to include block number

2020-03-04 Thread Justin Pryzby
On Wed, Mar 04, 2020 at 04:21:06PM +0900, Masahiko Sawada wrote:
> Thank you for updating the patch. But we have two more places where we
> do fsm vacuum.

Oops, thanks.

I realized that vacuum_page is called not only from lazy_vacuum_heap, but also
directly from lazy_scan_heap, which failed to update errcbarg.  So I changed to
update errcbarg in vacuum_page.

What about these other calls ?  I think granularity of individual function
calls requires a debugger, but is it fine issue if errors here are attributed
to (say) "scanning heap" ?

GetRecordedFreeSpace
heap_*_freeze_tuple
heap_page_prune
HeapTupleSatisfiesVacuum
LockBufferForCleanup
MarkBufferDirty
Page*AllVisible
PageGetHeapFreeSpace
RecordPageWithFreeSpace
visibilitymap_*
VM_ALL_FROZEN

> These functions have LVDeadTuples and LVRelStats but LVDeadTuples can
> be referred by LVRelStats. If we want to use LVRelStats as callback
> argument, we can remove function arguments that can be referred by
> LVRelStats.

That doesn't work easily with parallel vacuum, which passes not
vacrelstats->dead_tuples, but a dead_tuples pulled out of shm_toc.

But it was easy enough to remove "reltuples".

-- 
Justin
>From a5d981a5ecd867bb46f51a8fb1b153d203df03ac Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v24 1/4] vacuum errcontext to show block being processed

Discussion:
https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 192 +++
 1 file changed, 167 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43efc32..084b43c178 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -268,8 +268,19 @@ typedef struct LVParallelState
 	int			nindexes_parallel_condcleanup;
 } LVParallelState;
 
+typedef enum {
+	VACUUM_ERRCB_PHASE_UNKNOWN,
+	VACUUM_ERRCB_PHASE_SCAN_HEAP,
+	VACUUM_ERRCB_PHASE_VACUUM_INDEX,
+	VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+	VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
+	VACUUM_ERRCB_PHASE_VACUUM_FSM,
+} errcb_phase;
+
 typedef struct LVRelStats
 {
+	char 		*relnamespace;
+	char		*relname;
 	/* useindex = true means two-pass strategy; false means one-pass */
 	bool		useindex;
 	/* Overall statistics about rel */
@@ -290,8 +301,12 @@ typedef struct LVRelStats
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
-} LVRelStats;
 
+	/* Used for error callback: */
+	char 		*indname;
+	BlockNumber blkno;	/* used only for heap operations */
+	errcb_phase	phase;
+} LVRelStats;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -314,10 +329,10 @@ static void lazy_vacuum_all_indexes(Relation onerel, Relation *Irel,
 	LVRelStats *vacrelstats, LVParallelState *lps,
 	int nindexes);
 static void lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
-			  LVDeadTuples *dead_tuples, double reltuples);
+			  LVDeadTuples *dead_tuples, double reltuples, LVRelStats *vacrelstats);
 static void lazy_cleanup_index(Relation indrel,
 			   IndexBulkDeleteResult **stats,
-			   double reltuples, bool estimated_count);
+			   double reltuples, bool estimated_count, LVRelStats *vacrelstats);
 static int	lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 			 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
 static bool should_attempt_truncation(VacuumParams *params,
@@ -337,13 +352,13 @@ static void lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult *
 		 int nindexes);
 static void parallel_vacuum_index(Relation *Irel, IndexBulkDeleteResult **stats,
   LVShared *lvshared, LVDeadTuples *dead_tuples,
-  int nindexes);
+  int nindexes, LVRelStats *vacrelstats);
 static void vacuum_indexes_leader(Relation *Irel, IndexBulkDeleteResult **stats,
   LVRelStats *vacrelstats, LVParallelState *lps,
   int nindexes);
 static void vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,
 			 LVShared *lvshared, LVSharedIndStats *shared_indstats,
-			 LVDeadTuples *dead_tuples);
+			 LVDeadTuples *dead_tuples, LVRelStats *vacrelstats);
 static void lazy_cleanup_all_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
 	 LVRelStats *vacrelstats, LVParallelState *lps,
 	 int nindexes);
@@ -361,6 +376,9 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
+static void update_vacuum_error_cbarg(LVRelStats *errcbarg, int phase,
+		BlockNumber blkno, Relation rel);
 
 
 /*
@@ -460,6 +478,9 @@ heap_vacuum_rel(Relation onerel, 

Re: error context for vacuum to include block number

2020-03-03 Thread Masahiko Sawada
On Wed, 4 Mar 2020 at 04:32, Justin Pryzby  wrote:
>
> On Thu, Feb 27, 2020 at 09:09:42PM -0300, Alvaro Herrera wrote:
> > > +   case PROGRESS_VACUUM_PHASE_VACUUM_HEAP:
> > > +   if (BlockNumberIsValid(cbarg->blkno))
> > > +   errcontext("while vacuuming block %u of 
> > > relation \"%s.%s\"",
> > > +   cbarg->blkno, 
> > > cbarg->relnamespace, cbarg->relname);
> > > +   break;
> >
> > I think you should still call errcontext() when blkno is invalid.
>
> In my experience while testing, the conditional avoids lots of CONTEXT noise
> from interrupted autovacuum, at least.  I couldn't easily reproduce it with 
> the
> current patch, though, maybe due to less pushing and popping.
>
> > Maybe it would make sense to make the LVRelStats struct members be char
> > arrays rather than pointers.  Then you memcpy() or strlcpy() them
> > instead of palloc/free.
>
> I had done that in the v15 patch, to allow passing it to parallel workers.
> But I don't think it's really needed.
>
> On Tue, Mar 03, 2020 at 10:05:42PM +0900, Masahiko Sawada wrote:
> > I was concerned about fsm vacuum; vacuum error context might show heap
> > scan while actually doing fsm vacuum. But perhaps we can update
> > callback args for that. That would be helpful for user to distinguish
> > that the problem seems to be either in heap vacuum or in fsm vacuum.
>
> Done in the attached.  But I think non-error reporting of additional progress
> phases is out of scope for this patch.

Thank you for updating the patch. But we have two more places where we
do fsm vacuum.

/*
 * Periodically do incremental FSM vacuuming to make newly-freed
 * space visible on upper FSM pages.  Note: although we've cleaned
 * the current block, we haven't yet updated its FSM entry (that
 * happens further down), so passing end == blkno is correct.
 */
if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
{
FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum,
blkno);
next_fsm_block_to_vacuum = blkno;
and

/*
 * Vacuum the remainder of the Free Space Map.  We must do this whether or
 * not there were indexes.
 */
if (blkno > next_fsm_block_to_vacuum)
FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum, blkno);


---
 static void vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,
 LVShared *lvshared, LVSharedIndStats
*shared_indstats,
-LVDeadTuples *dead_tuples);
+LVDeadTuples *dead_tuples, LVRelStats
*vacrelstats);
 static void lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
- LVDeadTuples *dead_tuples, double reltuples);
+ LVDeadTuples *dead_tuples, double
reltuples, LVRelStats *vacrelstats);

These functions have LVDeadTuples and LVRelStats but LVDeadTuples can
be referred by LVRelStats. If we want to use LVRelStats as callback
argument, we can remove function arguments that can be referred by
LVRelStats.

Regards,

--
Masahiko Sawadahttp://www.2ndQuadrant.com/



PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-03-03 Thread Alvaro Herrera
On 2020-Mar-03, Justin Pryzby wrote:

> On Thu, Feb 27, 2020 at 09:09:42PM -0300, Alvaro Herrera wrote:
> > > + case PROGRESS_VACUUM_PHASE_VACUUM_HEAP:
> > > + if (BlockNumberIsValid(cbarg->blkno))
> > > + errcontext("while vacuuming block %u of 
> > > relation \"%s.%s\"",
> > > + cbarg->blkno, 
> > > cbarg->relnamespace, cbarg->relname);
> > > + break;
> > 
> > I think you should still call errcontext() when blkno is invalid.
> 
> In my experience while testing, the conditional avoids lots of CONTEXT noise
> from interrupted autovacuum, at least.  I couldn't easily reproduce it with 
> the
> current patch, though, maybe due to less pushing and popping.

I think you're saying that the code had the bug that too many lines were
reported because of excessive stack pushes, and you worked around it by
making the errcontext() be conditional; and that now the bug is fixed by
avoiding the push/pop games -- which explains why you can no longer
reproduce it.  I don't see why you want to keep the no-longer-needed
workaround.


Your use of the progress-report enum now has two warts -- the "-1"
value, and this one,

> +#define PROGRESS_VACUUM_PHASE_VACUUM_FSM 7 /* For error 
> reporting only */

I'd rather you define a new enum, in lazyvacuum.c.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-03-03 Thread Justin Pryzby
On Thu, Feb 27, 2020 at 09:09:42PM -0300, Alvaro Herrera wrote:
> > +   case PROGRESS_VACUUM_PHASE_VACUUM_HEAP:
> > +   if (BlockNumberIsValid(cbarg->blkno))
> > +   errcontext("while vacuuming block %u of 
> > relation \"%s.%s\"",
> > +   cbarg->blkno, 
> > cbarg->relnamespace, cbarg->relname);
> > +   break;
> 
> I think you should still call errcontext() when blkno is invalid.

In my experience while testing, the conditional avoids lots of CONTEXT noise
from interrupted autovacuum, at least.  I couldn't easily reproduce it with the
current patch, though, maybe due to less pushing and popping.

> Maybe it would make sense to make the LVRelStats struct members be char
> arrays rather than pointers.  Then you memcpy() or strlcpy() them
> instead of palloc/free.

I had done that in the v15 patch, to allow passing it to parallel workers.
But I don't think it's really needed.

On Tue, Mar 03, 2020 at 10:05:42PM +0900, Masahiko Sawada wrote:
> I was concerned about fsm vacuum; vacuum error context might show heap
> scan while actually doing fsm vacuum. But perhaps we can update
> callback args for that. That would be helpful for user to distinguish
> that the problem seems to be either in heap vacuum or in fsm vacuum.

Done in the attached.  But I think non-error reporting of additional progress
phases is out of scope for this patch.

> On Fri, 21 Feb 2020 at 02:02, Alvaro Herrera  wrote:
> > parallel children will need to "init" on their own, right?
> Right. In that case, I think parallel vacuum worker needs to init the
> callback args at parallel_vacuum_main(). Other functions that parallel
> vacuum worker could call are also called by the leader process.

In the previous patch, I added this to vacuum_one_index.  But I noticed that
sometimes reported multiple CONTEXT lines (while vacuuming..while scanning),
which isn't intended.  I was hacked around that by setting ->previous=NULL, but
your way in parallel main() seems better.

-- 
Justin
>From ca15c197328eb3feb851ec1c3b6ca7e0f1973e93 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v23 1/3] vacuum errcontext to show block being processed

Discussion:
https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 178 +++
 src/include/commands/progress.h  |   1 +
 2 files changed, 153 insertions(+), 26 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43efc32..f15326a24c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -270,6 +270,8 @@ typedef struct LVParallelState
 
 typedef struct LVRelStats
 {
+	char 		*relnamespace;
+	char		*relname;
 	/* useindex = true means two-pass strategy; false means one-pass */
 	bool		useindex;
 	/* Overall statistics about rel */
@@ -290,8 +292,12 @@ typedef struct LVRelStats
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
-} LVRelStats;
 
+	/* Used for error callback: */
+	char 		*indname;
+	BlockNumber blkno;	/* used only for heap operations */
+	int			phase;	/* Reusing same constants as for progress reporting */
+} LVRelStats;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -314,10 +320,10 @@ static void lazy_vacuum_all_indexes(Relation onerel, Relation *Irel,
 	LVRelStats *vacrelstats, LVParallelState *lps,
 	int nindexes);
 static void lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
-			  LVDeadTuples *dead_tuples, double reltuples);
+			  LVDeadTuples *dead_tuples, double reltuples, LVRelStats *vacrelstats);
 static void lazy_cleanup_index(Relation indrel,
 			   IndexBulkDeleteResult **stats,
-			   double reltuples, bool estimated_count);
+			   double reltuples, bool estimated_count, LVRelStats *vacrelstats);
 static int	lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 			 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
 static bool should_attempt_truncation(VacuumParams *params,
@@ -337,13 +343,13 @@ static void lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult *
 		 int nindexes);
 static void parallel_vacuum_index(Relation *Irel, IndexBulkDeleteResult **stats,
   LVShared *lvshared, LVDeadTuples *dead_tuples,
-  int nindexes);
+  int nindexes, LVRelStats *vacrelstats);
 static void vacuum_indexes_leader(Relation *Irel, IndexBulkDeleteResult **stats,
   LVRelStats *vacrelstats, LVParallelState *lps,
   int nindexes);
 static void vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,
 			 LVShared *lvshared, LVSharedIndStats *shared_indstats,
-			 LVDeadTuples *dead_tuples);
+			 LVDeadTuples *dead_tuples, 

Re: error context for vacuum to include block number

2020-03-03 Thread Masahiko Sawada
On Fri, 21 Feb 2020 at 02:02, Alvaro Herrera  wrote:
>
> This is, by far, the most complex error context callback we've tried to
> write ... Easy stuff first:
>
> In the error context function itself, you don't need the _() around the
> strings: errcontext() is marked as a gettext trigger and it does the
> translation itself, so the manually added _() is just cruft.
>
> When reporting index names, make sure to attach the namespace to the
> table, not to the index.  Example:
>
> case PROGRESS_VACUUM_PHASE_INDEX_CLEANUP:
> -   errcontext(_("while cleaning up index \"%s.%s\" of relation 
> \"%s\""),
> -  cbarg->relnamespace, cbarg->indname, cbarg->relname);
> +   errcontext("while cleaning up index \"%s\" of relation \"%s.%s\"",
> +  cbarg->indname, cbarg->relnamespace, cbarg->relname);
>
> I think it would be worthwhile to have the "truncate wait" phase as a
> separate thing from the truncate itself, since it requires acquiring a
> possibly taken lock.  This suggests that using the progress enum is not
> a 100% solution ... or maybe it suggests that the progress enum too
> needs to report the truncate-wait phase separately.  (I like the latter
> myself, actually.)
>
> On 2020-Feb-19, Justin Pryzby wrote:
>
> > Also, I was thinking that lazy_scan_heap doesn't needs to do this:
> >
> > +   /* Pop the error context stack while calling vacuum */
> > +   error_context_stack = errcallback.previous;
> > ...
> > +   /* Set the error context while continuing heap scan */
> > +   error_context_stack = 
> >
> > It seems to me that's not actually necessary, since lazy_vacuum_heap will 
> > just
> > *push* a context handler onto the stack, and then pop it back off.
>
> So if you don't pop before pushing, you'll end up with two context
> lines, right?
>
> I find that arrangement a bit confusing.  I think it would make sense to
> initialize the context callback just *once* for a vacuum run, and from
> that point onwards, just update the errcbarg struct to match what
> you're currently doing -- not continually pop/push error callback stack
> entries.  See below ...

I was concerned about fsm vacuum; vacuum error context might show heap
scan while actually doing fsm vacuum. But perhaps we can update
callback args for that. That would be helpful for user to distinguish
that the problem seems to be either in heap vacuum or in fsm vacuum.

>
> (This means you need to pass the "cbarg" as new argument to some of the
> called functions, so that they can update it.)
>
> Another point is that this patch seems to be leaking memory each time
> you set relation/index/namespace name, since you never free those and
> they are changed over and over.
>
> In init_vacuum_error_callback() you don't need the "switch(phase)" bit;
> instead, test rel->rd_rel->relkind, and if it's RELKIND_INDEX then you
> put the relname as indexname, otherwise set it to NULL (after freeing
> the previous value, if there's one).  Note that with this, you only need
> to set the relation name (table name) in the first call!  IOW you should
> split init_vacuum_error_callback() in two functions: one "init" to call
> at start of vacuum, where you set relnamespace and relname; the other
> function is update_vacuum_error_callback() (or you find a better name
> for that) and it sets the phase, and optionally the block number and
> index name (these last two get reset to InvalidBlkNum/ NULL if not
> passed by caller).  I'm not really sure what this means for the parallel
> index vacuuming stuff; probably you'll need a special case for that: the
> parallel children will need to "init" on their own, right?

Right. In that case, I think parallel vacuum worker needs to init the
callback args at parallel_vacuum_main(). Other functions that parallel
vacuum worker could call are also called by the leader process.

Regards,


--
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-02-27 Thread Alvaro Herrera
On 2020-Feb-27, Justin Pryzby wrote:

> Originally, the patch only supported "scanning heap", and set the callback
> strictly, to avoid having callback installed when calling other functions 
> (like
> vacuuming heap/indexes).
> 
> Then incrementally added callbacks in increasing number of places.  We only
> need one errcontext.  And possibly you're right that the callback could always
> be in place (?).  But what about things like vacuuming FSM ?  I think we'd 
> need
> another "phase" for that (or else invent a PHASE_IGNORE to do nothing).  Would
> VACUUM_FSM be added to progress reporting, too?  We're also talking about new
> phase for TRUNCATE_PREFETCH and TRUNCATE_WAIT.

I think we should use a separate enum. It's simple enough, and there's
no reason to use the same enum for two different things if it seems to
complicate matters.

> Regarding the cbarg, at one point I took a suggestion from Andres to use the
> LVRelStats struct.  I got rid of that since I didn't like sharing "blkno"
> between heap scanning and heap vacuuming, and needs to be reset when switching
> back to scanning heap.  I experimented now going back to that now.  The only
> utility is in having an single allocation of relname/space.

I'm unsure about reusing that struct.  Not saying don't do it, just ...
unsure.  It possibly has other responsibilities.

I don't think there's a reason to keep 0002 separate.

Regarding this,

> + case PROGRESS_VACUUM_PHASE_VACUUM_HEAP:
> + if (BlockNumberIsValid(cbarg->blkno))
> + errcontext("while vacuuming block %u of 
> relation \"%s.%s\"",
> + cbarg->blkno, 
> cbarg->relnamespace, cbarg->relname);
> + break;

I think you should still call errcontext() when blkno is invalid.  In
fact, just remove the "if" line altogether and let it show whatever
value is there.  It should work okay.  We don't expect the value to be
invalid anyway.

Maybe it would make sense to make the LVRelStats struct members be char
arrays rather than pointers.  Then you memcpy() or strlcpy() them
instead of palloc/free.

Please don't cuddle your braces.


-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-02-27 Thread Justin Pryzby
On Thu, Feb 20, 2020 at 02:02:36PM -0300, Alvaro Herrera wrote:
> On 2020-Feb-19, Justin Pryzby wrote:
> 
> > Also, I was thinking that lazy_scan_heap doesn't needs to do this:
> > 
> > +   /* Pop the error context stack while calling vacuum */
> > +   error_context_stack = errcallback.previous;
> > ...
> > +   /* Set the error context while continuing heap scan */
> > +   error_context_stack = 
> > 
> > It seems to me that's not actually necessary, since lazy_vacuum_heap will 
> > just
> > *push* a context handler onto the stack, and then pop it back off.
> 
> So if you don't pop before pushing, you'll end up with two context
> lines, right?

Hm, looks like you're right, but that's not what I intended (and I didn't hit
that in my test).

> I think it would make sense to
> initialize the context callback just *once* for a vacuum run, and from
> that point onwards, just update the errcbarg struct to match what
> you're currently doing -- not continually pop/push error callback stack
> entries.  See below ...

Originally, the patch only supported "scanning heap", and set the callback
strictly, to avoid having callback installed when calling other functions (like
vacuuming heap/indexes).

Then incrementally added callbacks in increasing number of places.  We only
need one errcontext.  And possibly you're right that the callback could always
be in place (?).  But what about things like vacuuming FSM ?  I think we'd need
another "phase" for that (or else invent a PHASE_IGNORE to do nothing).  Would
VACUUM_FSM be added to progress reporting, too?  We're also talking about new
phase for TRUNCATE_PREFETCH and TRUNCATE_WAIT.

Regarding the cbarg, at one point I took a suggestion from Andres to use the
LVRelStats struct.  I got rid of that since I didn't like sharing "blkno"
between heap scanning and heap vacuuming, and needs to be reset when switching
back to scanning heap.  I experimented now going back to that now.  The only
utility is in having an single allocation of relname/space.

-- 
Justin
>From cad1177c7b61f1543fca1ac2120a238fe6f9e555 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v22 1/3] vacuum errcontext to show block being processed

Discussion:
https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 162 +++
 1 file changed, 144 insertions(+), 18 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43ef..d34d0c5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -270,6 +270,8 @@ typedef struct LVParallelState
 
 typedef struct LVRelStats
 {
+	char 		*relnamespace;
+	char		*relname;
 	/* useindex = true means two-pass strategy; false means one-pass */
 	bool		useindex;
 	/* Overall statistics about rel */
@@ -290,8 +292,12 @@ typedef struct LVRelStats
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
-} LVRelStats;
 
+	/* Used for error callback: */
+	char 		*indname;
+	BlockNumber blkno;	/* used only for heap operations */
+	int			phase;	/* Reusing same constants as for progress reporting */
+} LVRelStats;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -314,10 +320,10 @@ static void lazy_vacuum_all_indexes(Relation onerel, Relation *Irel,
 	LVRelStats *vacrelstats, LVParallelState *lps,
 	int nindexes);
 static void lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
-			  LVDeadTuples *dead_tuples, double reltuples);
+			  LVDeadTuples *dead_tuples, double reltuples, LVRelStats *vacrelstats);
 static void lazy_cleanup_index(Relation indrel,
 			   IndexBulkDeleteResult **stats,
-			   double reltuples, bool estimated_count);
+			   double reltuples, bool estimated_count, LVRelStats *vacrelstats);
 static int	lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 			 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
 static bool should_attempt_truncation(VacuumParams *params,
@@ -361,6 +367,9 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
+static void update_vacuum_error_cbarg(LVRelStats *errcbarg, int phase,
+		BlockNumber blkno, Relation rel);
 
 
 /*
@@ -460,6 +469,9 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
 
 	vacrelstats = (LVRelStats *) palloc0(sizeof(LVRelStats));
 
+	vacrelstats->relnamespace = get_namespace_name(RelationGetNamespace(onerel));
+	vacrelstats->relname = pstrdup(RelationGetRelationName(onerel));
+	vacrelstats->indname = NULL;
 	

Re: error context for vacuum to include block number

2020-02-20 Thread Tom Lane
Alvaro Herrera  writes:
> Another point is that this patch seems to be leaking memory each time
> you set relation/index/namespace name, since you never free those and
> they are changed over and over.

One other point is that this code seems to be trying to ensure that
the error context callback itself won't need to touch the catalog cache or
relcache, which is an important safety feature ... but it's failing at
that goal, because RelationGetRelationName() is going to hand back a
pointer to a string in the relcache.  You need another pstrdup for that.

regards, tom lane




Re: error context for vacuum to include block number

2020-02-20 Thread Alvaro Herrera
This is, by far, the most complex error context callback we've tried to
write ... Easy stuff first:

In the error context function itself, you don't need the _() around the
strings: errcontext() is marked as a gettext trigger and it does the
translation itself, so the manually added _() is just cruft.

When reporting index names, make sure to attach the namespace to the
table, not to the index.  Example:

case PROGRESS_VACUUM_PHASE_INDEX_CLEANUP:
-   errcontext(_("while cleaning up index \"%s.%s\" of relation 
\"%s\""),   
  
-  cbarg->relnamespace, cbarg->indname, cbarg->relname);

  
+   errcontext("while cleaning up index \"%s\" of relation \"%s.%s\"",  

  
+  cbarg->indname, cbarg->relnamespace, cbarg->relname);   

I think it would be worthwhile to have the "truncate wait" phase as a
separate thing from the truncate itself, since it requires acquiring a
possibly taken lock.  This suggests that using the progress enum is not
a 100% solution ... or maybe it suggests that the progress enum too
needs to report the truncate-wait phase separately.  (I like the latter
myself, actually.)

On 2020-Feb-19, Justin Pryzby wrote:

> Also, I was thinking that lazy_scan_heap doesn't needs to do this:
> 
> +   /* Pop the error context stack while calling vacuum */
> +   error_context_stack = errcallback.previous;
> ...
> +   /* Set the error context while continuing heap scan */
> +   error_context_stack = 
> 
> It seems to me that's not actually necessary, since lazy_vacuum_heap will just
> *push* a context handler onto the stack, and then pop it back off.

So if you don't pop before pushing, you'll end up with two context
lines, right?

I find that arrangement a bit confusing.  I think it would make sense to
initialize the context callback just *once* for a vacuum run, and from
that point onwards, just update the errcbarg struct to match what
you're currently doing -- not continually pop/push error callback stack
entries.  See below ...

(This means you need to pass the "cbarg" as new argument to some of the
called functions, so that they can update it.)

Another point is that this patch seems to be leaking memory each time
you set relation/index/namespace name, since you never free those and
they are changed over and over.

In init_vacuum_error_callback() you don't need the "switch(phase)" bit;
instead, test rel->rd_rel->relkind, and if it's RELKIND_INDEX then you
put the relname as indexname, otherwise set it to NULL (after freeing
the previous value, if there's one).  Note that with this, you only need
to set the relation name (table name) in the first call!  IOW you should
split init_vacuum_error_callback() in two functions: one "init" to call
at start of vacuum, where you set relnamespace and relname; the other
function is update_vacuum_error_callback() (or you find a better name
for that) and it sets the phase, and optionally the block number and
index name (these last two get reset to InvalidBlkNum/ NULL if not
passed by caller).  I'm not really sure what this means for the parallel
index vacuuming stuff; probably you'll need a special case for that: the
parallel children will need to "init" on their own, right?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-02-19 Thread Justin Pryzby
Rebased on top of 007491979461ff10d487e1da9bcc87f2fd834f26

Also, I was thinking that lazy_scan_heap doesn't needs to do this:

+   /* Pop the error context stack while calling vacuum */
+   error_context_stack = errcallback.previous;
...
+   /* Set the error context while continuing heap scan */
+   error_context_stack = 

It seems to me that's not actually necessary, since lazy_vacuum_heap will just
*push* a context handler onto the stack, and then pop it back off.  We don't
need to pop our context beforehand.  We also vacuum the FSM, and one might say
that we shouldn't report "...while scanning block number..." if it was
"vacuuming FSM" instead of "scanning heap", to which I would reply that either:
vacuuming FSM could be considered a part of scanning heap??  Or, maybe we
should add an additional callback for that, which is only not very nice since
we'd need to add a PROGRESS enum for which we don't actually report PROGRESS
(or stop using that enum).

I tested using variations on this that works as expected, that context is
correct during vacuum while scanning and after vacuum while scanning:

template1=# SET statement_timeout=0; SET maintenance_work_mem='1MB'; DROP TABLE 
tt; CREATE UNLOGGED TABLE tt(i int); INSERT INTO tt SELECT 
generate_series(1,39); CREATE INDEX ON tt(i); UPDATE tt SET i=i-1; SET 
statement_timeout=1222; VACUUM VERBOSE tt;

>From 91158171f75cd20e69b18843dd3b6525961e4e8b Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v21 1/2] vacuum errcontext to show block being processed

Discussion:
https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 130 ++-
 1 file changed, 129 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43ef..9e69294 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -292,6 +292,14 @@ typedef struct LVRelStats
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+typedef struct
+{
+	char 		*relnamespace;
+	char		*relname;
+	char 		*indname;
+	BlockNumber blkno;	/* used only for heap operations */
+	int			phase;	/* Reusing same enums as for progress reporting */
+} vacuum_error_callback_arg;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -361,6 +369,9 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
+static void init_vacuum_error_callback(ErrorContextCallback *errcallback,
+		vacuum_error_callback_arg *errcbarg, Relation onerel, int phase);
 
 
 /*
@@ -724,6 +735,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	ErrorContextCallback errcallback;
+	vacuum_error_callback_arg errcbarg;
 
 	pg_rusage_init();
 
@@ -870,6 +883,11 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	else
 		skipping_blocks = false;
 
+	/* Setup error traceback support for ereport() */
+	init_vacuum_error_callback(, , onerel,
+	PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+	error_context_stack = 
+
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		Buffer		buf;
@@ -891,6 +909,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 #define FORCE_CHECK_PAGE() \
 		(blkno == nblocks - 1 && should_attempt_truncation(params, vacrelstats))
 
+		errcbarg.blkno = blkno;
+
 		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
 		if (blkno == next_unskippable_block)
@@ -987,6 +1007,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 vmbuffer = InvalidBuffer;
 			}
 
+			/* Pop the error context stack while calling vacuum */
+			error_context_stack = errcallback.previous;
+
 			/* Work on all the indexes, then the heap */
 			lazy_vacuum_all_indexes(onerel, Irel, indstats,
 	vacrelstats, lps, nindexes);
@@ -1011,6 +1034,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			/* Report that we are once again scanning the heap */
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 		 PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+			/* Set the error context while continuing heap scan */
+			error_context_stack = 
 		}
 
 		/*
@@ -1597,6 +1623,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
 	}
 
+	/* Pop the error context stack */
+	error_context_stack = errcallback.previous;
+
 	/* report that everything is scanned and vacuumed */
 	

Re: error context for vacuum to include block number

2020-02-18 Thread Masahiko Sawada
On Mon, 17 Feb 2020 at 15:14, Justin Pryzby  wrote:
>
> On Mon, Feb 17, 2020 at 02:18:11PM +0900, Masahiko Sawada wrote:
> > Oops it seems to me that it's better to set error context after
> > tupindex = 0. Sorry for my bad.
>
> I take your point but did it differently - see what you think
>
> > And the above comment can be written in a single line as other same 
> > comments.
>
> Thanks :)
>
> > Hmm I don't think it's a good idea to have count_nondeletable_pages
> > set the error context of PHASE_TRUNCATE.
>
> I think if we don't do it there then we shouldn't bother handling
> PHASE_TRUNCATE at all, since that's what's likely to hit filesystem or other
> lowlevel errors, before lazy_truncate_heap() hits them.
>
> > Because the patch sets the
> > error context during RelationTruncate that actually truncates the heap
> > but count_nondeletable_pages doesn't truncate it.
>
> I would say that ReadBuffer called by the prefetch in
> count_nondeletable_pages() is called during the course of truncation, the same
> as ReadBuffer called during the course of vacuuming can be attributed to
> vacuuming.

Why do we want to include only count_nondeletable_pages in spite of
that there are also several places where we may wait: waiting for
lock, get the number of blocks etc. User may cancel vacuum during them
but user will not be able to know that vacuum is in truncation phase.
If we want to set the error callback during operation that actually
doesn't truncate heap like count_nondeletable_pages we should set it
for whole lazy_truncate_heap. Otherwise I think we should set it for
only RelationTruncate.

>
> > I think setting the error context only during RelationTruncate would be a
> > good start. We can hear other opinions from other hackers. Some hackers may
> > want to set the error context for whole lazy_truncate_heap.
>
> I avoided doing that since it has several "return" statements, each of which
> would need to "Pop the error context stack", which is at risk of being
> forgotten and left unpopped by anyone who adds or changes flow control.

I imagined that we can add some goto and pop the error callback there.
But since it might make the code bad I suggested to set the error
callback for only RelationTruncate as the first step

Regards,

-- 
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-02-16 Thread Justin Pryzby
On Mon, Feb 17, 2020 at 02:18:11PM +0900, Masahiko Sawada wrote:
> Oops it seems to me that it's better to set error context after
> tupindex = 0. Sorry for my bad.

I take your point but did it differently - see what you think

> And the above comment can be written in a single line as other same comments.

Thanks :)

> Hmm I don't think it's a good idea to have count_nondeletable_pages
> set the error context of PHASE_TRUNCATE.

I think if we don't do it there then we shouldn't bother handling
PHASE_TRUNCATE at all, since that's what's likely to hit filesystem or other
lowlevel errors, before lazy_truncate_heap() hits them.

> Because the patch sets the
> error context during RelationTruncate that actually truncates the heap
> but count_nondeletable_pages doesn't truncate it.

I would say that ReadBuffer called by the prefetch in
count_nondeletable_pages() is called during the course of truncation, the same
as ReadBuffer called during the course of vacuuming can be attributed to
vacuuming.

> I think setting the error context only during RelationTruncate would be a
> good start. We can hear other opinions from other hackers. Some hackers may
> want to set the error context for whole lazy_truncate_heap.

I avoided doing that since it has several "return" statements, each of which
would need to "Pop the error context stack", which is at risk of being
forgotten and left unpopped by anyone who adds or changes flow control.

Also, I just added this to the TRUNCATE case, even though that should never
happen: if (BlockNumberIsValid(cbarg->blkno))...

-- 
Justin
>From 977b1b5e00ce522bd775cf91f7a9c7a9345d3171 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v20 1/2] vacuum errcontext to show block being processed

Discussion:
https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 130 ++-
 1 file changed, 129 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a23cdef..ce3efd7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -292,6 +292,14 @@ typedef struct LVRelStats
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+typedef struct
+{
+	char 		*relnamespace;
+	char		*relname;
+	char 		*indname;
+	BlockNumber blkno;	/* used only for heap operations */
+	int			phase;	/* Reusing same enums as for progress reporting */
+} vacuum_error_callback_arg;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -361,6 +369,9 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
+static void init_vacuum_error_callback(ErrorContextCallback *errcallback,
+		vacuum_error_callback_arg *errcbarg, Relation onerel, int phase);
 
 
 /*
@@ -724,6 +735,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	ErrorContextCallback errcallback;
+	vacuum_error_callback_arg errcbarg;
 
 	pg_rusage_init();
 
@@ -870,6 +883,11 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	else
 		skipping_blocks = false;
 
+	/* Setup error traceback support for ereport() */
+	init_vacuum_error_callback(, , onerel,
+	PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+	error_context_stack = 
+
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		Buffer		buf;
@@ -891,6 +909,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 #define FORCE_CHECK_PAGE() \
 		(blkno == nblocks - 1 && should_attempt_truncation(params, vacrelstats))
 
+		errcbarg.blkno = blkno;
+
 		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
 		if (blkno == next_unskippable_block)
@@ -987,6 +1007,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 vmbuffer = InvalidBuffer;
 			}
 
+			/* Pop the error context stack while calling vacuum */
+			error_context_stack = errcallback.previous;
+
 			/* Work on all the indexes, then the heap */
 			lazy_vacuum_all_indexes(onerel, Irel, indstats,
 	vacrelstats, lps, nindexes);
@@ -1011,6 +1034,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			/* Report that we are once again scanning the heap */
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 		 PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+			/* Set the error context while continuing heap scan */
+			error_context_stack = 
 		}
 
 		/*
@@ -1597,6 +1623,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			RecordPageWithFreeSpace(onerel, 

Re: error context for vacuum to include block number

2020-02-16 Thread Masahiko Sawada
On Mon, 17 Feb 2020 at 12:57, Justin Pryzby  wrote:
>
> On Mon, Feb 17, 2020 at 10:47:47AM +0900, Masahiko Sawada wrote:
> > Thank you for updating the patch!
> >

Thank you for updating the patch.

> > 1.
> > The above lines need a new line.
>
> Done, thanks.
>
> > 2.
> > In lazy_vacuum_heap, we set the error context and then call
> > pg_rusage_init whereas lazy_vacuum_index and lazy_cleanup_index does
> > the opposite. And lazy_scan_heap also call pg_rusage first. I think
> > lazy_vacuum_heap should follow them for consistency. That is, we can
> > set error context after pages = 0.
>
> Right. Maybe I did it the other way because the two uses of
> PROGRESS_VACUUM_PHASE_VACUUM_HEAP were right next to each other.
>
> > 3.
> > We have 2 other phases: PROGRESS_VACUUM_PHASE_TRUNCATE and
> > PROGRESS_VACUUM_PHASE_FINAL_CLEANUP. I think it's better to set the
> > error context in lazy_truncate_heap as well. What do you think?
> >
> > I'm not sure it's worth to set the error context for FINAL_CLENAUP but
> > we should add the case of FINAL_CLENAUP to vacuum_error_callback as
> > no-op or explain it as a comment even if we don't.
>
> I don't have strong feelings either way.
>
> I looked a bit at the truncation phase.  It also truncates the FSM and VM
> forks, which could be misleading if the error was in one of those files and 
> not
> the main filenode.
>
> I'd have to find a way to test it...
> ...and was pleasantly surprised to see that earlier phases don't choke if I
> (re)create a fake 4GB table like:
>
> postgres=# CREATE TABLE trunc(i int);
> CREATE TABLE
> postgres=# \x\t
> Expanded display is on.
> Tuples only is on.
> postgres=# SELECT relfilenode FROM pg_class WHERE oid='trunc'::regclass;
> relfilenode | 59068
>
> postgres=# \! bash -xc 'truncate -s 1G 
> ./pgsql13.dat111/base/12689/59068{,.{1..3}}'
> + truncate -s 1G ./pgsql13.dat111/base/12689/59074 
> ./pgsql13.dat111/base/12689/59074.1 ./pgsql13.dat111/base/12689/59074.2 
> ./pgsql13.dat111/base/12689/59074.3
>
> postgres=# \timing
> Timing is on.
> postgres=# SET client_min_messages=debug; SET statement_timeout='13s'; VACUUM 
> VERBOSE trunc;
> INFO:  vacuuming "public.trunc"
> INFO:  "trunc": found 0 removable, 0 nonremovable row versions in 524288 out 
> of 524288 pages
> DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 2098
> There were 0 unused item identifiers.
> Skipped 0 pages due to buffer pins, 0 frozen pages.
> 524288 pages are entirely empty.
> CPU: user: 5.00 s, system: 1.50 s, elapsed: 6.52 s.
> ERROR:  canceling statement due to statement timeout
> CONTEXT:  while truncating relation "public.trunc" to 0 blocks
>

Yeah lazy_scan_heap deals with all dummy files as new empty pages.

> The callback surrounding RelationTruncate() seems hard to hit unless you add
> CHECK_FOR_INTERRUPTS(); the same was true for index cleanup.
>
> The truncation uses a prefetch, which is more likely to hit any lowlevel 
> error,
> so I added callback there, too.
>
> BTW, for the index cases, I didn't like repeating the namespace here, but 
> WDYT ?
> |CONTEXT:  while vacuuming index "public.t_i_idx3" of relation "t"

The current message looks good to me because we cannot have a table
and its index in the different schema.

1.
pg_rusage_init();
npages = 0;

/*
 * Setup error traceback support for ereport()
 */
init_vacuum_error_callback(, , onerel,
PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
error_context_stack = 

tupindex = 0;

Oops it seems to me that it's better to set error context after
tupindex = 0. Sorry for my bad.

And the above comment can be written in a single line as other same comments.

2.
@@ -2568,6 +2643,12 @@ count_nondeletable_pages(Relation onerel,
LVRelStats *vacrelstats)
BlockNumber blkno;
BlockNumber prefetchedUntil;
instr_time  starttime;
+   ErrorContextCallback errcallback;
+   vacuum_error_callback_arg errcbarg;
+
+   /* Setup error traceback support for ereport() */
+   init_vacuum_error_callback(, , onerel,
+   PROGRESS_VACUUM_PHASE_TRUNCATE);

Hmm I don't think it's a good idea to have count_nondeletable_pages
set the error context of PHASE_TRUNCATE. Because the patch sets the
error context during RelationTruncate that actually truncates the heap
but count_nondeletable_pages doesn't truncate it. I think setting the
error context only during RelationTruncate would be a good start. We
can hear other opinions from other hackers. Some hackers may want to
set the error context for whole lazy_truncate_heap.

Regards,

-- 
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-02-16 Thread Justin Pryzby
On Mon, Feb 17, 2020 at 10:47:47AM +0900, Masahiko Sawada wrote:
> Thank you for updating the patch!
> 
> 1.
> The above lines need a new line.

Done, thanks.

> 2.
> In lazy_vacuum_heap, we set the error context and then call
> pg_rusage_init whereas lazy_vacuum_index and lazy_cleanup_index does
> the opposite. And lazy_scan_heap also call pg_rusage first. I think
> lazy_vacuum_heap should follow them for consistency. That is, we can
> set error context after pages = 0.

Right. Maybe I did it the other way because the two uses of
PROGRESS_VACUUM_PHASE_VACUUM_HEAP were right next to each other.

> 3.
> We have 2 other phases: PROGRESS_VACUUM_PHASE_TRUNCATE and
> PROGRESS_VACUUM_PHASE_FINAL_CLEANUP. I think it's better to set the
> error context in lazy_truncate_heap as well. What do you think?
> 
> I'm not sure it's worth to set the error context for FINAL_CLENAUP but
> we should add the case of FINAL_CLENAUP to vacuum_error_callback as
> no-op or explain it as a comment even if we don't.

I don't have strong feelings either way.

I looked a bit at the truncation phase.  It also truncates the FSM and VM
forks, which could be misleading if the error was in one of those files and not
the main filenode.

I'd have to find a way to test it... 
...and was pleasantly surprised to see that earlier phases don't choke if I
(re)create a fake 4GB table like:

postgres=# CREATE TABLE trunc(i int);
CREATE TABLE
postgres=# \x\t
Expanded display is on.
Tuples only is on.
postgres=# SELECT relfilenode FROM pg_class WHERE oid='trunc'::regclass;
relfilenode | 59068

postgres=# \! bash -xc 'truncate -s 1G 
./pgsql13.dat111/base/12689/59068{,.{1..3}}'
+ truncate -s 1G ./pgsql13.dat111/base/12689/59074 
./pgsql13.dat111/base/12689/59074.1 ./pgsql13.dat111/base/12689/59074.2 
./pgsql13.dat111/base/12689/59074.3

postgres=# \timing 
Timing is on.
postgres=# SET client_min_messages=debug; SET statement_timeout='13s'; VACUUM 
VERBOSE trunc;
INFO:  vacuuming "public.trunc"
INFO:  "trunc": found 0 removable, 0 nonremovable row versions in 524288 out of 
524288 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 2098
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
524288 pages are entirely empty.
CPU: user: 5.00 s, system: 1.50 s, elapsed: 6.52 s.
ERROR:  canceling statement due to statement timeout
CONTEXT:  while truncating relation "public.trunc" to 0 blocks

The callback surrounding RelationTruncate() seems hard to hit unless you add
CHECK_FOR_INTERRUPTS(); the same was true for index cleanup.

The truncation uses a prefetch, which is more likely to hit any lowlevel error,
so I added callback there, too.

BTW, for the index cases, I didn't like repeating the namespace here, but WDYT ?
|CONTEXT:  while vacuuming index "public.t_i_idx3" of relation "t"

Thanks for rerere-reviewing.

-- 
Justin
>From 8295224470e0ce236025dfbff50de052978fae1d Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v19 1/2] vacuum errcontext to show block being processed

Discussion:
https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 130 +++
 1 file changed, 130 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a23cdef..5e734ee 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -292,6 +292,14 @@ typedef struct LVRelStats
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+typedef struct
+{
+	char 		*relnamespace;
+	char		*relname;
+	char 		*indname;
+	BlockNumber blkno;	/* used only for heap operations */
+	int			phase;	/* Reusing same enums as for progress reporting */
+} vacuum_error_callback_arg;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -361,6 +369,9 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
+static void init_vacuum_error_callback(ErrorContextCallback *errcallback,
+		vacuum_error_callback_arg *errcbarg, Relation onerel, int phase);
 
 
 /*
@@ -724,6 +735,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	ErrorContextCallback errcallback;
+	vacuum_error_callback_arg errcbarg;
 
 	pg_rusage_init();
 
@@ -870,6 +883,11 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	else
 		skipping_blocks = false;
 
+	/* Setup error traceback support for ereport() */
+	init_vacuum_error_callback(, , onerel,
+	PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+	error_context_stack = 
+
 	for (blkno = 0; blkno < 

Re: error context for vacuum to include block number

2020-02-16 Thread Masahiko Sawada
On Sat, 15 Feb 2020 at 00:34, Justin Pryzby  wrote:
>
> On Fri, Feb 14, 2020 at 12:30:25PM +0900, Masahiko Sawada wrote:
> > * I think the function name is too generic. init_vacuum_error_callback
> > or init_vacuum_errcallback is better.
>
> > * The comment of this function is not accurate since this function is
> > not only for heap vacuum but also index vacuum. How about just
> > "Initialize vacuum error callback"?
>
> > * I think it's easier to read the code if we set the relname and
> > indname in the same order.
>
> > * The comment I wrote in the previous mail seems better, because in
> > this function the reader might get confused that 'rel' is a relation
> > or an index depending on the phase but that comment helps it.
>
> Fixed these
>
> > * rel->rd_index->indexrelid should be rel->rd_index->indrelid.
>
> Ack.  I think that's been wrong since I first wrote it two weeks ago :(
> The error is probably more obvious due to the switch statement you proposed.
>
> Thanks for continued reviews.

Thank you for updating the patch!

1.
+   /* Setup error traceback support for ereport() */
+   init_vacuum_error_callback(, , onerel,
PROGRESS_VACUUM_PHASE_SCAN_HEAP);

+   /*
+* Setup error traceback support for ereport()
+*/
+   init_vacuum_error_callback(, , onerel,
PROGRESS_VACUUM_PHASE_VACUUM_HEAP);

+   /* Setup error traceback support for ereport() */
+   init_vacuum_error_callback(, , indrel,
PROGRESS_VACUUM_PHASE_VACUUM_INDEX);

+   /* Setup error traceback support for ereport() */
+   init_vacuum_error_callback(, , indrel,
PROGRESS_VACUUM_PHASE_INDEX_CLEANUP);

+/* Initialize vacuum error callback */
+static void
+init_vacuum_error_callback(ErrorContextCallback *errcallback,
vacuum_error_callback_arg *errcbarg, Relation rel, int phase)

The above lines need a new line.

2.
static void
lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
{
int tupindex;
int npages;
PGRUsageru0;
Buffer  vmbuffer = InvalidBuffer;
ErrorContextCallback errcallback;
vacuum_error_callback_arg errcbarg;

/* Report that we are now vacuuming the heap */
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 PROGRESS_VACUUM_PHASE_VACUUM_HEAP);

/*
 * Setup error traceback support for ereport()
 */
init_vacuum_error_callback(, , onerel,
PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
error_context_stack = 

pg_rusage_init();
npages = 0;
:

In lazy_vacuum_heap, we set the error context and then call
pg_rusage_init whereas lazy_vacuum_index and lazy_cleanup_index does
the opposite. And lazy_scan_heap also call pg_rusage first. I think
lazy_vacuum_heap should follow them for consistency. That is, we can
set error context after pages = 0.

3.
We have 2 other phases: PROGRESS_VACUUM_PHASE_TRUNCATE and
PROGRESS_VACUUM_PHASE_FINAL_CLEANUP. I think it's better to set the
error context in lazy_truncate_heap as well. What do you think?

I'm not sure it's worth to set the error context for FINAL_CLENAUP but
we should add the case of FINAL_CLENAUP to vacuum_error_callback as
no-op or explain it as a comment even if we don't.

Regards,

-- 
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-02-14 Thread Justin Pryzby
On Fri, Feb 14, 2020 at 12:30:25PM +0900, Masahiko Sawada wrote:
> * I think the function name is too generic. init_vacuum_error_callback
> or init_vacuum_errcallback is better.

> * The comment of this function is not accurate since this function is
> not only for heap vacuum but also index vacuum. How about just
> "Initialize vacuum error callback"?

> * I think it's easier to read the code if we set the relname and
> indname in the same order.

> * The comment I wrote in the previous mail seems better, because in
> this function the reader might get confused that 'rel' is a relation
> or an index depending on the phase but that comment helps it.

Fixed these

> * rel->rd_index->indexrelid should be rel->rd_index->indrelid.

Ack.  I think that's been wrong since I first wrote it two weeks ago :(
The error is probably more obvious due to the switch statement you proposed.

Thanks for continued reviews.

-- 
Justin
>From 94768a134118d30853b75a96b90166363f0fef5b Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v19] vacuum errcontext to show block being processed

Discussion:
https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 120 +++
 1 file changed, 120 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a23cdef..ebfb2e7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -292,6 +292,14 @@ typedef struct LVRelStats
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+typedef struct
+{
+	char 		*relnamespace;
+	char		*relname;
+	char 		*indname;
+	BlockNumber blkno;	/* used only for heap operations */
+	int			phase;	/* Reusing same enums as for progress reporting */
+} vacuum_error_callback_arg;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -361,6 +369,9 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
+static void init_vacuum_error_callback(ErrorContextCallback *errcallback,
+		vacuum_error_callback_arg *errcbarg, Relation onerel, int phase);
 
 
 /*
@@ -724,6 +735,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	ErrorContextCallback errcallback;
+	vacuum_error_callback_arg errcbarg;
 
 	pg_rusage_init();
 
@@ -870,6 +883,10 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	else
 		skipping_blocks = false;
 
+	/* Setup error traceback support for ereport() */
+	init_vacuum_error_callback(, , onerel, PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+	error_context_stack = 
+
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		Buffer		buf;
@@ -891,6 +908,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 #define FORCE_CHECK_PAGE() \
 		(blkno == nblocks - 1 && should_attempt_truncation(params, vacrelstats))
 
+		errcbarg.blkno = blkno;
+
 		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
 		if (blkno == next_unskippable_block)
@@ -987,6 +1006,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 vmbuffer = InvalidBuffer;
 			}
 
+			/* Pop the error context stack while calling vacuum */
+			error_context_stack = errcallback.previous;
+
 			/* Work on all the indexes, then the heap */
 			lazy_vacuum_all_indexes(onerel, Irel, indstats,
 	vacrelstats, lps, nindexes);
@@ -1011,6 +1033,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			/* Report that we are once again scanning the heap */
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 		 PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+			/* Set the error context while continuing heap scan */
+			error_context_stack = 
 		}
 
 		/*
@@ -1597,6 +1622,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
 	}
 
+	/* Pop the error context stack */
+	error_context_stack = errcallback.previous;
+
 	/* report that everything is scanned and vacuumed */
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
@@ -1772,11 +1800,19 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 	int			npages;
 	PGRUsage	ru0;
 	Buffer		vmbuffer = InvalidBuffer;
+	ErrorContextCallback errcallback;
+	vacuum_error_callback_arg errcbarg;
 
 	/* Report that we are now vacuuming the heap */
 	pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
  PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
 
+	/*
+	 * Setup error traceback support for ereport()
+	 */
+	

Re: error context for vacuum to include block number

2020-02-13 Thread Masahiko Sawada
On Fri, 14 Feb 2020 at 08:52, Justin Pryzby  wrote:
>

Thank you for updating the patch.

> On Thu, Feb 13, 2020 at 02:55:53PM +0900, Masahiko Sawada wrote:
> > You need to add a newline to follow the limit line lengths so that the
> > code is readable in an 80-column window. Or please run pgindent.
>
> For now I :set tw=80
>
> > 2.
> > I think that making initialization process of errcontext argument a
> > function is good. But maybe we can merge these two functions into one.
>
> Thanks, this is better, and I used that.
>
> > init_error_context_heap and init_error_context_index actually don't
> > only initialize the callback arguments but also push the vacuum
> > errcallback, in spite of the function name having 'init'. Also I think
> > it might be better to only initialize the callback arguments in this
> > function and to set errcallback by caller, rather than to wrap pushing
> > errcallback by a function.
>
> However I think it's important not to repeat this 4 times:
> errcallback->callback = vacuum_error_callback;
> errcallback->arg = errcbarg;
> errcallback->previous = error_context_stack;
> error_context_stack = errcallback;
>
> So I kept the first 3 of those in the function and copied only assignment to
> the global.  That helps makes the heap scan function clear, which assigns to 
> it
> twice.

Okay. Here is the review comments for v18 patch:

1.
+/* Initialize error context for heap operations */
+static void
+init_error_context(ErrorContextCallback *errcallback,
vacuum_error_callback_arg *errcbarg, Relation rel, int phase)

* I think the function name is too generic. init_vacuum_error_callback
or init_vacuum_errcallback is better.

* The comment of this function is not accurate since this function is
not only for heap vacuum but also index vacuum. How about just
"Initialize vacuum error callback"?

2.
+{
+   switch (phase)
+   {
+   case PROGRESS_VACUUM_PHASE_SCAN_HEAP:
+   case PROGRESS_VACUUM_PHASE_VACUUM_HEAP:
+   errcbarg->relname = RelationGetRelationName(rel);
+   errcbarg->indname = NULL; /* Not used for heap */
+   break;
+
+   case PROGRESS_VACUUM_PHASE_VACUUM_INDEX:
+   case PROGRESS_VACUUM_PHASE_INDEX_CLEANUP:
+   /* indname is the index being processed,
relname is its relation */
+   errcbarg->indname = RelationGetRelationName(rel);
+   errcbarg->relname =
get_rel_name(rel->rd_index->indexrelid);

* I think it's easier to read the code if we set the relname and
indname in the same order.

* The comment I wrote in the previous mail seems better, because in
this function the reader might get confused that 'rel' is a relation
or an index depending on the phase but that comment helps it.

* rel->rd_index->indexrelid should be rel->rd_index->indrelid.

Regards,

-- 
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-02-13 Thread Thomas Munro
On Tue, Jan 21, 2020 at 8:11 AM Andres Freund  wrote:
> FWIW, I think we should just flat out delete all this logic, and replace
> it with a few explicit PrefetchBuffer() calls. Just by chance I
> literally just now sped up a VACUUM by more than a factor of 10, by
> manually prefetching buffers. At least the linux kernel readahead logic
> doesn't deal well with reading and writing to different locations in the
> same file, and that's what the ringbuffer pretty invariably leads to for
> workloads that aren't cached.

Interesting.  Andrew Gierth made a similar observation on FreeBSD, and
showed that by patching his kernel to track sequential writes and
sequential reads separately he could improve performance, and I
reproduced the same speedup in a patch of my own based on his
description (that, erm, I've lost).  It's not only VACUUM, it's
anything that is writing to a lot of sequential blocks, since the
writeback trails along behind by some distance (maybe a ring buffer,
maybe all of shared buffers, whatever).  The OS sees you flipping back
and forth between single block reads and writes and thinks it's
random.  I didn't investigate this much but it seemed that ZFS was
somehow smart enough to understand what was happening at some level
but other filesystems were not.




Re: error context for vacuum to include block number

2020-02-13 Thread Justin Pryzby
On Thu, Feb 13, 2020 at 02:55:53PM +0900, Masahiko Sawada wrote:
> You need to add a newline to follow the limit line lengths so that the
> code is readable in an 80-column window. Or please run pgindent.

For now I :set tw=80

> 2.
> I think that making initialization process of errcontext argument a
> function is good. But maybe we can merge these two functions into one.

Thanks, this is better, and I used that.

> init_error_context_heap and init_error_context_index actually don't
> only initialize the callback arguments but also push the vacuum
> errcallback, in spite of the function name having 'init'. Also I think
> it might be better to only initialize the callback arguments in this
> function and to set errcallback by caller, rather than to wrap pushing
> errcallback by a function.

However I think it's important not to repeat this 4 times:
errcallback->callback = vacuum_error_callback;
errcallback->arg = errcbarg;
errcallback->previous = error_context_stack;
error_context_stack = errcallback;

So I kept the first 3 of those in the function and copied only assignment to
the global.  That helps makes the heap scan function clear, which assigns to it
twice.

BTW, for testing, I'm able to consistently hit the "vacuuming block" case like
this:

SET statement_timeout=0; DROP TABLE t; CREATE TABLE t(i int); CREATE INDEX ON 
t(i); INSERT INTO t SELECT generate_series(1,9); UPDATE t SET i=i-1; SET 
statement_timeout=111;  SET vacuum_cost_delay=3; SET vacuum_cost_page_dirty=0; 
SET vacuum_cost_page_hit=11; SET vacuum_cost_limit=33; SET 
statement_timeout=; VACUUM VERBOSE t;

Thanks for re-reviewing.

-- 
Justin
>From 5b8cad37244cdc310d78719b64ff44a464910598 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v18] vacuum errcontext to show block being processed

Discussion:
https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 120 +++
 1 file changed, 120 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a23cdef..209f483 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -292,6 +292,14 @@ typedef struct LVRelStats
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+typedef struct
+{
+	char 		*relnamespace;
+	char		*relname;
+	char 		*indname; /* undefined while not processing index */
+	BlockNumber blkno;	/* undefined while not processing heap */
+	int			phase;	/* Reusing same enums as for progress reporting */
+} vacuum_error_callback_arg;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -361,6 +369,9 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
+static void init_error_context(ErrorContextCallback *errcallback,
+		vacuum_error_callback_arg *errcbarg, Relation onerel, int phase);
 
 
 /*
@@ -724,6 +735,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	ErrorContextCallback errcallback;
+	vacuum_error_callback_arg errcbarg;
 
 	pg_rusage_init();
 
@@ -870,6 +883,10 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	else
 		skipping_blocks = false;
 
+	/* Setup error traceback support for ereport() */
+	init_error_context(, , onerel, PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+	error_context_stack = 
+
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		Buffer		buf;
@@ -891,6 +908,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 #define FORCE_CHECK_PAGE() \
 		(blkno == nblocks - 1 && should_attempt_truncation(params, vacrelstats))
 
+		errcbarg.blkno = blkno;
+
 		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
 		if (blkno == next_unskippable_block)
@@ -987,6 +1006,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 vmbuffer = InvalidBuffer;
 			}
 
+			/* Pop the error context stack while calling vacuum */
+			error_context_stack = errcallback.previous;
+
 			/* Work on all the indexes, then the heap */
 			lazy_vacuum_all_indexes(onerel, Irel, indstats,
 	vacrelstats, lps, nindexes);
@@ -1011,6 +1033,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			/* Report that we are once again scanning the heap */
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 		 PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+			/* Set the error context while continuing heap scan */
+			error_context_stack = 
 		}
 
 		/*
@@ -1597,6 +1622,9 @@ 

Re: error context for vacuum to include block number

2020-02-12 Thread Masahiko Sawada
On Sat, 8 Feb 2020 at 10:01, Justin Pryzby  wrote:
>
> On Tue, Feb 04, 2020 at 01:58:20PM +0900, Masahiko Sawada wrote:
> > Here is the comment for v16 patch:
> >
> > 2.
> > I think we can set the error context for heap scan again after
> > freespace map vacuum finishing, maybe after reporting the new phase.
> > Otherwise the user will get confused if an error occurs during
> > freespace map vacuum. And I think the comment is unclear, how about
> > "Set the error context fro heap scan again"?
>
> Good point
>
> > 3.
> > +   if (cbarg->blkno!=InvalidBlockNumber)
> > +   errcontext(_("while scanning block %u of relation 
> > \"%s.%s\""),
> > +   cbarg->blkno, cbarg->relnamespace, cbarg->relname);
> >
> > We can use BlockNumberIsValid macro instead.
>
> Thanks.  See attached, now squished together patches.
>
> I added functions to initialize the callbacks, so error handling is out of the
> way and minimally distract from the rest of vacuum.

Thank you for updating the patch! Here is the review comments:

1.
+static void vacuum_error_callback(void *arg);
+static void init_error_context_heap(ErrorContextCallback
*errcallback, vacuum_error_callback_arg *errcbarg, Relation onerel,
int phase);
+static void init_error_context_index(ErrorContextCallback
*errcallback, vacuum_error_callback_arg *errcbarg, Relation indrel,
int phase);

You need to add a newline to follow the limit line lengths so that the
code is readable in an 80-column window. Or please run pgindent.

2.
+/* Initialize error context for heap operations */
+static void
+init_error_context_heap(ErrorContextCallback *errcallback,
vacuum_error_callback_arg *errcbarg, Relation onerel, int phase)
+{
+   errcbarg->relnamespace = get_namespace_name(RelationGetNamespace(onerel));
+   errcbarg->relname = RelationGetRelationName(onerel);
+   errcbarg->indname = NULL; /* Not used for heap */
+   errcbarg->blkno = InvalidBlockNumber; /* Not known yet */
+   errcbarg->phase = phase;
+
+   errcallback->callback = vacuum_error_callback;
+   errcallback->arg = errcbarg;
+   errcallback->previous = error_context_stack;
+   error_context_stack = errcallback;
+}

I think that making initialization process of errcontext argument a
function is good. But maybe we can merge these two functions into one.
init_error_context_heap and init_error_context_index actually don't
only initialize the callback arguments but also push the vacuum
errcallback, in spite of the function name having 'init'. Also I think
it might be better to only initialize the callback arguments in this
function and to set errcallback by caller, rather than to wrap pushing
errcallback by a function. How about the following function
initializing the vacuum callback arguments?

static void
init_vacuum_error_callback_arg(vacuum_error_callback_arg *errcbarg,
Relation rel, int phase)
{
   errcbarg->relnamespace = get_namespace_name(RelationGetNamespace(onerel));
   errcbarg->blkno = InvalidBlockNumber;
   errcbarg->phase = phase;

   switch (phase) {
   case PROGRESS_VACUUM_PHASE_SCAN_HEAP:
   case PROGRESS_VACUUM_PHASE_VACUUM_HEAP:
   errcbarg->relname = RelationGetRelationName(rel);
   errcbarg->indname = NULL;
   break;

   case PROGRESS_VACUUM_PHASE_VACUUM_INDEX:
   case PROGRESS_VACUUM_PHASE_INDEX_CLEANUP:
   /* rel is an index relation in index vacuum case */
   errcbarg->relname = get_rel_name(indrel->rd_index->indexrelid);
   errcbarg->indname = RelationGetRelationName(rel);
   break;

   }
}

Regards,

--
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-02-07 Thread Justin Pryzby
On Tue, Feb 04, 2020 at 01:58:20PM +0900, Masahiko Sawada wrote:
> Here is the comment for v16 patch:
> 
> 2.
> I think we can set the error context for heap scan again after
> freespace map vacuum finishing, maybe after reporting the new phase.
> Otherwise the user will get confused if an error occurs during
> freespace map vacuum. And I think the comment is unclear, how about
> "Set the error context fro heap scan again"?

Good point

> 3.
> +   if (cbarg->blkno!=InvalidBlockNumber)
> +   errcontext(_("while scanning block %u of relation \"%s.%s\""),
> +   cbarg->blkno, cbarg->relnamespace, cbarg->relname);
> 
> We can use BlockNumberIsValid macro instead.

Thanks.  See attached, now squished together patches.

I added functions to initialize the callbacks, so error handling is out of the
way and minimally distract from the rest of vacuum.
>From 95265412c56f3b308eed16531d7c83243e278f4f Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v17 1/2] vacuum errcontext to show block being processed

As requested here.
https://www.postgresql.org/message-id/20190807235154.erbmr4o4bo6vgnjv%40alap3.anarazel.de
---
 src/backend/access/heap/vacuumlazy.c | 117 +++
 1 file changed, 117 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a23cdef..9358ab4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -292,6 +292,14 @@ typedef struct LVRelStats
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+typedef struct
+{
+	char 		*relnamespace;
+	char		*relname;
+	char 		*indname; /* undefined while not processing index */
+	BlockNumber blkno;	/* undefined while not processing heap */
+	int			phase;	/* Reusing same enums as for progress reporting */
+} vacuum_error_callback_arg;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -361,6 +369,7 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
 
 
 /*
@@ -724,6 +733,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	ErrorContextCallback errcallback;
+	vacuum_error_callback_arg errcbarg;
 
 	pg_rusage_init();
 
@@ -870,6 +881,17 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	else
 		skipping_blocks = false;
 
+	/* Setup error traceback support for ereport() */
+	errcbarg.relnamespace = get_namespace_name(RelationGetNamespace(onerel));
+	errcbarg.relname = relname;
+	errcbarg.blkno = InvalidBlockNumber; /* Not known yet */
+	errcbarg.phase = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
+
+	errcallback.callback = vacuum_error_callback;
+	errcallback.arg = (void *) 
+	errcallback.previous = error_context_stack;
+	error_context_stack = 
+
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		Buffer		buf;
@@ -891,6 +913,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 #define FORCE_CHECK_PAGE() \
 		(blkno == nblocks - 1 && should_attempt_truncation(params, vacrelstats))
 
+		errcbarg.blkno = blkno;
+
 		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
 		if (blkno == next_unskippable_block)
@@ -987,6 +1011,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 vmbuffer = InvalidBuffer;
 			}
 
+			/* Pop the error context stack */
+			error_context_stack = errcallback.previous;
+
 			/* Work on all the indexes, then the heap */
 			lazy_vacuum_all_indexes(onerel, Irel, indstats,
 	vacrelstats, lps, nindexes);
@@ -1011,6 +1038,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			/* Report that we are once again scanning the heap */
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 		 PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+			/* Set the error context while continuing heap scan */
+			error_context_stack = 
 		}
 
 		/*
@@ -1597,6 +1627,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
 	}
 
+	/* Pop the error context stack */
+	error_context_stack = errcallback.previous;
+
 	/* report that everything is scanned and vacuumed */
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
@@ -1772,11 +1805,26 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 	int			npages;
 	PGRUsage	ru0;
 	Buffer		vmbuffer = InvalidBuffer;
+	ErrorContextCallback errcallback;
+	vacuum_error_callback_arg errcbarg;
 
 	/* Report that we are now vacuuming the heap */
 	

Re: error context for vacuum to include block number

2020-02-03 Thread Masahiko Sawada
On Sun, 2 Feb 2020 at 15:03, Justin Pryzby  wrote:
>
> Thanks for reviewing again
>
> On Sun, Feb 02, 2020 at 10:45:12AM +0900, Masahiko Sawada wrote:
> > Thank you for updating the patch. Here is some review comments:
> >
> > 1.
> > +typedef struct
> > +{
> > +   char*relnamespace;
> > +   char*relname;
> > +   char*indname; /* If vacuuming index */
> >
> > I think "Non-null if vacuuming index" is better.
>
> Actually it's undefined garbage (not NULL) if not vacuuming index.

So how about something like "set index name only during vacuuming
index". My point is that the current comment seems to be unclear to me
what describing.

>
> > And tablename is better than relname for accuracy?
>
> The existing code uses relname, so I left that, since it's strange to
> start using tablename and then write things like:
>
> |   errcbarg.tblname = relname;
> ...
> |   errcontext(_("while scanning block %u of relation \"%s.%s\""),
> |   cbarg->blkno, cbarg->relnamespace, cbarg->tblname);
>
> Also, mat views can be vacuumed.

ok, agreed.

>
> > 2.
> > +   BlockNumber blkno;
> > +   int stage;  /* 0: scan heap; 1: vacuum heap; 2: vacuum index */
> > +} vacuum_error_callback_arg;
> >
> > Why do we not support index cleanup phase?
>
> The patch started out just handling scan_heap.  The added vacuum_heap.  Then
> added vacuum_index.  Now, I've added index cleanup.
>
> > 4.
> > +/*
> > + * Setup error traceback support for ereport()
> > + * ->relnamespace and ->relname are already set
> > + */
> > +errcbarg.blkno = InvalidBlockNumber; /* Not known yet */
> > +errcbarg.stage = 1;
> >
> > relnamespace and relname of errcbarg is not set as it is initialized
> > in this function.
>
> Thanks. That's an oversight from switching back to local vars instead of
> LVRelStats while updating the patch while out of town..
>
> I don't know how to consistently test the vacuum_heap case, but rechecked it 
> just now.
>
> postgres=# SET client_min_messages=debug; SET statement_timeout=0; UPDATE t 
> SET a=1+a; SET statement_timeout=150; VACUUM(VERBOSE, PARALLEL 1) t;
> ...
> 2020-02-01 23:11:06.482 CST [26609] ERROR:  canceling statement due to 
> statement timeout
> 2020-02-01 23:11:06.482 CST [26609] CONTEXT:  while vacuuming block 33 of 
> relation "public.t"
>
> > 5.
> > @@ -177,6 +177,7 @@ typedef struct LVShared
> >  * the lazy vacuum.
> >  */
> > Oid relid;
> > +   charrelname[NAMEDATALEN];   /* tablename, used for error 
> > callback */
> >
> > How about getting relation
> > name from index relation? That is, in lazy_vacuum_index we can get
> > table oid from the index relation by IndexGetRelation() and therefore
> > we can get the table name which is in palloc'd memory. That way we
> > don't need to add relname to any existing struct such as LVRelStats
> > and LVShared.
>
> See attached
>
> Also, I think we shouldn't show a block number if it's "Invalid", to avoid
> saying "while vacuuming block 4294967295 of relation ..."
>
> For now, I made it not show any errcontext at all in that case.

Thank you for updating the patch!

Here is the comment for v16 patch:

1.
+   ErrorContextCallback errcallback = { error_context_stack,
vacuum_error_callback, , };

I think it's better to initialize individual fields because we might
need to fix it as well when fields of ErrorContextCallback are
changed.

2.
+   /* Replace error context while continuing heap scan */
+   error_context_stack = 

/*
 * Forget the now-vacuumed tuples, and press on, but be careful
 * not to reset latestRemovedXid since we want that value to be
 * valid.
 */
dead_tuples->num_tuples = 0;

/*
 * Vacuum the Free Space Map to make newly-freed space visible on
 * upper-level FSM pages.  Note we have not yet processed blkno.
 */
FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum, blkno);
next_fsm_block_to_vacuum = blkno;

/* Report that we are once again scanning the heap */
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 PROGRESS_VACUUM_PHASE_SCAN_HEAP);
}

I think we can set the error context for heap scan again after
freespace map vacuum finishing, maybe after reporting the new phase.
Otherwise the user will get confused if an error occurs during
freespace map vacuum. And I think the comment is unclear, how about
"Set the error context fro heap scan again"?

3.
+   if (cbarg->blkno!=InvalidBlockNumber)
+   errcontext(_("while scanning block %u of relation \"%s.%s\""),
+   cbarg->blkno, cbarg->relnamespace, cbarg->relname);

We can use BlockNumberIsValid macro instead.

Regards,

--
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, 

Re: error context for vacuum to include block number

2020-02-01 Thread Justin Pryzby
Thanks for reviewing again

On Sun, Feb 02, 2020 at 10:45:12AM +0900, Masahiko Sawada wrote:
> Thank you for updating the patch. Here is some review comments:
> 
> 1.
> +typedef struct
> +{
> +   char*relnamespace;
> +   char*relname;
> +   char*indname; /* If vacuuming index */
> 
> I think "Non-null if vacuuming index" is better.

Actually it's undefined garbage (not NULL) if not vacuuming index.

> And tablename is better than relname for accuracy?

The existing code uses relname, so I left that, since it's strange to
start using tablename and then write things like:

|   errcbarg.tblname = relname;
...
|   errcontext(_("while scanning block %u of relation \"%s.%s\""),
|   cbarg->blkno, cbarg->relnamespace, cbarg->tblname);

Also, mat views can be vacuumed.

> 2.
> +   BlockNumber blkno;
> +   int stage;  /* 0: scan heap; 1: vacuum heap; 2: vacuum index */
> +} vacuum_error_callback_arg;
> 
> Why do we not support index cleanup phase?

The patch started out just handling scan_heap.  The added vacuum_heap.  Then
added vacuum_index.  Now, I've added index cleanup.

> 4.
> +/*
> + * Setup error traceback support for ereport()
> + * ->relnamespace and ->relname are already set
> + */
> +errcbarg.blkno = InvalidBlockNumber; /* Not known yet */
> +errcbarg.stage = 1;
> 
> relnamespace and relname of errcbarg is not set as it is initialized
> in this function.

Thanks. That's an oversight from switching back to local vars instead of
LVRelStats while updating the patch while out of town..

I don't know how to consistently test the vacuum_heap case, but rechecked it 
just now.

postgres=# SET client_min_messages=debug; SET statement_timeout=0; UPDATE t SET 
a=1+a; SET statement_timeout=150; VACUUM(VERBOSE, PARALLEL 1) t;
...
2020-02-01 23:11:06.482 CST [26609] ERROR:  canceling statement due to 
statement timeout
2020-02-01 23:11:06.482 CST [26609] CONTEXT:  while vacuuming block 33 of 
relation "public.t"

> 5.
> @@ -177,6 +177,7 @@ typedef struct LVShared
>  * the lazy vacuum.
>  */
> Oid relid;
> +   charrelname[NAMEDATALEN];   /* tablename, used for error callback 
> */
> 
> How about getting relation
> name from index relation? That is, in lazy_vacuum_index we can get
> table oid from the index relation by IndexGetRelation() and therefore
> we can get the table name which is in palloc'd memory. That way we
> don't need to add relname to any existing struct such as LVRelStats
> and LVShared.

See attached

Also, I think we shouldn't show a block number if it's "Invalid", to avoid
saying "while vacuuming block 4294967295 of relation ..."

For now, I made it not show any errcontext at all in that case.
>From 94f715818dcdf3225a3e7404e395e4a0f0818b0c Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v16 1/3] vacuum errcontext to show block being processed

As requested here.
https://www.postgresql.org/message-id/20190807235154.erbmr4o4bo6vgnjv%40alap3.anarazel.de
---
 src/backend/access/heap/vacuumlazy.c | 94 
 1 file changed, 94 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8ce5011..43859bd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -292,6 +292,13 @@ typedef struct LVRelStats
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+typedef struct
+{
+	char 		*relnamespace;
+	char		*relname;
+	BlockNumber blkno;
+	int			phase;	/* Reusing same enums as for progress reporting */
+} vacuum_error_callback_arg;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -361,6 +368,7 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
 
 
 /*
@@ -724,6 +732,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	ErrorContextCallback errcallback;
+	vacuum_error_callback_arg errcbarg;
 
 	pg_rusage_init();
 
@@ -870,6 +880,17 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	else
 		skipping_blocks = false;
 
+	/* Setup error traceback support for ereport() */
+	errcbarg.relnamespace = get_namespace_name(RelationGetNamespace(onerel));
+	errcbarg.relname = relname;
+	errcbarg.blkno = InvalidBlockNumber; /* Not known yet */
+	errcbarg.phase = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
+
+	errcallback.callback = vacuum_error_callback;
+	errcallback.arg = (void *) 
+	errcallback.previous = error_context_stack;
+	error_context_stack = 
+
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		Buffer		

Re: error context for vacuum to include block number

2020-02-01 Thread Masahiko Sawada
On Tue, 28 Jan 2020 at 07:50, Justin Pryzby  wrote:
>
> On Mon, Jan 27, 2020 at 03:59:58PM +0900, Masahiko Sawada wrote:
> > On Mon, 27 Jan 2020 at 14:38, Justin Pryzby  wrote:
> > > On Sun, Jan 26, 2020 at 12:29:38PM -0800, Andres Freund wrote:
> > > > > CONTEXT:  while vacuuming relation "public.t_a_idx"
> > > >
> > > > It'd be a bit nicer if it said index "public.t_a_idx" for relation 
> > > > "public.t".
> > >
> > > I think that tips the scale in favour of making vacrelstats a global.
> > > I added that as a 1st patch, and squished the callback patches into one.
> >
> > Hmm I don't think it's a good idea to make vacrelstats global. If we
> > want to display the relation name and its index name in error context
> > we might want to define a new struct dedicated for error context
> > reporting. That is it has blkno, stage and relation name and schema
> > name for both table and index and then we set these variables of
> > callback argument before performing a vacuum phase. We don't change
> > LVRelStats at all.
>
> On Mon, Jan 27, 2020 at 12:14:38AM -0600, Justin Pryzby wrote:
> > It occured to me that there's an issue with sharing vacrelstats between
> > scan/vacuum, since blkno and stage are set by the heap/index vacuum 
> > routines,
> > but not reset on their return to heap scan.  Not sure if we should reset 
> > them,
> > or go back to using a separate struct, like it was here:
> > https://www.postgresql.org/message-id/20200120054159.GT26045%40telsasoft.com
>
> I went back to this, original, way of doing it.
> The parallel vacuum patch made it harder to pass the table around :(
> And has to be separately tested:
>
> | SET statement_timeout=0; DROP TABLE t; CREATE TABLE t AS SELECT 
> generate_series(1,9)a; CREATE INDEX ON t(a); CREATE INDEX ON t(a); UPDATE 
> t SET a=1+a; SET statement_timeout=99;VACUUM(VERBOSE, PARALLEL 2) t;
>
> I had to allocate space for the table name within the LVShared struct, not 
> just
> a pointer, otherwise it would variously crash or fail to output the index 
> name.
> I think pointers can't be passed to parallel process except using some
> heavyweight thing like shm_toc_...
>
> I guess the callback could also take the index relid instead of name, and use
> something like IndexGetRelation().
>
> > Although the patch replaces get_namespace_name and
> > RelationGetRelationName but we use namespace name of relation at only
> > two places and almost ereport/elog messages use only relation name
> > gotten by RelationGetRelationName which is a macro to access the
> > relation name in Relation struct. So I think adding relname to
> > LVRelStats would not be a big benefit. Similarly, adding table
> > namespace to LVRelStats would be good to avoid calling
> > get_namespace_name whereas I'm not sure it's worth to have it because
> > it's expected not to be really many times.
>
> Right, I only tried that to save a few LOC and maybe make shorter lines.
> It's not important so I'll drop that patch.

Thank you for updating the patch. Here is some review comments:

1.
+typedef struct
+{
+   char*relnamespace;
+   char*relname;
+   char*indname; /* If vacuuming index */

I think "Non-null if vacuuming index" is better. And tablename is
better than relname for accuracy?

2.
+   BlockNumber blkno;
+   int stage;  /* 0: scan heap; 1: vacuum heap; 2: vacuum index */
+} vacuum_error_callback_arg;

Why do we not support index cleanup phase?

3.
/* Work on all the indexes, then the heap */
lazy_vacuum_all_indexes(onerel, Irel, indstats,
vacrelstats, lps, nindexes);
-
/* Remove tuples from heap */
lazy_vacuum_heap(onerel, vacrelstats);

I think it's an unnecessary removal.

4.
 static void
 lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 {
 int tupindex;
 int npages;
 PGRUsageru0;
 Buffer  vmbuffer = InvalidBuffer;
+ErrorContextCallback errcallback;
+vacuum_error_callback_arg errcbarg;

 /* Report that we are now vacuuming the heap */
 pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
  PROGRESS_VACUUM_PHASE_VACUUM_HEAP);

+/*
+ * Setup error traceback support for ereport()
+ * ->relnamespace and ->relname are already set
+ */
+errcbarg.blkno = InvalidBlockNumber; /* Not known yet */
+errcbarg.stage = 1;

relnamespace and relname of errcbarg is not set as it is initialized
in this function.

5.
@@ -177,6 +177,7 @@ typedef struct LVShared
 * the lazy vacuum.
 */
Oid relid;
+   charrelname[NAMEDATALEN];   /* tablename, used for error callback */

Hmm I think it's not a good idea to have LVShared have relname because
the parallel vacuum worker being able to know the table name by oid
and it consumes DSM memory. To pass the relation name down to
lazy_vacuum_index I thought to add new argument relname to some
functions but in 

Re: error context for vacuum to include block number

2020-01-27 Thread Justin Pryzby
On Mon, Jan 27, 2020 at 03:59:58PM +0900, Masahiko Sawada wrote:
> On Mon, 27 Jan 2020 at 14:38, Justin Pryzby  wrote:
> > On Sun, Jan 26, 2020 at 12:29:38PM -0800, Andres Freund wrote:
> > > > CONTEXT:  while vacuuming relation "public.t_a_idx"
> > >
> > > It'd be a bit nicer if it said index "public.t_a_idx" for relation 
> > > "public.t".
> >
> > I think that tips the scale in favour of making vacrelstats a global.
> > I added that as a 1st patch, and squished the callback patches into one.
> 
> Hmm I don't think it's a good idea to make vacrelstats global. If we
> want to display the relation name and its index name in error context
> we might want to define a new struct dedicated for error context
> reporting. That is it has blkno, stage and relation name and schema
> name for both table and index and then we set these variables of
> callback argument before performing a vacuum phase. We don't change
> LVRelStats at all.

On Mon, Jan 27, 2020 at 12:14:38AM -0600, Justin Pryzby wrote:
> It occured to me that there's an issue with sharing vacrelstats between
> scan/vacuum, since blkno and stage are set by the heap/index vacuum routines,
> but not reset on their return to heap scan.  Not sure if we should reset them,
> or go back to using a separate struct, like it was here:
> https://www.postgresql.org/message-id/20200120054159.GT26045%40telsasoft.com

I went back to this, original, way of doing it.
The parallel vacuum patch made it harder to pass the table around :(
And has to be separately tested:

| SET statement_timeout=0; DROP TABLE t; CREATE TABLE t AS SELECT 
generate_series(1,9)a; CREATE INDEX ON t(a); CREATE INDEX ON t(a); UPDATE t 
SET a=1+a; SET statement_timeout=99;VACUUM(VERBOSE, PARALLEL 2) t;

I had to allocate space for the table name within the LVShared struct, not just
a pointer, otherwise it would variously crash or fail to output the index name.
I think pointers can't be passed to parallel process except using some
heavyweight thing like shm_toc_...

I guess the callback could also take the index relid instead of name, and use
something like IndexGetRelation().

> Although the patch replaces get_namespace_name and
> RelationGetRelationName but we use namespace name of relation at only
> two places and almost ereport/elog messages use only relation name
> gotten by RelationGetRelationName which is a macro to access the
> relation name in Relation struct. So I think adding relname to
> LVRelStats would not be a big benefit. Similarly, adding table
> namespace to LVRelStats would be good to avoid calling
> get_namespace_name whereas I'm not sure it's worth to have it because
> it's expected not to be really many times.

Right, I only tried that to save a few LOC and maybe make shorter lines.
It's not important so I'll drop that patch.
>From fb53a62620aab180e8b250be50fefac1b40f50c2 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v15 1/2] vacuum errcontext to show block being processed

As requested here.
https://www.postgresql.org/message-id/20190807235154.erbmr4o4bo6vgnjv%40alap3.anarazel.de
---
 src/backend/access/heap/vacuumlazy.c | 85 +++-
 1 file changed, 84 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8ce5011..d11c7af 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -292,6 +292,13 @@ typedef struct LVRelStats
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+typedef struct
+{
+	char 		*relnamespace;
+	char		*relname;
+	BlockNumber blkno;
+	int			stage;	/* 0: scan heap; 1: vacuum heap; 2: vacuum index */
+} vacuum_error_callback_arg;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -361,6 +368,7 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
 
 
 /*
@@ -724,6 +732,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	ErrorContextCallback errcallback;
+	vacuum_error_callback_arg errcbarg;
 
 	pg_rusage_init();
 
@@ -870,6 +880,17 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	else
 		skipping_blocks = false;
 
+	/* Setup error traceback support for ereport() */
+	errcbarg.relnamespace = get_namespace_name(RelationGetNamespace(onerel));
+	errcbarg.relname = relname;
+	errcbarg.blkno = InvalidBlockNumber; /* Not known yet */
+	errcbarg.stage = 0;
+
+	errcallback.callback = vacuum_error_callback;
+	errcallback.arg = (void *) 
+	errcallback.previous = error_context_stack;
+	error_context_stack = 
+
 	for (blkno = 0; blkno < 

Re: error context for vacuum to include block number

2020-01-26 Thread Masahiko Sawada
On Mon, 27 Jan 2020 at 14:38, Justin Pryzby  wrote:
>
> On Sun, Jan 26, 2020 at 12:29:38PM -0800, Andres Freund wrote:
> > > postgres=# SET client_min_messages=debug;SET statement_timeout=99; VACUUM 
> > > (VERBOSE, PARALLEL 0) t;
> > > INFO:  vacuuming "public.t"
> > > DEBUG:  "t_a_idx": vacuuming index
> > > 2020-01-20 15:47:36.338 CST [20139] ERROR:  canceling statement due to 
> > > statement timeout
> > > 2020-01-20 15:47:36.338 CST [20139] CONTEXT:  while vacuuming relation 
> > > "public.t_a_idx"
> > > 2020-01-20 15:47:36.338 CST [20139] STATEMENT:  VACUUM (VERBOSE, PARALLEL 
> > > 0) t;
> > > ERROR:  canceling statement due to statement timeout
> > > CONTEXT:  while vacuuming relation "public.t_a_idx"
> >
> > It'd be a bit nicer if it said index "public.t_a_idx" for relation 
> > "public.t".
>
> I think that tips the scale in favour of making vacrelstats a global.
> I added that as a 1st patch, and squished the callback patches into one.

Hmm I don't think it's a good idea to make vacrelstats global. If we
want to display the relation name and its index name in error context
we might want to define a new struct dedicated for error context
reporting. That is it has blkno, stage and relation name and schema
name for both table and index and then we set these variables of
callback argument before performing a vacuum phase. We don't change
LVRelStats at all.

Although the patch replaces get_namespace_name and
RelationGetRelationName but we use namespace name of relation at only
two places and almost ereport/elog messages use only relation name
gotten by RelationGetRelationName which is a macro to access the
relation name in Relation struct. So I think adding relname to
LVRelStats would not be a big benefit. Similarly, adding table
namespace to LVRelStats would be good to avoid calling
get_namespace_name whereas I'm not sure it's worth to have it because
it's expected not to be really many times.

Regards,

-- 
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




  1   2   >