Re: a verbose option for autovacuum

2021-03-22 Thread Masahiko Sawada
On Tue, Mar 23, 2021 at 1:31 PM Michael Paquier  wrote:
>
> On Mon, Mar 22, 2021 at 12:17:37PM +0900, Masahiko Sawada wrote:
> > I've updated the patch. I saved the index names at the beginning of
> > heap_vacuum_rel() for autovacuum logging, and add indstats and
> > nindexes to LVRelStats. Some functions still have nindexes as a
> > function argument but it seems to make sense since it corresponds the
> > list of index relations (*Irel). Please review the patch.
>
> Going back to that, the structure of the static APIs in this file make
> the whole logic a bit hard to follow, but the whole set of changes you
> have done here makes sense.  It took me a moment to recall and
> understand why it is safe to free *stats at the end of
> vacuum_one_index() and if the index stats array actually pointed to
> the DSM segment correctly within the shared stats.
>
> I think that there is more consolidation possible within LVRelStats,
> but let's leave that for another day, if there is any need for such a
> move.

While studying your patch (v5-index_stat_log.patch) I found we can
polish the parallel vacuum code in some places. I'll try it another
day.

>
> To keep it short.  Sold.

Thank you!

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: a verbose option for autovacuum

2021-03-22 Thread Michael Paquier
On Mon, Mar 22, 2021 at 12:17:37PM +0900, Masahiko Sawada wrote:
> I've updated the patch. I saved the index names at the beginning of
> heap_vacuum_rel() for autovacuum logging, and add indstats and
> nindexes to LVRelStats. Some functions still have nindexes as a
> function argument but it seems to make sense since it corresponds the
> list of index relations (*Irel). Please review the patch.

Going back to that, the structure of the static APIs in this file make
the whole logic a bit hard to follow, but the whole set of changes you
have done here makes sense.  It took me a moment to recall and
understand why it is safe to free *stats at the end of
vacuum_one_index() and if the index stats array actually pointed to
the DSM segment correctly within the shared stats.

I think that there is more consolidation possible within LVRelStats,
but let's leave that for another day, if there is any need for such a
move.

To keep it short.  Sold.
--
Michael


signature.asc
Description: PGP signature


Re: a verbose option for autovacuum

2021-03-21 Thread Masahiko Sawada
On Sat, Mar 20, 2021 at 3:40 PM Michael Paquier  wrote:
>
> On Sat, Mar 20, 2021 at 01:06:51PM +0900, Masahiko Sawada wrote:
> > It's not bad but it seems redundant a bit to me. We pass the idx in
> > spite of passing also Irel[idx] and &(vacrelstats->indstats[idx]). I
> > think your first idea that is done in v4 patch (saving index names at
> > the beginning of heap_vacuum_rel() for autovacuum logging purpose
> > only) and the idea of deferring to close indexes until the end of
> > heap_vacuum_rel() so that we can refer index name at autovacuum
> > logging are more simple.
>
> Okay.
>
> > We need to initialize *stats with NULL here.
>
> Right.  I am wondering why I did not get any complain here.
>
> > If shared_indstats is NULL (e.g., we do " stats =
> > vacrelstats->indstats[indnum];"), vacrelstats->indstats[indnum] is not
> > updated since we pass  I think we should pass
> > &(vacrelstats->indstats[indnum]) instead in this case.
>
> If we get rid completely of this idea around indnum, that I don't
> disagree with so let's keep just indname, you mean to keep the second
> argument IndexBulkDeleteResult of vacuum_one_index() and pass down
> &(vacrelstats->indstats[indnum]) as argument.  No objections from me
> to just do that.
>
> > Previously, we update the element of the pointer array of index
> > statistics to the pointer pointing to either the local memory or DSM.
> > But with the above change, we do that only when the index statistics
> > are in the local memory. In other words, vacrelstats->indstats[i] is
> > never updated if the corresponding index supports parallel indexes. I
> > think this is not relevant with the change that we'd like to do here
> > (i.e., passing indnum down).
>
> Yeah, that looks like just some over-engineering design on my side.
> Would you like to update the patch with what you think is most
> adapted?

I've updated the patch. I saved the index names at the beginning of
heap_vacuum_rel() for autovacuum logging, and add indstats and
nindexes to LVRelStats. Some functions still have nindexes as a
function argument but it seems to make sense since it corresponds the
list of index relations (*Irel). Please review the patch.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


v6-index_stats_log.patch
Description: Binary data


Re: a verbose option for autovacuum

2021-03-20 Thread Michael Paquier
On Sat, Mar 20, 2021 at 01:06:51PM +0900, Masahiko Sawada wrote:
> It's not bad but it seems redundant a bit to me. We pass the idx in
> spite of passing also Irel[idx] and &(vacrelstats->indstats[idx]). I
> think your first idea that is done in v4 patch (saving index names at
> the beginning of heap_vacuum_rel() for autovacuum logging purpose
> only) and the idea of deferring to close indexes until the end of
> heap_vacuum_rel() so that we can refer index name at autovacuum
> logging are more simple.

Okay.

> We need to initialize *stats with NULL here.

Right.  I am wondering why I did not get any complain here.

> If shared_indstats is NULL (e.g., we do " stats =
> vacrelstats->indstats[indnum];"), vacrelstats->indstats[indnum] is not
> updated since we pass  I think we should pass
> &(vacrelstats->indstats[indnum]) instead in this case.

If we get rid completely of this idea around indnum, that I don't
disagree with so let's keep just indname, you mean to keep the second
argument IndexBulkDeleteResult of vacuum_one_index() and pass down
&(vacrelstats->indstats[indnum]) as argument.  No objections from me
to just do that.

> Previously, we update the element of the pointer array of index
> statistics to the pointer pointing to either the local memory or DSM.
> But with the above change, we do that only when the index statistics
> are in the local memory. In other words, vacrelstats->indstats[i] is
> never updated if the corresponding index supports parallel indexes. I
> think this is not relevant with the change that we'd like to do here
> (i.e., passing indnum down).

Yeah, that looks like just some over-engineering design on my side.
Would you like to update the patch with what you think is most
adapted?
--
Michael


signature.asc
Description: PGP signature


Re: a verbose option for autovacuum

2021-03-19 Thread Masahiko Sawada
On Fri, Mar 19, 2021 at 3:14 PM Michael Paquier  wrote:
>
> On Thu, Mar 18, 2021 at 11:30:46PM +0900, Masahiko Sawada wrote:
> > Sorry, I attached the wrong version patch. So attached the right one.
>
> Thanks.  I have been hacking aon that, and I think that we could do
> more in terms of integration of the index stats into LVRelStats to
> help with debugging issues, mainly, but also to open the door at
> allowing autovacuum to use the parallel path in the future.

Thank you for updating the patch!

> Hence,
> for consistency, I think that we should change the following things in
> LVRelStats:
> - Add the number of indexes.  It looks rather unusual to not track
> down the number of indexes directly in the structure anyway, as the
> stats array gets added there.
> - Add all the index names, for parallel and non-parallel mode.

Agreed with those two changes.

> - Replace the index name in the error callback by an index number,
> pointing back to its location in indstats and indnames.

I like this idea but I'm not sure the approach that the patch took
improved the code. Please read the below my concern.

>
> As lazy_vacuum_index() requires the index number to be set internally
> to it, this means that we need to pass it down across
> vacuum_indexes_leader(), lazy_parallel_vacuum_indexes(), but that
> seems like an acceptable compromise to me for now.  I think that it
> would be good to tighten a bit more the relationship between the index
> stats in the DSM for the parallel case and the ones in local memory,
> but what we have here looks enough to me so we could figure out that
> as a future step.
>
> Sawada-san, what do you think?  Attached is the patch I have finished
> with.

With this idea, vacuum_one_index() will become:

static void
lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
  LVDeadTuples *dead_tuples, double reltuples,
  LVRelStats *vacrelstats, int indnum)

and the caller calls this function as follow:

for (idx = 0; idx < nindexes; idx++)
lazy_vacuum_index(Irel[idx], &(vacrelstats->indstats[idx]),
  vacrelstats->dead_tuples,
  vacrelstats->old_live_tuples, vacrelstats,
  idx);

It's not bad but it seems redundant a bit to me. We pass the idx in
spite of passing also Irel[idx] and &(vacrelstats->indstats[idx]). I
think your first idea that is done in v4 patch (saving index names at
the beginning of heap_vacuum_rel() for autovacuum logging purpose
only) and the idea of deferring to close indexes until the end of
heap_vacuum_rel() so that we can refer index name at autovacuum
logging are more simple.

BTW the patch led to a crash in my environment. The problem is here:

 static void
-vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,
+vacuum_one_index(Relation indrel,
 LVShared *lvshared, LVSharedIndStats *shared_indstats,
-LVDeadTuples *dead_tuples, LVRelStats *vacrelstats)
+LVDeadTuples *dead_tuples, LVRelStats *vacrelstats,
+int indnum)
 {
IndexBulkDeleteResult *bulkdelete_res = NULL;
+   IndexBulkDeleteResult *stats;

We need to initialize *stats with NULL here.

And while looking at the change of vacuum_one_index() I found another problem:

@@ -2349,17 +2400,20 @@ vacuum_one_index(Relation indrel,
IndexBulkDeleteResult **stats,
 * Update the pointer to the corresponding
bulk-deletion result if
 * someone has already updated it.
 */
-   if (shared_indstats->updated && *stats == NULL)
-   *stats = bulkdelete_res;
+   if (shared_indstats->updated)
+   stats = bulkdelete_res;
}
+   else
+   stats = vacrelstats->indstats[indnum];

/* Do vacuum or cleanup of the index */
if (lvshared->for_cleanup)
-   lazy_cleanup_index(indrel, stats, lvshared->reltuples,
-
lvshared->estimated_count, vacrelstats);
+   lazy_cleanup_index(indrel, , lvshared->reltuples,
+
lvshared->estimated_count, vacrelstats,
+  indnum);
else
-   lazy_vacuum_index(indrel, stats, dead_tuples,
- lvshared->reltuples,
vacelstats);
+   lazy_vacuum_index(indrel, , dead_tuples,
+ lvshared->reltuples,
vacrelstats, indnum);

/*
 * Copy the index bulk-deletion result returned from ambulkdelete and

If shared_indstats is NULL (e.g., we do " stats =
vacrelstats->indstats[indnum];"), vacrelstats->indstats[indnum] is not
updated since we pass  I think we should pass
&(vacrelstats->indstats[indnum]) instead in this case.

Previously, we update the element of the pointer array of index
statistics to the pointer pointing to either the local memory or DSM.

Re: a verbose option for autovacuum

2021-03-19 Thread Michael Paquier
On Thu, Mar 18, 2021 at 11:30:46PM +0900, Masahiko Sawada wrote:
> Sorry, I attached the wrong version patch. So attached the right one.

Thanks.  I have been hacking aon that, and I think that we could do
more in terms of integration of the index stats into LVRelStats to
help with debugging issues, mainly, but also to open the door at
allowing autovacuum to use the parallel path in the future.  Hence,
for consistency, I think that we should change the following things in
LVRelStats:
- Add the number of indexes.  It looks rather unusual to not track
down the number of indexes directly in the structure anyway, as the
stats array gets added there.
- Add all the index names, for parallel and non-parallel mode.
- Replace the index name in the error callback by an index number,
pointing back to its location in indstats and indnames.

As lazy_vacuum_index() requires the index number to be set internally
to it, this means that we need to pass it down across
vacuum_indexes_leader(), lazy_parallel_vacuum_indexes(), but that
seems like an acceptable compromise to me for now.  I think that it
would be good to tighten a bit more the relationship between the index
stats in the DSM for the parallel case and the ones in local memory,
but what we have here looks enough to me so we could figure out that
as a future step.

Sawada-san, what do you think?  Attached is the patch I have finished
with.
--
Michael
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8341879d89..e7ba83f500 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -315,8 +315,14 @@ typedef struct LVRelStats
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
 
+	/* Statistics about indexes */
+	int			nindexes;
+	char	  **indnames;
+	IndexBulkDeleteResult **indstats;
+
 	/* Used for error callback */
-	char	   *indname;
+	int			indnum;			/* index number to indnames and indstats,
+ * or -1 if no index being worked on. */
 	BlockNumber blkno;			/* used only for heap operations */
 	OffsetNumber offnum;		/* used only for heap operations */
 	VacErrPhase phase;
@@ -348,14 +354,18 @@ static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
 static bool lazy_check_needs_freeze(Buffer buf, bool *hastup,
 	LVRelStats *vacrelstats);
 static void lazy_vacuum_all_indexes(Relation onerel, Relation *Irel,
-	IndexBulkDeleteResult **stats,
 	LVRelStats *vacrelstats, LVParallelState *lps,
 	int nindexes);
-static void lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
-			  LVDeadTuples *dead_tuples, double reltuples, LVRelStats *vacrelstats);
+static void lazy_vacuum_index(Relation indrel,
+			  IndexBulkDeleteResult **stats,
+			  LVDeadTuples *dead_tuples,
+			  double reltuples,
+			  LVRelStats *vacrelstats,
+			  int indnum);
 static void lazy_cleanup_index(Relation indrel,
 			   IndexBulkDeleteResult **stats,
-			   double reltuples, bool estimated_count, LVRelStats *vacrelstats);
+			   double reltuples, bool estimated_count,
+			   LVRelStats *vacrelstats, int indnum);
 static int	lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 			 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
 static bool should_attempt_truncation(VacuumParams *params,
@@ -371,21 +381,19 @@ static int	vac_cmp_itemptr(const void *left, const void *right);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 	 LVRelStats *vacrelstats,
 	 TransactionId *visibility_cutoff_xid, bool *all_frozen);
-static void lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
-		 LVRelStats *vacrelstats, LVParallelState *lps,
-		 int nindexes);
-static void parallel_vacuum_index(Relation *Irel, IndexBulkDeleteResult **stats,
-  LVShared *lvshared, LVDeadTuples *dead_tuples,
-  int nindexes, LVRelStats *vacrelstats);
-static void vacuum_indexes_leader(Relation *Irel, IndexBulkDeleteResult **stats,
-  LVRelStats *vacrelstats, LVParallelState *lps,
-  int nindexes);
-static void vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,
+static void lazy_parallel_vacuum_indexes(Relation *Irel, LVRelStats *vacrelstats,
+		 LVParallelState *lps, int nindexes);
+static void parallel_vacuum_index(Relation *Irel, LVShared *lvshared,
+  LVDeadTuples *dead_tuples, int nindexes,
+  LVRelStats *vacrelstats);
+static void vacuum_indexes_leader(Relation *Irel, LVRelStats *vacrelstats,
+  LVParallelState *lps, int nindexes);
+static void vacuum_one_index(Relation indrel,
 			 LVShared *lvshared, LVSharedIndStats *shared_indstats,
-			 LVDeadTuples *dead_tuples, LVRelStats *vacrelstats);
-static void lazy_cleanup_all_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
-	 LVRelStats *vacrelstats, LVParallelState *lps,
-	 int 

Re: a verbose option for autovacuum

2021-03-18 Thread Peter Geoghegan
On Thu, Mar 18, 2021 at 2:08 PM Michael Paquier  wrote:
> Yes, I was waiting for Sawada-san to send an updated version, which he
> just did.

Great. This really seems worth having. I was hoping that somebody else
could pick this one up.

-- 
Peter Geoghegan




Re: a verbose option for autovacuum

2021-03-18 Thread Michael Paquier
On Thu, Mar 18, 2021 at 09:38:05AM -0700, Peter Geoghegan wrote:
> Were you going to take care of this, Michael?

Yes, I was waiting for Sawada-san to send an updated version, which he
just did.
--
Michael


signature.asc
Description: PGP signature


Re: a verbose option for autovacuum

2021-03-18 Thread Peter Geoghegan
On Thu, Mar 18, 2021 at 5:14 AM Masahiko Sawada  wrote:
> Okay, I've updated the patch accordingly. If we add
> IndexBulkDeleteResult to LVRelStats I think we can remove
> IndexBulkDeleteResult function argument also from some other functions
> such as lazy_parallel_vacuum_indexes() and vacuum_indexes_leader().
> Please review the attached patch.

That seems much clearer.

Were you going to take care of this, Michael?

-- 
Peter Geoghegan




Re: a verbose option for autovacuum

2021-03-18 Thread Masahiko Sawada
On Thu, Mar 18, 2021 at 9:13 PM Masahiko Sawada  wrote:
>
> On Thu, Mar 18, 2021 at 3:41 PM Michael Paquier  wrote:
> >
> > On Wed, Mar 17, 2021 at 08:50:26AM -0300, Euler Taveira wrote:
> > > Since commit 5f8727f5a6, this patch doesn't apply anymore. Fortunately, 
> > > it is
> > > just a small hunk. I reviewed this patch and it looks good to me. There 
> > > is just
> > > a small issue (double space after 'if') that I fixed in the attached 
> > > version.
> >
> > No major objections to what you are proposing here.
> >
> > >   /* and log the action if appropriate */
> > > - if (IsAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
> > > + if (IsAutoVacuumWorkerProcess())
> > >   {
> > > - TimestampTz endtime = GetCurrentTimestamp();
> > > + TimestampTz endtime = 0;
> > > + int i;
> > >
> > > - if (params->log_min_duration == 0 ||
> > > - TimestampDifferenceExceeds(starttime, endtime,
> > > -
> > > params->log_min_duration))
> > > + if (params->log_min_duration >= 0)
> > > + endtime = GetCurrentTimestamp();
> > > +
> > > + if (endtime > 0 &&
> > > + (params->log_min_duration == 0 ||
> > > +  TimestampDifferenceExceeds(starttime, endtime,
> >
> > Why is there any need to actually change this part?  If I am following
> > the patch correctly, the reason why you are doing things this way is
> > to free the set of N statistics all the time for autovacuum.  However,
> > we could make that much simpler, and your patch is already half-way
> > through that by adding the stats of the N indexes to LVRelStats.  Here
> > is the idea:
> > - Allocation of the N items for IndexBulkDeleteResult at the beginning
> > of heap_vacuum_rel().  It seems to me that we are going to need the N
> > index names within LVRelStats, to be able to still call
> > vac_close_indexes() *before* logging the stats.
> > - No need to pass down indstats for the two callers of
> > lazy_vacuum_all_indexes().
> > - Clean up of vacrelstats once heap_vacuum_rel() is done with it.
>
> Okay, I've updated the patch accordingly. If we add
> IndexBulkDeleteResult to LVRelStats I think we can remove
> IndexBulkDeleteResult function argument also from some other functions
> such as lazy_parallel_vacuum_indexes() and vacuum_indexes_leader().
> Please review the attached patch.

Sorry, I attached the wrong version patch. So attached the right one.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


v4-index_stats_log.patch
Description: Binary data


Re: a verbose option for autovacuum

2021-03-18 Thread Masahiko Sawada
On Thu, Mar 18, 2021 at 3:41 PM Michael Paquier  wrote:
>
> On Wed, Mar 17, 2021 at 08:50:26AM -0300, Euler Taveira wrote:
> > Since commit 5f8727f5a6, this patch doesn't apply anymore. Fortunately, it 
> > is
> > just a small hunk. I reviewed this patch and it looks good to me. There is 
> > just
> > a small issue (double space after 'if') that I fixed in the attached 
> > version.
>
> No major objections to what you are proposing here.
>
> >   /* and log the action if appropriate */
> > - if (IsAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
> > + if (IsAutoVacuumWorkerProcess())
> >   {
> > - TimestampTz endtime = GetCurrentTimestamp();
> > + TimestampTz endtime = 0;
> > + int i;
> >
> > - if (params->log_min_duration == 0 ||
> > - TimestampDifferenceExceeds(starttime, endtime,
> > -
> > params->log_min_duration))
> > + if (params->log_min_duration >= 0)
> > + endtime = GetCurrentTimestamp();
> > +
> > + if (endtime > 0 &&
> > + (params->log_min_duration == 0 ||
> > +  TimestampDifferenceExceeds(starttime, endtime,
>
> Why is there any need to actually change this part?  If I am following
> the patch correctly, the reason why you are doing things this way is
> to free the set of N statistics all the time for autovacuum.  However,
> we could make that much simpler, and your patch is already half-way
> through that by adding the stats of the N indexes to LVRelStats.  Here
> is the idea:
> - Allocation of the N items for IndexBulkDeleteResult at the beginning
> of heap_vacuum_rel().  It seems to me that we are going to need the N
> index names within LVRelStats, to be able to still call
> vac_close_indexes() *before* logging the stats.
> - No need to pass down indstats for the two callers of
> lazy_vacuum_all_indexes().
> - Clean up of vacrelstats once heap_vacuum_rel() is done with it.

Okay, I've updated the patch accordingly. If we add
IndexBulkDeleteResult to LVRelStats I think we can remove
IndexBulkDeleteResult function argument also from some other functions
such as lazy_parallel_vacuum_indexes() and vacuum_indexes_leader().
Please review the attached patch.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


v3-index_stats_log.patch
Description: Binary data


Re: a verbose option for autovacuum

2021-03-18 Thread Michael Paquier
On Wed, Mar 17, 2021 at 08:50:26AM -0300, Euler Taveira wrote:
> Since commit 5f8727f5a6, this patch doesn't apply anymore. Fortunately, it is
> just a small hunk. I reviewed this patch and it looks good to me. There is 
> just
> a small issue (double space after 'if') that I fixed in the attached version.

No major objections to what you are proposing here.

>   /* and log the action if appropriate */
> - if (IsAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
> + if (IsAutoVacuumWorkerProcess())
>   {
> - TimestampTz endtime = GetCurrentTimestamp();
> + TimestampTz endtime = 0;
> + int i;
>  
> - if (params->log_min_duration == 0 ||
> - TimestampDifferenceExceeds(starttime, endtime,
> -
> params->log_min_duration))
> + if (params->log_min_duration >= 0)
> + endtime = GetCurrentTimestamp();
> +
> + if (endtime > 0 &&
> + (params->log_min_duration == 0 ||
> +  TimestampDifferenceExceeds(starttime, endtime,

Why is there any need to actually change this part?  If I am following
the patch correctly, the reason why you are doing things this way is
to free the set of N statistics all the time for autovacuum.  However,
we could make that much simpler, and your patch is already half-way
through that by adding the stats of the N indexes to LVRelStats.  Here
is the idea:
- Allocation of the N items for IndexBulkDeleteResult at the beginning
of heap_vacuum_rel().  It seems to me that we are going to need the N
index names within LVRelStats, to be able to still call
vac_close_indexes() *before* logging the stats.
- No need to pass down indstats for the two callers of
lazy_vacuum_all_indexes().
- Clean up of vacrelstats once heap_vacuum_rel() is done with it.
--
Michael


signature.asc
Description: PGP signature


Re: a verbose option for autovacuum

2021-03-17 Thread Euler Taveira
On Wed, Mar 10, 2021, at 12:46 AM, Masahiko Sawada wrote:
> Attached a patch. I've slightly modified the format for consistency
> with heap statistics.
Since commit 5f8727f5a6, this patch doesn't apply anymore. Fortunately, it is
just a small hunk. I reviewed this patch and it looks good to me. There is just
a small issue (double space after 'if') that I fixed in the attached version.


--
Euler Taveira
EDB   https://www.enterprisedb.com/
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8341879d89..35be34ea06 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -314,6 +314,7 @@ typedef struct LVRelStats
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
+	IndexBulkDeleteResult **indstats; /* used for autovacuum logs */
 
 	/* Used for error callback */
 	char	   *indname;
@@ -531,9 +532,6 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
 	/* Do the vacuuming */
 	lazy_scan_heap(onerel, params, vacrelstats, Irel, nindexes, aggressive);
 
-	/* Done with indexes */
-	vac_close_indexes(nindexes, Irel, NoLock);
-
 	/*
 	 * Compute whether we actually scanned the all unfrozen pages. If we did,
 	 * we can adjust relfrozenxid and relminmxid.
@@ -614,13 +612,18 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
 	pgstat_progress_end_command();
 
 	/* and log the action if appropriate */
-	if (IsAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
+	if (IsAutoVacuumWorkerProcess())
 	{
-		TimestampTz endtime = GetCurrentTimestamp();
+		TimestampTz endtime = 0;
+		int i;
 
-		if (params->log_min_duration == 0 ||
-			TimestampDifferenceExceeds(starttime, endtime,
-	   params->log_min_duration))
+		if (params->log_min_duration >= 0)
+			endtime = GetCurrentTimestamp();
+
+		if (endtime > 0 &&
+			(params->log_min_duration == 0 ||
+			 TimestampDifferenceExceeds(starttime, endtime,
+		params->log_min_duration)))
 		{
 			StringInfoData buf;
 			char	   *msgfmt;
@@ -680,6 +683,21 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
 			 (long long) VacuumPageHit,
 			 (long long) VacuumPageMiss,
 			 (long long) VacuumPageDirty);
+			for (i = 0; i < nindexes; i++)
+			{
+IndexBulkDeleteResult *stats = vacrelstats->indstats[i];
+
+if (!stats)
+	continue;
+
+appendStringInfo(,
+ _("index \"%s\": pages: %u remain, %u newly deleted, %u currently deleted, %u reusable\n"),
+ RelationGetRelationName(Irel[i]),
+ stats->num_pages,
+ stats->pages_newly_deleted,
+ stats->pages_deleted,
+ stats->pages_free);
+			}
 			appendStringInfo(, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
 			 read_rate, write_rate);
 			if (track_io_timing)
@@ -704,7 +722,17 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
 	(errmsg_internal("%s", buf.data)));
 			pfree(buf.data);
 		}
+
+		/* Cleanup index statistics */
+		for (i = 0; i < nindexes; i++)
+		{
+			if (vacrelstats->indstats[i])
+pfree(vacrelstats->indstats[i]);
+		}
 	}
+
+	/* Done with indexes */
+	vac_close_indexes(nindexes, Irel, NoLock);
 }
 
 /*
@@ -1758,7 +1786,10 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 
 	/* Update index statistics */
 	if (vacrelstats->useindex)
+	{
 		update_index_statistics(Irel, indstats, nindexes);
+		vacrelstats->indstats = indstats;
+	}
 
 	/* If no indexes, make log report that lazy_vacuum_heap would've made */
 	if (vacuumed_pages)
@@ -3243,7 +3274,13 @@ update_index_statistics(Relation *Irel, IndexBulkDeleteResult **stats,
 			InvalidTransactionId,
 			InvalidMultiXactId,
 			false);
-		pfree(stats[i]);
+
+		/*
+		 * Autovacuum worker keeps the index statistics until the end
+		 * of lazy vacuum for autovacuum logs.
+		 */
+		if (!IsAutoVacuumWorkerProcess())
+			pfree(stats[i]);
 	}
 }
 


Re: a verbose option for autovacuum

2021-03-09 Thread Masahiko Sawada
On Tue, Mar 9, 2021 at 12:58 AM Euler Taveira  wrote:
>
> On Mon, Mar 8, 2021, at 2:32 AM, Masahiko Sawada wrote:
>
> * Proposed idea
> LOG:  automatic vacuum of table "postgres.public.test": index scans: 1
> pages: 0 removed, 443 remain, 0 skipped due to pins, 0 skipped frozen
> tuples: 1000 removed, 99000 remain, 0 are dead but not yet removable,
> oldest xmin: 545
> indexes: "postgres.public.test_idx1" 276 pages, 0 newly deleted, 0
> currently deleted, 0 reusable.
> "postgres.public.test_idx2" 300 pages, 10 newly deleted, 0 currently
> deleted, 3 reusable.
> "postgres.public.test_idx2" 310 pages, 4 newly deleted, 0 currently
> deleted, 0 reusable.
>
> Instead of using "indexes:" and add a list of indexes (one on each line), it
> would be more parse-friendly if it prints one index per line using 'index
> "postgres.public.idxname" 123 pages, 45 newly deleted, 67 currently deleted, 8
> reusable.'.

Agreed.

Attached a patch. I've slightly modified the format for consistency
with heap statistics.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


index_stats_log.patch
Description: Binary data


Re: a verbose option for autovacuum

2021-03-08 Thread Euler Taveira
On Mon, Mar 8, 2021, at 2:32 AM, Masahiko Sawada wrote:
> * Proposed idea
> LOG:  automatic vacuum of table "postgres.public.test": index scans: 1
> pages: 0 removed, 443 remain, 0 skipped due to pins, 0 skipped frozen
> tuples: 1000 removed, 99000 remain, 0 are dead but not yet removable,
> oldest xmin: 545
> indexes: "postgres.public.test_idx1" 276 pages, 0 newly deleted, 0
> currently deleted, 0 reusable.
> "postgres.public.test_idx2" 300 pages, 10 newly deleted, 0 currently
> deleted, 3 reusable.
> "postgres.public.test_idx2" 310 pages, 4 newly deleted, 0 currently
> deleted, 0 reusable.
Instead of using "indexes:" and add a list of indexes (one on each line), it
   
would be more parse-friendly if it prints one index per line using 'index   
  
"postgres.public.idxname" 123 pages, 45 newly deleted, 67 currently deleted, 8  
reusable.'.

> It still lacks some of what VACUUM VERBOSE shows (e.g., each index
> vacuum execution time etc) but it would be enough information to know
> the index page statistics. Probably we can output those by default
> without adding a new parameter controlling that.
Perfect is the enemy of the good. Let start with this piece of information.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: a verbose option for autovacuum

2021-03-07 Thread Masahiko Sawada
On Tue, Feb 2, 2021 at 10:51 AM Euler Taveira  wrote:
>
> On Fri, Jan 29, 2021, at 4:35 AM, Masahiko Sawada wrote:
>
> I agree to have autovacuum log more information, especially index
> vacuums. Currently, the log related to index vacuum is only the number
> of index scans. I think it would be helpful if the log has more
> details about each index vacuum.
>
> +1 for this feature. Sometimes this analysis is useful to confirm your theory;
> without data, it is just a wild guess.
>
> But I'm not sure that neither always logging that nor having set the
> parameter per-table basis is a good idea. In the former case, it could
> be log spam for example in the case of anti-wraparound vacuums that
> vacuums on all tables (and their indexes) in the database. If we set
> it per-table basis, it’s useful when the user already knows which
> tables are likely to take a long time for autovacuum but won’t work
> when the users want to check the autovacuum details for tables that
> autovacuum could take a long time for.
>
> I prefer a per-table parameter since it allows us a fine-grained tuning. It
> covers the cases you provided above. You can disable it at all and only enable
> it in critical tables or enable it and disable it for known-to-be-spam tables.
>
> Given that we already have log_autovacuum_min_duration, I think this
> verbose logging should work together with that. I’d prefer to enable
> the verbose logging by default for the same reason Stephen mentioned.
> Or maybe we can have a parameter to control verbosity, say
> log_autovaucum_verbosity.
>
> IMO this new parameter is just an option to inject VERBOSE into VACUUM 
> command.
> Since there is already a parameter to avoid spam autovacuum messages, this
> feature shouldn't hijack log_autovacuum_min_duration behavior. If the
> autovacuum command execution time runs less than l_a_m_d, the output should be
> discarded.

Yeah, if autovacuum execution time doesn't exceed
log_autovacuum_min_duration, the output should be discarded. My idea
is to show autovacuum log along with the new information about index
vacuum etc if autovacuum execution time exceeds the threshold.

Regarding the new parameter being discussed here, I think it depends
on how much the amount of autovacuum logs increases. I think if we
added a few information about indexes to the current autovacuum log
the new parameter would not required. So I just posted[1] another idea
to show only index page statistics (i.g., what lazy_cleanup_index()
shows) along with the current autovacuum logs. Please refer to it.

Regards,

[1] 
https://www.postgresql.org/message-id/CAD21AoAy6SxHiTivh5yAPJSUE4S%3DQRPpSZUdafOSz0R%2BfRcM6Q%40mail.gmail.com

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: a verbose option for autovacuum

2021-03-07 Thread Masahiko Sawada
On Tue, Feb 2, 2021 at 9:59 AM Tommy Li  wrote:
>
> Hi Masahiko
>
> > If we set
> > it per-table basis, it’s useful when the user already knows which
> > tables are likely to take a long time for autovacuum
>
> I would assume that's the default case, most apps I've seen are designed 
> around a small
> number of large tables that take up most of the maintenance effort
>
> > Regarding when to log, we can have autovacuum emit index vacuum log
> > after each lazy_vacuum/cleanup_index() end like VACUUM VERBOSE does,
> > but I’m not sure how it could work together with
> > log_autovacuum_min_duration.
>
> I do like having this rolled into the existing configuration. This might be 
> an absurd idea, but
> what if the autovacuum process accumulates the per-index vacuum information 
> until that
> threshold is reached, and then spits out the logs all at once? And after the 
> min duration is
> passed, it just logs the rest of the index vacuum information as they occur. 
> That way the
> information is more likely to be available to investigate an abnormally long 
> running vacuum
> while it's still happening.

Since index vacuum can be executed more than once within an
autovacuum, we need to keep all of them. It's not impossible.

As the second idea, I think showing index vacuum statistics (i.g.,
what lazy_cleanup_index shows) together with the current autovacuum
logs might be a good start. The autovacuum log becomes like follows:

* HEAD
LOG:  automatic vacuum of table "postgres.public.test": index scans: 1
pages: 0 removed, 443 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 1000 removed, 99000 remain, 0 are dead but not yet removable,
oldest xmin: 545
buffer usage: 2234 hits, 4 misses, 4 dirtied
avg read rate: 0.504 MB/s, avg write rate: 0.504 MB/s
system usage: CPU: user: 0.03 s, system: 0.00 s, elapsed: 0.06 s
WAL usage: 2162 records, 4 full page images, 159047 bytes

* Proposed idea
LOG:  automatic vacuum of table "postgres.public.test": index scans: 1
pages: 0 removed, 443 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 1000 removed, 99000 remain, 0 are dead but not yet removable,
oldest xmin: 545
indexes: "postgres.public.test_idx1" 276 pages, 0 newly deleted, 0
currently deleted, 0 reusable.
"postgres.public.test_idx2" 300 pages, 10 newly deleted, 0 currently
deleted, 3 reusable.
"postgres.public.test_idx2" 310 pages, 4 newly deleted, 0 currently
deleted, 0 reusable.
buffer usage: 2234 hits, 4 misses, 4 dirtied
avg read rate: 0.504 MB/s, avg write rate: 0.504 MB/s
system usage: CPU: user: 0.03 s, system: 0.00 s, elapsed: 0.06 s
WAL usage: 2162 records, 4 full page images, 159047 bytes

It still lacks some of what VACUUM VERBOSE shows (e.g., each index
vacuum execution time etc) but it would be enough information to know
the index page statistics. Probably we can output those by default
without adding a new parameter controlling that.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: a verbose option for autovacuum

2021-02-01 Thread Euler Taveira
On Fri, Jan 29, 2021, at 4:35 AM, Masahiko Sawada wrote:
> I agree to have autovacuum log more information, especially index
> vacuums. Currently, the log related to index vacuum is only the number
> of index scans. I think it would be helpful if the log has more
> details about each index vacuum.
+1 for this feature. Sometimes this analysis is useful to confirm your theory;
without data, it is just a wild guess.

> But I'm not sure that neither always logging that nor having set the
> parameter per-table basis is a good idea. In the former case, it could
> be log spam for example in the case of anti-wraparound vacuums that
> vacuums on all tables (and their indexes) in the database. If we set
> it per-table basis, it’s useful when the user already knows which
> tables are likely to take a long time for autovacuum but won’t work
> when the users want to check the autovacuum details for tables that
> autovacuum could take a long time for.
I prefer a per-table parameter since it allows us a fine-grained tuning. It
covers the cases you provided above. You can disable it at all and only enable
it in critical tables or enable it and disable it for known-to-be-spam tables.

> Given that we already have log_autovacuum_min_duration, I think this
> verbose logging should work together with that. I’d prefer to enable
> the verbose logging by default for the same reason Stephen mentioned.
> Or maybe we can have a parameter to control verbosity, say
> log_autovaucum_verbosity.
IMO this new parameter is just an option to inject VERBOSE into VACUUM command.
Since there is already a parameter to avoid spam autovacuum messages, this
feature shouldn't hijack log_autovacuum_min_duration behavior. If the
autovacuum command execution time runs less than l_a_m_d, the output should be
discarded.

I don't have a strong opinion about this parameter name but I think your
suggestion (log_autovaccum_verbosity) is easier to guess what this parameter is
for.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: a verbose option for autovacuum

2021-02-01 Thread Tommy Li
Hi Masahiko

> If we set
> it per-table basis, it’s useful when the user already knows which
> tables are likely to take a long time for autovacuum

I would assume that's the default case, most apps I've seen are designed
around a small
number of large tables that take up most of the maintenance effort

> Regarding when to log, we can have autovacuum emit index vacuum log
> after each lazy_vacuum/cleanup_index() end like VACUUM VERBOSE does,
> but I’m not sure how it could work together with
> log_autovacuum_min_duration.

I do like having this rolled into the existing configuration. This might be
an absurd idea, but
what if the autovacuum process accumulates the per-index vacuum information
until that
threshold is reached, and then spits out the logs all at once? And after
the min duration is
passed, it just logs the rest of the index vacuum information as they
occur. That way the
information is more likely to be available to investigate an abnormally
long running vacuum
while it's still happening.


Tommy


Re: a verbose option for autovacuum

2021-01-28 Thread Masahiko Sawada
On Tue, Jan 26, 2021 at 2:46 AM Tommy Li  wrote:
>
> Hi Stephen
>
> > ... can set vacuum options on a table level which autovacuum should respect,
> > such as vacuum_index_cleanup and vacuum_truncate.  For skip locked,
> > autovacuum already will automatically release it's attempt to acquire a
> > lock if someone backs up behind it for too long.
>
> This is good information, I wasn't aware that autovacuum respected those 
> settings.
> In that case I'd like to focus specifically on the verbose aspect.
>
> My first thought was a new boolean configuration called "autovacuum_verbose".
> I'd want it to behave similarly to autovacuum_vacuum_cost_limit in that it 
> can be
> set globally or on a per-table basis.

I agree to have autovacuum log more information, especially index
vacuums. Currently, the log related to index vacuum is only the number
of index scans. I think it would be helpful if the log has more
details about each index vacuum.

But I'm not sure that neither always logging that nor having set the
parameter per-table basis is a good idea. In the former case, it could
be log spam for example in the case of anti-wraparound vacuums that
vacuums on all tables (and their indexes) in the database. If we set
it per-table basis, it’s useful when the user already knows which
tables are likely to take a long time for autovacuum but won’t work
when the users want to check the autovacuum details for tables that
autovacuum could take a long time for.

Given that we already have log_autovacuum_min_duration, I think this
verbose logging should work together with that. I’d prefer to enable
the verbose logging by default for the same reason Stephen mentioned.
Or maybe we can have a parameter to control verbosity, say
log_autovaucum_verbosity.

Regarding when to log, we can have autovacuum emit index vacuum log
after each lazy_vacuum/cleanup_index() end like VACUUM VERBOSE does,
but I’m not sure how it could work together with
log_autovacuum_min_duration. So one idea could be to have autovacuum
emit a log for each index vacuum statistics along with the current
autovacuum logs at the end of lazy vacuum if the execution time
exceeds log_autovacuum_min_duration. A downside would be one
autovacuum log could be very long if the table has many indexes, and
we still don’t know how much time taken for each index vacuum. But you
can see if a specific index has bloated more than usual. And for the
latter part, we would be able to add the statistics of execution time
for each vacuum phase to the log as a further improvement.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: a verbose option for autovacuum

2021-01-25 Thread Tommy Li
Hi Stephen

> ... can set vacuum options on a table level which autovacuum should
respect,
> such as vacuum_index_cleanup and vacuum_truncate.  For skip locked,
> autovacuum already will automatically release it's attempt to acquire a
> lock if someone backs up behind it for too long.

This is good information, I wasn't aware that autovacuum respected those
settings.
In that case I'd like to focus specifically on the verbose aspect.

My first thought was a new boolean configuration called
"autovacuum_verbose".
I'd want it to behave similarly to autovacuum_vacuum_cost_limit in that it
can be
set globally or on a per-table basis.

Thoughts?


Tommy


Re: a verbose option for autovacuum

2021-01-23 Thread Stephen Frost
Greetings,

On Fri, Jan 22, 2021 at 2:33 PM Tom Lane  wrote:
> Tommy Li  writes:
> > Additionally, is there any interest in exposing more vacuum options to be
> > run by autovac? Right now it runs FREEZE and ANALYZE, which leaves the
> > VERBOSE, SKIP_LOCKED, INDEX_CLEANUP, and TRUNCATE unconfigurable.
>
> To the extent that any of these make sense in autovacuum, I'd say they
> ought to be managed automatically.  I don't see a strong argument for
> users configuring this.  (See also nearby thread about allowing index
> AMs to control some of this.)

I agree that it'd be nice to figure out some way to have these managed
automatically, but it's probably useful to point out to Tommy that you
can set vacuum options on a table level which autovacuum should respect,
such as vacuum_index_cleanup and vacuum_truncate.  For skip locked,
autovacuum already will automatically release it's attempt to acquire a
lock if someone backs up behind it for too long.

Until we get something automatic though, I could see being able to set
TRUNCATE, in particular, to be off globally as useful when running a
system with replicas that might end up having queries which block WAL
replay.  If no one is stepping up to build some way to handle that
automatically then I'd be in support of making it something that an
administrator can configure (to avoid having to always remember to do it
for each table created...).

* Tommy Li (to...@coffeemeetsbagel.com) wrote:
> > Seems like that would very soon feel like log spam.  What would be the
> > use case for having this on?  If you want one-off results you could
> > run VACUUM manually.
> 
> In my case I have a fairly large, fairly frequently updated table with a
> large number of indexes where autovacuum's runtime can fluctuate between 12
> and 24 hours. If I want to investigate why autovacuum today is running many
> hours longer than it did last week, the only information I have to go off
> is from pg_stat_progress_vacuum, which reports only progress based on the
> number of blocks completed across _all_ indexes.
> 
> VACUUM VERBOSE's output is nice because it reports runtime per index, which
> would allow me to see if a specific index has bloated more than usual.
> 
> I also have autovacuum throttled much more aggressively than manual
> vacuums, so information from a one-off manual VACUUM isn't comparable.

I tend to agree that this is pretty useful information to have included
when trying to figure out what autovacuum's doing.

> As for log spam, I'm not sure it's a problem as long as the verbose option
> is disabled by default.

While this would be in-line with our existing dismal logging defaults,
I'd be happier with a whole bunch more logging enabled by default,
including this, so that we don't have to tell everyone who deploys PG to
go enable this very sensible logging.  Arguments of 'log spam' really
fall down when you're on the receiving end of practically empty PG logs
and trying to figure out what's going on.  Further, log analysis tools
exist to aggregate this data and bring it up to a higher level for
administrators.

Still, I'd much rather have the option, even if disabled by default,
than not have it at all.

Thanks,

Stephen



signature.asc
Description: PGP signature


Re: a verbose option for autovacuum

2021-01-22 Thread Tommy Li
Hey Tom

> Seems like that would very soon feel like log spam.  What would be the
> use case for having this on?  If you want one-off results you could
> run VACUUM manually.

In my case I have a fairly large, fairly frequently updated table with a
large number of indexes where autovacuum's runtime can fluctuate between 12
and 24 hours. If I want to investigate why autovacuum today is running many
hours longer than it did last week, the only information I have to go off
is from pg_stat_progress_vacuum, which reports only progress based on the
number of blocks completed across _all_ indexes.

VACUUM VERBOSE's output is nice because it reports runtime per index, which
would allow me to see if a specific index has bloated more than usual.

I also have autovacuum throttled much more aggressively than manual
vacuums, so information from a one-off manual VACUUM isn't comparable.

As for log spam, I'm not sure it's a problem as long as the verbose option
is disabled by default.


Tommy

On Fri, Jan 22, 2021 at 2:33 PM Tom Lane  wrote:

> Tommy Li  writes:
> > I was surprised to see that there's no way to get `VACUUM VERBOSE`-like
> > output from autovacuum. Is there any interest in enabling this?
>
> Seems like that would very soon feel like log spam.  What would be the
> use case for having this on?  If you want one-off results you could
> run VACUUM manually.
>
> > Additionally, is there any interest in exposing more vacuum options to be
> > run by autovac? Right now it runs FREEZE and ANALYZE, which leaves the
> > VERBOSE, SKIP_LOCKED, INDEX_CLEANUP, and TRUNCATE unconfigurable.
>
> To the extent that any of these make sense in autovacuum, I'd say they
> ought to be managed automatically.  I don't see a strong argument for
> users configuring this.  (See also nearby thread about allowing index
> AMs to control some of this.)
>
> regards, tom lane
>


Re: a verbose option for autovacuum

2021-01-22 Thread Tom Lane
Tommy Li  writes:
> I was surprised to see that there's no way to get `VACUUM VERBOSE`-like
> output from autovacuum. Is there any interest in enabling this?

Seems like that would very soon feel like log spam.  What would be the
use case for having this on?  If you want one-off results you could
run VACUUM manually.

> Additionally, is there any interest in exposing more vacuum options to be
> run by autovac? Right now it runs FREEZE and ANALYZE, which leaves the
> VERBOSE, SKIP_LOCKED, INDEX_CLEANUP, and TRUNCATE unconfigurable.

To the extent that any of these make sense in autovacuum, I'd say they
ought to be managed automatically.  I don't see a strong argument for
users configuring this.  (See also nearby thread about allowing index
AMs to control some of this.)

regards, tom lane




a verbose option for autovacuum

2021-01-22 Thread Tommy Li
Hi all

I was surprised to see that there's no way to get `VACUUM VERBOSE`-like
output from autovacuum. Is there any interest in enabling this?

Additionally, is there any interest in exposing more vacuum options to be
run by autovac? Right now it runs FREEZE and ANALYZE, which leaves the
VERBOSE, SKIP_LOCKED, INDEX_CLEANUP, and TRUNCATE unconfigurable. I skipped
FULL in that list because I'm assuming no one would ever want autovac to
run VACUUM FULL.


Tommy