Re: [PATCH] Add sampling statistics to autoanalyze log output

Chao Li Mon, 12 Jan 2026 01:22:53 -0800

> On Jan 11, 2026, at 02:25, Tatsuya Kawata <[email protected]> wrote:
> 
> Hi Fujii-san, Sami-san, Chao-san,
> 
> Thank you for the valuable feedback!
> I addressed these comments and  updated the patch.
> 
> > I noticed an issue with the sampling information shown when
> > analyzing foreign tables. Based on my testing, the reported values seem
> > incorrect. Is this expected behavior?
> > 
> > 
> > 
> > For example, running ANALYZE VERBOSE on a regular table and its foreign
> > table produces different sampling output:
> > 
> > 
> > 
> >     CREATE EXTENSION postgres_fdw ;
> >     CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw ;
> >     CREATE USER MAPPING FOR public SERVER loopback ;
> >     CREATE TABLE t (i int) ;
> >     CREATE FOREIGN TABLE ft (i int) SERVER loopback OPTIONS (table_name 
> > 't') ;
> >     INSERT INTO ft SELECT n FROM generate_series(1, 100000) n ;
> > 
> > 
> > 
> >     The sampling output by ANALYZE VERBOSE t:
> >     sampling: scanned 443 of 443 pages, containing 100000 live rows
> > and 0 dead rows; 30000 rows in sample, 100000 estimated total rows
> > 
> > 
> > 
> >     The sampling output by ANALYZE VERBOSE ft:
> >     sampling: scanned 1000 of 1000 pages, containing 30261 live rows
> > and 0 dead rows; 30000 rows in sample, 100000 estimated total rows
> > 
> > 
> > 
> > In particular, the reported number of scanned pages and live rows for
> > the foreign table look wrong.
> 
> > 2/ I think this patch should just focus on unifying the existing logging 
> > only.
> > 
> > 
> > 
> > The " estimated total rows" for a foreign table should be a separate
> > thread/patch, IMO.
> 
> >  echo Fujii-san’s test result, I think that is because:
> > 
> > 
> > 
> > ```
> > + *scannedpages = relpages; /* use estimate for foreign tables */
> > ```
> > 
> > 
> > 
> > You are using estimated value for foreign tables. I also have a concern 
> > about assuming 100 rows per page. IMO, if something we don’t know, we 
> > should n> ot pretend that we know. Because from the code comment, we read 
> > that that’s assumed, but from an end user’s point of view, they may 
> > consider the numb> er is real.
> 
> Thank you for pointing this out. I agree that the displayed values were 
> inappropriate. I have removed the foreign table sampling estimation logic 
> from this patch. To avoid misleading, when analyzing foreign tables, the log 
> now omits the page and row details:
> 
> ```
> scanned 0 of 0 pages, containing 0 live rows and 0 dead rows
> ```
> 
> 
> > 1/ Wouldn’t it be better to combine the new sampling output fields into a
> > single struct? This is more idiomatic, similar to BufferUsage or WalUsage,
> > and it would also make the API easier to extend in the future. Right now,
> > changing the function signature breaks the AcquireSampleRowsFunc ABI,
> > which is acceptable for a major release, but using a struct would help
> > avoid future breaks if we ever add more sampling data in later releases.
> > 
> > 
> > 
> > ```
> > @@ -535,11 +546,15 @@ do_analyze_rel(Relation onerel, const VacuumParams 
> > params,
> >         if (inh)
> >                 numrows = acquire_inherited_sample_rows(onerel, elevel,
> > 
> > 
> > 
> >                          rows, targrows,
> > -
> >                          &totalrows, &totaldeadrows);
> > +
> >                          &totalrows, &totaldeadrows,
> > +
> >                          &totalpages, &scannedpages,
> > +
> >                          &sampleliverows, &sampledeadrows);
> >         else
> >                 numrows = (*acquirefunc) (onerel, elevel,
> > 
> > 
> > 
> > rows, targrows,
> > -
> > &totalrows, &totaldeadrows);
> > +
> > &totalrows, &totaldeadrows,
> > +
> > &totalpages, &scannedpages,
> > +
> > &sampleliverows, &sampledeadrows);
> > 
> > 
> > 
> > ```
> 
> > The other nit comment is, as you add 4 parameters to acquire_sample_rows(), 
> > they are all output parameters, I think you need to update the header 
> > comment to describe them.
> 
> I agree with consolidating the sampling statistics into a single struct, and 
> I have implemented this change. I considered whether to include the existing 
> totalrows and totaldeadrows in the new struct, but decided against it for 
> now. Since totalrows in particular is used extensively throughout the 
> codebase, including it would expand the scope beyond the original goal of 
> improving log output. This can be addressed in a future patch if needed.
> 
> 
> > Also, the patch moves the sampling information to appear after
> > buffer usage, WAL usage, etc. In my opinion, it's more intuitive to
> > report analyze activity before buffer and WAL usage, as ANALYZE VERBOSE
> > currently does. VACUUM VERBOSE follows the same pattern,
> > reporting activity details before buffer and WAL usage.
> 
> Fixed as suggested. The sampling information now appears before buffer and 
> WAL usage, consistent with ANALYZE VERBOSE and VACUUM VERBOSE.
> 
> 
> > 3/ You should also run pgindent.
> 
> I ran pgindent, but the results differed from the existing coding style in 
> the same files:
>  - Three tabs were inserted before the struct name at the end of typedef 
> (existing style uses one space)
>  - A space was inserted between * and the variable name (existing style has 
> no space)
> So, I have reverted the pgindent changes. If there is a specific 
> configuration needed, please let me know.
> 
> 
> Regarding the log output, I would like to add some context. Originally, when 
> running ANALYZE VERBOSE on a parent table with inheritance, the parent 
> table's sampling information was displayed twice - once for the parent alone 
> and once at the beginning of the inheritance tree output. This may appear to 
> be unintentional:
> 
> ```
> ANALYZE VERBOSE parent_table;
> INFO:  analyzing "public.parent_table"
> INFO:  "parent_table": scanned 4 of 4 pages, containing 500 live rows and 0 
> dead rows; 500 rows in sample, 500 estimated total rows
> ...
> INFO:  analyzing "public.parent_table" inheritance tree
> INFO:  "parent_table": scanned 4 of 4 pages, containing 500 live rows and 0 
> dead rows; 500 rows in sample, 500 estimated total rows
> INFO:  "child_table1": scanned 16 of 16 pages, containing 2500 live rows and 
> 0 dead rows; 2500 rows in sample, 2500 estimated total rows
> INFO:  "child_table2": scanned 13 of 13 pages, containing 2000 live rows and 
> 0 dead rows; 2000 rows in sample, 2000 estimated total rows
> INFO:  "child_table3": scanned 10 of 10 pages, containing 1500 live rows and 
> 0 dead rows; 1500 rows in sample, 1500 estimated total rows
> ...
> ```
> 
> In this patch, I have added "inheritance tree" to the autoanalyze log message 
> as well, and the inheritance tree statistics now show the aggregated totals 
> from all child tables:
> ```
> 2026-01-11 02:05:34.620 JST [864411] LOG:  automatic analyze of table 
> "postgres.public.parent_table"
> sampling: scanned 6 of 6 pages, containing 1000 live rows and 0 dead rows; 
> 1000 rows in sample, 1000 estimated total rows
> avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
> buffer usage: 215 hits, 0 reads, 0 dirtied
> WAL usage: 5 records, 0 full page images, 3839 bytes, 0 full page image 
> bytes, 0 buffers full
> system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
> 2026-01-11 02:05:34.649 JST [864411] LOG:  automatic analyze of table 
> "postgres.public.parent_table" inheritance tree
> sampling: scanned 84 of 84 pages, containing 14500 live rows and 0 dead rows; 
> 14500 rows in sample, 14500 estimated total rows
> avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
> buffer usage: 169 hits, 0 reads, 0 dirtied
> WAL usage: 6 records, 0 full page images, 3213 bytes, 0 full page image 
> bytes, 0 buffers full
> system usage: CPU: user: 0.01 s, system: 0.01 s, elapsed: 0.02 s
> 2026-01-11 02:05:34.659 JST [864411] LOG:  automatic analyze of table 
> "postgres.public.child_table1"
> sampling: scanned 29 of 29 pages, containing 5000 live rows and 0 dead rows; 
> 5000 rows in sample, 5000 estimated total rows
> avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
> buffer usage: 64 hits, 0 reads, 0 dirtied
> WAL usage: 4 records, 0 full page images, 3211 bytes, 0 full page image 
> bytes, 0 buffers full
> ```
> 
> I have attached the updated patch v3.
> 
> Regards,
> Tatsuya Kawata
> 
> <v3-0001-Add-sampling-statistics-to-autoanalyze-log-output.patch>

Thanks for updating the patch. Just have a few comments on v3:

1
```
 static int
 acquire_inherited_sample_rows(Relation onerel, int elevel,
                                                          HeapTuple *rows, int 
targrows,
-                                                         double *totalrows, 
double *totaldeadrows)
+                                                         double *totalrows, 
double *totaldeadrows,
+                                                         SamplingStats 
*sampling_stats)
 {
        List       *tableOIDs;
        Relation   *rels;
@@ -1408,10 +1426,12 @@ acquire_inherited_sample_rows(Relation onerel, int 
elevel,
                                i;
        ListCell   *lc;
        bool            has_child;
+       SamplingStats child_sampling_stats;
```

child_sampling_stats is only used inside if (childblocks > 0), so it would be 
better to just define it there.

2
```
+typedef struct SamplingStats
+{
+       BlockNumber totalpages;         /* total pages in relation */
+       BlockNumber scannedpages;       /* pages actually scanned */
+       double          liverows;               /* live rows found during 
sampling */
+       double          deadrows;               /* dead rows found during 
sampling */
+} SamplingStats;
```

SamplingStats is a very generic name, maybe rename to AnalyzeSamplingStats or 
something else.

BTW, from reviewing this patch, I found an issue and proposed a patch [1], 
would you please help review that?

[1] 
https://www.postgresql.org/message-id/CAEoWx2mAwXnxVWTP%3DdgwA98dJ7JxzxM8qToaxAGuByHN1-%3DoNQ%40mail.gmail.com

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Re: [PATCH] Add sampling statistics to autoanalyze log output

Reply via email to