Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment

David Rowley Thu, 20 Mar 2025 04:15:12 -0700

On Thu, 20 Mar 2025 at 21:48, Ilia Evdokimov
<[email protected]> wrote:
>          ->  Memoize  (cost=0.30..0.41 rows=1 width=4)
>                Cache Key: t2.thousand
>                Cache Mode: logical
>                Cache Estimated Entries: 655
>                Cache Estimated NDistinct: 721
>                ->  Index Only Scan using tenk1_unique1 on tenk1 t1  
> (cost=0.29..0.40 rows=1 width=4)
>                      Index Cond: (unique1 = t2.thousand)
> (11 rows)
>
> Additionally, since this information would only be shown in EXPLAIN when 
> costs are enabled, it should not cause any performance regression in normal 
> execution. However, reviewers should be especially careful when verifying 
> test outputs, as this change could affect plan details in regression tests.
>
> Any thoughts?


> + ExplainIndentText(es);
> + appendStringInfo(es->str, "Cache Estimated Entries: %d\n", ((Memoize *) 
> plan)->est_entries);
> + ExplainIndentText(es);
> + appendStringInfo(es->str, "Cache Estimated NDistinct: %0.0f\n", ((Memoize 
> *) plan)->ndistinct);

est_entries is a uint32, so %u is the correct format character for that type.

I don't think you need to prefix all these properties with "Cache"
just because the other two properties have that prefix. I also don't
think the names you've chosen really reflect the meaning. How about
something like: "Estimated Distinct Lookup Keys: 721  Estimated
Capacity: 655", in that order. I think maybe having that as one line
for format=text is better than 2 lines. The EXPLAIN output is already
often taking up more lines in v18 than in v17, would be good to not
make that even worse unnecessarily.

I see the existing code there could use ExplainPropertyText rather
than have a special case for text and non-text formats. That's likely
my fault. If we're touching this code, then we should probably tidy
that up. Do you want to create a precursor fixup patch for that?

+ double ndistinct; /* Estimated number of distinct memoization keys,
+ * used for cache size evaluation. Kept for EXPLAIN */

Maybe this field in MemoizePath needs a better name. How about
"est_unique_keys"? and also do the rename in struct Memoize.

I'm also slightly concerned about making struct Memoize bigger. I had
issues with a performance regression [1] for 908a96861 when increasing
the WindowAgg struct size last year and the only way I found to make
it go away was to shuffle the fields around so that the struct size
didn't increase. I think we'll need to see a benchmark of a query that
hits Memoize quite hard with a small cache size to see if the
performance decreases as a result of adding the ndistinct field. It's
unfortunate that we'll not have the luxury of squeezing this double
into padding if we do see a slowdown.

David

[1] 
https://postgr.es/m/flat/CAHoyFK9n-QCXKTUWT_xxtXninSMEv%2BgbJN66-y6prM3f4WkEHw%40mail.gmail.com

Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment

Reply via email to