Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment

Andrei Lepikhov Sat, 05 Apr 2025 10:21:04 -0700

On 3/23/25 22:16, David Rowley wrote:

On Fri, 21 Mar 2025 at 22:02, Andrei Lepikhov <lepi...@gmail.com> wrote:
Can you explain why "Estimated Capacity" and "Estimated Distinct
Lookup Keys" don't answer that?  If there are more distinct lookup
keys than there is capacity to store them, then some will be evicted.

I wouldn't say these parameters don't answer. I try to debate how usablethey are. To be more practical, let me demonstrate the example:


EXPLAIN (COSTS OFF) SELECT * FROM t1,t2 WHERE t1.x=t2.x;

 Nested Loop  (cost=0.44..7312.65 rows=211330 width=33)
   ->  Seq Scan on t1  (cost=0.00..492.00 rows=30000 width=22)
   ->  Memoize  (cost=0.44..3.82 rows=7 width=11)
         Cache Key: t1.x
         Cache Mode: logical
         Estimated Capacity: 1001  Estimated Distinct Lookup Keys: 1001
         ->  Index Scan using t2_x_idx2 on t2  (cost=0.43..3.81 rows=7)
               Index Cond: (x = t1.x)

At first, I began to look for documentation because it was unclear whatboth new parameters specifically meant. Okay, there was no documentationbut trivial code, and after a short discovery, I realised the meaning.The first fact I see from this EXPLAIN is that Postgres estimates it hasenough memory to fit all the entries. Okay, but what does it give me? Imay just increase work_mem and provide the query with more memory ifneeded. My main concern is how frequently this cache is planned to beused. Doing some mental effort, I found the line "rows=30000."Calculating a bit more, I may suppose it means that we have a 95% chanceto reuse the cache. Okay, I got it.

Now, see:

1. I needed to discover the meaning of the new parameters because theywere different from the earlier "hit" and "miss."2. I need to find a common JOIN for keys of this node. Imagine a typical200-row EXPLAIN with 2-3 Memoization keys from different tables.

3. I need to make calculations

On the opposite, the hit ratio, written instead, already known byanalogy, already provides me with necessary cache efficacy data; no needto watch outside the node; it may be easily compared with the actualvalue. Am I wrong?


Both approaches provide the data, but each one is more usable?

I think we may ask more people, for example, Nikolay Samokhvalov, who,as I heard, works hard with explains.


Once again, I'm not necessarily objecting to hit and evict ratios
being shown, I just want to know they're actually useful enough to
show and don't just bloat the EXPLAIN output needlessly. So far your
arguments aren't convincing me that they are.

I'm -1 for this redundancy.

--
regards, Andrei Lepikhov

Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment

Reply via email to