On 3/23/25 22:16, David Rowley wrote:
On Fri, 21 Mar 2025 at 22:02, Andrei Lepikhov <lepi...@gmail.com> wrote:
Can you explain why "Estimated Capacity" and "Estimated Distinct
Lookup Keys" don't answer that?  If there are more distinct lookup
keys than there is capacity to store them, then some will be evicted.
I wouldn't say these parameters don't answer. I try to debate how usable they are. To be more practical, let me demonstrate the example:

EXPLAIN (COSTS OFF) SELECT * FROM t1,t2 WHERE t1.x=t2.x;

 Nested Loop  (cost=0.44..7312.65 rows=211330 width=33)
   ->  Seq Scan on t1  (cost=0.00..492.00 rows=30000 width=22)
   ->  Memoize  (cost=0.44..3.82 rows=7 width=11)
         Cache Key: t1.x
         Cache Mode: logical
         Estimated Capacity: 1001  Estimated Distinct Lookup Keys: 1001
         ->  Index Scan using t2_x_idx2 on t2  (cost=0.43..3.81 rows=7)
               Index Cond: (x = t1.x)

At first, I began to look for documentation because it was unclear what both new parameters specifically meant. Okay, there was no documentation but trivial code, and after a short discovery, I realised the meaning. The first fact I see from this EXPLAIN is that Postgres estimates it has enough memory to fit all the entries. Okay, but what does it give me? I may just increase work_mem and provide the query with more memory if needed. My main concern is how frequently this cache is planned to be used. Doing some mental effort, I found the line "rows=30000." Calculating a bit more, I may suppose it means that we have a 95% chance to reuse the cache. Okay, I got it.
Now, see:
1. I needed to discover the meaning of the new parameters because they were different from the earlier "hit" and "miss." 2. I need to find a common JOIN for keys of this node. Imagine a typical 200-row EXPLAIN with 2-3 Memoization keys from different tables.
3. I need to make calculations

On the opposite, the hit ratio, written instead, already known by analogy, already provides me with necessary cache efficacy data; no need to watch outside the node; it may be easily compared with the actual value. Am I wrong?

Both approaches provide the data, but each one is more usable?
I think we may ask more people, for example, Nikolay Samokhvalov, who, as I heard, works hard with explains.


Once again, I'm not necessarily objecting to hit and evict ratios
being shown, I just want to know they're actually useful enough to
show and don't just bloat the EXPLAIN output needlessly. So far your
arguments aren't convincing me that they are.
I'm -1 for this redundancy.

--
regards, Andrei Lepikhov


Reply via email to