On 3/23/25 22:16, David Rowley wrote:
On Fri, 21 Mar 2025 at 22:02, Andrei Lepikhov <lepi...@gmail.com> wrote:
Can you explain why "Estimated Capacity" and "Estimated Distinct
Lookup Keys" don't answer that? If there are more distinct lookup
keys than there is capacity to store them, then some will be evicted.
I wouldn't say these parameters don't answer. I try to debate how usable
they are. To be more practical, let me demonstrate the example:
EXPLAIN (COSTS OFF) SELECT * FROM t1,t2 WHERE t1.x=t2.x;
Nested Loop (cost=0.44..7312.65 rows=211330 width=33)
-> Seq Scan on t1 (cost=0.00..492.00 rows=30000 width=22)
-> Memoize (cost=0.44..3.82 rows=7 width=11)
Cache Key: t1.x
Cache Mode: logical
Estimated Capacity: 1001 Estimated Distinct Lookup Keys: 1001
-> Index Scan using t2_x_idx2 on t2 (cost=0.43..3.81 rows=7)
Index Cond: (x = t1.x)
At first, I began to look for documentation because it was unclear what
both new parameters specifically meant. Okay, there was no documentation
but trivial code, and after a short discovery, I realised the meaning.
The first fact I see from this EXPLAIN is that Postgres estimates it has
enough memory to fit all the entries. Okay, but what does it give me? I
may just increase work_mem and provide the query with more memory if
needed. My main concern is how frequently this cache is planned to be
used. Doing some mental effort, I found the line "rows=30000."
Calculating a bit more, I may suppose it means that we have a 95% chance
to reuse the cache. Okay, I got it.
Now, see:
1. I needed to discover the meaning of the new parameters because they
were different from the earlier "hit" and "miss."
2. I need to find a common JOIN for keys of this node. Imagine a typical
200-row EXPLAIN with 2-3 Memoization keys from different tables.
3. I need to make calculations
On the opposite, the hit ratio, written instead, already known by
analogy, already provides me with necessary cache efficacy data; no need
to watch outside the node; it may be easily compared with the actual
value. Am I wrong?
Both approaches provide the data, but each one is more usable?
I think we may ask more people, for example, Nikolay Samokhvalov, who,
as I heard, works hard with explains.
Once again, I'm not necessarily objecting to hit and evict ratios
being shown, I just want to know they're actually useful enough to
show and don't just bloat the EXPLAIN output needlessly. So far your
arguments aren't convincing me that they are.
I'm -1 for this redundancy.
--
regards, Andrei Lepikhov