On Thu, Jul 13, 2023 at 3:12 PM Richard Guo <guofengli...@gmail.com> wrote:
> So I'm wondering if it'd be better that we move all this logic of > computing additional lateral references within PHVs to get_memoize_path, > where we can examine only PHVs that are evaluated at innerrel. And > considering that these lateral refs are only used by Memoize, it seems > more sensible to compute them there. But I'm a little worried that > doing this would make get_memoize_path too expensive. > > Please see v4 patch for this change. > I'd like to add that not checking PHVs for lateral references can lead to performance regressions with Memoize node. For instance, -- by default, enable_memoize is on regression=# explain (analyze, costs off) select * from tenk1 t1 left join lateral (select *, t1.four as x from tenk1 t2) s on t1.two = s.two; QUERY PLAN ------------------------------------------------------------------------------------- Nested Loop Left Join (actual time=0.028..105245.547 rows=50000000 loops=1) -> Seq Scan on tenk1 t1 (actual time=0.011..3.760 rows=10000 loops=1) -> Memoize (actual time=0.010..8.051 rows=5000 loops=10000) Cache Key: t1.two Cache Mode: logical Hits: 0 Misses: 10000 Evictions: 9999 Overflows: 0 Memory Usage: 1368kB -> Seq Scan on tenk1 t2 (actual time=0.004..3.594 rows=5000 loops=10000) Filter: (t1.two = two) Rows Removed by Filter: 5000 Planning Time: 1.943 ms Execution Time: 106806.043 ms (11 rows) -- turn enable_memoize off regression=# set enable_memoize to off; SET regression=# explain (analyze, costs off) select * from tenk1 t1 left join lateral (select *, t1.four as x from tenk1 t2) s on t1.two = s.two; QUERY PLAN ----------------------------------------------------------------------------- Nested Loop Left Join (actual time=0.048..44831.707 rows=50000000 loops=1) -> Seq Scan on tenk1 t1 (actual time=0.026..2.340 rows=10000 loops=1) -> Seq Scan on tenk1 t2 (actual time=0.002..3.282 rows=5000 loops=10000) Filter: (t1.two = two) Rows Removed by Filter: 5000 Planning Time: 0.641 ms Execution Time: 46472.609 ms (7 rows) As we can see, when Memoize enabled (which is the default setting), the execution time increases by around 129.83%, indicating a significant performance regression. This is caused by that we fail to realize that 't1.four', which is from the PHV, should be included in the cache keys. And that makes us have to purge the entire cache every time we get a new outer tuple. This is also implied by the abnormal Memoize runtime stats: Hits: 0 Misses: 10000 Evictions: 9999 Overflows: 0 This regression can be fixed by the patch here. After applying the v4 patch, 't1.four' is added into the cache keys, and the same query runs much faster. regression=# explain (analyze, costs off) select * from tenk1 t1 left join lateral (select *, t1.four as x from tenk1 t2) s on t1.two = s.two; QUERY PLAN --------------------------------------------------------------------------------- Nested Loop Left Join (actual time=0.060..20446.004 rows=50000000 loops=1) -> Seq Scan on tenk1 t1 (actual time=0.027..5.845 rows=10000 loops=1) -> Memoize (actual time=0.001..0.209 rows=5000 loops=10000) Cache Key: t1.two, t1.four Cache Mode: binary Hits: 9996 Misses: 4 Evictions: 0 Overflows: 0 Memory Usage: 5470kB -> Seq Scan on tenk1 t2 (actual time=0.005..3.659 rows=5000 loops=4) Filter: (t1.two = two) Rows Removed by Filter: 5000 Planning Time: 0.579 ms Execution Time: 21756.598 ms (11 rows) Comparing the first plan and the third plan, this query runs ~5 times faster. Thanks Richard