Hi Vladlen,

On Fri, Jan 10, 2025 at 11:49 PM Vladlen Popolitov
<v.popoli...@postgrespro.ru> wrote:
> Amit Langote wrote on 2025-01-10 18:22:
> > On Fri, Jan 10, 2025 at 7:36 PM David Rowley <dgrowle...@gmail.com>
> > wrote:
> >> On Fri, 10 Jan 2025 at 22:53, Vladlen Popolitov
> >> <v.popoli...@postgrespro.ru> wrote:
> >> > In case of query
> >> > select count(*) from test_table where a_1 = 1000000;
> >> > I would expect increase of query time due to additional if...else . It
> >> > is not clear
> >> > what code was eliminated to decrease query time.
> >>
> >> Are you talking about the code added to ExecInitSeqScan() to determine
> >> which node function to call? If so, that's only called during executor
> >> startup. The idea here is to reduce the branching during execution by
> >> calling one of those special functions which has a more specialised
> >> version of the ExecScan code for the particular purpose it's going to
> >> be used for.
> >
> > Looks like I hadn't mentioned this key aspect of the patch in the
> > commit message, so did that in the attached.
> >
> > Vladlen, does what David wrote and the new commit message answer your
> > question(s)?
>
> Hi Amit,
>
> Yes, David clarified the idea, but it is still hard to believe in 5% of
> improvements.
> The query
> select count(*) from test_table where a_1 = 1000000;
> has both qual and projection, and ExecScanExtended() will be generated
> similar to ExecScan() (the same not NULL values to check in if()).

Yes, I've noticed that if the plan for the above query contains a
projection, like when it contains a Gather node, the inlined version
of ExecScanExtended() will look more or less the same as the full
ExecScan(). There won't be a noticeable speedup with the patch in that
case.

However, I ran the benchmark tests with Gather disabled so that I get
a plan without projection, which uses an inlined version that doesn't
have the branches related to projection. The steps I used are below.

> Do you have some scripts to reproduce your benchmark?

Use these steps.

Set max_parallel_workers_per_gather to 0 and shared_buffers to 512MB.
Compile the patch using --buildtype=release.

create table foo (a int, b int, c int, d int, e int);
insert into foo select i, i, i, i, i from generate_series(1, 10000000) i;
-- pg_prewarm: load the table into shared buffers so that the query
-- does no I/O, which reduces noise
select pg_size_pretty(pg_prewarm('foo'));
select count(*) from foo where a = 10000000;

Times I get on v17, master, and with the patch for the above query are
as follows:

v17:     173, 173, 174 ms
master:  173, 175, 169 ms
Patched: 160, 161, 158 ms

Please let me know if you're still unable to reproduce such numbers
with the steps I described.

-- 
Thanks, Amit Langote
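
P.S. In case it helps to see the idea in isolation, here is a minimal
standalone sketch of the technique David described: pick a specialised
scan function once at init time, so that the per-row loop carries no
branches for features the plan doesn't use. This is a toy model and
not the patch's code. Every name in it (ToyScan, toy_scan_extended(),
toy_scan_init(), and so on) is made up for illustration, and it
assumes the compiler inlines the helper and folds away the checks
against constant NULL arguments.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for a scan node; none of these are PostgreSQL types. */
typedef struct ToyScan ToyScan;
typedef bool (*QualFn) (int row);
typedef int (*ProjFn) (int row);
typedef long (*ScanFn) (ToyScan *scan);

struct ToyScan
{
    const int  *rows;
    size_t      nrows;
    QualFn      qual;   /* NULL when the plan has no qual */
    ProjFn      proj;   /* NULL when the plan has no projection */
    ScanFn      run;    /* chosen once by toy_scan_init() */
};

/*
 * Generic loop. When a caller passes a literal NULL for qual or proj,
 * an inlining compiler can drop the corresponding branch entirely.
 */
static inline long
toy_scan_extended(ToyScan *scan, QualFn qual, ProjFn proj)
{
    long        sum = 0;

    for (size_t i = 0; i < scan->nrows; i++)
    {
        int         row = scan->rows[i];

        if (qual != NULL && !qual(row))
            continue;
        sum += (proj != NULL) ? proj(row) : row;
    }
    return sum;
}

/* Specialised variants: each one fixes the unused pieces to NULL. */
static long
toy_scan_plain(ToyScan *scan)
{
    return toy_scan_extended(scan, NULL, NULL);
}

static long
toy_scan_with_qual(ToyScan *scan)
{
    return toy_scan_extended(scan, scan->qual, NULL);
}

static long
toy_scan_with_proj(ToyScan *scan)
{
    return toy_scan_extended(scan, NULL, scan->proj);
}

static long
toy_scan_with_qual_proj(ToyScan *scan)
{
    return toy_scan_extended(scan, scan->qual, scan->proj);
}

/* Startup-only dispatch: the if/else runs once, not once per row. */
static void
toy_scan_init(ToyScan *scan)
{
    if (scan->qual == NULL)
        scan->run = (scan->proj == NULL) ? toy_scan_plain
                                         : toy_scan_with_proj;
    else
        scan->run = (scan->proj == NULL) ? toy_scan_with_qual
                                         : toy_scan_with_qual_proj;
}

/* Sample qual and projection used only to exercise the sketch. */
static bool
toy_is_even(int row)
{
    return row % 2 == 0;
}

static int
toy_double(int row)
{
    return row * 2;
}
```

A caller would fill in rows, qual, and proj, call toy_scan_init()
once, and then call scan.run(&scan) for each execution. The extra
if/else in toy_scan_init() is paid once at startup, while a variant
such as toy_scan_with_qual() runs a loop with no projection check in
it, which corresponds to the no-projection case I benchmarked above.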