Re: [PATCH] minor optimization for ineq_histogram_selectivity()
On 11/23/22 16:59, Tom Lane wrote: =?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Yhuel?= writes: On 10/24/22 17:26, Frédéric Yhuel wrote: When studying the weird planner issue reported here [1], I came up with the attached patch. It reduces the probability of calling get_actual_variable_range(). This isn't very useful anymore thanks to this patch: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9c6ad5eaa957bdc2132b900a96e0d2ec9264d39c I hadn't looked at this patch before, but now that I have, I'm inclined to reject it anyway. It just moves the problem around: now, instead of possibly doing an unnecessary index probe at the right end, you're possibly doing an unnecessary index probe at the left end. Indeed... it seemed to me that both versions would do an unnecessary index probe at the left end, but I wasn't careful enough :-/ It also looks quite weird compared to the normal coding of binary search. That's right. I wonder if there'd be something to be said for leaving the initial probe calculation alone and doing this: else if (probe == sslot.nvalues - 1 && sslot.nvalues > 2) + { + /* Don't probe the endpoint until we have to. */ + if (probe > lobound) + probe--; + else have_end = get_actual_variable_range(root, vardata, sslot.staop, collation, NULL, &sslot.values[probe]); + } On the whole though, it seems like a wart. Yeah... it's probably wiser not risking introducing a bug, only to save an index probe in rare cases (and only 100 reads, thanks to 9c6ad5ea). Thank you for having had a look at it. Best regards, Frédéric
Re: [PATCH] minor optimization for ineq_histogram_selectivity()
=?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Yhuel?= writes: > On 10/24/22 17:26, Frédéric Yhuel wrote: >> When studying the weird planner issue reported here [1], I came up with >> the attached patch. It reduces the probability of calling >> get_actual_variable_range(). > This isn't very useful anymore thanks to this patch: > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9c6ad5eaa957bdc2132b900a96e0d2ec9264d39c I hadn't looked at this patch before, but now that I have, I'm inclined to reject it anyway. It just moves the problem around: now, instead of possibly doing an unnecessary index probe at the right end, you're possibly doing an unnecessary index probe at the left end. It also looks quite weird compared to the normal coding of binary search. I wonder if there'd be something to be said for leaving the initial probe calculation alone and doing this: else if (probe == sslot.nvalues - 1 && sslot.nvalues > 2) + { + /* Don't probe the endpoint until we have to. */ + if (probe > lobound) + probe--; + else have_end = get_actual_variable_range(root, vardata, sslot.staop, collation, NULL, &sslot.values[probe]); + } On the whole though, it seems like a wart. regards, tom lane
Re: [PATCH] minor optimization for ineq_histogram_selectivity()
On 10/24/22 17:26, Frédéric Yhuel wrote: Hello, When studying the weird planner issue reported here [1], I came up with the attached patch. It reduces the probability of calling get_actual_variable_range(). This isn't very useful anymore thanks to this patch: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9c6ad5eaa957bdc2132b900a96e0d2ec9264d39c Unless we want to save a hundred page reads in rare cases.
Re: [PATCH] minor optimization for ineq_histogram_selectivity()
On 10/24/22 17:26, Frédéric Yhuel wrote: Hello, When studying the weird planner issue reported here [1], I came up with the attached patch. It reduces the probability of calling get_actual_variable_range(). The patch applies to the master branch. How to test : CREATE TABLE foo (a bigint, b TEXT) WITH (autovacuum_enabled = off); INSERT INTO foo SELECT i%213, md5(i::text) from generate_series(1,100) i; VACUUM ANALYZE foo; SELECT * FROM pg_stats WHERE tablename = 'foo' AND attname='a'\gx CREATE INDEX ON foo(a); DELETE FROM foo WHERE a = 212; EXPLAIN (BUFFERS) SELECT count(a) FROM foo WHERE a > 208; With the above example, the variables "lobound", "hibound", and "probe" would vary like this : without patch : lobound hibound probe --- 0 101 50 51 101 76 77 101 89 90 101 95 96 101 98 99 101 100 99 100 99 99 99 with patch : lobound hibound probe --- 0 101 50 51 101 75 76 101 88 89 101 94 95 101 97 98 101 99 98 99 98 99 99 So we find the correct right end of the histogram bin (99) in both cases, but "probe" doesn't reach 100 in the latter one, and get_actual_variable_range() is never called. Now, if we'd run the query SELECT count(a) FROM foo WHERE a > 211 : without patch : lobound hibound probe --- 0 101 50 51 101 76 77 101 89 90 101 95 96 101 98 99 101 100 99 100 99 100 100 with patch : lobound hibound probe --- 0 101 50 51 101 75 76 101 88 89 101 94 95 101 97 98 101 99 100 101 100 100 100 Here, the correct right end of the histogram bin (100) is also found is both cases. I'm well aware that an example doesn't prove the correctness of an algorithm, though. Best regards, Frédéric