On Tue, 2023-01-24 at 21:42 -0500, Robert Haas wrote:
> I find it a bit premature to include this comment in the very first
> email.... what if other people don't like the idea?

The trust_strxfrm GUC was pulled from the larger collation refactoring
patch, which has been out for a while. The sort_abbreviated_keys GUC is
new, and I posted both in a new thread because they started to look
independently useful. If someone doesn't like the idea, they are free
to comment, as in every other case (though this patch doesn't seem very
controversial to me?). I suppose the wording was off-putting, so I'll
choose different words next time.

> I would like to hear about the cases where abbreviated keys resulted
> in a regression.

I want to be clear that this is not a general criticism of the
abbreviated keys optimization, nor a comprehensive analysis of its
performance. I am highlighting this case because the existence of even
a single non-contrived case of regression suggests that we may want to
explore further and tweak the heuristics. That's quite natural when the
heuristics depend on something as complex as a collation provider. The
sort_abbreviated_keys GUC makes that kind of exploration and tweaking a
lot easier.

Built with meson on Linux, gcc 11.3.0, -O3. Times are the median of
three runs, taken from the sort operator's "first returned tuple" time
in EXPLAIN ANALYZE. Total runtime (as reported by EXPLAIN ANALYZE)
tells much the same story, but I think there was slightly more noise in
that number.

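Since the benchmark below flips sort_abbreviated_keys, a rough picture of what that setting toggles may help. What follows is a simplified C sketch of the general abbreviated-keys technique, not PostgreSQL's tuplesort or varlena code. It assumes strxfrm() as a stand-in for the collation provider's sort-key function (for ICU, PostgreSQL calls into ICU instead), and the names use_abbreviated_keys, SortItem, make_abbrev, cmp_abbrev, sort_items and full_compare_fallbacks are hypothetical. Each string pays for a sort-key transformation up front; most comparisons then become a cheap integer compare, and only ties on the 8-byte prefix fall back to strcoll(). Whether that trade pays off depends on how expensive the transformation is relative to the comparisons it saves, which is the provider-dependent heuristic question raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the sort_abbreviated_keys GUC. */
static bool use_abbreviated_keys = true;

/* Counts comparisons the abbreviated keys could not decide on their own. */
static size_t full_compare_fallbacks = 0;

typedef struct {
    const char *str;  /* original string, used for the authoritative compare */
    uint64_t abbrev;  /* first 8 bytes of the strxfrm() output, packed */
} SortItem;

/* Build an abbreviated key from the first 8 bytes of the strxfrm() blob
 * (an assumed stand-in for the provider's sort-key function), zero-padded
 * and packed big-endian so unsigned integer order matches byte order. */
static uint64_t make_abbrev(const char *s)
{
    size_t need = strxfrm(NULL, s, 0);  /* length of the full blob */
    char *buf = malloc(need + 1);
    if (buf == NULL)
        abort();
    size_t written = strxfrm(buf, s, need + 1);
    assert(written == need);
    (void) written;

    uint64_t key = 0;
    for (size_t i = 0; i < 8; i++) {
        unsigned char byte = i < need ? (unsigned char) buf[i] : 0;
        key = (key << 8) | byte;
    }
    free(buf);
    return key;
}

/* Plain comparator: one strcoll() call per comparison. */
static int cmp_full(const void *a, const void *b)
{
    const SortItem *x = a, *y = b;
    return strcoll(x->str, y->str);
}

/* Abbreviated comparator: integer compare first, strcoll() only on ties. */
static int cmp_abbrev(const void *a, const void *b)
{
    const SortItem *x = a, *y = b;
    if (x->abbrev != y->abbrev)
        return x->abbrev < y->abbrev ? -1 : 1;
    full_compare_fallbacks++;
    return strcoll(x->str, y->str);
}

/* The flag is tested once per sort, during setup, not once per comparison. */
static void sort_items(SortItem *items, size_t n)
{
    int (*cmp)(const void *, const void *) = cmp_full;

    if (use_abbreviated_keys) {
        for (size_t i = 0; i < n; i++)
            items[i].abbrev = make_abbrev(items[i].str);
        cmp = cmp_abbrev;
    }
    qsort(items, n, sizeof *items, cmp);
}
```

When many strings share a long common prefix (for example "prefix__aaa" and "prefix__zzz" in the C locale), every comparison between them takes the fallback path, so the up-front transformation is pure overhead. The real implementation also has heuristics for abandoning abbreviation partway through a sort, which this sketch omits.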
$ perl text_generator.pl 10000000 10 > /tmp/strings.txt

CREATE TABLE s (t TEXT);
COPY s FROM '/tmp/strings.txt';
VACUUM FREEZE s;
CHECKPOINT;
SET work_mem='10GB';
SET max_parallel_workers = 0;
SET max_parallel_workers_per_gather = 0;

SET sort_abbreviated_keys = false;
EXPLAIN ANALYZE SELECT t FROM s ORDER BY t COLLATE "en-US-x-icu";
-- 20875ms

SET sort_abbreviated_keys = true;
EXPLAIN ANALYZE SELECT t FROM s ORDER BY t COLLATE "en-US-x-icu";
-- 22931ms

Regression from the abbreviated keys optimization in this case: 9.8%
(22931ms vs. 20875ms).

> I'd also like to know whether there's a realistic possibility that
> making this a run-time test could itself result in a regression.

The sort_abbreviated_keys branch happens after tuplesort_begin_common
(which creates memory contexts, etc.) and before preparing the sort
keys (which involves catalog lookups). The trust_strxfrm branch happens
in the type-specific sort support function, which must be looked up in
the catalog before being called (using the V1 calling convention). In
both cases the branch is taken once per sort, during setup, not once
per comparison. It seems unlikely that a single branch in that path
would have a measurable performance impact. Do you have a more specific
concern?

-- 
Jeff Davis
PostgreSQL Contributor Team - AWS

text_generator.pl
Description: Perl program