On Tue, Feb 07, 2012 at 09:38:39PM -0500, Robert Haas wrote:
> Second, there's a concern about binary bloat: duplicating lots of code
> with different comparators inlined generates, well, a lot of machine
> code. Of course, an 0.8% increase in the size of the resulting binary
> is very unlikely to produce any measurable performance regression on
> its own, but that's not really the issue. The issue is rather that we
> could easily go nuts applying this technique in lots of different
> places - or even just in one place to lots of different types. Let's
> say that we find that even for types with very complex comparators, we
> can get a 5% speedup on quick-sorting speed using this technique.
> Should we, then, apply the technique to every type we have? Perhaps
> the binary would grow by, I don't know, 10%. Maybe that's still not
> enough to start hurting performance (or making packagers complain),
> but surely it's getting there, and what happens when we discover that
> similar speedups are possible using the same trick in five other
> scenarios? I think it's a bit like going out to dinner and spending
> $50. It's nicer than eating at home, and many people can afford to do
> it occasionally, but if you do it every night, it gets expensive (and
> possibly fattening).
>
> So we need some principled way of deciding how much inlining is
> reasonable, because I am 100% certain this is not going to be the last
> time someone discovers that a massive exercise in inlining can yield a
> nifty performance benefit in some case or another: index builds and
> COPY have already been mentioned on this thread, and I expect that
> people will find other cases as well. I'm not really sure what the
> "budget" is - i.e. how much binary bloat we can afford to add - or how
> many cases there are that can benefit, but the first isn't infinite
> and the second is more than the first.
Having such a metric would resolve this discussion, but formulating it will be all but impossible. We don't know how much time PostgreSQL installations now spend sorting or how much additional time they would spend on cache misses, TLB misses, page faults, etc. That data won't show up on this thread.

You posed the straw man of a sin(numeric) optimization requiring a 40GB lookup table. I would not feel bad about rejecting that, because it can live quite comfortably as an external module. Indeed, I think the best thing we can do to constrain long-term bloat in the "postgres" executable is to improve pluggability. Let a wider range of features live outside the postmaster binary. For example, if we had the infrastructure to let hash, GIN, GiST and SP-GiST index access methods live in their own DSOs (like PL/pgSQL does today), I would support doing that. Extensibility is a hallmark of PostgreSQL. It's a bad day when we reject a minority-interest feature or optimization that has no other way to exist.

Better pluggability can't ease the decision process for Peter's patch, because its specializations essentially must live in the "postgres" executable to live at all. Nominally, we could let external modules inject sort specializations for core types, and Peter could publish an all_fast_sorts extension. That would be useless punting: if we can't make a principled decision on whether these accelerated sorts justify 85 KiB of binary, how will a DBA who discovers the module make that decision?

This patch has gotten more than its fair share of attention for bloat, and I think that's mostly because there's a dial-a-bloat-level knob sticking out and begging to be frobbed. On my system, fastpath_sort_2012_01_19.patch adds 85 KiB to the postgres binary. A recent commit added 57 KiB, and bloat never came up in discussions thereof.
I appreciate your concern over a slippery slope of inlining proposals, but I also don't wish to see the day when every patch needs to declare and justify its binary byte count. Unless, of course, we discover that elusive metric for evaluating it fairly. If we'd like to start taking an interest in binary bloat, how about having the buildfarm log capture an "ls -l" on the bin directory? We'll then have data to mine if we ever wish to really take action.

All that being said, I'd want to see a 15-30% (depending on how contrived) improvement to a microbenchmark or a 5% improvement to a generic benchmark (pgbench, DBT-<N>) before adopting an optimization of this complexity. Based on your measurements, per-type inlining gives us a 7% microbenchmark improvement. I'd therefore excise the per-type inlining. (For the record, given a 20% improvement to the same benchmark, I'd vote yea on the submission and perhaps even suggest int2 and uuid support.)

nm

http://archives.postgresql.org/message-id/ca+tgmozo1xsz+yiqz2mrokmcmqtb+jir0lz43cne6de7--q...@mail.gmail.com
http://archives.postgresql.org/message-id/ca+tgmobfz8sqtgtaaryerywuzfodgetprbuygbhsplr4+li...@mail.gmail.com

--
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers