On Tue, Feb 07, 2012 at 09:38:39PM -0500, Robert Haas wrote:
> Second, there's a concern about binary bloat: duplicating lots of code
> with different comparators inlined generates, well, a lot of machine
> code.  Of course, an 0.8% increase in the size of the resulting binary
> is very unlikely to produce any measurable performance regression on
> its own, but that's not really the issue.  The issue is rather that we
> could easily go nuts applying this technique in lots of different
> places - or even just in one place to lots of different types.  Let's
> say that we find that even for types with very complex comparators, we
> can get a 5% speedup on quick-sorting speed using this technique.
> Should we, then, apply the technique to every type we have?  Perhaps
> the binary would grow by, I don't know, 10%.  Maybe that's still not
> enough to start hurting performance (or making packagers complain),
> but surely it's getting there, and what happens when we discover that
> similar speedups are possible using the same trick in five other
> scenarios?  I think it's a bit like going out to dinner and spending
> $50.  It's nicer than eating at home, and many people can afford to do
> it occasionally, but if you do it every night, it gets expensive (and
> possibly fattening).
> So we need some principled way of deciding how much inlining is
> reasonable, because I am 100% certain this is not going to be the last
> time someone discovers that a massive exercise in inlining can yield a
> nifty performance benefit in some case or another: index builds and
> COPY have already been mentioned on this thread, and I expect that
> people will find other cases as well.  I'm not really sure what the
> "budget" is - i.e. how much binary bloat we can afford to add - or how
> many cases there are that can benefit, but the first isn't infinite
> and the second is more than the first.

Having such a metric would resolve this discussion, but formulating it will be
all but impossible.  We don't know how much time PostgreSQL installations now
spend sorting or how much additional time they would spend on cache misses,
TLB misses, page faults, etc.  That data won't show up on this thread.

You posed[1] the straw man of a sin(numeric) optimization requiring a 40 GB
lookup table.  I would not feel bad about rejecting that, because it can live
quite comfortably as an external module.  Indeed, I think the best thing we
can do to constrain long-term bloat in the "postgres" executable is to improve
pluggability.  Let a wider range of features live outside the postmaster
binary.  For example, if we had the infrastructure to let hash, GIN, GiST and
SP-GiST index access methods live in their own DSOs (like PL/pgSQL does
today), I would support doing that.  Extensibility is a hallmark of
PostgreSQL.  It's a bad day when we reject a minority-interest feature or
optimization that has no other way to exist.

Better pluggability can't ease the decision process for Peter's patch, because
its specializations essentially must live in the "postgres" executable to live
at all.  Nominally, we could let external modules inject sort specializations
for core types, and Peter could publish an all_fast_sorts extension.  That
would be useless punting: if we can't make a principled decision on whether
these accelerated sorts justify 85 KiB of binary, how will a DBA who discovers
the module make that decision?
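
For readers outside the thread, the trade-off at issue can be sketched roughly
as follows.  This is illustrative C only, not code from the patch; the names
(cmp_int32, sort_int32_specialized) are invented for the example.  A generic
sort pays an indirect function call per comparison, while a per-type
specialization writes the comparison inline so the compiler can optimize it,
at the cost of one more copy of the sort's machine code per specialized type:

```c
#include <assert.h>
#include <stdlib.h>

/* Generic path: qsort() calls the comparator through a function
 * pointer, so every comparison is an indirect call the compiler
 * cannot inline. */
static int
cmp_int32(const void *a, const void *b)
{
	int			ia = *(const int *) a;
	int			ib = *(const int *) b;

	return (ia > ib) - (ia < ib);
}

/* Specialized path: the comparison is written directly into the sort
 * loop (an insertion sort here, for brevity).  The compiler sees the
 * comparison and optimizes it, but each such specialization emits
 * another full copy of the sort - the binary bloat under discussion. */
static void
sort_int32_specialized(int *v, size_t n)
{
	for (size_t i = 1; i < n; i++)
	{
		int			key = v[i];
		size_t		j = i;

		while (j > 0 && v[j - 1] > key)		/* inlined comparison */
		{
			v[j] = v[j - 1];
			j--;
		}
		v[j] = key;
	}
}
```

Both paths sort identically; the difference is purely in generated code size
and per-comparison overhead, which is why the benefit has to be weighed per
type rather than assumed.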

This patch has gotten more than its fair share of attention for bloat, and I
think that's mostly because there's a dial-a-bloat-level knob sticking out and
begging to be frobbed.  On my system, fastpath_sort_2012_01_19.patch adds 85
KiB to the postgres binary.  A recent commit added 57 KiB, and bloat never
came up in the ensuing discussion.  I appreciate your concern over a slippery slope of
inlining proposals, but I also don't wish to see the day when every patch
needs to declare and justify its binary byte count.  Unless, of course, we
discover that elusive metric for evaluating it fairly.

If we'd like to start taking interest in binary bloat, how about having the
buildfarm log capture an "ls -l" on the bin directory?  We'll then have data
to mine if we ever wish to really take action.

All that being said, I'd want to see a 15-30% (depending on how contrived)
improvement to a microbenchmark or a 5% improvement to a generic benchmark
(pgbench, DBT-<N>) before adopting an optimization of this complexity.  Based
on your measurements[2], per-type inlining gives us a 7% microbenchmark
improvement.  I'd therefore excise the per-type inlining.  (For the record,
given a 20% improvement to the same benchmark, I'd vote yea on the submission
and perhaps even suggest int2 and uuid support.)



Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)