On Thu, Feb 9, 2012 at 7:24 AM, Noah Misch <n...@leadboat.com> wrote:
> On Tue, Feb 07, 2012 at 09:38:39PM -0500, Robert Haas wrote:
>> Second, there's a concern about binary bloat: duplicating lots of code
>> with different comparators inlined generates, well, a lot of machine
>> code.  Of course, an 0.8% increase in the size of the resulting binary
>> is very unlikely to produce any measurable performance regression on
>> its own, but that's not really the issue.  The issue is rather that we
>> could easily go nuts applying this technique in lots of different
>> places - or even just in one place to lots of different types.  Let's
>> say that we find that even for types with very complex comparators, we
>> can get a 5% speedup on quick-sorting speed using this technique.
>> Should we, then, apply the technique to every type we have?  Perhaps
>> the binary would grow by, I don't know, 10%.  Maybe that's still not
>> enough to start hurting performance (or making packagers complain),
>> but surely it's getting there, and what happens when we discover that
>> similar speedups are possible using the same trick in five other
>> scenarios?  I think it's a bit like going out to dinner and spending
>> $50.  It's nicer than eating at home, and many people can afford to do
>> it occasionally, but if you do it every night, it gets expensive (and
>> possibly fattening).
>>
>> So we need some principled way of deciding how much inlining is
>> reasonable, because I am 100% certain this is not going to be the last
>> time someone discovers that a massive exercise in inlining can yield a
>> nifty performance benefit in some case or another: index builds and
>> COPY have already been mentioned on this thread, and I expect that
>> people will find other cases as well.  I'm not really sure what the
>> "budget" is - i.e. how much binary bloat we can afford to add - or how
>> many cases there are that can benefit, but the first isn't infinite
>> and the second is more than the first.
>
> Having such a metric would resolve this discussion, but formulating it will be
> all but impossible.  We don't know how much time PostgreSQL installations now
> spend sorting or how much additional time they would spend on cache misses,
> TLB misses, page faults, etc.  That data won't show up on this thread.
>
> You posed[1] the straw man of a sin(numeric) optimization requiring a 40GB
> lookup table.  I would not feel bad about rejecting that, because it can live
> quite comfortably as an external module.  Indeed, I think the best thing we
> can do to constrain long-term bloat in the "postgres" executable is to improve
> pluggability.  Let a wider range of features live outside the postmaster
> binary.  For example, if we had the infrastructure to let hash, GIN, GiST and
> SP-GiST index access methods live in their own DSOs (like PL/pgSQL does
> today), I would support doing that.  Extensibility is a hallmark of
> PostgreSQL.  It's a bad day when we reject a minority-interest feature or
> optimization that has no other way to exist.
>
> Better pluggability can't ease the decision process for Peter's patch, because
> its specializations essentially must live in the "postgres" executable to live
> at all.  Nominally, we could let external modules inject sort specializations
> for core types, and Peter could publish an all_fast_sorts extension.  That
> would be useless punting: if we can't make a principled decision on whether
> these accelerated sorts justify 85 KiB of binary, how will a DBA who discovers
> the module make that decision?
>
> This patch has gotten more than its fair share of attention for bloat, and I
> think that's mostly because there's a dial-a-bloat-level knob sticking out and
> begging to be frobbed.  On my system, fastpath_sort_2012_01_19.patch adds 85
> KiB to the postgres binary.  A recent commit added 57 KiB and bloat never came
> up in discussions thereof.  I appreciate your concern over a slippery slope of
> inlining proposals, but I also don't wish to see the day when every patch
> needs to declare and justify its binary byte count.  Unless, of course, we
> discover that elusive metric for evaluating it fairly.
>
> If we'd like to start taking interest in binary bloat, how about having the
> buildfarm log capture an "ls -l" on the bin directory?  We'll then have data
> to mine if we ever wish to really take action.
>
> All that being said, I'd want to see a 15-30% (depending on how contrived)
> improvement to a microbenchmark or a 5% improvement to a generic benchmark
> (pgbench, DBT-<N>) before adopting an optimization of this complexity.  Based
> on your measurements[2], per-type inlining gives us a 7% microbenchmark
> improvement.  I'd therefore excise the per-type inlining.  (For the record,
> given a 20% improvement to the same benchmark, I'd vote yea on the submission
> and perhaps even suggest int2 and uuid support.)

That's all well-said and I mostly agree with all of it, including your
suggested percentages in the last paragraph (but does anyone really
sort int2s or uuids?).  What I think is different about inlining as
compared to writing code is that it's possible to generate an amount
of binary bloat that is disproportionate to the amount of code you
actually wrote.  You can blow up your binary more or less on
auto-pilot, just by adding more macro invocations.  Of course, the
overall impact is the same, but when you have to actually write the
code by hand, you're more likely to notice that you're cranking out
hundreds of lines of code of dubious real-world value.

I think the other thing to keep in mind is that, inevitably, people
decide what considerations to worry about with reference to a
particular patch based on the perceived value of the feature.  I
thought about the fact that Tom's parameterized-path patch adds a lot
of new code and to be honest I'm not totally sanguine about it, but on
the other hand it has the potential to make certain kinds of queries
that were seriously slow in previous releases run hundreds or
thousands of times faster, and maybe open the door to further planner
optimizations in the future.  When you start talking about multiples
with lots of zeros on the end, binary bloat fades out of the picture:
you COULD assess it, but we already pretty much know what the outcome
of that discussion will be.  Similarly, when you hear Tom complaining
about the cost of adding a new grammar keyword, that's a tip-off that
he either thinks that no effort whatsoever was made to find a syntax
that will work with the existing keywords, or he doesn't think the
feature is worth very much, because if you look at git log -p
--author=Lane src/include/parser/kwlist.h you'll find he's added them
on various occasions with nary a word spoken.  What we need to be wary
of, in my opinion anyway, is not adding binary bloat or keywords per
se, but rather of adding them for trivial gains.  And I do consider a
6% gain on a highly artificial microbenchmark to be trivial.  If we
were talking about a 10x speedup, I doubt the word "bloat" would be
anywhere on this thread, and it certainly wouldn't be getting the
prominence that it has here.

I'm also more than slightly concerned that we are losing sight of the
forest for the trees.  I have heard reports that sorting large amounts
of data is many TIMES slower for us than for a certain other database
product.  I frankly wonder if this entire patch is a distraction.
When inlining is the tool of choice for further performance
improvement, it suggests that we're pretty much out of ideas, and that
whatever we get out of inlining - be it 6% or 30% - will be the end.
I don't like that idea very much.

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)