On 5 December 2011 13:23, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote: > I'm still not convinced the big gain is in inlining the comparison > functions. Your patch contains a few orthogonal tricks, too: > > * A fastpath for the case of just one scankey. No reason we can't do that > without inlining.
True, though Tom did seem to not like that idea very much. > * inlining the swap-functions within qsort. Thaẗ́'s different from inlining > the comparison functions, and could be done regardless of the data type. The > qsort element size in tuplesort.c is always sizeof(SortTuple) Sure. I think that Tom mostly objects to hard-coded intelligence about which datatypes are used, rather than specialisations per se. > And there's some additional specializations we can do, not implemented in > your patch, that don't depend on inlining the comparison functions: > > * A fastpath for the case of no NULLs > * separate comparetup functions for ascending and descending sorts (this > allows tail call elimination of the call to type-specific comparison > function in the ascending version) > * call CHECK_FOR_INTERRUPTS() less frequently. All of those had occurred to me, except the last - if you look at the definition of that macro, it looks like more trouble than it's worth to do less frequently. I didn't revise my patch with them though, because the difference that they made, while clearly measurable, seemed fairly small, or at least relatively so. I wasn't about to add additional kludge for marginal benefits given the existing controversy, though I have not dismissed the idea entirely - it could be important on some other machine. > Perhaps you can get even more gain by adding the no-NULLs, and asc/desc > special cases to your inlining patch, too, but let's squeeze all the fat out > of the non-inlined version first. As I said, those things are simple enough to test, and were not found to make a significant difference, at least to my exact test case + hardware. > One nice aspect of many of these > non-inlining optimizations is that they help with external sorts, too. I'd expect the benefits to be quite a lot smaller there, but fair point. Results from running the same test on your patch are attached. I think that while you're right to suggest that the inlining of the comparator proper, rather than any other function or other optimisation isn't playing a dominant role in these optimisations, I think that you're underestimating the role of locality of reference. To illustrate this, I've also included results for when I simply move the comparator to another translation unit, logtape.c . There's clearly a regression from doing so (I ran the numbers twice, in a clinical environment). The point is, there is a basically unknown overhead paid for not inlining - maybe inlining is not worth it, but that's a hard call to make. I didn't try to make the numbers look worse by putting the comparator proper in some other translation unit, but it may be that I'd have shown considerably worse numbers by doing so. Why the strong aversion to what I've done? I accept that it's ugly, but is it really worth worrying about that sort of regression in maintainability for something that was basically untouched since 2006, and will probably remain untouched after this work concludes, whatever happens? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
-- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers