Re: [HACKERS] Inlining comparators as a performance optimisation

Pierre C Fri, 13 Jan 2012 01:49:40 -0800

On Wed, 21 Sep 2011 18:13:07 +0200, Tom Lane <t...@sss.pgh.pa.us> wrote:

Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> writes:

On 21.09.2011 18:46, Tom Lane wrote:

The idea that I was toying with was to allow the regular SQL-callable
comparison function to somehow return a function pointer to the
alternate comparison function,

You could have a new function with a pg_proc entry, that just returns a
function pointer to the qsort-callback.


Yeah, possibly.  That would be a much more invasive change, but cleaner
in some sense.  I'm not really prepared to do all the legwork involved
in that just to get to a performance-testable patch though.

A few years ago I had looked for a way to speed up COPY operations, and it turned out that COPY TO has a good optimization opportunity. At that time, for each datum, COPY TO would :


- test for nullness
- call an outfunc through fmgr
- outfunc palloc()s a bytea or text, fills it with data, and returns it (sometimes it uses an extensible string buffer which may be repalloc()'d several times)
- COPY memcpy()s returned data to a buffer and eventually flushes the buffer to client socket.

I introduced a special write buffer with an on-flush callback (ie, a close relative of the existing string-buffer), in this case the callback was "flush to client socket", and several outfuncs (one per type) which took that buffer as argument, besides the datum to output, and simply put the datum inside the buffer, with appropriate transformations (like converting to bytea or text), and flushed if needed.
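
A minimal sketch of what such a buffer could look like. All names here (WriteBuffer, wb_reserve, wb_write, Sink, flush_to_sink) are hypothetical and are not taken from the patch or from PostgreSQL; the "socket" is replaced by an in-memory sink so the example stands alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical write buffer; not PostgreSQL's StringInfo. */
typedef struct WriteBuffer WriteBuffer;
typedef void (*FlushCallback)(WriteBuffer *buf);

struct WriteBuffer
{
    char         *data;      /* backing storage */
    size_t        len;       /* bytes currently used */
    size_t        capacity;  /* total bytes available */
    FlushCallback on_full;   /* called when there is not enough room */
    void         *arg;       /* callback state, e.g. the destination */
};

/* Make sure `need` bytes fit, calling the on-full callback if they do not. */
static inline void
wb_reserve(WriteBuffer *buf, size_t need)
{
    if (buf->capacity - buf->len < need)
        buf->on_full(buf);
    /* The callback must have made room (by flushing or by growing). */
    assert(buf->capacity - buf->len >= need);
}

/* Append n bytes; small and inlinable so constant sizes can be folded. */
static inline void
wb_write(WriteBuffer *buf, const void *src, size_t n)
{
    wb_reserve(buf, n);
    memcpy(buf->data + buf->len, src, n);
    buf->len += n;
}

/* Stand-in for the client socket: an in-memory sink. */
typedef struct Sink
{
    char   bytes[256];
    size_t len;
} Sink;

/* Example callback: move the buffered bytes to the sink and reset. */
static void
flush_to_sink(WriteBuffer *buf)
{
    Sink *sink = buf->arg;

    assert(sink->len + buf->len <= sizeof(sink->bytes));
    memcpy(sink->bytes + sink->len, buf->data, buf->len);
    sink->len += buf->len;
    buf->len = 0;
}
```

The point of the callback is that the same write functions can serve different destinations: only on_full changes.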


Then the COPY TO BINARY of a constant-size datum would turn to :
- one test for nullness
- one C function call
- one test to ensure appropriate space available in buffer (flush if needed)
- one htonl() and memcpy of constant size, which the compiler turns into a couple of simple instructions
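
The steps above can be sketched roughly as follows for a 4-byte integer. This is an illustration only: OutBuf, int4_send_direct, copy_int4_column and reset_buffer are made-up names, the flush callback just discards data, and the per-field framing that the real COPY BINARY format adds is left out.

```c
#include <arpa/inet.h>   /* htonl(); POSIX */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size output buffer with a flush callback. */
typedef struct OutBuf OutBuf;
struct OutBuf
{
    char    data[8192];
    size_t  len;
    void  (*flush)(OutBuf *buf);
};

/* Hypothetical C-callable outfunc: no fmgr call, no palloc. */
static inline void
int4_send_direct(OutBuf *buf, int32_t value)
{
    uint32_t net = htonl((uint32_t) value);   /* network byte order */

    assert(buf->len <= sizeof(buf->data));
    if (sizeof(buf->data) - buf->len < sizeof(net))
        buf->flush(buf);                      /* make room if needed */

    /* Size is a compile-time constant, so this can become a plain store. */
    memcpy(buf->data + buf->len, &net, sizeof(net));
    buf->len += sizeof(net);
}

/* Caller side: one null test and one C call per datum. */
static void
copy_int4_column(OutBuf *buf, const int32_t *values,
                 const bool *isnull, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        if (isnull[i])
            continue;   /* a real COPY would emit a NULL marker here */
        int4_send_direct(buf, values[i]);
    }
}

/* Placeholder flush callback: drops the data instead of sending it. */
static void
reset_buffer(OutBuf *buf)
{
    buf->len = 0;
}
```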

I recall measuring speedups of 2x - 8x on COPY BINARY, less for text, but still large gains.

Although eliminating fmgr call and palloc overhead was an important part of it, another large part was getting rid of memcpy()'s which the compiler turned into simple movs for known-size types, a transformation that can be done only if the buffer write functions are inlined inside the outfuncs. Compilers love constants...

Additionally, code size growth was minimal since I moved the old outfuncs code into the new outfuncs, and replaced the old fmgr-callable outfuncs with "create buffer with on-full callback=extend_and_repalloc() - pass to new outfunc(buffer,datum) - return buffer". Which is basically equivalent to the previous palloc()-based code, maybe with a few extra instructions.
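
To illustrate that wrapper idea, here is a rough standalone sketch. It is not the patch: Buf, extend_and_realloc, int4_out_buf and int4_out_legacy are hypothetical names, and realloc() stands in for repalloc() so the example compiles outside the backend.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer whose on-full callback decides what "full" means. */
typedef struct Buf Buf;
struct Buf
{
    char   *data;
    size_t  len;
    size_t  cap;
    void  (*on_full)(Buf *buf, size_t need);
};

/* On-full callback for the wrapper: grow the buffer instead of flushing. */
static void
extend_and_realloc(Buf *buf, size_t need)
{
    size_t newcap = buf->cap ? buf->cap : 16;

    while (newcap - buf->len < need)
        newcap *= 2;
    buf->data = realloc(buf->data, newcap);
    assert(buf->data != NULL);
    buf->cap = newcap;
}

/* New-style outfunc: holds the real code and writes into any buffer. */
static void
int4_out_buf(Buf *buf, int value)
{
    char tmp[24];
    int  n = snprintf(tmp, sizeof(tmp), "%d", value);

    assert(n > 0 && (size_t) n < sizeof(tmp));
    if (buf->cap - buf->len < (size_t) n)
        buf->on_full(buf, (size_t) n);
    memcpy(buf->data + buf->len, tmp, (size_t) n);
    buf->len += (size_t) n;
}

/* Old-style entry point: create buffer, call new outfunc, return buffer. */
static char *
int4_out_legacy(int value)
{
    Buf buf = { NULL, 0, 0, extend_and_realloc };

    int4_out_buf(&buf, value);
    if (buf.cap - buf.len < 1)
        buf.on_full(&buf, 1);
    buf.data[buf.len] = '\0';   /* caller gets a NUL-terminated string */
    return buf.data;            /* caller frees it */
}
```

Only the thin wrapper is duplicated; the formatting code exists once, in the buffer-based function.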

When I submitted the patch for review, Tom rightfully pointed out that my way of obtaining the C function pointer sucked very badly (I don't remember how I did it, only that it was butt-ugly) but the idea was to get a quick measurement of what could be gained, and the result was positive. Unfortunately I had no time available to finish it and make it into a real patch, I'm sorry about that.

So why do I post in this sorting topic ? It seems, by bypassing fmgr for functions which are small, simple, and called lots of times, there is a large gain to be made, not only because of fmgr overhead but also because of the opportunity for new compiler optimizations, palloc removal, etc. However, in my experiment the arguments and return types of the new functions were DIFFERENT from the old functions : the new ones do the same thing, but in a different manner. One manner was suited to SQL-callable functions (ie, palloc and return a bytea) and another one to writing large amounts of data (direct buffer write). Since both have very different requirements, being fast at both is impossible for the same function.


Anyway, all that rant boils down to :

Some functions could benefit from having two versions (while sharing almost all the code between them) :

- User-callable (fmgr) version (current one)
- C-callable version, usually with different parameters and return type

And it would be cool to have a way to grab a bare function pointer on the second one.

Maybe an extra column in pg_proc would do (but then, the proargtypes and friends would describe only the SQL-callable version) ?

Or an extra table ? pg_cproc ?
Or an in-memory hash : hashtable[ fmgr-callable function ] => C version
- What happens if a C function has no SQL-callable equivalent ?

Or (ugly) introduce an extra per-type function type_get_function_ptr(function_kind) which returns the requested function pointer
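
As a rough illustration of the lookup idea (whichever storage is chosen), here is a sketch of a registry mapping an fmgr-callable function to a bare C function pointer. Everything in it is an assumption made for the example: the Oid typedef is a stand-in, HYPOTHETICAL_INT4_CMP_OID is a made-up value, FunctionKind, CProcEntry and lookup_c_version do not exist in PostgreSQL, and a linear scan replaces the hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned int Oid;            /* stand-in for PostgreSQL's Oid */
typedef void (*GenericFn)(void);     /* untyped pointer; caller casts back */

/* Hypothetical kinds of C-callable alternates. */
typedef enum
{
    FN_KIND_COMPARE,      /* qsort-style comparator */
    FN_KIND_BINARY_OUT    /* direct-to-buffer output function */
} FunctionKind;

typedef struct
{
    Oid          fmgr_oid;   /* the SQL-callable function */
    FunctionKind kind;       /* which alternate this is */
    GenericFn    fn;         /* the bare C function */
} CProcEntry;

typedef int (*CompareFn)(const void *a, const void *b);

#define HYPOTHETICAL_INT4_CMP_OID 9001   /* made-up value, not a real OID */

/* C-callable comparator with a qsort-compatible signature. */
static int
int4_cmp_raw(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

static const CProcEntry cproc_table[] = {
    { HYPOTHETICAL_INT4_CMP_OID, FN_KIND_COMPARE, (GenericFn) int4_cmp_raw },
};

/* Returns NULL when there is no C version; caller then falls back to fmgr. */
static GenericFn
lookup_c_version(Oid fmgr_oid, FunctionKind kind)
{
    size_t n = sizeof(cproc_table) / sizeof(cproc_table[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (cproc_table[i].fmgr_oid == fmgr_oid && cproc_table[i].kind == kind)
            return cproc_table[i].fn;
    }
    return NULL;
}
```

A caller that knows the kind it asked for casts the result back, e.g. to CompareFn, and can hand it straight to qsort(). The function kind is part of the key because, as said above, the C version usually has different parameters and return type.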


If one of those happens, I'll dust off my old copy-optimization patch ;)

Hmm... just my 2c

Regards
Pierre
