2016-12-20 9:11 GMT+01:00 Andres Freund <and...@anarazel.de>: > On 2016-12-19 15:25:42 -0500, Robert Haas wrote: > > On Mon, Dec 19, 2016 at 3:13 PM, Peter Eisentraut > > <peter.eisentr...@2ndquadrant.com> wrote: > > > On 12/9/16 7:52 AM, Robert Haas wrote: > > >> It's kind of ironic, at least IMHO, that the DirectionFunctionCall > > >> macros are anything but direct. Each one is a non-inlined function > > >> call that does a minimum of 8 variable assignments before actually > > >> calling the function. > > > > > > If this is a problem (it might be), then we can just make those calls, > > > er, direct C function calls to different function. For example, > > > > > > result = DatumGetObjectId(DirectFunctionCall1(oidin, > > > CStringGetDatum(pro_name_or_oid))); > > > > > > could just be > > > > > > result = oidin_internal(pro_name_or_oid); > > > > Yeah, true -- that works for simple cases, and might be beneficial > > where you're doing lots and lots of calls in a tight loop. > > > > For the more general problem, I wonder if we should try to figure out > > something where we have one calling convention for "simple" functions > > (those that little or none of the information in fcinfo and can pretty > > much just take their SQL args as C args) and another for "complex" > > functions (where we do the full push-ups). > > Unfortunately I think so would likely also increase overhead in a bunch > of places, because suddenly we need branches determining which callling > convention is in use. There's a fair amount of places, some of them > performance sensitive, that deal with functions that can get different > number of arguments. Prominently nodeAgg.c's transition functions for > example. > > When JITing expressions that doesn't matter, because we avoid doing so > repetitively. But that's not really sufficient imo, non JITed > performance matters a lot too. > > There's imo primarily two parts that make our calling convention > expensive: > a) Additional instructions required to push to/from fcinfo->arg[null], > including stack spilling required to have space for the arguments. > b) Increased number of cachelines touched, even for even trivial > functions. We need to read/write to at least five: > 1) fcinfo->isnull to reset it > 2) fcinfo->arg[0...] to set the argument > 3) fcinfo->argnull[0...] to set argnull (separate cacheline) > 4) fcinfo->flinfo->fn_addr to get the actual address to call > 5) instruction cache miss for the function's contents > > We can doctor around this some. A retively easy way to reduce the > overhead of a similar function call interface would be to restructure > things so we have something like: > > typedef struct FunctionArg2 > { > Datum arg; > bool argnull; > } FunctionArg2; > > typedef struct FunctionCallInfoData2 > { > /* cached function address from ->flinfo */ > PGFunction fn_addr; > /* ptr to lookup info used for this call */ > FmgrInfo *flinfo; > /* function must set true if result is NULL */ > bool isnull; > /* # arguments actually passed */ > short nargs; > /* collation for function to use */ > Oid fncollation; /* collation for function to use */ > /* pass or return extra info about result */ > fmNodePtr resultinfo; > /* array big enough to fit flinfo->fn_nargs */ > FunctionArg2 args[FLEXIBLE_ARRAY_MEMBER]; > } FunctionCallInfoData2; > > That'd mean some code changes because accessing arguments can't be done > quite as now, but it'd be fairly minor. We'd end up with a lot fewer > cache misses for common cases, because there's no need to fetch fn_addr, > and because the first two arg/argnull pairs still fit within the first > cacheline (which they never can in the current interface!). > > But we'd still be passing arguments via memory, instead of using > registers. > > I think a more efficient variant would make the function signature look > something like: > > Datum /* directly returned argument */ > pgfunc( > /* extra information about function call */ > FunctionCallInfo *fcinfo, > /* bitmap of NULL arguments */ > uint64_t nulls, > /* first argument */ > Datum arg0, > /* second argument */ > Datum arg1, > /* returned NULL */ > bool *isnull > ); > > with additional arguments passed via fcinfo as currently done. Since > most of the performance critical function calls only have two arguments > that should allow to keep usage of fcinfo-> to the cases where we need > the extra information. On 64bit linuxes the above will be entirely in > registers, on 64bit windows isnull would be placed on the stack. > > I don't quite see how we could avoid stack usage at all for windows, any > of the above arguments seems important. > > There'd be a need for some trickery to make PG_GETARG_DATUM() work > efficiently - afaics the argument numbers passed to PG_GETARG_*() are > always constants, so that shouldn't be too bad. >
In this case some benchmark can be very important (and interesting). I am not sure if faster function execution has significant benefit on Vulcano like executor. Regards Pavel > > Regards, > > Andres > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >