On 2016-12-19 15:25:42 -0500, Robert Haas wrote: > On Mon, Dec 19, 2016 at 3:13 PM, Peter Eisentraut > <peter.eisentr...@2ndquadrant.com> wrote: > > On 12/9/16 7:52 AM, Robert Haas wrote: > >> It's kind of ironic, at least IMHO, that the DirectionFunctionCall > >> macros are anything but direct. Each one is a non-inlined function > >> call that does a minimum of 8 variable assignments before actually > >> calling the function. > > > > If this is a problem (it might be), then we can just make those calls, > > er, direct C function calls to different function. For example, > > > > result = DatumGetObjectId(DirectFunctionCall1(oidin, > > CStringGetDatum(pro_name_or_oid))); > > > > could just be > > > > result = oidin_internal(pro_name_or_oid); > > Yeah, true -- that works for simple cases, and might be beneficial > where you're doing lots and lots of calls in a tight loop. > > For the more general problem, I wonder if we should try to figure out > something where we have one calling convention for "simple" functions > (those that little or none of the information in fcinfo and can pretty > much just take their SQL args as C args) and another for "complex" > functions (where we do the full push-ups).
Unfortunately I think so would likely also increase overhead in a bunch of places, because suddenly we need branches determining which callling convention is in use. There's a fair amount of places, some of them performance sensitive, that deal with functions that can get different number of arguments. Prominently nodeAgg.c's transition functions for example. When JITing expressions that doesn't matter, because we avoid doing so repetitively. But that's not really sufficient imo, non JITed performance matters a lot too. There's imo primarily two parts that make our calling convention expensive: a) Additional instructions required to push to/from fcinfo->arg[null], including stack spilling required to have space for the arguments. b) Increased number of cachelines touched, even for even trivial functions. We need to read/write to at least five: 1) fcinfo->isnull to reset it 2) fcinfo->arg[0...] to set the argument 3) fcinfo->argnull[0...] to set argnull (separate cacheline) 4) fcinfo->flinfo->fn_addr to get the actual address to call 5) instruction cache miss for the function's contents We can doctor around this some. A retively easy way to reduce the overhead of a similar function call interface would be to restructure things so we have something like: typedef struct FunctionArg2 { Datum arg; bool argnull; } FunctionArg2; typedef struct FunctionCallInfoData2 { /* cached function address from ->flinfo */ PGFunction fn_addr; /* ptr to lookup info used for this call */ FmgrInfo *flinfo; /* function must set true if result is NULL */ bool isnull; /* # arguments actually passed */ short nargs; /* collation for function to use */ Oid fncollation; /* collation for function to use */ /* pass or return extra info about result */ fmNodePtr resultinfo; /* array big enough to fit flinfo->fn_nargs */ FunctionArg2 args[FLEXIBLE_ARRAY_MEMBER]; } FunctionCallInfoData2; That'd mean some code changes because accessing arguments can't be done quite as now, but it'd be fairly minor. We'd end up with a lot fewer cache misses for common cases, because there's no need to fetch fn_addr, and because the first two arg/argnull pairs still fit within the first cacheline (which they never can in the current interface!). But we'd still be passing arguments via memory, instead of using registers. I think a more efficient variant would make the function signature look something like: Datum /* directly returned argument */ pgfunc( /* extra information about function call */ FunctionCallInfo *fcinfo, /* bitmap of NULL arguments */ uint64_t nulls, /* first argument */ Datum arg0, /* second argument */ Datum arg1, /* returned NULL */ bool *isnull ); with additional arguments passed via fcinfo as currently done. Since most of the performance critical function calls only have two arguments that should allow to keep usage of fcinfo-> to the cases where we need the extra information. On 64bit linuxes the above will be entirely in registers, on 64bit windows isnull would be placed on the stack. I don't quite see how we could avoid stack usage at all for windows, any of the above arguments seems important. There'd be a need for some trickery to make PG_GETARG_DATUM() work efficiently - afaics the argument numbers passed to PG_GETARG_*() are always constants, so that shouldn't be too bad. Regards, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers