On Mon, Jul 30, 2018 at 1:23 PM, Andres Freund <and...@anarazel.de> wrote:

> On 2018-07-30 07:19:07 -0400, Jeff Janes wrote:
>
> > And indeed, in my hands JIT makes it almost 3 times worse.
>
> Not in my measurement. Your example won't use JIT at all, because it's
> below the cost threshold. So I think you might just be seeing cache +
> hint bit effects?
>

No, it is definitely JIT.  The explain plans show it, and the cost of the
query is 230,000 while the default setting of jit_above_cost is 100,000.
It is fully reproducible by toggling the jit setting back and forth.  The
slowdown doesn't seem to come from compiling the code (I'm assuming the
code is compiled once per tuple descriptor, not once per tuple), but
rather from the efficiency of the compiled code itself.
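For what it's worth, the toggle test is just this (table name as in the
quoted example; in 11beta2 the EXPLAIN ANALYZE output grows a "JIT:"
section when JIT was used, so it's easy to confirm which runs compiled):

```sql
SET jit = on;
EXPLAIN (ANALYZE, BUFFERS, TIMING OFF)
  SELECT pk, int200 FROM i200c200;   -- output includes a "JIT:" section

SET jit = off;
EXPLAIN (ANALYZE, BUFFERS, TIMING OFF)
  SELECT pk, int200 FROM i200c200;   -- no "JIT:" section
```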



>
> > Run against ab87b8fedce3fa77ca0d6, I get 12669.619 ms for the 2nd JIT
> > execution and 4594.994 ms for the JIT=off.
>
> Even with a debug LLVM build, which greatly increases compilation
> overhead, I actually see quite the benefit when I force JIT to be used:
>

I don't see a change when I compile PostgreSQL without --enable-debug,
and jit_debugging_support is off (in 11beta2 it doesn't exist at all).
How can I tell whether I have a debug build of LLVM, and turn it off if
I do?
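Perhaps llvm-config can answer this?  (Assuming it is on the PATH; on
Ubuntu it may be installed under a versioned name such as
llvm-config-6.0.)

```shell
# Reports how this LLVM was compiled; a non-debug build prints "Release".
llvm-config --build-mode
# Reports whether LLVM assertions are compiled in.
llvm-config --assertion-mode
```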


>
>
> postgres[26832][1]=# ;SET jit_above_cost = -1; set jit_optimize_above_cost
> = 0; set jit_inline_above_cost = 0;
> postgres[26832][1]=# explain (analyze, buffers, timing off) select pk,
> int200 from i200c200;
>

Lowering jit_optimize_above_cost does redeem this for me: it brings the
query back to a tie with jit=off.  I don't see any further improvement
from lowering jit_inline_above_cost.  Overall it is just a statistical tie
with jit=off rather than an improvement as you get, but at least it isn't
a substantial loss.

Under what conditions would I want to do JIT without also optimizing the
emitted code?  Is there a rule of thumb that could be documented, or do we
just use the experimental method for each query?
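For reference, the three tiers as I understand them (defaults shown are
from 11beta2; the descriptions are my reading, not documentation):

```sql
SHOW jit_above_cost;            -- 100000: compile, but don't optimize
SHOW jit_optimize_above_cost;   -- 500000: also run LLVM optimization passes
SHOW jit_inline_above_cost;     -- 500000: also inline operators/functions
```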

I don't know how sensitive JIT is to hardware.  I'm using Ubuntu 16.04 on
VirtualBox (running on Windows 10) on an i5-7200U, which might be important.

I had previously tried a poor-man's JIT in which I created four versions
of the main 'for' loop in slot_deform_tuple: a branch on "if (hasnulls)",
with each branch containing two loops, one for while 'slow' is still false
and one for after 'slow' becomes true, so that we don't keep setting it to
true again inside a tight loop once it already is.  I didn't see a
noticeable improvement there (although perhaps I would have on different
hardware), so I didn't see how JIT could help with this almost-entirely-null
case.  I'm not trying to address JIT in general, just as it applies to
this particular case.
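To make the idea concrete, here is a stripped-down sketch of that kind of
specialization, outside PostgreSQL.  All names and structures here are
illustrative stand-ins, not the real slot_deform_tuple code, and I've only
shown the hasnulls branch, not the 'slow' one:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for a tuple: a null bitmap plus fixed-width
 * values (nulls occupy no slot in 'values'). Not the real structures. */
typedef struct {
    bool           hasnulls;
    const bool    *isnull_bitmap; /* per-attribute null flags */
    const int64_t *values;        /* non-null attribute values, in order */
} DemoTuple;

/* Generic loop: re-tests the tuple-invariant hasnulls condition on
 * every attribute, mirroring the single-loop shape. */
static void deform_generic(const DemoTuple *tup, int natts,
                           int64_t *out_values, bool *out_isnull)
{
    int vi = 0;
    for (int i = 0; i < natts; i++) {
        if (tup->hasnulls && tup->isnull_bitmap[i]) {
            out_isnull[i] = true;
            out_values[i] = 0;
        } else {
            out_isnull[i] = false;
            out_values[i] = tup->values[vi++];
        }
    }
}

/* "Poor-man's JIT": branch once per tuple on hasnulls, then run a
 * tight loop with the invariant test hoisted out. */
static void deform_specialized(const DemoTuple *tup, int natts,
                               int64_t *out_values, bool *out_isnull)
{
    if (!tup->hasnulls) {
        for (int i = 0; i < natts; i++) { /* no per-attribute null test */
            out_isnull[i] = false;
            out_values[i] = tup->values[i];
        }
    } else {
        int vi = 0;
        for (int i = 0; i < natts; i++) {
            if (tup->isnull_bitmap[i]) {
                out_isnull[i] = true;
                out_values[i] = 0;
            } else {
                out_isnull[i] = false;
                out_values[i] = tup->values[vi++];
            }
        }
    }
}
```

The specialized version does the same work but tests hasnulls once per
tuple instead of once per attribute; whether that buys anything depends on
how well the branch predictor already handles the invariant test.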

Unrelated to JIT, and relevant to the 'select pk, int199' case but not the
'select pk, int200' one: it seems we have gone to some length to make slot
deforming efficient for incremental use, but then we just deform in bulk
anyway, up to the maximum attnum used in the query, at least in this case.
Is that because incremental deforming is not cache-efficient?

Cheers,

Jeff
