Re: JIT doing duplicative optimization?

2021-11-15 Thread Andres Freund
Hi,

On 2021-11-14 18:46:34 -0500, Tom Lane wrote:
> BTW, I realized while working on this that I have little idea what the
> "Functions:" count is.  Nor does our documentation explain that (or any
> other of these numbers), at least not anywhere I could find.  That seems
> like a pretty serious documentation fail.

It's literally the number of functions created in the JITed code. That could be
functions for evaluating one expression, or functions for deforming tuples of
one type of tupledesc. It's the simplest measure of the "amount of JITed stuff"
I could find. A while back I'd proposed a more detailed EXPLAIN option that
would optionally print details about what exactly is JITed where, but there
were some EXPLAIN output format challenges...


> If these numbers aren't important enough to explain in the docs, why are we
> printing them at all?

Afaict we don't have detailed docs for EXPLAIN output in general?

Greetings,

Andres Freund




Re: JIT doing duplicative optimization?

2021-11-15 Thread Andres Freund
Hi,

On 2021-11-11 17:02:09 -0300, Alvaro Herrera wrote:
> Recently I noticed extremely suspicious behavior from queries under JIT.
> A query (details for which sadly I cannot disclose) had an EXPLAIN
> looking like this:
> 
>  JIT:
>Functions: 743
>Options: Inlining true, Optimization true, Expressions true, Deforming true
>  Timing: Generation 47.013 ms, Inlining 20.277 ms, Optimization 3269.381 ms, Emission 2392.115 ms, Total 5728.786 ms
>  Execution Time: 5735.552 ms
> 
> Disabling JIT, the query takes a few tens of milliseconds, so of course
> the typical recommendation is to disable JIT.  But I noted that the
> average time to optimize, per function, is 4.7ms.  I didn't know what to
> make of this -- is that a reasonable time to take to optimize a single
> function?

No, that's pretty insane. Any chance you could share the generated bitcode for
this? It will contain some details about the query / table structure, though.
Perhaps just the contained function names?
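If sharing the whole module isn't acceptable, something like the following could extract just the defined function names (a sketch; the .bc path is a placeholder, and llvm-dis needs to come from the same LLVM version PostgreSQL was built against):

```shell
# Disassemble the dumped bitcode to textual IR and keep only the
# "define" lines, i.e. the names of the JIT-generated functions.
# The file name below is a placeholder for a real dump.
llvm-dis /path/to/datadir/12345.0.bc -o - | grep '^define'
```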


> I put that thought aside.  But then I spotted a couple of minor things
> in the query that could be made better; so I did some of that, and
> noticed that the number of functions was reduced from 743 to 671, and
> the JIT details in the explain looked like this:

> But what really surprised me is that the average time to optimize
> per function is now 2.06ms ... less than half of the previous
> measurement.  It emits 10% fewer functions than before, but the time to
> both optimize and emit is reduced by 50%.  How does that make sense?

Without knowing the concrete query I can't say whether that's the explanation
here.


> I don't know how to approach an investigation into this, but I thought
> that maybe we're doing quadratic effort in the optimization / emission
> phase or something like that.

Yea. One way to investigate it is to enable jit_dump_bitcode and then use
llvm's 'opt' tool to see details about optimization times.

opt -time-passes -O3 -o /dev/null < /srv/dev/pgdev-dev/3990041.77.bc |less
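Spelled out end to end, the workflow looks roughly like this (a sketch; the data directory path and the dumped file name are placeholders, since jit_dump_bitcode names files after the backend pid and a counter):

```shell
# Enable bitcode dumping in the session, run the slow query once,
# then feed a dumped module to LLVM's opt for per-pass timings.
psql <<'SQL'
SET jit_dump_bitcode = on;
-- run the offending query here
SQL

# Dumped modules land in the cluster's data directory.
opt -time-passes -O3 -o /dev/null < "$PGDATA"/12345.0.bc | less
```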

Greetings,

Andres Freund




Re: JIT doing duplicative optimization?

2021-11-14 Thread Tom Lane
I wrote:
> You could probably generate some queries with lots and lots of expressions
> to characterize this better.  If it is O(N^2), it should not be hard to
> drive the cost up to the point where the guilty bit of code would stand
> out in a perf trace.

I experimented with that, using a few different-size queries generated
like this:

# emit one "union all" arm per iteration, plus a final arm
print "explain analyze\n";
for (my $i = 1; $i < 100; $i++) {
    print " select sum(f1+$i) from base union all\n";
}
print "select sum(f1+0) from base;\n";

on a table made like

create table base as select generate_series(1,1000) f1;
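For reference, the same generator can be written as a plain shell loop (a sketch; N controls the total number of subqueries, and bench.sql is an arbitrary output name):

```shell
# Generate an N-arm "union all" benchmark query into bench.sql.
N=100
{
  echo "explain analyze"
  for i in $(seq 1 $((N - 1))); do
    echo " select sum(f1+$i) from base union all"
  done
  echo "select sum(f1+0) from base;"
} > bench.sql
```

Feed it to the server with `psql -f bench.sql` after creating the table.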

What I got, after setting max_parallel_workers_per_gather = 0,
was

10 subqueries:

 Planning Time: 0.260 ms
 JIT:
   Functions: 30
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 4.651 ms, Inlining 8.870 ms, Optimization 152.937 ms, Emission 95.046 ms, Total 261.504 ms
 Execution Time: 15258.249 ms

100 subqueries:

 Planning Time: 2.231 ms
 JIT:
   Functions: 300
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 44.163 ms, Inlining 9.934 ms, Optimization 1448.971 ms, Emission 928.438 ms, Total 2431.506 ms
 Execution Time: 154815.515 ms

1000 subqueries:

 Planning Time: 29.480 ms
 JIT:
   Functions: 3000
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 444.479 ms, Inlining 25.688 ms, Optimization 14989.696 ms, Emission 9891.993 ms, Total 25351.856 ms
 Execution Time: 1522011.367 ms
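As a quick arithmetic check (awk here is only a calculator), the per-function optimization cost in the three runs above works out to roughly a constant:

```shell
# Optimization time divided by function count, for the three runs above.
awk 'BEGIN {
  printf "10 subqueries:   %.2f ms/function\n", 152.937 / 30
  printf "100 subqueries:  %.2f ms/function\n", 1448.971 / 300
  printf "1000 subqueries: %.2f ms/function\n", 14989.696 / 3000
}'
# → 10 subqueries:   5.10 ms/function
# → 100 subqueries:  4.83 ms/function
# → 1000 subqueries: 5.00 ms/function
```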

So the overhead looks pretty linear, or even a shade sublinear for the
"inlining" bit, *as long as only one process is involved*.  However,
I noted that if I didn't force that, the JIT overhead went up because
the planner wanted to use more workers and each worker has to do its own
compilations.  So perhaps the apparent nonlinearity in your examples comes
from that?

BTW, I realized while working on this that I have little idea what the
"Functions:" count is.  Nor does our documentation explain that (or any
other of these numbers), at least not anywhere I could find.  That seems
like a pretty serious documentation fail.  If these numbers aren't
important enough to explain in the docs, why are we printing them at all?

regards, tom lane




Re: JIT doing duplicative optimization?

2021-11-14 Thread Tom Lane
Alvaro Herrera  writes:
> On 2021-Nov-11, Alvaro Herrera wrote:
>> But what really surprised me is that the average time to optimize
>> per function is now 2.06ms ... less than half of the previous
>> measurement.  It emits 10% fewer functions than before, but the time to
>> both optimize and emit is reduced by 50%.  How does that make sense?

> Ah, here's a query that illustrates what I'm on about.  I found this
> query[1] in a blog post[2].
> ...

> Query 1, 148 functions JIT-compiled.
> Average time to optimize, per function 435.153/148 = 2.940ms;
> average time to emit per function 282.216/148 = 1.906ms

> Query 2, 137 functions JIT-compiled.
> Average time to optimize, per function: 374.103/137 = 2.730ms
> Average time to emit, per function 254.557 / 137 = 1.858ms

> Query 3, 126 functions JIT-compiled.
> Average time to optimize per function 229.128 / 126 = 1.818ms
> Average time to emit per function 167.338 / 126 = 1.328ms

Yeah, in combination with your other measurement, it sure does look like
there's something worse-than-linear going on here.  The alternative is to
assume that the individual functions are more complex in one query than
the other, and that seems like a bit of a stretch.

You could probably generate some queries with lots and lots of expressions
to characterize this better.  If it is O(N^2), it should not be hard to
drive the cost up to the point where the guilty bit of code would stand
out in a perf trace.

regards, tom lane




Re: JIT doing duplicative optimization?

2021-11-14 Thread Zhihong Yu
On Sun, Nov 14, 2021 at 9:07 AM Alvaro Herrera 
wrote:

> On 2021-Nov-11, Alvaro Herrera wrote:
>
> > But what really surprised me is that the average time to optimize
> > per function is now 2.06ms ... less than half of the previous
> > measurement.  It emits 10% fewer functions than before, but the time to
> > both optimize and emit is reduced by 50%.  How does that make sense?
>
> Ah, here's a query that illustrates what I'm on about.  I found this
> query[1] in a blog post[2].

Re: JIT doing duplicative optimization?

2021-11-14 Thread Alvaro Herrera
On 2021-Nov-11, Alvaro Herrera wrote:

> But what really surprised me is that the average time to optimize
> per function is now 2.06ms ... less than half of the previous
> measurement.  It emits 10% fewer functions than before, but the time to
> both optimize and emit is reduced by 50%.  How does that make sense?

Ah, here's a query that illustrates what I'm on about.  I found this
query[1] in a blog post[2].

```
WITH RECURSIVE typeinfo_tree(
oid, ns, name, kind, basetype, elemtype, elemdelim,
range_subtype, attrtypoids, attrnames, depth)
AS (
SELECT
ti.oid, ti.ns, ti.name, ti.kind, ti.basetype,
ti.elemtype, ti.elemdelim, ti.range_subtype,
ti.attrtypoids, ti.attrnames, 0
FROM
(
SELECT
t.oid   AS oid,
ns.nspname  AS ns,
t.typname   AS name,
t.typtype   AS kind,
(CASE WHEN t.typtype = 'd' THEN
(WITH RECURSIVE typebases(oid, depth) AS (
SELECT
t2.typbasetype  AS oid,
0   AS depth
FROM
pg_type t2
WHERE
t2.oid = t.oid
UNION ALL
SELECT
t2.typbasetype  AS oid,
tb.depth + 1AS depth
FROM
pg_type t2,
typebases tb
WHERE
tb.oid = t2.oid
AND t2.typbasetype != 0
) SELECT oid FROM typebases ORDER BY depth DESC LIMIT 1)
ELSE NULL
END)AS basetype,
t.typelem   AS elemtype,
elem_t.typdelim AS elemdelim,
range_t.rngsubtype  AS range_subtype,
(CASE WHEN t.typtype = 'c' THEN
(SELECT
array_agg(ia.atttypid ORDER BY ia.attnum)
FROM
pg_attribute ia
INNER JOIN pg_class c
ON (ia.attrelid = c.oid)
WHERE
ia.attnum > 0 AND NOT ia.attisdropped
AND c.reltype = t.oid)
ELSE NULL
END)AS attrtypoids,
(CASE WHEN t.typtype = 'c' THEN
(SELECT
array_agg(ia.attname::text ORDER BY ia.attnum)
FROM
pg_attribute ia
INNER JOIN pg_class c
ON (ia.attrelid = c.oid)
WHERE
ia.attnum > 0 AND NOT ia.attisdropped
AND c.reltype = t.oid)
ELSE NULL
END)AS attrnames
FROM
pg_catalog.pg_type AS t
INNER JOIN pg_catalog.pg_namespace ns ON (
ns.oid = t.typnamespace)
LEFT JOIN pg_type elem_t ON (
t.typlen = -1 AND
t.typelem != 0 AND
t.typelem = elem_t.oid
)
LEFT JOIN pg_range range_t ON (
t.oid = range_t.rngtypid
)
) AS ti
WHERE
ti.oid = any(ARRAY[16,17]::oid[])

UNION ALL

SELECT
ti.oid, ti.ns, ti.name, ti.kind, ti.basetype,
ti.elemtype, ti.elemdelim, ti.range_subtype,
ti.attrtypoids, ti.attrnames, tt.depth + 1
FROM
(
SELECT
t.oid   AS oid,
ns.nspname  AS ns,
t.typname   AS name,
t.typtype   AS kind,
(CASE WHEN t.typtype = 'd' THEN
(WITH RECURSIVE typebases(oid, depth) AS (
SELECT
t2.typbasetype  AS oid,
0   AS depth
FROM
pg_type t2
WHERE
t2.oid = t.oid
UNION ALL
SELECT
t2.typbasetype  AS oid,
tb.depth + 1AS depth
FROM
pg_type t2,
typebases tb
WHERE
tb.oid = t2.oid
AND t2.typbasetype != 0
) SELECT oid FROM typebases ORDER BY depth DESC LIMIT 1)
ELSE NULL
END)AS basetype,
t.typelem   AS elemtype,
elem_t.typdelim AS elemdelim,
range_t.rngsubtype  AS range_subtype,
(CASE WHEN t.typtype 

JIT doing duplicative optimization?

2021-11-11 Thread Alvaro Herrera
Hello

Recently I noticed extremely suspicious behavior from queries under JIT.
A query (details for which sadly I cannot disclose) had an EXPLAIN
looking like this:

 JIT:
   Functions: 743
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 47.013 ms, Inlining 20.277 ms, Optimization 3269.381 ms, Emission 2392.115 ms, Total 5728.786 ms
 Execution Time: 5735.552 ms

Disabling JIT, the query takes a few tens of milliseconds, so of course
the typical recommendation is to disable JIT.  But I noted that the
average time to optimize, per function, is 4.7ms.  I didn't know what to
make of this -- is that a reasonable time to take to optimize a single
function?

I put that thought aside.  But then I spotted a couple of minor things
in the query that could be made better; so I did some of that, and
noticed that the number of functions was reduced from 743 to 671, and
the JIT details in the explain looked like this:

 Planning Time: 15.060 ms
 JIT:
   Functions: 671
   Options: Inlining false, Optimization true, Expressions true, Deforming true
   Timing: Generation 75.205 ms, Inlining 0.000 ms, Optimization 1387.741 ms, Emission 1110.807 ms, Total 2573.753 ms
 Execution Time: 2587.776 ms

But what really surprised me is that the average time to optimize
per function is now 2.06ms ... less than half of the previous
measurement.  It emits 10% fewer functions than before, but the time to
both optimize and emit is reduced by 50%.  How does that make sense?

I don't know how to approach an investigation into this, but I thought
that maybe we're doing quadratic effort in the optimization / emission
phase or something like that.

-- 
Álvaro Herrera