Hi,
On 2018-08-22 18:15:29 -0400, Tom Lane wrote:
> Andres Freund <[email protected]> writes:
> > On 2018-08-22 06:20:21 +0000, Noah Misch wrote:
> >> Regardless of the choice of jit={on|off} default, these numbers tell me
> >> that some or all of jit_*_cost defaults are too low.
>
> > I don't think it really shows that. The reason that JITing gets started
> > there is that the tables aren't analyzed and we end up with crazy ass
> > estimates about the cost of the queries. No useful setting of the cost
> > limits will protect against that... :(
>
> I don't buy that line of argument one bit. No, we generally don't
> analyze most of the regression test tables, but the planner still
> knows that they're not very large. If JIT is kicking in for those
> queries, the defaults are set wrong.
I looked at the queries that get JITed; I didn't just make that claim up
out of thin air. The first query that gets JITed, for example, is:
+explain analyze SELECT '' AS tf_12, BOOLTBL1.*, BOOLTBL2.*
+ FROM BOOLTBL1, BOOLTBL2
+ WHERE BOOLTBL2.f1 <> BOOLTBL1.f1;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Nested Loop  (cost=0.00..118524.73 rows=3948050 width=34) (actual time=8.376..8.390 rows=12 loops=1)
+   Join Filter: (booltbl2.f1 <> booltbl1.f1)
+   Rows Removed by Join Filter: 4
+   ->  Seq Scan on booltbl1  (cost=0.00..38.10 rows=2810 width=1) (actual time=0.018..0.019 rows=4 loops=1)
+   ->  Materialize  (cost=0.00..52.15 rows=2810 width=1) (actual time=0.004..0.005 rows=4 loops=4)
+         ->  Seq Scan on booltbl2  (cost=0.00..38.10 rows=2810 width=1) (actual time=0.007..0.009 rows=4 loops=1)
+ Planning Time: 0.074 ms
+ JIT:
+   Functions: 6
+   Generation Time: 0.935 ms
+   Inlining: false
+   Inlining Time: 0.000 ms
+   Optimization: false
+   Optimization Time: 0.451 ms
+   Emission Time: 7.716 ms
+ Execution Time: 43.466 ms
+(16 rows)
Now you can say that'd be solved by bumping the cost limits up, sure. But
the row / cost model is obviously badly out of whack here: the join is
estimated at 3948050 rows but produces 12, a misestimation of five orders
of magnitude. I don't see how any cost threshold can make reasonable
decisions for a trivial query that's off by that much.
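For context (a sketch, not something the regression test does): the plan
above triggers JIT only because its estimated total cost of 118524.73
exceeds the default jit_above_cost threshold; with real statistics the
estimate collapses and JIT wouldn't fire. The post-ANALYZE numbers below
are illustrative, not taken from an actual run:

```sql
SHOW jit_above_cost;   -- default 100000; the 118524.73 estimate exceeds it

-- Collecting real statistics shrinks the row estimate from millions to a
-- handful, dropping the plan cost far below the JIT threshold:
ANALYZE booltbl1;
ANALYZE booltbl2;
EXPLAIN SELECT '' AS tf_12, BOOLTBL1.*, BOOLTBL2.*
  FROM BOOLTBL1, BOOLTBL2
  WHERE BOOLTBL2.f1 <> BOOLTBL1.f1;
-- the nested loop now costs a few cost units, so JIT is not used
```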
Another subsequent case is:
set enable_sort = off; -- try to make it pick a hash setop implementation
select '(2,5)'::cashrange except select '(5,6)'::cashrange;
which looks hugely expensive because a sort is chosen even though sorting
is disabled, so the plan is penalized with the disabled-path cost (yes,
this might be a bug in the test):
EXPLAIN select '(2,5)'::cashrange except select '(5,6)'::cashrange;
┌─────────────────────────────────────────────────────────────────────────────────────┐
│                                     QUERY PLAN                                      │
├─────────────────────────────────────────────────────────────────────────────────────┤
│ SetOp Except  (cost=10000000000.06..10000000000.07 rows=1 width=36)                 │
│   ->  Sort  (cost=10000000000.06..10000000000.06 rows=2 width=36)                   │
│         Sort Key: ('($2.00,$5.00)'::cashrange)                                      │
│         ->  Append  (cost=0.00..0.05 rows=2 width=36)                               │
│               ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..0.02 rows=1 width=36)  │
│                     ->  Result  (cost=0.00..0.01 rows=1 width=32)                   │
│               ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..0.02 rows=1 width=36)  │
│                     ->  Result  (cost=0.00..0.01 rows=1 width=32)                   │
│ JIT:                                                                                │
│   Functions: 7                                                                      │
│   Inlining: true                                                                    │
│   Optimization: true                                                                │
└─────────────────────────────────────────────────────────────────────────────────────┘
(12 rows)
Obviously the artificially high costing from the disabled sort distorts
things here. Many of the other cases are along similar lines as the two
above.
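To spell out the mechanism in the second case (a sketch; the threshold
values are the PostgreSQL 11 defaults): enable_sort = off doesn't remove
sort paths, it adds disable_cost (1.0e10) to their estimates, and that
inflated cost then clears every JIT threshold at once:

```sql
-- enable_sort = off penalizes sort paths with disable_cost (1e10) rather
-- than forbidding them; when no other path exists, the sort is still chosen.
SET enable_sort = off;

-- The resulting ~1e10 plan cost exceeds all three JIT thresholds:
SHOW jit_above_cost;            -- default 100000
SHOW jit_inline_above_cost;     -- default 500000
SHOW jit_optimize_above_cost;   -- default 500000
-- hence "Inlining: true" and "Optimization: true" for a two-row query
```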
> Additional evidence for the
> defaults being wrong is the number of reports we've had of JIT making
> things slower.
Maybe.
Greetings,
Andres Freund