[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory

Tim Armstrong (Code Review) Thu, 17 Nov 2016 09:01:50 -0800

Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory
......................................................................



Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

PS4, Line 190: CODEGEN_WRITE_TUPLES_MAX_COLUMNS = 250;
> I agree that it's more actionable if it's based on number of tuples.
This is more about avoiding codegen blow-ups instead of making a cost-based 
decision (that's tricky because the codegen cost is non-linear and 
unpredictable and we'd need to build a cost model for the scanners factoring in 
# rows,# columns, predicates, etc).

I don't think there's much point in modelling the cost unless a) there is a 
substantial benefit to justify the complexity and b) the cost model is going to 
be fairly accurate. Otherwise we end up in an uncomfortable middle ground with 
a half-baked code model where the code is more complex but we don't get 
measurable benefit or have confidence in the cost model's accuracy.

I'd rather not do this from the planner because it's tied up in the 
implementation details of codegen for the scanners. The only reason we have a 
hard threshold is because the backend codegen logic builds everything in a 
single function. If that changed we could switch to the approach we use 
elsewhere of disabling inlining. It makes more sense to me for the planner to 
make a high-level decision about enabling codegen for the node, then the 
backend implements that decision (or bails out if it can't do the codegen 
without blowing up).


http://gerrit.cloudera.org:8080/#/c/4956/4/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

PS4, Line 823: if (!hasGrouping) mergeAggNode.setDisableCodegen(true);
> This isn't exactly what I had in mind. If you look at ExchangeNode::Codegen
I think it's different from the agg node. For the agg node we know based on the 
plan shape that the number of input rows is small, so even though we could 
codegen the agg, it's not worth it. But as far as I can see the only reason we 
don't codegen non-merging exchanges in general is that there's no backend 
support for it.

There's no reason I see that the planner would always disable codegen for 
non-merging exchanges if backend support existed.


-- 
To view, visit http://gerrit.cloudera.org:8080/4956
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory

Reply via email to