Michael Ho has posted comments on this change.

Change subject: IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()

Patch Set 4:


File be/src/exec/exec-node.h:

PS4, Line 72:  If overridden in subclass, must also
            :   /// call superclass's Codegen().
> does it matter whether this happens first or last?
Not really. Comments updated.

File be/src/exec/hash-join-node.cc:

Line 148:   }
> why not do this in ExecNode::Prepare() to avoid having to do the same thing
That would have the unfortunate side effect of adding this message to operators 
which aren't codegen enabled. This may be confusing.

File be/src/exec/partitioned-hash-join-builder.h:

PS4, Line 88: whether the codegen was
            :   /// enabled 
> we only call this if codegen is enabled, right?  So I'm not sure what this 

File be/src/exprs/scalar-fn-call.cc:

Line 409:   // TODO: fix this
> not for this change, but i think this timestamp stuff may have been fixed b
Done. TODO removed.

File be/src/runtime/runtime-state.h:

PS4, Line 161: 'codegen_'.
> public method comments shouldn't really talk about private members. how abo

File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 532
> why is it okay to always enable codegen now, whereas before we were so care
We always codegen conjunct evaluation for parquet scanner now so this code is 
mostly irrelevant for parquet scanner. For other file formats, they weren't 
really affected by this flag as they always codegen (if implemented). For 
parquet scanner, this flag used to affect whether codegen will happen in 
ScalarFnCall::Prepare() and that had performance impact on conjuncts evaluation.

All this complication stems from lazy creation of the LLVM module to avoid the 
overhead of LLVM module creation when the fragment doesn't have any codegen 
enabled operator:


We didn't have lazy IR code materialization back then so the preparation time 
of an LLVM module was higher (in the order of at least 150+ ms on my dev box). 
With lazy IR code materialization and a recent change to lazily create the 
mapping of IRFunction::Type to llvm::Function*, we pushed the creation time of 
LLVM module to about 10~11ms.

With this patch, if codegen is enabled, we will always create the codegen 
module. This avoids problem such as IMPALA-1755 and IMPALA-3638.

File tests/query_test/test_udfs.py:

Line 80:     vector.get_value('exec_option')['disable_codegen'] = 1
> this doesn't invalidate any of the preexisting test cases?
AFAIK, no. The preexisting test cases were mostly about failure to load the 
functions. Nothing tied to codegen as far as I can tell.

To view, visit http://gerrit.cloudera.org:8080/4651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to