Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4883: Implement Codegen for Union
......................................................................


Patch Set 2:

(4 comments)

The code looks generally good. I had two high-level comments: we should make 
sure that the output in the profile makes sense, and we should try to rearrange 
the inner perf-critical loop to squeeze out more perf (I think we can do much 
better than 50%).

http://gerrit.cloudera.org:8080/#/c/6459/2/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:

Line 26: 
unnecessary new line


Line 110:     if (!codegen_status.ok()) continue;
We shouldn't drop this status - at the least we should display it in the 
profile. It's also reasonable to abort codegen for the node entirely if it 
simplifies things - the cases when codegen fails are mainly corner cases like 
CHAR.


Line 129:     runtime_profile()->AddCodegenMsg(codegen_status.ok(), 
codegen_status);
How does this show up in the profile? If we're going to show the results of 
codegening each union branch we should make sure that it is displayed in a 
reasonable way - e.g. mention the child number.


Line 184:   if (UNLIKELY(codegend_union_materialize_exprs_fns_.size() == 0 ||
It would be great if we could avoid all of this per-row overhead - I'm counting 
a function call, three condition checks and an indirect function call via the 
function pointer before it does any actual work for the row, and that's just 
including this function, not the caller.

We usually try to push loops like this down into the codegen'd code. I think in 
this case the loop can be tightened up a lot and pushed down into 
cross-compiled code. If you have a for loop over (0 ..rows left in child 
batch), I think the only thing that really needs to be checked per row is the 
capacity of the destination batch - the other loop conditions can be converted 
into an iteration count before entering the loop.

HdfsParquetScanner::ProcessScratchBatch() is an example where all of this is 
pretty optimised.

I don't think that fully optimising it should block this getting in since a 
good improvement already, but there's so much low-hanging fruit that we're 
missing out on a lot of the potential benefit of your work.

Also if we leave code sitting around with inefficient patterns like this people 
tend to copy the bad patterns (e.g. the Kudu scan node inherited some bad 
patterns from the HBase scan node).


-- 
To view, visit http://gerrit.cloudera.org:8080/6459
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib4107d27582ff5416172810364a6e76d3d93c439
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to