Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15408 )

Change subject: IMPALA-4080: Codegen once per fragment
......................................................................


Patch Set 7:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/15408/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15408/7//COMMIT_MSG@18
PS7, Line 18:
Maybe include an example of how the codegen in the profile looks after the 
changes?


http://gerrit.cloudera.org:8080/#/c/15408/7//COMMIT_MSG@21
PS7, Line 21:
We'll need to do some perf work for the non-mt_dop code to make sure there are 
no regressions.


http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/codegen/llvm-codegen-test.cc
File be/src/codegen/llvm-codegen-test.cc:

http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/codegen/llvm-codegen-test.cc@54
PS7, Line 54:     QueryState* qs = runtime_state_->query_state();
Maybe this could be done in CreateQueryState()? This generic setup stuff is 
more reuseable if it's there.

I think this is probably fine though.


http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/exec/exec-node.cc
File be/src/exec/exec-node.cc:

http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/exec/exec-node.cc@150
PS7, Line 150:   
codegen_status_msgs_.push_back(std::move(FragmentState::GenerateCodegenMsg(
I'm not sure that the move does anything if GenerateCodegenMsg() is returned by 
value. I think emplace_back() might avoid a copy though.


http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/exec/exec-node.cc@227
PS7, Line 227: NULL
switch to nullptr while you're changing it in this file?


http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/exec/exec-node.cc@228
PS7, Line 228:   for (int i = 0; i < children_.size(); ++i) {
Maybe change to range-based for?


http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/fragment-state.h
File be/src/runtime/fragment-state.h:

http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/fragment-state.h@19
PS7, Line 19: #ifndef IMPALA_RUNTIME_FRAGMENT_STATE_H
Use a #pragma once instead of the old-style included guards


http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/fragment-state.h@26
PS7, Line 26: #include "runtime/runtime-filter-bank.h"
This doesn't look like it's needed.


http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/fragment-state.h@32
PS7, Line 32: class RuntimeFilterBank;
This doesn't look like it's needed.


http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/fragment-state.h@56
PS7, Line 56: const TPlanFragmentInstanceCtx*
Might be better to use references, since they should be non-nullable. Not a big 
deal though


http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/fragment-state.cc
File be/src/runtime/fragment-state.cc:

http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/fragment-state.cc@97
PS7, Line 97:       query_state->obj_pool(), Substitute("Fragment-$0", 
fragment_idx()));
Should use the same fragment ID format (F01, etc) as elsewhere in the 
profile/plan.


http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/query-state.h@387
PS7, Line 387:   /// TODO: add description
This


http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/query-state.cc@267
PS7, Line 267:   // TODO: instance_ctxs contains all instances among all 
fragments, I need to run this
Does this need to be done in this patchset?


http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/query-state.cc@608
PS7, Line 608:
nit: probably don't need this vertical whitespace.


http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/query-state.cc@612
PS7, Line 612:     string thread_name = Substitute("$0 (fragment idx:$1)",
It would be be better to use the usual fragment ID (F01, etc), since that's the 
identifier we generally use.


http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/query-state.cc@621
PS7, Line 621:   for (unique_ptr<Thread>& thread : codegen_threads) {
This has the potential to change the timing of some queries significantly, 
since the fragments will not start executing until the last codegen fragment

This may be a good thing for predictability, but is a bad thing for 
parallelism, since it may delay starting up small scan fragments, etc.

I suspect it may add enough risk of regression that we want to avoid it (at 
least for non-mt_dop queries). I think we could approximate the old behaviour 
if each codegen thread started up the instances for that fragment once it 
finished codegen (I guess then it becomes a fragment start thread).


Also, have you thought about how slow codegen could be debugged in the profile 
now? I think it's good that the codegen sub-profiles are more prominent for 
each backend. But tracing back from a single fragment profile, I guess we'd 
have to look at the "Open started" in the timeline to see how much codegen 
delayed the execution startup.


http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/query-state.cc@622
PS7, Line 622:     thread->Join();
I wonder about whether using a thread pool would be a good idea here, just to 
reduce thread startup overhead and context switching under concurrency.

I think there would be some concerns here about fairness between queries and 
reflecting queuing delays in the profile, so *probably* not a good idea to 
tackle right now.


http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/query-state.cc@805
PS7, Line 805: // TODO: maybe just return the FragmentState and use that instead
Comment probably needs cleanup?


http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/udf/udf-internal.h
File be/src/udf/udf-internal.h:

http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/udf/udf-internal.h@174
PS7, Line 174:   static int GetConstFnAttr(bool decimalV2,
Maybe pass in the query options struct?


http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/udf/udf.cc
File be/src/udf/udf.cc:

http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/udf/udf.cc@122
PS7, Line 122: // TODO: bik: will have to add a fake class above if I want to 
include any of the impala
Does this need addressing?



-- 
To view, visit http://gerrit.cloudera.org:8080/15408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3aef8bc621f96caafe9a1c378617a2987e4ad452
Gerrit-Change-Number: 15408
Gerrit-PatchSet: 7
Gerrit-Owner: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Fri, 20 Mar 2020 17:59:11 +0000
Gerrit-HasComments: Yes

Reply via email to