Dan Hecht has posted comments on this change. Change subject: IMPALA-2550: Switch to per-query exec rpc ......................................................................
Patch Set 10: (20 comments) http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: PS10, Line 204: GetPrepareStatus > the barrier isn't really present, because getfinstancestate already waits f Still, it's not a pattern I think we should encourage in the code base (DCHECK(x) where x has side-effects). How about at least making the non-barrier explicit by shortcircuiting it like this: DCHECK(coord_instance_->IsPrepared() && coord_instance_->GetPrepareStatus().ok()); PS10, Line 413: backend_state->GetStatus(); > good catch, this could be cancelled at this point. In your current code, will the real status eventually be propagated as the query status? Or will this race cause the query to result in CANCELLED even though there was a real error? PS10, Line 424: DCHECK(query_status_.ok()); // nobody should have been able to cancel > when does the query get displayed on the debug page? you still haven't retu It gets displayed as soon as there is a client request state registered, I believe. (i.e. RegisterQuery()). Which is what I meant - webserver cancellation is definitely available before exec() returns. PS10, Line 964: #if 0 : do we really want this? > remove it unless someone says we want it. fine with me to delete. http://gerrit.cloudera.org:8080/#/c/6535/11/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: PS11, Line 200: // fragment instance's executor will not comple DCHECK(coord_instance_->IsPrepared() && coord_instance_->GetPrepareStatus().ok()); to make it explicit that this has no side-effects. http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/query-exec-mgr.cc File be/src/runtime/query-exec-mgr.cc: PS10, Line 51: Prepare > i think we should do some initial setup here so we can return an error stat the renaming helped. PS10, Line 59: exec-fragment-instance-$0 > i left everything as is, because i didn't want to break existing tests. hap Which tests? CAn't they be fixed? You renamed the other thread - I think it would be best to do this one as well. The current name is really confusing because it's not plural. http://gerrit.cloudera.org:8080/#/c/6535/11/be/src/runtime/query-exec-mgr.cc File be/src/runtime/query-exec-mgr.cc: PS11, Line 57: n) QueryState reference PS11, Line 59: exec-fragment-instance start-query-finstances http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: PS10, Line 195: // TODO: this might flood the log : LOG(WARNING) << "Couldn't get a client for " << query_ctx().coord_address : <<"\tReason: " << coord_status.GetDetail(); > address what, flooding the log? Yes, flooding the log. i.e. the TODO would be better if it says what should happen in the future. And before it looks like we'd remember the status via UpdateStatus() and then potentially report it back the next time we try (for periodic reports). Not saying that's the right solution but it does look like a subtle behavior change so want to think it through. I think the new behavior is probably not really worse though (for periodic reports, just report and try again later; for final report, in both old and new we'd drop it on the floor). PS10, Line 255: // TODO: should we try to keep rpc_status for the final report? (but the final : // report, following this Cancel(), may not succeed anyway.) > if the report rpc fails we fail the query on this node and then do nothing/ Okay. I think the old code used to remember this status (via UpdateStatus()) and then would report it back if it could talk to the coordinator later when the fragment is complete. But I agree the protocol needs to be fixed. Line 319: prepare_status = instance_status; > fis.h points out that all public non-getter functions block until the prepa That fis.h comment was just added. But I don't see how it's relevant to my initial comment here -- how can prepare_status be CANCELLED? http://gerrit.cloudera.org:8080/#/c/6535/11/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: PS11, Line 211: arams.instance_exec_status.emplace_back(); : params.__isset.instance_exec_status = true; but why doesn't the coordinator initiate cancellation if any FIS reports error? Is this to simplify the coordinator logic? I guess I'm okay with how you have this currently, but it just seems clearer to me to keep FIS status in only one place. PS11, Line 264: despite http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/query-state.h File be/src/runtime/query-state.h: PS10, Line 59: cancelled > vague in what way? Vague because it's not clear what all the side effects of "cancel" are, and when they take place. i.e. does it have an impact on resources, and if so, when? Really, just thinking ahead a bit, so let's leave this alone for now. It will probably get clarified with the release resources changes. PS10, Line 112: Blocks until all fragment instances have finished : /// their Prepare phase. > it's the externally observable behavior, and qem::startqueryhelper takes ad Okay. PS10, Line 132: query is complete > i agree the guarantees around resource release and lifetime need to be comp Agree we'll revisit this in a later change. http://gerrit.cloudera.org:8080/#/c/6535/11/be/src/runtime/query-state.h File be/src/runtime/query-state.h: PS11, Line 250: that's no longer defined; delete. PS11, Line 260: This is subtle as to why we have it so I think it would be helpful to explain in what cases you shouldn't try to cancel - when FIS haven't started yet. Or even better would be to rename this to 'fis_started' since that's what the caller really needs to care about. http://gerrit.cloudera.org:8080/#/c/6535/11/be/src/runtime/test-env.cc File be/src/runtime/test-env.cc: Line 83: exec_env_.reset(); what's this about? -- To view, visit http://gerrit.cloudera.org:8080/6535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
