Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/10813 )
Change subject: IMPALA-7163: Implement a state machine for the QueryState class ...................................................................... Patch Set 6: (15 comments) http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-exec-mgr.cc File be/src/runtime/query-exec-mgr.cc: http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-exec-mgr.cc@128 PS4, Line 128: > This seems to fit into IMPALA-4063 better. As the code stands in this patch I removed this and moved the code inside StartFInstances(). I'll bring this back with IMPALA-4063 http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.h File be/src/runtime/query-state.h: http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.h@159 PS4, Line 159: until all > IMPALA-4063 Removed this. http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.h@278 PS4, Line 278: /// CANCELLED: This query received a CancelQueryFInstances() RPC or was directed by > No implication on whether all fragment instances have finished cancelling t Yes, added a line that explains that. http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.h@292 PS4, Line 292: ackendExecS > Did you mean IMPALA-4063 ? Does it mean this comment needs to be removed. M Yes I meant IMPALA-4063. Removed this. http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.h@350 PS4, Line 350: /// created in St > std::unique_ptr Done http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.h@355 PS4, Line 355: /// whether they > std::unique_ptr Done http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@a350 PS4, Line 350: : > Was this a bug ?! Will we hang in the for loop at line 363 below if we fail This technically is a race, but we never hit it because none of the fragment instances try to access fis_map_. I changed it back to the original way. The reason I had this change in the first place was because one of my intermediate patches hit that error, but I don't need this change now. http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@141 PS4, Line 141: instances_prepared_barrier_ > As discussed offline, the behavior during "preparation" phase of the query I changed this based on our offline discussion. The gist is that I removed the FIS::prepared_promise_ since it was unnecessary and introduced a 'status_lock_' and a 'prepared_status_'. Each FIS will try to update 'prepared_status_' on its own when it hits an error. http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@225 PS4, Line 225: DCHECK(false > DCHECK(false) ?! Does this warrant a targeted BE test ? Oops, fixed. I don't think we'll be able to write a meaningful BE test for the state machine, because we'll be forcing it into cases we never expect the code to hit, just to see an error message. http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@230 PS4, Line 230: s QueryState::UpdateBackendExecState( > This function seems like a no-op in this patch. Can this be added in the ne Yes, I removed it. http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@278 PS4, Line 278: ReportExecStatusAux(done, status, fis, true); : } : : void QueryState::ReportExecStatusAux(bool done, const Status& status, : FragmentInstanceState* fis, bool instances_started) { : // if we're reporting an error, we're done : DCHECK(status.ok() || done > Will this actually be duplicating the error message which we usually log al The first operator to hit an error will log the error itself. This is a query wide acknowledgement of that error in an executor. Subsequent errors hit in the same query won't show up here. We have a pattern of multiple places logging the same error already, and I think it's helpful if we have an acknowledgement of the query status of a query in an executor. Anyway, since we discussed and agreed to remove this, I'll do so. http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@391 PS4, Line 391: Status status = DescriptorTbl::Create(&o > As discussed offline, there is no need to have a different implementation f Changed based on our offline discussion. http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@434 PS4, Line 434: rence count > IMPALA-4063 ? Done http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@502 PS4, Line 502: initiate cancellation if nobody > Please see comments in query-exec-mgr.cc Done http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@542 PS4, Line 542: : > Not sure what you think about the following. It appears to me that we may n As we discussed, we can't differentiate between the different states here. i.e. Prepare() and Open() and ExecInternal() all are called within fis->Exec(). So we won't know which stage failed if we're planning to notify the barriers here. Also, this function runs in the context of a FIS thread, so sending a final profile here in case of an error is not a good idea, as we want all profiles to be sent by the query state thread. -- To view, visit http://gerrit.cloudera.org:8080/10813 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe Gerrit-Change-Number: 10813 Gerrit-PatchSet: 6 Gerrit-Owner: Sailesh Mukil <sail...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com> Gerrit-Comment-Date: Wed, 11 Jul 2018 20:39:48 +0000 Gerrit-HasComments: Yes