Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12049 )

Change subject: IMPALA-4555: Make QueryState's status reporting more robust
......................................................................


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/12049/1/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/12049/1/be/src/runtime/coordinator-backend-state.cc@299
PS1, Line 299: report_seq_no <= instance_stats->last_report_seq_no_ || 
instance_stats->done_) {
Wouldn't this be a problem for intermediate report ? If the coordinator is slow 
to respond to the RPC, the client side (i.e. the executor) may assume that the 
intermediate reports were never applied even if it was actually applied by the 
coordinator. In which case, the new report may come with a larger sequence 
number which is larger than the old one and that may cause duplicated report.

May be it makes sense to keep track of whether a FIS' report was successfully 
applied at the executor side and bump the sequence number if the last one was 
applied successfully. Not sure if there is any corner case which may not have 
been considered in this alternate approach ?!


http://gerrit.cloudera.org:8080/#/c/12049/1/be/src/runtime/fragment-instance-state.h
File be/src/runtime/fragment-instance-state.h:

http://gerrit.cloudera.org:8080/#/c/12049/1/be/src/runtime/fragment-instance-state.h@99
PS1, Line 99: void SetFinalReportSent() { final_report_sent_ = true; }
May help to add comments on the expected caller and on when this is expected to 
be called ?


http://gerrit.cloudera.org:8080/#/c/12049/1/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/12049/1/be/src/runtime/query-state.h@417
PS1, Line 417: (bool final
Comment on the input parameter.


http://gerrit.cloudera.org:8080/#/c/12049/1/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/12049/1/be/src/runtime/query-state.cc@58
PS1, Line 58: DEFINE_int32(status_report_max_failures, 3,
            :     "Max number of consecutive failed status reports to allow 
before cancelling");
I thought we want to use a fixed timeout approach for the maximum retries ? I 
am okay with max_retries too but I think it may make sense to have larger 
number of retries than 3.


http://gerrit.cloudera.org:8080/#/c/12049/1/be/src/runtime/query-state.cc@324
PS1, Line 324: Try to send the RPC 3 times before failing. Sleep for 100ms 
between retries.
May need to be updated.



--
To view, visit http://gerrit.cloudera.org:8080/12049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6007013fc2c9e8eeba11b752ee58fb3038da971
Gerrit-Change-Number: 12049
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Marshall <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Comment-Date: Mon, 10 Dec 2018 21:56:58 +0000
Gerrit-HasComments: Yes

Reply via email to