Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12049 )

Change subject: IMPALA-4555: Make QueryState's status reporting more robust
......................................................................


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/coordinator-backend-state.cc@300
PS2, Line 300: nstance_exec_status :
> Yes, but instance_stats->done_ can be set by an intermediate report (if tha
Thanks for the explanation. This seems a bit subtle indeed.

Would things be simpler if we just stop bumping the sequence number after 
generating the last report for a fragment instance ? In other words, we need to 
record the fact that the final report may have been generated already, which is 
different from whether final report has been sent.


http://gerrit.cloudera.org:8080/#/c/12049/3/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/12049/3/be/src/runtime/query-state.h@394
PS3, Line 394: succeeds
nit: succeeded, matching "failed" below.


http://gerrit.cloudera.org:8080/#/c/12049/3/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/12049/3/be/src/runtime/query-state.cc@382
PS3, Line 382:         it = report_.mutable_instance_exec_status()->erase(it);
             :         profile_it = 
profiles_forest_.profile_trees.erase(profile_it);
Reading the documentation of vector::erase(), it's not exactly an efficient 
operation as it seems to do quite a bit of copying. Doing this in a list may 
end up being O(n^2) here.

I suppose the reason for doing this dance here is to make UpdateReport() work 
on both code paths of ReportSuccessful() and ReportFailed().

Assuming the common case is that the report succeeds, it seems to be doing 
quite a bit of extra work just to erase each element one by one. Shouldn't the 
penalty of copying be paid instead when the report fails, which is presumably 
not too common ? In other words, should we just copy / move the stateful report 
into the fragment instance state instead ?

A slightly more efficient approach of what this loop is trying to achieve could 
be swapping elements at the end of the list to this spot being erased.  In 
which case, it would be O(n) worst case. However, I am not sure if this 
complexity is warranted.


http://gerrit.cloudera.org:8080/#/c/12049/2/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/12049/2/common/protobuf/control_service.proto@117
PS2, Line 117:   // Sequence number prevents out-of-order or duplicated updates 
from being applied.
             :   optional int64 report_seq_no = 1;
May be worth documenting the relationship of this seq_no with that in 
FragmentInstanceExecStatusPB. If I understand it correctly, this one should be 
<= that in FragmentInstanceExecStatusPB.


http://gerrit.cloudera.org:8080/#/c/12049/2/common/protobuf/control_service.proto@138
PS2, Line 138:   // Cumulative structural changes made by the table sink of 
this fragment
             :   // instance. This is sent only when 'done' above is true. Not 
idempotent.
             :   optional DmlExecStatusPB dml_exec_status = 5;
> Its not quite straight-forward to do, so given this patch is already reason
Sounds good to me.



--
To view, visit http://gerrit.cloudera.org:8080/12049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6007013fc2c9e8eeba11b752ee58fb3038da971
Gerrit-Change-Number: 12049
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Marshall <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Thomas Marshall <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 24 Jan 2019 21:55:21 +0000
Gerrit-HasComments: Yes

Reply via email to