Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )
Change subject: IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
......................................................................

Patch Set 12:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/exec/hdfs-parquet-table-writer.h
File be/src/exec/hdfs-parquet-table-writer.h:

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/exec/hdfs-parquet-table-writer.h@199
PS10, Line 199:
> nit: should #include the appropriate .pb.h here ("include-what-you-use")
Done

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.h@60
PS10, Line 60: const Coordinator&
> I think this can probably change back to being const if you take the sugges
Done

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.h@176
PS10, Line 176: last_report_ti
> nit: I think the term "sequence number" is more usual here -- "version" to
Done

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.h@220
PS10, Line 220: /// Backend exec params, owned by the QuerySchedule and has query lifetime.
> This "back pointer" still seems error-prone to me.
> I think the object lifet
Done

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.cc@267
PS10, Line 267: return num_remaining_instances_ == 0 || !status_.ok();
> I think a VLOG_QUERY about the skipped RPC is probably useful
Done

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/coordinator-backend-state.cc@294
PS10, Line 294: DCHECK(!instance_stats->done_);
> nit: why not:
The ctor was marked explicit so not sure it's allowed:
"explicit Status(const StatusPB& status);"

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/query-state.cc@287
PS10, Line 287: atus = report
> Ah, I missed that we join on the reporter thread first.
Good idea about using DFAKE_MUTEX(). Also switched to using a non-atomic. Also simplified the logic in Coordinator::BackendState::ApplyExecStatusReport() as we can rely purely on the sequence number as you suggested.

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/query-state.cc@375
PS10, Line 375: ReportExecStatusResponsePB resp;
> should we have a failure injection point on the RPC itself? I only saw fail
Please find the tests in test_rpc_timeout.py which:
1. inject delays in the RPC handler to induce timeout
2. run with a very short service queue to emulate a busy server.

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/query-state.cc@379
PS10, Line 379: reak;
> should we backoff?
I will refrain from changing the logic here too much. There will be a follow up patch after IMPALA-4063 which will change the retry logic. TODO added.

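For readers following the query-state.cc@287 reply above, here is a minimal sketch of what "relying purely on the sequence number" can mean when applying status reports. It is not the code in this patch (which is C++). The class `BackendState`, the field `last_report_seq_no`, and the method `apply_exec_status_report` are hypothetical stand-ins, and the sketch assumes each report carries a sequence number that the sender increases with every new report.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BackendState:
    """Hypothetical coordinator-side record for one backend (illustration only)."""

    # Highest sequence number applied so far; 0 means nothing applied yet.
    last_report_seq_no: int = 0
    # Payloads that were actually applied, kept only to make the effect visible.
    applied: List[str] = field(default_factory=list)

    def apply_exec_status_report(self, seq_no: int, payload: str) -> bool:
        """Apply a report only if it is newer than the last one applied.

        Returns True if the report was applied and False if it was dropped
        as stale or duplicate, for example a retried RPC whose first
        attempt already reached the coordinator.
        """
        if seq_no <= self.last_report_seq_no:
            return False
        self.last_report_seq_no = seq_no
        self.applied.append(payload)
        return True
```

Under these assumptions a retried or reordered report is ignored rather than applied twice, so the receiver needs no extra per-report bookkeeping beyond the last sequence number it has seen.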
http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/runtime/runtime-state.cc@202
PS10, Line 202: }
> the method doc says that new_errors is cleared, but it's actually written i
This was lost after refactoring this function. Fixed now.

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/service/control-service.cc
File be/src/service/control-service.cc:

PS10:
> This is a general krpc-in-Impala question: I can't find where you set up au
Very good point. This is definitely a bug and it's now fixed in this commit here (https://github.com/apache/impala/commit/5c541b960491ba91533712144599fb3b6d99521d)

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/util/uid-util.h
File be/src/util/uid-util.h:

http://gerrit.cloudera.org:8080/#/c/10855/10/be/src/util/uid-util.h@79
PS10, Line 79: DCHECK(uid_pb.IsInitialized());
> worth DCHECKs here that the fields are set by calling uid_pb.IsInitialized(
Done

http://gerrit.cloudera.org:8080/#/c/10855/10/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/10855/10/bin/impala-config.sh@562
PS10, Line 562: export HBASE_CONF_DIR="$IMPALA_FE_DIR/src/test/resources"
> why's this necessary? Can we change cmake to invoke it from the full path i
FindProtobuf should have set PROTOBUF_PROTOC_EXECUTABLE. Not sure why I needed to set it before.

http://gerrit.cloudera.org:8080/#/c/10855/10/tests/custom_cluster/test_rpc_exception.py
File tests/custom_cluster/test_rpc_exception.py:

http://gerrit.cloudera.org:8080/#/c/10855/10/tests/custom_cluster/test_rpc_exception.py@97
PS10, Line 97:
> can we change this flag to be in millis instead of seconds? Or do we advert
I don't think this flag is documented as far as I understand. We can deprecate this old flag and rename it to include '_ms' suffix.

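The rename suggested in the last reply (deprecate the seconds-based flag, add one with an '_ms' suffix) could be handled roughly as sketched below. This only illustrates the idea and is not part of the patch. The function `resolve_timeout_ms`, its parameter names, and the 5000 ms default are all hypothetical, since the thread does not name the flag.

```python
from typing import Optional


def resolve_timeout_ms(
    timeout_ms: Optional[int],
    deprecated_timeout_s: Optional[int],
    default_ms: int = 5000,  # hypothetical default, not taken from the thread
) -> int:
    """Pick the effective timeout in milliseconds.

    The new '_ms' value wins when it is set. Otherwise the deprecated
    seconds-based value is converted, and if neither is set the default
    is used.
    """
    if timeout_ms is not None:
        return timeout_ms
    if deprecated_timeout_s is not None:
        return deprecated_timeout_s * 1000
    return default_ms
```

Keeping the old flag as a fallback lets existing test configurations continue to work while new ones gain millisecond resolution.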
--
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 12
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Comment-Date: Thu, 06 Sep 2018 17:48:58 +0000
Gerrit-HasComments: Yes