Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14824 )

Change subject: IMPALA-9199: Add support for single query retries on cluster 
membership changes
......................................................................


Patch Set 9:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/14824/9/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/14824/9/be/src/runtime/coordinator.cc@502
PS9, Line 502:         RETURN_IF_ERROR(HandleFailedExecRpc(backend_state));
For this loop, do we have any expectation about the number of BackendStates 
that have errors? In other words, can more than one have errors?

If there can be more than one error, we would call TryQueryRetry more than 
once, but based on my reading of that function, it can handle that. Am I right?

Is there any concern that if there is more than one error, we may want to 
blacklist more than one Impalad before retrying?


http://gerrit.cloudera.org:8080/#/c/14824/9/be/src/runtime/query-driver.h
File be/src/runtime/query-driver.h:

http://gerrit.cloudera.org:8080/#/c/14824/9/be/src/runtime/query-driver.h@116
PS9, Line 116: a
             :   /// statestore updates
I think this should be singular "a statestore update"


http://gerrit.cloudera.org:8080/#/c/14824/9/be/src/runtime/query-driver.h@122
PS9, Line 122:   Status TryQueryRetry(ClientRequestState* client_request_state, 
Status* status,
             :       bool* was_retried = nullptr) WARN_UNUSED_RESULT;
>From reading through TryQueryRetry(), it looks like this is safe to call 
>multiple times and it will only do the retry once. Is that true? Can we add a 
>comment here that spells out that it can be called more than once?


http://gerrit.cloudera.org:8080/#/c/14824/9/be/src/service/impala-hs2-server.cc
File be/src/service/impala-hs2-server.cc:

http://gerrit.cloudera.org:8080/#/c/14824/9/be/src/service/impala-hs2-server.cc@763
PS9, Line 763:   ClientRequestState* request_state = nullptr;
> So I'm wondering if its safe to no longer have any shared_ptr here and in t
I had the same concern.


http://gerrit.cloudera.org:8080/#/c/14824/9/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/14824/9/be/src/service/impala-server.h@673
PS9, Line 673: ClientRequestState*&
Stylistically, I personally don't like mutable references for out arguments. 
Using ** instead makes it much clearer that values are being passed between 
functions. Callers need to pass addresses. The function itself can't treat it 
like a local variable.


http://gerrit.cloudera.org:8080/#/c/14824/9/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/14824/9/be/src/service/impala-server.cc@1453
PS9, Line 1453:   ClientRequestState::RetryState retry_state;
Nit: you could move this declaration down past the first BlockOnWait()


http://gerrit.cloudera.org:8080/#/c/14824/9/tests/custom_cluster/test_query_retries.py
File tests/custom_cluster/test_query_retries.py:

http://gerrit.cloudera.org:8080/#/c/14824/9/tests/custom_cluster/test_query_retries.py@93
PS9, Line 93: Increase the statestore
            :     heartbeat frequency so that the query actually fails during 
execution.
It looks like we are using the default value for 
statestore_heartbeat_frequency_ms. Is that intentional?

If I understand this correctly, this test uses the heavy shuffle query and 
should see an RPC failure. We are not expecting the statestore to see the 
Impalad as dead. Is that right?



--
To view, visit http://gerrit.cloudera.org:8080/14824
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2e4a0e72a9bf8ec10b91639aefd81bef17886ddd
Gerrit-Change-Number: 14824
Gerrit-PatchSet: 9
Gerrit-Owner: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Comment-Date: Mon, 27 Apr 2020 17:48:11 +0000
Gerrit-HasComments: Yes

Reply via email to