Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14824 )

Change subject: IMPALA-9199: Add support for single query retries on cluster 
membership changes
......................................................................


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/14824/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14824/5//COMMIT_MSG@26
PS5, Line 26: A query cannot be retried once any results from the original query
            :   have been fetched, this is to prevent users from seeing 
incorrect results
Do we have any mechanism for a user to specify that they want their results to 
always be spooled until the query is complete to ensure that any failed queries 
that are retriable do in fact get retried?


http://gerrit.cloudera.org:8080/#/c/14824/5//COMMIT_MSG@41
PS5, Line 41: The Impala Web UI will list all retried queries as being in the
            :       "RETRIED" state
This may be a reasonable approach, but I think we should think about the user 
experience implications of this.

In particular, it means that a retried query will show up as two separate 
queries on the webui, which is potentially confusing, esp. since the retry will 
have a different query_id that won't correspond the anything user-facing (eg. 
won't be the query_id printed out by impala-shell) and as you've got the patch 
right now, I'm not even sure if there's a way to figure out which retries would 
correspond to which original queries. We could of course do something like add 
the retry query id to the profile of the original query, but then people have 
to click through to the profile to match things up.

There's also the problem that once CloseClientRequestState is called at the end 
of RetryQueryFromThreadPool, the original query will show up in the "Completed 
Queries" section (and may even disappear from there before the retry query 
completes if the query_log fills up), which is again probably confusing since 
from the perspective of the client that query is really still running.

Some of this could possibly wait until a follow up patch to work out, as long 
as we've got a plan.


http://gerrit.cloudera.org:8080/#/c/14824/5/be/src/service/client-request-state-map.h
File be/src/service/client-request-state-map.h:

http://gerrit.cloudera.org:8080/#/c/14824/5/be/src/service/client-request-state-map.h@38
PS5, Line 38: bool overwrite = false
I don't think this is being used anywhere anymore.


http://gerrit.cloudera.org:8080/#/c/14824/5/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/14824/5/be/src/service/client-request-state.h@82
PS5, Line 82: RETRYING, RETRIED, UNKNOWN
I think this approach, and the way you have to save the previous state and then 
sometimes use this or sometimes the previous state, is confusing.

Its probably more straight forward to just have a separate 'enum Retry State { 
RETRYING, RETRIED, NOT_RETRIED }

>From my other comments, if we add a ClientRequestState wrapper, it could go in 
>there, or even might not be necessary since the wrapper would already know if 
>we're retrying based on if there's multiple ClientRequestStates that its 
>wrapped around.


http://gerrit.cloudera.org:8080/#/c/14824/5/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/14824/5/be/src/service/impala-server.cc@632
PS5, Line 632:     shared_ptr<ClientRequestState> request_state = 
GetClientFacingRequestState(query_id);
Unless I'm missing something in this patch, this means that only the profile 
for the retry query will be available through the GetRuntimeProfile() api. 
Possibly fine to leave for followup work, but I think we definitely need to 
have some way of returning the profile for the original query too.


http://gerrit.cloudera.org:8080/#/c/14824/5/be/src/service/impala-server.cc@920
PS5, Line 920:   unique_ptr<TExecRequest> exec_request = 
make_unique<TExecRequest>();
So I'm wondering if this patch would benefit from adding another layer of 
abstraction.

What I have in mind is something like a wrapper around ClientRequestState that 
can contain multiple ClientRequestStates (though possibly we would name the 
wrapper ClientRequestState and rename the current ClientRequestState to 
something like QueryInstanceState or whatever).

Some benefits of doing this:
- It would provide a natural place for TExecRequest to live without doing this 
stuff with unique_ptr and std::move, as well as anything else that's common 
across all retries.
- It would allow us to hide some of the details of retries from impala-server, 
eg. we wouldn't need the logic around BlockOnWait that you've duplicated 
between the hs2 server and the beeswax server.


http://gerrit.cloudera.org:8080/#/c/14824/5/be/src/service/impala-server.cc@1025
PS5, Line 1025: query_ctx->query_id = UuidToQueryId(random_generator()());
As mentioned before, I definitely think it would be awesome if we could create 
a separation between "user facing query_id" and "internal query_id".

It would solve some user experisnce problems, for example right now if someone 
is debugging issues and looking through the logs, they're going to have to 
manually figure out the mapping from original query_id to retry query_id from 
looking at the profile or something (or I guess we can log the mapping). 
Similarly, the issues I've mentioned elsewhere with the webui. And in general 
its a cleaner abstraction than the current "the first query_id becomes the 
user-facing query_id, and the subsequent retry ids are treated differently".

Its also potentially a lot more work, since we use query_id all over the place 
and we'd have to think about the places where we want to use "internal" vs. 
"user facing" id and pass both through in a lot of places.

Maybe a compromise would be to do something clever like have all of the 
query_ids for a query and its retries share the same 'hi' with different 'lo's 
to make it obvious which queries correspond to each other.



--
To view, visit http://gerrit.cloudera.org:8080/14824
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2e4a0e72a9bf8ec10b91639aefd81bef17886ddd
Gerrit-Change-Number: 14824
Gerrit-PatchSet: 5
Gerrit-Owner: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Comment-Date: Fri, 31 Jan 2020 21:12:11 +0000
Gerrit-HasComments: Yes

Reply via email to