Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/14824 )
Change subject: IMPALA-9199: Add support for single query retries on cluster membership changes ...................................................................... Patch Set 5: (7 comments) http://gerrit.cloudera.org:8080/#/c/14824/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14824/5//COMMIT_MSG@26 PS5, Line 26: A query cannot be retried once any results from the original query : have been fetched, this is to prevent users from seeing incorrect results Do we have any mechanism for a user to specify that they want their results to always be spooled until the query is complete to ensure that any failed queries that are retriable do in fact get retried? http://gerrit.cloudera.org:8080/#/c/14824/5//COMMIT_MSG@41 PS5, Line 41: The Impala Web UI will list all retried queries as being in the : "RETRIED" state This may be a reasonable approach, but I think we should think about the user experience implications of this. In particular, it means that a retried query will show up as two separate queries on the webui, which is potentially confusing, esp. since the retry will have a different query_id that won't correspond the anything user-facing (eg. won't be the query_id printed out by impala-shell) and as you've got the patch right now, I'm not even sure if there's a way to figure out which retries would correspond to which original queries. We could of course do something like add the retry query id to the profile of the original query, but then people have to click through to the profile to match things up. There's also the problem that once CloseClientRequestState is called at the end of RetryQueryFromThreadPool, the original query will show up in the "Completed Queries" section (and may even disappear from there before the retry query completes if the query_log fills up), which is again probably confusing since from the perspective of the client that query is really still running. Some of this could possibly wait until a follow up patch to work out, as long as we've got a plan. http://gerrit.cloudera.org:8080/#/c/14824/5/be/src/service/client-request-state-map.h File be/src/service/client-request-state-map.h: http://gerrit.cloudera.org:8080/#/c/14824/5/be/src/service/client-request-state-map.h@38 PS5, Line 38: bool overwrite = false I don't think this is being used anywhere anymore. http://gerrit.cloudera.org:8080/#/c/14824/5/be/src/service/client-request-state.h File be/src/service/client-request-state.h: http://gerrit.cloudera.org:8080/#/c/14824/5/be/src/service/client-request-state.h@82 PS5, Line 82: RETRYING, RETRIED, UNKNOWN I think this approach, and the way you have to save the previous state and then sometimes use this or sometimes the previous state, is confusing. Its probably more straight forward to just have a separate 'enum Retry State { RETRYING, RETRIED, NOT_RETRIED } >From my other comments, if we add a ClientRequestState wrapper, it could go in >there, or even might not be necessary since the wrapper would already know if >we're retrying based on if there's multiple ClientRequestStates that its >wrapped around. http://gerrit.cloudera.org:8080/#/c/14824/5/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/14824/5/be/src/service/impala-server.cc@632 PS5, Line 632: shared_ptr<ClientRequestState> request_state = GetClientFacingRequestState(query_id); Unless I'm missing something in this patch, this means that only the profile for the retry query will be available through the GetRuntimeProfile() api. Possibly fine to leave for followup work, but I think we definitely need to have some way of returning the profile for the original query too. http://gerrit.cloudera.org:8080/#/c/14824/5/be/src/service/impala-server.cc@920 PS5, Line 920: unique_ptr<TExecRequest> exec_request = make_unique<TExecRequest>(); So I'm wondering if this patch would benefit from adding another layer of abstraction. What I have in mind is something like a wrapper around ClientRequestState that can contain multiple ClientRequestStates (though possibly we would name the wrapper ClientRequestState and rename the current ClientRequestState to something like QueryInstanceState or whatever). Some benefits of doing this: - It would provide a natural place for TExecRequest to live without doing this stuff with unique_ptr and std::move, as well as anything else that's common across all retries. - It would allow us to hide some of the details of retries from impala-server, eg. we wouldn't need the logic around BlockOnWait that you've duplicated between the hs2 server and the beeswax server. http://gerrit.cloudera.org:8080/#/c/14824/5/be/src/service/impala-server.cc@1025 PS5, Line 1025: query_ctx->query_id = UuidToQueryId(random_generator()()); As mentioned before, I definitely think it would be awesome if we could create a separation between "user facing query_id" and "internal query_id". It would solve some user experisnce problems, for example right now if someone is debugging issues and looking through the logs, they're going to have to manually figure out the mapping from original query_id to retry query_id from looking at the profile or something (or I guess we can log the mapping). Similarly, the issues I've mentioned elsewhere with the webui. And in general its a cleaner abstraction than the current "the first query_id becomes the user-facing query_id, and the subsequent retry ids are treated differently". Its also potentially a lot more work, since we use query_id all over the place and we'd have to think about the places where we want to use "internal" vs. "user facing" id and pass both through in a lot of places. Maybe a compromise would be to do something clever like have all of the query_ids for a query and its retries share the same 'hi' with different 'lo's to make it obvious which queries correspond to each other. -- To view, visit http://gerrit.cloudera.org:8080/14824 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2e4a0e72a9bf8ec10b91639aefd81bef17886ddd Gerrit-Change-Number: 14824 Gerrit-PatchSet: 5 Gerrit-Owner: Sahil Takiar <stak...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Comment-Date: Fri, 31 Jan 2020 21:12:11 +0000 Gerrit-HasComments: Yes