[Impala-CR](cdh5-trunk) IMPALA-2626: In-flight queries fail when statestore comes back online.
Jim Apple has abandoned this change.
Change subject: IMPALA-2626: In-flight queries fail when statestore comes back
online.
..
Abandoned
--
To view, visit http://gerrit.cloudera.org:8080/1380
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: abandon
Gerrit-Change-Id: I102391ab63270a9686cf45457b8384ffcd2abe8a
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil
Gerrit-Reviewer: Henry Robinson
Gerrit-Reviewer: Jim Apple
Gerrit-Reviewer: Sailesh Mukil
[Impala-CR](cdh5-trunk) IMPALA-2626: In-flight queries fail when statestore comes back online.
Jim Apple has posted comments on this change.
Change subject: IMPALA-2626: In-flight queries fail when statestore comes back
online.
..
Patch Set 2:
Please update to using the new gerrit project, "Impala-ASF". Instructions are
here:
https://cwiki.apache.org/confluence/display/IMPALA/How+to+switch+to+Apache-hosted+git
Pushes to this project will be disabled on October 1.
--
To view, visit http://gerrit.cloudera.org:8080/1380
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I102391ab63270a9686cf45457b8384ffcd2abe8a
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil
Gerrit-Reviewer: Henry Robinson
Gerrit-Reviewer: Jim Apple
Gerrit-Reviewer: Sailesh Mukil
Gerrit-HasComments: No
[Impala-CR](cdh5-trunk) IMPALA-2626: In-flight queries fail when statestore comes back online.
Henry Robinson has posted comments on this change.
Change subject: IMPALA-2626: In-flight queries fail when statestore comes back
online.
..
Patch Set 2:
Sailesh - are you still working on this one?
--
To view, visit http://gerrit.cloudera.org:8080/1380
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I102391ab63270a9686cf45457b8384ffcd2abe8a
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil
Gerrit-Reviewer: Henry Robinson
Gerrit-Reviewer: Sailesh Mukil
Gerrit-HasComments: No
[Impala-CR](cdh5-trunk) IMPALA-2626: In-flight queries fail when statestore comes back online.
Henry Robinson has posted comments on this change.
Change subject: IMPALA-2626: In-flight queries fail when statestore comes back
online.
..
Patch Set 2:
How does your patch handle that case? I think it waits for the next topic
update that has some entries in it, and then computes the diff.
Why not do the same thing: If the node has disappeared from the topic, but
there has been no deletion event, wait for another topic update (or some
number) before declaring the node dead. This generalises your current patch to
handle the case where the topic update contains a partial update, and makes
Impala a bit more robust to slow recovery from a statestore failure.
--
To view, visit http://gerrit.cloudera.org:8080/1380
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I102391ab63270a9686cf45457b8384ffcd2abe8a
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil
Gerrit-Reviewer: Henry Robinson
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Reviewer: Tim Armstrong
Gerrit-HasComments: No
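The "wait for another topic update (or some number)" idea in Henry's Patch Set 2 comment could be sketched roughly as below. This is only an illustration: MissingBackendTracker, OnTopicUpdate, and max_missed_updates are hypothetical names that do not come from the patch or from Impala, and backends are reduced to plain string IDs.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper, not part of Impala: counts how many consecutive topic
// updates each known backend has been missing from.
class MissingBackendTracker {
 public:
  explicit MissingBackendTracker(int max_missed_updates)
      : max_missed_updates_(max_missed_updates) {}

  // Processes one topic update and returns the backends to treat as dead.
  //   known:   backends the subscriber currently believes are alive
  //   present: backends listed in this update
  //   deleted: backends with an explicit deletion in this update
  std::vector<std::string> OnTopicUpdate(const std::set<std::string>& known,
                                         const std::set<std::string>& present,
                                         const std::set<std::string>& deleted) {
    std::vector<std::string> dead;
    for (const std::string& backend : known) {
      if (deleted.count(backend) > 0) {
        // An explicit deletion is authoritative: fail immediately.
        missed_.erase(backend);
        dead.push_back(backend);
      } else if (present.count(backend) > 0) {
        // Seen again: reset the grace counter.
        missed_.erase(backend);
      } else if (++missed_[backend] >= max_missed_updates_) {
        // Missing without a deletion for too many updates in a row.
        missed_.erase(backend);
        dead.push_back(backend);
      }
    }
    return dead;
  }

 private:
  const int max_missed_updates_;
  std::map<std::string, int> missed_;
};
```

With max_missed_updates set to 2, a backend that is absent from a single partial update after a statestore restart is kept, and it is only declared dead if it is still absent from the following update.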
[Impala-CR](cdh5-trunk) IMPALA-2626: In-flight queries fail when statestore comes back online.
Sailesh Mukil has posted comments on this change.
Change subject: IMPALA-2626: In-flight queries fail when statestore comes back
online.
..
Patch Set 2:
> My suggestion is that there's some way to tell whether a backend
> was removed because it failed, or because the statestore restarted,
> because in the former case you get a deletion notification, and in
> the other it just stops showing up in the topic.
Yes, but what if one or more nodes go down at the same time as the statestore?
The statestore wouldn't send a deletion for those nodes because it wouldn't
know they existed, and so the query would never get cancelled.
Also, I would think that this could happen with a non-negligible chance on
larger clusters, so it's safer to be pessimistic in this case.
--
To view, visit http://gerrit.cloudera.org:8080/1380
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I102391ab63270a9686cf45457b8384ffcd2abe8a
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil
Gerrit-Reviewer: Henry Robinson
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Reviewer: Tim Armstrong
Gerrit-HasComments: No
[Impala-CR](cdh5-trunk) IMPALA-2626: In-flight queries fail when statestore comes back online.
Henry Robinson has posted comments on this change.
Change subject: IMPALA-2626: In-flight queries fail when statestore comes back
online.
..
Patch Set 1:
My suggestion is that there's some way to tell whether a backend was removed
because it failed, or because the statestore restarted, because in the former
case you get a deletion notification, and in the other it just stops showing up
in the topic.
--
To view, visit http://gerrit.cloudera.org:8080/1380
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I102391ab63270a9686cf45457b8384ffcd2abe8a
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil
Gerrit-Reviewer: Henry Robinson
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Reviewer: Tim Armstrong
Gerrit-HasComments: No
[Impala-CR](cdh5-trunk) IMPALA-2626: In-flight queries fail when statestore comes back online.
Sailesh Mukil has uploaded a new patch set (#2).
Change subject: IMPALA-2626: In-flight queries fail when statestore comes back
online.
..
IMPALA-2626: In-flight queries fail when statestore comes back online.
During a session, if the statestore goes down, the impalads can continue
execution without the statestore, using the stale metadata that they possess.
However, when the statestore comes back online, the first membership callback
it makes to the impalad hosts erases the "known_backends" list that the
impalads have stored locally. Therefore, in-flight queries fail.
This patch makes sure that when the impalad is reconnected with the statestore,
it does not delete its 'known_backends' list if there are zero topic entry
updates from the statestore.
The in-flight queries can still fail if the initial backend list from the
statestore does not contain all the backends that the impalad is already
working with on the in-flight query.
Change-Id: I102391ab63270a9686cf45457b8384ffcd2abe8a
---
M be/src/service/impala-server.cc
1 file changed, 3 insertions(+), 2 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/80/1380/2
--
To view, visit http://gerrit.cloudera.org:8080/1380
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I102391ab63270a9686cf45457b8384ffcd2abe8a
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil
Gerrit-Reviewer: Henry Robinson
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Reviewer: Tim Armstrong
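The behaviour described in the patch set 2 commit message can be illustrated with a small sketch. It is not the actual diff to impala-server.cc: TopicEntry, TopicDelta, BackendMap, and ApplyMembershipUpdate are simplified, hypothetical stand-ins, and only the guard on is_delta and topic_entries mirrors what the thread discusses.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the statestore topic types; the real
// structures in Impala carry more fields.
struct TopicEntry {
  std::string key;    // backend ID
  std::string value;  // backend address
};

struct TopicDelta {
  bool is_delta = false;                     // false means a full update
  std::vector<TopicEntry> topic_entries;     // added or updated backends
  std::vector<std::string> topic_deletions;  // explicitly removed backends
};

using BackendMap = std::map<std::string, std::string>;

// Applies a membership update to the locally stored backend map.
void ApplyMembershipUpdate(const TopicDelta& delta, BackendMap* known_backends) {
  // A full (non-delta) update normally replaces the whole map. Skip the reset
  // when the update carries no entries, which is what a freshly restarted
  // statestore sends before any subscriber has re-registered.
  if (!delta.is_delta && !delta.topic_entries.empty()) {
    known_backends->clear();
  }
  for (const TopicEntry& entry : delta.topic_entries) {
    (*known_backends)[entry.key] = entry.value;
  }
  for (const std::string& key : delta.topic_deletions) {
    known_backends->erase(key);
  }
}
```

An empty full update leaves the map untouched, while a full update that lists only some of the backends still drops the others. That second case is the limitation the last paragraph of the commit message acknowledges.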
[Impala-CR](cdh5-trunk) IMPALA-2626: In-flight queries fail when statestore comes back online.
Sailesh Mukil has posted comments on this change.
Change subject: IMPALA-2626: In-flight queries fail when statestore comes back
online.
..
Patch Set 1:
(1 comment)
> (1 comment)
>
> I think this only fixes a particular instance of the problem: if
> the statestore hasn't yet got updates from all the subscribers, it
> will send a partial update which will have roughly the same effect
> (since most queries run on all machines).
>
> Doesn't the statestore give a list of deletions with an update?
> Presumably if it restarts, it won't send deletions for any entries
> because it never knew they existed. The subscriber could only
> cancel queries on nodes for which there is an actual deletion (i.e.
> the node was known to have failed), but not include the missing
> nodes in any new scheduling decisions.
Yes, you're right: it only fixes the problem if the statestore's first callback
after coming back up is empty. I've mentioned that in the last paragraph of the
commit message.
If the statestore comes back up and gets updates only from a few subscribers,
it sends a partial update. But it's hard to determine at that point whether
this callback is a partial update or a complete update, which would mean that
all the hosts missing from it actually went down. Due to this ambiguity, we
handle only the empty update case.
Before this patch, when the statestore sends an empty update, the
known_backends_ map gets cleared, so all queries get cancelled.
The deletions remove individual backends from the map, but that doesn't matter
if the map is already empty. In short, if a backend is not in the
known_backends_ map, the queries running on that backend are cancelled.
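To make the cancellation rule above concrete, here is a minimal sketch. QueriesToCancel and its parameters are hypothetical names, and queries and backends are reduced to string IDs rather than Impala's real types:

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns the IDs of queries that have at least one
// fragment running on a backend that is absent from known_backends.
std::vector<std::string> QueriesToCancel(
    const std::map<std::string, std::set<std::string>>& query_backends,
    const std::set<std::string>& known_backends) {
  std::vector<std::string> to_cancel;
  for (const auto& query : query_backends) {
    for (const std::string& backend : query.second) {
      if (known_backends.count(backend) == 0) {
        to_cancel.push_back(query.first);
        break;  // one missing backend is enough to cancel the query
      }
    }
  }
  return to_cancel;
}
```

If known_backends is empty, every query that runs on at least one backend is returned, which matches the "all queries get cancelled" outcome of an empty update before the patch.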
http://gerrit.cloudera.org:8080/#/c/1380/1/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:
Line 1377: if (!delta.is_delta) {
> prefer
Done
--
To view, visit http://gerrit.cloudera.org:8080/1380
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I102391ab63270a9686cf45457b8384ffcd2abe8a
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil
Gerrit-Reviewer: Henry Robinson
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Reviewer: Tim Armstrong
Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-2626: In-flight queries fail when statestore comes back online.
Henry Robinson has posted comments on this change.
Change subject: IMPALA-2626: In-flight queries fail when statestore comes back
online.
..
Patch Set 1:
(1 comment)
I think this only fixes a particular instance of the problem: if the statestore
hasn't yet got updates from all the subscribers, it will send a partial update
which will have roughly the same effect (since most queries run on all
machines).
Doesn't the statestore give a list of deletions with an update? Presumably if
it restarts, it won't send deletions for any entries because it never knew they
existed. The subscriber could only cancel queries on nodes for which there is
an actual deletion (i.e. the node was known to have failed), but not include
the missing nodes in any new scheduling decisions.
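One way to read this suggestion is sketched below, purely as an illustration. MembershipView and ClassifyBackends are hypothetical names, not Impala code: only explicitly deleted backends are treated as failed, and only backends present in the update are offered to the scheduler, so a backend that is merely missing lands in neither set.

```cpp
#include <set>
#include <string>

// Hypothetical result type: how the subscriber would treat each backend.
struct MembershipView {
  std::set<std::string> failed;       // cancel queries running on these
  std::set<std::string> schedulable;  // usable for new scheduling decisions
};

// known:   backends the subscriber had before this update
// present: backends listed in the update
// deleted: backends the statestore explicitly reported as removed
MembershipView ClassifyBackends(const std::set<std::string>& known,
                                const std::set<std::string>& present,
                                const std::set<std::string>& deleted) {
  MembershipView view;
  for (const std::string& backend : known) {
    // Only an explicit deletion counts as a known failure.
    if (deleted.count(backend) > 0) view.failed.insert(backend);
  }
  for (const std::string& backend : present) {
    // Backends that are merely missing are left out of new scheduling
    // decisions, but queries already running on them are not cancelled.
    if (deleted.count(backend) == 0) view.schedulable.insert(backend);
  }
  return view;
}
```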
http://gerrit.cloudera.org:8080/#/c/1380/1/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:
Line 1377: if (!delta.is_delta) {
prefer
!delta.is_delta && delta.topic_entries.size() > 0
--
To view, visit http://gerrit.cloudera.org:8080/1380
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I102391ab63270a9686cf45457b8384ffcd2abe8a
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil
Gerrit-Reviewer: Henry Robinson
Gerrit-Reviewer: Tim Armstrong
Gerrit-HasComments: Yes
