[Impala-CR](cdh5-trunk) IMPALA-2626: In-flight queries fail when statestore comes back online.

2016-10-01 Thread Jim Apple (Code Review)
Jim Apple has abandoned this change.

Change subject: IMPALA-2626: In-flight queries fail when statestore comes back 
online.
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/1380
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I102391ab63270a9686cf45457b8384ffcd2abe8a
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-CR](cdh5-trunk) IMPALA-2626: In-flight queries fail when statestore comes back online.

2016-09-01 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-2626: In-flight queries fail when statestore comes back 
online.
..


Patch Set 2:

Please switch to the new Gerrit project, "Impala-ASF".
Instructions are here:

https://cwiki.apache.org/confluence/display/IMPALA/How+to+switch+to+Apache-hosted+git

Pushes to this project will be disabled on October 1.

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I102391ab63270a9686cf45457b8384ffcd2abe8a
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-CR](cdh5-trunk) IMPALA-2626: In-flight queries fail when statestore comes back online.

2016-08-05 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-2626: In-flight queries fail when statestore comes back 
online.
..


Patch Set 2:

Sailesh - are you still working on this one?

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I102391ab63270a9686cf45457b8384ffcd2abe8a
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-CR](cdh5-trunk) IMPALA-2626: In-flight queries fail when statestore comes back online.

2016-03-08 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-2626: In-flight queries fail when statestore comes back 
online.
..


Patch Set 2:

How does your patch handle that case? I think it waits for the next topic 
update that has some entries in it, and then computes the diff. Why not do the 
same thing:

If the node has disappeared from the topic, but there has been no deletion 
event, wait for another topic update (or some number of them) before declaring 
the node dead.

This generalises your current patch to handle the case where the topic update 
contains a partial update, and makes Impala a bit more robust to slow recovery 
from a statestore failure.
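
A minimal sketch of the "wait for another topic update" idea, assuming a 
hypothetical MissingBackendTracker helper that is not part of the patch or of 
Impala. Backend IDs are plain strings here, and the grace period is counted in 
topic updates rather than in wall-clock time:

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not from the patch): delays declaring a backend dead
// when it silently disappears from the membership topic.
class MissingBackendTracker {
 public:
  // grace_updates: how many consecutive updates a backend may be absent
  // (without an explicit deletion) before it is declared dead.
  explicit MissingBackendTracker(int grace_updates)
      : grace_updates_(grace_updates) {}

  // known:   backends the subscriber currently believes are alive.
  // present: backends listed in this topic update.
  // deleted: backends explicitly deleted by this topic update.
  // Returns the backends that should now be treated as dead.
  std::vector<std::string> OnUpdate(const std::set<std::string>& known,
                                    const std::set<std::string>& present,
                                    const std::set<std::string>& deleted) {
    std::vector<std::string> dead;
    for (const std::string& id : known) {
      if (deleted.count(id) > 0) {
        // Explicit deletion: the statestore knew the node and saw it fail.
        missed_.erase(id);
        dead.push_back(id);
      } else if (present.count(id) > 0) {
        // Seen again: forget any earlier misses.
        missed_.erase(id);
      } else if (++missed_[id] > grace_updates_) {
        // Absent for longer than the grace period: give up on it.
        missed_.erase(id);
        dead.push_back(id);
      }
    }
    return dead;
  }

 private:
  const int grace_updates_;
  std::map<std::string, int> missed_;  // consecutive misses per backend
};
```

With grace_updates set to 1, a backend missing from one update survives, and 
is declared dead only if it is also missing from the next one. A node that 
fails while the statestore is down is still detected, just one update later.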

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I102391ab63270a9686cf45457b8384ffcd2abe8a
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-CR](cdh5-trunk) IMPALA-2626: In-flight queries fail when statestore comes back online.

2016-03-08 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-2626: In-flight queries fail when statestore comes back 
online.
..


Patch Set 2:

> My suggestion is that there's some way to tell whether a backend
 > was removed because it failed, or because the statestore restarted,
 > because in the former case you get a deletion notification, and in
 > the other it just stops showing up in the topic.

Yes, but what if one or more nodes go down at the same time as the statestore? 
After restarting, the statestore wouldn't send a deletion for those nodes 
because it wouldn't know they had existed, so the query would never get 
cancelled.

Also, I would think that the chance of this happening is not negligible on 
larger clusters, so it's safer to be pessimistic in this case.

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I102391ab63270a9686cf45457b8384ffcd2abe8a
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-CR](cdh5-trunk) IMPALA-2626: In-flight queries fail when statestore comes back online.

2016-03-08 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-2626: In-flight queries fail when statestore comes back 
online.
..


Patch Set 1:

My suggestion is that there's some way to tell whether a backend was removed 
because it failed, or because the statestore restarted, because in the former 
case you get a deletion notification, and in the other it just stops showing up 
in the topic.

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I102391ab63270a9686cf45457b8384ffcd2abe8a
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-CR](cdh5-trunk) IMPALA-2626: In-flight queries fail when statestore comes back online.

2016-03-07 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new patch set (#2).

Change subject: IMPALA-2626: In-flight queries fail when statestore comes back 
online.
..

IMPALA-2626: In-flight queries fail when statestore comes back online.

During a session, if the statestore goes down, the impalads can
continue execution without the statestore, using the stale metadata
that they possess.
However, when the statestore comes back online, the first membership
callback it makes to the impalad hosts erases the "known_backends"
list that the impalads have stored locally. Therefore, in-flight
queries fail.

This patch makes sure that when the impalad is reconnected with the
statestore, it does not delete its 'known_backends' list if there are
zero topic entry updates from the statestore.

The in-flight queries still can fail if the initial backend list from
the statestore does not contain all the backends that the impalad
is already working with on the in-flight query.

Change-Id: I102391ab63270a9686cf45457b8384ffcd2abe8a
---
M be/src/service/impala-server.cc
1 file changed, 3 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/80/1380/2
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I102391ab63270a9686cf45457b8384ffcd2abe8a
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 


[Impala-CR](cdh5-trunk) IMPALA-2626: In-flight queries fail when statestore comes back online.

2016-03-07 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-2626: In-flight queries fail when statestore comes back 
online.
..


Patch Set 1:

(1 comment)

> (1 comment)
 > 
 > I think this only fixes a particular instance of the problem: if
 > the statestore hasn't yet got updates from all the subscribers, it
 > will send a partial update which will have roughly the same effect
 > (since most queries run on all machines).
 > 
 > Doesn't the statestore give a list of deletions with an update?
 > Presumably if it restarts, it won't send deletions for any entries
 > because it never knew they existed. The subscriber could only
 > cancel queries on nodes for which there is an actual deletion (i.e.
 > the node was known to have failed), but not include the missing
 > nodes in any new scheduling decisions.

Yes, you're right: it only fixes the problem if the statestore's first 
callback after coming back up is empty. I've mentioned that in the last 
paragraph of the commit message.
If the statestore comes back up and gets updates from only a few subscribers, 
it sends a partial update. But at that point it's hard to determine whether 
the callback is a partial update or a complete one, in which case all the 
hosts missing from the update actually went down. Due to this ambiguity, we 
handle only the empty-update case.

Before this patch, when the statestore sends an empty update, the 
known_backends_ map gets cleared, so all queries get cancelled.
The deletion list picks out individual backends from the map, but that doesn't 
matter if the map is already empty. In short, if a backend is not in the 
known_backends_ map, the queries running on that backend are cancelled.
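
The behaviour described above can be sketched roughly as follows. This is a 
simplified illustration, not the actual code in impala-server.cc: TopicEntry, 
TopicDelta, BackendMap and ApplyMembershipUpdate are hypothetical stand-ins, 
and only the is_delta and topic_entries names come from the review comment on 
line 1377; topic_deletions is an assumed field name.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the statestore topic types.
struct TopicEntry {
  std::string backend_id;
  std::string address;
};

struct TopicDelta {
  bool is_delta = false;                     // false: full snapshot of the topic
  std::vector<TopicEntry> topic_entries;     // added or updated backends
  std::vector<std::string> topic_deletions;  // explicitly removed backends (assumed name)
};

// Backend ID -> address, standing in for the known_backends_ map.
using BackendMap = std::map<std::string, std::string>;

// Applies one membership update to the local backend map.
void ApplyMembershipUpdate(const TopicDelta& delta, BackendMap* known_backends) {
  // Only treat a non-delta update as a replacement of the whole map if it
  // actually carries entries. An empty non-delta update (for example, the
  // first callback after a statestore restart) leaves the map untouched.
  if (!delta.is_delta && !delta.topic_entries.empty()) {
    known_backends->clear();
  }
  for (const TopicEntry& entry : delta.topic_entries) {
    (*known_backends)[entry.backend_id] = entry.address;
  }
  for (const std::string& id : delta.topic_deletions) {
    known_backends->erase(id);
  }
}
```

In this sketch an empty non-delta update keeps every known backend, while a 
non-delta update with even a single entry still replaces the whole map, which 
is the partial-update limitation noted in the commit message.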

http://gerrit.cloudera.org:8080/#/c/1380/1/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 1377: if (!delta.is_delta) {
> prefer 
Done


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I102391ab63270a9686cf45457b8384ffcd2abe8a
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-CR](cdh5-trunk) IMPALA-2626: In-flight queries fail when statestore comes back online.

2016-03-07 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-2626: In-flight queries fail when statestore comes back 
online.
..


Patch Set 1:

(1 comment)

I think this only fixes a particular instance of the problem: if the statestore 
hasn't yet got updates from all the subscribers, it will send a partial update 
which will have roughly the same effect (since most queries run on all 
machines).

Doesn't the statestore give a list of deletions with an update? Presumably if 
it restarts, it won't send deletions for any entries because it never knew they 
existed. The subscriber could only cancel queries on nodes for which there is 
an actual deletion (i.e. the node was known to have failed), but not include 
the missing nodes in any new scheduling decisions.
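
A rough sketch of that idea, assuming backend IDs are plain strings. 
MembershipDecision and ClassifyBackends are hypothetical names used only for 
illustration; they are not existing Impala code:

```cpp
#include <set>
#include <string>

// Hypothetical result type: what the subscriber should do with each backend
// it already knows about, given one membership update.
struct MembershipDecision {
  std::set<std::string> cancel_queries_on;    // explicitly deleted: known failure
  std::set<std::string> skip_for_scheduling;  // silently missing: keep in-flight work
};

// known:   backends the subscriber currently has in its local list.
// present: backends listed in the update.
// deleted: backends the update explicitly marks as deleted.
MembershipDecision ClassifyBackends(const std::set<std::string>& known,
                                    const std::set<std::string>& present,
                                    const std::set<std::string>& deleted) {
  MembershipDecision decision;
  for (const std::string& id : known) {
    if (deleted.count(id) > 0) {
      // The statestore saw this node fail, so its queries can be cancelled.
      decision.cancel_queries_on.insert(id);
    } else if (present.count(id) == 0) {
      // No deletion and no entry: possibly a restarted statestore that has
      // not heard from this node yet. Leave running queries alone, but do
      // not schedule new work there.
      decision.skip_for_scheduling.insert(id);
    }
  }
  return decision;
}
```

For example, with known backends a, b and c, an update that lists only a and 
deletes b would cancel queries on b and merely stop scheduling onto c.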

http://gerrit.cloudera.org:8080/#/c/1380/1/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 1377: if (!delta.is_delta) {
prefer 

  !delta.is_delta && delta.topic_entries.size() > 0


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I102391ab63270a9686cf45457b8384ffcd2abe8a
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes