[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange

2017-09-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5199: prevent hang on empty row batch exchange
..


IMPALA-5199: prevent hang on empty row batch exchange

The error path where delivery of "eos" fails now behaves
the same as if delivery of a row batch fails.

Testing:
Added a timeout test where the leaf fragments return 0 rows. Before
the change this reproduced the hang.

Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645
Reviewed-on: http://gerrit.cloudera.org:8080/8005
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/runtime/data-stream-mgr.cc
M testdata/workloads/functional-query/queries/QueryTest/exchange-delays.test
2 files changed, 23 insertions(+), 4 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/8005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange

2017-09-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5199: prevent hang on empty row batch exchange
..


Patch Set 3: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/8005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange

2017-09-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5199: prevent hang on empty row batch exchange
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1228/

-- 
To view, visit http://gerrit.cloudera.org:8080/8005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange

2017-09-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5199: prevent hang on empty row batch exchange
..


Patch Set 3: Code-Review+2

carry

-- 
To view, visit http://gerrit.cloudera.org:8080/8005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange

2017-09-14 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5199: prevent hang on empty row batch exchange
..


Patch Set 2:

I spoke to Alex. He had some general concerns with whether it could possible 
cause queries to fail when they didn't before. I think our reasoning is valid 
that this is a small subset of the queries that would fail anyway in the same 
circumstance - those with non-empty exchanges.

-- 
To view, visit http://gerrit.cloudera.org:8080/8005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange

2017-09-14 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5199: prevent hang on empty row batch exchange
..


Patch Set 2: Code-Review+2

Please check if Alex is still concerned before committing.

-- 
To view, visit http://gerrit.cloudera.org:8080/8005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange

2017-09-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5199: prevent hang on empty row batch exchange
..


Patch Set 2: Code-Review+1

The only concern I had is listed above. However, since this is not going as a 
part of a release now, we have a good amount of time for testing. So, I think 
we can go ahead with this and observe the impact.

-- 
To view, visit http://gerrit.cloudera.org:8080/8005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange

2017-09-14 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5199: prevent hang on empty row batch exchange
..


Patch Set 2:

Alex/Sailesh, are you okay with this change?

-- 
To view, visit http://gerrit.cloudera.org:8080/8005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange

2017-09-11 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5199: prevent hang on empty row batch exchange
..


Patch Set 2:

Yeah I think there are some fundamental problems with the current parallel 
startup logic - I don't think it's actually possible to make all cases work 
with the current approach where all state is receiver-side and there's no 
additional communication (point-to-point or via the coordinator) to 
disambiguate hangs and delays.

IMPALA-3990

-- 
To view, visit http://gerrit.cloudera.org:8080/8005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange

2017-09-11 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5199: prevent hang on empty row batch exchange
..


Patch Set 2: Code-Review+1

Making these cases consistent makes sense to me.  Really, I think we need to 
eventually rethink the whole parallel startup stuff, especially now that we 
have per-query execrpc, etc. It just seems too fragile.  But in the mean time, 
I see no reason the eos and non-eos case inconsistent.

-- 
To view, visit http://gerrit.cloudera.org:8080/8005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange

2017-09-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5199: prevent hang on empty row batch exchange
..


Patch Set 2:

The query is definitely cancellable but I'm not sure how a user would 
necessarily know that it's hung as opposed to slow or what to do about it (this 
got reported to me by a user as a "slow" query).

-- 
To view, visit http://gerrit.cloudera.org:8080/8005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange

2017-09-08 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5199: prevent hang on empty row batch exchange
..


Patch Set 2:

If we hit IMPALA-5199 without this patch, can system be brought back into a 
good state by cancelling the query?

I think it's generally preferable to avoid hangs even if it means we 
unnecessarily (gracefully) have cancel some queries under load. Whether the 
latter case is rare is really the key question here. Is there an experiment we 
can do to get data to make a decision?

-- 
To view, visit http://gerrit.cloudera.org:8080/8005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange

2017-09-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5199: prevent hang on empty row batch exchange
..


Patch Set 2:

I think there is a fundamental problem with the "closed streams cache" 
mechanism and long-running queries with limits in the middle of the plan though 
- IMPALA-3990 - which can happen even not under load. I think that's the bug 
you're describing.

I guess this change makes IMPALA-3990 possible when 0 rows are sent. To me, it 
seems better to make behaviour consistent and avoid this hang even if adds 
another presumably rare case to IMPALA-3990. I don't think we've seen 
IMPALA-3990 in practice. 

I can see the argument though for just leaving this alone given the hang also 
seems to be rare. What do you think?

-- 
To view, visit http://gerrit.cloudera.org:8080/8005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange

2017-09-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5199: prevent hang on empty row batch exchange
..


Patch Set 2:

That seems pretty rare but possible.

In the common case where the limit is at the top level of the query, I think 
this would be fine, because the query would hit the limit on the coordinator, 
return successfully, and ignore any future errors. Those timeout errors would 
have to arrive after STREAM_EXPIRATION_TIME_MS (since if the closed stream was 
in the cache, we wouldn't report an error before that time). 

So I think in general the timeline would need to be:
* Limit is hit at t1
* The eos message is received at t2 >= t1 + STREAM_EXPIRATION_TIME_MS and an 
error is sent to the coordinator
* The query fails, but it would have succeeded at t3 > t2 if the error hadn't 
been received

If the limit is not at the top level, it seems possible but unlikely that the 
error would cancel a query that was on the path to succeeding. I'm not sure if 
it changes the odds of queries succeeding enough to worry about, given that the 
cluster is already unhealthy if these timeouts are occurring.

-- 
To view, visit http://gerrit.cloudera.org:8080/8005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange

2017-09-07 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5199: prevent hang on empty row batch exchange
..


Patch Set 2:

The fix looks good to me, and should work fine for most cases.

One case I'm worried about is that, under heavy load, if a receiver deregisters 
itself early (due to a LIMIT, etc.), and the sender sends the 'eos' after 
STREAM_EXPIRATION_TIME_MS, the query would have previously succeeded, even 
though the sender couldn't find the receiver in CloseSender(). And the query 
would be right to succeed in this case since the 'eos' here is spurious as the 
receiver is already gone. Although, I know this counts as succeeding "by 
mistake".

Now, we would be failing these queries. And given that we see the 
'DATASTREAM_SENDER_TIMEOUT' fairly often, that would regress a few workloads.

What do you think?

-- 
To view, visit http://gerrit.cloudera.org:8080/8005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange

2017-09-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#2).

Change subject: IMPALA-5199: prevent hang on empty row batch exchange
..

IMPALA-5199: prevent hang on empty row batch exchange

The error path where delivery of "eos" fails now behaves
the same as if delivery of a row batch fails.

Testing:
Added a timeout test where the leaf fragments return 0 rows. Before
the change this reproduced the hang.

Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645
---
M be/src/runtime/data-stream-mgr.cc
M testdata/workloads/functional-query/queries/QueryTest/exchange-delays.test
2 files changed, 23 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/8005/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-5199: prevent hang on empty row batch exchange

2017-09-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/8005

Change subject: IMPALA-5199: prevent hang on empty row batch exchange
..

IMPALA-5199: prevent hang on empty row batch exchange

The error path where delivery of "eos" fails now behaves
the same as if delivery of a row batch fails.

Testing:
Added a timeout test where the leaf fragments return 0 rows. Before
the change this reproduced the hang.

Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645
---
M be/src/runtime/data-stream-mgr.cc
M testdata/workloads/functional-query/queries/QueryTest/exchange-delays.test
2 files changed, 20 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/8005/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong