[Impala-ASF-CR] IMPALA-10258, IMPALA-10109: Fixed flaky test in test_query_retries.py

2020-12-09 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has abandoned this change. ( http://gerrit.cloudera.org:8080/16763 )

Change subject: IMPALA-10258, IMPALA-10109: Fixed flaky test in 
test_query_retries.py
..


Abandoned
--
To view, visit http://gerrit.cloudera.org:8080/16763
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: Ib89f7b01a0f2a66a97f312e779a4ab04f4f347f3
Gerrit-Change-Number: 16763
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10258, IMPALA-10109: Fixed flaky test in test_query_retries.py

2020-11-24 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16763 )

Change subject: IMPALA-10258, IMPALA-10109: Fixed flaky test in 
test_query_retries.py
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16763/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16763/2//COMMIT_MSG@7
PS2, Line 7: IMPALA-10258, IMPALA-10109
since these issues are basically unrelated, could you separate them out into 
two reviews?


http://gerrit.cloudera.org:8080/#/c/16763/2//COMMIT_MSG@9
PS2, Line 9: When TestQueryRetries.test_original_query_cancel was run on S3
I'm not sure I understand what you're saying the issue is:

According to the JIRA, the test was waiting for the query to reach state 
"RUNNING", but it was already at state "EXCEPTION" (QueryState = 5, see 
beeswax.thrift). At that point in the test, the query shouldn't have failed, 
since the impalad hasn't been killed yet, so really not sure what could have 
happened, and unfortunately it doesn't look like we have the logs for it.
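
To make the failure mode concrete, here is a minimal sketch of a state-polling 
helper. It is not the Impala test framework's API: wait_for_state, get_state 
and the polling interval are hypothetical names chosen for illustration, and 
the enum values other than EXCEPTION = 5 are assumed to follow beeswax.thrift. 
The point is that a wait for "RUNNING" can never succeed once the query is 
already in "EXCEPTION", so a longer timeout alone would not help.

```python
import time
from enum import IntEnum


class QueryState(IntEnum):
    # Assumed to mirror the QueryState enum in beeswax.thrift; only
    # EXCEPTION = 5 is confirmed by the comment above.
    CREATED = 0
    INITIALIZED = 1
    COMPILED = 2
    RUNNING = 3
    FINISHED = 4
    EXCEPTION = 5


def wait_for_state(get_state, expected, timeout_s, poll_interval_s=0.1):
    """Poll get_state() until it returns `expected`.

    Hypothetical helper for illustration. It raises as soon as the query
    reaches EXCEPTION, so a failed query is not reported as a timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == expected:
            return state
        if state == QueryState.EXCEPTION:
            raise RuntimeError(
                "query failed while waiting for %s" % expected.name)
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "query still in %s after %ss" % (state.name, timeout_s))
        time.sleep(poll_interval_s)
```

With a helper like this, the situation described in the JIRA would surface as 
a RuntimeError right away instead of as a timeout.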


http://gerrit.cloudera.org:8080/#/c/16763/2//COMMIT_MSG@16
PS2, Line 16: For IMPALA-10109, test_retries_from_cancellation_pool did not
I'm not sure I understand what you're saying the issue is:

According to the JIRA, the query timed out after ~784s, which is a lot longer 
than the default statestore time-to-detect-failure of heartbeat_frequency x 
max_missed = 1000ms x 10 = 10s. So it seems like the coordinator should have 
had plenty of time to get the statestore message, even under the old values.
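
As a quick check of that arithmetic, here is a small sketch. The function name 
is hypothetical, and the formula is only the rough estimate used above (it 
ignores any delay in propagating the membership update to the coordinator).

```python
def time_to_detect_failure_s(heartbeat_frequency_ms, max_missed_heartbeats):
    """Rough estimate of how long the statestore needs to notice a dead
    impalad: one heartbeat interval per missed heartbeat.

    Hypothetical helper for illustration, not Impala code.
    """
    return heartbeat_frequency_ms * max_missed_heartbeats / 1000.0


# Default values quoted in the comment above: 1000ms x 10.
default_detect_s = time_to_detect_failure_s(1000, 10)

# Approximate query timeout reported in the JIRA.
observed_timeout_s = 784

# How many detection windows fit into the observed timeout.
detection_windows = observed_timeout_s / default_detect_s
```

The timeout spans dozens of detection windows, which is why lowering the 
heartbeat settings does not look like it addresses the root cause.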

Looking through the logs, I'm a little confused by what I see - the coordinator 
says the query was only scheduled on 2 backends, but I think the test assumes 
that it gets scheduled on all 3 backends in the minicluster (see 
__kill_random_impalad()). I also see a reference to CancelFromThreadPool in 
QueryExecMgr on impalad_node1, but that shouldn't be hit unless the coordinator is 
killed, which it shouldn't have been.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib89f7b01a0f2a66a97f312e779a4ab04f4f347f3
Gerrit-Change-Number: 16763
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 24 Nov 2020 20:36:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10258, IMPALA-10109: Fixed flaky test in test_query_retries.py

2020-11-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16763 )

Change subject: IMPALA-10258, IMPALA-10109: Fixed flaky test in 
test_query_retries.py
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7727/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib89f7b01a0f2a66a97f312e779a4ab04f4f347f3
Gerrit-Change-Number: 16763
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 24 Nov 2020 18:46:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10258, IMPALA-10109: Fixed flaky test in test_query_retries.py

2020-11-24 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/16763 )

Change subject: IMPALA-10258, IMPALA-10109: Fixed flaky test in 
test_query_retries.py
..

IMPALA-10258, IMPALA-10109: Fixed flaky test in test_query_retries.py

When TestQueryRetries.test_original_query_cancel was run on S3
with the query option spool_query_results enabled, the query
timed out before reaching the expected state.
This patch doubles the query timeout when the test is running
on S3 and doubles the timeout for the query to reach the
"FINISHED" state.

For IMPALA-10109, test_retries_from_cancellation_pool did not
trigger a query retry when one of the impalads was killed. It
seems that the membership update message was not received and
processed by the coordinator before the query reached a
terminated state, hence the query retry was not triggered.
This patch reduces heartbeat_frequency and
max_missed_heartbeats so that the statestore takes much less
time to update membership when an impalad is killed, allowing
the coordinator to start the query retry.

Testing:
 - Ran the two tests in a loop for more than 3 hours. No test
   failures occurred.

Change-Id: Ib89f7b01a0f2a66a97f312e779a4ab04f4f347f3
---
M tests/custom_cluster/test_query_retries.py
1 file changed, 12 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/16763/2
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib89f7b01a0f2a66a97f312e779a4ab04f4f347f3
Gerrit-Change-Number: 16763
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall