[jira] [Commented] (IMPALA-6591) TestClientSsl hung for a long time
[ https://issues.apache.org/jira/browse/IMPALA-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720165#comment-16720165 ]

ASF subversion and git services commented on IMPALA-6591:
---------------------------------------------------------

Commit 9c44853998705172015626e6e29d1d23a7f9e53e in impala's branch refs/heads/master from [~fredyw]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=9c44853 ]

IMPALA-6591: Fix test_ssl flaky test

test_ssl has logic that waits for the number of in-flight queries to
be 1. However, wait_for_num_in_flight_queries(1) only waits for the
condition to become true for a period of time and does not throw an
exception when the time has elapsed and the condition is not met. In
other words, when the wait times out, the logic in test_ssl that loops
while the number of in-flight queries is 1 never gets executed. I was
able to simulate this issue by making Impala shell take much longer to
start.

Prior to this patch, in the event that Impala shell took much longer to
start, the test started sending the commands to Impala shell even when
Impala shell was not ready to receive commands. The patch fixes the
issue by waiting until Impala shell is connected. The patch also adds
asserts in other places that call wait_for_num_in_flight_queries and
updates the default behavior for Impala shell to wait until it is
connected.

Testing:
- Ran core and exhaustive tests several times on CentOS 6 without any
  issue

Change-Id: I9805269d8b806aecf5d744c219967649a041d49f
Reviewed-on: http://gerrit.cloudera.org:8080/12047
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> TestClientSsl hung for a long time
> ----------------------------------
>
>                 Key: IMPALA-6591
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6591
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>    Affects Versions: Impala 2.12.0, Impala 3.1.0, Impala 3.2.0
>            Reporter: Tim Armstrong
>            Assignee: Csaba Ringhofer
>            Priority: Blocker
>              Labels: broken-build, flaky, hang
>
> {noformat}
> 18:49:13
> custom_cluster/test_catalog_wait.py::TestCatalogWait::test_delayed_catalog
> PASSED
> 18:49:53
> custom_cluster/test_client_ssl.py::TestClientSsl::test_ssl[exec_option:
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0,
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None,
> 'exec_single_node_rows_threshold': 0} | table_format: text/none] Build timed
> out (after 1,440 minutes). Marking the build as failed.
> 12:20:15 Build was aborted
> 12:20:15 Archiving artifacts
> {noformat}
> I unfortunately wasn't able to get any logs...

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
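
The failure mode described in the commit message above can be illustrated with a small polling helper. This is a minimal sketch under stated assumptions, not Impala's test code: `wait_until` and `assert_wait_until` are hypothetical names introduced here, and the real wait_for_num_in_flight_queries may differ in signature and defaults. The point is only that a helper which reports a timeout through its return value lets the caller carry on silently unless the result is asserted.

```python
import time


def wait_until(predicate, timeout_s=10.0, interval_s=0.1):
    """Poll predicate until it is true or timeout_s elapses.

    Hypothetical helper. It returns True if the condition was met and
    False on timeout; it never raises, so the caller must check the result.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval_s)
    # One last check so a condition met right at the deadline still counts.
    return bool(predicate())


def assert_wait_until(predicate, description, timeout_s=10.0, interval_s=0.1):
    """Same wait, but a timeout fails loudly instead of being ignored."""
    assert wait_until(predicate, timeout_s, interval_s), (
        "timed out after %.1fs waiting for: %s" % (timeout_s, description))
```

Calling `wait_until(...)` as a bare statement discards the timeout, which matches the silent behaviour the commit message describes; wrapping the call in an assert, as the patch is said to do, turns the same timeout into a test failure.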
[jira] [Commented] (IMPALA-6591) TestClientSsl hung for a long time
[ https://issues.apache.org/jira/browse/IMPALA-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16709060#comment-16709060 ]

Lars Volker commented on IMPALA-6591:
-------------------------------------

This fails consistently on the exhaustive builds, making it a P0.
[jira] [Commented] (IMPALA-6591) TestClientSsl hung for a long time
[ https://issues.apache.org/jira/browse/IMPALA-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707533#comment-16707533 ]

Lars Volker commented on IMPALA-6591:
-------------------------------------

I've seen this again. Here's the code loop with the failed assertion:

{code}
# In practice, sending SIGINT to the shell process doesn't always seem to get caught
# (and a search shows up some bugs in Python where SIGINT might be ignored). So retry
# for 30s until one signal takes.
while impalad.get_num_in_flight_queries() == 1:
  time.sleep(1)
  LOG.info("Sending signal...")
  os.kill(p.pid(), signal.SIGINT)
  num_tries += 1
  assert num_tries < 30, "SIGINT was not caught by shell within 30s"
{code}

{{p}} is an {{ImpalaShell}} object. There seems to be a possibility that the shell process has been terminated but the query is still registered. I think we should at least improve the code to log if the shell process is still alive to tell the two cases apart and better understand where to look next.

[~csringhofer] - I picked you randomly; feel free to find another person or assign back to me if you're swamped.

> TestClientSsl hung for a long time
> ----------------------------------
>
>                 Key: IMPALA-6591
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6591
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>    Affects Versions: Impala 2.12.0, Impala 3.1.0, Impala 3.2.0
>            Reporter: Tim Armstrong
>            Assignee: Sailesh Mukil
>            Priority: Critical
>              Labels: broken-build, flaky, hang
>
> {noformat}
> 18:49:13
> custom_cluster/test_catalog_wait.py::TestCatalogWait::test_delayed_catalog
> PASSED
> 18:49:53
> custom_cluster/test_client_ssl.py::TestClientSsl::test_ssl[exec_option:
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0,
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None,
> 'exec_single_node_rows_threshold': 0} | table_format: text/none] Build timed
> out (after 1,440 minutes). Marking the build as failed.
> 12:20:15 Build was aborted
> 12:20:15 Archiving artifacts
> {noformat}
> I unfortunately wasn't able to get any logs...
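
The suggestion in the comment above, to log whether the shell process is still alive while retrying SIGINT, could be sketched roughly as follows. This is an illustration only, not the change that was eventually committed. The function `interrupt_until_idle` and its three callback parameters are hypothetical names introduced here so the loop can run without an Impala cluster; how an {{ImpalaShell}} object exposes its process state is not shown in the thread and is left as an assumption.

```python
import logging
import time

LOG = logging.getLogger(__name__)


def interrupt_until_idle(num_in_flight, shell_is_alive, send_sigint,
                         max_tries=30, interval_s=1.0):
    """Retry SIGINT until no query is in flight, recording shell liveness.

    All three callables are hypothetical stand-ins:
      num_in_flight()  -> current number of in-flight queries
      shell_is_alive() -> True while the shell process is still running
      send_sigint()    -> delivers SIGINT to the shell process
    Returns the number of loop iterations that were needed.
    """
    num_tries = 0
    while num_in_flight() == 1:
        alive = shell_is_alive()
        # The message distinguishes "shell ignored the signal" from
        # "shell already exited but the query is still registered".
        assert num_tries < max_tries, (
            "query still in flight after %d attempts (shell alive: %s)"
            % (num_tries, alive))
        time.sleep(interval_s)
        LOG.info("Sending signal (attempt %d, shell alive: %s)",
                 num_tries + 1, alive)
        if alive:
            # Skip signalling a process that has already exited.
            send_sigint()
        num_tries += 1
    return num_tries
```

If the shell were driven through a `subprocess.Popen` object `proc`, the callbacks could be `lambda: proc.poll() is None` and `lambda: proc.send_signal(signal.SIGINT)`; whether the test's shell wrapper offers equivalents is an assumption.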
[jira] [Commented] (IMPALA-6591) TestClientSsl hung for a long time
[ https://issues.apache.org/jira/browse/IMPALA-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16542104#comment-16542104 ]

Tim Armstrong commented on IMPALA-6591:
---------------------------------------

[~sailesh] cannot reproduce?

> TestClientSsl hung for a long time
> ----------------------------------
>
>                 Key: IMPALA-6591
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6591
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>    Affects Versions: Impala 2.12.0
>            Reporter: Tim Armstrong
>            Assignee: Sailesh Mukil
>            Priority: Critical
>              Labels: broken-build, hang
>
> {noformat}
> 18:49:13
> custom_cluster/test_catalog_wait.py::TestCatalogWait::test_delayed_catalog
> PASSED
> 18:49:53
> custom_cluster/test_client_ssl.py::TestClientSsl::test_ssl[exec_option:
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0,
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None,
> 'exec_single_node_rows_threshold': 0} | table_format: text/none] Build timed
> out (after 1,440 minutes). Marking the build as failed.
> 12:20:15 Build was aborted
> 12:20:15 Archiving artifacts
> {noformat}
> I unfortunately wasn't able to get any logs...
[jira] [Commented] (IMPALA-6591) TestClientSsl hung for a long time
[ https://issues.apache.org/jira/browse/IMPALA-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503653#comment-16503653 ]

Tim Armstrong commented on IMPALA-6591:
---------------------------------------

[~sailesh] mentioned that this was infrequent and he hadn't been able to reproduce locally.