[jira] [Commented] (IMPALA-6591) TestClientSsl hung for a long time

2018-12-13 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720165#comment-16720165
 ] 

ASF subversion and git services commented on IMPALA-6591:
-

Commit 9c44853998705172015626e6e29d1d23a7f9e53e in impala's branch 
refs/heads/master from [~fredyw]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=9c44853 ]

IMPALA-6591: Fix test_ssl flaky test

test_ssl has a logic that waits for the number of in-flight queries to
be 1. However, the logic for wait_for_num_in_flight_queries(1) only
waits for the condition to be true for a period of time and does not
throw an exception when the time has elapsed and the condition is not
met. In other words, the logic in test_ssl that loops while the number
of in-flight queries is 1 never gets executed. I was able to simulate
this issue by making Impala shell start much longer.

Prior to this patch, in the event that Impala shell took much longer to
start, the test started sending the commands to Impala shell even when
Impala shell was not ready to receive commands. The patch fixes the
issue by waiting until Impala shell is connected. The patch also adds
assert in other places that calls wait_for_num_in_flight_queries and
updates the default behavior for Impala shell to wait until it is
connected.

Testing:
- Ran core and exhaustive tests several times on CentOS 6 without any
  issue

Change-Id: I9805269d8b806aecf5d744c219967649a041d49f
Reviewed-on: http://gerrit.cloudera.org:8080/12047
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> TestClientSsl hung for a long time
> --
>
> Key: IMPALA-6591
> URL: https://issues.apache.org/jira/browse/IMPALA-6591
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.12.0, Impala 3.1.0, Impala 3.2.0
>Reporter: Tim Armstrong
>Assignee: Csaba Ringhofer
>Priority: Blocker
>  Labels: broken-build, flaky, hang
>
> {noformat}
> 18:49:13 
> custom_cluster/test_catalog_wait.py::TestCatalogWait::test_delayed_catalog 
> PASSED
> 18:49:53 
> custom_cluster/test_client_ssl.py::TestClientSsl::test_ssl[exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: text/none] Build timed 
> out (after 1,440 minutes). Marking the build as failed.
> 12:20:15 Build was aborted
> 12:20:15 Archiving artifacts
> {noformat}
> I unfortunately wasn't able to get any logs...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6591) TestClientSsl hung for a long time

2018-12-04 Thread Lars Volker (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16709060#comment-16709060
 ] 

Lars Volker commented on IMPALA-6591:
-

This fails consistently on the exhaustive builds, making it a P0.

> TestClientSsl hung for a long time
> --
>
> Key: IMPALA-6591
> URL: https://issues.apache.org/jira/browse/IMPALA-6591
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.12.0, Impala 3.1.0, Impala 3.2.0
>Reporter: Tim Armstrong
>Assignee: Csaba Ringhofer
>Priority: Blocker
>  Labels: broken-build, flaky, hang
>
> {noformat}
> 18:49:13 
> custom_cluster/test_catalog_wait.py::TestCatalogWait::test_delayed_catalog 
> PASSED
> 18:49:53 
> custom_cluster/test_client_ssl.py::TestClientSsl::test_ssl[exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: text/none] Build timed 
> out (after 1,440 minutes). Marking the build as failed.
> 12:20:15 Build was aborted
> 12:20:15 Archiving artifacts
> {noformat}
> I unfortunately wasn't able to get any logs...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6591) TestClientSsl hung for a long time

2018-12-03 Thread Lars Volker (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707533#comment-16707533
 ] 

Lars Volker commented on IMPALA-6591:
-

I've seen this again. Here's the code loop with the failed assertion:

 
{code}
# In practice, sending SIGINT to the shell process doesn't always seem to get 
caught
# (and a search shows up some bugs in Python where SIGINT might be ignored). So 
retry
# for 30s until one signal takes.
while impalad.get_num_in_flight_queries() == 1:
  time.sleep(1)
  LOG.info("Sending signal...")
  os.kill(p.pid(), signal.SIGINT)
  num_tries += 1
  assert num_tries < 30, "SIGINT was not caught by shell within 30s"
{code}

{{p}} is an {{ImpalaShell}} object. There seems to be a possibility that the 
shell process has been terminated but the query is still registered. I think we 
should at least improve the code to log if the shell process is still alive to 
tell the two cases apart and better understand where to look next.

[~csringhofer] - I picked you randomly; feel free to find another person or 
assign back to me if you're swamped.

> TestClientSsl hung for a long time
> --
>
> Key: IMPALA-6591
> URL: https://issues.apache.org/jira/browse/IMPALA-6591
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.12.0, Impala 3.1.0, Impala 3.2.0
>Reporter: Tim Armstrong
>Assignee: Sailesh Mukil
>Priority: Critical
>  Labels: broken-build, flaky, hang
>
> {noformat}
> 18:49:13 
> custom_cluster/test_catalog_wait.py::TestCatalogWait::test_delayed_catalog 
> PASSED
> 18:49:53 
> custom_cluster/test_client_ssl.py::TestClientSsl::test_ssl[exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: text/none] Build timed 
> out (after 1,440 minutes). Marking the build as failed.
> 12:20:15 Build was aborted
> 12:20:15 Archiving artifacts
> {noformat}
> I unfortunately wasn't able to get any logs...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6591) TestClientSsl hung for a long time

2018-07-12 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16542104#comment-16542104
 ] 

Tim Armstrong commented on IMPALA-6591:
---

[~sailesh] cannot reproduce?

> TestClientSsl hung for a long time
> --
>
> Key: IMPALA-6591
> URL: https://issues.apache.org/jira/browse/IMPALA-6591
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.12.0
>Reporter: Tim Armstrong
>Assignee: Sailesh Mukil
>Priority: Critical
>  Labels: broken-build, hang
>
> {noformat}
> 18:49:13 
> custom_cluster/test_catalog_wait.py::TestCatalogWait::test_delayed_catalog 
> PASSED
> 18:49:53 
> custom_cluster/test_client_ssl.py::TestClientSsl::test_ssl[exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: text/none] Build timed 
> out (after 1,440 minutes). Marking the build as failed.
> 12:20:15 Build was aborted
> 12:20:15 Archiving artifacts
> {noformat}
> I unfortunately wasn't able to get any logs...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6591) TestClientSsl hung for a long time

2018-06-06 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503653#comment-16503653
 ] 

Tim Armstrong commented on IMPALA-6591:
---

[~sailesh] mentioned that this was infrequent and he hadn't been able to 
reproduce locally.

> TestClientSsl hung for a long time
> --
>
> Key: IMPALA-6591
> URL: https://issues.apache.org/jira/browse/IMPALA-6591
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.12.0
>Reporter: Tim Armstrong
>Assignee: Sailesh Mukil
>Priority: Critical
>  Labels: broken-build, hang
>
> {noformat}
> 18:49:13 
> custom_cluster/test_catalog_wait.py::TestCatalogWait::test_delayed_catalog 
> PASSED
> 18:49:53 
> custom_cluster/test_client_ssl.py::TestClientSsl::test_ssl[exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: text/none] Build timed 
> out (after 1,440 minutes). Marking the build as failed.
> 12:20:15 Build was aborted
> 12:20:15 Archiving artifacts
> {noformat}
> I unfortunately wasn't able to get any logs...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org