[jira] [Commented] (CASSANDRA-10659) Windows CassCI: Fail on timed-out tests

2017-09-21 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175902#comment-16175902
 ] 

Jeff Jirsa commented on CASSANDRA-10659:


[~mambocab] / [~JoshuaMcKenzie] / [~philipthompson] - if any of you believe 
this should still be open, please let me know. Otherwise I'm inclined to close 
it as Won't Fix, since CassCI is no longer generally available, and I suspect 
any other, more general test failures can be addressed elsewhere.


> Windows CassCI: Fail on timed-out tests
> ---
>
> Key: CASSANDRA-10659
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10659
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jim Witschey
>Assignee: Jim Witschey
>
> On our Windows CassCI environments, it looks like some dtests are prone to 
> hanging, e.g.:
> https://cassci.datastax.com/view/Dev/view/josh-mckenzie/job/josh-mckenzie-10641_windows-dtest_win32/1/
> http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/131/
> http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/129/
> http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/128/
> http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/126/
> http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/125/
> Ideally these tests wouldn't hang, but regardless, we should figure out a way 
> to make them fail, rather than timing out Jenkins and botching the rest of 
> the test run.
> The built-in [{{nosetests}} {{multiprocess}} 
> plugin|http://nose.readthedocs.org/en/latest/plugins/multiprocess.html] would 
> solve this problem for us -- we could run the tests with {{nosetests 
> --processes=1 --process-timeout=X}} and it would stop the test and fail if 
> the test took too long. However, it's broken on Windows. I've filed [a quick 
> issue on the {{nose}} GitHub|https://github.com/nose-devs/nose/issues/966], 
> but in the meantime, we should figure out how to avoid this.
> Possible solutions:
> # [~philipthompson] had a script that would shell out to {{nosetests}} for 
> each test and kill that process if it took too long. If I understand 
> correctly, that script is broken, or assumes things that are no longer true. 
> We can revamp it if we want.
> # We could make a patch for {{nose}} to fix the {{multiprocess}} plugin.
> # We could hack some of {{multiprocessing}}'s functionality into the 
> {{dtest}} suite itself.
> Option 3 may be the best workaround for this problem: our timeouts aren't 
> triggered just because a test runs long, but because Jenkins gets no output 
> on stdout from a hanging test. We may be able to monitor stdout from a second 
> process and fail the test before Jenkins would time out.
> Pinging [~JoshuaMcKenzie] as this is a Windows issue.
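A minimal Python 3 sketch of option 3, assuming the goal is to fail a test only when it stops producing output. A reader thread drains the child's stdout into a queue, and the parent kills the child if nothing arrives for {{idle_timeout}} seconds. A thread is used because {{select()}} only supports sockets on Windows, not pipes. The function name {{run_with_idle_timeout}} and its return shape are hypothetical; this is not the code from any plugin linked in this issue.

```python
import queue
import subprocess
import sys
import threading


def _pump(stream, q):
    """Forward each line of the child's stdout to a queue; None marks EOF."""
    for line in iter(stream.readline, ""):
        q.put(line)
    q.put(None)


def run_with_idle_timeout(cmd, idle_timeout):
    """Run cmd, killing it if it prints nothing for idle_timeout seconds.

    Returns (returncode, hung), where hung is True if the watchdog killed it.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    q = queue.Queue()
    reader = threading.Thread(target=_pump, args=(proc.stdout, q), daemon=True)
    reader.start()
    while True:
        try:
            # Block until the next line, or give up after idle_timeout seconds.
            line = q.get(timeout=idle_timeout)
        except queue.Empty:
            # No output within the window: treat the test as hung and kill it.
            proc.kill()
            proc.wait()
            return proc.returncode, True
        if line is None:
            # EOF: the child closed stdout, so wait for its normal exit status.
            return proc.wait(), False
        # Echo the output so Jenkins still sees progress.
        sys.stdout.write(line)
        sys.stdout.flush()
```

Usage might look like {{run_with_idle_timeout(["nosetests", "-v"], idle_timeout=600)}}. The 10-minute window is an arbitrary placeholder and would need to stay below the Jenkins no-output timeout for the watchdog to fire first.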



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-10659) Windows CassCI: Fail on timed-out tests

2015-11-13 Thread Jim Witschey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004559#comment-15004559
 ] 

Jim Witschey commented on CASSANDRA-10659:
--

I've made good progress on a plugin and nose-wrapping script to fail tests if 
they produce no output for some length of time:

https://github.com/mambocab/nose_call_on_hang
https://gist.github.com/mambocab/760928e01a5e1ee5489f

I believe these are just about ready to use for the main CassCI jobs, though 
some changes to the dtests may still be necessary to handle exceptions 
correctly. I've fixed some exception handling problems here:

https://github.com/riptano/cassandra-dtest/pull/660
https://github.com/riptano/cassandra-dtest/pull/657




[jira] [Commented] (CASSANDRA-10659) Windows CassCI: Fail on timed-out tests

2015-11-13 Thread Jim Witschey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004562#comment-15004562
 ] 

Jim Witschey commented on CASSANDRA-10659:
--

Here's the CassCI job where I've been testing this:

http://cassci.datastax.com/job/mambocab-stop_hung_jobs/



[jira] [Commented] (CASSANDRA-10659) Windows CassCI: Fail on timed-out tests

2015-11-09 Thread Jim Witschey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996922#comment-14996922
 ] 

Jim Witschey commented on CASSANDRA-10659:
--

Having slept on this, I think #2 is only worth it for us as a fallback: some 
tests legitimately run longer than 30 minutes. The {{multiprocess}} nose plugin 
can't make that distinction and, even if fixed to work correctly on Windows, 
would fail those tests. We need to detect periods of inactivity, not 
long-running tests.
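The following hypothetical Python simulation makes this distinction concrete. Each test is reduced to the times (in minutes) at which it printed output, and two policies are compared. The 30-minute limit comes from the comment above. The 10-minute idle limit and the function names are illustrative assumptions.

```python
def total_timeout_fires(end_time, total_limit):
    """A fixed per-test limit (like --process-timeout) fails any test that runs too long."""
    return end_time > total_limit


def idle_timeout_fires(output_times, end_time, idle_limit):
    """An inactivity limit fails a test only if some gap between outputs exceeds idle_limit."""
    # Treat the test start (0) and the observation time as boundaries of the timeline.
    checkpoints = [0.0] + sorted(output_times) + [end_time]
    gaps = (later - earlier for earlier, later in zip(checkpoints, checkpoints[1:]))
    return any(gap > idle_limit for gap in gaps)


# Healthy long test: prints every 5 minutes and finishes at minute 45.
healthy = list(range(5, 50, 5))
# Hung test: prints once at minute 1, then stalls; observed at minute 20.
hung = [1]
```

With these numbers, the healthy 45-minute test trips the fixed 30-minute limit but not the idle limit. The stalled test is flagged by the idle check when observed at minute 20, while the fixed limit has not yet fired.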



[jira] [Commented] (CASSANDRA-10659) Windows CassCI: Fail on timed-out tests

2015-11-09 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996924#comment-14996924
 ] 

Joshua McKenzie commented on CASSANDRA-10659:
-

Sounds reasonable.



[jira] [Commented] (CASSANDRA-10659) Windows CassCI: Fail on timed-out tests

2015-11-05 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992110#comment-14992110
 ] 

Philip Thompson commented on CASSANDRA-10659:
-

I support option 3. Option 1 is too blunt a solution, and fixing option 2 
ourselves may end up being more work than we need to solve our own problem.
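For comparison, option 1 amounts to a per-test wall-clock deadline. Below is a minimal Python 3 sketch, assuming each test can be launched as its own command; the name-to-command mapping and {{run_each_with_deadline}} are hypothetical. {{subprocess.run}} kills the child and re-raises {{TimeoutExpired}} once {{timeout}} elapses. Because the deadline ignores whether the test is still making progress, a slow but healthy test is killed just like a hung one. That is the bluntness described above.

```python
import subprocess
import sys


def run_each_with_deadline(test_cmds, deadline):
    """Run each command in its own process, killing any that exceed deadline seconds."""
    results = {}
    for name, cmd in test_cmds.items():
        try:
            completed = subprocess.run(cmd, timeout=deadline)
            results[name] = "pass" if completed.returncode == 0 else "fail"
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed and reaped the child at this point.
            results[name] = "timeout"
    return results
```

Usage might look like {{run_each_with_deadline({"test_a": ["nosetests", "some_dtest.py:SomeTest.test_a"]}, deadline=1800)}}, where the module and test names are placeholders.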



[jira] [Commented] (CASSANDRA-10659) Windows CassCI: Fail on timed-out tests

2015-11-05 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992284#comment-14992284
 ] 

Joshua McKenzie commented on CASSANDRA-10659:
-

I'd recommend briefly assessing the amount of work required for #2 before 
taking on the burden of porting and maintaining multiprocessing in the dtest 
suite. I agree with Philip that #1 is best left alone.
