[jira] [Commented] (CASSANDRA-10659) Windows CassCI: Fail on timed-out tests
[ https://issues.apache.org/jira/browse/CASSANDRA-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175902#comment-16175902 ]

Jeff Jirsa commented on CASSANDRA-10659:
----------------------------------------

[~mambocab] / [~JoshuaMcKenzie] / [~philipthompson] - if any of you believe this should still be open, please let me know. Otherwise I'm inclined to close as won't-fix, since CassCI is no longer generally available, and I suspect any other, more general test failures can be addressed elsewhere?

> Windows CassCI: Fail on timed-out tests
> ---------------------------------------
>
> Key: CASSANDRA-10659
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10659
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jim Witschey
> Assignee: Jim Witschey
>
> On our Windows CassCI environments, it looks like some dtests are prone to
> hanging, e.g.:
> https://cassci.datastax.com/view/Dev/view/josh-mckenzie/job/josh-mckenzie-10641_windows-dtest_win32/1/
> http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/131/
> http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/129/
> http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/128/
> http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/126/
> http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/125/
>
> Ideally these tests wouldn't hang, but regardless, we should figure out a way
> to make them fail, rather than timing out Jenkins and botching the rest of
> the test run.
>
> The built-in [{{nosetests}} {{multiprocess}} plugin|http://nose.readthedocs.org/en/latest/plugins/multiprocess.html]
> would solve this problem for us -- we could run the tests with
> {{nosetests --processes=1 --process-timeout=X}} and it would stop the test and
> fail it if it took too long. However, it's broken on Windows. I've filed
> [a quick issue on the {{nose}} GitHub|https://github.com/nose-devs/nose/issues/966],
> but in the meantime, we should figure out how to avoid this.
>
> Possible solutions:
> # [~philipthompson] had a script that would shell out to {{nosetests}} for
> each test and kill that process if it took too long. If I understand
> correctly, that script is broken, or assumes things that are no longer true.
> We can revamp it if we want.
> # We could make a patch for {{nose}} to fix the {{multiprocess}} plugin.
> # We could hack some of {{multiprocessing}}'s functionality into the
> {{dtest}} suite itself.
>
> Option 3 may be the best workaround for this problem -- our timeouts aren't
> caused just when a test runs long, but when Jenkins doesn't get any output on
> stdout from a hanging test. We may be able to monitor stdout from a second
> process and fail the test before Jenkins would time out.
>
> Pinging [~JoshuaMcKenzie] as this is a Windows issue.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
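For comparison, the hard wall-clock timeout that {{--process-timeout=X}} (or a per-test wrapper like the script in option 1) would impose can be sketched with Python's standard {{subprocess}} module. This is a minimal illustration, not nose's implementation or the original script; {{run_with_wall_clock_timeout}} and the demo command are hypothetical.

```python
import subprocess
import sys


def run_with_wall_clock_timeout(cmd, timeout_s):
    """Run cmd in a child process and fail it if it runs longer than timeout_s.

    Returns (passed, reason). Hypothetical helper illustrating a hard
    per-process timeout; it is not nose's multiprocess implementation.
    """
    try:
        # On timeout, subprocess.run kills the child, waits for it, and then
        # raises TimeoutExpired. Output is captured here but not inspected.
        result = subprocess.run(cmd, timeout=timeout_s,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s:.1f}s"
    return result.returncode == 0, f"exit code {result.returncode}"


if __name__ == "__main__":
    # Stand-in for one test: a child that sleeps past a 1-second limit.
    print(run_with_wall_clock_timeout(
        [sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=1))
```

Because the limit applies only to total runtime, a legitimate test that runs longer than X is killed exactly like a hung one, however much output it is producing.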
[jira] [Commented] (CASSANDRA-10659) Windows CassCI: Fail on timed-out tests
[ https://issues.apache.org/jira/browse/CASSANDRA-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004559#comment-15004559 ]

Jim Witschey commented on CASSANDRA-10659:
-----------------------------------------

I've made good progress on a plugin and a nose-wrapping script that fail tests if they produce no output for some length of time:

https://github.com/mambocab/nose_call_on_hang
https://gist.github.com/mambocab/760928e01a5e1ee5489f

I believe these are just about ready to use for the main CassCI jobs, though some changes to the dtests may still be necessary to handle exceptions correctly. I've fixed some exception-handling problems here:

https://github.com/riptano/cassandra-dtest/pull/660
https://github.com/riptano/cassandra-dtest/pull/657
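The idea behind the plugin and wrapper described above (fail a test that produces no output for some length of time) can be sketched with only the standard library. This is not the code in the linked repositories; {{run_with_inactivity_timeout}} and its parameters are hypothetical. A reader thread is used because {{select()}} cannot wait on pipes on Windows, and the silence clock restarts on every complete line the child writes.

```python
import queue
import subprocess
import sys
import threading


def _pump(stream, q):
    # Blocking readline() is fine in a background thread, which keeps this
    # portable to Windows, where select() cannot wait on pipes.
    for line in iter(stream.readline, b""):
        q.put(line)
    q.put(None)  # sentinel: the child closed its output (normally, it exited)


def _echo(text):
    # Pass output through so the CI log (and Jenkins) still sees it.
    sys.stdout.write(text)
    sys.stdout.flush()


def run_with_inactivity_timeout(cmd, idle_timeout_s, on_line=_echo):
    """Run cmd, forwarding its output; kill it after idle_timeout_s of silence.

    Returns (returncode, hung). returncode is None when the child was killed.
    Hypothetical helper: a sketch of the technique, not nose_call_on_hang.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    lines = queue.Queue()
    threading.Thread(target=_pump, args=(proc.stdout, lines),
                     daemon=True).start()
    while True:
        try:
            # The silence clock restarts every time a complete line arrives,
            # so a long but chatty test is never killed.
            line = lines.get(timeout=idle_timeout_s)
        except queue.Empty:
            proc.kill()  # TerminateProcess on Windows, SIGKILL elsewhere
            proc.wait()
            return None, True
        if line is None:
            return proc.wait(), False
        on_line(line.decode(errors="replace"))
```

A Python child should be started with {{-u}} (unbuffered); otherwise its output may sit in a buffer and look like silence. Killing the child also does not kill any processes the test itself started, so a real wrapper would need to clean those up separately.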
[jira] [Commented] (CASSANDRA-10659) Windows CassCI: Fail on timed-out tests
[ https://issues.apache.org/jira/browse/CASSANDRA-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004562#comment-15004562 ]

Jim Witschey commented on CASSANDRA-10659:
-----------------------------------------

Here's the CassCI job where I've been testing this:
http://cassci.datastax.com/job/mambocab-stop_hung_jobs/
[jira] [Commented] (CASSANDRA-10659) Windows CassCI: Fail on timed-out tests
[ https://issues.apache.org/jira/browse/CASSANDRA-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996922#comment-14996922 ]

Jim Witschey commented on CASSANDRA-10659:
-----------------------------------------

Having slept on this, I think #2 is only worth it for us as a fallback -- some tests legitimately run longer than 30 minutes. The {{multiprocess}} nose plugin can't be nuanced about this and, even if fixed to work correctly on Windows, would make those tests fail. We need to detect periods of inactivity, not long-running tests.
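The difference between a long-running test and a hung one can be made concrete with two toy verdict functions over a test's output timestamps (seconds since the test started). The 30-minute limit comes from the comment above; the 10-minute idle limit and both timelines are made-up illustrations, not CassCI settings.

```python
def longest_silence(output_times, start, end):
    """Longest gap, in seconds, between start, each output event, and end."""
    points = [start] + sorted(output_times) + [end]
    return max(b - a for a, b in zip(points, points[1:]))


def wall_clock_verdict(start, end, limit_s):
    # A per-process timeout only looks at total runtime.
    return "fail" if end - start > limit_s else "pass"


def inactivity_verdict(output_times, start, end, idle_limit_s):
    # An inactivity check only looks at how long the test went quiet.
    if longest_silence(output_times, start, end) > idle_limit_s:
        return "fail"
    return "pass"


WALL_CLOCK_LIMIT = 30 * 60  # the 30 minutes mentioned in the comment
IDLE_LIMIT = 10 * 60        # hypothetical idle threshold

# Hypothetical 45-minute test that prints something once a minute.
slow_but_alive = list(range(60, 45 * 60, 60))
# Hypothetical test that prints twice, then hangs until stopped at 20 minutes.
hung = [10, 20]

for name, outputs, end in [("slow_but_alive", slow_but_alive, 45 * 60),
                           ("hung", hung, 20 * 60)]:
    print(name,
          wall_clock_verdict(0, end, WALL_CLOCK_LIMIT),
          inactivity_verdict(outputs, 0, end, IDLE_LIMIT))
# -> slow_but_alive fail pass
# -> hung pass fail
```

The wall-clock rule fails the slow but healthy test and passes the hung one, while the inactivity rule does the opposite, which is the behavior the comment asks for.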
[jira] [Commented] (CASSANDRA-10659) Windows CassCI: Fail on timed-out tests
[ https://issues.apache.org/jira/browse/CASSANDRA-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996924#comment-14996924 ]

Joshua McKenzie commented on CASSANDRA-10659:
--------------------------------------------

Sounds reasonable.
[jira] [Commented] (CASSANDRA-10659) Windows CassCI: Fail on timed-out tests
[ https://issues.apache.org/jira/browse/CASSANDRA-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992110#comment-14992110 ]

Philip Thompson commented on CASSANDRA-10659:
--------------------------------------------

I support option 3. (1) is too blunt a solution, and fixing (2) ourselves may end up being more work than we need to solve our own problem.
[jira] [Commented] (CASSANDRA-10659) Windows CassCI: Fail on timed-out tests
[ https://issues.apache.org/jira/browse/CASSANDRA-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992284#comment-14992284 ]

Joshua McKenzie commented on CASSANDRA-10659:
--------------------------------------------

I'd recommend a brief inspection of the amount of work required for #2 before taking on the burden of porting in and maintaining multiprocessing in the dtest suite. I agree w/Philip that #1 is best left alone.