Jim Witschey created CASSANDRA-10659:
----------------------------------------

             Summary: Windows CassCI: Fail on timed-out tests
                 Key: CASSANDRA-10659
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10659
             Project: Cassandra
          Issue Type: Bug
            Reporter: Jim Witschey
            Assignee: Jim Witschey


On our Windows CassCI environments, it looks like some dtests are prone to 
hanging, e.g.:

https://cassci.datastax.com/view/Dev/view/josh-mckenzie/job/josh-mckenzie-10641_windows-dtest_win32/1/
http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/131/
http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/129/
http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/128/
http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/126/
http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/125/

Ideally these tests wouldn't hang, but regardless, we should figure out a way 
to make them fail, rather than timing out Jenkins and botching the rest of the 
test run.

The built-in [{{nosetests}} {{multiprocess}} 
plugin|http://nose.readthedocs.org/en/latest/plugins/multiprocess.html] would 
solve this problem for us -- we could run the tests with {{nosetests 
--processes=1 --process-timeout=X}} and it would stop the test and fail if the 
test took too long. However, it's broken on Windows. I've filed [a quick issue 
on the {{nose}} GitHub|https://github.com/nose-devs/nose/issues/966], but in 
the meantime, we should figure out how to work around it.

Possible solutions:

# [~philipthompson] had a script that would shell out to {{nosetests}} for each 
test and kill that process if it took too long. If I understand correctly, that 
script is broken, or assumes things that are no longer true. We can revamp it 
if we want.
# We could make a patch for {{nose}} to fix the {{multiprocess}} plugin.
# We could hack some of {{multiprocessing}}'s functionality into the 
{{dtest}} suite itself.
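
As a rough illustration of option 1, here is a minimal sketch of a wrapper that runs one command and kills it if it exceeds a wall-clock limit. This is not the existing script mentioned above; the function name {{run_with_timeout}} is hypothetical, and the sketch assumes Python 3.3+ for the {{timeout}} argument to {{Popen.wait}}.

```python
import subprocess


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd and return (exit_code, timed_out).

    Hypothetical helper: if the process is still running after
    timeout_seconds, it is killed and exit_code is None.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_seconds), False
    except subprocess.TimeoutExpired:
        # Only the direct child is killed here; any processes it
        # started (for example, cluster nodes) are not cleaned up.
        proc.kill()
        proc.wait()
        return None, True
```

A caller could invoke it once per test, e.g. {{run_with_timeout(["nosetests", "some_test.py:SomeTest.some_method"], 600)}}, where the test address and the 600-second limit are placeholders, and report a failure whenever the second element of the result is true.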

Option 3 may be the best workaround for this problem -- our timeouts are caused 
not just by a test running long, but by Jenkins not getting any output on stdout 
from a hanging test. We may be able to monitor stdout from a second process and 
fail the test before Jenkins would time out.
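
The stdout-monitoring idea might look something like the sketch below. It is only an illustration under some assumptions: the wrapper itself plays the role of the second process, a reader thread inside it forwards the child's output, and the child is killed if no output arrives within a silence limit. The name {{run_with_output_watchdog}} and its parameters are hypothetical, and the sketch assumes Python 3.

```python
import queue
import subprocess
import sys
import threading


def run_with_output_watchdog(cmd, max_silence_seconds):
    """Run cmd, echoing its output; kill it if it goes silent too long.

    Hypothetical helper. Returns (exit_code, hung); exit_code is None
    when the process was killed for producing no output in time.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    lines = queue.Queue()

    def pump():
        # Forward each line of child output to the main thread.
        for line in iter(proc.stdout.readline, b""):
            lines.put(line)
        lines.put(None)  # sentinel: the child closed its output

    threading.Thread(target=pump, daemon=True).start()

    while True:
        try:
            line = lines.get(timeout=max_silence_seconds)
        except queue.Empty:
            # No output within the limit: treat the test as hung.
            proc.kill()
            proc.wait()
            return None, True
        if line is None:
            return proc.wait(), False
        sys.stdout.write(line.decode(errors="replace"))
        sys.stdout.flush()
```

Setting the silence limit below the Jenkins inactivity timeout would let the wrapper fail a single hung test while the rest of the run continues. Note that a child that buffers its output when writing to a pipe can look silent even while it is making progress, so the limit would need some slack.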

Pinging [~JoshuaMcKenzie] as this is a Windows issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
