Re: Cassandra DTests deadlocks on random test

2017-10-19 Thread Murukesh Mohanan
I have had this problem too on similarly spec'd OpenStack instances (but
I'm reasonably certain I didn't include the resource-intensive tests). My
solution was to run the dtests in small batches (say 5-10 tests each), each
with a timeout (say 1.2x the longest time a batch of 5 tests took in a good
run). If a batch exceeds the timeout, kill it; that way you lose only 5-10
test results.
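
A minimal Python sketch of the batch-and-timeout approach, assuming you already have a list of test names to hand to nosetests (collecting them is left out). The helper names and the batch-N.xunit.xml file names are hypothetical, and the per-batch timeout just applies the 1.2x rule of thumb above. Each batch runs in its own process group, so on timeout the Cassandra JVMs started by CCM should die along with nosetests, as long as they stay in that group; anything that detaches into its own session would still escape, which is one argument for the Docker variant.

```python
import os
import signal
import subprocess
from typing import Dict, List, Sequence


def chunk(items: Sequence[str], size: int) -> List[List[str]]:
    """Split test names into consecutive batches of at most `size` tests."""
    if size < 1:
        raise ValueError("batch size must be >= 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def batch_timeout(slowest_good_batch_s: float, factor: float = 1.2) -> float:
    """Per-batch timeout: `factor` times the slowest batch seen in a good run."""
    return slowest_good_batch_s * factor


def run_batch(cmd: List[str], timeout: float) -> str:
    """Run one batch in its own process group; kill the whole group on timeout."""
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        rc = proc.wait(timeout=timeout)
        return "ok" if rc == 0 else "failed"
    except subprocess.TimeoutExpired:
        # start_new_session=True makes the child a process-group leader, so its
        # pid is also the group id; SIGKILL the group to take its children too.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()
        return "timeout"


def run_batches(tests: Sequence[str], size: int, timeout: float) -> Dict[int, str]:
    """Run nosetests once per batch; a hang costs only that batch's results."""
    results = {}
    for i, batch in enumerate(chunk(tests, size)):
        cmd = ["nosetests", "--verbose", "--with-xunit",
               f"--xunit-file=batch-{i}.xunit.xml", *batch]
        results[i] = run_batch(cmd, timeout)
    return results
```

For example, `run_batches(test_names, size=5, timeout=batch_timeout(600))` would give each batch 720 seconds, where 600 stands in for whatever your slowest good batch actually took.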

Another thing I did was to run the tests in Docker, killing the container
itself on timeout, which ensures no orphan processes are left behind. This
also allows you to run two or more test sets in parallel (which is not
otherwise possible due to the use of CCM).
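
A sketch of the Docker variant in Python, assuming a hypothetical image that already contains Cassandra, ccm and cassandra-dtest; the image, the container names and the helper names are all illustrative. On timeout it kills the container by name rather than relying on killing the local `docker run` client, since a SIGKILL sent to the client is not forwarded to the container. Giving each batch a unique container name is what lets several batches run side by side.

```python
import subprocess
import uuid
from typing import List, Optional


def docker_run_cmd(image: str, name: str, test_args: List[str]) -> List[str]:
    """Build a `docker run` command for one batch of dtests.

    `image` is hypothetical; --rm removes the container once it stops.
    """
    return ["docker", "run", "--rm", "--name", name, image,
            "nosetests", "--verbose", *test_args]


def run_in_container(image: str, test_args: List[str], timeout: float,
                     name: Optional[str] = None) -> str:
    """Run a batch in a fresh container; on timeout kill the container itself,
    which stops every process inside it (nosetests, ccm, Cassandra JVMs)."""
    name = name or f"dtest-{uuid.uuid4().hex[:12]}"
    try:
        proc = subprocess.run(docker_run_cmd(image, name, test_args),
                              timeout=timeout)
        return "ok" if proc.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        # subprocess.run only killed the local client; stop the container too.
        subprocess.run(["docker", "kill", name], check=False)
        return "timeout"
```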

On Thu, Oct 19, 2017, 22:58 Michael Shuler  wrote:

> 7.6G of RAM may be a bit too small; we've seen similar random hangs
> in the past on non-resource-intensive tests on m3.large. It doesn't
> appear that you are skipping the resource-intensive tests. Our standard
> dtest instance type has been m3.xlarge, and the resource-intensive tests
> are run on m3.2xlarge (or something comparable in RAM and SSD).
>
> dtest run command (excludes resource-intensive tests):
>
> https://github.com/apache/cassandra-builds/blob/master/build-scripts/cassandra-dtest.sh#L53
>
> dtest-large run command (only resource-intensive tests):
>
> https://github.com/apache/cassandra-builds/blob/master/build-scripts/cassandra-dtest.sh#L61
>
> Running dtest without excluding the resource-intensive tests will run
> everything.
>
> If you wish to troubleshoot your tests when they hang, there should be
> /tmp/dtest-X directories left on disk with the ccm cluster from the
> hung test, since it never gets to the cleanup stage.
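
For that troubleshooting step, a small Python helper could list the leftover dtest directories and the node logs in each, so you can read the last lines every node wrote before the hang. It assumes the usual ccm layout where a node keeps its log at `<node>/logs/system.log`; searching recursively avoids hard-coding cluster or node names. The function names are hypothetical.

```python
import glob
import os
from typing import Dict, List


def leftover_node_logs(tmp_dir: str = "/tmp") -> Dict[str, List[str]]:
    """Map each leftover dtest-* directory to the system.log files found in it."""
    found = {}
    for d in sorted(glob.glob(os.path.join(tmp_dir, "dtest-*"))):
        if os.path.isdir(d):
            # `**` with recursive=True matches any depth, e.g. <cluster>/<node>/
            pattern = os.path.join(d, "**", "logs", "system.log")
            found[d] = sorted(glob.glob(pattern, recursive=True))
    return found


def tail(path: str, n: int = 20) -> List[str]:
    """Return the last `n` lines of a log file (reads the whole file)."""
    with open(path, errors="replace") as f:
        return f.read().splitlines()[-n:]
```

Looping over `leftover_node_logs().items()` and printing `tail(log)` for each log is usually enough to see which node stopped making progress.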
>
> --
> Michael
>
> On 10/19/2017 05:29 AM, Sergey La wrote:
--

Murukesh Mohanan,
Yahoo! Japan


Cassandra DTests deadlocks on random test

2017-10-19 Thread Sergey La
Hi!

I have created a patch for Cassandra 3.0.14 and am trying to test it
using the Cassandra dtests.

The problem is that the dtests deadlock on some random test, time and
again, even on the unpatched 3.0.14 version of Cassandra.

What I have done:

I cloned the cassandra repository (origin
http://git-wip-us.apache.org/repos/asf/cassandra.git) and checked out
tags/cassandra-3.0.14; HEAD is at f3e38cb638113c2a23855a104d6082da5bc10ddb.

Then I cloned the cassandra-dtest repository (origin
git://github.com/riptano/cassandra-dtest.git); HEAD is at
6843d76d0a85ad82edf889e8280b87786dc48486.

I set up the dtests according to these instructions:
https://github.com/riptano/cassandra-dtest/blob/master/INSTALL.md

In addition, I set the JAVA8_HOME and JAVA_HOME variables to the public
JRE of my 1.8.0_144 JDK.

I start testing using this command:

JAVA8_HOME=$JAVA8_HOME nosetests --with-flaky --with-xunit \
    --xunit-file=out.xunit.xml --force-flaky --max-runs=3 --verbose \
    --debug-log=err.debug.nose.txt 1> out.txt 2> err.txt

I run the tests on x86_64 CentOS 7 with 7.6G of RAM.

Problem symptoms:
During a "normal" run (I have had only 1 normal run in 5 attempts),
err.txt is constantly updated with the name of the most recently completed
test, and at the end the out.xunit.xml file appears with the test summary
results. The nosetests process exits.

During a "problem" run, the tests stop progressing (err.txt was not
modified for 10 hours), out.xunit.xml never appears, and the nosetests
process keeps running. I killed the java processes, but nothing changed
for 2 hours: the nosetests process still runs, but the files are unchanged.
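
Since the reliable symptom of a hung run is err.txt going stale, a watchdog could detect that and kill the run instead of leaving it for hours. A minimal Python sketch, assuming nose writes its verbose per-test lines to stderr (as in the command above, where stderr goes to err.txt); `max_idle_s`, `poll_s` and the function names are hypothetical. Because killing only the java processes did not make nosetests exit, the sketch kills the whole process group, nosetests included.

```python
import os
import signal
import subprocess
import time
from typing import List, Optional


def is_stalled(last_modified: float, now: float, max_idle_s: float) -> bool:
    """True if the progress file has not changed for more than `max_idle_s`."""
    return now - last_modified > max_idle_s


def run_with_watchdog(cmd: List[str], log_path: str, max_idle_s: float,
                      poll_s: float = 30.0) -> Optional[int]:
    """Run `cmd` with stderr sent to `log_path`; kill its whole process group
    if the log stops changing for `max_idle_s` seconds.

    Returns the exit code, or None if the run was killed as stalled.
    """
    with open(log_path, "w") as log:
        proc = subprocess.Popen(cmd, stderr=log, start_new_session=True)
        started = time.time()
        while True:
            try:
                return proc.wait(timeout=poll_s)  # finished normally
            except subprocess.TimeoutExpired:
                pass  # still running; check for progress below
            last = max(started, os.path.getmtime(log_path))
            if is_stalled(last, time.time(), max_idle_s):
                try:
                    os.killpg(proc.pid, signal.SIGKILL)
                except ProcessLookupError:
                    pass  # the group already exited
                proc.wait()
                return None
```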

Any help would be appreciated,
Sergey