[jira] [Commented] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.

2015-12-17 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061871#comment-15061871
 ] 

Jan Schlicht commented on MESOS-4025:
-

This seems like a different issue: Perf events don't work in most VMs, and lead 
to above errors if {{sudo ./bin/mesos-tests.sh}} is run. You'll have to disable 
these tests.

> SlaveRecoveryTest/0.GCExecutor is flaky.
> 
>
> Key: MESOS-4025
> URL: https://issues.apache.org/jira/browse/MESOS-4025
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.26.0
>Reporter: Till Toenshoff
>Assignee: Jan Schlicht
>  Labels: flaky, flaky-test, test
>
> Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based 
> on 0.26.0-rc1.
> Testsuite was run as root.
> {noformat}
> sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1
> {noformat}
> {noformat}
> [ RUN  ] SlaveRecoveryTest/0.GCExecutor
> I1130 16:49:16.336833  1032 exec.cpp:136] Version: 0.26.0
> I1130 16:49:16.345212  1049 exec.cpp:210] Executor registered on slave 
> dde9fd4e-b016-4a99-9081-b047e9df9afa-S0
> Registered executor on ubuntu14
> Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114
> sh -c 'sleep 1000'
> Forked command at 1057
> ../../src/tests/mesos.cpp:779: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave':
>  Device or resource busy
> *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are 
> using GNU date ***
> PC: @  0x1443e9a testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; 
> stack trace: ***
> @ 0x7f1be92b80b7 os::Linux::chained_handler()
> @ 0x7f1be92bc219 JVM_handle_linux_signal
> @ 0x7f1bf7bbc340 (unknown)
> @  0x1443e9a testing::UnitTest::AddTestPartResult()
> @  0x1438b99 testing::internal::AssertHelper::operator=()
> @   0xf0b3bb 
> mesos::internal::tests::ContainerizerTest<>::TearDown()
> @  0x1461882 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x145c6f8 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x143de4a testing::Test::Run()
> @  0x143e584 testing::TestInfo::Run()
> @  0x143ebca testing::TestCase::Run()
> @  0x1445312 testing::internal::UnitTestImpl::RunAllTests()
> @  0x14624a7 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x145d26e 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x14440ae testing::UnitTest::Run()
> @   0xd15cd4 RUN_ALL_TESTS()
> @   0xd158c1 main
> @ 0x7f1bf7808ec5 (unknown)
> @   0x913009 (unknown)
> {noformat}
> My Vagrantfile generator;
> {noformat}
> #!/usr/bin/env bash
> cat << EOF > Vagrantfile
> # -*- mode: ruby -*-" >
> # vi: set ft=ruby :
> Vagrant.configure(2) do |config|
>   # Disable shared folder to prevent certain kernel module dependencies.
>   config.vm.synced_folder ".", "/vagrant", disabled: true
>   config.vm.box = "bento/ubuntu-14.04"
>   config.vm.hostname = "${PLATFORM_NAME}"
>   config.vm.provider "virtualbox" do |vb|
> vb.memory = ${VAGRANT_MEM}
> vb.cpus = ${VAGRANT_CPUS}
> vb.customize ["modifyvm", :id, "--nictype1", "virtio"]
> vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
> vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"]
>   end
>   config.vm.provider "vmware_fusion" do |vb|
> vb.memory = ${VAGRANT_MEM}
> vb.cpus = ${VAGRANT_CPUS}
>   end
>   config.vm.provision "file", source: "../test.sh", destination: "~/test.sh"
>   config.vm.provision "shell", inline: <<-SHELL
> sudo apt-get update
> sudo apt-get -y install openjdk-7-jdk autoconf libtool
> sudo apt-get -y install build-essential python-dev python-boto  \
> libcurl4-nss-dev libsasl2-dev maven \
> libapr1-dev libsvn-dev libssl-dev libevent-dev
> sudo apt-get -y install git
> sudo wget -qO- https://get.docker.com/ | sh
>   SHELL
> end
> EOF
> {noformat}
> The problem is kicking in frequently in my tests - I'ld say > 10% but less 
> than 50%.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.

2015-12-11 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052635#comment-15052635
 ] 

Jan Schlicht commented on MESOS-4025:
-

I saw the same behavior. Both tests have to be run in the same {{mesos-tests}} 
session to see the issue.

> SlaveRecoveryTest/0.GCExecutor is flaky.
> 
>
> Key: MESOS-4025
> URL: https://issues.apache.org/jira/browse/MESOS-4025
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.26.0
>Reporter: Till Toenshoff
>Assignee: Jan Schlicht
>  Labels: flaky, flaky-test, test
>
> Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based 
> on 0.26.0-rc1.
> Testsuite was run as root.
> {noformat}
> sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1
> {noformat}
> {noformat}
> [ RUN  ] SlaveRecoveryTest/0.GCExecutor
> I1130 16:49:16.336833  1032 exec.cpp:136] Version: 0.26.0
> I1130 16:49:16.345212  1049 exec.cpp:210] Executor registered on slave 
> dde9fd4e-b016-4a99-9081-b047e9df9afa-S0
> Registered executor on ubuntu14
> Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114
> sh -c 'sleep 1000'
> Forked command at 1057
> ../../src/tests/mesos.cpp:779: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave':
>  Device or resource busy
> *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are 
> using GNU date ***
> PC: @  0x1443e9a testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; 
> stack trace: ***
> @ 0x7f1be92b80b7 os::Linux::chained_handler()
> @ 0x7f1be92bc219 JVM_handle_linux_signal
> @ 0x7f1bf7bbc340 (unknown)
> @  0x1443e9a testing::UnitTest::AddTestPartResult()
> @  0x1438b99 testing::internal::AssertHelper::operator=()
> @   0xf0b3bb 
> mesos::internal::tests::ContainerizerTest<>::TearDown()
> @  0x1461882 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x145c6f8 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x143de4a testing::Test::Run()
> @  0x143e584 testing::TestInfo::Run()
> @  0x143ebca testing::TestCase::Run()
> @  0x1445312 testing::internal::UnitTestImpl::RunAllTests()
> @  0x14624a7 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x145d26e 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x14440ae testing::UnitTest::Run()
> @   0xd15cd4 RUN_ALL_TESTS()
> @   0xd158c1 main
> @ 0x7f1bf7808ec5 (unknown)
> @   0x913009 (unknown)
> {noformat}
> My Vagrantfile generator;
> {noformat}
> #!/usr/bin/env bash
> cat << EOF > Vagrantfile
> # -*- mode: ruby -*-" >
> # vi: set ft=ruby :
> Vagrant.configure(2) do |config|
>   # Disable shared folder to prevent certain kernel module dependencies.
>   config.vm.synced_folder ".", "/vagrant", disabled: true
>   config.vm.box = "bento/ubuntu-14.04"
>   config.vm.hostname = "${PLATFORM_NAME}"
>   config.vm.provider "virtualbox" do |vb|
> vb.memory = ${VAGRANT_MEM}
> vb.cpus = ${VAGRANT_CPUS}
> vb.customize ["modifyvm", :id, "--nictype1", "virtio"]
> vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
> vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"]
>   end
>   config.vm.provider "vmware_fusion" do |vb|
> vb.memory = ${VAGRANT_MEM}
> vb.cpus = ${VAGRANT_CPUS}
>   end
>   config.vm.provision "file", source: "../test.sh", destination: "~/test.sh"
>   config.vm.provision "shell", inline: <<-SHELL
> sudo apt-get update
> sudo apt-get -y install openjdk-7-jdk autoconf libtool
> sudo apt-get -y install build-essential python-dev python-boto  \
> libcurl4-nss-dev libsasl2-dev maven \
> libapr1-dev libsvn-dev libssl-dev libevent-dev
> sudo apt-get -y install git
> sudo wget -qO- https://get.docker.com/ | sh
>   SHELL
> end
> EOF
> {noformat}
> The problem is kicking in frequently in my tests - I'ld say > 10% but less 
> than 50%.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.

2015-12-11 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052636#comment-15052636
 ] 

Jan Schlicht commented on MESOS-4025:
-

I saw the same behavior. Both tests have to be run in the same {{mesos-tests}} 
session to see the issue.

> SlaveRecoveryTest/0.GCExecutor is flaky.
> 
>
> Key: MESOS-4025
> URL: https://issues.apache.org/jira/browse/MESOS-4025
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.26.0
>Reporter: Till Toenshoff
>Assignee: Jan Schlicht
>  Labels: flaky, flaky-test, test
>
> Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based 
> on 0.26.0-rc1.
> Testsuite was run as root.
> {noformat}
> sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1
> {noformat}
> {noformat}
> [ RUN  ] SlaveRecoveryTest/0.GCExecutor
> I1130 16:49:16.336833  1032 exec.cpp:136] Version: 0.26.0
> I1130 16:49:16.345212  1049 exec.cpp:210] Executor registered on slave 
> dde9fd4e-b016-4a99-9081-b047e9df9afa-S0
> Registered executor on ubuntu14
> Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114
> sh -c 'sleep 1000'
> Forked command at 1057
> ../../src/tests/mesos.cpp:779: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave':
>  Device or resource busy
> *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are 
> using GNU date ***
> PC: @  0x1443e9a testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; 
> stack trace: ***
> @ 0x7f1be92b80b7 os::Linux::chained_handler()
> @ 0x7f1be92bc219 JVM_handle_linux_signal
> @ 0x7f1bf7bbc340 (unknown)
> @  0x1443e9a testing::UnitTest::AddTestPartResult()
> @  0x1438b99 testing::internal::AssertHelper::operator=()
> @   0xf0b3bb 
> mesos::internal::tests::ContainerizerTest<>::TearDown()
> @  0x1461882 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x145c6f8 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x143de4a testing::Test::Run()
> @  0x143e584 testing::TestInfo::Run()
> @  0x143ebca testing::TestCase::Run()
> @  0x1445312 testing::internal::UnitTestImpl::RunAllTests()
> @  0x14624a7 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x145d26e 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x14440ae testing::UnitTest::Run()
> @   0xd15cd4 RUN_ALL_TESTS()
> @   0xd158c1 main
> @ 0x7f1bf7808ec5 (unknown)
> @   0x913009 (unknown)
> {noformat}
> My Vagrantfile generator;
> {noformat}
> #!/usr/bin/env bash
> cat << EOF > Vagrantfile
> # -*- mode: ruby -*-" >
> # vi: set ft=ruby :
> Vagrant.configure(2) do |config|
>   # Disable shared folder to prevent certain kernel module dependencies.
>   config.vm.synced_folder ".", "/vagrant", disabled: true
>   config.vm.box = "bento/ubuntu-14.04"
>   config.vm.hostname = "${PLATFORM_NAME}"
>   config.vm.provider "virtualbox" do |vb|
> vb.memory = ${VAGRANT_MEM}
> vb.cpus = ${VAGRANT_CPUS}
> vb.customize ["modifyvm", :id, "--nictype1", "virtio"]
> vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
> vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"]
>   end
>   config.vm.provider "vmware_fusion" do |vb|
> vb.memory = ${VAGRANT_MEM}
> vb.cpus = ${VAGRANT_CPUS}
>   end
>   config.vm.provision "file", source: "../test.sh", destination: "~/test.sh"
>   config.vm.provision "shell", inline: <<-SHELL
> sudo apt-get update
> sudo apt-get -y install openjdk-7-jdk autoconf libtool
> sudo apt-get -y install build-essential python-dev python-boto  \
> libcurl4-nss-dev libsasl2-dev maven \
> libapr1-dev libsvn-dev libssl-dev libevent-dev
> sudo apt-get -y install git
> sudo wget -qO- https://get.docker.com/ | sh
>   SHELL
> end
> EOF
> {noformat}
> The problem is kicking in frequently in my tests - I'ld say > 10% but less 
> than 50%.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.

2015-12-11 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052634#comment-15052634
 ] 

Jan Schlicht commented on MESOS-4025:
-

Hey, I've seen that you posted a review for that. I really appreciate the work 
you done on this, but unfortunately there was already some work done, it just 
took a bit to find a shepherd for this. Your approach to use {{MockDocker}} is 
the same that I've chosen, so I'd really appreciate if you'd take a look at my 
review request for this ticket and comment on this. Again, sorry that we both 
basically did the same work and that my review request is online just now.

> SlaveRecoveryTest/0.GCExecutor is flaky.
> 
>
> Key: MESOS-4025
> URL: https://issues.apache.org/jira/browse/MESOS-4025
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.26.0
>Reporter: Till Toenshoff
>Assignee: Jan Schlicht
>  Labels: flaky, flaky-test, test
>
> Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based 
> on 0.26.0-rc1.
> Testsuite was run as root.
> {noformat}
> sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1
> {noformat}
> {noformat}
> [ RUN  ] SlaveRecoveryTest/0.GCExecutor
> I1130 16:49:16.336833  1032 exec.cpp:136] Version: 0.26.0
> I1130 16:49:16.345212  1049 exec.cpp:210] Executor registered on slave 
> dde9fd4e-b016-4a99-9081-b047e9df9afa-S0
> Registered executor on ubuntu14
> Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114
> sh -c 'sleep 1000'
> Forked command at 1057
> ../../src/tests/mesos.cpp:779: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave':
>  Device or resource busy
> *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are 
> using GNU date ***
> PC: @  0x1443e9a testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; 
> stack trace: ***
> @ 0x7f1be92b80b7 os::Linux::chained_handler()
> @ 0x7f1be92bc219 JVM_handle_linux_signal
> @ 0x7f1bf7bbc340 (unknown)
> @  0x1443e9a testing::UnitTest::AddTestPartResult()
> @  0x1438b99 testing::internal::AssertHelper::operator=()
> @   0xf0b3bb 
> mesos::internal::tests::ContainerizerTest<>::TearDown()
> @  0x1461882 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x145c6f8 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x143de4a testing::Test::Run()
> @  0x143e584 testing::TestInfo::Run()
> @  0x143ebca testing::TestCase::Run()
> @  0x1445312 testing::internal::UnitTestImpl::RunAllTests()
> @  0x14624a7 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x145d26e 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x14440ae testing::UnitTest::Run()
> @   0xd15cd4 RUN_ALL_TESTS()
> @   0xd158c1 main
> @ 0x7f1bf7808ec5 (unknown)
> @   0x913009 (unknown)
> {noformat}
> My Vagrantfile generator;
> {noformat}
> #!/usr/bin/env bash
> cat << EOF > Vagrantfile
> # -*- mode: ruby -*-" >
> # vi: set ft=ruby :
> Vagrant.configure(2) do |config|
>   # Disable shared folder to prevent certain kernel module dependencies.
>   config.vm.synced_folder ".", "/vagrant", disabled: true
>   config.vm.box = "bento/ubuntu-14.04"
>   config.vm.hostname = "${PLATFORM_NAME}"
>   config.vm.provider "virtualbox" do |vb|
> vb.memory = ${VAGRANT_MEM}
> vb.cpus = ${VAGRANT_CPUS}
> vb.customize ["modifyvm", :id, "--nictype1", "virtio"]
> vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
> vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"]
>   end
>   config.vm.provider "vmware_fusion" do |vb|
> vb.memory = ${VAGRANT_MEM}
> vb.cpus = ${VAGRANT_CPUS}
>   end
>   config.vm.provision "file", source: "../test.sh", destination: "~/test.sh"
>   config.vm.provision "shell", inline: <<-SHELL
> sudo apt-get update
> sudo apt-get -y install openjdk-7-jdk autoconf libtool
> sudo apt-get -y install build-essential python-dev python-boto  \
> libcurl4-nss-dev libsasl2-dev maven \
> libapr1-dev libsvn-dev libssl-dev libevent-dev
> sudo apt-get -y install git
> sudo wget -qO- https://get.docker.com/ | sh
>   SHELL
> end
> EOF
> {noformat}
> The problem is kicking in frequently in my tests - I'ld say > 10% but less 
> than 50%.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.

2015-12-10 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052125#comment-15052125
 ] 

haosdent commented on MESOS-4025:
-

Thank you very much for your help [~nfnt] [~greggomann]. Let me check whether 
ROOT_DOCKER_DockerHealthStatusChange break SlaveRecoveryTest

> SlaveRecoveryTest/0.GCExecutor is flaky.
> 
>
> Key: MESOS-4025
> URL: https://issues.apache.org/jira/browse/MESOS-4025
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.26.0
>Reporter: Till Toenshoff
>Assignee: Jan Schlicht
>  Labels: flaky, flaky-test, test
>
> Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based 
> on 0.26.0-rc1.
> Testsuite was run as root.
> {noformat}
> sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1
> {noformat}
> {noformat}
> [ RUN  ] SlaveRecoveryTest/0.GCExecutor
> I1130 16:49:16.336833  1032 exec.cpp:136] Version: 0.26.0
> I1130 16:49:16.345212  1049 exec.cpp:210] Executor registered on slave 
> dde9fd4e-b016-4a99-9081-b047e9df9afa-S0
> Registered executor on ubuntu14
> Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114
> sh -c 'sleep 1000'
> Forked command at 1057
> ../../src/tests/mesos.cpp:779: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave':
>  Device or resource busy
> *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are 
> using GNU date ***
> PC: @  0x1443e9a testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; 
> stack trace: ***
> @ 0x7f1be92b80b7 os::Linux::chained_handler()
> @ 0x7f1be92bc219 JVM_handle_linux_signal
> @ 0x7f1bf7bbc340 (unknown)
> @  0x1443e9a testing::UnitTest::AddTestPartResult()
> @  0x1438b99 testing::internal::AssertHelper::operator=()
> @   0xf0b3bb 
> mesos::internal::tests::ContainerizerTest<>::TearDown()
> @  0x1461882 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x145c6f8 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x143de4a testing::Test::Run()
> @  0x143e584 testing::TestInfo::Run()
> @  0x143ebca testing::TestCase::Run()
> @  0x1445312 testing::internal::UnitTestImpl::RunAllTests()
> @  0x14624a7 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x145d26e 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x14440ae testing::UnitTest::Run()
> @   0xd15cd4 RUN_ALL_TESTS()
> @   0xd158c1 main
> @ 0x7f1bf7808ec5 (unknown)
> @   0x913009 (unknown)
> {noformat}
> My Vagrantfile generator;
> {noformat}
> #!/usr/bin/env bash
> cat << EOF > Vagrantfile
> # -*- mode: ruby -*-" >
> # vi: set ft=ruby :
> Vagrant.configure(2) do |config|
>   # Disable shared folder to prevent certain kernel module dependencies.
>   config.vm.synced_folder ".", "/vagrant", disabled: true
>   config.vm.box = "bento/ubuntu-14.04"
>   config.vm.hostname = "${PLATFORM_NAME}"
>   config.vm.provider "virtualbox" do |vb|
> vb.memory = ${VAGRANT_MEM}
> vb.cpus = ${VAGRANT_CPUS}
> vb.customize ["modifyvm", :id, "--nictype1", "virtio"]
> vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
> vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"]
>   end
>   config.vm.provider "vmware_fusion" do |vb|
> vb.memory = ${VAGRANT_MEM}
> vb.cpus = ${VAGRANT_CPUS}
>   end
>   config.vm.provision "file", source: "../test.sh", destination: "~/test.sh"
>   config.vm.provision "shell", inline: <<-SHELL
> sudo apt-get update
> sudo apt-get -y install openjdk-7-jdk autoconf libtool
> sudo apt-get -y install build-essential python-dev python-boto  \
> libcurl4-nss-dev libsasl2-dev maven \
> libapr1-dev libsvn-dev libssl-dev libevent-dev
> sudo apt-get -y install git
> sudo wget -qO- https://get.docker.com/ | sh
>   SHELL
> end
> EOF
> {noformat}
> The problem is kicking in frequently in my tests - I'ld say > 10% but less 
> than 50%.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.

2015-12-09 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049741#comment-15049741
 ] 

Greg Mann commented on MESOS-4025:
--

I was just observing this error on Ubuntu 14.04 when running the tests as root. 
It does indeed seem to be due to artifacts left behind by some tests. After 
running the entire test suite the first time I saw that several of the 
{{SlaveRecoveryTest}}s had failed:

{code}
[==] 988 tests from 144 test cases ran. (590989 ms total)
[  PASSED  ] 980 tests.
[  FAILED  ] 8 tests, listed below:
[  FAILED  ] SlaveRecoveryTest/0.GCExecutor, where TypeParam = 
mesos::internal::slave::MesosContainerizer
[  FAILED  ] SlaveRecoveryTest/0.ShutdownSlave, where TypeParam = 
mesos::internal::slave::MesosContainerizer
[  FAILED  ] SlaveRecoveryTest/0.ShutdownSlaveSIGUSR1, where TypeParam = 
mesos::internal::slave::MesosContainerizer
[  FAILED  ] SlaveTest.PingTimeoutNoPings
[  FAILED  ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample
[  FAILED  ] CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_Perf
[  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
[  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery
{code}

and the {{SlaveRecoveryTest}} errors all looked the same:

{code}
[ RUN  ] SlaveRecoveryTest/0.ShutdownSlave
../../src/tests/mesos.cpp:906: Failure
(cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
'/sys/fs/cgroup/memory/mesos_test_55fd4c14-a506-40ec-8965-2350ca5556c7/slave': 
Device or resource busy
---
We're very sorry but we can't seem to destroy existing
cgroups that we likely created as part of an earlier
invocation of the tests. Please manually destroy the cgroup
at 
'/sys/fs/cgroup/memory/mesos_test_55fd4c14-a506-40ec-8965-2350ca5556c7/slave' 
by first
manually killing all the processes found in the file at 
'/sys/fs/cgroup/memory/mesos_test_55fd4c14-a506-40ec-8965-2350ca5556c7/slave/tasks'
---
../../src/tests/mesos.cpp:940: Failure
(cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
'/sys/fs/cgroup/memory/mesos_test_55fd4c14-a506-40ec-8965-2350ca5556c7/slave': 
Device or resource busy
[  FAILED  ] SlaveRecoveryTest/0.ShutdownSlave, where TypeParam = 
mesos::internal::slave::MesosContainerizer (31 ms)
{code}

Next, re-running just these tests with {{sudo GTEST_FILTER="SlaveRecoverTest*" 
bin/mesos-tests.sh}}, they _all_ failed, with a slightly different error. At 
this point, these failures are reliably produced 100% of the time:

{code}
[ RUN  ] SlaveRecoveryTest/0.MultipleSlaves
../../src/tests/mesos.cpp:906: Failure
(cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
'/sys/fs/cgroup/perf_event/mesos_test': Device or resource busy
---
We're very sorry but we can't seem to destroy existing
cgroups that we likely created as part of an earlier
invocation of the tests. Please manually destroy the cgroup
at '/sys/fs/cgroup/perf_event/mesos_test' by first
manually killing all the processes found in the file at 
'/sys/fs/cgroup/perf_event/mesos_test/tasks'
---
../../src/tests/mesos.cpp:940: Failure
(cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
'/sys/fs/cgroup/perf_event/mesos_test': Device or resource busy
[  FAILED  ] SlaveRecoveryTest/0.MultipleSlaves, where TypeParam = 
mesos::internal::slave::MesosContainerizer (14 ms)
{code}

Finally, I recompiled from scratch and ran _only_ the {{SlaveRecoveryTest}}s, 
doing:

{code}
GTEST_FILTER="" make check
sudo GTEST_FILTER="SlaveRecoveryTest*" bin/mesos-tests.sh
{code}

and they all passed. Regarding [~nfnt]'s comment above on the 
{{HealthCheckTest}}s, after all this I was able to run {{sudo 
GTEST_FILTER="HealthCheckTest*" bin/mesos-tests.sh}}, followed by {{sudo 
GTEST_FILTER="SlaveRecoveryTest*" bin/mesos-tests.sh}}, and all of the 
{{SlaveRecoveryTest}}s passed, so perhaps it isn't an artifact of the 
{{HealthCheckTest}}s that is causing this problem.

> SlaveRecoveryTest/0.GCExecutor is flaky.
> 
>
> Key: MESOS-4025
> URL: https://issues.apache.org/jira/browse/MESOS-4025
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.26.0
>Reporter: Till Toenshoff
>Assignee: Jan Schlicht
>  Labels: flaky, flaky-test, test
>
> Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based 
> on 0.26.0-rc1.
> Testsuite was run as root.
> {noformat}
> sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1
> {noformat}
> {noformat}
> [ RUN  ] SlaveRecoveryTest/0.GCExecutor
> I1130 16:49:16.336833  1032 exec.cpp:136] 

[jira] [Commented] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.

2015-12-04 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15041534#comment-15041534
 ] 

Jan Schlicht commented on MESOS-4025:
-

Comparing {{HealthCheckTest.ROOT_DOCKER_*}} tests with e.g. 
{{DockerContainerizerTest.ROOT_DOCKER_*}} tests, the difference is that the 
ones in {{HealthCheckTest}} don't use mocks and futures to ensure the 
termination of the started container. Not doing this probably leaves some 
artifacts (in form of locked cgroups) that aren't cleaned up when 
{{GCExecutor}} runs and result in the crash during tear down of the test.

> SlaveRecoveryTest/0.GCExecutor is flaky.
> 
>
> Key: MESOS-4025
> URL: https://issues.apache.org/jira/browse/MESOS-4025
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.26.0
>Reporter: Till Toenshoff
>Assignee: Jan Schlicht
>  Labels: flaky, flaky-test, test
>
> Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based 
> on 0.26.0-rc1.
> Testsuite was run as root.
> {noformat}
> sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1
> {noformat}
> {noformat}
> [ RUN  ] SlaveRecoveryTest/0.GCExecutor
> I1130 16:49:16.336833  1032 exec.cpp:136] Version: 0.26.0
> I1130 16:49:16.345212  1049 exec.cpp:210] Executor registered on slave 
> dde9fd4e-b016-4a99-9081-b047e9df9afa-S0
> Registered executor on ubuntu14
> Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114
> sh -c 'sleep 1000'
> Forked command at 1057
> ../../src/tests/mesos.cpp:779: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave':
>  Device or resource busy
> *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are 
> using GNU date ***
> PC: @  0x1443e9a testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; 
> stack trace: ***
> @ 0x7f1be92b80b7 os::Linux::chained_handler()
> @ 0x7f1be92bc219 JVM_handle_linux_signal
> @ 0x7f1bf7bbc340 (unknown)
> @  0x1443e9a testing::UnitTest::AddTestPartResult()
> @  0x1438b99 testing::internal::AssertHelper::operator=()
> @   0xf0b3bb 
> mesos::internal::tests::ContainerizerTest<>::TearDown()
> @  0x1461882 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x145c6f8 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x143de4a testing::Test::Run()
> @  0x143e584 testing::TestInfo::Run()
> @  0x143ebca testing::TestCase::Run()
> @  0x1445312 testing::internal::UnitTestImpl::RunAllTests()
> @  0x14624a7 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x145d26e 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x14440ae testing::UnitTest::Run()
> @   0xd15cd4 RUN_ALL_TESTS()
> @   0xd158c1 main
> @ 0x7f1bf7808ec5 (unknown)
> @   0x913009 (unknown)
> {noformat}
> My Vagrantfile generator;
> {noformat}
> #!/usr/bin/env bash
> cat << EOF > Vagrantfile
> # -*- mode: ruby -*-" >
> # vi: set ft=ruby :
> Vagrant.configure(2) do |config|
>   # Disable shared folder to prevent certain kernel module dependencies.
>   config.vm.synced_folder ".", "/vagrant", disabled: true
>   config.vm.box = "bento/ubuntu-14.04"
>   config.vm.hostname = "${PLATFORM_NAME}"
>   config.vm.provider "virtualbox" do |vb|
> vb.memory = ${VAGRANT_MEM}
> vb.cpus = ${VAGRANT_CPUS}
> vb.customize ["modifyvm", :id, "--nictype1", "virtio"]
> vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
> vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"]
>   end
>   config.vm.provider "vmware_fusion" do |vb|
> vb.memory = ${VAGRANT_MEM}
> vb.cpus = ${VAGRANT_CPUS}
>   end
>   config.vm.provision "file", source: "../test.sh", destination: "~/test.sh"
>   config.vm.provision "shell", inline: <<-SHELL
> sudo apt-get update
> sudo apt-get -y install openjdk-7-jdk autoconf libtool
> sudo apt-get -y install build-essential python-dev python-boto  \
> libcurl4-nss-dev libsasl2-dev maven \
> libapr1-dev libsvn-dev libssl-dev libevent-dev
> sudo apt-get -y install git
> sudo wget -qO- https://get.docker.com/ | sh
>   SHELL
> end
> EOF
> {noformat}
> The problem is kicking in frequently in my tests - I'ld say > 10% but less 
> than 50%.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.

2015-12-01 Thread Jojy Varghese (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034701#comment-15034701
 ] 

Jojy Varghese commented on MESOS-4025:
--

On debian8:

{code}
[ RUN  ] SlaveRecoveryTest/0.Reboot
I1201 21:57:11.562711  7964 exec.cpp:136] Version: 0.26.0
I1201 21:57:11.571506  7978 exec.cpp:210] Executor registered on slave 
00a179f0-f087-4054-a0c7-c15281d5e7ff-S0
Registered executor on debian8
Starting task 791255fc-88dd-452e-ba12-6b2dfced99a0
Forked command at 7987
sh -c 'sleep 1000'
I1201 21:57:11.640627  7982 exec.cpp:383] Executor asked to shutdown
Shutting down
Sending SIGTERM to process tree at pid 7987
Killing the following process trees:
[ 
-+- 7987 sh -c sleep 1000 
 \--- 7988 sleep 1000 
]
Command terminated with signal Terminated (pid: 7987)
[   OK ] SlaveRecoveryTest/0.Reboot (1730 ms)
[ RUN  ] SlaveRecoveryTest/0.GCExecutor
2015-12-01 
21:57:13,187:1473(0x7f9bf4e36700):ZOO_ERROR@handle_socket_error_msg@1697: 
Socket [127.0.0.1:44262] zk retcode=-4, errno=111(Connection refused): server 
refused to accept the client
I1201 21:57:13.296581  8012 exec.cpp:136] Version: 0.26.0
I1201 21:57:13.305498  8028 exec.cpp:210] Executor registered on slave 
44a46bd2-d24a-48d6-bd62-492c15845841-S0
Registered executor on debian8
Starting task 8affc624-c95d-43f5-a2b9-967663c3151b
sh -c 'sleep 1000'
Forked command at 8035
../../src/tests/mesos.cpp:781: Failure
(cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
'/sys/fs/cgroup/memory/mesos_test_a894bd47-5e1a-4442-bc6b-303d2aed6945/slave': 
Device or resource busy
*** Aborted at 1449007033 (unix time) try "date -d @1449007033" if you are 
using GNU date ***
PC: @  0x14b079e testing::UnitTest::AddTestPartResult()
*** SIGSEGV (@0x0) received by PID 1473 (TID 0x7f9c3db5d7c0) from PID 0; stack 
trace: ***
@ 0x7f9c28c2166c os::Linux::chained_handler()
@ 0x7f9c28c25a0a JVM_handle_linux_signal
@ 0x7f9c374728d0 (unknown)
@  0x14b079e testing::UnitTest::AddTestPartResult()
@  0x14a51d7 testing::internal::AssertHelper::operator=()
@   0xf564c1 mesos::internal::tests::ContainerizerTest<>::TearDown()
@  0x14ce2c0 
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@  0x14c9238 
testing::internal::HandleExceptionsInMethodIfSupported<>()
@  0x14aa5c0 testing::Test::Run()
@  0x14aad05 testing::TestInfo::Run()
@  0x14ab340 testing::TestCase::Run()
@  0x14b1c8f testing::internal::UnitTestImpl::RunAllTests()
@  0x14cef4f 
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@  0x14c9d8e 
testing::internal::HandleExceptionsInMethodIfSupported<>()
@  0x14b09bf testing::UnitTest::Run()
@   0xd63df2 RUN_ALL_TESTS()
@   0xd639d0 main
@ 0x7f9c370dbb45 (unknown)
@   0x9588e9 (unknown)

{code}

* The crash was inside 
*ContainerizerTest::TearDown*. 
* The assertion *AWAIT_READY(cgroups::destroy(hierarchy, cgroup));* failed. The 
cgroup in question was 
*/sys/fs/cgroup/memory/mesos_test_a894bd47-5e1a-4442-bc6b-303d2aed6945/slave* 
as seen from the log above.

> SlaveRecoveryTest/0.GCExecutor is flaky.
> 
>
> Key: MESOS-4025
> URL: https://issues.apache.org/jira/browse/MESOS-4025
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
>Reporter: Till Toenshoff
>  Labels: test
>
> Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based 
> on 0.26.0-rc1.
> Testsuite was run as root.
> {noformat}
> sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1
> {noformat}
> {noformat}
> [ RUN  ] SlaveRecoveryTest/0.GCExecutor
> I1130 16:49:16.336833  1032 exec.cpp:136] Version: 0.26.0
> I1130 16:49:16.345212  1049 exec.cpp:210] Executor registered on slave 
> dde9fd4e-b016-4a99-9081-b047e9df9afa-S0
> Registered executor on ubuntu14
> Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114
> sh -c 'sleep 1000'
> Forked command at 1057
> ../../src/tests/mesos.cpp:779: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave':
>  Device or resource busy
> *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are 
> using GNU date ***
> PC: @  0x1443e9a testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; 
> stack trace: ***
> @ 0x7f1be92b80b7 os::Linux::chained_handler()
> @ 0x7f1be92bc219 JVM_handle_linux_signal
> @ 0x7f1bf7bbc340 (unknown)
> @  0x1443e9a testing::UnitTest::AddTestPartResult()
> @  0x1438b99 

[jira] [Commented] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.

2015-12-01 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033479#comment-15033479
 ] 

Jan Schlicht commented on MESOS-4025:
-

{{sudo ./bin/mesos-tests.sh --gtest_repeat=1 --gtest_break_on_failure 
--gtest_filter="*ROOT_DOCKER_DockerHealthStatusChange:SlaveRecoveryTest*GCExecutor"}}
 also triggers the failure. Seems that there's some problem during clean-up of 
the HealthCheckTest fixture.

> SlaveRecoveryTest/0.GCExecutor is flaky.
> 
>
> Key: MESOS-4025
> URL: https://issues.apache.org/jira/browse/MESOS-4025
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
>Reporter: Till Toenshoff
>
> Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based 
> on 0.26.0-rc1.
> Testsuite was run as root.
> {noformat}
> sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1
> {noformat}
> {noformat}
> [ RUN  ] SlaveRecoveryTest/0.GCExecutor
> I1130 16:49:16.336833  1032 exec.cpp:136] Version: 0.26.0
> I1130 16:49:16.345212  1049 exec.cpp:210] Executor registered on slave 
> dde9fd4e-b016-4a99-9081-b047e9df9afa-S0
> Registered executor on ubuntu14
> Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114
> sh -c 'sleep 1000'
> Forked command at 1057
> ../../src/tests/mesos.cpp:779: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave':
>  Device or resource busy
> *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are 
> using GNU date ***
> PC: @  0x1443e9a testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; 
> stack trace: ***
> @ 0x7f1be92b80b7 os::Linux::chained_handler()
> @ 0x7f1be92bc219 JVM_handle_linux_signal
> @ 0x7f1bf7bbc340 (unknown)
> @  0x1443e9a testing::UnitTest::AddTestPartResult()
> @  0x1438b99 testing::internal::AssertHelper::operator=()
> @   0xf0b3bb 
> mesos::internal::tests::ContainerizerTest<>::TearDown()
> @  0x1461882 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x145c6f8 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x143de4a testing::Test::Run()
> @  0x143e584 testing::TestInfo::Run()
> @  0x143ebca testing::TestCase::Run()
> @  0x1445312 testing::internal::UnitTestImpl::RunAllTests()
> @  0x14624a7 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x145d26e 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x14440ae testing::UnitTest::Run()
> @   0xd15cd4 RUN_ALL_TESTS()
> @   0xd158c1 main
> @ 0x7f1bf7808ec5 (unknown)
> @   0x913009 (unknown)
> {noformat}
> My Vagrantfile generator;
> {noformat}
> #!/usr/bin/env bash
> cat << EOF > Vagrantfile
> # -*- mode: ruby -*-" >
> # vi: set ft=ruby :
> Vagrant.configure(2) do |config|
>   # Disable shared folder to prevent certain kernel module dependencies.
>   config.vm.synced_folder ".", "/vagrant", disabled: true
>   config.vm.box = "bento/ubuntu-14.04"
>   config.vm.hostname = "${PLATFORM_NAME}"
>   config.vm.provider "virtualbox" do |vb|
> vb.memory = ${VAGRANT_MEM}
> vb.cpus = ${VAGRANT_CPUS}
> vb.customize ["modifyvm", :id, "--nictype1", "virtio"]
> vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
> vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"]
>   end
>   config.vm.provider "vmware_fusion" do |vb|
> vb.memory = ${VAGRANT_MEM}
> vb.cpus = ${VAGRANT_CPUS}
>   end
>   config.vm.provision "file", source: "../test.sh", destination: "~/test.sh"
>   config.vm.provision "shell", inline: <<-SHELL
> sudo apt-get update
> sudo apt-get -y install openjdk-7-jdk autoconf libtool
> sudo apt-get -y install build-essential python-dev python-boto  \
> libcurl4-nss-dev libsasl2-dev maven \
> libapr1-dev libsvn-dev libssl-dev libevent-dev
> sudo apt-get -y install git
> sudo wget -qO- https://get.docker.com/ | sh
>   SHELL
> end
> EOF
> {noformat}
> The problem is kicking in frequently in my tests - I'ld say > 10% but less 
> than 50%.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.

2015-12-01 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033828#comment-15033828
 ] 

Till Toenshoff commented on MESOS-4025:
---

Thanks for your analysis. We will declare it as a non-blocker as it is a 
test-only issue according to your research. Thanks again!!

> SlaveRecoveryTest/0.GCExecutor is flaky.
> 
>
> Key: MESOS-4025
> URL: https://issues.apache.org/jira/browse/MESOS-4025
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
>Reporter: Till Toenshoff
>  Labels: test
>
> Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based 
> on 0.26.0-rc1.
> Testsuite was run as root.
> {noformat}
> sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1
> {noformat}
> {noformat}
> [ RUN  ] SlaveRecoveryTest/0.GCExecutor
> I1130 16:49:16.336833  1032 exec.cpp:136] Version: 0.26.0
> I1130 16:49:16.345212  1049 exec.cpp:210] Executor registered on slave 
> dde9fd4e-b016-4a99-9081-b047e9df9afa-S0
> Registered executor on ubuntu14
> Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114
> sh -c 'sleep 1000'
> Forked command at 1057
> ../../src/tests/mesos.cpp:779: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave':
>  Device or resource busy
> *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are 
> using GNU date ***
> PC: @  0x1443e9a testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; 
> stack trace: ***
> @ 0x7f1be92b80b7 os::Linux::chained_handler()
> @ 0x7f1be92bc219 JVM_handle_linux_signal
> @ 0x7f1bf7bbc340 (unknown)
> @  0x1443e9a testing::UnitTest::AddTestPartResult()
> @  0x1438b99 testing::internal::AssertHelper::operator=()
> @   0xf0b3bb 
> mesos::internal::tests::ContainerizerTest<>::TearDown()
> @  0x1461882 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x145c6f8 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x143de4a testing::Test::Run()
> @  0x143e584 testing::TestInfo::Run()
> @  0x143ebca testing::TestCase::Run()
> @  0x1445312 testing::internal::UnitTestImpl::RunAllTests()
> @  0x14624a7 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x145d26e 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x14440ae testing::UnitTest::Run()
> @   0xd15cd4 RUN_ALL_TESTS()
> @   0xd158c1 main
> @ 0x7f1bf7808ec5 (unknown)
> @   0x913009 (unknown)
> {noformat}
> My Vagrantfile generator;
> {noformat}
> #!/usr/bin/env bash
> cat << EOF > Vagrantfile
> # -*- mode: ruby -*-" >
> # vi: set ft=ruby :
> Vagrant.configure(2) do |config|
>   # Disable shared folder to prevent certain kernel module dependencies.
>   config.vm.synced_folder ".", "/vagrant", disabled: true
>   config.vm.box = "bento/ubuntu-14.04"
>   config.vm.hostname = "${PLATFORM_NAME}"
>   config.vm.provider "virtualbox" do |vb|
> vb.memory = ${VAGRANT_MEM}
> vb.cpus = ${VAGRANT_CPUS}
> vb.customize ["modifyvm", :id, "--nictype1", "virtio"]
> vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
> vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"]
>   end
>   config.vm.provider "vmware_fusion" do |vb|
> vb.memory = ${VAGRANT_MEM}
> vb.cpus = ${VAGRANT_CPUS}
>   end
>   config.vm.provision "file", source: "../test.sh", destination: "~/test.sh"
>   config.vm.provision "shell", inline: <<-SHELL
> sudo apt-get update
> sudo apt-get -y install openjdk-7-jdk autoconf libtool
> sudo apt-get -y install build-essential python-dev python-boto  \
> libcurl4-nss-dev libsasl2-dev maven \
> libapr1-dev libsvn-dev libssl-dev libevent-dev
> sudo apt-get -y install git
> sudo wget -qO- https://get.docker.com/ | sh
>   SHELL
> end
> EOF
> {noformat}
> The problem is kicking in frequently in my tests - I'ld say > 10% but less 
> than 50%.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.

2015-11-30 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032840#comment-15032840
 ] 

Till Toenshoff commented on MESOS-4025:
---

{noformat}
[ RUN  ] SlaveRecoveryTest/0.GCExecutor
I1201 01:07:05.936290 25504 leveldb.cpp:176] Opened db in 3.468541ms
I1201 01:07:05.937160 25504 leveldb.cpp:183] Compacted db in 805344ns
I1201 01:07:05.937222 25504 leveldb.cpp:198] Created db iterator in 15726ns
I1201 01:07:05.937234 25504 leveldb.cpp:204] Seeked to beginning of db in 1106ns
I1201 01:07:05.937242 25504 leveldb.cpp:273] Iterated through 0 keys in the db 
in 118ns
I1201 01:07:05.937283 25504 replica.cpp:780] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1201 01:07:05.937808 25522 recover.cpp:449] Starting replica recovery
I1201 01:07:05.938863 25524 master.cpp:367] Master 
3a496a61-8184-4416-b323-27ac6436661d (ubuntu14) started on 127.0.1.1:58897
I1201 01:07:05.938920 25519 recover.cpp:475] Replica is in EMPTY status
I1201 01:07:05.938886 25524 master.cpp:369] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/uDNEII/credentials" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
--quiet="false" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="25secs" --registry_strict="true" 
--root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/uDNEII/master" 
--zk_session_timeout="10secs"
I1201 01:07:05.939036 25524 master.cpp:414] Master only allowing authenticated 
frameworks to register
I1201 01:07:05.939045 25524 master.cpp:419] Master only allowing authenticated 
slaves to register
I1201 01:07:05.939062 25524 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/uDNEII/credentials'
I1201 01:07:05.939237 25524 master.cpp:458] Using default 'crammd5' 
authenticator
I1201 01:07:05.939326 25524 master.cpp:495] Authorization enabled
I1201 01:07:05.939713 25521 replica.cpp:676] Replica in EMPTY status received a 
broadcasted recover request from (8245)@127.0.1.1:58897
I1201 01:07:05.940044 25522 recover.cpp:195] Received a recover response from a 
replica in EMPTY status
I1201 01:07:05.940450 25524 recover.cpp:566] Updating replica status to STARTING
I1201 01:07:05.940498 25521 master.cpp:1606] The newly elected leader is 
master@127.0.1.1:58897 with id 3a496a61-8184-4416-b323-27ac6436661d
I1201 01:07:05.940523 25521 master.cpp:1619] Elected as the leading master!
I1201 01:07:05.940537 25521 master.cpp:1379] Recovering from registrar
I1201 01:07:05.940670 25524 registrar.cpp:309] Recovering registrar
I1201 01:07:05.941493 25522 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 943836ns
I1201 01:07:05.941589 25522 replica.cpp:323] Persisted replica status to 
STARTING
I1201 01:07:05.941745 25522 recover.cpp:475] Replica is in STARTING status
I1201 01:07:05.942773 25522 replica.cpp:676] Replica in STARTING status 
received a broadcasted recover request from (8246)@127.0.1.1:58897
I1201 01:07:05.943061 25522 recover.cpp:195] Received a recover response from a 
replica in STARTING status
I1201 01:07:05.943423 25522 recover.cpp:566] Updating replica status to VOTING
I1201 01:07:05.943894 25522 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 333689ns
I1201 01:07:05.943923 25522 replica.cpp:323] Persisted replica status to VOTING
I1201 01:07:05.943987 25522 recover.cpp:580] Successfully joined the Paxos group
I1201 01:07:05.944084 25522 recover.cpp:464] Recover process terminated
I1201 01:07:05.944319 25519 log.cpp:661] Attempting to start the writer
I1201 01:07:05.945137 25519 replica.cpp:496] Replica received implicit promise 
request from (8247)@127.0.1.1:58897 with proposal 1
I1201 01:07:05.945484 25519 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 321958ns
I1201 01:07:05.945513 25519 replica.cpp:345] Persisted promised to 1
I1201 01:07:05.945940 25524 coordinator.cpp:240] Coordinator attempting to fill 
missing positions
I1201 01:07:05.947418 25522 replica.cpp:391] Replica received explicit promise 
request from (8248)@127.0.1.1:58897 for position 0 with proposal 2
I1201 01:07:05.947824 25522 leveldb.cpp:343] Persisting action (8 bytes) to 
leveldb took 307454ns
I1201 01:07:05.947880 25522 replica.cpp:715] Persisted action at 0
I1201 01:07:05.948781 25519 replica.cpp:540] Replica received write request for 
position 0 from (8249)@127.0.1.1:58897
I1201 01:07:05.948858 25519 leveldb.cpp:438] Reading position