[jira] [Updated] (MESOS-4038) SlaveRecoveryTests, UserCgroupIsolatorTests fail on CentOS 6.6
[ https://issues.apache.org/jira/browse/MESOS-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-4038: - Summary: SlaveRecoveryTests, UserCgroupIsolatorTests fail on CentOS 6.6 (was: SlaveRecoveryTests fail on CentOS 6.6) > SlaveRecoveryTests, UserCgroupIsolatorTests fail on CentOS 6.6 > -- > > Key: MESOS-4038 > URL: https://issues.apache.org/jira/browse/MESOS-4038 > Project: Mesos > Issue Type: Bug > Environment: CentOS 6.6 >Reporter: Greg Mann > Labels: mesosphere, test-failure > > All {{SlaveRecoveryTest.\*}} tests, > {{MesosContainerizerSlaveRecoveryTest.\*}} tests, and > {{UserCgroupIsolatorTest*}} tests fail on CentOS 6.6 with {{TypeParam = > mesos::internal::slave::MesosContainerizer}}. They all fail with the same > error: > {code} > [--] 1 test from SlaveRecoveryTest/0, where TypeParam = > mesos::internal::slave::MesosContainerizer > [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor > ../../src/tests/mesos.cpp:722: Failure > cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in > the file system > - > We cannot run any cgroups tests that require > a hierarchy with subsystem 'perf_event' > because we failed to find an existing hierarchy > or create a new one (tried '/cgroup/perf_event'). > You can either remove all existing > hierarchies, or disable this test case > (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). > - > ../../src/tests/mesos.cpp:776: Failure > cgroups: '/cgroup/perf_event' is not a valid hierarchy > [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = > mesos::internal::slave::MesosContainerizer (8 ms) > [--] 1 test from SlaveRecoveryTest/0 (9 ms total) > [--] Global test environment tear-down > [==] 1 test from 1 test case ran. (15 ms total) > [ PASSED ] 0 tests. 
> [ FAILED ] 1 test, listed below: > [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = > mesos::internal::slave::MesosContainerizer > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
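The failure above means a cgroup hierarchy with the {{perf_event}} subsystem was already mounted at {{/cgroup/perf_event}} before the test harness tried to create its own. One way to see which hierarchies are mounted is to scan {{/proc/mounts}}; the sketch below is purely illustrative (the function name and parsing are my own, not Mesos code):

```python
def find_hierarchies(mounts_text, subsystem):
    """Return mount points of cgroup hierarchies that include `subsystem`.

    `mounts_text` uses the /proc/mounts format:
    <device> <mount point> <fstype> <options> <dump> <pass>
    """
    hierarchies = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # cgroup v1 hierarchies have fstype "cgroup"; the enabled
        # subsystems appear in the comma-separated mount options.
        if len(fields) >= 4 and fields[2] == "cgroup":
            options = fields[3].split(",")
            if subsystem in options:
                hierarchies.append(fields[1])
    return hierarchies
```

On a live system one would call {{find_hierarchies(open("/proc/mounts").read(), "perf_event")}} and then unmount the reported paths (e.g. {{umount /cgroup/perf_event}}) before re-running the tests, as the error message suggests.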
[jira] [Updated] (MESOS-4038) SlaveRecoveryTests fail on CentOS 6.6
[ https://issues.apache.org/jira/browse/MESOS-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-4038: - Description: All {{SlaveRecoveryTest.\*}} tests, {{MesosContainerizerSlaveRecoveryTest.\*}} tests, and {{UserCgroupIsolatorTest*}} tests fail on CentOS 6.6 with {{TypeParam = mesos::internal::slave::MesosContainerizer}}. They all fail with the same error: {code} [--] 1 test from SlaveRecoveryTest/0, where TypeParam = mesos::internal::slave::MesosContainerizer [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor ../../src/tests/mesos.cpp:722: Failure cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in the file system - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). - ../../src/tests/mesos.cpp:776: Failure cgroups: '/cgroup/perf_event' is not a valid hierarchy [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (8 ms) [--] 1 test from SlaveRecoveryTest/0 (9 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (15 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer {code} was: All {{SlaveRecoveryTest.\*}} tests and {{MesosContainerizerSlaveRecoveryTest.\*}} tests fail on CentOS 6.6 with {{TypeParam = mesos::internal::slave::MesosContainerizer}}. 
They all fail with the same error: {code} [--] 1 test from SlaveRecoveryTest/0, where TypeParam = mesos::internal::slave::MesosContainerizer [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor ../../src/tests/mesos.cpp:722: Failure cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in the file system - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). - ../../src/tests/mesos.cpp:776: Failure cgroups: '/cgroup/perf_event' is not a valid hierarchy [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (8 ms) [--] 1 test from SlaveRecoveryTest/0 (9 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (15 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer {code} > SlaveRecoveryTests fail on CentOS 6.6 > - > > Key: MESOS-4038 > URL: https://issues.apache.org/jira/browse/MESOS-4038 > Project: Mesos > Issue Type: Bug > Environment: CentOS 6.6 >Reporter: Greg Mann > Labels: mesosphere, test-failure > > All {{SlaveRecoveryTest.\*}} tests, > {{MesosContainerizerSlaveRecoveryTest.\*}} tests, and > {{UserCgroupIsolatorTest*}} tests fail on CentOS 6.6 with {{TypeParam = > mesos::internal::slave::MesosContainerizer}}. 
They all fail with the same > error: > {code} > [--] 1 test from SlaveRecoveryTest/0, where TypeParam = > mesos::internal::slave::MesosContainerizer > [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor > ../../src/tests/mesos.cpp:722: Failure > cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in > the file system > - > We cannot run any cgroups tests that require > a hierarchy with subsystem 'perf_event' > because we failed to find an existing hierarchy > or create a new one (tried '/cgroup/perf_event'). > You can either remove all existing > hierarchies, or disable this test case > (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). > - > ../../src/tests/mesos.cpp:776: Failure > cgroups: '/cgroup/perf_event' is not a valid hierarchy > [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = > mesos::internal::slave::MesosContainerizer (8 ms) > [--] 1 test from SlaveRecoveryTest/0 (9 ms total) > [--] Global test environment tear-down > [==] 1 test from 1 test case ran. (15 ms total) > [ PASSED ] 0 tests. > [ FAILED ] 1 test, listed below: > [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = > mesos::internal::slave::MesosContainerizer > {code}
[jira] [Created] (MESOS-4039) PerfEventIsolatorTest.ROOT_CGROUPS_Sample fails
Greg Mann created MESOS-4039: Summary: PerfEventIsolatorTest.ROOT_CGROUPS_Sample fails Key: MESOS-4039 URL: https://issues.apache.org/jira/browse/MESOS-4039 Project: Mesos Issue Type: Bug Reporter: Greg Mann PerfEventIsolatorTest.ROOT_CGROUPS_Sample fails on CentOS 6.6: {code} [--] 1 test from PerfEventIsolatorTest [ RUN ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample ../../src/tests/containerizer/isolator_tests.cpp:848: Failure isolator: Perf is not supported [ FAILED ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample (79 ms) [--] 1 test from PerfEventIsolatorTest (79 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (86 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample {code}
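The "Perf is not supported" check involves inspecting the {{perf}} binary, including parsing the output of {{perf version}}; on CentOS 6 that string embeds the kernel release (e.g. {{perf version 2.6.32-573.el6.x86_64}}), which can trip a naive parser. A hedged sketch of such version parsing (the function and the exact strings are illustrative assumptions, not Mesos's implementation):

```python
import re

def parse_perf_version(output):
    """Extract (major, minor, patch) from `perf version` output, e.g.
    'perf version 2.6.32-573.el6.x86_64' -> (2, 6, 32).

    Returns None when no three-part version is present, which a caller
    could treat as "perf is not supported".
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    return tuple(int(g) for g in match.groups())
```

A caller might compare the parsed triple against a minimum supported version and reject the binary otherwise; the actual support criteria used by Mesos are not shown in this report.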
[jira] [Updated] (MESOS-4038) SlaveRecoveryTests fail on CentOS 6.6
[ https://issues.apache.org/jira/browse/MESOS-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-4038: - Description: All {{SlaveRecoveryTest.*}} tests and {{MesosContainerizerSlaveRecoveryTest.*}} tests fail on CentOS 6.6 with {{TypeParam = mesos::internal::slave::MesosContainerizer}}. They all fail with the same error: {code} [--] 1 test from SlaveRecoveryTest/0, where TypeParam = mesos::internal::slave::MesosContainerizer [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor ../../src/tests/mesos.cpp:722: Failure cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in the file system - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). - ../../src/tests/mesos.cpp:776: Failure cgroups: '/cgroup/perf_event' is not a valid hierarchy [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (8 ms) [--] 1 test from SlaveRecoveryTest/0 (9 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (15 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer {code} was: All {{SlaveRecoveryTest.*}} tests fail on CentOS 6.6 with {{TypeParam = mesos::internal::slave::MesosContainerizer}}. 
They all fail with the same error: {code} [--] 1 test from SlaveRecoveryTest/0, where TypeParam = mesos::internal::slave::MesosContainerizer [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor ../../src/tests/mesos.cpp:722: Failure cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in the file system - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). - ../../src/tests/mesos.cpp:776: Failure cgroups: '/cgroup/perf_event' is not a valid hierarchy [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (8 ms) [--] 1 test from SlaveRecoveryTest/0 (9 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (15 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer {code} > SlaveRecoveryTests fail on CentOS 6.6 > - > > Key: MESOS-4038 > URL: https://issues.apache.org/jira/browse/MESOS-4038 > Project: Mesos > Issue Type: Bug > Environment: CentOS 6.6 >Reporter: Greg Mann > Labels: mesosphere, test-failure > > All {{SlaveRecoveryTest.*}} tests and > {{MesosContainerizerSlaveRecoveryTest.*}} tests fail on CentOS 6.6 with > {{TypeParam = mesos::internal::slave::MesosContainerizer}}. 
They all fail > with the same error: > {code} > [--] 1 test from SlaveRecoveryTest/0, where TypeParam = > mesos::internal::slave::MesosContainerizer > [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor > ../../src/tests/mesos.cpp:722: Failure > cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in > the file system > - > We cannot run any cgroups tests that require > a hierarchy with subsystem 'perf_event' > because we failed to find an existing hierarchy > or create a new one (tried '/cgroup/perf_event'). > You can either remove all existing > hierarchies, or disable this test case > (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). > - > ../../src/tests/mesos.cpp:776: Failure > cgroups: '/cgroup/perf_event' is not a valid hierarchy > [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = > mesos::internal::slave::MesosContainerizer (8 ms) > [--] 1 test from SlaveRecoveryTest/0 (9 ms total) > [--] Global test environment tear-down > [==] 1 test from 1 test case ran. (15 ms total) > [ PASSED ] 0 tests. > [ FAILED ] 1 test, listed below: > [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = > mesos::internal::slave::MesosContainerizer > {code}
[jira] [Updated] (MESOS-4038) SlaveRecoveryTests fail on CentOS 6.6
[ https://issues.apache.org/jira/browse/MESOS-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-4038: - Description: All {{SlaveRecoveryTest.\*}} tests and {{MesosContainerizerSlaveRecoveryTest.\*}} tests fail on CentOS 6.6 with {{TypeParam = mesos::internal::slave::MesosContainerizer}}. They all fail with the same error: {code} [--] 1 test from SlaveRecoveryTest/0, where TypeParam = mesos::internal::slave::MesosContainerizer [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor ../../src/tests/mesos.cpp:722: Failure cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in the file system - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). - ../../src/tests/mesos.cpp:776: Failure cgroups: '/cgroup/perf_event' is not a valid hierarchy [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (8 ms) [--] 1 test from SlaveRecoveryTest/0 (9 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (15 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer {code} was: All {{SlaveRecoveryTest.*}} tests and {{MesosContainerizerSlaveRecoveryTest.*}} tests fail on CentOS 6.6 with {{TypeParam = mesos::internal::slave::MesosContainerizer}}. 
They all fail with the same error: {code} [--] 1 test from SlaveRecoveryTest/0, where TypeParam = mesos::internal::slave::MesosContainerizer [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor ../../src/tests/mesos.cpp:722: Failure cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in the file system - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). - ../../src/tests/mesos.cpp:776: Failure cgroups: '/cgroup/perf_event' is not a valid hierarchy [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (8 ms) [--] 1 test from SlaveRecoveryTest/0 (9 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (15 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer {code} > SlaveRecoveryTests fail on CentOS 6.6 > - > > Key: MESOS-4038 > URL: https://issues.apache.org/jira/browse/MESOS-4038 > Project: Mesos > Issue Type: Bug > Environment: CentOS 6.6 >Reporter: Greg Mann > Labels: mesosphere, test-failure > > All {{SlaveRecoveryTest.\*}} tests and > {{MesosContainerizerSlaveRecoveryTest.\*}} tests fail on CentOS 6.6 with > {{TypeParam = mesos::internal::slave::MesosContainerizer}}. 
They all fail > with the same error: > {code} > [--] 1 test from SlaveRecoveryTest/0, where TypeParam = > mesos::internal::slave::MesosContainerizer > [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor > ../../src/tests/mesos.cpp:722: Failure > cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in > the file system > - > We cannot run any cgroups tests that require > a hierarchy with subsystem 'perf_event' > because we failed to find an existing hierarchy > or create a new one (tried '/cgroup/perf_event'). > You can either remove all existing > hierarchies, or disable this test case > (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). > - > ../../src/tests/mesos.cpp:776: Failure > cgroups: '/cgroup/perf_event' is not a valid hierarchy > [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = > mesos::internal::slave::MesosContainerizer (8 ms) > [--] 1 test from SlaveRecoveryTest/0 (9 ms total) > [--] Global test environment tear-down > [==] 1 test from 1 test case ran. (15 ms total) > [ PASSED ] 0 tests. > [ FAILED ] 1 test, listed below: > [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = > mesos::internal::slave::MesosContainerizer > {code}
[jira] [Commented] (MESOS-3831) Document operator HTTP endpoints
[ https://issues.apache.org/jira/browse/MESOS-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035386#comment-15035386 ] Klaus Ma commented on MESOS-3831: - +1 to have a single page to list all HTTP endpoints. > Document operator HTTP endpoints > > > Key: MESOS-3831 > URL: https://issues.apache.org/jira/browse/MESOS-3831 > Project: Mesos > Issue Type: Documentation > Components: documentation >Reporter: Neil Conway >Priority: Minor > Labels: documentation, mesosphere, newbie > > These are not exhaustively documented; they probably should be. > Some endpoints have docs: e.g., {{/reserve}} and {{/unreserve}} are described > in the reservation doc page. But it would be good to have a single page that > lists all the endpoints and their semantics.
[jira] [Updated] (MESOS-4038) SlaveRecoveryTests fail on CentOS 6.6
[ https://issues.apache.org/jira/browse/MESOS-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-4038: - Description: All {{SlaveRecoveryTest.*}} tests fail on CentOS 6.6 with {{TypeParam = mesos::internal::slave::MesosContainerizer}}. They all fail with the same error: {code} [--] 1 test from SlaveRecoveryTest/0, where TypeParam = mesos::internal::slave::MesosContainerizer [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor ../../src/tests/mesos.cpp:722: Failure cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in the file system - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). - ../../src/tests/mesos.cpp:776: Failure cgroups: '/cgroup/perf_event' is not a valid hierarchy [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (8 ms) [--] 1 test from SlaveRecoveryTest/0 (9 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (15 ms total) [ PASSED ] 0 tests. 
[ FAILED ] 1 test, listed below: [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer {code} was: All {{SlaveRecoveryTest.*}} tests fail on CentOS 6.6 with {{TypeParam = mesos::internal::slave::MesosContainerizer}}: {code} [--] 1 test from SlaveRecoveryTest/0, where TypeParam = mesos::internal::slave::MesosContainerizer [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor ../../src/tests/mesos.cpp:722: Failure cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in the file system - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). - ../../src/tests/mesos.cpp:776: Failure cgroups: '/cgroup/perf_event' is not a valid hierarchy [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (8 ms) [--] 1 test from SlaveRecoveryTest/0 (9 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (15 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer {code} > SlaveRecoveryTests fail on CentOS 6.6 > - > > Key: MESOS-4038 > URL: https://issues.apache.org/jira/browse/MESOS-4038 > Project: Mesos > Issue Type: Bug > Environment: CentOS 6.6 >Reporter: Greg Mann > Labels: mesosphere, test-failure > > All {{SlaveRecoveryTest.*}} tests fail on CentOS 6.6 with {{TypeParam = > mesos::internal::slave::MesosContainerizer}}. 
They all fail with the same > error: > {code} > [--] 1 test from SlaveRecoveryTest/0, where TypeParam = > mesos::internal::slave::MesosContainerizer > [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor > ../../src/tests/mesos.cpp:722: Failure > cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in > the file system > - > We cannot run any cgroups tests that require > a hierarchy with subsystem 'perf_event' > because we failed to find an existing hierarchy > or create a new one (tried '/cgroup/perf_event'). > You can either remove all existing > hierarchies, or disable this test case > (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). > - > ../../src/tests/mesos.cpp:776: Failure > cgroups: '/cgroup/perf_event' is not a valid hierarchy > [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = > mesos::internal::slave::MesosContainerizer (8 ms) > [--] 1 test from SlaveRecoveryTest/0 (9 ms total) > [--] Global test environment tear-down > [==] 1 test from 1 test case ran. (15 ms total) > [ PASSED ] 0 tests. > [ FAILED ] 1 test, listed below: > [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = > mesos::internal::slave::MesosContainerizer > {code}
[jira] [Created] (MESOS-4038) SlaveRecoveryTests fail on CentOS 6.6
Greg Mann created MESOS-4038: Summary: SlaveRecoveryTests fail on CentOS 6.6 Key: MESOS-4038 URL: https://issues.apache.org/jira/browse/MESOS-4038 Project: Mesos Issue Type: Bug Environment: CentOS 6.6 Reporter: Greg Mann All {{SlaveRecoveryTest.*}} tests fail on CentOS 6.6 with {{TypeParam = mesos::internal::slave::MesosContainerizer}}: {code} [--] 1 test from SlaveRecoveryTest/0, where TypeParam = mesos::internal::slave::MesosContainerizer [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor ../../src/tests/mesos.cpp:722: Failure cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in the file system - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). - ../../src/tests/mesos.cpp:776: Failure cgroups: '/cgroup/perf_event' is not a valid hierarchy [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (8 ms) [--] 1 test from SlaveRecoveryTest/0 (9 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (15 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer {code}
[jira] [Created] (MESOS-4037) Images are broken at least on http://mesos.apache.org/documentation/latest/architecture/
Kirill Zaborsky created MESOS-4037: -- Summary: Images are broken at least on http://mesos.apache.org/documentation/latest/architecture/ Key: MESOS-4037 URL: https://issues.apache.org/jira/browse/MESOS-4037 Project: Mesos Issue Type: Documentation Components: documentation Reporter: Kirill Zaborsky Priority: Minor http://mesos.apache.org/documentation/latest/architecture/ does not display its images correctly; e.g. http://mesos.apache.org/documentation/latest/architecture/images/architecture3.jpg returns a 404 error, while e.g. https://github.com/apache/mesos/blob/master/docs/architecture.md renders correctly.
[jira] [Updated] (MESOS-4036) perf will not run on CentOS 6.6
[ https://issues.apache.org/jira/browse/MESOS-4036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-4036: - Environment: CentOS 6.6 Description: After using the current installation instructions in the getting started documentation, {{perf}} will not run on CentOS 6.6 because the version of elfutils included in devtoolset-2 is not compatible with the version of {{perf}} installed by {{yum}}. Installing and using devtoolset-3, however (http://linux.web.cern.ch/linux/scientific6/docs/softwarecollections.shtml) fixes this issue. This could be resolved by updating the getting started documentation to recommend installing devtoolset-3. (was: After using the current installation instructions in the getting started documentation, {{perf}} will not run because the version of elfutils included in devtoolset-2 is not compatible with the version of {{perf}} installed by {{yum}}. Installing and using devtoolset-3, however (http://linux.web.cern.ch/linux/scientific6/docs/softwarecollections.shtml) fixes this issue. This could be resolved by updating the getting started documentation to recommend installing devtoolset-3.) > perf will not run on CentOS 6.6 > --- > > Key: MESOS-4036 > URL: https://issues.apache.org/jira/browse/MESOS-4036 > Project: Mesos > Issue Type: Bug > Environment: CentOS 6.6 >Reporter: Greg Mann > Labels: mesosphere > > After using the current installation instructions in the getting started > documentation, {{perf}} will not run on CentOS 6.6 because the version of > elfutils included in devtoolset-2 is not compatible with the version of > {{perf}} installed by {{yum}}. Installing and using devtoolset-3, however > (http://linux.web.cern.ch/linux/scientific6/docs/softwarecollections.shtml) > fixes this issue. This could be resolved by updating the getting started > documentation to recommend installing devtoolset-3.
[jira] [Created] (MESOS-4036) perf will not run on CentOS 6.6
Greg Mann created MESOS-4036: Summary: perf will not run on CentOS 6.6 Key: MESOS-4036 URL: https://issues.apache.org/jira/browse/MESOS-4036 Project: Mesos Issue Type: Bug Reporter: Greg Mann After using the current installation instructions in the getting started documentation, {{perf}} will not run because the version of elfutils included in devtoolset-2 is not compatible with the version of {{perf}} installed by {{yum}}. Installing and using devtoolset-3, however (http://linux.web.cern.ch/linux/scientific6/docs/softwarecollections.shtml) fixes this issue. This could be resolved by updating the getting started documentation to recommend installing devtoolset-3.
[jira] [Commented] (MESOS-4034) URLs with doubled slashes return 404
[ https://issues.apache.org/jira/browse/MESOS-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035106#comment-15035106 ] Klaus Ma commented on MESOS-4034: - Yes, you're right, please ignore my append :). > URLs with doubled slashes return 404 > > > Key: MESOS-4034 > URL: https://issues.apache.org/jira/browse/MESOS-4034 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 >Reporter: James Peach >Priority: Minor > > The Mesos 0.25 HTTP request router no longer coalesces doubled slashes in the > URL path. Previous versions did; we noticed when we upgraded a > cluster and our metrics poller started getting 404s. > {code} > $ curl -v http://localhost:5050//metrics/snapshot > * About to connect() to localhost port 5050 (#0) > * Trying 17.138.64.22... connected > * Connected to localhost (127.0.0.1) port 5050 (#0) > > GET //metrics/snapshot HTTP/1.1 > > User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 > > NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2 > > Host: localhost:5050 > > Accept: */* > > > < HTTP/1.1 404 Not Found > < Date: Wed, 02 Dec 2015 00:50:57 GMT > < Content-Length: 0 > < > * Connection #0 to host localhost left intact > * Closing connection #0 > {code}
[jira] [Created] (MESOS-4035) UserCgroupIsolatorTest.ROOT_CGROUPS_UserCgroup fails on CentOS 6.6
Gilbert Song created MESOS-4035: --- Summary: UserCgroupIsolatorTest.ROOT_CGROUPS_UserCgroup fails on CentOS 6.6 Key: MESOS-4035 URL: https://issues.apache.org/jira/browse/MESOS-4035 Project: Mesos Issue Type: Bug Environment: CentOS 6.6 Reporter: Gilbert Song `ROOT_CGROUPS_UserCgroup` fails on CentOS 6.6 with 0.26-rc3. The environment setup on CentOS 6.6 is based on the latest update of /docs/getting-started.md. Using either devtoolset-2 or devtoolset-3 produces the same failure. Running `sudo ./bin/mesos-tests.sh --gtest_filter="*ROOT_CGROUPS_UserCgroup*"` returns failures like the following log: {noformat} [==] Running 3 tests from 3 test cases. [--] Global test environment set-up. [--] 1 test from UserCgroupIsolatorTest/0, where TypeParam = mesos::internal::slave::CgroupsMemIsolatorProcess userdel: user 'mesos.test.unprivileged.user' does not exist [ RUN ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup ../../src/tests/mesos.cpp:722: Failure cgroups::mount(hierarchy, subsystem): '/tmp/mesos_test_cgroup/perf_event' already exists in the file system - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/tmp/mesos_test_cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-UserCgroupIsolatorTest/0.*). 
- ../../src/tests/mesos.cpp:776: Failure cgroups: '/tmp/mesos_test_cgroup/perf_event' is not a valid hierarchy [ FAILED ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsMemIsolatorProcess (1 ms) [--] 1 test from UserCgroupIsolatorTest/0 (1 ms total) [--] 1 test from UserCgroupIsolatorTest/1, where TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess userdel: user 'mesos.test.unprivileged.user' does not exist [ RUN ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup ../../src/tests/mesos.cpp:722: Failure cgroups::mount(hierarchy, subsystem): '/tmp/mesos_test_cgroup/perf_event' already exists in the file system - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/tmp/mesos_test_cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-UserCgroupIsolatorTest/1.*). - ../../src/tests/mesos.cpp:776: Failure cgroups: '/tmp/mesos_test_cgroup/perf_event' is not a valid hierarchy [ FAILED ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess (4 ms) [--] 1 test from UserCgroupIsolatorTest/1 (5 ms total) [--] 1 test from UserCgroupIsolatorTest/2, where TypeParam = mesos::internal::slave::CgroupsPerfEventIsolatorProcess userdel: user 'mesos.test.unprivileged.user' does not exist [ RUN ] UserCgroupIsolatorTest/2.ROOT_CGROUPS_UserCgroup ../../src/tests/mesos.cpp:722: Failure cgroups::mount(hierarchy, subsystem): '/tmp/mesos_test_cgroup/perf_event' already exists in the file system - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/tmp/mesos_test_cgroup/perf_event'). 
You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-UserCgroupIsolatorTest/2.*). - ../../src/tests/mesos.cpp:776: Failure cgroups: '/tmp/mesos_test_cgroup/perf_event' is not a valid hierarchy [ FAILED ] UserCgroupIsolatorTest/2.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsPerfEventIsolatorProcess (2 ms) [--] 1 test from UserCgroupIsolatorTest/2 (2 ms total) [--] Global test environment tear-down [==] 3 tests from 3 test cases ran. (349 ms total) [ PASSED ] 0 tests. [ FAILED ] 3 tests, listed below: [ FAILED ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsMemIsolatorProcess [ FAILED ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess [ FAILED ] UserCgroupIsolatorTest/2.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsPerfEventIsolatorProcess 3 FAILED TESTS {noformat} If running it with `sudo ./bin/mesos-tests.sh --gtest_filter="*ROOT_CGROUPS_UserCgroup*" --g
[jira] [Commented] (MESOS-4034) URLs with doubled slashes return 404
[ https://issues.apache.org/jira/browse/MESOS-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035059#comment-15035059 ] James Peach commented on MESOS-4034: Do you mean {{strings::tokenize}}? That's supposed to ignore empty tokens, so it should consider {{//foo/bar}} to be equivalent to {{/foo/bar}}. > URLs with doubled slashes return 404 > > > Key: MESOS-4034 > URL: https://issues.apache.org/jira/browse/MESOS-4034 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 >Reporter: James Peach >Priority: Minor > > The Mesos 0.25 HTTP request router no longer coalesces doubled slashes in the > URL path. Previous versions did so; we noticed when we upgraded a > cluster and our metrics poller started getting 404s. > {code} > $ curl -v http://localhost:5050//metrics/snapshot > * About to connect() to localhost port 5050 (#0) > * Trying 17.138.64.22... connected > * Connected to localhost (127.0.0.1) port 5050 (#0) > > GET //metrics/snapshot HTTP/1.1 > > User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 > > NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2 > > Host: localhost:5050 > > Accept: */* > > > < HTTP/1.1 404 Not Found > < Date: Wed, 02 Dec 2015 00:50:57 GMT > < Content-Length: 0 > < > * Connection #0 to host localhost left intact > * Closing connection #0 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4034) URLs with doubled slashes return 404
[ https://issues.apache.org/jira/browse/MESOS-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035026#comment-15035026 ] Klaus Ma commented on MESOS-4034: - That's because we use {{tokenize("/")}} to get the path info; so if there are two {{/}}s in a row, the index is wrong :). Should we build a dedicated function for URLs, e.g. URL sub-path parsing? > URLs with doubled slashes return 404 > > > Key: MESOS-4034 > URL: https://issues.apache.org/jira/browse/MESOS-4034 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 >Reporter: James Peach >Priority: Minor > > The Mesos 0.25 HTTP request router no longer coalesces doubled slashes in the > URL path. Previous versions did so; we noticed when we upgraded a > cluster and our metrics poller started getting 404s. > {code} > $ curl -v http://localhost:5050//metrics/snapshot > * About to connect() to localhost port 5050 (#0) > * Trying 17.138.64.22... connected > * Connected to localhost (127.0.0.1) port 5050 (#0) > > GET //metrics/snapshot HTTP/1.1 > > User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 > > NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2 > > Host: localhost:5050 > > Accept: */* > > > < HTTP/1.1 404 Not Found > < Date: Wed, 02 Dec 2015 00:50:57 GMT > < Content-Length: 0 > < > * Connection #0 to host localhost left intact > * Closing connection #0 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3586) MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and CGROUPS_ROOT_SlaveRecovery are flaky
[ https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035017#comment-15035017 ] Joseph Wu commented on MESOS-3586: -- Review: https://reviews.apache.org/r/40849/ > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and > CGROUPS_ROOT_SlaveRecovery are flaky > > > Key: MESOS-3586 > URL: https://issues.apache.org/jira/browse/MESOS-3586 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.24.0, 0.26.0 > Environment: Ubuntu 14.04, 3.13.0-32 generic > Debian 8, gcc 4.9.2 >Reporter: Miguel Bernadin >Assignee: Joseph Wu > Labels: flaky, flaky-test > > I am installing Mesos 0.24.0 on 4 servers which have very similar hardware and > software configurations. > After performing ../configure, make, and make check some servers have > completed successfully and others failed on test [ RUN ] > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics. > Is there something I should check in this test? > PERFORMED MAKE CHECK NODE-001 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0 > I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave > 20151005-143735-2393768202-35106-27900-S0 > Registered executor on svdidac038.techlabs.accenture.com > Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0 > Forked command at 38510 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > PERFORMED MAKE CHECK NODE-002 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0 > I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave > 20151005-143857-2360213770-50427-26325-S0 > Registered executor on svdidac039.techlabs.accenture.com > Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > Forked command at 37028 > 
../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure > Expected: (usage.get().mem_medium_pressure_counter()) >= > (usage.get().mem_critical_pressure_counter()), actual: 5 vs 6 > 2015-10-05 > 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server > refused to accept the client > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4034) URLs with doubled slashes return 404
James Peach created MESOS-4034: -- Summary: URLs with doubled slashes return 404 Key: MESOS-4034 URL: https://issues.apache.org/jira/browse/MESOS-4034 Project: Mesos Issue Type: Bug Affects Versions: 0.25.0 Reporter: James Peach Priority: Minor The Mesos 0.25 HTTP request router no longer coalesces doubled slashes in the URL path. Previous versions did so; we noticed when we upgraded a cluster and our metrics poller started getting 404s. {code} $ curl -v http://localhost:5050//metrics/snapshot * About to connect() to localhost port 5050 (#0) * Trying 17.138.64.22... connected * Connected to localhost (127.0.0.1) port 5050 (#0) > GET //metrics/snapshot HTTP/1.1 > User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 > zlib/1.2.3 libidn/1.18 libssh2/1.4.2 > Host: localhost:5050 > Accept: */* > < HTTP/1.1 404 Not Found < Date: Wed, 02 Dec 2015 00:50:57 GMT < Content-Length: 0 < * Connection #0 to host localhost left intact * Closing connection #0 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-3586) MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and CGROUPS_ROOT_SlaveRecovery are flaky
[ https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu reassigned MESOS-3586: Assignee: Joseph Wu > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and > CGROUPS_ROOT_SlaveRecovery are flaky > > > Key: MESOS-3586 > URL: https://issues.apache.org/jira/browse/MESOS-3586 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.24.0, 0.26.0 > Environment: Ubuntu 14.04, 3.13.0-32 generic > Debian 8, gcc 4.9.2 >Reporter: Miguel Bernadin >Assignee: Joseph Wu > Labels: flaky, flaky-test > > I am install Mesos 0.24.0 on 4 servers which have very similar hardware and > software configurations. > After performing ../configure, make, and make check some servers have > completed successfully and other failed on test [ RUN ] > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics. > Is there something I should check in this test? > PERFORMED MAKE CHECK NODE-001 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0 > I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave > 20151005-143735-2393768202-35106-27900-S0 > Registered executor on svdidac038.techlabs.accenture.com > Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0 > Forked command at 38510 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > PERFORMED MAKE CHECK NODE-002 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0 > I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave > 20151005-143857-2360213770-50427-26325-S0 > Registered executor on svdidac039.techlabs.accenture.com > Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > Forked command at 37028 > ../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure > Expected: (usage.get().mem_medium_pressure_counter()) >= 
> (usage.get().mem_critical_pressure_counter()), actual: 5 vs 6 > 2015-10-05 > 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server > refused to accept the client > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4029) ContentType/SchedulerTest is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034974#comment-15034974 ] Anand Mazumdar commented on MESOS-4029: --- The culprit is this: https://github.com/apache/mesos/blob/master/src/scheduler/scheduler.cpp#L260 We pass the {{Callbacks}} mock object by reference and not by value. Since we do an {{async}} , the call is queued on another thread but it does not ensure that it is invoked before the object is destroyed. Hence, we might invoke the {{received}} callback even after the original {{Callbacks}} object is destroyed. > ContentType/SchedulerTest is flaky. > --- > > Key: MESOS-4029 > URL: https://issues.apache.org/jira/browse/MESOS-4029 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Till Toenshoff >Assignee: Anand Mazumdar > Labels: flaky, flaky-test, mesosphere > > SSL build, [Ubuntu > 14.04|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh], > non-root test run. 
> {noformat} > [--] 22 tests from ContentType/SchedulerTest > [ RUN ] ContentType/SchedulerTest.Subscribe/0 > [ OK ] ContentType/SchedulerTest.Subscribe/0 (48 ms) > *** Aborted at 1448928007 (unix time) try "date -d @1448928007" if you are > using GNU date *** > [ RUN ] ContentType/SchedulerTest.Subscribe/1 > PC: @ 0x1451b8e > testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() > *** SIGSEGV (@0x10030) received by PID 21320 (TID 0x2b549e5d4700) from > PID 48; stack trace: *** > @ 0x2b54c95940b7 os::Linux::chained_handler() > @ 0x2b54c9598219 JVM_handle_linux_signal > @ 0x2b5496300340 (unknown) > @ 0x1451b8e > testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() > @ 0xe2ea6d > _ZN7testing8internal18FunctionMockerBaseIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS6_SaIS6_E10InvokeWithERKSt5tupleIJSC_EE > @ 0xe2b1bc testing::internal::FunctionMocker<>::Invoke() > @ 0x1118aed > mesos::internal::tests::SchedulerTest::Callbacks::received() > @ 0x111c453 > _ZNKSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS0_2v19scheduler5EventESt5dequeIS8_SaIS8_EclIJSE_EvEEvRS4_DpOT_ > @ 0x111c001 > _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi16__callIvJSF_EJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE > @ 0x111b90d > _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi1clIJSF_EvEET0_DpOT_ > @ 0x111ae09 std::_Function_handler<>::_M_invoke() > @ 0x2b5493c6da09 std::function<>::operator()() > @ 0x2b5493c688ee process::AsyncExecutorProcess::execute<>() > @ 0x2b5493c6db2a > 
_ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS8_SaIS8_ESC_PvSG_SC_SJ_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSO_FSL_T1_T2_T3_ET4_T5_T6_ENKUlPNS_11ProcessBaseEE_clES11_ > @ 0x2b5493c765a4 > _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingNS0_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISC_SaISC_ESG_PvSK_SG_SN_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSS_FSP_T1_T2_T3_ET4_T5_T6_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ > @ 0x2b54946b1201 std::function<>::operator()() > @ 0x2b549469960f process::ProcessBase::visit() > @ 0x2b549469d480 process::DispatchEvent::visit() > @ 0x9dc0ba process::ProcessBase::serve() > @ 0x2b54946958cc process::ProcessManager::resume() > @ 0x2b5494692a9c > _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_ > @ 0x2b549469ccac > _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE > @ 0x2b549469cc5c > _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_ > @ 0x2b549469cbee > _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE > @ 0x2b549469cb45 > _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv > @ 0x2b549469cade > _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager1
[jira] [Commented] (MESOS-3586) MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and CGROUPS_ROOT_SlaveRecovery are flaky
[ https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034948#comment-15034948 ] Joseph Wu commented on MESOS-3586: -- This race _almost_ seems unavoidable (at least, given the test currently), and I don't think the sleep duration is really a problem. *Background* Both tests are essentially hammering away at memory, resulting in "memory pressure". Depending on the load (low, medium, critical), this triggers some cgroup status events. By definition, the "low" pressure event is always triggered whenever there is any pressure at all: {quote} Application will be notified through eventfd when memory pressure is at the specific level (or higher). {quote} [Reference section "11. Memory Pressure"|https://www.kernel.org/doc/Documentation/cgroups/memory.txt] In the tests, we check this by expecting "number of low pressure events" >= "number of medium pressure events" >= "number of critical pressure events". *Problem* There's no guarantee of the order of notification. When we read from our memory pressure counters, there might be some events in-flight that haven't been processed yet. Therefore, we occasionally see our expectations betrayed. *???* The memory pressure event counts should be eventually consistent with our expectations. So the test should probably: * Stop the memory-hammering task at some point. * Wait for all pressure events to be processed. * Then check the counters. > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and > CGROUPS_ROOT_SlaveRecovery are flaky > > > Key: MESOS-3586 > URL: https://issues.apache.org/jira/browse/MESOS-3586 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.24.0, 0.26.0 > Environment: Ubuntu 14.04, 3.13.0-32 generic > Debian 8, gcc 4.9.2 >Reporter: Miguel Bernadin > Labels: flaky, flaky-test > > I am install Mesos 0.24.0 on 4 servers which have very similar hardware and > software configurations. 
> After performing ../configure, make, and make check some servers have > completed successfully and other failed on test [ RUN ] > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics. > Is there something I should check in this test? > PERFORMED MAKE CHECK NODE-001 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0 > I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave > 20151005-143735-2393768202-35106-27900-S0 > Registered executor on svdidac038.techlabs.accenture.com > Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0 > Forked command at 38510 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > PERFORMED MAKE CHECK NODE-002 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0 > I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave > 20151005-143857-2360213770-50427-26325-S0 > Registered executor on svdidac039.techlabs.accenture.com > Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > Forked command at 37028 > ../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure > Expected: (usage.get().mem_medium_pressure_counter()) >= > (usage.get().mem_critical_pressure_counter()), actual: 5 vs 6 > 2015-10-05 > 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server > refused to accept the client > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3586) MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and CGROUPS_ROOT_SlaveRecovery are flaky
[ https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3586: -- Labels: flaky flaky-test (was: ) > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and > CGROUPS_ROOT_SlaveRecovery are flaky > > > Key: MESOS-3586 > URL: https://issues.apache.org/jira/browse/MESOS-3586 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.24.0, 0.26.0 > Environment: Ubuntu 14.04, 3.13.0-32 generic > Debian 8, gcc 4.9.2 >Reporter: Miguel Bernadin > Labels: flaky, flaky-test > > I am install Mesos 0.24.0 on 4 servers which have very similar hardware and > software configurations. > After performing ../configure, make, and make check some servers have > completed successfully and other failed on test [ RUN ] > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics. > Is there something I should check in this test? > PERFORMED MAKE CHECK NODE-001 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0 > I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave > 20151005-143735-2393768202-35106-27900-S0 > Registered executor on svdidac038.techlabs.accenture.com > Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0 > Forked command at 38510 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > PERFORMED MAKE CHECK NODE-002 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0 > I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave > 20151005-143857-2360213770-50427-26325-S0 > Registered executor on svdidac039.techlabs.accenture.com > Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > Forked command at 37028 > ../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure > Expected: (usage.get().mem_medium_pressure_counter()) >= > 
(usage.get().mem_critical_pressure_counter()), actual: 5 vs 6 > 2015-10-05 > 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server > refused to accept the client > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3586) MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and CGROUPS_ROOT_SlaveRecovery are flaky
[ https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3586: -- Component/s: test > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and > CGROUPS_ROOT_SlaveRecovery are flaky > > > Key: MESOS-3586 > URL: https://issues.apache.org/jira/browse/MESOS-3586 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.24.0, 0.26.0 > Environment: Ubuntu 14.04, 3.13.0-32 generic > Debian 8, gcc 4.9.2 >Reporter: Miguel Bernadin > Labels: flaky, flaky-test > > I am install Mesos 0.24.0 on 4 servers which have very similar hardware and > software configurations. > After performing ../configure, make, and make check some servers have > completed successfully and other failed on test [ RUN ] > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics. > Is there something I should check in this test? > PERFORMED MAKE CHECK NODE-001 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0 > I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave > 20151005-143735-2393768202-35106-27900-S0 > Registered executor on svdidac038.techlabs.accenture.com > Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0 > Forked command at 38510 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > PERFORMED MAKE CHECK NODE-002 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0 > I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave > 20151005-143857-2360213770-50427-26325-S0 > Registered executor on svdidac039.techlabs.accenture.com > Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > Forked command at 37028 > ../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure > Expected: (usage.get().mem_medium_pressure_counter()) >= > 
(usage.get().mem_critical_pressure_counter()), actual: 5 vs 6 > 2015-10-05 > 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server > refused to accept the client > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3586) MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and CGROUPS_ROOT_SlaveRecovery are flaky
[ https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu updated MESOS-3586: - Affects Version/s: 0.26.0 Environment: Ubuntu 14.04, 3.13.0-32 generic Debian 8, gcc 4.9.2 was:Ubuntu 14.04, 3.13.0-32 generic Summary: MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and CGROUPS_ROOT_SlaveRecovery are flaky (was: Installing Mesos 0.24.0 on multiple systems. Failed test on MemoryPressureMesosTest.CGROUPS_ROOT_Statistics) The {{CGROUPS_ROOT_Statistics}} and {{CGROUPS_ROOT_SlaveRecovery}} are both similarly flaky. The tests also fail on Debian 8 with the same error. > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and > CGROUPS_ROOT_SlaveRecovery are flaky > > > Key: MESOS-3586 > URL: https://issues.apache.org/jira/browse/MESOS-3586 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.24.0, 0.26.0 > Environment: Ubuntu 14.04, 3.13.0-32 generic > Debian 8, gcc 4.9.2 >Reporter: Miguel Bernadin > > I am install Mesos 0.24.0 on 4 servers which have very similar hardware and > software configurations. > After performing ../configure, make, and make check some servers have > completed successfully and other failed on test [ RUN ] > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics. > Is there something I should check in this test? 
> PERFORMED MAKE CHECK NODE-001 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0 > I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave > 20151005-143735-2393768202-35106-27900-S0 > Registered executor on svdidac038.techlabs.accenture.com > Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0 > Forked command at 38510 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > PERFORMED MAKE CHECK NODE-002 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0 > I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave > 20151005-143857-2360213770-50427-26325-S0 > Registered executor on svdidac039.techlabs.accenture.com > Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > Forked command at 37028 > ../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure > Expected: (usage.get().mem_medium_pressure_counter()) >= > (usage.get().mem_critical_pressure_counter()), actual: 5 vs 6 > 2015-10-05 > 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server > refused to accept the client > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-4025: -- Labels: flaky flaky-test test (was: test) > SlaveRecoveryTest/0.GCExecutor is flaky. > > > Key: MESOS-4025 > URL: https://issues.apache.org/jira/browse/MESOS-4025 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.26.0 >Reporter: Till Toenshoff > Labels: flaky, flaky-test, test > > Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based > on 0.26.0-rc1. > Testsuite was run as root. > {noformat} > sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1 > {noformat} > {noformat} > [ RUN ] SlaveRecoveryTest/0.GCExecutor > I1130 16:49:16.336833 1032 exec.cpp:136] Version: 0.26.0 > I1130 16:49:16.345212 1049 exec.cpp:210] Executor registered on slave > dde9fd4e-b016-4a99-9081-b047e9df9afa-S0 > Registered executor on ubuntu14 > Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114 > sh -c 'sleep 1000' > Forked command at 1057 > ../../src/tests/mesos.cpp:779: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave': > Device or resource busy > *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are > using GNU date *** > PC: @ 0x1443e9a testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; > stack trace: *** > @ 0x7f1be92b80b7 os::Linux::chained_handler() > @ 0x7f1be92bc219 JVM_handle_linux_signal > @ 0x7f1bf7bbc340 (unknown) > @ 0x1443e9a testing::UnitTest::AddTestPartResult() > @ 0x1438b99 testing::internal::AssertHelper::operator=() > @ 0xf0b3bb > mesos::internal::tests::ContainerizerTest<>::TearDown() > @ 0x1461882 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145c6f8 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x143de4a 
testing::Test::Run() > @ 0x143e584 testing::TestInfo::Run() > @ 0x143ebca testing::TestCase::Run() > @ 0x1445312 testing::internal::UnitTestImpl::RunAllTests() > @ 0x14624a7 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145d26e > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x14440ae testing::UnitTest::Run() > @ 0xd15cd4 RUN_ALL_TESTS() > @ 0xd158c1 main > @ 0x7f1bf7808ec5 (unknown) > @ 0x913009 (unknown) > {noformat} > My Vagrantfile generator; > {noformat} > #!/usr/bin/env bash > cat << EOF > Vagrantfile > # -*- mode: ruby -*-" > > # vi: set ft=ruby : > Vagrant.configure(2) do |config| > # Disable shared folder to prevent certain kernel module dependencies. > config.vm.synced_folder ".", "/vagrant", disabled: true > config.vm.box = "bento/ubuntu-14.04" > config.vm.hostname = "${PLATFORM_NAME}" > config.vm.provider "virtualbox" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > vb.customize ["modifyvm", :id, "--nictype1", "virtio"] > vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"] > vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"] > end > config.vm.provider "vmware_fusion" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > end > config.vm.provision "file", source: "../test.sh", destination: "~/test.sh" > config.vm.provision "shell", inline: <<-SHELL > sudo apt-get update > sudo apt-get -y install openjdk-7-jdk autoconf libtool > sudo apt-get -y install build-essential python-dev python-boto \ > libcurl4-nss-dev libsasl2-dev maven \ > libapr1-dev libsvn-dev libssl-dev libevent-dev > sudo apt-get -y install git > sudo wget -qO- https://get.docker.com/ | sh > SHELL > end > EOF > {noformat} > The problem is kicking in frequently in my tests - I'd say > 10% but less > than 50%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-4025: -- Component/s: test > SlaveRecoveryTest/0.GCExecutor is flaky. > > > Key: MESOS-4025 > URL: https://issues.apache.org/jira/browse/MESOS-4025 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.26.0 >Reporter: Till Toenshoff > Labels: flaky, flaky-test, test > > Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based > on 0.26.0-rc1. > Testsuite was run as root. > {noformat} > sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1 > {noformat} > {noformat} > [ RUN ] SlaveRecoveryTest/0.GCExecutor > I1130 16:49:16.336833 1032 exec.cpp:136] Version: 0.26.0 > I1130 16:49:16.345212 1049 exec.cpp:210] Executor registered on slave > dde9fd4e-b016-4a99-9081-b047e9df9afa-S0 > Registered executor on ubuntu14 > Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114 > sh -c 'sleep 1000' > Forked command at 1057 > ../../src/tests/mesos.cpp:779: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave': > Device or resource busy > *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are > using GNU date *** > PC: @ 0x1443e9a testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; > stack trace: *** > @ 0x7f1be92b80b7 os::Linux::chained_handler() > @ 0x7f1be92bc219 JVM_handle_linux_signal > @ 0x7f1bf7bbc340 (unknown) > @ 0x1443e9a testing::UnitTest::AddTestPartResult() > @ 0x1438b99 testing::internal::AssertHelper::operator=() > @ 0xf0b3bb > mesos::internal::tests::ContainerizerTest<>::TearDown() > @ 0x1461882 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145c6f8 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x143de4a testing::Test::Run() > @ 0x143e584 
testing::TestInfo::Run() > @ 0x143ebca testing::TestCase::Run() > @ 0x1445312 testing::internal::UnitTestImpl::RunAllTests() > @ 0x14624a7 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145d26e > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x14440ae testing::UnitTest::Run() > @ 0xd15cd4 RUN_ALL_TESTS() > @ 0xd158c1 main > @ 0x7f1bf7808ec5 (unknown) > @ 0x913009 (unknown) > {noformat} > My Vagrantfile generator; > {noformat} > #!/usr/bin/env bash > cat << EOF > Vagrantfile > # -*- mode: ruby -*-" > > # vi: set ft=ruby : > Vagrant.configure(2) do |config| > # Disable shared folder to prevent certain kernel module dependencies. > config.vm.synced_folder ".", "/vagrant", disabled: true > config.vm.box = "bento/ubuntu-14.04" > config.vm.hostname = "${PLATFORM_NAME}" > config.vm.provider "virtualbox" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > vb.customize ["modifyvm", :id, "--nictype1", "virtio"] > vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"] > vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"] > end > config.vm.provider "vmware_fusion" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > end > config.vm.provision "file", source: "../test.sh", destination: "~/test.sh" > config.vm.provision "shell", inline: <<-SHELL > sudo apt-get update > sudo apt-get -y install openjdk-7-jdk autoconf libtool > sudo apt-get -y install build-essential python-dev python-boto \ > libcurl4-nss-dev libsasl2-dev maven \ > libapr1-dev libsvn-dev libssl-dev libevent-dev > sudo apt-get -y install git > sudo wget -qO- https://get.docker.com/ | sh > SHELL > end > EOF > {noformat} > The problem is kicking in frequently in my tests - I'd say > 10% but less > than 50%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034701#comment-15034701 ] Jojy Varghese commented on MESOS-4025: -- On debian8: {code} [ RUN ] SlaveRecoveryTest/0.Reboot I1201 21:57:11.562711 7964 exec.cpp:136] Version: 0.26.0 I1201 21:57:11.571506 7978 exec.cpp:210] Executor registered on slave 00a179f0-f087-4054-a0c7-c15281d5e7ff-S0 Registered executor on debian8 Starting task 791255fc-88dd-452e-ba12-6b2dfced99a0 Forked command at 7987 sh -c 'sleep 1000' I1201 21:57:11.640627 7982 exec.cpp:383] Executor asked to shutdown Shutting down Sending SIGTERM to process tree at pid 7987 Killing the following process trees: [ -+- 7987 sh -c sleep 1000 \--- 7988 sleep 1000 ] Command terminated with signal Terminated (pid: 7987) [ OK ] SlaveRecoveryTest/0.Reboot (1730 ms) [ RUN ] SlaveRecoveryTest/0.GCExecutor 2015-12-01 21:57:13,187:1473(0x7f9bf4e36700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:44262] zk retcode=-4, errno=111(Connection refused): server refused to accept the client I1201 21:57:13.296581 8012 exec.cpp:136] Version: 0.26.0 I1201 21:57:13.305498 8028 exec.cpp:210] Executor registered on slave 44a46bd2-d24a-48d6-bd62-492c15845841-S0 Registered executor on debian8 Starting task 8affc624-c95d-43f5-a2b9-967663c3151b sh -c 'sleep 1000' Forked command at 8035 ../../src/tests/mesos.cpp:781: Failure (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup '/sys/fs/cgroup/memory/mesos_test_a894bd47-5e1a-4442-bc6b-303d2aed6945/slave': Device or resource busy *** Aborted at 1449007033 (unix time) try "date -d @1449007033" if you are using GNU date *** PC: @ 0x14b079e testing::UnitTest::AddTestPartResult() *** SIGSEGV (@0x0) received by PID 1473 (TID 0x7f9c3db5d7c0) from PID 0; stack trace: *** @ 0x7f9c28c2166c os::Linux::chained_handler() @ 0x7f9c28c25a0a JVM_handle_linux_signal @ 0x7f9c374728d0 (unknown) @ 0x14b079e 
testing::UnitTest::AddTestPartResult() @ 0x14a51d7 testing::internal::AssertHelper::operator=() @ 0xf564c1 mesos::internal::tests::ContainerizerTest<>::TearDown() @ 0x14ce2c0 testing::internal::HandleSehExceptionsInMethodIfSupported<>() @ 0x14c9238 testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0x14aa5c0 testing::Test::Run() @ 0x14aad05 testing::TestInfo::Run() @ 0x14ab340 testing::TestCase::Run() @ 0x14b1c8f testing::internal::UnitTestImpl::RunAllTests() @ 0x14cef4f testing::internal::HandleSehExceptionsInMethodIfSupported<>() @ 0x14c9d8e testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0x14b09bf testing::UnitTest::Run() @ 0xd63df2 RUN_ALL_TESTS() @ 0xd639d0 main @ 0x7f9c370dbb45 (unknown) @ 0x9588e9 (unknown) {code} * The crash was inside *ContainerizerTest::TearDown*. * The assertion *AWAIT_READY(cgroups::destroy(hierarchy, cgroup));* failed. The cgroup in question was */sys/fs/cgroup/memory/mesos_test_a894bd47-5e1a-4442-bc6b-303d2aed6945/slave* as seen from the log above. > SlaveRecoveryTest/0.GCExecutor is flaky. > > > Key: MESOS-4025 > URL: https://issues.apache.org/jira/browse/MESOS-4025 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Till Toenshoff > Labels: test > > Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based > on 0.26.0-rc1. > Testsuite was run as root. 
> {noformat} > sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1 > {noformat} > {noformat} > [ RUN ] SlaveRecoveryTest/0.GCExecutor > I1130 16:49:16.336833 1032 exec.cpp:136] Version: 0.26.0 > I1130 16:49:16.345212 1049 exec.cpp:210] Executor registered on slave > dde9fd4e-b016-4a99-9081-b047e9df9afa-S0 > Registered executor on ubuntu14 > Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114 > sh -c 'sleep 1000' > Forked command at 1057 > ../../src/tests/mesos.cpp:779: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave': > Device or resource busy > *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are > using GNU date *** > PC: @ 0x1443e9a testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; > stack trace: *** > @ 0x7f1be92b80b7 os::Linux::chained_handler() > @ 0x7f1be92bc219 JVM_handle_linux_signal > @ 0x7f1bf7bbc340 (unknown) > @ 0x1443e9a testing::UnitTest::AddTestPartResult() > @ 0x1438b99 testing::internal::A
[jira] [Commented] (MESOS-2918) CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen Flaky
[ https://issues.apache.org/jira/browse/MESOS-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034669#comment-15034669 ] Vinod Kone commented on MESOS-2918: --- Keeping the ticket open to address the TODO (dynamically disable the test if swap is enabled). commit c3dd3edb6f09de4333645cb87ba25c9d1c8969c3 Author: Chi Zhang Date: Tue Dec 1 13:56:49 2015 -0800 Checked if swap is enabled before running memory pressure related tests. Review: https://reviews.apache.org/r/38234 commit 5a21baa762d726ed22aaa4e14ba4f956d6132a5a Author: Chi Zhang Date: Tue Dec 1 13:56:16 2015 -0800 Added swap information to os::memory(). Review: https://reviews.apache.org/r/38233 > CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen Flaky > -- > > Key: MESOS-2918 > URL: https://issues.apache.org/jira/browse/MESOS-2918 > Project: Mesos > Issue Type: Bug > Components: isolation, test >Affects Versions: 0.23.0 >Reporter: Paul Brett >Assignee: Chi Zhang > Labels: test, twitter > > This test fails when swap is enabled on the platform because it creates a > memory hog with the expectation that the OOM killer will kill the hog but > with swap enabled, the hog is just swapped out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
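The committed fix gates memory-pressure tests on whether swap is configured, since with swap enabled the memory hog is swapped out instead of OOM-killed. The actual change is C++ (os::memory() in stout); a minimal Python sketch of the same check, with a hypothetical swap_enabled helper reading /proc/meminfo-style input, might look like:

```python
def swap_enabled(meminfo_text):
    # Parse /proc/meminfo-style text; SwapTotal > 0 means swap is
    # configured, so an OOM-based test may never trigger the OOM killer.
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            kb = int(line.split()[1])
            return kb > 0
    return False

# A test harness could then skip OOM tests dynamically:
# if swap_enabled(open("/proc/meminfo").read()):
#     skip("swap enabled; memory hog would be swapped out, not killed")
```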
[jira] [Updated] (MESOS-1763) Add support for multiple roles to be specified in FrameworkInfo
[ https://issues.apache.org/jira/browse/MESOS-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-1763: --- Labels: mesosphere roles (was: mesosphere) > Add support for multiple roles to be specified in FrameworkInfo > --- > > Key: MESOS-1763 > URL: https://issues.apache.org/jira/browse/MESOS-1763 > Project: Mesos > Issue Type: Task > Components: master >Reporter: Vinod Kone >Assignee: Timothy Chen > Labels: mesosphere, roles > > Currently frameworks have the ability to set only one (resource) role in > FrameworkInfo. It would be nice to let frameworks specify multiple roles so > that they can do more fine grained resource accounting per role. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1763) Add support for multiple roles to be specified in FrameworkInfo
[ https://issues.apache.org/jira/browse/MESOS-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-1763: --- Component/s: master > Add support for multiple roles to be specified in FrameworkInfo > --- > > Key: MESOS-1763 > URL: https://issues.apache.org/jira/browse/MESOS-1763 > Project: Mesos > Issue Type: Task > Components: master >Reporter: Vinod Kone >Assignee: Timothy Chen > Labels: mesosphere, roles > > Currently frameworks have the ability to set only one (resource) role in > FrameworkInfo. It would be nice to let frameworks specify multiple roles so > that they can do more fine grained resource accounting per role. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3586) Installing Mesos 0.24.0 on multiple systems. Failed test on MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
[ https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034339#comment-15034339 ] Jan Schlicht commented on MESOS-3586: - It seems like a timing problem in the test. It assumes that {{os::sleep}} will sleep for exactly the duration it is given, but sleep only guarantees a minimum delay. > Installing Mesos 0.24.0 on multiple systems. Failed test on > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > --- > > Key: MESOS-3586 > URL: https://issues.apache.org/jira/browse/MESOS-3586 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.24.0 > Environment: Ubuntu 14.04, 3.13.0-32 generic >Reporter: Miguel Bernadin > > I am installing Mesos 0.24.0 on 4 servers which have very similar hardware and > software configurations. > After performing ../configure, make, and make check, some servers have > completed successfully and others failed on test [ RUN ] > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics. > Is there something I should check in this test? 
> PERFORMED MAKE CHECK NODE-001 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0 > I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave > 20151005-143735-2393768202-35106-27900-S0 > Registered executor on svdidac038.techlabs.accenture.com > Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0 > Forked command at 38510 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > PERFORMED MAKE CHECK NODE-002 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0 > I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave > 20151005-143857-2360213770-50427-26325-S0 > Registered executor on svdidac039.techlabs.accenture.com > Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > Forked command at 37028 > ../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure > Expected: (usage.get().mem_medium_pressure_counter()) >= > (usage.get().mem_critical_pressure_counter()), actual: 5 vs 6 > 2015-10-05 > 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server > refused to accept the client > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
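The timing assumption called out above can be demonstrated outside of Mesos: sleep primitives guarantee only a *minimum* delay, never an exact one, so any assertion built on exact sleep durations is inherently flaky. An illustrative Python sketch (the Mesos test itself is C++ and uses {{os::sleep}}):

```python
import time

def sleep_at_least(seconds):
    # time.sleep() (like os::sleep) promises only a minimum delay; the
    # scheduler may resume the process arbitrarily later under load.
    start = time.monotonic()
    time.sleep(seconds)
    return time.monotonic() - start

elapsed = sleep_at_least(0.01)
assert elapsed >= 0.01        # always holds
# assert elapsed == 0.01      # the flaky assumption: NOT guaranteed
```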
[jira] [Commented] (MESOS-3586) Installing Mesos 0.24.0 on multiple systems. Failed test on MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
[ https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034269#comment-15034269 ] Jan Schlicht commented on MESOS-3586: - I used the following vagrant generator to setup a CentOS virt env: {noformat} cat << EOF > Vagrantfile # -*- mode: ruby -*-" > # vi: set ft=ruby : Vagrant.configure(2) do |config| # Disable shared folder to prevent certain kernel module dependencies. config.vm.synced_folder ".", "/vagrant", disabled: true config.vm.hostname = "centos71" config.vm.box = "bento/centos-7.1" config.vm.provider "virtualbox" do |vb| vb.memory = 8192 vb.cpus = 8 end config.vm.provision "shell", inline: <<-SHELL yum -y update systemd yum install -y tar wget wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo yum groupinstall -y "Development Tools" yum install -y apache-maven python-devel java-1.7.0-openjdk-devel zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 apr-devel subversion-devel apr-util-devel yum install -y libevent-devel yum install -y perf nmap-ncat yum install -y git yum install -y docker systemctl start docker systemctl enable docker docker info #wget -qO- https://get.docker.com/ | sh SHELL end EOF vagrant up vagrant reload vagrant ssh -c " git clone https://github.com/apache/mesos.git mesos cd mesos git checkout -b 0.26.0-rc2 0.26.0-rc2 ./bootstrap mkdir build cd build ../configure GTEST_FILTER="" make check sudo ./bin/mesos-tests.sh " {noformat} > Installing Mesos 0.24.0 on multiple systems. Failed test on > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > --- > > Key: MESOS-3586 > URL: https://issues.apache.org/jira/browse/MESOS-3586 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.24.0 > Environment: Ubuntu 14.04, 3.13.0-32 generic >Reporter: Miguel Bernadin > > I am install Mesos 0.24.0 on 4 servers which have very similar hardware and > software configurations. 
> After performing ../configure, make, and make check some servers have > completed successfully and other failed on test [ RUN ] > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics. > Is there something I should check in this test? > PERFORMED MAKE CHECK NODE-001 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0 > I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave > 20151005-143735-2393768202-35106-27900-S0 > Registered executor on svdidac038.techlabs.accenture.com > Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0 > Forked command at 38510 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > PERFORMED MAKE CHECK NODE-002 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0 > I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave > 20151005-143857-2360213770-50427-26325-S0 > Registered executor on svdidac039.techlabs.accenture.com > Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > Forked command at 37028 > ../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure > Expected: (usage.get().mem_medium_pressure_counter()) >= > (usage.get().mem_critical_pressure_counter()), actual: 5 vs 6 > 2015-10-05 > 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server > refused to accept the client > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3586) Installing Mesos 0.24.0 on multiple systems. Failed test on MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
[ https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034263#comment-15034263 ] Jan Schlicht commented on MESOS-3586: - I have to reopen this, as I've found the same behavior using the 0.26-rc2 on CentOS 7.1. Noticed some flakiness while running {{sudo ./bin/mesos-tests.sh}} and could reproduce it by running {{sudo ./bin/mesos-tests.sh - --gtest_filter="MemoryPressureMesosTest.CGROUPS_ROOT_Statistics" --gtest_repeat=-1 --gtest_break_on_failure}} until it breaks. Here's a verbose output of a failing test: {noformat} [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics I1201 18:07:51.136508 18883 cgroups.cpp:2429] Freezing cgroup /sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb/d540e60d-2d62-4a1e-b5ff-482f7b3cc1a5 I1201 18:07:51.144594 18886 cgroups.cpp:1411] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb/d540e60d-2d62-4a1e-b5ff-482f7b3cc1a5 after 7.076864ms I1201 18:07:51.151480 18882 cgroups.cpp:2447] Thawing cgroup /sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb/d540e60d-2d62-4a1e-b5ff-482f7b3cc1a5 I1201 18:07:51.162557 18886 cgroups.cpp:1440] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb/d540e60d-2d62-4a1e-b5ff-482f7b3cc1a5 after 11.026944ms I1201 18:07:51.172379 18887 cgroups.cpp:2429] Freezing cgroup /sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb I1201 18:07:51.183791 18881 cgroups.cpp:1411] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb after 7.8272ms I1201 18:07:51.192354 18887 cgroups.cpp:2447] Thawing cgroup /sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb I1201 18:07:51.199439 18885 cgroups.cpp:1440] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb after 7.028224ms I1201 
18:07:51.332849 18866 leveldb.cpp:176] Opened db in 6.74674ms I1201 18:07:51.335450 18866 leveldb.cpp:183] Compacted db in 2.554513ms I1201 18:07:51.335539 18866 leveldb.cpp:198] Created db iterator in 53851ns I1201 18:07:51.335556 18866 leveldb.cpp:204] Seeked to beginning of db in 3455ns I1201 18:07:51.335561 18866 leveldb.cpp:273] Iterated through 0 keys in the db in 107ns I1201 18:07:51.335666 18866 replica.cpp:780] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I1201 18:07:51.337374 18881 recover.cpp:449] Starting replica recovery I1201 18:07:51.338235 18881 recover.cpp:475] Replica is in EMPTY status I1201 18:07:51.340142 18880 replica.cpp:676] Replica in EMPTY status received a broadcasted recover request from (14)@127.0.0.1:57652 I1201 18:07:51.340749 18882 recover.cpp:195] Received a recover response from a replica in EMPTY status I1201 18:07:51.340975 18885 master.cpp:367] Master 2f17d97c-de40-491e-9706-bf83a9ffd08c (centos71) started on 127.0.0.1:57652 I1201 18:07:51.341475 18884 recover.cpp:566] Updating replica status to STARTING I1201 18:07:51.341152 18885 master.cpp:369] Flags at startup: --acls="" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/ap4rPt/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="25secs" --registry_strict="true" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/ap4rPt/master" --zk_session_timeout="10secs" W1201 18:07:51.341752 18885 
master.cpp:372] ** Master bound to loopback interface! Cannot communicate with remote schedulers or slaves. You might want to set '--ip' flag to a routable IP address. ** I1201 18:07:51.341794 18885 master.cpp:414] Master only allowing authenticated frameworks to register I1201 18:07:51.341804 18885 master.cpp:419] Master only allowing authenticated slaves to register I1201 18:07:51.341879 18885 credentials.hpp:37] Loading credentials for authentication from '/tmp/ap4rPt/credentials' I1201 18:07:51.345211 18885 master.cpp:458] Using default 'crammd5' authenticator I1201 18:07:51.345268 18882 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 3.5302ms I1201 18:07:51.345289 18882 replica.cpp:323] Persisted replica status to STARTING I1201 18:07:51.345350 18885 authenticator.cpp:520] Initializing server SAS
[jira] [Commented] (MESOS-4032) SlaveRecoveryTest/0.Reboot fails under CentOS 7.1 with libevent & SSL enabled.
[ https://issues.apache.org/jira/browse/MESOS-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034147#comment-15034147 ] Jan Schlicht commented on MESOS-4032: - Looks like it was caused by some artifacts. After restarting the virtual env, the test is OK. > SlaveRecoveryTest/0.Reboot fails under CentOS 7.1 with libevent & SSL enabled. > -- > > Key: MESOS-4032 > URL: https://issues.apache.org/jira/browse/MESOS-4032 > Project: Mesos > Issue Type: Bug > Environment: CentOS 7.1, {{--enable-libevent --enable-ssl}} >Reporter: Jan Schlicht > > Running {{sudo ./bin/mesos-tests.sh}} has SlaveRecoveryTest/0.Reboot failing. > A virtual env was used to run the tests. > Vagrantfile generator: > {noformat} > cat << EOF > Vagrantfile > # -*- mode: ruby -*-" > > # vi: set ft=ruby : > Vagrant.configure(2) do |config| > # Disable shared folder to prevent certain kernel module dependencies. > config.vm.synced_folder ".", "/vagrant", disabled: true > config.vm.hostname = "centos71" > config.vm.box = "bento/centos-7.1" > config.vm.provider "virtualbox" do |vb| > vb.memory = 8192 > vb.cpus = 8 > end > config.vm.provision "shell", inline: <<-SHELL > yum -y update systemd > yum install -y tar wget > wget > http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo > -O /etc/yum.repos.d/epel-apache-maven.repo > yum groupinstall -y "Development Tools" > yum install -y apache-maven python-devel java-1.7.0-openjdk-devel > zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 > apr-devel subversion-devel apr-util-devel > yum install -y libevent-devel > yum install -y perf nmap-ncat > yum install -y git > yum install -y docker > systemctl start docker > systemctl enable docker > SHELL > end > EOF > vagrant up > vagrant reload > vagrant ssh -c " > git clone https://github.com/apache/mesos.git mesos > cd mesos > git checkout -b 0.26.0-rc2 0.26.0-rc2 > ./bootstrap > mkdir build > cd build > ../configure 
--enable-libevent --enable-ssl > GTEST_FILTER="" make check > sudo ./bin/mesos-tests.sh > " > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3787) As a developer, I'd like to be able to expand environment variables through the Docker executor.
[ https://issues.apache.org/jira/browse/MESOS-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034131#comment-15034131 ] Adam B commented on MESOS-3787: --- Please allow me to express a potential security concern. I hope that our eventual solution addresses this. If the variable expansion happens as a part of the slave process, run as root, we must ensure that it isn't able to actually execute a command as root or view variable contents that only root should see, since the variable/config is set by the framework, not an admin. Rather, the expansion should happen as the TaskInfo.user/FrameworkInfo.user, so that {code}"containerPath": "/data/${USER}" "hostPath": "${HOME}"{code} should use the task user's name/home, not 'root'. > As a developer, I'd like to be able to expand environment variables through > the Docker executor. > > > Key: MESOS-3787 > URL: https://issues.apache.org/jira/browse/MESOS-3787 > Project: Mesos > Issue Type: Wish >Reporter: John Garcia > Labels: mesosphere > Attachments: mesos.patch, test-example.json > > > We'd like to have expanded variables usable in [the json files used to create > a Marathon app, hence] the Task's CommandInfo, so that the executor is able > to detect the correct values at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
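The security concern above amounts to: resolve ${VAR} references against the task user's environment only, never the root daemon's. A hedged Python sketch of that rule (expand_as_user is a hypothetical helper, not the Docker executor's actual code):

```python
import re

def expand_as_user(template, user_env):
    # Substitute ${VAR} using ONLY the task user's environment. Unknown
    # variables are left untouched rather than silently resolved from
    # the (possibly root) daemon environment.
    def repl(match):
        return user_env.get(match.group(1), match.group(0))
    return re.sub(r"\$\{(\w+)\}", repl, template)

task_env = {"USER": "alice", "HOME": "/home/alice"}
expand_as_user("/data/${USER}", task_env)   # '/data/alice'
expand_as_user("${HOME}", task_env)         # '/home/alice'
```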
[jira] [Created] (MESOS-4033) Add a commit hook for non-ascii characters
Alexander Rukletsov created MESOS-4033: -- Summary: Add a commit hook for non-ascii characters Key: MESOS-4033 URL: https://issues.apache.org/jira/browse/MESOS-4033 Project: Mesos Issue Type: Task Reporter: Alexander Rukletsov Priority: Minor Non-ascii characters invisible in some editors may sneak into the codebase (see e.g. https://reviews.apache.org/r/40799/). To avoid this, a pre-commit hook can be added. Quick searching suggested a simple Perl script: https://superuser.com/questions/417305/how-can-i-identify-non-ascii-characters-from-the-shell -- This message was sent by Atlassian JIRA (v6.3.4#6332)
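As a sketch of what such a hook could check (illustrative Python rather than the Perl script linked above; find_non_ascii is a hypothetical helper):

```python
def find_non_ascii(text):
    # Report (line, column, char) for every character outside the ASCII
    # range, similar to what grep -P '[^\x00-\x7F]' would flag in a
    # pre-commit hook.
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((lineno, col, ch))
    return hits

find_non_ascii("plain ascii")   # []
find_non_ascii("naïve")         # [(1, 3, 'ï')]
```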
[jira] [Comment Edited] (MESOS-4032) SlaveRecoveryTest/0.Reboot fails under CentOS 7.1 with libevent & SSL enabled.
[ https://issues.apache.org/jira/browse/MESOS-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033936#comment-15033936 ] Jan Schlicht edited comment on MESOS-4032 at 12/1/15 4:14 PM: -- The tests work fine if Mesos is compiled with libev, without SSL. Running the test in isolation also fails. Verbose output: {noformat} [ RUN ] SlaveRecoveryTest/0.Reboot I1201 16:13:43.764530 30105 cgroups.cpp:2429] Freezing cgroup /sys/fs/cgroup/freezer/mesos_test_4ea77e5a-030e-468d-aa54-6cf580143b86 I1201 16:13:43.955772 30100 cgroups.cpp:1411] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos_test_4ea77e5a-030e-468d-aa54-6cf580143b86 after 190.95296ms I1201 16:13:44.151808 30106 cgroups.cpp:2447] Thawing cgroup /sys/fs/cgroup/freezer/mesos_test_4ea77e5a-030e-468d-aa54-6cf580143b86 I1201 16:13:44.338899 30103 cgroups.cpp:1440] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos_test_4ea77e5a-030e-468d-aa54-6cf580143b86 after 186.987008ms I1201 16:13:46.429718 30085 leveldb.cpp:176] Opened db in 6.794189ms I1201 16:13:46.431185 30085 leveldb.cpp:183] Compacted db in 1.403926ms I1201 16:13:46.431273 30085 leveldb.cpp:198] Created db iterator in 55789ns I1201 16:13:46.431289 30085 leveldb.cpp:204] Seeked to beginning of db in 3775ns I1201 16:13:46.431293 30085 leveldb.cpp:273] Iterated through 0 keys in the db in 120ns I1201 16:13:46.431409 30085 replica.cpp:780] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I1201 16:13:46.432781 30104 recover.cpp:449] Starting replica recovery I1201 16:13:46.433365 30104 recover.cpp:475] Replica is in EMPTY status I1201 16:13:46.438645 30104 replica.cpp:676] Replica in EMPTY status received a broadcasted recover request from (9)@127.0.0.1:52014 I1201 16:13:46.439353 30099 master.cpp:367] Master 0c54b5bb-d0f8-4c94-8f2a-c49672419e62 (centos71) started on 127.0.0.1:52014 I1201 16:13:46.439602 30100 recover.cpp:195] Received a recover response from a replica in EMPTY 
status I1201 16:13:46.439393 30099 master.cpp:369] Flags at startup: --acls="" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/qZBjUp/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="25secs" --registry_strict="true" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/qZBjUp/master" --zk_session_timeout="10secs" W1201 16:13:46.439997 30099 master.cpp:372] ** Master bound to loopback interface! Cannot communicate with remote schedulers or slaves. You might want to set '--ip' flag to a routable IP address. ** I1201 16:13:46.440037 30099 master.cpp:414] Master only allowing authenticated frameworks to register I1201 16:13:46.440042 30099 master.cpp:419] Master only allowing authenticated slaves to register I1201 16:13:46.440047 30099 credentials.hpp:37] Loading credentials for authentication from '/tmp/qZBjUp/credentials' I1201 16:13:46.440315 30106 recover.cpp:566] Updating replica status to STARTING I1201 16:13:46.440580 30099 master.cpp:458] Using default 'crammd5' authenticator I1201 16:13:46.440743 30099 authenticator.cpp:520] Initializing server SASL I1201 16:13:46.442067 30099 master.cpp:495] Authorization enabled I1201 16:13:46.447201 30099 master.cpp:1606] The newly elected leader is master@127.0.0.1:52014 with id 0c54b5bb-d0f8-4c94-8f2a-c49672419e62 I1201 16:13:46.447230 30099 master.cpp:1619] Elected as the leading master! 
I1201 16:13:46.447255 30099 master.cpp:1379] Recovering from registrar I1201 16:13:46.447590 30099 registrar.cpp:309] Recovering registrar I1201 16:13:46.451647 30100 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 10.746719ms I1201 16:13:46.451686 30100 replica.cpp:323] Persisted replica status to STARTING I1201 16:13:46.451942 30106 recover.cpp:475] Replica is in STARTING status I1201 16:13:46.452819 30100 replica.cpp:676] Replica in STARTING status received a broadcasted recover request from (10)@127.0.0.1:52014 I1201 16:13:46.453064 30105 recover.cpp:195] Received a recover response from a replica in STARTING status I1201 16:13:46.453727 30104 recover.cpp:566] Updating replica status to VOTING I1201 16:13:46.454529 30105 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 720044ns I1201 16:13:46.454548 30105 replica.cpp:323] Persi
[jira] [Commented] (MESOS-4032) SlaveRecoveryTest/0.Reboot fails under CentOS 7.1 with libevent & SSL enabled.
[ https://issues.apache.org/jira/browse/MESOS-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033936#comment-15033936 ] Jan Schlicht commented on MESOS-4032: - {noformat} [ RUN ] SlaveRecoveryTest/0.Reboot I1201 15:59:03.294540 21012 exec.cpp:136] Version: 0.26.0 I1201 15:59:03.302486 21039 exec.cpp:210] Executor registered on slave b17072f2-ce17-4f80-aa41-2197194f7cd0-S0 Registered executor on centos71 Starting task 6060349a-ab26-45d2-a2fa-96e561f794a8 sh -c 'sleep 1000' Forked command at 21048 I1201 15:59:03.420940 21044 exec.cpp:383] Executor asked to shutdown Shutting down Sending SIGTERM to process tree at pid 21048 Killing the following process trees: [ --- 21048 sleep 1000 ] Command terminated with signal Terminated (pid: 21048) ../../src/tests/mesos.cpp:781: Failure (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to kill tasks in nested cgroups: Collect failed: Invalid freezer cgroup: 'mesos_test_d456f5bc-7718-4850-990e-8961404efd15/8fa1aee7-b393-4a20-85e1-33cd2fca0b10' is not a valid cgroup [ FAILED ] SlaveRecoveryTest/0.Reboot, where TypeParam = mesos::internal::slave::MesosContainerizer (4835 ms) {noformat} > SlaveRecoveryTest/0.Reboot fails under CentOS 7.1 with libevent & SSL enabled. > -- > > Key: MESOS-4032 > URL: https://issues.apache.org/jira/browse/MESOS-4032 > Project: Mesos > Issue Type: Bug > Environment: CentOS 7.1, {{--enable-libevent --enable-ssl}} >Reporter: Jan Schlicht > > Running {{sudo ./bin/mesos-tests.sh}} has SlaveRecoveryTest/0.Reboot failing. > A virtual env was used to run the tests. > Vagrantfile generator: > {noformat} > cat << EOF > Vagrantfile > # -*- mode: ruby -*-" > > # vi: set ft=ruby : > Vagrant.configure(2) do |config| > # Disable shared folder to prevent certain kernel module dependencies. 
> config.vm.synced_folder ".", "/vagrant", disabled: true > config.vm.hostname = "centos71" > config.vm.box = "bento/centos-7.1" > config.vm.provider "virtualbox" do |vb| > vb.memory = 8192 > vb.cpus = 8 > end > config.vm.provision "shell", inline: <<-SHELL > yum -y update systemd > yum install -y tar wget > wget > http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo > -O /etc/yum.repos.d/epel-apache-maven.repo > yum groupinstall -y "Development Tools" > yum install -y apache-maven python-devel java-1.7.0-openjdk-devel > zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 > apr-devel subversion-devel apr-util-devel > yum install -y libevent-devel > yum install -y perf nmap-ncat > yum install -y git > yum install -y docker > systemctl start docker > systemctl enable docker > SHELL > end > EOF > vagrant up > vagrant reload > vagrant ssh -c " > git clone https://github.com/apache/mesos.git mesos > cd mesos > git checkout -b 0.26.0-rc2 0.26.0-rc2 > ./bootstrap > mkdir build > cd build > ../configure --enable-libevent --enable-ssl > GTEST_FILTER="" make check > sudo ./bin/mesos-tests.sh > " > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4032) SlaveRecoveryTest/0.Reboot fails under CentOS 7.1 with libevent & SSL enabled.
Jan Schlicht created MESOS-4032: --- Summary: SlaveRecoveryTest/0.Reboot fails under CentOS 7.1 with libevent & SSL enabled. Key: MESOS-4032 URL: https://issues.apache.org/jira/browse/MESOS-4032 Project: Mesos Issue Type: Bug Environment: CentOS 7.1, {{--enable-libevent --enable-ssl}} Reporter: Jan Schlicht Running {{sudo ./bin/mesos-tests.sh}} has SlaveRecoveryTest/0.Reboot failing. A virtual env was used to run the tests. Vagrantfile generator: {noformat} cat << EOF > Vagrantfile # -*- mode: ruby -*-" > # vi: set ft=ruby : Vagrant.configure(2) do |config| # Disable shared folder to prevent certain kernel module dependencies. config.vm.synced_folder ".", "/vagrant", disabled: true config.vm.hostname = "centos71" config.vm.box = "bento/centos-7.1" config.vm.provider "virtualbox" do |vb| vb.memory = 8192 vb.cpus = 8 end config.vm.provision "shell", inline: <<-SHELL yum -y update systemd yum install -y tar wget wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo yum groupinstall -y "Development Tools" yum install -y apache-maven python-devel java-1.7.0-openjdk-devel zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 apr-devel subversion-devel apr-util-devel yum install -y libevent-devel yum install -y perf nmap-ncat yum install -y git yum install -y docker systemctl start docker systemctl enable docker SHELL end EOF vagrant up vagrant reload vagrant ssh -c " git clone https://github.com/apache/mesos.git mesos cd mesos git checkout -b 0.26.0-rc2 0.26.0-rc2 ./bootstrap mkdir build cd build ../configure --enable-libevent --enable-ssl GTEST_FILTER="" make check sudo ./bin/mesos-tests.sh " {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3548) Investigate federations of Mesos masters
[ https://issues.apache.org/jira/browse/MESOS-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033932#comment-15033932 ] Neil Conway commented on MESOS-3548: Hi Elouan, That's awesome that you're interested in this area! We're working on setting up a special-interest group for federation, and we'll be sure to include you. > Investigate federations of Mesos masters > > > Key: MESOS-3548 > URL: https://issues.apache.org/jira/browse/MESOS-3548 > Project: Mesos > Issue Type: Improvement >Reporter: Neil Conway > Labels: federation, mesosphere, multi-dc > > In a large Mesos installation, the operator might want to ensure that even if > the Mesos masters are inaccessible or failed, new tasks can still be > scheduled (across multiple different frameworks). HA masters are only a > partial solution here: the masters might still be inaccessible due to a > correlated failure (e.g., Zookeeper misconfiguration/human error). > To support this, we could support the notion of "hierarchies" or > "federations" of Mesos masters. In a Mesos installation with 10k machines, > the operator might configure 10 Mesos masters (each of which might be HA) to > manage 1k machines each. Then an additional "meta-Master" would manage the > allocation of cluster resources to the 10 masters. Hence, the failure of any > individual master would impact 1k machines at most. The meta-master might not > have a lot of work to do: e.g., it might be limited to occasionally > reallocating cluster resources among the 10 masters, or ensuring that newly > added cluster resources are allocated among the masters as appropriate. > Hence, the failure of the meta-master would not prevent any of the individual > masters from scheduling new tasks. A single framework instance probably > wouldn't be able to use more resources than have been assigned to a single > Master, but that seems like a reasonable restriction. 
> This feature might also be a good fit for a multi-datacenter deployment of > Mesos: each Mesos master instance would manage a single DC. Naturally, > reducing the traffic between frameworks and the meta-master would be > important for performance reasons in a configuration like this. > Operationally, this might be simpler if Mesos processes were self-hosting > ([MESOS-3547]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3773) RegistryClientTest.SimpleGetBlob is flaky
[ https://issues.apache.org/jira/browse/MESOS-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-3773: -- Fix Version/s: (was: 0.26.0) 0.27.0 > RegistryClientTest.SimpleGetBlob is flaky > - > > Key: MESOS-3773 > URL: https://issues.apache.org/jira/browse/MESOS-3773 > Project: Mesos > Issue Type: Bug > Components: test >Reporter: Joseph Wu >Assignee: Jojy Varghese > Labels: mesosphere > Fix For: 0.27.0 > > > {{RegistryClientTest.SimpleGetBlob}} fails about 1/5 times. This was > encountered on OSX. > {code:title=Repro} > bin/mesos-tests.sh --gtest_filter="*RegistryClientTest.SimpleGetBlob*" > --gtest_repeat=10 --gtest_break_on_failure > {code} > {code:title=Example Failure} > [ RUN ] RegistryClientTest.SimpleGetBlob > ../../src/tests/containerizer/provisioner_docker_tests.cpp:946: Failure > Value of: blobResponse > Actual: "2015-10-20 20:58:59.579393024+00:00" > Expected: blob.get() > Which is: > "\x15\x3\x3\00(P~\xCA&\xC6<\x4\x16\xE\xB2\xFF\b1a\xB9Z{\xE0\x80\xDA`\xBCt\x5R\x81x6\xF8 > \x8B{\xA8\xA9\x4\xAB\xB6" "E\xE6\xDE\xCF\xD9*\xCC!\xC2\x15" "2015-10-20 > 20:58:59.579393024+00:00" > *** Aborted at 1445374739 (unix time) try "date -d @1445374739" if you are > using GNU date *** > PC: @0x103144ddc testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 49008 (TID 0x7fff73ca3300) stack trace: *** > @ 0x7fff8c58af1a _sigtramp > @ 0x7fff8386e187 malloc > @0x1031445b7 testing::internal::AssertHelper::operator=() > @0x1030d32e0 > mesos::internal::tests::RegistryClientTest_SimpleGetBlob_Test::TestBody() > @0x1030d3562 > mesos::internal::tests::RegistryClientTest_SimpleGetBlob_Test::TestBody() > @0x1031ac8f3 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @0x103192f87 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @0x1031533f5 testing::Test::Run() > @0x10315493b testing::TestInfo::Run() > @0x1031555f7 testing::TestCase::Run() > @0x103163df3 
testing::internal::UnitTestImpl::RunAllTests() > @0x1031af8c3 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @0x103195397 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @0x1031639f2 testing::UnitTest::Run() > @0x1025abd41 RUN_ALL_TESTS() > @0x1025a8089 main > @ 0x7fff86b155c9 start > {code} > {code:title=Less common failure} > [ RUN ] RegistryClientTest.SimpleGetBlob > ../../src/tests/containerizer/provisioner_docker_tests.cpp:926: Failure > (socket).failure(): Failed accept: connection error: > error::lib(0):func(0):reason(0) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4029) ContentType/SchedulerTest is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033879#comment-15033879 ] Bernd Mathiske commented on MESOS-4029: --- Talking to Anand and Alexander I am getting the impression this is likely a test bug. > ContentType/SchedulerTest is flaky. > --- > > Key: MESOS-4029 > URL: https://issues.apache.org/jira/browse/MESOS-4029 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Till Toenshoff >Assignee: Anand Mazumdar > Labels: flaky, flaky-test, mesosphere > > SSL build, [Ubuntu > 14.04|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh], > non-root test run. > {noformat} > [--] 22 tests from ContentType/SchedulerTest > [ RUN ] ContentType/SchedulerTest.Subscribe/0 > [ OK ] ContentType/SchedulerTest.Subscribe/0 (48 ms) > *** Aborted at 1448928007 (unix time) try "date -d @1448928007" if you are > using GNU date *** > [ RUN ] ContentType/SchedulerTest.Subscribe/1 > PC: @ 0x1451b8e > testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() > *** SIGSEGV (@0x10030) received by PID 21320 (TID 0x2b549e5d4700) from > PID 48; stack trace: *** > @ 0x2b54c95940b7 os::Linux::chained_handler() > @ 0x2b54c9598219 JVM_handle_linux_signal > @ 0x2b5496300340 (unknown) > @ 0x1451b8e > testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() > @ 0xe2ea6d > _ZN7testing8internal18FunctionMockerBaseIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS6_SaIS6_E10InvokeWithERKSt5tupleIJSC_EE > @ 0xe2b1bc testing::internal::FunctionMocker<>::Invoke() > @ 0x1118aed > mesos::internal::tests::SchedulerTest::Callbacks::received() > @ 0x111c453 > _ZNKSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS0_2v19scheduler5EventESt5dequeIS8_SaIS8_EclIJSE_EvEEvRS4_DpOT_ > @ 0x111c001 > 
_ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi16__callIvJSF_EJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE > @ 0x111b90d > _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi1clIJSF_EvEET0_DpOT_ > @ 0x111ae09 std::_Function_handler<>::_M_invoke() > @ 0x2b5493c6da09 std::function<>::operator()() > @ 0x2b5493c688ee process::AsyncExecutorProcess::execute<>() > @ 0x2b5493c6db2a > _ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS8_SaIS8_ESC_PvSG_SC_SJ_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSO_FSL_T1_T2_T3_ET4_T5_T6_ENKUlPNS_11ProcessBaseEE_clES11_ > @ 0x2b5493c765a4 > _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingNS0_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISC_SaISC_ESG_PvSK_SG_SN_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSS_FSP_T1_T2_T3_ET4_T5_T6_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ > @ 0x2b54946b1201 std::function<>::operator()() > @ 0x2b549469960f process::ProcessBase::visit() > @ 0x2b549469d480 process::DispatchEvent::visit() > @ 0x9dc0ba process::ProcessBase::serve() > @ 0x2b54946958cc process::ProcessManager::resume() > @ 0x2b5494692a9c > _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_ > @ 0x2b549469ccac > _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE > @ 0x2b549469cc5c > _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_ > @ 0x2b549469cbee > 
_ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE > @ 0x2b549469cb45 > _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv > @ 0x2b549469cade > _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv > @ 0x2b5495b81a40 (unknown) > @ 0x2b54962f8182 start_thread > @ 0x2b549660847d (unknown) > make[3]: *** [check-local] Segmentation fault > make[3]: Leaving directory `/home/vagrant/mesos/build/src' > make[2]: *** [check-am] Er
[jira] [Updated] (MESOS-4029) ContentType/SchedulerTest is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4029: -- Affects Version/s: (was: 0.27.0) 0.26.0 > ContentType/SchedulerTest is flaky. > --- > > Key: MESOS-4029 > URL: https://issues.apache.org/jira/browse/MESOS-4029 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Till Toenshoff >Assignee: Anand Mazumdar > Labels: flaky, flaky-test, mesosphere > > SSL build, [Ubuntu > 14.04|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh], > non-root test run. > {noformat} > [--] 22 tests from ContentType/SchedulerTest > [ RUN ] ContentType/SchedulerTest.Subscribe/0 > [ OK ] ContentType/SchedulerTest.Subscribe/0 (48 ms) > *** Aborted at 1448928007 (unix time) try "date -d @1448928007" if you are > using GNU date *** > [ RUN ] ContentType/SchedulerTest.Subscribe/1 > PC: @ 0x1451b8e > testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() > *** SIGSEGV (@0x10030) received by PID 21320 (TID 0x2b549e5d4700) from > PID 48; stack trace: *** > @ 0x2b54c95940b7 os::Linux::chained_handler() > @ 0x2b54c9598219 JVM_handle_linux_signal > @ 0x2b5496300340 (unknown) > @ 0x1451b8e > testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() > @ 0xe2ea6d > _ZN7testing8internal18FunctionMockerBaseIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS6_SaIS6_E10InvokeWithERKSt5tupleIJSC_EE > @ 0xe2b1bc testing::internal::FunctionMocker<>::Invoke() > @ 0x1118aed > mesos::internal::tests::SchedulerTest::Callbacks::received() > @ 0x111c453 > _ZNKSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS0_2v19scheduler5EventESt5dequeIS8_SaIS8_EclIJSE_EvEEvRS4_DpOT_ > @ 0x111c001 > _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi16__callIvJSF_EJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE > @ 
0x111b90d > _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi1clIJSF_EvEET0_DpOT_ > @ 0x111ae09 std::_Function_handler<>::_M_invoke() > @ 0x2b5493c6da09 std::function<>::operator()() > @ 0x2b5493c688ee process::AsyncExecutorProcess::execute<>() > @ 0x2b5493c6db2a > _ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS8_SaIS8_ESC_PvSG_SC_SJ_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSO_FSL_T1_T2_T3_ET4_T5_T6_ENKUlPNS_11ProcessBaseEE_clES11_ > @ 0x2b5493c765a4 > _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingNS0_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISC_SaISC_ESG_PvSK_SG_SN_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSS_FSP_T1_T2_T3_ET4_T5_T6_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ > @ 0x2b54946b1201 std::function<>::operator()() > @ 0x2b549469960f process::ProcessBase::visit() > @ 0x2b549469d480 process::DispatchEvent::visit() > @ 0x9dc0ba process::ProcessBase::serve() > @ 0x2b54946958cc process::ProcessManager::resume() > @ 0x2b5494692a9c > _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_ > @ 0x2b549469ccac > _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE > @ 0x2b549469cc5c > _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_ > @ 0x2b549469cbee > _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE > @ 0x2b549469cb45 > _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv > @ 0x2b549469cade > 
_ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv > @ 0x2b5495b81a40 (unknown) > @ 0x2b54962f8182 start_thread > @ 0x2b549660847d (unknown) > make[3]: *** [check-local] Segmentation fault > make[3]: Leaving directory `/home/vagrant/mesos/build/src' > make[2]: *** [check-am] Error 2 > make[2]: Leaving directory `/home/vagrant/mesos/build/src' > make[1
[jira] [Updated] (MESOS-4029) ContentType/SchedulerTest is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4029: -- Target Version/s: 0.27.0 (was: 0.26.0) > ContentType/SchedulerTest is flaky. > --- > > Key: MESOS-4029 > URL: https://issues.apache.org/jira/browse/MESOS-4029 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Till Toenshoff >Assignee: Anand Mazumdar > Labels: flaky, flaky-test, mesosphere > > SSL build, [Ubuntu > 14.04|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh], > non-root test run. > {noformat} > [--] 22 tests from ContentType/SchedulerTest > [ RUN ] ContentType/SchedulerTest.Subscribe/0 > [ OK ] ContentType/SchedulerTest.Subscribe/0 (48 ms) > *** Aborted at 1448928007 (unix time) try "date -d @1448928007" if you are > using GNU date *** > [ RUN ] ContentType/SchedulerTest.Subscribe/1 > PC: @ 0x1451b8e > testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() > *** SIGSEGV (@0x10030) received by PID 21320 (TID 0x2b549e5d4700) from > PID 48; stack trace: *** > @ 0x2b54c95940b7 os::Linux::chained_handler() > @ 0x2b54c9598219 JVM_handle_linux_signal > @ 0x2b5496300340 (unknown) > @ 0x1451b8e > testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() > @ 0xe2ea6d > _ZN7testing8internal18FunctionMockerBaseIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS6_SaIS6_E10InvokeWithERKSt5tupleIJSC_EE > @ 0xe2b1bc testing::internal::FunctionMocker<>::Invoke() > @ 0x1118aed > mesos::internal::tests::SchedulerTest::Callbacks::received() > @ 0x111c453 > _ZNKSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS0_2v19scheduler5EventESt5dequeIS8_SaIS8_EclIJSE_EvEEvRS4_DpOT_ > @ 0x111c001 > _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi16__callIvJSF_EJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE > @ 
0x111b90d > _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi1clIJSF_EvEET0_DpOT_ > @ 0x111ae09 std::_Function_handler<>::_M_invoke() > @ 0x2b5493c6da09 std::function<>::operator()() > @ 0x2b5493c688ee process::AsyncExecutorProcess::execute<>() > @ 0x2b5493c6db2a > _ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS8_SaIS8_ESC_PvSG_SC_SJ_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSO_FSL_T1_T2_T3_ET4_T5_T6_ENKUlPNS_11ProcessBaseEE_clES11_ > @ 0x2b5493c765a4 > _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingNS0_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISC_SaISC_ESG_PvSK_SG_SN_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSS_FSP_T1_T2_T3_ET4_T5_T6_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ > @ 0x2b54946b1201 std::function<>::operator()() > @ 0x2b549469960f process::ProcessBase::visit() > @ 0x2b549469d480 process::DispatchEvent::visit() > @ 0x9dc0ba process::ProcessBase::serve() > @ 0x2b54946958cc process::ProcessManager::resume() > @ 0x2b5494692a9c > _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_ > @ 0x2b549469ccac > _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE > @ 0x2b549469cc5c > _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_ > @ 0x2b549469cbee > _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE > @ 0x2b549469cb45 > _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv > @ 0x2b549469cade > 
_ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv > @ 0x2b5495b81a40 (unknown) > @ 0x2b54962f8182 start_thread > @ 0x2b549660847d (unknown) > make[3]: *** [check-local] Segmentation fault > make[3]: Leaving directory `/home/vagrant/mesos/build/src' > make[2]: *** [check-am] Error 2 > make[2]: Leaving directory `/home/vagrant/mesos/build/src' > make[1]: *** [check] Error 2 > ma
[jira] [Updated] (MESOS-4029) ContentType/SchedulerTest is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4029: -- Affects Version/s: (was: 0.26.0) 0.27.0 > ContentType/SchedulerTest is flaky. > --- > > Key: MESOS-4029 > URL: https://issues.apache.org/jira/browse/MESOS-4029 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Till Toenshoff >Assignee: Anand Mazumdar > Labels: flaky, flaky-test, mesosphere > > SSL build, [Ubuntu > 14.04|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh], > non-root test run. > {noformat} > [--] 22 tests from ContentType/SchedulerTest > [ RUN ] ContentType/SchedulerTest.Subscribe/0 > [ OK ] ContentType/SchedulerTest.Subscribe/0 (48 ms) > *** Aborted at 1448928007 (unix time) try "date -d @1448928007" if you are > using GNU date *** > [ RUN ] ContentType/SchedulerTest.Subscribe/1 > PC: @ 0x1451b8e > testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() > *** SIGSEGV (@0x10030) received by PID 21320 (TID 0x2b549e5d4700) from > PID 48; stack trace: *** > @ 0x2b54c95940b7 os::Linux::chained_handler() > @ 0x2b54c9598219 JVM_handle_linux_signal > @ 0x2b5496300340 (unknown) > @ 0x1451b8e > testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() > @ 0xe2ea6d > _ZN7testing8internal18FunctionMockerBaseIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS6_SaIS6_E10InvokeWithERKSt5tupleIJSC_EE > @ 0xe2b1bc testing::internal::FunctionMocker<>::Invoke() > @ 0x1118aed > mesos::internal::tests::SchedulerTest::Callbacks::received() > @ 0x111c453 > _ZNKSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS0_2v19scheduler5EventESt5dequeIS8_SaIS8_EclIJSE_EvEEvRS4_DpOT_ > @ 0x111c001 > _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi16__callIvJSF_EJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE > @ 
0x111b90d > _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi1clIJSF_EvEET0_DpOT_ > @ 0x111ae09 std::_Function_handler<>::_M_invoke() > @ 0x2b5493c6da09 std::function<>::operator()() > @ 0x2b5493c688ee process::AsyncExecutorProcess::execute<>() > @ 0x2b5493c6db2a > _ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS8_SaIS8_ESC_PvSG_SC_SJ_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSO_FSL_T1_T2_T3_ET4_T5_T6_ENKUlPNS_11ProcessBaseEE_clES11_ > @ 0x2b5493c765a4 > _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingNS0_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISC_SaISC_ESG_PvSK_SG_SN_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSS_FSP_T1_T2_T3_ET4_T5_T6_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ > @ 0x2b54946b1201 std::function<>::operator()() > @ 0x2b549469960f process::ProcessBase::visit() > @ 0x2b549469d480 process::DispatchEvent::visit() > @ 0x9dc0ba process::ProcessBase::serve() > @ 0x2b54946958cc process::ProcessManager::resume() > @ 0x2b5494692a9c > _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_ > @ 0x2b549469ccac > _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE > @ 0x2b549469cc5c > _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_ > @ 0x2b549469cbee > _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE > @ 0x2b549469cb45 > _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv > @ 0x2b549469cade > 
_ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv > @ 0x2b5495b81a40 (unknown) > @ 0x2b54962f8182 start_thread > @ 0x2b549660847d (unknown) > make[3]: *** [check-local] Segmentation fault > make[3]: Leaving directory `/home/vagrant/mesos/build/src' > make[2]: *** [check-am] Error 2 > make[2]: Leaving directory `/home/vagrant/mesos/build/src' > make[1
[jira] [Created] (MESOS-4031) slave crashed in cgroupsStatistics()
Steven created MESOS-4031: - Summary: slave crashed in cgroupstatistics() Key: MESOS-4031 URL: https://issues.apache.org/jira/browse/MESOS-4031 Project: Mesos Issue Type: Bug Components: containerization, libprocess Affects Versions: 0.24.0 Environment: Debian jessie Reporter: Steven Hi all, I have built a mesos cluster with three slaves. Any slave may sporadically crash when I get the summary through mesos master ui. Here is the stack trace. ``` slave.sh[13336]: I1201 11:54:12.827975 13338 slave.cpp:3926] Current disk usage 79.71%. Max allowed age: 17.279577136390834hrs slave.sh[13336]: I1201 11:55:12.829792 13342 slave.cpp:3926] Current disk usage 79.71%. Max allowed age: 17.279577136390834hrs slave.sh[13336]: I1201 11:55:38.389614 13342 http.cpp:189] HTTP GET for /slave(1)/state from 192.168.100.1:64870 with User-Agent='Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0' docker[8409]: time="2015-12-01T11:55:38.934148017+08:00" level=info msg="GET /v1.20/containers/mesos-b25be32d-41e1-4e14-9b84-d33d733cef51-S3.79c206a6-d6b5-487b-9390-e09292c5b53a/json" docker[8409]: time="2015-12-01T11:55:38.941489332+08:00" level=info msg="GET /v1.20/containers/mesos-b25be32d-41e1-4e14-9b84-d33d733cef51-S3.1e01a4b3-a76e-4bf6-8ce0-a4a937faf236/json" slave.sh[13336]: ABORT: (../../3rdparty/libprocess/3rdparty/stout/include/stout/result.hpp:110): Result::get() but state == NONE*** Aborted at 1448942139 (unix time) try "date -d @1448942139" if you are using GNU date *** slave.sh[13336]: PC: @ 0x7f295218a107 (unknown) slave.sh[13336]: *** SIGABRT (@0x3419) received by PID 13337 (TID 0x7f2948992700) from PID 13337; stack trace: *** slave.sh[13336]: @ 0x7f2952a2e8d0 (unknown) slave.sh[13336]: @ 0x7f295218a107 (unknown) slave.sh[13336]: @ 0x7f295218b4e8 (unknown) slave.sh[13336]: @ 0x43dc59 _Abort() slave.sh[13336]: @ 0x43dc87 _Abort() slave.sh[13336]: @ 0x7f2955e31c86 Result<>::get() slave.sh[13336]: @ 0x7f295637f017 
mesos::internal::slave::DockerContainerizerProcess::cgroupsStatistics() slave.sh[13336]: @ 0x7f295637dfea _ZZN5mesos8internal5slave26DockerContainerizerProcess5usageERKNS_11ContainerIDEENKUliE_clEi slave.sh[13336]: @ 0x7f295637e549 _ZZN5mesos8internal5slave26DockerContainerizerProcess5usageERKNS_11ContainerIDEENKUlRKN6Docker9ContainerEE0_clES9_ slave.sh[13336]: @ 0x7f295638453b ZN5mesos8internal5slave26DockerContainerizerProcess5usageERKNS1_11ContainerIDEEUlRKN6Docker9ContainerEE0_EcvSt8functionIFT_T0_EEINS_6FutureINS1_18ResourceStatisticsEEESB_EEvENKUlSB_E_clESB_ENKUlvE_clEv slave.sh[13336]: @ 0x7f295638751d FN7process6FutureIN5mesos18ResourceStatisticsEEEvEZZNKS0_9_DeferredIZNS2_8internal5slave26DockerContainerizerProcess5usageERKNS2_11ContainerIDEEUlRKN6Docker9ContainerEE0_EcvSt8functionIFT_T0_EEIS4_SG_EEvENKUlSG_E_clESG_EUlvE_E9_M_invoke slave.sh[13336]: @ 0x7f29563b53e7 std::function<>::operator()() slave.sh[13336]: @ 0x7f29563aa5dc _ZZN7process8dispatchIN5mesos18ResourceStatisticsEEENS_6FutureIT_EERKNS_4UPIDERKSt8functionIFS5_vEEENKUlPNS_11ProcessBaseEE_clESF_ slave.sh[13336]: @ 0x7f29563bd667 _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos18ResourceStatisticsEEENS0_6FutureIT_EERKNS0_4UPIDERKSt8functionIFS9_vEEEUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ slave.sh[13336]: @ 0x7f2956b893c3 std::function<>::operator()() slave.sh[13336]: @ 0x7f2956b72ab0 process::ProcessBase::visit() slave.sh[13336]: @ 0x7f2956b7588e process::DispatchEvent::visit() slave.sh[13336]: @ 0x7f2955d7f972 process::ProcessBase::serve() slave.sh[13336]: @ 0x7f2956b6ef8e process::ProcessManager::resume() slave.sh[13336]: @ 0x7f2956b63555 process::internal::schedule() slave.sh[13336]: @ 0x7f2956bc0839 _ZNSt12_Bind_simpleIFPFvvEvEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE slave.sh[13336]: @ 0x7f2956bc0781 std::_Bind_simple<>::operator()() slave.sh[13336]: @ 0x7f2956bc06fe std::thread::_Impl<>::_M_run() slave.sh[13336]: @ 0x7f29527ca970 (unknown) slave.sh[13336]: @ 
0x7f2952a270a4 start_thread slave.sh[13336]: @ 0x7f295223b04d (unknown) ``` -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-4025: -- Target Version/s: 0.27.0 (was: 0.26.0) > SlaveRecoveryTest/0.GCExecutor is flaky. > > > Key: MESOS-4025 > URL: https://issues.apache.org/jira/browse/MESOS-4025 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Till Toenshoff > Labels: test > > Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based > on 0.26.0-rc1. > Testsuite was run as root. > {noformat} > sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1 > {noformat} > {noformat} > [ RUN ] SlaveRecoveryTest/0.GCExecutor > I1130 16:49:16.336833 1032 exec.cpp:136] Version: 0.26.0 > I1130 16:49:16.345212 1049 exec.cpp:210] Executor registered on slave > dde9fd4e-b016-4a99-9081-b047e9df9afa-S0 > Registered executor on ubuntu14 > Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114 > sh -c 'sleep 1000' > Forked command at 1057 > ../../src/tests/mesos.cpp:779: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave': > Device or resource busy > *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are > using GNU date *** > PC: @ 0x1443e9a testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; > stack trace: *** > @ 0x7f1be92b80b7 os::Linux::chained_handler() > @ 0x7f1be92bc219 JVM_handle_linux_signal > @ 0x7f1bf7bbc340 (unknown) > @ 0x1443e9a testing::UnitTest::AddTestPartResult() > @ 0x1438b99 testing::internal::AssertHelper::operator=() > @ 0xf0b3bb > mesos::internal::tests::ContainerizerTest<>::TearDown() > @ 0x1461882 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145c6f8 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x143de4a testing::Test::Run() > @ 0x143e584 
testing::TestInfo::Run() > @ 0x143ebca testing::TestCase::Run() > @ 0x1445312 testing::internal::UnitTestImpl::RunAllTests() > @ 0x14624a7 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145d26e > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x14440ae testing::UnitTest::Run() > @ 0xd15cd4 RUN_ALL_TESTS() > @ 0xd158c1 main > @ 0x7f1bf7808ec5 (unknown) > @ 0x913009 (unknown) > {noformat} > My Vagrantfile generator; > {noformat} > #!/usr/bin/env bash > cat << EOF > Vagrantfile > # -*- mode: ruby -*-" > > # vi: set ft=ruby : > Vagrant.configure(2) do |config| > # Disable shared folder to prevent certain kernel module dependencies. > config.vm.synced_folder ".", "/vagrant", disabled: true > config.vm.box = "bento/ubuntu-14.04" > config.vm.hostname = "${PLATFORM_NAME}" > config.vm.provider "virtualbox" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > vb.customize ["modifyvm", :id, "--nictype1", "virtio"] > vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"] > vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"] > end > config.vm.provider "vmware_fusion" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > end > config.vm.provision "file", source: "../test.sh", destination: "~/test.sh" > config.vm.provision "shell", inline: <<-SHELL > sudo apt-get update > sudo apt-get -y install openjdk-7-jdk autoconf libtool > sudo apt-get -y install build-essential python-dev python-boto \ > libcurl4-nss-dev libsasl2-dev maven \ > libapr1-dev libsvn-dev libssl-dev libevent-dev > sudo apt-get -y install git > sudo wget -qO- https://get.docker.com/ | sh > SHELL > end > EOF > {noformat} > The problem is kicking in frequently in my tests - I'ld say > 10% but less > than 50%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033828#comment-15033828 ] Till Toenshoff commented on MESOS-4025: --- Thanks for your analysis. We will declare it as a non-blocker as it is a test-only issue according to your research. Thanks again!! > SlaveRecoveryTest/0.GCExecutor is flaky. > > > Key: MESOS-4025 > URL: https://issues.apache.org/jira/browse/MESOS-4025 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Till Toenshoff > Labels: test > > Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based > on 0.26.0-rc1. > Testsuite was run as root. > {noformat} > sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1 > {noformat} > {noformat} > [ RUN ] SlaveRecoveryTest/0.GCExecutor > I1130 16:49:16.336833 1032 exec.cpp:136] Version: 0.26.0 > I1130 16:49:16.345212 1049 exec.cpp:210] Executor registered on slave > dde9fd4e-b016-4a99-9081-b047e9df9afa-S0 > Registered executor on ubuntu14 > Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114 > sh -c 'sleep 1000' > Forked command at 1057 > ../../src/tests/mesos.cpp:779: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave': > Device or resource busy > *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are > using GNU date *** > PC: @ 0x1443e9a testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; > stack trace: *** > @ 0x7f1be92b80b7 os::Linux::chained_handler() > @ 0x7f1be92bc219 JVM_handle_linux_signal > @ 0x7f1bf7bbc340 (unknown) > @ 0x1443e9a testing::UnitTest::AddTestPartResult() > @ 0x1438b99 testing::internal::AssertHelper::operator=() > @ 0xf0b3bb > mesos::internal::tests::ContainerizerTest<>::TearDown() > @ 0x1461882 > 
testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145c6f8 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x143de4a testing::Test::Run() > @ 0x143e584 testing::TestInfo::Run() > @ 0x143ebca testing::TestCase::Run() > @ 0x1445312 testing::internal::UnitTestImpl::RunAllTests() > @ 0x14624a7 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145d26e > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x14440ae testing::UnitTest::Run() > @ 0xd15cd4 RUN_ALL_TESTS() > @ 0xd158c1 main > @ 0x7f1bf7808ec5 (unknown) > @ 0x913009 (unknown) > {noformat} > My Vagrantfile generator: > {noformat} > #!/usr/bin/env bash > cat << EOF > Vagrantfile > # -*- mode: ruby -*-" > > # vi: set ft=ruby : > Vagrant.configure(2) do |config| > # Disable shared folder to prevent certain kernel module dependencies. > config.vm.synced_folder ".", "/vagrant", disabled: true > config.vm.box = "bento/ubuntu-14.04" > config.vm.hostname = "${PLATFORM_NAME}" > config.vm.provider "virtualbox" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > vb.customize ["modifyvm", :id, "--nictype1", "virtio"] > vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"] > vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"] > end > config.vm.provider "vmware_fusion" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > end > config.vm.provision "file", source: "../test.sh", destination: "~/test.sh" > config.vm.provision "shell", inline: <<-SHELL > sudo apt-get update > sudo apt-get -y install openjdk-7-jdk autoconf libtool > sudo apt-get -y install build-essential python-dev python-boto \ > libcurl4-nss-dev libsasl2-dev maven \ > libapr1-dev libsvn-dev libssl-dev libevent-dev > sudo apt-get -y install git > sudo wget -qO- https://get.docker.com/ | sh > SHELL > end > EOF > {noformat} > The problem is kicking in frequently in my tests - I'd say > 10% but less > than 50%. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4029) ContentType/SchedulerTest is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033781#comment-15033781 ] Alexander Rojas commented on MESOS-4029: After applying the patch I still got the following crashes: {noformat} [ OK ] ContentType/SchedulerTest.Subscribe/0 (66 ms) [ RUN ] ContentType/SchedulerTest.Subscribe/1 @ 0x7fb100193686 google::LogMessage::Fail() @ 0x7fb100198dac google::RawLog__() @ 0x7fb0ff3d9c14 __cxa_pure_virtual @ 0x14e3c38 testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() @ 0xe20259 _ZN7testing8internal18FunctionMockerBaseIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS6_SaIS6_EEE EE10InvokeWithERKSt5tupleIJSC_EE @ 0xe1c9a8 testing::internal::FunctionMocker<>::Invoke() @ 0x118d6b9 mesos::internal::tests::SchedulerTest::Callbacks::received() @ 0x119101f _ZNKSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS0_2v19scheduler5EventES t5dequeIS8_SaIS8_EclIJSE_EvEEvRS4_DpOT_ @ 0x1190bcd _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19schedule r5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi16__callIvJSF_EJLm0ELm1T_OSt5tupleIJDpT0_EES t12_Index_tupleIJXspT1_EEE @ 0x11904d9 _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19schedule r5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi1clIJSF_EvEET0_DpOT_ @ 0x118f9d5 std::_Function_handler<>::_M_invoke() @ 0x7fb0ff69b103 std::function<>::operator()() @ 0x7fb0ff695fe8 process::AsyncExecutorProcess::execute<>() @ 0x7fb0ff69b224 _ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19schedule r5EventESt5dequeIS8_SaIS8_ESC_PvSG_SC_SJ_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSO_FSL_T1_T2_T3_ET4_T5_T6_ENKUlPNS_11ProcessBaseEE _clES11_ @ 0x7fb0ff6a3c9e 
_ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingNS0_20AsyncExecutorProcessE RKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISC_SaISC_ESG_PvSK_SG_SN_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSS _FSP_T1_T2_T3_ET4_T5_T6_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ @ 0x7fb1001015e1 std::function<>::operator()() @ 0x7fb1000e9927 process::ProcessBase::visit() @ 0x7fb1000ed516 process::DispatchEvent::visit() @ 0x9e844a process::ProcessBase::serve() @ 0x7fb1000e5bf0 process::ProcessManager::resume() @ 0x7fb1000e2ca6 _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_ @ 0x7fb1000eccd8 _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_ EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE @ 0x7fb1000ecc88 _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_ EEEclIIEvEET0_DpOT_ @ 0x7fb1000ecc1a _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17ref erence_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE @ 0x7fb1000ecb71 _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17ref erence_wrapperIS4_EEEvEEclEv @ 0x7fb1000ecb0a _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atom ic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv @ 0x7fb0fb4a9a40 (unknown) @ 0x7fb0facc6182 start_thread @ 0x7fb0fa9f347d (unknown) Aborted (core dumped) {noformat} And {noformat} [ RUN ] ContentType/SchedulerTest.Subscribe/1 I1201 15:20:59.848814 32637 leveldb.cpp:174] Opened db in 5.713001ms I1201 15:20:59.850643 32637 leveldb.cpp:181] Compacted db in 1.722714ms I1201 15:20:59.851052 32637 leveldb.cpp:196] Created db iterator in 120371ns I1201 15:20:59.851768 32637 leveldb.cpp:202] Seeked to beginning of db in 3411ns I1201 15:20:59.851850 32637 leveldb.cpp:271] Iterated through 0 keys in 
the db in 15133ns I1201 15:20:59.852177 32637 replica.cpp:778] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I1201 15:20:59.853752 32657 recover.cpp:447] Starting replica recovery I1201 15:20:59.854022 32657 recover.cpp:473] Replica is in EMPTY status I1201 15:20:59.855265 32652 replica.cpp:674] Replica in EMPTY status received a broadcasted recover request from (6918)@127.0.1.1:46010 I1201 15:20:59.855675 32652 recover.cpp:193] Received a recover response from a replica in EMPTY status I1201 15:20:59.855649 32656 master.cpp:365] Master b893dcee-362e-4fcf-81ac-d190058b8682 (ubuntu-vm) started on 127.0.1.1:46010 I1201 15:20:59.856055
[jira] [Comment Edited] (MESOS-3718) Implement Quota support in allocator
[ https://issues.apache.org/jira/browse/MESOS-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971338#comment-14971338 ] Alexander Rukletsov edited comment on MESOS-3718 at 12/1/15 2:18 PM: - https://reviews.apache.org/r/39399/ https://reviews.apache.org/r/39400/ https://reviews.apache.org/r/40551/ https://reviews.apache.org/r/40795/ https://reviews.apache.org/r/40821/ was (Author: alexr): https://reviews.apache.org/r/39399/ https://reviews.apache.org/r/39400/ https://reviews.apache.org/r/40551/ https://reviews.apache.org/r/40795/ > Implement Quota support in allocator > > > Key: MESOS-3718 > URL: https://issues.apache.org/jira/browse/MESOS-3718 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov > Labels: mesosphere > > The built-in Hierarchical DRF allocator should support Quota. This includes > (but is not limited to): adding, updating, removing and satisfying quota; > avoiding both overcommitting resources and handing them to non-quota'ed roles > in the presence of master failover. > A [design doc for Quota support in > Allocator|https://issues.apache.org/jira/browse/MESOS-2937] provides an > overview of the feature set required to be implemented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3273) EventCall Test Framework is flaky
[ https://issues.apache.org/jira/browse/MESOS-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Bannier updated MESOS-3273: Attachment: asan.log Clang's address sanitizer reports use-after-free errors for this test which appear to come from the libevent bindings; I have attached a log. It might be a good idea to address that issue first. > EventCall Test Framework is flaky > - > > Key: MESOS-3273 > URL: https://issues.apache.org/jira/browse/MESOS-3273 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Affects Versions: 0.24.0 > Environment: > https://builds.apache.org/job/Mesos/705/COMPILER=clang,CONFIGURATION=--verbose,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/consoleFull >Reporter: Vinod Kone > Labels: flaky-test, tech-debt, twitter > Attachments: asan.log > > > Observed this on ASF CI. h/t [~haosd...@gmail.com] > Looks like the HTTP scheduler never sent a SUBSCRIBE request to the master. > {code} > [ RUN ] ExamplesTest.EventCallFramework > Using temporary directory '/tmp/ExamplesTest_EventCallFramework_k4vXkx' > I0813 19:55:15.643579 26085 exec.cpp:443] Ignoring exited event because the > driver is aborted! 
> Shutting down > Sending SIGTERM to process tree at pid 26061 > Killing the following process trees: > [ > ] > Shutting down > Sending SIGTERM to process tree at pid 26062 > Shutting down > Killing the following process trees: > [ > ] > Sending SIGTERM to process tree at pid 26063 > Killing the following process trees: > [ > ] > Shutting down > Sending SIGTERM to process tree at pid 26098 > Killing the following process trees: > [ > ] > Shutting down > Sending SIGTERM to process tree at pid 26099 > Killing the following process trees: > [ > ] > WARNING: Logging before InitGoogleLogging() is written to STDERR > I0813 19:55:17.161726 26100 process.cpp:1012] libprocess is initialized on > 172.17.2.10:60249 for 16 cpus > I0813 19:55:17.161888 26100 logging.cpp:177] Logging to STDERR > I0813 19:55:17.163625 26100 scheduler.cpp:157] Version: 0.24.0 > I0813 19:55:17.175302 26100 leveldb.cpp:176] Opened db in 3.167446ms > I0813 19:55:17.176393 26100 leveldb.cpp:183] Compacted db in 1.047996ms > I0813 19:55:17.176496 26100 leveldb.cpp:198] Created db iterator in 77155ns > I0813 19:55:17.176518 26100 leveldb.cpp:204] Seeked to beginning of db in > 8429ns > I0813 19:55:17.176527 26100 leveldb.cpp:273] Iterated through 0 keys in the > db in 4219ns > I0813 19:55:17.176708 26100 replica.cpp:744] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0813 19:55:17.178951 26136 recover.cpp:449] Starting replica recovery > I0813 19:55:17.179934 26136 recover.cpp:475] Replica is in EMPTY status > I0813 19:55:17.181970 26126 master.cpp:378] Master > 20150813-195517-167907756-60249-26100 (297daca2d01a) started on > 172.17.2.10:60249 > I0813 19:55:17.182317 26126 master.cpp:380] Flags at startup: > --acls="permissive: false > register_frameworks { > principals { > type: SOME > values: "test-principal" > } > roles { > type: SOME > values: "*" > } > } > run_tasks { > principals { > type: SOME > values: "test-principal" > } > users { > type: SOME > values: "mesos" 
> } > } > " --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate="false" --authenticate_slaves="false" > --authenticators="crammd5" > --credentials="/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials" > --framework_sorter="drf" --help="false" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --max_slave_ping_timeouts="5" --quiet="false" > --recovery_slave_removal_limit="100%" --registry="replicated_log" > --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" > --registry_strict="false" --root_submissions="true" > --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" > --user_sorter="drf" --version="false" > --webui_dir="/mesos/mesos-0.24.0/src/webui" --work_dir="/tmp/mesos-II8Gua" > --zk_session_timeout="10secs" > I0813 19:55:17.183475 26126 master.cpp:427] Master allowing unauthenticated > frameworks to register > I0813 19:55:17.183536 26126 master.cpp:432] Master allowing unauthenticated > slaves to register > I0813 19:55:17.183615 26126 credentials.hpp:37] Loading credentials for > authentication from '/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials' > W0813 19:55:17.183859 26126 credentials.hpp:52] Permissions on credentials > file '/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials' are too open. > It is recommended that your credentials file is NOT accessible by others. > I0813 19:55:17.183969 26123 replica.cpp:641] Replica in EMPTY status received > a broadcasted recover request > I0813 19:55:17.184306
[jira] [Updated] (MESOS-4030) DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping is flaky
[ https://issues.apache.org/jira/browse/MESOS-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-4030: -- Assignee: Timothy Chen (was: Benjamin Bannier) > DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping is flaky > --- > > Key: MESOS-4030 > URL: https://issues.apache.org/jira/browse/MESOS-4030 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.26.0 > Environment: [Ubuntu > 14|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh], > 0.26.0 RC (wip) enable-ssl & enable-libevent, root test-run >Reporter: Till Toenshoff >Assignee: Timothy Chen > Labels: flaky, flaky-test > > {noformat} > [ RUN ] DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping > I1201 02:18:00.325283 18931 leveldb.cpp:176] Opened db in 3.877576ms > I1201 02:18:00.326195 18931 leveldb.cpp:183] Compacted db in 831923ns > I1201 02:18:00.326288 18931 leveldb.cpp:198] Created db iterator in 21460ns > I1201 02:18:00.326305 18931 leveldb.cpp:204] Seeked to beginning of db in > 1431ns > I1201 02:18:00.326316 18931 leveldb.cpp:273] Iterated through 0 keys in the > db in 178ns > I1201 02:18:00.326354 18931 replica.cpp:780] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I1201 02:18:00.327128 18952 recover.cpp:449] Starting replica recovery > I1201 02:18:00.327481 18948 recover.cpp:475] Replica is in EMPTY status > I1201 02:18:00.328354 18945 replica.cpp:676] Replica in EMPTY status received > a broadcasted recover request from (88123)@127.0.1.1:45788 > I1201 02:18:00.328660 18950 recover.cpp:195] Received a recover response from > a replica in EMPTY status > I1201 02:18:00.329139 18951 recover.cpp:566] Updating replica status to > STARTING > I1201 02:18:00.330413 18949 master.cpp:367] Master > 9577131b-f0b1-47bd-8f88-f5edbf2f026d (ubuntu14) started on 127.0.1.1:45788 > I1201 02:18:00.330474 18949 master.cpp:369] Flags at startup: --acls="" > --allocation_interval="1secs" 
--allocator="HierarchicalDRF" > --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/dHFLJX/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="25secs" --registry_strict="true" > --root_submissions="true" --slave_ping_timeout="15secs" > --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/dHFLJX/master" > --zk_session_timeout="10secs" > I1201 02:18:00.330662 18949 master.cpp:414] Master only allowing > authenticated frameworks to register > I1201 02:18:00.330670 18949 master.cpp:419] Master only allowing > authenticated slaves to register > I1201 02:18:00.330682 18949 credentials.hpp:37] Loading credentials for > authentication from '/tmp/dHFLJX/credentials' > I1201 02:18:00.330950 18945 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 1.585892ms > I1201 02:18:00.331248 18945 replica.cpp:323] Persisted replica status to > STARTING > I1201 02:18:00.330968 18949 master.cpp:458] Using default 'crammd5' > authenticator > I1201 02:18:00.331681 18949 master.cpp:495] Authorization enabled > I1201 02:18:00.331717 18945 recover.cpp:475] Replica is in STARTING status > I1201 02:18:00.332875 18947 replica.cpp:676] Replica in STARTING status > received a broadcasted recover request from (88124)@127.0.1.1:45788 > I1201 02:18:00.44 18947 recover.cpp:195] Received a recover response from > a replica in STARTING status > I1201 02:18:00.333760 18950 recover.cpp:566] Updating replica status to VOTING > I1201 02:18:00.333875 18945 master.cpp:1606] The newly elected leader is > master@127.0.1.1:45788 with id 
9577131b-f0b1-47bd-8f88-f5edbf2f026d > I1201 02:18:00.334624 18951 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 307292ns > I1201 02:18:00.334652 18951 replica.cpp:323] Persisted replica status to > VOTING > I1201 02:18:00.334656 18945 master.cpp:1619] Elected as the leading master! > I1201 02:18:00.334758 18951 recover.cpp:580] Successfully joined the Paxos > group > I1201 02:18:00.334933 18945 master.cpp:1379] Recovering from registrar > I1201 02:18:00.335108 18951 recover.cpp:464] Recover process terminated > I1201 02:18:00.335183 18951 registrar.cpp:309] Recovering registrar > I1201 02:18:00.335577 18950 log.cpp:661] Attempting to start the writer > I1201 02:18:00.336777 18952 replica.cpp:496] Replica received implicit > promise
[jira] [Updated] (MESOS-4030) DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping is flaky
[ https://issues.apache.org/jira/browse/MESOS-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-4030: -- Target Version/s: 0.27.0 (was: 0.26.0) > DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping is flaky > --- > > Key: MESOS-4030 > URL: https://issues.apache.org/jira/browse/MESOS-4030 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.26.0 > Environment: [Ubuntu > 14|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh], > 0.26.0 RC (wip) enable-ssl & enable-libevent, root test-run >Reporter: Till Toenshoff >Assignee: Timothy Chen > Labels: flaky, flaky-test > > {noformat} > [ RUN ] DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping > I1201 02:18:00.325283 18931 leveldb.cpp:176] Opened db in 3.877576ms > I1201 02:18:00.326195 18931 leveldb.cpp:183] Compacted db in 831923ns > I1201 02:18:00.326288 18931 leveldb.cpp:198] Created db iterator in 21460ns > I1201 02:18:00.326305 18931 leveldb.cpp:204] Seeked to beginning of db in > 1431ns > I1201 02:18:00.326316 18931 leveldb.cpp:273] Iterated through 0 keys in the > db in 178ns > I1201 02:18:00.326354 18931 replica.cpp:780] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I1201 02:18:00.327128 18952 recover.cpp:449] Starting replica recovery > I1201 02:18:00.327481 18948 recover.cpp:475] Replica is in EMPTY status > I1201 02:18:00.328354 18945 replica.cpp:676] Replica in EMPTY status received > a broadcasted recover request from (88123)@127.0.1.1:45788 > I1201 02:18:00.328660 18950 recover.cpp:195] Received a recover response from > a replica in EMPTY status > I1201 02:18:00.329139 18951 recover.cpp:566] Updating replica status to > STARTING > I1201 02:18:00.330413 18949 master.cpp:367] Master > 9577131b-f0b1-47bd-8f88-f5edbf2f026d (ubuntu14) started on 127.0.1.1:45788 > I1201 02:18:00.330474 18949 master.cpp:369] Flags at startup: --acls="" > --allocation_interval="1secs" 
--allocator="HierarchicalDRF" > --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/dHFLJX/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="25secs" --registry_strict="true" > --root_submissions="true" --slave_ping_timeout="15secs" > --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/dHFLJX/master" > --zk_session_timeout="10secs" > I1201 02:18:00.330662 18949 master.cpp:414] Master only allowing > authenticated frameworks to register > I1201 02:18:00.330670 18949 master.cpp:419] Master only allowing > authenticated slaves to register > I1201 02:18:00.330682 18949 credentials.hpp:37] Loading credentials for > authentication from '/tmp/dHFLJX/credentials' > I1201 02:18:00.330950 18945 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 1.585892ms > I1201 02:18:00.331248 18945 replica.cpp:323] Persisted replica status to > STARTING > I1201 02:18:00.330968 18949 master.cpp:458] Using default 'crammd5' > authenticator > I1201 02:18:00.331681 18949 master.cpp:495] Authorization enabled > I1201 02:18:00.331717 18945 recover.cpp:475] Replica is in STARTING status > I1201 02:18:00.332875 18947 replica.cpp:676] Replica in STARTING status > received a broadcasted recover request from (88124)@127.0.1.1:45788 > I1201 02:18:00.44 18947 recover.cpp:195] Received a recover response from > a replica in STARTING status > I1201 02:18:00.333760 18950 recover.cpp:566] Updating replica status to VOTING > I1201 02:18:00.333875 18945 master.cpp:1606] The newly elected leader is > master@127.0.1.1:45788 with id 
9577131b-f0b1-47bd-8f88-f5edbf2f026d > I1201 02:18:00.334624 18951 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 307292ns > I1201 02:18:00.334652 18951 replica.cpp:323] Persisted replica status to > VOTING > I1201 02:18:00.334656 18945 master.cpp:1619] Elected as the leading master! > I1201 02:18:00.334758 18951 recover.cpp:580] Successfully joined the Paxos > group > I1201 02:18:00.334933 18945 master.cpp:1379] Recovering from registrar > I1201 02:18:00.335108 18951 recover.cpp:464] Recover process terminated > I1201 02:18:00.335183 18951 registrar.cpp:309] Recovering registrar > I1201 02:18:00.335577 18950 log.cpp:661] Attempting to start the writer > I1201 02:18:00.336777 18952 replica.cpp:496] Replica received implicit > promise request
[jira] [Commented] (MESOS-4030) DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping is flaky
[ https://issues.apache.org/jira/browse/MESOS-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033658#comment-15033658 ] Benjamin Bannier commented on MESOS-4030: - This appears to be a race in the test code: we cannot parse the containerizer's stdout or continue with the cleanup before the containerizer has finished running. We could, e.g., capture calls to {{Docker::_run}} to get notified once we are ready to proceed. > DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping is flaky > --- > > Key: MESOS-4030 > URL: https://issues.apache.org/jira/browse/MESOS-4030 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.26.0 > Environment: [Ubuntu > 14|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh], > 0.26.0 RC (wip) enable-ssl & enable-libevent, root test-run >Reporter: Till Toenshoff >Assignee: Benjamin Bannier > Labels: flaky, flaky-test > > {noformat} > [ RUN ] DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping > I1201 02:18:00.325283 18931 leveldb.cpp:176] Opened db in 3.877576ms > I1201 02:18:00.326195 18931 leveldb.cpp:183] Compacted db in 831923ns > I1201 02:18:00.326288 18931 leveldb.cpp:198] Created db iterator in 21460ns > I1201 02:18:00.326305 18931 leveldb.cpp:204] Seeked to beginning of db in > 1431ns > I1201 02:18:00.326316 18931 leveldb.cpp:273] Iterated through 0 keys in the > db in 178ns > I1201 02:18:00.326354 18931 replica.cpp:780] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I1201 02:18:00.327128 18952 recover.cpp:449] Starting replica recovery > I1201 02:18:00.327481 18948 recover.cpp:475] Replica is in EMPTY status > I1201 02:18:00.328354 18945 replica.cpp:676] Replica in EMPTY status received > a broadcasted recover request from (88123)@127.0.1.1:45788 > I1201 02:18:00.328660 18950 recover.cpp:195] Received a recover response from > a replica in EMPTY status > I1201 02:18:00.329139 18951 recover.cpp:566] Updating 
replica status to > STARTING > I1201 02:18:00.330413 18949 master.cpp:367] Master > 9577131b-f0b1-47bd-8f88-f5edbf2f026d (ubuntu14) started on 127.0.1.1:45788 > I1201 02:18:00.330474 18949 master.cpp:369] Flags at startup: --acls="" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/dHFLJX/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="25secs" --registry_strict="true" > --root_submissions="true" --slave_ping_timeout="15secs" > --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/dHFLJX/master" > --zk_session_timeout="10secs" > I1201 02:18:00.330662 18949 master.cpp:414] Master only allowing > authenticated frameworks to register > I1201 02:18:00.330670 18949 master.cpp:419] Master only allowing > authenticated slaves to register > I1201 02:18:00.330682 18949 credentials.hpp:37] Loading credentials for > authentication from '/tmp/dHFLJX/credentials' > I1201 02:18:00.330950 18945 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 1.585892ms > I1201 02:18:00.331248 18945 replica.cpp:323] Persisted replica status to > STARTING > I1201 02:18:00.330968 18949 master.cpp:458] Using default 'crammd5' > authenticator > I1201 02:18:00.331681 18949 master.cpp:495] Authorization enabled > I1201 02:18:00.331717 18945 recover.cpp:475] Replica is in STARTING status > I1201 02:18:00.332875 18947 replica.cpp:676] Replica in STARTING status > received a broadcasted recover request from (88124)@127.0.1.1:45788 > I1201 02:18:00.44 18947 
recover.cpp:195] Received a recover response from > a replica in STARTING status > I1201 02:18:00.333760 18950 recover.cpp:566] Updating replica status to VOTING > I1201 02:18:00.333875 18945 master.cpp:1606] The newly elected leader is > master@127.0.1.1:45788 with id 9577131b-f0b1-47bd-8f88-f5edbf2f026d > I1201 02:18:00.334624 18951 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 307292ns > I1201 02:18:00.334652 18951 replica.cpp:323] Persisted replica status to > VOTING > I1201 02:18:00.334656 18945 master.cpp:1619] Elected as the leading master! > I1201 02:18:00.334758 18951 recover.cpp:580] Successfully joined the Paxos > group > I1201 02:18:00.334933 18945 master.cpp:1379] Recovering from registrar > I1201 02:18:00.33
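The race described in the comment above boils down to "don't parse the containerizer's stdout until the run has finished." A minimal sketch of that synchronization pattern follows; it uses `std::promise`/`std::future` as a stand-in for intercepting {{Docker::_run}} with a mock expectation and a libprocess `Future`, and all names in it are illustrative, not the actual Mesos test code.

```cpp
#include <future>
#include <string>
#include <thread>

// Returns the "containerizer" output only after the asynchronous run has
// completed, mirroring the proposed fix of capturing the call that signals
// completion instead of racing ahead to parse stdout.
std::string runAndWait() {
  std::promise<std::string> finished;
  std::future<std::string> output = finished.get_future();

  // Simulates the containerizer producing its stdout asynchronously.
  std::thread containerizer([&finished]() {
    finished.set_value("mapped port: 31000");
  });

  // Blocking here removes the race: parsing and cleanup only happen
  // once the run is known to be done.
  std::string result = output.get();
  containerizer.join();
  return result;
}
```

In the real test the same effect would come from a gmock expectation on the run call whose satisfaction the test awaits before inspecting output.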
[jira] [Updated] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Schlicht updated MESOS-4025: Assignee: (was: Jan Schlicht) > SlaveRecoveryTest/0.GCExecutor is flaky. > > > Key: MESOS-4025 > URL: https://issues.apache.org/jira/browse/MESOS-4025 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Till Toenshoff > Labels: test > > Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based > on 0.26.0-rc1. > Testsuite was run as root. > {noformat} > sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1 > {noformat} > {noformat} > [ RUN ] SlaveRecoveryTest/0.GCExecutor > I1130 16:49:16.336833 1032 exec.cpp:136] Version: 0.26.0 > I1130 16:49:16.345212 1049 exec.cpp:210] Executor registered on slave > dde9fd4e-b016-4a99-9081-b047e9df9afa-S0 > Registered executor on ubuntu14 > Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114 > sh -c 'sleep 1000' > Forked command at 1057 > ../../src/tests/mesos.cpp:779: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave': > Device or resource busy > *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are > using GNU date *** > PC: @ 0x1443e9a testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; > stack trace: *** > @ 0x7f1be92b80b7 os::Linux::chained_handler() > @ 0x7f1be92bc219 JVM_handle_linux_signal > @ 0x7f1bf7bbc340 (unknown) > @ 0x1443e9a testing::UnitTest::AddTestPartResult() > @ 0x1438b99 testing::internal::AssertHelper::operator=() > @ 0xf0b3bb > mesos::internal::tests::ContainerizerTest<>::TearDown() > @ 0x1461882 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145c6f8 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x143de4a testing::Test::Run() > @ 0x143e584 testing::TestInfo::Run() > @ 
0x143ebca testing::TestCase::Run() > @ 0x1445312 testing::internal::UnitTestImpl::RunAllTests() > @ 0x14624a7 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145d26e > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x14440ae testing::UnitTest::Run() > @ 0xd15cd4 RUN_ALL_TESTS() > @ 0xd158c1 main > @ 0x7f1bf7808ec5 (unknown) > @ 0x913009 (unknown) > {noformat} > My Vagrantfile generator: > {noformat} > #!/usr/bin/env bash > cat << EOF > Vagrantfile > # -*- mode: ruby -*-" > > # vi: set ft=ruby : > Vagrant.configure(2) do |config| > # Disable shared folder to prevent certain kernel module dependencies. > config.vm.synced_folder ".", "/vagrant", disabled: true > config.vm.box = "bento/ubuntu-14.04" > config.vm.hostname = "${PLATFORM_NAME}" > config.vm.provider "virtualbox" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > vb.customize ["modifyvm", :id, "--nictype1", "virtio"] > vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"] > vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"] > end > config.vm.provider "vmware_fusion" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > end > config.vm.provision "file", source: "../test.sh", destination: "~/test.sh" > config.vm.provision "shell", inline: <<-SHELL > sudo apt-get update > sudo apt-get -y install openjdk-7-jdk autoconf libtool > sudo apt-get -y install build-essential python-dev python-boto \ > libcurl4-nss-dev libsasl2-dev maven \ > libapr1-dev libsvn-dev libssl-dev libevent-dev > sudo apt-get -y install git > sudo wget -qO- https://get.docker.com/ | sh > SHELL > end > EOF > {noformat} > The problem is kicking in frequently in my tests - I'd say > 10% but less > than 50%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3581) License headers show up all over doxygen documentation.
[ https://issues.apache.org/jira/browse/MESOS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Bannier updated MESOS-3581: Shepherd: Michael Park (was: Bernd Mathiske) > License headers show up all over doxygen documentation. > --- > > Key: MESOS-3581 > URL: https://issues.apache.org/jira/browse/MESOS-3581 > Project: Mesos > Issue Type: Documentation > Components: documentation >Affects Versions: 0.24.1 >Reporter: Benjamin Bannier >Assignee: Benjamin Bannier >Priority: Minor > Labels: mesosphere > > Currently license headers are commented in something resembling Javadoc style, > {code} > /** > * Licensed ... > {code} > Since we use Javadoc-style comment blocks for doxygen documentation all > license headers appear in the generated documentation, potentially and likely > hiding the actual documentation. > Using {{/*}} to start the comment blocks would be enough to hide them from > doxygen, but would likely also result in a largish (though mostly > uninteresting) patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
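The distinction the ticket relies on is that doxygen treats a comment opening with two asterisks as documentation for the following entity, while a plain C-style comment is ignored. A small illustrative sketch (the function names are made up for the example):

```cpp
/**
 * Parsed by doxygen: a Javadoc-style block (opening with two asterisks)
 * is attached to the entity that follows, which is why license headers
 * written this way leak into the generated documentation.
 */
int documented() { return 1; }

/*
 * Ignored by doxygen: a plain C-style block (opening with a single
 * asterisk) is an ordinary comment, so starting license headers this
 * way would hide them from the generated docs.
 */
int undocumented() { return 2; }
```

This is why the proposed fix is purely mechanical: changing only the opening delimiter of each license header, at the cost of a large but uninteresting patch.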
[jira] [Commented] (MESOS-4020) Introduce filter for non-revocable resources in `Resources`
[ https://issues.apache.org/jira/browse/MESOS-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033562#comment-15033562 ] Michael Park commented on MESOS-4020: - {noformat} commit ffbed5ea3bb059ed6dd8830d8a6acc5195ed3683 Author: Alexander Rukletsov Date: Tue Dec 1 06:50:41 2015 -0500 Updated codebase to use `nonRevocable()` where appropriate. Review: https://reviews.apache.org/r/40756 {noformat} {noformat} commit dba67f5dd3d99f26b3d7331efd96706a0be905dd Author: Alexander Rukletsov Date: Tue Dec 1 06:39:46 2015 -0500 Introduced filter for non-revocable resources. Review: https://reviews.apache.org/r/40755 {noformat} > Introduce filter for non-revocable resources in `Resources` > --- > > Key: MESOS-4020 > URL: https://issues.apache.org/jira/browse/MESOS-4020 > Project: Mesos > Issue Type: Improvement >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov >Priority: Minor > Labels: mesosphere > Fix For: 0.27.0 > > > The {{Resources}} class defines some handy filters, like {{revocable()}}, > {{unreserved()}}, and so on. This ticket proposes to add one more: > {{nonRevocable()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
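For illustration, a minimal sketch of what such a filter amounts to, using a simplified stand-in for the resource type (the real Mesos {{Resources}} class wraps protobufs and differs in detail; this is not the committed implementation):

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Simplified stand-in for mesos::Resource (the real type is a protobuf
// message with a RevocableInfo sub-message rather than a plain flag).
struct Resource {
  bool revocable;
};

// Sketch of a nonRevocable() filter: like revocable() or unreserved(),
// it is just a predicate applied over the resource collection.
std::vector<Resource> nonRevocable(const std::vector<Resource>& resources) {
  std::vector<Resource> result;
  std::copy_if(resources.begin(), resources.end(), std::back_inserter(result),
               [](const Resource& r) { return !r.revocable; });
  return result;
}
```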
[jira] [Updated] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Schlicht updated MESOS-4025: Sprint: Mesosphere Sprint 23 Story Points: 3 Labels: test (was: ) > SlaveRecoveryTest/0.GCExecutor is flaky. > > > Key: MESOS-4025 > URL: https://issues.apache.org/jira/browse/MESOS-4025 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Till Toenshoff >Assignee: Jan Schlicht > Labels: test > > Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based > on 0.26.0-rc1. > Testsuite was run as root. > {noformat} > sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1 > {noformat} > {noformat} > [ RUN ] SlaveRecoveryTest/0.GCExecutor > I1130 16:49:16.336833 1032 exec.cpp:136] Version: 0.26.0 > I1130 16:49:16.345212 1049 exec.cpp:210] Executor registered on slave > dde9fd4e-b016-4a99-9081-b047e9df9afa-S0 > Registered executor on ubuntu14 > Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114 > sh -c 'sleep 1000' > Forked command at 1057 > ../../src/tests/mesos.cpp:779: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave': > Device or resource busy > *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are > using GNU date *** > PC: @ 0x1443e9a testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; > stack trace: *** > @ 0x7f1be92b80b7 os::Linux::chained_handler() > @ 0x7f1be92bc219 JVM_handle_linux_signal > @ 0x7f1bf7bbc340 (unknown) > @ 0x1443e9a testing::UnitTest::AddTestPartResult() > @ 0x1438b99 testing::internal::AssertHelper::operator=() > @ 0xf0b3bb > mesos::internal::tests::ContainerizerTest<>::TearDown() > @ 0x1461882 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145c6f8 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x143de4a 
testing::Test::Run() > @ 0x143e584 testing::TestInfo::Run() > @ 0x143ebca testing::TestCase::Run() > @ 0x1445312 testing::internal::UnitTestImpl::RunAllTests() > @ 0x14624a7 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145d26e > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x14440ae testing::UnitTest::Run() > @ 0xd15cd4 RUN_ALL_TESTS() > @ 0xd158c1 main > @ 0x7f1bf7808ec5 (unknown) > @ 0x913009 (unknown) > {noformat} > My Vagrantfile generator; > {noformat} > #!/usr/bin/env bash > cat << EOF > Vagrantfile > # -*- mode: ruby -*-" > > # vi: set ft=ruby : > Vagrant.configure(2) do |config| > # Disable shared folder to prevent certain kernel module dependencies. > config.vm.synced_folder ".", "/vagrant", disabled: true > config.vm.box = "bento/ubuntu-14.04" > config.vm.hostname = "${PLATFORM_NAME}" > config.vm.provider "virtualbox" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > vb.customize ["modifyvm", :id, "--nictype1", "virtio"] > vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"] > vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"] > end > config.vm.provider "vmware_fusion" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > end > config.vm.provision "file", source: "../test.sh", destination: "~/test.sh" > config.vm.provision "shell", inline: <<-SHELL > sudo apt-get update > sudo apt-get -y install openjdk-7-jdk autoconf libtool > sudo apt-get -y install build-essential python-dev python-boto \ > libcurl4-nss-dev libsasl2-dev maven \ > libapr1-dev libsvn-dev libssl-dev libevent-dev > sudo apt-get -y install git > sudo wget -qO- https://get.docker.com/ | sh > SHELL > end > EOF > {noformat} > The problem is kicking in frequently in my tests - I'ld say > 10% but less > than 50%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Schlicht reassigned MESOS-4025: --- Assignee: Jan Schlicht > SlaveRecoveryTest/0.GCExecutor is flaky. > > > Key: MESOS-4025 > URL: https://issues.apache.org/jira/browse/MESOS-4025 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Till Toenshoff >Assignee: Jan Schlicht > > Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based > on 0.26.0-rc1. > Testsuite was run as root. > {noformat} > sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1 > {noformat} > {noformat} > [ RUN ] SlaveRecoveryTest/0.GCExecutor > I1130 16:49:16.336833 1032 exec.cpp:136] Version: 0.26.0 > I1130 16:49:16.345212 1049 exec.cpp:210] Executor registered on slave > dde9fd4e-b016-4a99-9081-b047e9df9afa-S0 > Registered executor on ubuntu14 > Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114 > sh -c 'sleep 1000' > Forked command at 1057 > ../../src/tests/mesos.cpp:779: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave': > Device or resource busy > *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are > using GNU date *** > PC: @ 0x1443e9a testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; > stack trace: *** > @ 0x7f1be92b80b7 os::Linux::chained_handler() > @ 0x7f1be92bc219 JVM_handle_linux_signal > @ 0x7f1bf7bbc340 (unknown) > @ 0x1443e9a testing::UnitTest::AddTestPartResult() > @ 0x1438b99 testing::internal::AssertHelper::operator=() > @ 0xf0b3bb > mesos::internal::tests::ContainerizerTest<>::TearDown() > @ 0x1461882 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145c6f8 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x143de4a testing::Test::Run() > @ 0x143e584 
testing::TestInfo::Run() > @ 0x143ebca testing::TestCase::Run() > @ 0x1445312 testing::internal::UnitTestImpl::RunAllTests() > @ 0x14624a7 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145d26e > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x14440ae testing::UnitTest::Run() > @ 0xd15cd4 RUN_ALL_TESTS() > @ 0xd158c1 main > @ 0x7f1bf7808ec5 (unknown) > @ 0x913009 (unknown) > {noformat} > My Vagrantfile generator; > {noformat} > #!/usr/bin/env bash > cat << EOF > Vagrantfile > # -*- mode: ruby -*-" > > # vi: set ft=ruby : > Vagrant.configure(2) do |config| > # Disable shared folder to prevent certain kernel module dependencies. > config.vm.synced_folder ".", "/vagrant", disabled: true > config.vm.box = "bento/ubuntu-14.04" > config.vm.hostname = "${PLATFORM_NAME}" > config.vm.provider "virtualbox" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > vb.customize ["modifyvm", :id, "--nictype1", "virtio"] > vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"] > vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"] > end > config.vm.provider "vmware_fusion" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > end > config.vm.provision "file", source: "../test.sh", destination: "~/test.sh" > config.vm.provision "shell", inline: <<-SHELL > sudo apt-get update > sudo apt-get -y install openjdk-7-jdk autoconf libtool > sudo apt-get -y install build-essential python-dev python-boto \ > libcurl4-nss-dev libsasl2-dev maven \ > libapr1-dev libsvn-dev libssl-dev libevent-dev > sudo apt-get -y install git > sudo wget -qO- https://get.docker.com/ | sh > SHELL > end > EOF > {noformat} > The problem is kicking in frequently in my tests - I'ld say > 10% but less > than 50%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
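The "Device or resource busy" failure in the teardown above typically means processes were still attached to the cgroup when the fixture tried to remove it. A common mitigation is to retry the removal until the lingering tasks have exited; a sketch of that retry pattern, with the destroy operation injected as a callable so it can stand in for a call like {{cgroups::destroy}} (hypothetical helper, not the Mesos implementation):

```cpp
#include <functional>

// Retry a destroy operation that can transiently fail with EBUSY while
// tasks are still exiting. The operation is injected so the pattern can
// be demonstrated without touching a real cgroup hierarchy.
bool destroyWithRetries(const std::function<bool()>& tryDestroy,
                        int attempts) {
  for (int i = 0; i < attempts; ++i) {
    if (tryDestroy()) {
      return true;  // Removal succeeded, e.g. once the last task exited.
    }
  }
  return false;  // Still busy after all attempts; surface the failure.
}
```

A real fix would also need to pause between attempts and, if the cgroup stays busy, report which PIDs are still attached.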
[jira] [Assigned] (MESOS-4030) DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping is flaky
[ https://issues.apache.org/jira/browse/MESOS-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Bannier reassigned MESOS-4030: --- Assignee: Benjamin Bannier > DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping is flaky > --- > > Key: MESOS-4030 > URL: https://issues.apache.org/jira/browse/MESOS-4030 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.26.0 > Environment: [Ubuntu > 14|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh], > 0.26.0 RC (wip) enable-ssl & enable-libevent, root test-run >Reporter: Till Toenshoff >Assignee: Benjamin Bannier > Labels: flaky, flaky-test > > {noformat} > [ RUN ] DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping > I1201 02:18:00.325283 18931 leveldb.cpp:176] Opened db in 3.877576ms > I1201 02:18:00.326195 18931 leveldb.cpp:183] Compacted db in 831923ns > I1201 02:18:00.326288 18931 leveldb.cpp:198] Created db iterator in 21460ns > I1201 02:18:00.326305 18931 leveldb.cpp:204] Seeked to beginning of db in > 1431ns > I1201 02:18:00.326316 18931 leveldb.cpp:273] Iterated through 0 keys in the > db in 178ns > I1201 02:18:00.326354 18931 replica.cpp:780] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I1201 02:18:00.327128 18952 recover.cpp:449] Starting replica recovery > I1201 02:18:00.327481 18948 recover.cpp:475] Replica is in EMPTY status > I1201 02:18:00.328354 18945 replica.cpp:676] Replica in EMPTY status received > a broadcasted recover request from (88123)@127.0.1.1:45788 > I1201 02:18:00.328660 18950 recover.cpp:195] Received a recover response from > a replica in EMPTY status > I1201 02:18:00.329139 18951 recover.cpp:566] Updating replica status to > STARTING > I1201 02:18:00.330413 18949 master.cpp:367] Master > 9577131b-f0b1-47bd-8f88-f5edbf2f026d (ubuntu14) started on 127.0.1.1:45788 > I1201 02:18:00.330474 18949 master.cpp:369] Flags at startup: --acls="" > --allocation_interval="1secs" 
--allocator="HierarchicalDRF" > --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/dHFLJX/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="25secs" --registry_strict="true" > --root_submissions="true" --slave_ping_timeout="15secs" > --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/dHFLJX/master" > --zk_session_timeout="10secs" > I1201 02:18:00.330662 18949 master.cpp:414] Master only allowing > authenticated frameworks to register > I1201 02:18:00.330670 18949 master.cpp:419] Master only allowing > authenticated slaves to register > I1201 02:18:00.330682 18949 credentials.hpp:37] Loading credentials for > authentication from '/tmp/dHFLJX/credentials' > I1201 02:18:00.330950 18945 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 1.585892ms > I1201 02:18:00.331248 18945 replica.cpp:323] Persisted replica status to > STARTING > I1201 02:18:00.330968 18949 master.cpp:458] Using default 'crammd5' > authenticator > I1201 02:18:00.331681 18949 master.cpp:495] Authorization enabled > I1201 02:18:00.331717 18945 recover.cpp:475] Replica is in STARTING status > I1201 02:18:00.332875 18947 replica.cpp:676] Replica in STARTING status > received a broadcasted recover request from (88124)@127.0.1.1:45788 > I1201 02:18:00.44 18947 recover.cpp:195] Received a recover response from > a replica in STARTING status > I1201 02:18:00.333760 18950 recover.cpp:566] Updating replica status to VOTING > I1201 02:18:00.333875 18945 master.cpp:1606] The newly elected leader is > master@127.0.1.1:45788 with id 
9577131b-f0b1-47bd-8f88-f5edbf2f026d > I1201 02:18:00.334624 18951 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 307292ns > I1201 02:18:00.334652 18951 replica.cpp:323] Persisted replica status to > VOTING > I1201 02:18:00.334656 18945 master.cpp:1619] Elected as the leading master! > I1201 02:18:00.334758 18951 recover.cpp:580] Successfully joined the Paxos > group > I1201 02:18:00.334933 18945 master.cpp:1379] Recovering from registrar > I1201 02:18:00.335108 18951 recover.cpp:464] Recover process terminated > I1201 02:18:00.335183 18951 registrar.cpp:309] Recovering registrar > I1201 02:18:00.335577 18950 log.cpp:661] Attempting to start the writer > I1201 02:18:00.336777 18952 replica.cpp:496] Replica received implicit > promise reques
[jira] [Commented] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033479#comment-15033479 ] Jan Schlicht commented on MESOS-4025: - {{sudo ./bin/mesos-tests.sh --gtest_repeat=1 --gtest_break_on_failure --gtest_filter="*ROOT_DOCKER_DockerHealthStatusChange:SlaveRecoveryTest*GCExecutor"}} also triggers the failure. Seems that there's some problem during clean-up of the HealthCheckTest fixture. > SlaveRecoveryTest/0.GCExecutor is flaky. > > > Key: MESOS-4025 > URL: https://issues.apache.org/jira/browse/MESOS-4025 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Till Toenshoff > > Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based > on 0.26.0-rc1. > Testsuite was run as root. > {noformat} > sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1 > {noformat} > {noformat} > [ RUN ] SlaveRecoveryTest/0.GCExecutor > I1130 16:49:16.336833 1032 exec.cpp:136] Version: 0.26.0 > I1130 16:49:16.345212 1049 exec.cpp:210] Executor registered on slave > dde9fd4e-b016-4a99-9081-b047e9df9afa-S0 > Registered executor on ubuntu14 > Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114 > sh -c 'sleep 1000' > Forked command at 1057 > ../../src/tests/mesos.cpp:779: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave': > Device or resource busy > *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are > using GNU date *** > PC: @ 0x1443e9a testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; > stack trace: *** > @ 0x7f1be92b80b7 os::Linux::chained_handler() > @ 0x7f1be92bc219 JVM_handle_linux_signal > @ 0x7f1bf7bbc340 (unknown) > @ 0x1443e9a testing::UnitTest::AddTestPartResult() > @ 0x1438b99 testing::internal::AssertHelper::operator=() > @ 0xf0b3bb > 
mesos::internal::tests::ContainerizerTest<>::TearDown() > @ 0x1461882 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145c6f8 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x143de4a testing::Test::Run() > @ 0x143e584 testing::TestInfo::Run() > @ 0x143ebca testing::TestCase::Run() > @ 0x1445312 testing::internal::UnitTestImpl::RunAllTests() > @ 0x14624a7 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145d26e > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x14440ae testing::UnitTest::Run() > @ 0xd15cd4 RUN_ALL_TESTS() > @ 0xd158c1 main > @ 0x7f1bf7808ec5 (unknown) > @ 0x913009 (unknown) > {noformat} > My Vagrantfile generator; > {noformat} > #!/usr/bin/env bash > cat << EOF > Vagrantfile > # -*- mode: ruby -*-" > > # vi: set ft=ruby : > Vagrant.configure(2) do |config| > # Disable shared folder to prevent certain kernel module dependencies. > config.vm.synced_folder ".", "/vagrant", disabled: true > config.vm.box = "bento/ubuntu-14.04" > config.vm.hostname = "${PLATFORM_NAME}" > config.vm.provider "virtualbox" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > vb.customize ["modifyvm", :id, "--nictype1", "virtio"] > vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"] > vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"] > end > config.vm.provider "vmware_fusion" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > end > config.vm.provision "file", source: "../test.sh", destination: "~/test.sh" > config.vm.provision "shell", inline: <<-SHELL > sudo apt-get update > sudo apt-get -y install openjdk-7-jdk autoconf libtool > sudo apt-get -y install build-essential python-dev python-boto \ > libcurl4-nss-dev libsasl2-dev maven \ > libapr1-dev libsvn-dev libssl-dev libevent-dev > sudo apt-get -y install git > sudo wget -qO- https://get.docker.com/ | sh > SHELL > end > EOF > {noformat} > The problem is kicking in frequently in my tests - I'ld say > 10% 
but less > than 50%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2857) FetcherCacheTest.LocalCachedExtract is flaky.
[ https://issues.apache.org/jira/browse/MESOS-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033399#comment-15033399 ] Benjamin Bannier commented on MESOS-2857: - Comparing with the original log in this report, this appears to be a different issue. >From the log it appears as if everything happened as expected, only that the >test ran into our default timeout when waiting for a status update; without >verbose libprocess logs I am tempted to attribute this issue to very high >system load. > FetcherCacheTest.LocalCachedExtract is flaky. > - > > Key: MESOS-2857 > URL: https://issues.apache.org/jira/browse/MESOS-2857 > Project: Mesos > Issue Type: Bug > Components: fetcher, test >Reporter: Benjamin Mahler >Assignee: Benjamin Bannier > Labels: flaky-test, mesosphere > > From jenkins: > {noformat} > [ RUN ] FetcherCacheTest.LocalCachedExtract > Using temporary directory '/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj' > I0610 20:04:48.591573 24561 leveldb.cpp:176] Opened db in 3.512525ms > I0610 20:04:48.592456 24561 leveldb.cpp:183] Compacted db in 828630ns > I0610 20:04:48.592512 24561 leveldb.cpp:198] Created db iterator in 32992ns > I0610 20:04:48.592531 24561 leveldb.cpp:204] Seeked to beginning of db in > 8967ns > I0610 20:04:48.592545 24561 leveldb.cpp:273] Iterated through 0 keys in the > db in 7762ns > I0610 20:04:48.592604 24561 replica.cpp:744] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0610 20:04:48.593438 24587 recover.cpp:449] Starting replica recovery > I0610 20:04:48.593698 24587 recover.cpp:475] Replica is in EMPTY status > I0610 20:04:48.595641 24580 replica.cpp:641] Replica in EMPTY status received > a broadcasted recover request > I0610 20:04:48.596086 24590 recover.cpp:195] Received a recover response from > a replica in EMPTY status > I0610 20:04:48.596607 24590 recover.cpp:566] Updating replica status to > STARTING > I0610 20:04:48.597507 24590 leveldb.cpp:306] 
Persisting metadata (8 bytes) to > leveldb took 717888ns > I0610 20:04:48.597535 24590 replica.cpp:323] Persisted replica status to > STARTING > I0610 20:04:48.597697 24590 recover.cpp:475] Replica is in STARTING status > I0610 20:04:48.599165 24584 replica.cpp:641] Replica in STARTING status > received a broadcasted recover request > I0610 20:04:48.599434 24584 recover.cpp:195] Received a recover response from > a replica in STARTING status > I0610 20:04:48.599915 24590 recover.cpp:566] Updating replica status to VOTING > I0610 20:04:48.600545 24590 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 432335ns > I0610 20:04:48.600574 24590 replica.cpp:323] Persisted replica status to > VOTING > I0610 20:04:48.600659 24590 recover.cpp:580] Successfully joined the Paxos > group > I0610 20:04:48.600797 24590 recover.cpp:464] Recover process terminated > I0610 20:04:48.602905 24594 master.cpp:363] Master > 20150610-200448-3875541420-32907-24561 (dbade881e927) started on > 172.17.0.231:32907 > I0610 20:04:48.602957 24594 master.cpp:365] Flags at startup: --acls="" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" > --credentials="/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/credentials" > --framework_sorter="drf" --help="false" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="25secs" --registry_strict="true" > --root_submissions="true" --slave_reregister_timeout="10mins" > --user_sorter="drf" --version="false" > --webui_dir="/mesos/mesos-0.23.0/_inst/share/mesos/webui" > --work_dir="/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/master" > --zk_session_timeout="10secs" > I0610 20:04:48.603374 24594 master.cpp:410] Master only allowing > authenticated frameworks to 
register > I0610 20:04:48.603392 24594 master.cpp:415] Master only allowing > authenticated slaves to register > I0610 20:04:48.603404 24594 credentials.hpp:37] Loading credentials for > authentication from > '/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/credentials' > I0610 20:04:48.603751 24594 master.cpp:454] Using default 'crammd5' > authenticator > I0610 20:04:48.604928 24594 master.cpp:491] Authorization enabled > I0610 20:04:48.606034 24593 hierarchical.hpp:309] Initialized hierarchical > allocator process > I0610 20:04:48.606106 24593 whitelist_watcher.cpp:79] No whitelist given > I0610 20:04:48.607430 24594 master.cpp:1476] The newly elected leader is > master@172.17.0.231:32907 with id 20150610-200448-3875541420-32907-24561 > I0610 20:04:48.60
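The comment above attributes the failure to the test's fixed status-update timeout expiring under high system load. The general shape of such a wait is a poll against a deadline; a sketch of the pattern (hypothetical helper, not the actual libprocess await machinery):

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Poll a condition until it holds or a deadline passes. Under heavy load
// the condition may simply need longer than the budget allows, which is
// how an otherwise healthy run turns into a "flaky" timeout failure.
bool waitUntil(const std::function<bool()>& condition,
               std::chrono::milliseconds timeout) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (!condition()) {
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;  // Budget exhausted.
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  return true;
}
```

This is why a loaded CI host produces timeouts without any log line indicating an actual error: the awaited event arrives, just too late.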
[jira] [Comment Edited] (MESOS-3548) Investigate federations of Mesos masters
[ https://issues.apache.org/jira/browse/MESOS-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15031948#comment-15031948 ] Elouan Keryell-Even edited comment on MESOS-3548 at 12/1/15 8:14 AM: - My team is also interested in multi-cluster management with Mesos. We have set up a test architecture consisting of 2 separate clusters, with one mesos master managing both of them. The use case we are interested in is to have both clusters collaborating, each one being able to borrow a few slaves from the other when facing a load peak (this is indeed "bursting"). I think this would imply that each cluster is managed by its own Mesos master. One of the solutions we thought about for the resource borrowing was to have the two masters communicate to temporarily lend available resources. Elouan KERYELL-EVEN Software engineer @ Atos Integration Toulouse, France was (Author: winstonsurechill): My team is also interested in multi-cluster management with Mesos. For now we have set up a test architecture consisting of 2 separate clusters, with one mesos master managing both of them. The use case we are interested in is to have multiple clusters collaborating, each one being able to borrow a few slaves from another when facing a load peak (this is indeed "bursting"). I think that would imply that each cluster is managed by one Mesos master, and that the various masters could communicate in some way or another for the resource lending/borrowing. 
Elouan KERYELL-EVEN Software engineer @ Atos Integration Toulouse, France > Investigate federations of Mesos masters > > > Key: MESOS-3548 > URL: https://issues.apache.org/jira/browse/MESOS-3548 > Project: Mesos > Issue Type: Improvement >Reporter: Neil Conway > Labels: federation, mesosphere, multi-dc > > In a large Mesos installation, the operator might want to ensure that even if > the Mesos masters are inaccessible or failed, new tasks can still be > scheduled (across multiple different frameworks). HA masters are only a > partial solution here: the masters might still be inaccessible due to a > correlated failure (e.g., Zookeeper misconfiguration/human error). > To support this, we could support the notion of "hierarchies" or > "federations" of Mesos masters. In a Mesos installation with 10k machines, > the operator might configure 10 Mesos masters (each of which might be HA) to > manage 1k machines each. Then an additional "meta-Master" would manage the > allocation of cluster resources to the 10 masters. Hence, the failure of any > individual master would impact 1k machines at most. The meta-master might not > have a lot of work to do: e.g., it might be limited to occasionally > reallocating cluster resources among the 10 masters, or ensuring that newly > added cluster resources are allocated among the masters as appropriate. > Hence, the failure of the meta-master would not prevent any of the individual > masters from scheduling new tasks. A single framework instance probably > wouldn't be able to use more resources than have been assigned to a single > Master, but that seems like a reasonable restriction. > This feature might also be a good fit for a multi-datacenter deployment of > Mesos: each Mesos master instance would manage a single DC. Naturally, > reducing the traffic between frameworks and the meta-master would be > important for performance reasons in a configuration like this. 
> Operationally, this might be simpler if Mesos processes were self-hosting > ([MESOS-3547]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
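The meta-master described in MESOS-3548 is, at its core, a coarse partitioner: it only decides which master owns which machines, so its failure never blocks the per-master schedulers. A toy sketch of that partitioning idea (all names here are hypothetical illustrations, not Mesos APIs):

```python
# Toy illustration of the MESOS-3548 "meta-master" idea: partition a
# 10k-machine cluster across 10 masters so that losing any one master
# affects at most 1k machines. Hypothetical names, not Mesos code.

def assign(machines, masters):
    """Spread machines across masters as evenly as possible (round-robin)."""
    allocation = {m: [] for m in masters}
    for i, machine in enumerate(machines):
        allocation[masters[i % len(masters)]].append(machine)
    return allocation


def add_machines(allocation, new_machines):
    """Route newly added cluster resources to the least-loaded master --
    the occasional rebalancing work the meta-master would perform."""
    for machine in new_machines:
        least = min(allocation, key=lambda m: len(allocation[m]))
        allocation[least].append(machine)
    return allocation


machines = ["agent-%d" % i for i in range(10000)]
masters = ["master-%d" % i for i in range(10)]
allocation = assign(machines, masters)
# Each of the 10 masters manages exactly 1000 machines.
print(max(len(v) for v in allocation.values()))  # 1000
```

The point of the sketch is that the allocation step is infrequent and stateless enough that the meta-master sits outside the scheduling critical path, matching the ticket's claim that its failure would not prevent individual masters from scheduling new tasks.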