[jira] [Updated] (MESOS-4038) SlaveRecoveryTests, UserCgroupIsolatorTests fail on CentOS 6.6
[ https://issues.apache.org/jira/browse/MESOS-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-4038: - Summary: SlaveRecoveryTests, UserCgroupIsolatorTests fail on CentOS 6.6 (was: SlaveRecoveryTests fail on CentOS 6.6) > SlaveRecoveryTests, UserCgroupIsolatorTests fail on CentOS 6.6 > -- > > Key: MESOS-4038 > URL: https://issues.apache.org/jira/browse/MESOS-4038 > Project: Mesos > Issue Type: Bug > Environment: CentOS 6.6 >Reporter: Greg Mann > Labels: mesosphere, test-failure > > All {{SlaveRecoveryTest.\*}} tests, > {{MesosContainerizerSlaveRecoveryTest.\*}} tests, and > {{UserCgroupIsolatorTest*}} tests fail on CentOS 6.6 with {{TypeParam = > mesos::internal::slave::MesosContainerizer}}. They all fail with the same > error: > {code} > [--] 1 test from SlaveRecoveryTest/0, where TypeParam = > mesos::internal::slave::MesosContainerizer > [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor > ../../src/tests/mesos.cpp:722: Failure > cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in > the file system > - > We cannot run any cgroups tests that require > a hierarchy with subsystem 'perf_event' > because we failed to find an existing hierarchy > or create a new one (tried '/cgroup/perf_event'). > You can either remove all existing > hierarchies, or disable this test case > (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). > - > ../../src/tests/mesos.cpp:776: Failure > cgroups: '/cgroup/perf_event' is not a valid hierarchy > [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = > mesos::internal::slave::MesosContainerizer (8 ms) > [--] 1 test from SlaveRecoveryTest/0 (9 ms total) > [--] Global test environment tear-down > [==] 1 test from 1 test case ran. (15 ms total) > [ PASSED ] 0 tests. 
> [ FAILED ] 1 test, listed below: > [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = > mesos::internal::slave::MesosContainerizer > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
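The failure above means a cgroup hierarchy with the {{perf_event}} subsystem was already mounted at {{/cgroup/perf_event}} before the test harness tried to create its own. One way to see which hierarchies are mounted is to scan {{/proc/mounts}}; the sketch below is purely illustrative (the function name and parsing are my own, not Mesos code):

```python
def find_hierarchies(mounts_text, subsystem):
    """Return mount points of cgroup hierarchies that include `subsystem`.

    `mounts_text` uses the /proc/mounts format:
    <device> <mount point> <fstype> <options> <dump> <pass>
    """
    hierarchies = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # cgroup v1 hierarchies have fstype "cgroup"; the enabled
        # subsystems appear in the comma-separated mount options.
        if len(fields) >= 4 and fields[2] == "cgroup":
            options = fields[3].split(",")
            if subsystem in options:
                hierarchies.append(fields[1])
    return hierarchies
```

On a live system one would call {{find_hierarchies(open("/proc/mounts").read(), "perf_event")}} and then unmount the reported paths (e.g. {{umount /cgroup/perf_event}}) before re-running the tests, as the error message suggests.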
[jira] [Updated] (MESOS-4038) SlaveRecoveryTests fail on CentOS 6.6
[ https://issues.apache.org/jira/browse/MESOS-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-4038: - Description: All {{SlaveRecoveryTest.\*}} tests, {{MesosContainerizerSlaveRecoveryTest.\*}} tests, and {{UserCgroupIsolatorTest*}} tests fail on CentOS 6.6 with {{TypeParam = mesos::internal::slave::MesosContainerizer}}. They all fail with the same error: {code} [--] 1 test from SlaveRecoveryTest/0, where TypeParam = mesos::internal::slave::MesosContainerizer [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor ../../src/tests/mesos.cpp:722: Failure cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in the file system - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). - ../../src/tests/mesos.cpp:776: Failure cgroups: '/cgroup/perf_event' is not a valid hierarchy [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (8 ms) [--] 1 test from SlaveRecoveryTest/0 (9 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (15 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer {code} was: All {{SlaveRecoveryTest.\*}} tests and {{MesosContainerizerSlaveRecoveryTest.\*}} tests fail on CentOS 6.6 with {{TypeParam = mesos::internal::slave::MesosContainerizer}}. 
They all fail with the same error: {code} [--] 1 test from SlaveRecoveryTest/0, where TypeParam = mesos::internal::slave::MesosContainerizer [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor ../../src/tests/mesos.cpp:722: Failure cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in the file system - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). - ../../src/tests/mesos.cpp:776: Failure cgroups: '/cgroup/perf_event' is not a valid hierarchy [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (8 ms) [--] 1 test from SlaveRecoveryTest/0 (9 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (15 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer {code} > SlaveRecoveryTests fail on CentOS 6.6 > - > > Key: MESOS-4038 > URL: https://issues.apache.org/jira/browse/MESOS-4038 > Project: Mesos > Issue Type: Bug > Environment: CentOS 6.6 >Reporter: Greg Mann > Labels: mesosphere, test-failure > > All {{SlaveRecoveryTest.\*}} tests, > {{MesosContainerizerSlaveRecoveryTest.\*}} tests, and > {{UserCgroupIsolatorTest*}} tests fail on CentOS 6.6 with {{TypeParam = > mesos::internal::slave::MesosContainerizer}}. 
They all fail with the same > error: > {code} > [--] 1 test from SlaveRecoveryTest/0, where TypeParam = > mesos::internal::slave::MesosContainerizer > [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor > ../../src/tests/mesos.cpp:722: Failure > cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in > the file system > - > We cannot run any cgroups tests that require > a hierarchy with subsystem 'perf_event' > because we failed to find an existing hierarchy > or create a new one (tried '/cgroup/perf_event'). > You can either remove all existing > hierarchies, or disable this test case > (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). > - > ../../src/tests/mesos.cpp:776: Failure > cgroups: '/cgroup/perf_event' is not a valid hierarchy > [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = > mesos::internal::slave::MesosContainerizer (8 ms) > [--] 1 test from SlaveRecoveryTest/0 (9 ms total) > [--] Global test environment tear-down > [==] 1 test from 1 test case ran. (15 ms total) > [ PASSED ] 0 tests. > [ FAILED ] 1 test, listed below: > [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = > mesos::internal::slave::MesosContainerizer > {code}
[jira] [Created] (MESOS-4039) PerfEventIsolatorTest.ROOT_CGROUPS_Sample fails
Greg Mann created MESOS-4039: Summary: PerfEventIsolatorTest.ROOT_CGROUPS_Sample fails Key: MESOS-4039 URL: https://issues.apache.org/jira/browse/MESOS-4039 Project: Mesos Issue Type: Bug Reporter: Greg Mann PerfEventIsolatorTest.ROOT_CGROUPS_Sample fails on CentOS 6.6: {code} [--] 1 test from PerfEventIsolatorTest [ RUN ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample ../../src/tests/containerizer/isolator_tests.cpp:848: Failure isolator: Perf is not supported [ FAILED ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample (79 ms) [--] 1 test from PerfEventIsolatorTest (79 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (86 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample {code}
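The "Perf is not supported" check involves inspecting the {{perf}} binary, including parsing the output of {{perf version}}; on CentOS 6 that string embeds the kernel release (e.g. {{perf version 2.6.32-573.el6.x86_64}}), which can trip a naive parser. A hedged sketch of such version parsing (the function and the exact strings are illustrative assumptions, not Mesos's implementation):

```python
import re

def parse_perf_version(output):
    """Extract (major, minor, patch) from `perf version` output, e.g.
    'perf version 2.6.32-573.el6.x86_64' -> (2, 6, 32).

    Returns None when no three-part version is present, which a caller
    could treat as "perf is not supported".
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    return tuple(int(g) for g in match.groups())
```

A caller might compare the parsed triple against a minimum supported version and reject the binary otherwise; the actual support criteria used by Mesos are not shown in this report.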
[jira] [Updated] (MESOS-4038) SlaveRecoveryTests fail on CentOS 6.6
[ https://issues.apache.org/jira/browse/MESOS-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-4038: - Description: All {{SlaveRecoveryTest.*}} tests and {{MesosContainerizerSlaveRecoveryTest.*}} tests fail on CentOS 6.6 with {{TypeParam = mesos::internal::slave::MesosContainerizer}}. They all fail with the same error: {code} [--] 1 test from SlaveRecoveryTest/0, where TypeParam = mesos::internal::slave::MesosContainerizer [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor ../../src/tests/mesos.cpp:722: Failure cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in the file system - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). - ../../src/tests/mesos.cpp:776: Failure cgroups: '/cgroup/perf_event' is not a valid hierarchy [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (8 ms) [--] 1 test from SlaveRecoveryTest/0 (9 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (15 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer {code} was: All {{SlaveRecoveryTest.*}} tests fail on CentOS 6.6 with {{TypeParam = mesos::internal::slave::MesosContainerizer}}. 
They all fail with the same error: {code} [--] 1 test from SlaveRecoveryTest/0, where TypeParam = mesos::internal::slave::MesosContainerizer [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor ../../src/tests/mesos.cpp:722: Failure cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in the file system - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). - ../../src/tests/mesos.cpp:776: Failure cgroups: '/cgroup/perf_event' is not a valid hierarchy [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (8 ms) [--] 1 test from SlaveRecoveryTest/0 (9 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (15 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer {code} > SlaveRecoveryTests fail on CentOS 6.6 > - > > Key: MESOS-4038 > URL: https://issues.apache.org/jira/browse/MESOS-4038 > Project: Mesos > Issue Type: Bug > Environment: CentOS 6.6 >Reporter: Greg Mann > Labels: mesosphere, test-failure > > All {{SlaveRecoveryTest.*}} tests and > {{MesosContainerizerSlaveRecoveryTest.*}} tests fail on CentOS 6.6 with > {{TypeParam = mesos::internal::slave::MesosContainerizer}}. 
They all fail > with the same error: > {code} > [--] 1 test from SlaveRecoveryTest/0, where TypeParam = > mesos::internal::slave::MesosContainerizer > [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor > ../../src/tests/mesos.cpp:722: Failure > cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in > the file system > - > We cannot run any cgroups tests that require > a hierarchy with subsystem 'perf_event' > because we failed to find an existing hierarchy > or create a new one (tried '/cgroup/perf_event'). > You can either remove all existing > hierarchies, or disable this test case > (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). > - > ../../src/tests/mesos.cpp:776: Failure > cgroups: '/cgroup/perf_event' is not a valid hierarchy > [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = > mesos::internal::slave::MesosContainerizer (8 ms) > [--] 1 test from SlaveRecoveryTest/0 (9 ms total) > [--] Global test environment tear-down > [==] 1 test from 1 test case ran. (15 ms total) > [ PASSED ] 0 tests. > [ FAILED ] 1 test, listed below: > [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = > mesos::internal::slave::MesosContainerizer > {code}
[jira] [Updated] (MESOS-4038) SlaveRecoveryTests fail on CentOS 6.6
[ https://issues.apache.org/jira/browse/MESOS-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-4038: - Description: All {{SlaveRecoveryTest.\*}} tests and {{MesosContainerizerSlaveRecoveryTest.\*}} tests fail on CentOS 6.6 with {{TypeParam = mesos::internal::slave::MesosContainerizer}}. They all fail with the same error: {code} [--] 1 test from SlaveRecoveryTest/0, where TypeParam = mesos::internal::slave::MesosContainerizer [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor ../../src/tests/mesos.cpp:722: Failure cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in the file system - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). - ../../src/tests/mesos.cpp:776: Failure cgroups: '/cgroup/perf_event' is not a valid hierarchy [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (8 ms) [--] 1 test from SlaveRecoveryTest/0 (9 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (15 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer {code} was: All {{SlaveRecoveryTest.*}} tests and {{MesosContainerizerSlaveRecoveryTest.*}} tests fail on CentOS 6.6 with {{TypeParam = mesos::internal::slave::MesosContainerizer}}. 
They all fail with the same error: {code} [--] 1 test from SlaveRecoveryTest/0, where TypeParam = mesos::internal::slave::MesosContainerizer [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor ../../src/tests/mesos.cpp:722: Failure cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in the file system - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). - ../../src/tests/mesos.cpp:776: Failure cgroups: '/cgroup/perf_event' is not a valid hierarchy [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (8 ms) [--] 1 test from SlaveRecoveryTest/0 (9 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (15 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer {code} > SlaveRecoveryTests fail on CentOS 6.6 > - > > Key: MESOS-4038 > URL: https://issues.apache.org/jira/browse/MESOS-4038 > Project: Mesos > Issue Type: Bug > Environment: CentOS 6.6 >Reporter: Greg Mann > Labels: mesosphere, test-failure > > All {{SlaveRecoveryTest.\*}} tests and > {{MesosContainerizerSlaveRecoveryTest.\*}} tests fail on CentOS 6.6 with > {{TypeParam = mesos::internal::slave::MesosContainerizer}}. 
They all fail > with the same error: > {code} > [--] 1 test from SlaveRecoveryTest/0, where TypeParam = > mesos::internal::slave::MesosContainerizer > [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor > ../../src/tests/mesos.cpp:722: Failure > cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in > the file system > - > We cannot run any cgroups tests that require > a hierarchy with subsystem 'perf_event' > because we failed to find an existing hierarchy > or create a new one (tried '/cgroup/perf_event'). > You can either remove all existing > hierarchies, or disable this test case > (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). > - > ../../src/tests/mesos.cpp:776: Failure > cgroups: '/cgroup/perf_event' is not a valid hierarchy > [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = > mesos::internal::slave::MesosContainerizer (8 ms) > [--] 1 test from SlaveRecoveryTest/0 (9 ms total) > [--] Global test environment tear-down > [==] 1 test from 1 test case ran. (15 ms total) > [ PASSED ] 0 tests. > [ FAILED ] 1 test, listed below: > [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = > mesos::internal::slave::MesosContainerizer > {code}
[jira] [Commented] (MESOS-3831) Document operator HTTP endpoints
[ https://issues.apache.org/jira/browse/MESOS-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035386#comment-15035386 ] Klaus Ma commented on MESOS-3831: - +1 to have a single page to list all HTTP endpoints. > Document operator HTTP endpoints > > > Key: MESOS-3831 > URL: https://issues.apache.org/jira/browse/MESOS-3831 > Project: Mesos > Issue Type: Documentation > Components: documentation >Reporter: Neil Conway >Priority: Minor > Labels: documentation, mesosphere, newbie > > These are not exhaustively documented; they probably should be. > Some endpoints have docs: e.g., {{/reserve}} and {{/unreserve}} are described > in the reservation doc page. But it would be good to have a single page that > lists all the endpoints and their semantics.
[jira] [Updated] (MESOS-4038) SlaveRecoveryTests fail on CentOS 6.6
[ https://issues.apache.org/jira/browse/MESOS-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-4038: - Description: All {{SlaveRecoveryTest.*}} tests fail on CentOS 6.6 with {{TypeParam = mesos::internal::slave::MesosContainerizer}}. They all fail with the same error: {code} [--] 1 test from SlaveRecoveryTest/0, where TypeParam = mesos::internal::slave::MesosContainerizer [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor ../../src/tests/mesos.cpp:722: Failure cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in the file system - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). - ../../src/tests/mesos.cpp:776: Failure cgroups: '/cgroup/perf_event' is not a valid hierarchy [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (8 ms) [--] 1 test from SlaveRecoveryTest/0 (9 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (15 ms total) [ PASSED ] 0 tests. 
[ FAILED ] 1 test, listed below: [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer {code} was: All {{SlaveRecoveryTest.*}} tests fail on CentOS 6.6 with {{TypeParam = mesos::internal::slave::MesosContainerizer}}: {code} [--] 1 test from SlaveRecoveryTest/0, where TypeParam = mesos::internal::slave::MesosContainerizer [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor ../../src/tests/mesos.cpp:722: Failure cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in the file system - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). - ../../src/tests/mesos.cpp:776: Failure cgroups: '/cgroup/perf_event' is not a valid hierarchy [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (8 ms) [--] 1 test from SlaveRecoveryTest/0 (9 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (15 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer {code} > SlaveRecoveryTests fail on CentOS 6.6 > - > > Key: MESOS-4038 > URL: https://issues.apache.org/jira/browse/MESOS-4038 > Project: Mesos > Issue Type: Bug > Environment: CentOS 6.6 >Reporter: Greg Mann > Labels: mesosphere, test-failure > > All {{SlaveRecoveryTest.*}} tests fail on CentOS 6.6 with {{TypeParam = > mesos::internal::slave::MesosContainerizer}}. 
They all fail with the same > error: > {code} > [--] 1 test from SlaveRecoveryTest/0, where TypeParam = > mesos::internal::slave::MesosContainerizer > [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor > ../../src/tests/mesos.cpp:722: Failure > cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in > the file system > - > We cannot run any cgroups tests that require > a hierarchy with subsystem 'perf_event' > because we failed to find an existing hierarchy > or create a new one (tried '/cgroup/perf_event'). > You can either remove all existing > hierarchies, or disable this test case > (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). > - > ../../src/tests/mesos.cpp:776: Failure > cgroups: '/cgroup/perf_event' is not a valid hierarchy > [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = > mesos::internal::slave::MesosContainerizer (8 ms) > [--] 1 test from SlaveRecoveryTest/0 (9 ms total) > [--] Global test environment tear-down > [==] 1 test from 1 test case ran. (15 ms total) > [ PASSED ] 0 tests. > [ FAILED ] 1 test, listed below: > [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = > mesos::internal::slave::MesosContainerizer > {code}
[jira] [Created] (MESOS-4038) SlaveRecoveryTests fail on CentOS 6.6
Greg Mann created MESOS-4038: Summary: SlaveRecoveryTests fail on CentOS 6.6 Key: MESOS-4038 URL: https://issues.apache.org/jira/browse/MESOS-4038 Project: Mesos Issue Type: Bug Environment: CentOS 6.6 Reporter: Greg Mann All {{SlaveRecoveryTest.*}} tests fail on CentOS 6.6 with {{TypeParam = mesos::internal::slave::MesosContainerizer}}: {code} [--] 1 test from SlaveRecoveryTest/0, where TypeParam = mesos::internal::slave::MesosContainerizer [ RUN ] SlaveRecoveryTest/0.ReconnectExecutor ../../src/tests/mesos.cpp:722: Failure cgroups::mount(hierarchy, subsystem): '/cgroup/perf_event' already exists in the file system - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-SlaveRecoveryTest/0.*). - ../../src/tests/mesos.cpp:776: Failure cgroups: '/cgroup/perf_event' is not a valid hierarchy [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (8 ms) [--] 1 test from SlaveRecoveryTest/0 (9 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (15 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer {code}
[jira] [Created] (MESOS-4037) Images are broken at least on http://mesos.apache.org/documentation/latest/architecture/
Kirill Zaborsky created MESOS-4037: -- Summary: Images are broken at least on http://mesos.apache.org/documentation/latest/architecture/ Key: MESOS-4037 URL: https://issues.apache.org/jira/browse/MESOS-4037 Project: Mesos Issue Type: Documentation Components: documentation Reporter: Kirill Zaborsky Priority: Minor http://mesos.apache.org/documentation/latest/architecture/ does not display its images correctly; e.g. http://mesos.apache.org/documentation/latest/architecture/images/architecture3.jpg returns a 404 error, while e.g. https://github.com/apache/mesos/blob/master/docs/architecture.md renders correctly.
[jira] [Updated] (MESOS-4036) perf will not run on CentOS 6.6
[ https://issues.apache.org/jira/browse/MESOS-4036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-4036: - Environment: CentOS 6.6 Description: After using the current installation instructions in the getting started documentation, {{perf}} will not run on CentOS 6.6 because the version of elfutils included in devtoolset-2 is not compatible with the version of {{perf}} installed by {{yum}}. Installing and using devtoolset-3, however (http://linux.web.cern.ch/linux/scientific6/docs/softwarecollections.shtml) fixes this issue. This could be resolved by updating the getting started documentation to recommend installing devtoolset-3. (was: After using the current installation instructions in the getting started documentation, {{perf}} will not run because the version of elfutils included in devtoolset-2 is not compatible with the version of {{perf}} installed by {{yum}}. Installing and using devtoolset-3, however (http://linux.web.cern.ch/linux/scientific6/docs/softwarecollections.shtml) fixes this issue. This could be resolved by updating the getting started documentation to recommend installing devtoolset-3.) > perf will not run on CentOS 6.6 > --- > > Key: MESOS-4036 > URL: https://issues.apache.org/jira/browse/MESOS-4036 > Project: Mesos > Issue Type: Bug > Environment: CentOS 6.6 >Reporter: Greg Mann > Labels: mesosphere > > After using the current installation instructions in the getting started > documentation, {{perf}} will not run on CentOS 6.6 because the version of > elfutils included in devtoolset-2 is not compatible with the version of > {{perf}} installed by {{yum}}. Installing and using devtoolset-3, however > (http://linux.web.cern.ch/linux/scientific6/docs/softwarecollections.shtml) > fixes this issue. This could be resolved by updating the getting started > documentation to recommend installing devtoolset-3.
[jira] [Created] (MESOS-4036) perf will not run on CentOS 6.6
Greg Mann created MESOS-4036: Summary: perf will not run on CentOS 6.6 Key: MESOS-4036 URL: https://issues.apache.org/jira/browse/MESOS-4036 Project: Mesos Issue Type: Bug Reporter: Greg Mann After using the current installation instructions in the getting started documentation, {{perf}} will not run because the version of elfutils included in devtoolset-2 is not compatible with the version of {{perf}} installed by {{yum}}. Installing and using devtoolset-3, however (http://linux.web.cern.ch/linux/scientific6/docs/softwarecollections.shtml) fixes this issue. This could be resolved by updating the getting started documentation to recommend installing devtoolset-3.
[jira] [Commented] (MESOS-4034) URLs with doubled slashes return 404
[ https://issues.apache.org/jira/browse/MESOS-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035106#comment-15035106 ] Klaus Ma commented on MESOS-4034: - Yes, you're right, please ignore my append :). > URLs with doubled slashes return 404 > > > Key: MESOS-4034 > URL: https://issues.apache.org/jira/browse/MESOS-4034 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 >Reporter: James Peach >Priority: Minor > > The Mesos 0.25 HTTP request router no longer coalesces doubled slashes in the > URL path. Previous versions did; we noticed when we upgraded a > cluster and our metrics poller started getting 404s. > {code} > $ curl -v http://localhost:5050//metrics/snapshot > * About to connect() to localhost port 5050 (#0) > * Trying 17.138.64.22... connected > * Connected to localhost (127.0.0.1) port 5050 (#0) > > GET //metrics/snapshot HTTP/1.1 > > User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 > > NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2 > > Host: localhost:5050 > > Accept: */* > > > < HTTP/1.1 404 Not Found > < Date: Wed, 02 Dec 2015 00:50:57 GMT > < Content-Length: 0 > < > * Connection #0 to host localhost left intact > * Closing connection #0 > {code}
[jira] [Created] (MESOS-4035) UserCgroupIsolatorTest.ROOT_CGROUPS_UserCgroup fails on CentOS 6.6
Gilbert Song created MESOS-4035: --- Summary: UserCgroupIsolatorTest.ROOT_CGROUPS_UserCgroup fails on CentOS 6.6 Key: MESOS-4035 URL: https://issues.apache.org/jira/browse/MESOS-4035 Project: Mesos Issue Type: Bug Environment: CentOS 6.6 Reporter: Gilbert Song `ROOT_CGROUPS_UserCgroup` fails on CentOS 6.6 with 0.26-rc3. The environment setup on CentOS 6.6 is based on the latest update of /docs/getting-started.md. Using either devtoolset-2 or devtoolset-3 produces the same failure. Running `sudo ./bin/mesos-tests.sh --gtest_filter="*ROOT_CGROUPS_UserCgroup*"` returns failures like the following log: {noformat} [==] Running 3 tests from 3 test cases. [--] Global test environment set-up. [--] 1 test from UserCgroupIsolatorTest/0, where TypeParam = mesos::internal::slave::CgroupsMemIsolatorProcess userdel: user 'mesos.test.unprivileged.user' does not exist [ RUN ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup ../../src/tests/mesos.cpp:722: Failure cgroups::mount(hierarchy, subsystem): '/tmp/mesos_test_cgroup/perf_event' already exists in the file system - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/tmp/mesos_test_cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-UserCgroupIsolatorTest/0.*). 
- ../../src/tests/mesos.cpp:776: Failure cgroups: '/tmp/mesos_test_cgroup/perf_event' is not a valid hierarchy [ FAILED ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsMemIsolatorProcess (1 ms) [--] 1 test from UserCgroupIsolatorTest/0 (1 ms total) [--] 1 test from UserCgroupIsolatorTest/1, where TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess userdel: user 'mesos.test.unprivileged.user' does not exist [ RUN ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup ../../src/tests/mesos.cpp:722: Failure cgroups::mount(hierarchy, subsystem): '/tmp/mesos_test_cgroup/perf_event' already exists in the file system - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/tmp/mesos_test_cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-UserCgroupIsolatorTest/1.*). - ../../src/tests/mesos.cpp:776: Failure cgroups: '/tmp/mesos_test_cgroup/perf_event' is not a valid hierarchy [ FAILED ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess (4 ms) [--] 1 test from UserCgroupIsolatorTest/1 (5 ms total) [--] 1 test from UserCgroupIsolatorTest/2, where TypeParam = mesos::internal::slave::CgroupsPerfEventIsolatorProcess userdel: user 'mesos.test.unprivileged.user' does not exist [ RUN ] UserCgroupIsolatorTest/2.ROOT_CGROUPS_UserCgroup ../../src/tests/mesos.cpp:722: Failure cgroups::mount(hierarchy, subsystem): '/tmp/mesos_test_cgroup/perf_event' already exists in the file system - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/tmp/mesos_test_cgroup/perf_event'). 
You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-UserCgroupIsolatorTest/2.*). - ../../src/tests/mesos.cpp:776: Failure cgroups: '/tmp/mesos_test_cgroup/perf_event' is not a valid hierarchy [ FAILED ] UserCgroupIsolatorTest/2.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsPerfEventIsolatorProcess (2 ms) [--] 1 test from UserCgroupIsolatorTest/2 (2 ms total) [--] Global test environment tear-down [==] 3 tests from 3 test cases ran. (349 ms total) [ PASSED ] 0 tests. [ FAILED ] 3 tests, listed below: [ FAILED ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsMemIsolatorProcess [ FAILED ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess [ FAILED ] UserCgroupIsolatorTest/2.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsPerfEventIsolatorProcess 3 FAILED TESTS {noformat} If running it with `sudo ./bin/mesos-tests.sh --gtest_filter="*ROOT_CGROUPS_UserCgroup*" --g
[jira] [Commented] (MESOS-4034) URLs with doubled slashes return 404
[ https://issues.apache.org/jira/browse/MESOS-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035059#comment-15035059 ] James Peach commented on MESOS-4034: Do you mean {{strings::tokenize}}? That's supposed to ignore empty tokens, so it should consider {{//foo/bar}} to be equivalent to {{/foo/bar}}. > URLs with doubled slashes return 404 > > > Key: MESOS-4034 > URL: https://issues.apache.org/jira/browse/MESOS-4034 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 >Reporter: James Peach >Priority: Minor > > The Mesos 0.25 HTTP request router no longer coalesces doubled slashes in the > URL path. Previous versions did so; we noticed when we upgraded a > cluster and our metrics poller started getting 404s. > {code} > $ curl -v http://localhost:5050//metrics/snapshot > * About to connect() to localhost port 5050 (#0) > * Trying 17.138.64.22... connected > * Connected to localhost (127.0.0.1) port 5050 (#0) > > GET //metrics/snapshot HTTP/1.1 > > User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 > > NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2 > > Host: localhost:5050 > > Accept: */* > > > < HTTP/1.1 404 Not Found > < Date: Wed, 02 Dec 2015 00:50:57 GMT > < Content-Length: 0 > < > * Connection #0 to host localhost left intact > * Closing connection #0 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4034) URLs with doubled slashes return 404
[ https://issues.apache.org/jira/browse/MESOS-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035026#comment-15035026 ] Klaus Ma commented on MESOS-4034: - That's because we use {{tokenize("/")}} to get the path info; so if there are two {{/}}s in a row, the index is wrong :). Should we build a dedicated function for URLs, e.g. URL sub-path parsing? > URLs with doubled slashes return 404 > > > Key: MESOS-4034 > URL: https://issues.apache.org/jira/browse/MESOS-4034 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 >Reporter: James Peach >Priority: Minor > > The Mesos 0.25 HTTP request router no longer coalesces doubled slashes in the > URL path. Previous versions did so; we noticed when we upgraded a > cluster and our metrics poller started getting 404s. > {code} > $ curl -v http://localhost:5050//metrics/snapshot > * About to connect() to localhost port 5050 (#0) > * Trying 17.138.64.22... connected > * Connected to localhost (127.0.0.1) port 5050 (#0) > > GET //metrics/snapshot HTTP/1.1 > > User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 > > NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2 > > Host: localhost:5050 > > Accept: */* > > > < HTTP/1.1 404 Not Found > < Date: Wed, 02 Dec 2015 00:50:57 GMT > < Content-Length: 0 > < > * Connection #0 to host localhost left intact > * Closing connection #0 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3586) MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and CGROUPS_ROOT_SlaveRecovery are flaky
[ https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035017#comment-15035017 ] Joseph Wu commented on MESOS-3586: -- Review: https://reviews.apache.org/r/40849/ > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and > CGROUPS_ROOT_SlaveRecovery are flaky > > > Key: MESOS-3586 > URL: https://issues.apache.org/jira/browse/MESOS-3586 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.24.0, 0.26.0 > Environment: Ubuntu 14.04, 3.13.0-32 generic > Debian 8, gcc 4.9.2 >Reporter: Miguel Bernadin >Assignee: Joseph Wu > Labels: flaky, flaky-test > > I am installing Mesos 0.24.0 on 4 servers which have very similar hardware and > software configurations. > After performing ../configure, make, and make check some servers have > completed successfully and others failed on test [ RUN ] > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics. > Is there something I should check in this test? > PERFORMED MAKE CHECK NODE-001 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0 > I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave > 20151005-143735-2393768202-35106-27900-S0 > Registered executor on svdidac038.techlabs.accenture.com > Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0 > Forked command at 38510 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > PERFORMED MAKE CHECK NODE-002 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0 > I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave > 20151005-143857-2360213770-50427-26325-S0 > Registered executor on svdidac039.techlabs.accenture.com > Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > Forked command at 37028 > 
../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure > Expected: (usage.get().mem_medium_pressure_counter()) >= > (usage.get().mem_critical_pressure_counter()), actual: 5 vs 6 > 2015-10-05 > 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server > refused to accept the client > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4034) URLs with doubled slashes return 404
James Peach created MESOS-4034: -- Summary: URLs with doubled slashes return 404 Key: MESOS-4034 URL: https://issues.apache.org/jira/browse/MESOS-4034 Project: Mesos Issue Type: Bug Affects Versions: 0.25.0 Reporter: James Peach Priority: Minor The Mesos 0.25 HTTP request router no longer coalesces doubled slashes in the URL path. Previous versions did so; we noticed when we upgraded a cluster and our metrics poller started getting 404s. {code} $ curl -v http://localhost:5050//metrics/snapshot * About to connect() to localhost port 5050 (#0) * Trying 17.138.64.22... connected * Connected to localhost (127.0.0.1) port 5050 (#0) > GET //metrics/snapshot HTTP/1.1 > User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 > zlib/1.2.3 libidn/1.18 libssh2/1.4.2 > Host: localhost:5050 > Accept: */* > < HTTP/1.1 404 Not Found < Date: Wed, 02 Dec 2015 00:50:57 GMT < Content-Length: 0 < * Connection #0 to host localhost left intact * Closing connection #0 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-3586) MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and CGROUPS_ROOT_SlaveRecovery are flaky
[ https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu reassigned MESOS-3586: Assignee: Joseph Wu > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and > CGROUPS_ROOT_SlaveRecovery are flaky > > > Key: MESOS-3586 > URL: https://issues.apache.org/jira/browse/MESOS-3586 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.24.0, 0.26.0 > Environment: Ubuntu 14.04, 3.13.0-32 generic > Debian 8, gcc 4.9.2 >Reporter: Miguel Bernadin >Assignee: Joseph Wu > Labels: flaky, flaky-test > > I am install Mesos 0.24.0 on 4 servers which have very similar hardware and > software configurations. > After performing ../configure, make, and make check some servers have > completed successfully and other failed on test [ RUN ] > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics. > Is there something I should check in this test? > PERFORMED MAKE CHECK NODE-001 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0 > I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave > 20151005-143735-2393768202-35106-27900-S0 > Registered executor on svdidac038.techlabs.accenture.com > Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0 > Forked command at 38510 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > PERFORMED MAKE CHECK NODE-002 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0 > I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave > 20151005-143857-2360213770-50427-26325-S0 > Registered executor on svdidac039.techlabs.accenture.com > Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > Forked command at 37028 > ../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure > Expected: (usage.get().mem_medium_pressure_counter()) >= 
> (usage.get().mem_critical_pressure_counter()), actual: 5 vs 6 > 2015-10-05 > 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server > refused to accept the client > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4029) ContentType/SchedulerTest is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034974#comment-15034974 ] Anand Mazumdar commented on MESOS-4029: --- The culprit is this: https://github.com/apache/mesos/blob/master/src/scheduler/scheduler.cpp#L260 We pass the {{Callbacks}} mock object by reference and not by value. Since we do an {{async}} , the call is queued on another thread but it does not ensure that it is invoked before the object is destroyed. Hence, we might invoke the {{received}} callback even after the original {{Callbacks}} object is destroyed. > ContentType/SchedulerTest is flaky. > --- > > Key: MESOS-4029 > URL: https://issues.apache.org/jira/browse/MESOS-4029 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Till Toenshoff >Assignee: Anand Mazumdar > Labels: flaky, flaky-test, mesosphere > > SSL build, [Ubuntu > 14.04|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh], > non-root test run. 
> {noformat} > [--] 22 tests from ContentType/SchedulerTest > [ RUN ] ContentType/SchedulerTest.Subscribe/0 > [ OK ] ContentType/SchedulerTest.Subscribe/0 (48 ms) > *** Aborted at 1448928007 (unix time) try "date -d @1448928007" if you are > using GNU date *** > [ RUN ] ContentType/SchedulerTest.Subscribe/1 > PC: @ 0x1451b8e > testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() > *** SIGSEGV (@0x10030) received by PID 21320 (TID 0x2b549e5d4700) from > PID 48; stack trace: *** > @ 0x2b54c95940b7 os::Linux::chained_handler() > @ 0x2b54c9598219 JVM_handle_linux_signal > @ 0x2b5496300340 (unknown) > @ 0x1451b8e > testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() > @ 0xe2ea6d > _ZN7testing8internal18FunctionMockerBaseIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS6_SaIS6_E10InvokeWithERKSt5tupleIJSC_EE > @ 0xe2b1bc testing::internal::FunctionMocker<>::Invoke() > @ 0x1118aed > mesos::internal::tests::SchedulerTest::Callbacks::received() > @ 0x111c453 > _ZNKSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS0_2v19scheduler5EventESt5dequeIS8_SaIS8_EclIJSE_EvEEvRS4_DpOT_ > @ 0x111c001 > _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi16__callIvJSF_EJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE > @ 0x111b90d > _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi1clIJSF_EvEET0_DpOT_ > @ 0x111ae09 std::_Function_handler<>::_M_invoke() > @ 0x2b5493c6da09 std::function<>::operator()() > @ 0x2b5493c688ee process::AsyncExecutorProcess::execute<>() > @ 0x2b5493c6db2a > 
_ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS8_SaIS8_ESC_PvSG_SC_SJ_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSO_FSL_T1_T2_T3_ET4_T5_T6_ENKUlPNS_11ProcessBaseEE_clES11_ > @ 0x2b5493c765a4 > _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingNS0_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISC_SaISC_ESG_PvSK_SG_SN_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSS_FSP_T1_T2_T3_ET4_T5_T6_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ > @ 0x2b54946b1201 std::function<>::operator()() > @ 0x2b549469960f process::ProcessBase::visit() > @ 0x2b549469d480 process::DispatchEvent::visit() > @ 0x9dc0ba process::ProcessBase::serve() > @ 0x2b54946958cc process::ProcessManager::resume() > @ 0x2b5494692a9c > _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_ > @ 0x2b549469ccac > _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE > @ 0x2b549469cc5c > _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_ > @ 0x2b549469cbee > _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE > @ 0x2b549469cb45 > _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv > @ 0x2b549469cade > _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager1
[jira] [Commented] (MESOS-3586) MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and CGROUPS_ROOT_SlaveRecovery are flaky
[ https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034948#comment-15034948 ] Joseph Wu commented on MESOS-3586: -- This race _almost_ seems unavoidable (at least, given the test currently), and I don't think the sleep duration is really a problem. *Background* Both tests are essentially hammering away at memory, resulting in "memory pressure". Depending on the load (low, medium, critical), this triggers some cgroup status events. By definition, the "low" pressure event is always triggered whenever there is any pressure at all: {quote} Application will be notified through eventfd when memory pressure is at the specific level (or higher). {quote} [Reference section "11. Memory Pressure"|https://www.kernel.org/doc/Documentation/cgroups/memory.txt] In the tests, we check this by expecting "number of low pressure events" >= "number of medium pressure events" >= "number of critical pressure events". *Problem* There's no guarantee of the order of notification. When we read from our memory pressure counters, there might be some events in-flight that haven't been processed yet. Therefore, we occasionally see our expectations betrayed. *???* The memory pressure event counts should be eventually consistent with our expectations. So the test should probably: * Stop the memory-hammering task at some point. * Wait for all pressure events to be processed. * Then check the counters. > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and > CGROUPS_ROOT_SlaveRecovery are flaky > > > Key: MESOS-3586 > URL: https://issues.apache.org/jira/browse/MESOS-3586 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.24.0, 0.26.0 > Environment: Ubuntu 14.04, 3.13.0-32 generic > Debian 8, gcc 4.9.2 >Reporter: Miguel Bernadin > Labels: flaky, flaky-test > > I am install Mesos 0.24.0 on 4 servers which have very similar hardware and > software configurations. 
> After performing ../configure, make, and make check some servers have > completed successfully and other failed on test [ RUN ] > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics. > Is there something I should check in this test? > PERFORMED MAKE CHECK NODE-001 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0 > I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave > 20151005-143735-2393768202-35106-27900-S0 > Registered executor on svdidac038.techlabs.accenture.com > Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0 > Forked command at 38510 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > PERFORMED MAKE CHECK NODE-002 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0 > I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave > 20151005-143857-2360213770-50427-26325-S0 > Registered executor on svdidac039.techlabs.accenture.com > Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > Forked command at 37028 > ../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure > Expected: (usage.get().mem_medium_pressure_counter()) >= > (usage.get().mem_critical_pressure_counter()), actual: 5 vs 6 > 2015-10-05 > 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server > refused to accept the client > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3586) MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and CGROUPS_ROOT_SlaveRecovery are flaky
[ https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3586: -- Labels: flaky flaky-test (was: ) > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and > CGROUPS_ROOT_SlaveRecovery are flaky > > > Key: MESOS-3586 > URL: https://issues.apache.org/jira/browse/MESOS-3586 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.24.0, 0.26.0 > Environment: Ubuntu 14.04, 3.13.0-32 generic > Debian 8, gcc 4.9.2 >Reporter: Miguel Bernadin > Labels: flaky, flaky-test > > I am install Mesos 0.24.0 on 4 servers which have very similar hardware and > software configurations. > After performing ../configure, make, and make check some servers have > completed successfully and other failed on test [ RUN ] > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics. > Is there something I should check in this test? > PERFORMED MAKE CHECK NODE-001 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0 > I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave > 20151005-143735-2393768202-35106-27900-S0 > Registered executor on svdidac038.techlabs.accenture.com > Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0 > Forked command at 38510 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > PERFORMED MAKE CHECK NODE-002 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0 > I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave > 20151005-143857-2360213770-50427-26325-S0 > Registered executor on svdidac039.techlabs.accenture.com > Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > Forked command at 37028 > ../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure > Expected: (usage.get().mem_medium_pressure_counter()) >= > 
(usage.get().mem_critical_pressure_counter()), actual: 5 vs 6 > 2015-10-05 > 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server > refused to accept the client > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3586) MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and CGROUPS_ROOT_SlaveRecovery are flaky
[ https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3586: -- Component/s: test > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and > CGROUPS_ROOT_SlaveRecovery are flaky > > > Key: MESOS-3586 > URL: https://issues.apache.org/jira/browse/MESOS-3586 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.24.0, 0.26.0 > Environment: Ubuntu 14.04, 3.13.0-32 generic > Debian 8, gcc 4.9.2 >Reporter: Miguel Bernadin > Labels: flaky, flaky-test > > I am install Mesos 0.24.0 on 4 servers which have very similar hardware and > software configurations. > After performing ../configure, make, and make check some servers have > completed successfully and other failed on test [ RUN ] > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics. > Is there something I should check in this test? > PERFORMED MAKE CHECK NODE-001 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0 > I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave > 20151005-143735-2393768202-35106-27900-S0 > Registered executor on svdidac038.techlabs.accenture.com > Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0 > Forked command at 38510 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > PERFORMED MAKE CHECK NODE-002 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0 > I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave > 20151005-143857-2360213770-50427-26325-S0 > Registered executor on svdidac039.techlabs.accenture.com > Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > Forked command at 37028 > ../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure > Expected: (usage.get().mem_medium_pressure_counter()) >= > 
(usage.get().mem_critical_pressure_counter()), actual: 5 vs 6 > 2015-10-05 > 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server > refused to accept the client > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3586) MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and CGROUPS_ROOT_SlaveRecovery are flaky
[ https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu updated MESOS-3586: - Affects Version/s: 0.26.0 Environment: Ubuntu 14.04, 3.13.0-32 generic Debian 8, gcc 4.9.2 was:Ubuntu 14.04, 3.13.0-32 generic Summary: MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and CGROUPS_ROOT_SlaveRecovery are flaky (was: Installing Mesos 0.24.0 on multiple systems. Failed test on MemoryPressureMesosTest.CGROUPS_ROOT_Statistics) The {{CGROUPS_ROOT_Statistics}} and {{CGROUPS_ROOT_SlaveRecovery}} are both similarly flaky. The tests also fail on Debian 8 with the same error. > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and > CGROUPS_ROOT_SlaveRecovery are flaky > > > Key: MESOS-3586 > URL: https://issues.apache.org/jira/browse/MESOS-3586 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.24.0, 0.26.0 > Environment: Ubuntu 14.04, 3.13.0-32 generic > Debian 8, gcc 4.9.2 >Reporter: Miguel Bernadin > > I am install Mesos 0.24.0 on 4 servers which have very similar hardware and > software configurations. > After performing ../configure, make, and make check some servers have > completed successfully and other failed on test [ RUN ] > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics. > Is there something I should check in this test? 
> PERFORMED MAKE CHECK NODE-001 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0 > I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave > 20151005-143735-2393768202-35106-27900-S0 > Registered executor on svdidac038.techlabs.accenture.com > Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0 > Forked command at 38510 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > PERFORMED MAKE CHECK NODE-002 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0 > I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave > 20151005-143857-2360213770-50427-26325-S0 > Registered executor on svdidac039.techlabs.accenture.com > Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > Forked command at 37028 > ../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure > Expected: (usage.get().mem_medium_pressure_counter()) >= > (usage.get().mem_critical_pressure_counter()), actual: 5 vs 6 > 2015-10-05 > 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server > refused to accept the client > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-4025: -- Labels: flaky flaky-test test (was: test) > SlaveRecoveryTest/0.GCExecutor is flaky. > > > Key: MESOS-4025 > URL: https://issues.apache.org/jira/browse/MESOS-4025 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.26.0 >Reporter: Till Toenshoff > Labels: flaky, flaky-test, test > > Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based > on 0.26.0-rc1. > Testsuite was run as root. > {noformat} > sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1 > {noformat} > {noformat} > [ RUN ] SlaveRecoveryTest/0.GCExecutor > I1130 16:49:16.336833 1032 exec.cpp:136] Version: 0.26.0 > I1130 16:49:16.345212 1049 exec.cpp:210] Executor registered on slave > dde9fd4e-b016-4a99-9081-b047e9df9afa-S0 > Registered executor on ubuntu14 > Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114 > sh -c 'sleep 1000' > Forked command at 1057 > ../../src/tests/mesos.cpp:779: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave': > Device or resource busy > *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are > using GNU date *** > PC: @ 0x1443e9a testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; > stack trace: *** > @ 0x7f1be92b80b7 os::Linux::chained_handler() > @ 0x7f1be92bc219 JVM_handle_linux_signal > @ 0x7f1bf7bbc340 (unknown) > @ 0x1443e9a testing::UnitTest::AddTestPartResult() > @ 0x1438b99 testing::internal::AssertHelper::operator=() > @ 0xf0b3bb > mesos::internal::tests::ContainerizerTest<>::TearDown() > @ 0x1461882 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145c6f8 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x143de4a 
testing::Test::Run() > @ 0x143e584 testing::TestInfo::Run() > @ 0x143ebca testing::TestCase::Run() > @ 0x1445312 testing::internal::UnitTestImpl::RunAllTests() > @ 0x14624a7 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145d26e > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x14440ae testing::UnitTest::Run() > @ 0xd15cd4 RUN_ALL_TESTS() > @ 0xd158c1 main > @ 0x7f1bf7808ec5 (unknown) > @ 0x913009 (unknown) > {noformat} > My Vagrantfile generator; > {noformat} > #!/usr/bin/env bash > cat << EOF > Vagrantfile > # -*- mode: ruby -*-" > > # vi: set ft=ruby : > Vagrant.configure(2) do |config| > # Disable shared folder to prevent certain kernel module dependencies. > config.vm.synced_folder ".", "/vagrant", disabled: true > config.vm.box = "bento/ubuntu-14.04" > config.vm.hostname = "${PLATFORM_NAME}" > config.vm.provider "virtualbox" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > vb.customize ["modifyvm", :id, "--nictype1", "virtio"] > vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"] > vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"] > end > config.vm.provider "vmware_fusion" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > end > config.vm.provision "file", source: "../test.sh", destination: "~/test.sh" > config.vm.provision "shell", inline: <<-SHELL > sudo apt-get update > sudo apt-get -y install openjdk-7-jdk autoconf libtool > sudo apt-get -y install build-essential python-dev python-boto \ > libcurl4-nss-dev libsasl2-dev maven \ > libapr1-dev libsvn-dev libssl-dev libevent-dev > sudo apt-get -y install git > sudo wget -qO- https://get.docker.com/ | sh > SHELL > end > EOF > {noformat} > The problem is kicking in frequently in my tests - I'd say > 10% but less > than 50%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-4025: -- Component/s: test > SlaveRecoveryTest/0.GCExecutor is flaky. > > > Key: MESOS-4025 > URL: https://issues.apache.org/jira/browse/MESOS-4025 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.26.0 >Reporter: Till Toenshoff > Labels: flaky, flaky-test, test > > Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based > on 0.26.0-rc1. > Testsuite was run as root. > {noformat} > sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1 > {noformat} > {noformat} > [ RUN ] SlaveRecoveryTest/0.GCExecutor > I1130 16:49:16.336833 1032 exec.cpp:136] Version: 0.26.0 > I1130 16:49:16.345212 1049 exec.cpp:210] Executor registered on slave > dde9fd4e-b016-4a99-9081-b047e9df9afa-S0 > Registered executor on ubuntu14 > Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114 > sh -c 'sleep 1000' > Forked command at 1057 > ../../src/tests/mesos.cpp:779: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave': > Device or resource busy > *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are > using GNU date *** > PC: @ 0x1443e9a testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; > stack trace: *** > @ 0x7f1be92b80b7 os::Linux::chained_handler() > @ 0x7f1be92bc219 JVM_handle_linux_signal > @ 0x7f1bf7bbc340 (unknown) > @ 0x1443e9a testing::UnitTest::AddTestPartResult() > @ 0x1438b99 testing::internal::AssertHelper::operator=() > @ 0xf0b3bb > mesos::internal::tests::ContainerizerTest<>::TearDown() > @ 0x1461882 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145c6f8 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x143de4a testing::Test::Run() > @ 0x143e584 
testing::TestInfo::Run() > @ 0x143ebca testing::TestCase::Run() > @ 0x1445312 testing::internal::UnitTestImpl::RunAllTests() > @ 0x14624a7 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145d26e > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x14440ae testing::UnitTest::Run() > @ 0xd15cd4 RUN_ALL_TESTS() > @ 0xd158c1 main > @ 0x7f1bf7808ec5 (unknown) > @ 0x913009 (unknown) > {noformat} > My Vagrantfile generator; > {noformat} > #!/usr/bin/env bash > cat << EOF > Vagrantfile > # -*- mode: ruby -*-" > > # vi: set ft=ruby : > Vagrant.configure(2) do |config| > # Disable shared folder to prevent certain kernel module dependencies. > config.vm.synced_folder ".", "/vagrant", disabled: true > config.vm.box = "bento/ubuntu-14.04" > config.vm.hostname = "${PLATFORM_NAME}" > config.vm.provider "virtualbox" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > vb.customize ["modifyvm", :id, "--nictype1", "virtio"] > vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"] > vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"] > end > config.vm.provider "vmware_fusion" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > end > config.vm.provision "file", source: "../test.sh", destination: "~/test.sh" > config.vm.provision "shell", inline: <<-SHELL > sudo apt-get update > sudo apt-get -y install openjdk-7-jdk autoconf libtool > sudo apt-get -y install build-essential python-dev python-boto \ > libcurl4-nss-dev libsasl2-dev maven \ > libapr1-dev libsvn-dev libssl-dev libevent-dev > sudo apt-get -y install git > sudo wget -qO- https://get.docker.com/ | sh > SHELL > end > EOF > {noformat} > The problem is kicking in frequently in my tests - I'd say > 10% but less > than 50%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034701#comment-15034701 ] Jojy Varghese commented on MESOS-4025: -- On debian8: {code} [ RUN ] SlaveRecoveryTest/0.Reboot I1201 21:57:11.562711 7964 exec.cpp:136] Version: 0.26.0 I1201 21:57:11.571506 7978 exec.cpp:210] Executor registered on slave 00a179f0-f087-4054-a0c7-c15281d5e7ff-S0 Registered executor on debian8 Starting task 791255fc-88dd-452e-ba12-6b2dfced99a0 Forked command at 7987 sh -c 'sleep 1000' I1201 21:57:11.640627 7982 exec.cpp:383] Executor asked to shutdown Shutting down Sending SIGTERM to process tree at pid 7987 Killing the following process trees: [ -+- 7987 sh -c sleep 1000 \--- 7988 sleep 1000 ] Command terminated with signal Terminated (pid: 7987) [ OK ] SlaveRecoveryTest/0.Reboot (1730 ms) [ RUN ] SlaveRecoveryTest/0.GCExecutor 2015-12-01 21:57:13,187:1473(0x7f9bf4e36700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:44262] zk retcode=-4, errno=111(Connection refused): server refused to accept the client I1201 21:57:13.296581 8012 exec.cpp:136] Version: 0.26.0 I1201 21:57:13.305498 8028 exec.cpp:210] Executor registered on slave 44a46bd2-d24a-48d6-bd62-492c15845841-S0 Registered executor on debian8 Starting task 8affc624-c95d-43f5-a2b9-967663c3151b sh -c 'sleep 1000' Forked command at 8035 ../../src/tests/mesos.cpp:781: Failure (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup '/sys/fs/cgroup/memory/mesos_test_a894bd47-5e1a-4442-bc6b-303d2aed6945/slave': Device or resource busy *** Aborted at 1449007033 (unix time) try "date -d @1449007033" if you are using GNU date *** PC: @ 0x14b079e testing::UnitTest::AddTestPartResult() *** SIGSEGV (@0x0) received by PID 1473 (TID 0x7f9c3db5d7c0) from PID 0; stack trace: *** @ 0x7f9c28c2166c os::Linux::chained_handler() @ 0x7f9c28c25a0a JVM_handle_linux_signal @ 0x7f9c374728d0 (unknown) @ 0x14b079e 
testing::UnitTest::AddTestPartResult() @ 0x14a51d7 testing::internal::AssertHelper::operator=() @ 0xf564c1 mesos::internal::tests::ContainerizerTest<>::TearDown() @ 0x14ce2c0 testing::internal::HandleSehExceptionsInMethodIfSupported<>() @ 0x14c9238 testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0x14aa5c0 testing::Test::Run() @ 0x14aad05 testing::TestInfo::Run() @ 0x14ab340 testing::TestCase::Run() @ 0x14b1c8f testing::internal::UnitTestImpl::RunAllTests() @ 0x14cef4f testing::internal::HandleSehExceptionsInMethodIfSupported<>() @ 0x14c9d8e testing::internal::HandleExceptionsInMethodIfSupported<>() @ 0x14b09bf testing::UnitTest::Run() @ 0xd63df2 RUN_ALL_TESTS() @ 0xd639d0 main @ 0x7f9c370dbb45 (unknown) @ 0x9588e9 (unknown) {code} * The crash was inside *ContainerizerTest::TearDown*. * The assertion *AWAIT_READY(cgroups::destroy(hierarchy, cgroup));* failed. The cgroup in question was */sys/fs/cgroup/memory/mesos_test_a894bd47-5e1a-4442-bc6b-303d2aed6945/slave* as seen from the log above. > SlaveRecoveryTest/0.GCExecutor is flaky. > > > Key: MESOS-4025 > URL: https://issues.apache.org/jira/browse/MESOS-4025 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Till Toenshoff > Labels: test > > Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based > on 0.26.0-rc1. > Testsuite was run as root. 
> {noformat} > sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1 > {noformat} > {noformat} > [ RUN ] SlaveRecoveryTest/0.GCExecutor > I1130 16:49:16.336833 1032 exec.cpp:136] Version: 0.26.0 > I1130 16:49:16.345212 1049 exec.cpp:210] Executor registered on slave > dde9fd4e-b016-4a99-9081-b047e9df9afa-S0 > Registered executor on ubuntu14 > Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114 > sh -c 'sleep 1000' > Forked command at 1057 > ../../src/tests/mesos.cpp:779: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave': > Device or resource busy > *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are > using GNU date *** > PC: @ 0x1443e9a testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; > stack trace: *** > @ 0x7f1be92b80b7 os::Linux::chained_handler() > @ 0x7f1be92bc219 JVM_handle_linux_signal > @ 0x7f1bf7bbc340 (unknown) > @ 0x1443e9a testing::UnitTest::AddTestPartResult() > @ 0x1438b99 testing::internal::A
[jira] [Commented] (MESOS-2918) CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen Flaky
[ https://issues.apache.org/jira/browse/MESOS-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034669#comment-15034669 ] Vinod Kone commented on MESOS-2918: --- Keeping the ticket open to address the TODO (dynamically disable the test if swap is enabled). commit c3dd3edb6f09de4333645cb87ba25c9d1c8969c3 Author: Chi Zhang Date: Tue Dec 1 13:56:49 2015 -0800 Checked if swap is enabled before running memory pressure related tests. Review: https://reviews.apache.org/r/38234 commit 5a21baa762d726ed22aaa4e14ba4f956d6132a5a Author: Chi Zhang Date: Tue Dec 1 13:56:16 2015 -0800 Added swap information to os::memory(). Review: https://reviews.apache.org/r/38233 > CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen Flaky > -- > > Key: MESOS-2918 > URL: https://issues.apache.org/jira/browse/MESOS-2918 > Project: Mesos > Issue Type: Bug > Components: isolation, test >Affects Versions: 0.23.0 >Reporter: Paul Brett >Assignee: Chi Zhang > Labels: test, twitter > > This test fails when swap is enabled on the platform because it creates a > memory hog with the expectation that the OOM killer will kill the hog but > with swap enabled, the hog is just swapped out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
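The committed fix gates memory-pressure tests on whether swap is configured, since with swap enabled the memory hog is swapped out instead of OOM-killed. The actual change is C++ (os::memory() in stout); a minimal Python sketch of the same check, with a hypothetical swap_enabled helper reading /proc/meminfo-style input, might look like:

```python
def swap_enabled(meminfo_text):
    # Parse /proc/meminfo-style text; SwapTotal > 0 means swap is
    # configured, so an OOM-based test may never trigger the OOM killer.
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            kb = int(line.split()[1])
            return kb > 0
    return False

# A test harness could then skip OOM tests dynamically:
# if swap_enabled(open("/proc/meminfo").read()):
#     skip("swap enabled; memory hog would be swapped out, not killed")
```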
[jira] [Updated] (MESOS-1763) Add support for multiple roles to be specified in FrameworkInfo
[ https://issues.apache.org/jira/browse/MESOS-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-1763: --- Labels: mesosphere roles (was: mesosphere) > Add support for multiple roles to be specified in FrameworkInfo > --- > > Key: MESOS-1763 > URL: https://issues.apache.org/jira/browse/MESOS-1763 > Project: Mesos > Issue Type: Task > Components: master >Reporter: Vinod Kone >Assignee: Timothy Chen > Labels: mesosphere, roles > > Currently frameworks have the ability to set only one (resource) role in > FrameworkInfo. It would be nice to let frameworks specify multiple roles so > that they can do more fine grained resource accounting per role. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1763) Add support for multiple roles to be specified in FrameworkInfo
[ https://issues.apache.org/jira/browse/MESOS-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-1763: --- Component/s: master > Add support for multiple roles to be specified in FrameworkInfo > --- > > Key: MESOS-1763 > URL: https://issues.apache.org/jira/browse/MESOS-1763 > Project: Mesos > Issue Type: Task > Components: master >Reporter: Vinod Kone >Assignee: Timothy Chen > Labels: mesosphere, roles > > Currently frameworks have the ability to set only one (resource) role in > FrameworkInfo. It would be nice to let frameworks specify multiple roles so > that they can do more fine grained resource accounting per role. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3586) Installing Mesos 0.24.0 on multiple systems. Failed test on MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
[ https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034339#comment-15034339 ] Jan Schlicht commented on MESOS-3586: - It seems like a timing problem in the test. It assumes that {{os::sleep}} will sleep for exactly the duration it is given, but sleep only guarantees a minimum delay. > Installing Mesos 0.24.0 on multiple systems. Failed test on > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > --- > > Key: MESOS-3586 > URL: https://issues.apache.org/jira/browse/MESOS-3586 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.24.0 > Environment: Ubuntu 14.04, 3.13.0-32 generic >Reporter: Miguel Bernadin > > I am installing Mesos 0.24.0 on 4 servers which have very similar hardware and > software configurations. > After performing ../configure, make, and make check, some servers have > completed successfully and others failed on test [ RUN ] > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics. > Is there something I should check in this test? 
> PERFORMED MAKE CHECK NODE-001 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0 > I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave > 20151005-143735-2393768202-35106-27900-S0 > Registered executor on svdidac038.techlabs.accenture.com > Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0 > Forked command at 38510 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > PERFORMED MAKE CHECK NODE-002 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0 > I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave > 20151005-143857-2360213770-50427-26325-S0 > Registered executor on svdidac039.techlabs.accenture.com > Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > Forked command at 37028 > ../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure > Expected: (usage.get().mem_medium_pressure_counter()) >= > (usage.get().mem_critical_pressure_counter()), actual: 5 vs 6 > 2015-10-05 > 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server > refused to accept the client > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
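The timing assumption called out above can be demonstrated outside of Mesos: sleep primitives guarantee only a *minimum* delay, never an exact one, so any assertion built on exact sleep durations is inherently flaky. An illustrative Python sketch (the Mesos test itself is C++ and uses {{os::sleep}}):

```python
import time

def sleep_at_least(seconds):
    # time.sleep() (like os::sleep) promises only a minimum delay; the
    # scheduler may resume the process arbitrarily later under load.
    start = time.monotonic()
    time.sleep(seconds)
    return time.monotonic() - start

elapsed = sleep_at_least(0.01)
assert elapsed >= 0.01        # always holds
# assert elapsed == 0.01      # the flaky assumption: NOT guaranteed
```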
[jira] [Commented] (MESOS-3586) Installing Mesos 0.24.0 on multiple systems. Failed test on MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
[ https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034269#comment-15034269 ] Jan Schlicht commented on MESOS-3586: - I used the following vagrant generator to setup a CentOS virt env: {noformat} cat << EOF > Vagrantfile # -*- mode: ruby -*-" > # vi: set ft=ruby : Vagrant.configure(2) do |config| # Disable shared folder to prevent certain kernel module dependencies. config.vm.synced_folder ".", "/vagrant", disabled: true config.vm.hostname = "centos71" config.vm.box = "bento/centos-7.1" config.vm.provider "virtualbox" do |vb| vb.memory = 8192 vb.cpus = 8 end config.vm.provision "shell", inline: <<-SHELL yum -y update systemd yum install -y tar wget wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo yum groupinstall -y "Development Tools" yum install -y apache-maven python-devel java-1.7.0-openjdk-devel zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 apr-devel subversion-devel apr-util-devel yum install -y libevent-devel yum install -y perf nmap-ncat yum install -y git yum install -y docker systemctl start docker systemctl enable docker docker info #wget -qO- https://get.docker.com/ | sh SHELL end EOF vagrant up vagrant reload vagrant ssh -c " git clone https://github.com/apache/mesos.git mesos cd mesos git checkout -b 0.26.0-rc2 0.26.0-rc2 ./bootstrap mkdir build cd build ../configure GTEST_FILTER="" make check sudo ./bin/mesos-tests.sh " {noformat} > Installing Mesos 0.24.0 on multiple systems. Failed test on > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > --- > > Key: MESOS-3586 > URL: https://issues.apache.org/jira/browse/MESOS-3586 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.24.0 > Environment: Ubuntu 14.04, 3.13.0-32 generic >Reporter: Miguel Bernadin > > I am install Mesos 0.24.0 on 4 servers which have very similar hardware and > software configurations. 
> After performing ../configure, make, and make check some servers have > completed successfully and other failed on test [ RUN ] > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics. > Is there something I should check in this test? > PERFORMED MAKE CHECK NODE-001 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0 > I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave > 20151005-143735-2393768202-35106-27900-S0 > Registered executor on svdidac038.techlabs.accenture.com > Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0 > Forked command at 38510 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > PERFORMED MAKE CHECK NODE-002 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0 > I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave > 20151005-143857-2360213770-50427-26325-S0 > Registered executor on svdidac039.techlabs.accenture.com > Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > Forked command at 37028 > ../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure > Expected: (usage.get().mem_medium_pressure_counter()) >= > (usage.get().mem_critical_pressure_counter()), actual: 5 vs 6 > 2015-10-05 > 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server > refused to accept the client > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3586) Installing Mesos 0.24.0 on multiple systems. Failed test on MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
[ https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034263#comment-15034263 ] Jan Schlicht commented on MESOS-3586: - I have to reopen this, as I've found the same behavior using the 0.26-rc2 on CentOS 7.1. Noticed some flakiness while running {{sudo ./bin/mesos-tests.sh}} and could reproduce it by running {{sudo ./bin/mesos-tests.sh - --gtest_filter="MemoryPressureMesosTest.CGROUPS_ROOT_Statistics" --gtest_repeat=-1 --gtest_break_on_failure}} until it breaks. Here's a verbose output of a failing test: {noformat} [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics I1201 18:07:51.136508 18883 cgroups.cpp:2429] Freezing cgroup /sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb/d540e60d-2d62-4a1e-b5ff-482f7b3cc1a5 I1201 18:07:51.144594 18886 cgroups.cpp:1411] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb/d540e60d-2d62-4a1e-b5ff-482f7b3cc1a5 after 7.076864ms I1201 18:07:51.151480 18882 cgroups.cpp:2447] Thawing cgroup /sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb/d540e60d-2d62-4a1e-b5ff-482f7b3cc1a5 I1201 18:07:51.162557 18886 cgroups.cpp:1440] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb/d540e60d-2d62-4a1e-b5ff-482f7b3cc1a5 after 11.026944ms I1201 18:07:51.172379 18887 cgroups.cpp:2429] Freezing cgroup /sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb I1201 18:07:51.183791 18881 cgroups.cpp:1411] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb after 7.8272ms I1201 18:07:51.192354 18887 cgroups.cpp:2447] Thawing cgroup /sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb I1201 18:07:51.199439 18885 cgroups.cpp:1440] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos_test_7bcd6aa5-6f35-44ea-90a5-e7f047edbffb after 7.028224ms I1201 
18:07:51.332849 18866 leveldb.cpp:176] Opened db in 6.74674ms I1201 18:07:51.335450 18866 leveldb.cpp:183] Compacted db in 2.554513ms I1201 18:07:51.335539 18866 leveldb.cpp:198] Created db iterator in 53851ns I1201 18:07:51.335556 18866 leveldb.cpp:204] Seeked to beginning of db in 3455ns I1201 18:07:51.335561 18866 leveldb.cpp:273] Iterated through 0 keys in the db in 107ns I1201 18:07:51.335666 18866 replica.cpp:780] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I1201 18:07:51.337374 18881 recover.cpp:449] Starting replica recovery I1201 18:07:51.338235 18881 recover.cpp:475] Replica is in EMPTY status I1201 18:07:51.340142 18880 replica.cpp:676] Replica in EMPTY status received a broadcasted recover request from (14)@127.0.0.1:57652 I1201 18:07:51.340749 18882 recover.cpp:195] Received a recover response from a replica in EMPTY status I1201 18:07:51.340975 18885 master.cpp:367] Master 2f17d97c-de40-491e-9706-bf83a9ffd08c (centos71) started on 127.0.0.1:57652 I1201 18:07:51.341475 18884 recover.cpp:566] Updating replica status to STARTING I1201 18:07:51.341152 18885 master.cpp:369] Flags at startup: --acls="" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/ap4rPt/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="25secs" --registry_strict="true" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/ap4rPt/master" --zk_session_timeout="10secs" W1201 18:07:51.341752 18885 
master.cpp:372] ** Master bound to loopback interface! Cannot communicate with remote schedulers or slaves. You might want to set '--ip' flag to a routable IP address. ** I1201 18:07:51.341794 18885 master.cpp:414] Master only allowing authenticated frameworks to register I1201 18:07:51.341804 18885 master.cpp:419] Master only allowing authenticated slaves to register I1201 18:07:51.341879 18885 credentials.hpp:37] Loading credentials for authentication from '/tmp/ap4rPt/credentials' I1201 18:07:51.345211 18885 master.cpp:458] Using default 'crammd5' authenticator I1201 18:07:51.345268 18882 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 3.5302ms I1201 18:07:51.345289 18882 replica.cpp:323] Persisted replica status to STARTING I1201 18:07:51.345350 18885 authenticator.cpp:520] Initializing server SAS
[jira] [Commented] (MESOS-4032) SlaveRecoveryTest/0.Reboot fails under CentOS 7.1 with libevent & SSL enabled.
[ https://issues.apache.org/jira/browse/MESOS-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034147#comment-15034147 ] Jan Schlicht commented on MESOS-4032: - Looks like it was caused by some artifacts. After restarting the virtual env, the test is OK. > SlaveRecoveryTest/0.Reboot fails under CentOS 7.1 with libevent & SSL enabled. > -- > > Key: MESOS-4032 > URL: https://issues.apache.org/jira/browse/MESOS-4032 > Project: Mesos > Issue Type: Bug > Environment: CentOS 7.1, {{--enable-libevent --enable-ssl}} >Reporter: Jan Schlicht > > Running {{sudo ./bin/mesos-tests.sh}} has SlaveRecoveryTest/0.Reboot failing. > A virtual env was used to run the tests. > Vagrantfile generator: > {noformat} > cat << EOF > Vagrantfile > # -*- mode: ruby -*-" > > # vi: set ft=ruby : > Vagrant.configure(2) do |config| > # Disable shared folder to prevent certain kernel module dependencies. > config.vm.synced_folder ".", "/vagrant", disabled: true > config.vm.hostname = "centos71" > config.vm.box = "bento/centos-7.1" > config.vm.provider "virtualbox" do |vb| > vb.memory = 8192 > vb.cpus = 8 > end > config.vm.provision "shell", inline: <<-SHELL > yum -y update systemd > yum install -y tar wget > wget > http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo > -O /etc/yum.repos.d/epel-apache-maven.repo > yum groupinstall -y "Development Tools" > yum install -y apache-maven python-devel java-1.7.0-openjdk-devel > zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 > apr-devel subversion-devel apr-util-devel > yum install -y libevent-devel > yum install -y perf nmap-ncat > yum install -y git > yum install -y docker > systemctl start docker > systemctl enable docker > SHELL > end > EOF > vagrant up > vagrant reload > vagrant ssh -c " > git clone https://github.com/apache/mesos.git mesos > cd mesos > git checkout -b 0.26.0-rc2 0.26.0-rc2 > ./bootstrap > mkdir build > cd build > ../configure 
--enable-libevent --enable-ssl > GTEST_FILTER="" make check > sudo ./bin/mesos-tests.sh > " > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3787) As a developer, I'd like to be able to expand environment variables through the Docker executor.
[ https://issues.apache.org/jira/browse/MESOS-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034131#comment-15034131 ] Adam B commented on MESOS-3787: --- Please allow me to express a potential security concern. I hope that our eventual solution addresses this. If the variable expansion happens as a part of the slave process, run as root, we must ensure that it isn't able to actually execute a command as root or view variable contents that only root should see, since the variable/config is set by the framework, not an admin. Rather, the expansion should happen as the TaskInfo.user/FrameworkInfo.user, so that {code}"containerPath": "/data/${USER}" "hostPath": "${HOME}"{code} should use the task user's name/home, not 'root'. > As a developer, I'd like to be able to expand environment variables through > the Docker executor. > > > Key: MESOS-3787 > URL: https://issues.apache.org/jira/browse/MESOS-3787 > Project: Mesos > Issue Type: Wish >Reporter: John Garcia > Labels: mesosphere > Attachments: mesos.patch, test-example.json > > > We'd like to have expanded variables usable in [the json files used to create > a Marathon app, hence] the Task's CommandInfo, so that the executor is able > to detect the correct values at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
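The security concern above amounts to: resolve ${VAR} references against the task user's environment only, never the root daemon's. A hedged Python sketch of that rule (expand_as_user is a hypothetical helper, not the Docker executor's actual code):

```python
import re

def expand_as_user(template, user_env):
    # Substitute ${VAR} using ONLY the task user's environment. Unknown
    # variables are left untouched rather than silently resolved from
    # the (possibly root) daemon environment.
    def repl(match):
        return user_env.get(match.group(1), match.group(0))
    return re.sub(r"\$\{(\w+)\}", repl, template)

task_env = {"USER": "alice", "HOME": "/home/alice"}
expand_as_user("/data/${USER}", task_env)   # '/data/alice'
expand_as_user("${HOME}", task_env)         # '/home/alice'
```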
[jira] [Created] (MESOS-4033) Add a commit hook for non-ascii characters
Alexander Rukletsov created MESOS-4033: -- Summary: Add a commit hook for non-ascii characters Key: MESOS-4033 URL: https://issues.apache.org/jira/browse/MESOS-4033 Project: Mesos Issue Type: Task Reporter: Alexander Rukletsov Priority: Minor Non-ascii characters invisible in some editors may sneak into the codebase (see e.g. https://reviews.apache.org/r/40799/). To avoid this, a pre-commit hook can be added. Quick searching suggested a simple Perl script: https://superuser.com/questions/417305/how-can-i-identify-non-ascii-characters-from-the-shell -- This message was sent by Atlassian JIRA (v6.3.4#6332)
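As a sketch of what such a hook could check (illustrative Python rather than the Perl script linked above; find_non_ascii is a hypothetical helper):

```python
def find_non_ascii(text):
    # Report (line, column, char) for every character outside the ASCII
    # range, similar to what grep -P '[^\x00-\x7F]' would flag in a
    # pre-commit hook.
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((lineno, col, ch))
    return hits

find_non_ascii("plain ascii")   # []
find_non_ascii("naïve")         # [(1, 3, 'ï')]
```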
[jira] [Comment Edited] (MESOS-4032) SlaveRecoveryTest/0.Reboot fails under CentOS 7.1 with libevent & SSL enabled.
[ https://issues.apache.org/jira/browse/MESOS-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033936#comment-15033936 ] Jan Schlicht edited comment on MESOS-4032 at 12/1/15 4:14 PM: -- The tests work fine if Mesos is compiled with libev, without SSL. Running the test in isolation also fails. Verbose output: {noformat} [ RUN ] SlaveRecoveryTest/0.Reboot I1201 16:13:43.764530 30105 cgroups.cpp:2429] Freezing cgroup /sys/fs/cgroup/freezer/mesos_test_4ea77e5a-030e-468d-aa54-6cf580143b86 I1201 16:13:43.955772 30100 cgroups.cpp:1411] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos_test_4ea77e5a-030e-468d-aa54-6cf580143b86 after 190.95296ms I1201 16:13:44.151808 30106 cgroups.cpp:2447] Thawing cgroup /sys/fs/cgroup/freezer/mesos_test_4ea77e5a-030e-468d-aa54-6cf580143b86 I1201 16:13:44.338899 30103 cgroups.cpp:1440] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos_test_4ea77e5a-030e-468d-aa54-6cf580143b86 after 186.987008ms I1201 16:13:46.429718 30085 leveldb.cpp:176] Opened db in 6.794189ms I1201 16:13:46.431185 30085 leveldb.cpp:183] Compacted db in 1.403926ms I1201 16:13:46.431273 30085 leveldb.cpp:198] Created db iterator in 55789ns I1201 16:13:46.431289 30085 leveldb.cpp:204] Seeked to beginning of db in 3775ns I1201 16:13:46.431293 30085 leveldb.cpp:273] Iterated through 0 keys in the db in 120ns I1201 16:13:46.431409 30085 replica.cpp:780] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I1201 16:13:46.432781 30104 recover.cpp:449] Starting replica recovery I1201 16:13:46.433365 30104 recover.cpp:475] Replica is in EMPTY status I1201 16:13:46.438645 30104 replica.cpp:676] Replica in EMPTY status received a broadcasted recover request from (9)@127.0.0.1:52014 I1201 16:13:46.439353 30099 master.cpp:367] Master 0c54b5bb-d0f8-4c94-8f2a-c49672419e62 (centos71) started on 127.0.0.1:52014 I1201 16:13:46.439602 30100 recover.cpp:195] Received a recover response from a replica in EMPTY 
status I1201 16:13:46.439393 30099 master.cpp:369] Flags at startup: --acls="" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/qZBjUp/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="25secs" --registry_strict="true" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/qZBjUp/master" --zk_session_timeout="10secs" W1201 16:13:46.439997 30099 master.cpp:372] ** Master bound to loopback interface! Cannot communicate with remote schedulers or slaves. You might want to set '--ip' flag to a routable IP address. ** I1201 16:13:46.440037 30099 master.cpp:414] Master only allowing authenticated frameworks to register I1201 16:13:46.440042 30099 master.cpp:419] Master only allowing authenticated slaves to register I1201 16:13:46.440047 30099 credentials.hpp:37] Loading credentials for authentication from '/tmp/qZBjUp/credentials' I1201 16:13:46.440315 30106 recover.cpp:566] Updating replica status to STARTING I1201 16:13:46.440580 30099 master.cpp:458] Using default 'crammd5' authenticator I1201 16:13:46.440743 30099 authenticator.cpp:520] Initializing server SASL I1201 16:13:46.442067 30099 master.cpp:495] Authorization enabled I1201 16:13:46.447201 30099 master.cpp:1606] The newly elected leader is master@127.0.0.1:52014 with id 0c54b5bb-d0f8-4c94-8f2a-c49672419e62 I1201 16:13:46.447230 30099 master.cpp:1619] Elected as the leading master! 
I1201 16:13:46.447255 30099 master.cpp:1379] Recovering from registrar I1201 16:13:46.447590 30099 registrar.cpp:309] Recovering registrar I1201 16:13:46.451647 30100 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 10.746719ms I1201 16:13:46.451686 30100 replica.cpp:323] Persisted replica status to STARTING I1201 16:13:46.451942 30106 recover.cpp:475] Replica is in STARTING status I1201 16:13:46.452819 30100 replica.cpp:676] Replica in STARTING status received a broadcasted recover request from (10)@127.0.0.1:52014 I1201 16:13:46.453064 30105 recover.cpp:195] Received a recover response from a replica in STARTING status I1201 16:13:46.453727 30104 recover.cpp:566] Updating replica status to VOTING I1201 16:13:46.454529 30105 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 720044ns I1201 16:13:46.454548 30105 replica.cpp:323] Persi
[jira] [Commented] (MESOS-4032) SlaveRecoveryTest/0.Reboot fails under CentOS 7.1 with libevent & SSL enabled.
[ https://issues.apache.org/jira/browse/MESOS-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033936#comment-15033936 ] Jan Schlicht commented on MESOS-4032: - {noformat} [ RUN ] SlaveRecoveryTest/0.Reboot I1201 15:59:03.294540 21012 exec.cpp:136] Version: 0.26.0 I1201 15:59:03.302486 21039 exec.cpp:210] Executor registered on slave b17072f2-ce17-4f80-aa41-2197194f7cd0-S0 Registered executor on centos71 Starting task 6060349a-ab26-45d2-a2fa-96e561f794a8 sh -c 'sleep 1000' Forked command at 21048 I1201 15:59:03.420940 21044 exec.cpp:383] Executor asked to shutdown Shutting down Sending SIGTERM to process tree at pid 21048 Killing the following process trees: [ --- 21048 sleep 1000 ] Command terminated with signal Terminated (pid: 21048) ../../src/tests/mesos.cpp:781: Failure (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to kill tasks in nested cgroups: Collect failed: Invalid freezer cgroup: 'mesos_test_d456f5bc-7718-4850-990e-8961404efd15/8fa1aee7-b393-4a20-85e1-33cd2fca0b10' is not a valid cgroup [ FAILED ] SlaveRecoveryTest/0.Reboot, where TypeParam = mesos::internal::slave::MesosContainerizer (4835 ms) {noformat} > SlaveRecoveryTest/0.Reboot fails under CentOS 7.1 with libevent & SSL enabled. > -- > > Key: MESOS-4032 > URL: https://issues.apache.org/jira/browse/MESOS-4032 > Project: Mesos > Issue Type: Bug > Environment: CentOS 7.1, {{--enable-libevent --enable-ssl}} >Reporter: Jan Schlicht > > Running {{sudo ./bin/mesos-tests.sh}} has SlaveRecoveryTest/0.Reboot failing. > A virtual env was used to run the tests. > Vagrantfile generator: > {noformat} > cat << EOF > Vagrantfile > # -*- mode: ruby -*-" > > # vi: set ft=ruby : > Vagrant.configure(2) do |config| > # Disable shared folder to prevent certain kernel module dependencies. 
> config.vm.synced_folder ".", "/vagrant", disabled: true > config.vm.hostname = "centos71" > config.vm.box = "bento/centos-7.1" > config.vm.provider "virtualbox" do |vb| > vb.memory = 8192 > vb.cpus = 8 > end > config.vm.provision "shell", inline: <<-SHELL > yum -y update systemd > yum install -y tar wget > wget > http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo > -O /etc/yum.repos.d/epel-apache-maven.repo > yum groupinstall -y "Development Tools" > yum install -y apache-maven python-devel java-1.7.0-openjdk-devel > zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 > apr-devel subversion-devel apr-util-devel > yum install -y libevent-devel > yum install -y perf nmap-ncat > yum install -y git > yum install -y docker > systemctl start docker > systemctl enable docker > SHELL > end > EOF > vagrant up > vagrant reload > vagrant ssh -c " > git clone https://github.com/apache/mesos.git mesos > cd mesos > git checkout -b 0.26.0-rc2 0.26.0-rc2 > ./bootstrap > mkdir build > cd build > ../configure --enable-libevent --enable-ssl > GTEST_FILTER="" make check > sudo ./bin/mesos-tests.sh > " > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4032) SlaveRecoveryTest/0.Reboot fails under CentOS 7.1 with libevent & SSL enabled.
Jan Schlicht created MESOS-4032: --- Summary: SlaveRecoveryTest/0.Reboot fails under CentOS 7.1 with libevent & SSL enabled. Key: MESOS-4032 URL: https://issues.apache.org/jira/browse/MESOS-4032 Project: Mesos Issue Type: Bug Environment: CentOS 7.1, {{--enable-libevent --enable-ssl}} Reporter: Jan Schlicht Running {{sudo ./bin/mesos-tests.sh}} has SlaveRecoveryTest/0.Reboot failing. A virtual env was used to run the tests. Vagrantfile generator: {noformat} cat << EOF > Vagrantfile # -*- mode: ruby -*-" > # vi: set ft=ruby : Vagrant.configure(2) do |config| # Disable shared folder to prevent certain kernel module dependencies. config.vm.synced_folder ".", "/vagrant", disabled: true config.vm.hostname = "centos71" config.vm.box = "bento/centos-7.1" config.vm.provider "virtualbox" do |vb| vb.memory = 8192 vb.cpus = 8 end config.vm.provision "shell", inline: <<-SHELL yum -y update systemd yum install -y tar wget wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo yum groupinstall -y "Development Tools" yum install -y apache-maven python-devel java-1.7.0-openjdk-devel zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 apr-devel subversion-devel apr-util-devel yum install -y libevent-devel yum install -y perf nmap-ncat yum install -y git yum install -y docker systemctl start docker systemctl enable docker SHELL end EOF vagrant up vagrant reload vagrant ssh -c " git clone https://github.com/apache/mesos.git mesos cd mesos git checkout -b 0.26.0-rc2 0.26.0-rc2 ./bootstrap mkdir build cd build ../configure --enable-libevent --enable-ssl GTEST_FILTER="" make check sudo ./bin/mesos-tests.sh " {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3548) Investigate federations of Mesos masters
[ https://issues.apache.org/jira/browse/MESOS-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033932#comment-15033932 ] Neil Conway commented on MESOS-3548: Hi Elouan, That's awesome that you're interested in this area! We're working on setting up a special-interest group for federation, and we'll be sure to include you. > Investigate federations of Mesos masters > > > Key: MESOS-3548 > URL: https://issues.apache.org/jira/browse/MESOS-3548 > Project: Mesos > Issue Type: Improvement >Reporter: Neil Conway > Labels: federation, mesosphere, multi-dc > > In a large Mesos installation, the operator might want to ensure that even if > the Mesos masters are inaccessible or failed, new tasks can still be > scheduled (across multiple different frameworks). HA masters are only a > partial solution here: the masters might still be inaccessible due to a > correlated failure (e.g., Zookeeper misconfiguration/human error). > To support this, we could support the notion of "hierarchies" or > "federations" of Mesos masters. In a Mesos installation with 10k machines, > the operator might configure 10 Mesos masters (each of which might be HA) to > manage 1k machines each. Then an additional "meta-Master" would manage the > allocation of cluster resources to the 10 masters. Hence, the failure of any > individual master would impact 1k machines at most. The meta-master might not > have a lot of work to do: e.g., it might be limited to occasionally > reallocating cluster resources among the 10 masters, or ensuring that newly > added cluster resources are allocated among the masters as appropriate. > Hence, the failure of the meta-master would not prevent any of the individual > masters from scheduling new tasks. A single framework instance probably > wouldn't be able to use more resources than have been assigned to a single > Master, but that seems like a reasonable restriction. 
> This feature might also be a good fit for a multi-datacenter deployment of > Mesos: each Mesos master instance would manage a single DC. Naturally, > reducing the traffic between frameworks and the meta-master would be > important for performance reasons in a configuration like this. > Operationally, this might be simpler if Mesos processes were self-hosting > ([MESOS-3547]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3773) RegistryClientTest.SimpleGetBlob is flaky
[ https://issues.apache.org/jira/browse/MESOS-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-3773: -- Fix Version/s: (was: 0.26.0) 0.27.0 > RegistryClientTest.SimpleGetBlob is flaky > - > > Key: MESOS-3773 > URL: https://issues.apache.org/jira/browse/MESOS-3773 > Project: Mesos > Issue Type: Bug > Components: test >Reporter: Joseph Wu >Assignee: Jojy Varghese > Labels: mesosphere > Fix For: 0.27.0 > > > {{RegistryClientTest.SimpleGetBlob}} fails about 1/5 times. This was > encountered on OSX. > {code:title=Repro} > bin/mesos-tests.sh --gtest_filter="*RegistryClientTest.SimpleGetBlob*" > --gtest_repeat=10 --gtest_break_on_failure > {code} > {code:title=Example Failure} > [ RUN ] RegistryClientTest.SimpleGetBlob > ../../src/tests/containerizer/provisioner_docker_tests.cpp:946: Failure > Value of: blobResponse > Actual: "2015-10-20 20:58:59.579393024+00:00" > Expected: blob.get() > Which is: > "\x15\x3\x3\00(P~\xCA&\xC6<\x4\x16\xE\xB2\xFF\b1a\xB9Z{\xE0\x80\xDA`\xBCt\x5R\x81x6\xF8 > \x8B{\xA8\xA9\x4\xAB\xB6" "E\xE6\xDE\xCF\xD9*\xCC!\xC2\x15" "2015-10-20 > 20:58:59.579393024+00:00" > *** Aborted at 1445374739 (unix time) try "date -d @1445374739" if you are > using GNU date *** > PC: @0x103144ddc testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 49008 (TID 0x7fff73ca3300) stack trace: *** > @ 0x7fff8c58af1a _sigtramp > @ 0x7fff8386e187 malloc > @0x1031445b7 testing::internal::AssertHelper::operator=() > @0x1030d32e0 > mesos::internal::tests::RegistryClientTest_SimpleGetBlob_Test::TestBody() > @0x1030d3562 > mesos::internal::tests::RegistryClientTest_SimpleGetBlob_Test::TestBody() > @0x1031ac8f3 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @0x103192f87 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @0x1031533f5 testing::Test::Run() > @0x10315493b testing::TestInfo::Run() > @0x1031555f7 testing::TestCase::Run() > @0x103163df3 
testing::internal::UnitTestImpl::RunAllTests() > @0x1031af8c3 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @0x103195397 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @0x1031639f2 testing::UnitTest::Run() > @0x1025abd41 RUN_ALL_TESTS() > @0x1025a8089 main > @ 0x7fff86b155c9 start > {code} > {code:title=Less common failure} > [ RUN ] RegistryClientTest.SimpleGetBlob > ../../src/tests/containerizer/provisioner_docker_tests.cpp:926: Failure > (socket).failure(): Failed accept: connection error: > error::lib(0):func(0):reason(0) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4029) ContentType/SchedulerTest is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033879#comment-15033879 ] Bernd Mathiske commented on MESOS-4029: --- Talking to Anand and Alexander I am getting the impression this is likely a test bug. > ContentType/SchedulerTest is flaky. > --- > > Key: MESOS-4029 > URL: https://issues.apache.org/jira/browse/MESOS-4029 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Till Toenshoff >Assignee: Anand Mazumdar > Labels: flaky, flaky-test, mesosphere > > SSL build, [Ubuntu > 14.04|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh], > non-root test run. > {noformat} > [--] 22 tests from ContentType/SchedulerTest > [ RUN ] ContentType/SchedulerTest.Subscribe/0 > [ OK ] ContentType/SchedulerTest.Subscribe/0 (48 ms) > *** Aborted at 1448928007 (unix time) try "date -d @1448928007" if you are > using GNU date *** > [ RUN ] ContentType/SchedulerTest.Subscribe/1 > PC: @ 0x1451b8e > testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() > *** SIGSEGV (@0x10030) received by PID 21320 (TID 0x2b549e5d4700) from > PID 48; stack trace: *** > @ 0x2b54c95940b7 os::Linux::chained_handler() > @ 0x2b54c9598219 JVM_handle_linux_signal > @ 0x2b5496300340 (unknown) > @ 0x1451b8e > testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() > @ 0xe2ea6d > _ZN7testing8internal18FunctionMockerBaseIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS6_SaIS6_E10InvokeWithERKSt5tupleIJSC_EE > @ 0xe2b1bc testing::internal::FunctionMocker<>::Invoke() > @ 0x1118aed > mesos::internal::tests::SchedulerTest::Callbacks::received() > @ 0x111c453 > _ZNKSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS0_2v19scheduler5EventESt5dequeIS8_SaIS8_EclIJSE_EvEEvRS4_DpOT_ > @ 0x111c001 > 
_ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi16__callIvJSF_EJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE > @ 0x111b90d > _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi1clIJSF_EvEET0_DpOT_ > @ 0x111ae09 std::_Function_handler<>::_M_invoke() > @ 0x2b5493c6da09 std::function<>::operator()() > @ 0x2b5493c688ee process::AsyncExecutorProcess::execute<>() > @ 0x2b5493c6db2a > _ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS8_SaIS8_ESC_PvSG_SC_SJ_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSO_FSL_T1_T2_T3_ET4_T5_T6_ENKUlPNS_11ProcessBaseEE_clES11_ > @ 0x2b5493c765a4 > _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingNS0_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISC_SaISC_ESG_PvSK_SG_SN_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSS_FSP_T1_T2_T3_ET4_T5_T6_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ > @ 0x2b54946b1201 std::function<>::operator()() > @ 0x2b549469960f process::ProcessBase::visit() > @ 0x2b549469d480 process::DispatchEvent::visit() > @ 0x9dc0ba process::ProcessBase::serve() > @ 0x2b54946958cc process::ProcessManager::resume() > @ 0x2b5494692a9c > _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_ > @ 0x2b549469ccac > _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE > @ 0x2b549469cc5c > _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_ > @ 0x2b549469cbee > 
_ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE > @ 0x2b549469cb45 > _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv > @ 0x2b549469cade > _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv > @ 0x2b5495b81a40 (unknown) > @ 0x2b54962f8182 start_thread > @ 0x2b549660847d (unknown) > make[3]: *** [check-local] Segmentation fault > make[3]: Leaving directory `/home/vagrant/mesos/build/src' > make[2]: *** [check-am] Er
[jira] [Updated] (MESOS-4029) ContentType/SchedulerTest is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4029: -- Affects Version/s: (was: 0.27.0) 0.26.0 > ContentType/SchedulerTest is flaky. > --- > > Key: MESOS-4029 > URL: https://issues.apache.org/jira/browse/MESOS-4029 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Till Toenshoff >Assignee: Anand Mazumdar > Labels: flaky, flaky-test, mesosphere > > SSL build, [Ubuntu > 14.04|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh], > non-root test run. > {noformat} > [--] 22 tests from ContentType/SchedulerTest > [ RUN ] ContentType/SchedulerTest.Subscribe/0 > [ OK ] ContentType/SchedulerTest.Subscribe/0 (48 ms) > *** Aborted at 1448928007 (unix time) try "date -d @1448928007" if you are > using GNU date *** > [ RUN ] ContentType/SchedulerTest.Subscribe/1 > PC: @ 0x1451b8e > testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() > *** SIGSEGV (@0x10030) received by PID 21320 (TID 0x2b549e5d4700) from > PID 48; stack trace: *** > @ 0x2b54c95940b7 os::Linux::chained_handler() > @ 0x2b54c9598219 JVM_handle_linux_signal > @ 0x2b5496300340 (unknown) > @ 0x1451b8e > testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() > @ 0xe2ea6d > _ZN7testing8internal18FunctionMockerBaseIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS6_SaIS6_E10InvokeWithERKSt5tupleIJSC_EE > @ 0xe2b1bc testing::internal::FunctionMocker<>::Invoke() > @ 0x1118aed > mesos::internal::tests::SchedulerTest::Callbacks::received() > @ 0x111c453 > _ZNKSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS0_2v19scheduler5EventESt5dequeIS8_SaIS8_EclIJSE_EvEEvRS4_DpOT_ > @ 0x111c001 > _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi16__callIvJSF_EJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE > @ 
0x111b90d > _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi1clIJSF_EvEET0_DpOT_ > @ 0x111ae09 std::_Function_handler<>::_M_invoke() > @ 0x2b5493c6da09 std::function<>::operator()() > @ 0x2b5493c688ee process::AsyncExecutorProcess::execute<>() > @ 0x2b5493c6db2a > _ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS8_SaIS8_ESC_PvSG_SC_SJ_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSO_FSL_T1_T2_T3_ET4_T5_T6_ENKUlPNS_11ProcessBaseEE_clES11_ > @ 0x2b5493c765a4 > _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingNS0_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISC_SaISC_ESG_PvSK_SG_SN_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSS_FSP_T1_T2_T3_ET4_T5_T6_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ > @ 0x2b54946b1201 std::function<>::operator()() > @ 0x2b549469960f process::ProcessBase::visit() > @ 0x2b549469d480 process::DispatchEvent::visit() > @ 0x9dc0ba process::ProcessBase::serve() > @ 0x2b54946958cc process::ProcessManager::resume() > @ 0x2b5494692a9c > _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_ > @ 0x2b549469ccac > _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE > @ 0x2b549469cc5c > _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_ > @ 0x2b549469cbee > _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE > @ 0x2b549469cb45 > _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv > @ 0x2b549469cade > 
_ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv > @ 0x2b5495b81a40 (unknown) > @ 0x2b54962f8182 start_thread > @ 0x2b549660847d (unknown) > make[3]: *** [check-local] Segmentation fault > make[3]: Leaving directory `/home/vagrant/mesos/build/src' > make[2]: *** [check-am] Error 2 > make[2]: Leaving directory `/home/vagrant/mesos/build/src' > make[1
[jira] [Updated] (MESOS-4029) ContentType/SchedulerTest is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4029: -- Target Version/s: 0.27.0 (was: 0.26.0) > ContentType/SchedulerTest is flaky. > --- > > Key: MESOS-4029 > URL: https://issues.apache.org/jira/browse/MESOS-4029 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Till Toenshoff >Assignee: Anand Mazumdar > Labels: flaky, flaky-test, mesosphere > > SSL build, [Ubuntu > 14.04|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh], > non-root test run. > {noformat} > [--] 22 tests from ContentType/SchedulerTest > [ RUN ] ContentType/SchedulerTest.Subscribe/0 > [ OK ] ContentType/SchedulerTest.Subscribe/0 (48 ms) > *** Aborted at 1448928007 (unix time) try "date -d @1448928007" if you are > using GNU date *** > [ RUN ] ContentType/SchedulerTest.Subscribe/1 > PC: @ 0x1451b8e > testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() > *** SIGSEGV (@0x10030) received by PID 21320 (TID 0x2b549e5d4700) from > PID 48; stack trace: *** > @ 0x2b54c95940b7 os::Linux::chained_handler() > @ 0x2b54c9598219 JVM_handle_linux_signal > @ 0x2b5496300340 (unknown) > @ 0x1451b8e > testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() > @ 0xe2ea6d > _ZN7testing8internal18FunctionMockerBaseIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS6_SaIS6_E10InvokeWithERKSt5tupleIJSC_EE > @ 0xe2b1bc testing::internal::FunctionMocker<>::Invoke() > @ 0x1118aed > mesos::internal::tests::SchedulerTest::Callbacks::received() > @ 0x111c453 > _ZNKSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS0_2v19scheduler5EventESt5dequeIS8_SaIS8_EclIJSE_EvEEvRS4_DpOT_ > @ 0x111c001 > _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi16__callIvJSF_EJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE > @ 
0x111b90d > _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi1clIJSF_EvEET0_DpOT_ > @ 0x111ae09 std::_Function_handler<>::_M_invoke() > @ 0x2b5493c6da09 std::function<>::operator()() > @ 0x2b5493c688ee process::AsyncExecutorProcess::execute<>() > @ 0x2b5493c6db2a > _ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS8_SaIS8_ESC_PvSG_SC_SJ_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSO_FSL_T1_T2_T3_ET4_T5_T6_ENKUlPNS_11ProcessBaseEE_clES11_ > @ 0x2b5493c765a4 > _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingNS0_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISC_SaISC_ESG_PvSK_SG_SN_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSS_FSP_T1_T2_T3_ET4_T5_T6_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ > @ 0x2b54946b1201 std::function<>::operator()() > @ 0x2b549469960f process::ProcessBase::visit() > @ 0x2b549469d480 process::DispatchEvent::visit() > @ 0x9dc0ba process::ProcessBase::serve() > @ 0x2b54946958cc process::ProcessManager::resume() > @ 0x2b5494692a9c > _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_ > @ 0x2b549469ccac > _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE > @ 0x2b549469cc5c > _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_ > @ 0x2b549469cbee > _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE > @ 0x2b549469cb45 > _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv > @ 0x2b549469cade > 
_ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv > @ 0x2b5495b81a40 (unknown) > @ 0x2b54962f8182 start_thread > @ 0x2b549660847d (unknown) > make[3]: *** [check-local] Segmentation fault > make[3]: Leaving directory `/home/vagrant/mesos/build/src' > make[2]: *** [check-am] Error 2 > make[2]: Leaving directory `/home/vagrant/mesos/build/src' > make[1]: *** [check] Error 2 > ma
[jira] [Updated] (MESOS-4029) ContentType/SchedulerTest is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4029: -- Affects Version/s: (was: 0.26.0) 0.27.0 > ContentType/SchedulerTest is flaky. > --- > > Key: MESOS-4029 > URL: https://issues.apache.org/jira/browse/MESOS-4029 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Till Toenshoff >Assignee: Anand Mazumdar > Labels: flaky, flaky-test, mesosphere > > SSL build, [Ubuntu > 14.04|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh], > non-root test run. > {noformat} > [--] 22 tests from ContentType/SchedulerTest > [ RUN ] ContentType/SchedulerTest.Subscribe/0 > [ OK ] ContentType/SchedulerTest.Subscribe/0 (48 ms) > *** Aborted at 1448928007 (unix time) try "date -d @1448928007" if you are > using GNU date *** > [ RUN ] ContentType/SchedulerTest.Subscribe/1 > PC: @ 0x1451b8e > testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() > *** SIGSEGV (@0x10030) received by PID 21320 (TID 0x2b549e5d4700) from > PID 48; stack trace: *** > @ 0x2b54c95940b7 os::Linux::chained_handler() > @ 0x2b54c9598219 JVM_handle_linux_signal > @ 0x2b5496300340 (unknown) > @ 0x1451b8e > testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() > @ 0xe2ea6d > _ZN7testing8internal18FunctionMockerBaseIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS6_SaIS6_E10InvokeWithERKSt5tupleIJSC_EE > @ 0xe2b1bc testing::internal::FunctionMocker<>::Invoke() > @ 0x1118aed > mesos::internal::tests::SchedulerTest::Callbacks::received() > @ 0x111c453 > _ZNKSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS0_2v19scheduler5EventESt5dequeIS8_SaIS8_EclIJSE_EvEEvRS4_DpOT_ > @ 0x111c001 > _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi16__callIvJSF_EJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE > @ 
0x111b90d > _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi1clIJSF_EvEET0_DpOT_ > @ 0x111ae09 std::_Function_handler<>::_M_invoke() > @ 0x2b5493c6da09 std::function<>::operator()() > @ 0x2b5493c688ee process::AsyncExecutorProcess::execute<>() > @ 0x2b5493c6db2a > _ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS8_SaIS8_ESC_PvSG_SC_SJ_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSO_FSL_T1_T2_T3_ET4_T5_T6_ENKUlPNS_11ProcessBaseEE_clES11_ > @ 0x2b5493c765a4 > _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingNS0_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISC_SaISC_ESG_PvSK_SG_SN_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSS_FSP_T1_T2_T3_ET4_T5_T6_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ > @ 0x2b54946b1201 std::function<>::operator()() > @ 0x2b549469960f process::ProcessBase::visit() > @ 0x2b549469d480 process::DispatchEvent::visit() > @ 0x9dc0ba process::ProcessBase::serve() > @ 0x2b54946958cc process::ProcessManager::resume() > @ 0x2b5494692a9c > _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_ > @ 0x2b549469ccac > _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE > @ 0x2b549469cc5c > _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_ > @ 0x2b549469cbee > _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE > @ 0x2b549469cb45 > _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv > @ 0x2b549469cade > 
_ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv > @ 0x2b5495b81a40 (unknown) > @ 0x2b54962f8182 start_thread > @ 0x2b549660847d (unknown) > make[3]: *** [check-local] Segmentation fault > make[3]: Leaving directory `/home/vagrant/mesos/build/src' > make[2]: *** [check-am] Error 2 > make[2]: Leaving directory `/home/vagrant/mesos/build/src' > make[1
[jira] [Created] (MESOS-4031) slave crashed in cgroupsStatistics()
Steven created MESOS-4031: - Summary: slave crashed in cgroupstatistics() Key: MESOS-4031 URL: https://issues.apache.org/jira/browse/MESOS-4031 Project: Mesos Issue Type: Bug Components: containerization, libprocess Affects Versions: 0.24.0 Environment: Debian jessie Reporter: Steven Hi all, I have built a mesos cluster with three slaves. Any slave may sporadically crash when I get the summary through mesos master ui. Here is the stack trace. ``` slave.sh[13336]: I1201 11:54:12.827975 13338 slave.cpp:3926] Current disk usage 79.71%. Max allowed age: 17.279577136390834hrs slave.sh[13336]: I1201 11:55:12.829792 13342 slave.cpp:3926] Current disk usage 79.71%. Max allowed age: 17.279577136390834hrs slave.sh[13336]: I1201 11:55:38.389614 13342 http.cpp:189] HTTP GET for /slave(1)/state from 192.168.100.1:64870 with User-Agent='Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0' docker[8409]: time="2015-12-01T11:55:38.934148017+08:00" level=info msg="GET /v1.20/containers/mesos-b25be32d-41e1-4e14-9b84-d33d733cef51-S3.79c206a6-d6b5-487b-9390-e09292c5b53a/json" docker[8409]: time="2015-12-01T11:55:38.941489332+08:00" level=info msg="GET /v1.20/containers/mesos-b25be32d-41e1-4e14-9b84-d33d733cef51-S3.1e01a4b3-a76e-4bf6-8ce0-a4a937faf236/json" slave.sh[13336]: ABORT: (../../3rdparty/libprocess/3rdparty/stout/include/stout/result.hpp:110): Result::get() but state == NONE*** Aborted at 1448942139 (unix time) try "date -d @1448942139" if you are using GNU date *** slave.sh[13336]: PC: @ 0x7f295218a107 (unknown) slave.sh[13336]: *** SIGABRT (@0x3419) received by PID 13337 (TID 0x7f2948992700) from PID 13337; stack trace: *** slave.sh[13336]: @ 0x7f2952a2e8d0 (unknown) slave.sh[13336]: @ 0x7f295218a107 (unknown) slave.sh[13336]: @ 0x7f295218b4e8 (unknown) slave.sh[13336]: @ 0x43dc59 _Abort() slave.sh[13336]: @ 0x43dc87 _Abort() slave.sh[13336]: @ 0x7f2955e31c86 Result<>::get() slave.sh[13336]: @ 0x7f295637f017 
mesos::internal::slave::DockerContainerizerProcess::cgroupsStatistics() slave.sh[13336]: @ 0x7f295637dfea _ZZN5mesos8internal5slave26DockerContainerizerProcess5usageERKNS_11ContainerIDEENKUliE_clEi slave.sh[13336]: @ 0x7f295637e549 _ZZN5mesos8internal5slave26DockerContainerizerProcess5usageERKNS_11ContainerIDEENKUlRKN6Docker9ContainerEE0_clES9_ slave.sh[13336]: @ 0x7f295638453b ZN5mesos8internal5slave26DockerContainerizerProcess5usageERKNS1_11ContainerIDEEUlRKN6Docker9ContainerEE0_EcvSt8functionIFT_T0_EEINS_6FutureINS1_18ResourceStatisticsEEESB_EEvENKUlSB_E_clESB_ENKUlvE_clEv slave.sh[13336]: @ 0x7f295638751d FN7process6FutureIN5mesos18ResourceStatisticsEEEvEZZNKS0_9_DeferredIZNS2_8internal5slave26DockerContainerizerProcess5usageERKNS2_11ContainerIDEEUlRKN6Docker9ContainerEE0_EcvSt8functionIFT_T0_EEIS4_SG_EEvENKUlSG_E_clESG_EUlvE_E9_M_invoke slave.sh[13336]: @ 0x7f29563b53e7 std::function<>::operator()() slave.sh[13336]: @ 0x7f29563aa5dc _ZZN7process8dispatchIN5mesos18ResourceStatisticsEEENS_6FutureIT_EERKNS_4UPIDERKSt8functionIFS5_vEEENKUlPNS_11ProcessBaseEE_clESF_ slave.sh[13336]: @ 0x7f29563bd667 _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos18ResourceStatisticsEEENS0_6FutureIT_EERKNS0_4UPIDERKSt8functionIFS9_vEEEUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ slave.sh[13336]: @ 0x7f2956b893c3 std::function<>::operator()() slave.sh[13336]: @ 0x7f2956b72ab0 process::ProcessBase::visit() slave.sh[13336]: @ 0x7f2956b7588e process::DispatchEvent::visit() slave.sh[13336]: @ 0x7f2955d7f972 process::ProcessBase::serve() slave.sh[13336]: @ 0x7f2956b6ef8e process::ProcessManager::resume() slave.sh[13336]: @ 0x7f2956b63555 process::internal::schedule() slave.sh[13336]: @ 0x7f2956bc0839 _ZNSt12_Bind_simpleIFPFvvEvEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE slave.sh[13336]: @ 0x7f2956bc0781 std::_Bind_simple<>::operator()() slave.sh[13336]: @ 0x7f2956bc06fe std::thread::_Impl<>::_M_run() slave.sh[13336]: @ 0x7f29527ca970 (unknown) slave.sh[13336]: @ 
0x7f2952a270a4 start_thread slave.sh[13336]: @ 0x7f295223b04d (unknown) ``` -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-4025: -- Target Version/s: 0.27.0 (was: 0.26.0) > SlaveRecoveryTest/0.GCExecutor is flaky. > > > Key: MESOS-4025 > URL: https://issues.apache.org/jira/browse/MESOS-4025 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Till Toenshoff > Labels: test > > Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based > on 0.26.0-rc1. > Testsuite was run as root. > {noformat} > sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1 > {noformat} > {noformat} > [ RUN ] SlaveRecoveryTest/0.GCExecutor > I1130 16:49:16.336833 1032 exec.cpp:136] Version: 0.26.0 > I1130 16:49:16.345212 1049 exec.cpp:210] Executor registered on slave > dde9fd4e-b016-4a99-9081-b047e9df9afa-S0 > Registered executor on ubuntu14 > Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114 > sh -c 'sleep 1000' > Forked command at 1057 > ../../src/tests/mesos.cpp:779: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave': > Device or resource busy > *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are > using GNU date *** > PC: @ 0x1443e9a testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; > stack trace: *** > @ 0x7f1be92b80b7 os::Linux::chained_handler() > @ 0x7f1be92bc219 JVM_handle_linux_signal > @ 0x7f1bf7bbc340 (unknown) > @ 0x1443e9a testing::UnitTest::AddTestPartResult() > @ 0x1438b99 testing::internal::AssertHelper::operator=() > @ 0xf0b3bb > mesos::internal::tests::ContainerizerTest<>::TearDown() > @ 0x1461882 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145c6f8 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x143de4a testing::Test::Run() > @ 0x143e584 
testing::TestInfo::Run() > @ 0x143ebca testing::TestCase::Run() > @ 0x1445312 testing::internal::UnitTestImpl::RunAllTests() > @ 0x14624a7 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145d26e > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x14440ae testing::UnitTest::Run() > @ 0xd15cd4 RUN_ALL_TESTS() > @ 0xd158c1 main > @ 0x7f1bf7808ec5 (unknown) > @ 0x913009 (unknown) > {noformat} > My Vagrantfile generator; > {noformat} > #!/usr/bin/env bash > cat << EOF > Vagrantfile > # -*- mode: ruby -*-" > > # vi: set ft=ruby : > Vagrant.configure(2) do |config| > # Disable shared folder to prevent certain kernel module dependencies. > config.vm.synced_folder ".", "/vagrant", disabled: true > config.vm.box = "bento/ubuntu-14.04" > config.vm.hostname = "${PLATFORM_NAME}" > config.vm.provider "virtualbox" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > vb.customize ["modifyvm", :id, "--nictype1", "virtio"] > vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"] > vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"] > end > config.vm.provider "vmware_fusion" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > end > config.vm.provision "file", source: "../test.sh", destination: "~/test.sh" > config.vm.provision "shell", inline: <<-SHELL > sudo apt-get update > sudo apt-get -y install openjdk-7-jdk autoconf libtool > sudo apt-get -y install build-essential python-dev python-boto \ > libcurl4-nss-dev libsasl2-dev maven \ > libapr1-dev libsvn-dev libssl-dev libevent-dev > sudo apt-get -y install git > sudo wget -qO- https://get.docker.com/ | sh > SHELL > end > EOF > {noformat} > The problem is kicking in frequently in my tests - I'ld say > 10% but less > than 50%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033828#comment-15033828 ] Till Toenshoff commented on MESOS-4025: --- Thanks for your analysis. We will declare it as a non-blocker as it is a test-only issue according to your research. Thanks again!! > SlaveRecoveryTest/0.GCExecutor is flaky. > > > Key: MESOS-4025 > URL: https://issues.apache.org/jira/browse/MESOS-4025 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Till Toenshoff > Labels: test > > Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based > on 0.26.0-rc1. > Testsuite was run as root. > {noformat} > sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1 > {noformat} > {noformat} > [ RUN ] SlaveRecoveryTest/0.GCExecutor > I1130 16:49:16.336833 1032 exec.cpp:136] Version: 0.26.0 > I1130 16:49:16.345212 1049 exec.cpp:210] Executor registered on slave > dde9fd4e-b016-4a99-9081-b047e9df9afa-S0 > Registered executor on ubuntu14 > Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114 > sh -c 'sleep 1000' > Forked command at 1057 > ../../src/tests/mesos.cpp:779: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave': > Device or resource busy > *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are > using GNU date *** > PC: @ 0x1443e9a testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; > stack trace: *** > @ 0x7f1be92b80b7 os::Linux::chained_handler() > @ 0x7f1be92bc219 JVM_handle_linux_signal > @ 0x7f1bf7bbc340 (unknown) > @ 0x1443e9a testing::UnitTest::AddTestPartResult() > @ 0x1438b99 testing::internal::AssertHelper::operator=() > @ 0xf0b3bb > mesos::internal::tests::ContainerizerTest<>::TearDown() > @ 0x1461882 > 
testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145c6f8 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x143de4a testing::Test::Run() > @ 0x143e584 testing::TestInfo::Run() > @ 0x143ebca testing::TestCase::Run() > @ 0x1445312 testing::internal::UnitTestImpl::RunAllTests() > @ 0x14624a7 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145d26e > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x14440ae testing::UnitTest::Run() > @ 0xd15cd4 RUN_ALL_TESTS() > @ 0xd158c1 main > @ 0x7f1bf7808ec5 (unknown) > @ 0x913009 (unknown) > {noformat} > My Vagrantfile generator: > {noformat} > #!/usr/bin/env bash > cat << EOF > Vagrantfile > # -*- mode: ruby -*-" > > # vi: set ft=ruby : > Vagrant.configure(2) do |config| > # Disable shared folder to prevent certain kernel module dependencies. > config.vm.synced_folder ".", "/vagrant", disabled: true > config.vm.box = "bento/ubuntu-14.04" > config.vm.hostname = "${PLATFORM_NAME}" > config.vm.provider "virtualbox" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > vb.customize ["modifyvm", :id, "--nictype1", "virtio"] > vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"] > vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"] > end > config.vm.provider "vmware_fusion" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > end > config.vm.provision "file", source: "../test.sh", destination: "~/test.sh" > config.vm.provision "shell", inline: <<-SHELL > sudo apt-get update > sudo apt-get -y install openjdk-7-jdk autoconf libtool > sudo apt-get -y install build-essential python-dev python-boto \ > libcurl4-nss-dev libsasl2-dev maven \ > libapr1-dev libsvn-dev libssl-dev libevent-dev > sudo apt-get -y install git > sudo wget -qO- https://get.docker.com/ | sh > SHELL > end > EOF > {noformat} > The problem is kicking in frequently in my tests - I'd say > 10% but less > than 50%. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4029) ContentType/SchedulerTest is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033781#comment-15033781 ] Alexander Rojas commented on MESOS-4029: After applying the patch I still got the following crashes: {noformat} [ OK ] ContentType/SchedulerTest.Subscribe/0 (66 ms) [ RUN ] ContentType/SchedulerTest.Subscribe/1 @ 0x7fb100193686 google::LogMessage::Fail() @ 0x7fb100198dac google::RawLog__() @ 0x7fb0ff3d9c14 __cxa_pure_virtual @ 0x14e3c38 testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith() @ 0xe20259 _ZN7testing8internal18FunctionMockerBaseIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS6_SaIS6_EEE EE10InvokeWithERKSt5tupleIJSC_EE @ 0xe1c9a8 testing::internal::FunctionMocker<>::Invoke() @ 0x118d6b9 mesos::internal::tests::SchedulerTest::Callbacks::received() @ 0x119101f _ZNKSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS0_2v19scheduler5EventES t5dequeIS8_SaIS8_EclIJSE_EvEEvRS4_DpOT_ @ 0x1190bcd _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19schedule r5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi16__callIvJSF_EJLm0ELm1T_OSt5tupleIJDpT0_EES t12_Index_tupleIJXspT1_EEE @ 0x11904d9 _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19schedule r5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi1clIJSF_EvEET0_DpOT_ @ 0x118f9d5 std::_Function_handler<>::_M_invoke() @ 0x7fb0ff69b103 std::function<>::operator()() @ 0x7fb0ff695fe8 process::AsyncExecutorProcess::execute<>() @ 0x7fb0ff69b224 _ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19schedule r5EventESt5dequeIS8_SaIS8_ESC_PvSG_SC_SJ_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSO_FSL_T1_T2_T3_ET4_T5_T6_ENKUlPNS_11ProcessBaseEE _clES11_ @ 0x7fb0ff6a3c9e 
_ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingNS0_20AsyncExecutorProcessE RKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISC_SaISC_ESG_PvSK_SG_SN_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSS _FSP_T1_T2_T3_ET4_T5_T6_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ @ 0x7fb1001015e1 std::function<>::operator()() @ 0x7fb1000e9927 process::ProcessBase::visit() @ 0x7fb1000ed516 process::DispatchEvent::visit() @ 0x9e844a process::ProcessBase::serve() @ 0x7fb1000e5bf0 process::ProcessManager::resume() @ 0x7fb1000e2ca6 _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_ @ 0x7fb1000eccd8 _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_ EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE @ 0x7fb1000ecc88 _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_ EEEclIIEvEET0_DpOT_ @ 0x7fb1000ecc1a _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17ref erence_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE @ 0x7fb1000ecb71 _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17ref erence_wrapperIS4_EEEvEEclEv @ 0x7fb1000ecb0a _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atom ic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv @ 0x7fb0fb4a9a40 (unknown) @ 0x7fb0facc6182 start_thread @ 0x7fb0fa9f347d (unknown) Aborted (core dumped) {noformat} And {noformat} [ RUN ] ContentType/SchedulerTest.Subscribe/1 I1201 15:20:59.848814 32637 leveldb.cpp:174] Opened db in 5.713001ms I1201 15:20:59.850643 32637 leveldb.cpp:181] Compacted db in 1.722714ms I1201 15:20:59.851052 32637 leveldb.cpp:196] Created db iterator in 120371ns I1201 15:20:59.851768 32637 leveldb.cpp:202] Seeked to beginning of db in 3411ns I1201 15:20:59.851850 32637 leveldb.cpp:271] Iterated through 0 keys in 
the db in 15133ns I1201 15:20:59.852177 32637 replica.cpp:778] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I1201 15:20:59.853752 32657 recover.cpp:447] Starting replica recovery I1201 15:20:59.854022 32657 recover.cpp:473] Replica is in EMPTY status I1201 15:20:59.855265 32652 replica.cpp:674] Replica in EMPTY status received a broadcasted recover request from (6918)@127.0.1.1:46010 I1201 15:20:59.855675 32652 recover.cpp:193] Received a recover response from a replica in EMPTY status I1201 15:20:59.855649 32656 master.cpp:365] Master b893dcee-362e-4fcf-81ac-d190058b8682 (ubuntu-vm) started on 127.0.1.1:46010 I1201 15:20:59.856055
[jira] [Comment Edited] (MESOS-3718) Implement Quota support in allocator
[ https://issues.apache.org/jira/browse/MESOS-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971338#comment-14971338 ] Alexander Rukletsov edited comment on MESOS-3718 at 12/1/15 2:18 PM: - https://reviews.apache.org/r/39399/ https://reviews.apache.org/r/39400/ https://reviews.apache.org/r/40551/ https://reviews.apache.org/r/40795/ https://reviews.apache.org/r/40821/ was (Author: alexr): https://reviews.apache.org/r/39399/ https://reviews.apache.org/r/39400/ https://reviews.apache.org/r/40551/ https://reviews.apache.org/r/40795/ > Implement Quota support in allocator > > > Key: MESOS-3718 > URL: https://issues.apache.org/jira/browse/MESOS-3718 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov > Labels: mesosphere > > The built-in Hierarchical DRF allocator should support Quota. This includes > (but is not limited to): adding, updating, removing and satisfying quota; > avoiding both overcommitting resources and handing them to non-quota'ed roles > in the presence of master failover. > A [design doc for Quota support in > Allocator|https://issues.apache.org/jira/browse/MESOS-2937] provides an > overview of the feature set required to be implemented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3273) EventCall Test Framework is flaky
[ https://issues.apache.org/jira/browse/MESOS-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Bannier updated MESOS-3273: Attachment: asan.log Clang's address sanitizer reports use-after-free errors for this test which appear to come from the libevent bindings; I have attached a log. It might be a good idea to address that issue first. > EventCall Test Framework is flaky > - > > Key: MESOS-3273 > URL: https://issues.apache.org/jira/browse/MESOS-3273 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Affects Versions: 0.24.0 > Environment: > https://builds.apache.org/job/Mesos/705/COMPILER=clang,CONFIGURATION=--verbose,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/consoleFull >Reporter: Vinod Kone > Labels: flaky-test, tech-debt, twitter > Attachments: asan.log > > > Observed this on ASF CI. h/t [~haosd...@gmail.com] > Looks like the HTTP scheduler never sent a SUBSCRIBE request to the master. > {code} > [ RUN ] ExamplesTest.EventCallFramework > Using temporary directory '/tmp/ExamplesTest_EventCallFramework_k4vXkx' > I0813 19:55:15.643579 26085 exec.cpp:443] Ignoring exited event because the > driver is aborted! 
> Shutting down > Sending SIGTERM to process tree at pid 26061 > Killing the following process trees: > [ > ] > Shutting down > Sending SIGTERM to process tree at pid 26062 > Shutting down > Killing the following process trees: > [ > ] > Sending SIGTERM to process tree at pid 26063 > Killing the following process trees: > [ > ] > Shutting down > Sending SIGTERM to process tree at pid 26098 > Killing the following process trees: > [ > ] > Shutting down > Sending SIGTERM to process tree at pid 26099 > Killing the following process trees: > [ > ] > WARNING: Logging before InitGoogleLogging() is written to STDERR > I0813 19:55:17.161726 26100 process.cpp:1012] libprocess is initialized on > 172.17.2.10:60249 for 16 cpus > I0813 19:55:17.161888 26100 logging.cpp:177] Logging to STDERR > I0813 19:55:17.163625 26100 scheduler.cpp:157] Version: 0.24.0 > I0813 19:55:17.175302 26100 leveldb.cpp:176] Opened db in 3.167446ms > I0813 19:55:17.176393 26100 leveldb.cpp:183] Compacted db in 1.047996ms > I0813 19:55:17.176496 26100 leveldb.cpp:198] Created db iterator in 77155ns > I0813 19:55:17.176518 26100 leveldb.cpp:204] Seeked to beginning of db in > 8429ns > I0813 19:55:17.176527 26100 leveldb.cpp:273] Iterated through 0 keys in the > db in 4219ns > I0813 19:55:17.176708 26100 replica.cpp:744] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0813 19:55:17.178951 26136 recover.cpp:449] Starting replica recovery > I0813 19:55:17.179934 26136 recover.cpp:475] Replica is in EMPTY status > I0813 19:55:17.181970 26126 master.cpp:378] Master > 20150813-195517-167907756-60249-26100 (297daca2d01a) started on > 172.17.2.10:60249 > I0813 19:55:17.182317 26126 master.cpp:380] Flags at startup: > --acls="permissive: false > register_frameworks { > principals { > type: SOME > values: "test-principal" > } > roles { > type: SOME > values: "*" > } > } > run_tasks { > principals { > type: SOME > values: "test-principal" > } > users { > type: SOME > values: "mesos" 
> } > } > " --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate="false" --authenticate_slaves="false" > --authenticators="crammd5" > --credentials="/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials" > --framework_sorter="drf" --help="false" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --max_slave_ping_timeouts="5" --quiet="false" > --recovery_slave_removal_limit="100%" --registry="replicated_log" > --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" > --registry_strict="false" --root_submissions="true" > --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" > --user_sorter="drf" --version="false" > --webui_dir="/mesos/mesos-0.24.0/src/webui" --work_dir="/tmp/mesos-II8Gua" > --zk_session_timeout="10secs" > I0813 19:55:17.183475 26126 master.cpp:427] Master allowing unauthenticated > frameworks to register > I0813 19:55:17.183536 26126 master.cpp:432] Master allowing unauthenticated > slaves to register > I0813 19:55:17.183615 26126 credentials.hpp:37] Loading credentials for > authentication from '/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials' > W0813 19:55:17.183859 26126 credentials.hpp:52] Permissions on credentials > file '/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials' are too open. > It is recommended that your credentials file is NOT accessible by others. > I0813 19:55:17.183969 26123 replica.cpp:641] Replica in EMPTY status received > a broadcasted recover request > I0813 19:55:17.184306
[jira] [Updated] (MESOS-4030) DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping is flaky
[ https://issues.apache.org/jira/browse/MESOS-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-4030: -- Assignee: Timothy Chen (was: Benjamin Bannier) > DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping is flaky > --- > > Key: MESOS-4030 > URL: https://issues.apache.org/jira/browse/MESOS-4030 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.26.0 > Environment: [Ubuntu > 14|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh], > 0.26.0 RC (wip) enable-ssl & enable-libevent, root test-run >Reporter: Till Toenshoff >Assignee: Timothy Chen > Labels: flaky, flaky-test > > {noformat} > [ RUN ] DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping > I1201 02:18:00.325283 18931 leveldb.cpp:176] Opened db in 3.877576ms > I1201 02:18:00.326195 18931 leveldb.cpp:183] Compacted db in 831923ns > I1201 02:18:00.326288 18931 leveldb.cpp:198] Created db iterator in 21460ns > I1201 02:18:00.326305 18931 leveldb.cpp:204] Seeked to beginning of db in > 1431ns > I1201 02:18:00.326316 18931 leveldb.cpp:273] Iterated through 0 keys in the > db in 178ns > I1201 02:18:00.326354 18931 replica.cpp:780] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I1201 02:18:00.327128 18952 recover.cpp:449] Starting replica recovery > I1201 02:18:00.327481 18948 recover.cpp:475] Replica is in EMPTY status > I1201 02:18:00.328354 18945 replica.cpp:676] Replica in EMPTY status received > a broadcasted recover request from (88123)@127.0.1.1:45788 > I1201 02:18:00.328660 18950 recover.cpp:195] Received a recover response from > a replica in EMPTY status > I1201 02:18:00.329139 18951 recover.cpp:566] Updating replica status to > STARTING > I1201 02:18:00.330413 18949 master.cpp:367] Master > 9577131b-f0b1-47bd-8f88-f5edbf2f026d (ubuntu14) started on 127.0.1.1:45788 > I1201 02:18:00.330474 18949 master.cpp:369] Flags at startup: --acls="" > --allocation_interval="1secs" 
--allocator="HierarchicalDRF" > --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/dHFLJX/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="25secs" --registry_strict="true" > --root_submissions="true" --slave_ping_timeout="15secs" > --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/dHFLJX/master" > --zk_session_timeout="10secs" > I1201 02:18:00.330662 18949 master.cpp:414] Master only allowing > authenticated frameworks to register > I1201 02:18:00.330670 18949 master.cpp:419] Master only allowing > authenticated slaves to register > I1201 02:18:00.330682 18949 credentials.hpp:37] Loading credentials for > authentication from '/tmp/dHFLJX/credentials' > I1201 02:18:00.330950 18945 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 1.585892ms > I1201 02:18:00.331248 18945 replica.cpp:323] Persisted replica status to > STARTING > I1201 02:18:00.330968 18949 master.cpp:458] Using default 'crammd5' > authenticator > I1201 02:18:00.331681 18949 master.cpp:495] Authorization enabled > I1201 02:18:00.331717 18945 recover.cpp:475] Replica is in STARTING status > I1201 02:18:00.332875 18947 replica.cpp:676] Replica in STARTING status > received a broadcasted recover request from (88124)@127.0.1.1:45788 > I1201 02:18:00.44 18947 recover.cpp:195] Received a recover response from > a replica in STARTING status > I1201 02:18:00.333760 18950 recover.cpp:566] Updating replica status to VOTING > I1201 02:18:00.333875 18945 master.cpp:1606] The newly elected leader is > master@127.0.1.1:45788 with id 
9577131b-f0b1-47bd-8f88-f5edbf2f026d > I1201 02:18:00.334624 18951 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 307292ns > I1201 02:18:00.334652 18951 replica.cpp:323] Persisted replica status to > VOTING > I1201 02:18:00.334656 18945 master.cpp:1619] Elected as the leading master! > I1201 02:18:00.334758 18951 recover.cpp:580] Successfully joined the Paxos > group > I1201 02:18:00.334933 18945 master.cpp:1379] Recovering from registrar > I1201 02:18:00.335108 18951 recover.cpp:464] Recover process terminated > I1201 02:18:00.335183 18951 registrar.cpp:309] Recovering registrar > I1201 02:18:00.335577 18950 log.cpp:661] Attempting to start the writer > I1201 02:18:00.336777 18952 replica.cpp:496] Replica received implicit > promise
[jira] [Updated] (MESOS-4030) DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping is flaky
[ https://issues.apache.org/jira/browse/MESOS-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-4030: -- Target Version/s: 0.27.0 (was: 0.26.0) > DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping is flaky > --- > > Key: MESOS-4030 > URL: https://issues.apache.org/jira/browse/MESOS-4030 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.26.0 > Environment: [Ubuntu > 14|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh], > 0.26.0 RC (wip) enable-ssl & enable-libevent, root test-run >Reporter: Till Toenshoff >Assignee: Timothy Chen > Labels: flaky, flaky-test > > {noformat} > [ RUN ] DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping > I1201 02:18:00.325283 18931 leveldb.cpp:176] Opened db in 3.877576ms > I1201 02:18:00.326195 18931 leveldb.cpp:183] Compacted db in 831923ns > I1201 02:18:00.326288 18931 leveldb.cpp:198] Created db iterator in 21460ns > I1201 02:18:00.326305 18931 leveldb.cpp:204] Seeked to beginning of db in > 1431ns > I1201 02:18:00.326316 18931 leveldb.cpp:273] Iterated through 0 keys in the > db in 178ns > I1201 02:18:00.326354 18931 replica.cpp:780] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I1201 02:18:00.327128 18952 recover.cpp:449] Starting replica recovery > I1201 02:18:00.327481 18948 recover.cpp:475] Replica is in EMPTY status > I1201 02:18:00.328354 18945 replica.cpp:676] Replica in EMPTY status received > a broadcasted recover request from (88123)@127.0.1.1:45788 > I1201 02:18:00.328660 18950 recover.cpp:195] Received a recover response from > a replica in EMPTY status > I1201 02:18:00.329139 18951 recover.cpp:566] Updating replica status to > STARTING > I1201 02:18:00.330413 18949 master.cpp:367] Master > 9577131b-f0b1-47bd-8f88-f5edbf2f026d (ubuntu14) started on 127.0.1.1:45788 > I1201 02:18:00.330474 18949 master.cpp:369] Flags at startup: --acls="" > --allocation_interval="1secs" 
--allocator="HierarchicalDRF" > --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/dHFLJX/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="25secs" --registry_strict="true" > --root_submissions="true" --slave_ping_timeout="15secs" > --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/dHFLJX/master" > --zk_session_timeout="10secs" > I1201 02:18:00.330662 18949 master.cpp:414] Master only allowing > authenticated frameworks to register > I1201 02:18:00.330670 18949 master.cpp:419] Master only allowing > authenticated slaves to register > I1201 02:18:00.330682 18949 credentials.hpp:37] Loading credentials for > authentication from '/tmp/dHFLJX/credentials' > I1201 02:18:00.330950 18945 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 1.585892ms > I1201 02:18:00.331248 18945 replica.cpp:323] Persisted replica status to > STARTING > I1201 02:18:00.330968 18949 master.cpp:458] Using default 'crammd5' > authenticator > I1201 02:18:00.331681 18949 master.cpp:495] Authorization enabled > I1201 02:18:00.331717 18945 recover.cpp:475] Replica is in STARTING status > I1201 02:18:00.332875 18947 replica.cpp:676] Replica in STARTING status > received a broadcasted recover request from (88124)@127.0.1.1:45788 > I1201 02:18:00.44 18947 recover.cpp:195] Received a recover response from > a replica in STARTING status > I1201 02:18:00.333760 18950 recover.cpp:566] Updating replica status to VOTING > I1201 02:18:00.333875 18945 master.cpp:1606] The newly elected leader is > master@127.0.1.1:45788 with id 
9577131b-f0b1-47bd-8f88-f5edbf2f026d > I1201 02:18:00.334624 18951 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 307292ns > I1201 02:18:00.334652 18951 replica.cpp:323] Persisted replica status to > VOTING > I1201 02:18:00.334656 18945 master.cpp:1619] Elected as the leading master! > I1201 02:18:00.334758 18951 recover.cpp:580] Successfully joined the Paxos > group > I1201 02:18:00.334933 18945 master.cpp:1379] Recovering from registrar > I1201 02:18:00.335108 18951 recover.cpp:464] Recover process terminated > I1201 02:18:00.335183 18951 registrar.cpp:309] Recovering registrar > I1201 02:18:00.335577 18950 log.cpp:661] Attempting to start the writer > I1201 02:18:00.336777 18952 replica.cpp:496] Replica received implicit > promise request
[jira] [Commented] (MESOS-4030) DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping is flaky
[ https://issues.apache.org/jira/browse/MESOS-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033658#comment-15033658 ] Benjamin Bannier commented on MESOS-4030: - This appears to be a race in the test code: we cannot parse the containerizer's stdout or continue with the cleanup before the containerizer has finished running. We could, e.g., capture calls to {{Docker::_run}} to get notified once we are ready to proceed. > DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping is flaky > --- > > Key: MESOS-4030 > URL: https://issues.apache.org/jira/browse/MESOS-4030 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.26.0 > Environment: [Ubuntu > 14|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh], > 0.26.0 RC (wip) enable-ssl & enable-libevent, root test-run >Reporter: Till Toenshoff >Assignee: Benjamin Bannier > Labels: flaky, flaky-test > > {noformat} > [ RUN ] DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping > I1201 02:18:00.325283 18931 leveldb.cpp:176] Opened db in 3.877576ms > I1201 02:18:00.326195 18931 leveldb.cpp:183] Compacted db in 831923ns > I1201 02:18:00.326288 18931 leveldb.cpp:198] Created db iterator in 21460ns > I1201 02:18:00.326305 18931 leveldb.cpp:204] Seeked to beginning of db in > 1431ns > I1201 02:18:00.326316 18931 leveldb.cpp:273] Iterated through 0 keys in the > db in 178ns > I1201 02:18:00.326354 18931 replica.cpp:780] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I1201 02:18:00.327128 18952 recover.cpp:449] Starting replica recovery > I1201 02:18:00.327481 18948 recover.cpp:475] Replica is in EMPTY status > I1201 02:18:00.328354 18945 replica.cpp:676] Replica in EMPTY status received > a broadcasted recover request from (88123)@127.0.1.1:45788 > I1201 02:18:00.328660 18950 recover.cpp:195] Received a recover response from > a replica in EMPTY status > I1201 02:18:00.329139 18951 recover.cpp:566] Updating 
replica status to > STARTING > I1201 02:18:00.330413 18949 master.cpp:367] Master > 9577131b-f0b1-47bd-8f88-f5edbf2f026d (ubuntu14) started on 127.0.1.1:45788 > I1201 02:18:00.330474 18949 master.cpp:369] Flags at startup: --acls="" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/dHFLJX/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="25secs" --registry_strict="true" > --root_submissions="true" --slave_ping_timeout="15secs" > --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/dHFLJX/master" > --zk_session_timeout="10secs" > I1201 02:18:00.330662 18949 master.cpp:414] Master only allowing > authenticated frameworks to register > I1201 02:18:00.330670 18949 master.cpp:419] Master only allowing > authenticated slaves to register > I1201 02:18:00.330682 18949 credentials.hpp:37] Loading credentials for > authentication from '/tmp/dHFLJX/credentials' > I1201 02:18:00.330950 18945 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 1.585892ms > I1201 02:18:00.331248 18945 replica.cpp:323] Persisted replica status to > STARTING > I1201 02:18:00.330968 18949 master.cpp:458] Using default 'crammd5' > authenticator > I1201 02:18:00.331681 18949 master.cpp:495] Authorization enabled > I1201 02:18:00.331717 18945 recover.cpp:475] Replica is in STARTING status > I1201 02:18:00.332875 18947 replica.cpp:676] Replica in STARTING status > received a broadcasted recover request from (88124)@127.0.1.1:45788 > I1201 02:18:00.44 18947 
recover.cpp:195] Received a recover response from > a replica in STARTING status > I1201 02:18:00.333760 18950 recover.cpp:566] Updating replica status to VOTING > I1201 02:18:00.333875 18945 master.cpp:1606] The newly elected leader is > master@127.0.1.1:45788 with id 9577131b-f0b1-47bd-8f88-f5edbf2f026d > I1201 02:18:00.334624 18951 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 307292ns > I1201 02:18:00.334652 18951 replica.cpp:323] Persisted replica status to > VOTING > I1201 02:18:00.334656 18945 master.cpp:1619] Elected as the leading master! > I1201 02:18:00.334758 18951 recover.cpp:580] Successfully joined the Paxos > group > I1201 02:18:00.334933 18945 master.cpp:1379] Recovering from registrar > I1201 02:18:00.33
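The race described in the comment above boils down to "don't parse the containerizer's stdout until the run has finished." A minimal sketch of that synchronization pattern follows; it uses `std::promise`/`std::future` as a stand-in for intercepting {{Docker::_run}} with a mock expectation and a libprocess `Future`, and all names in it are illustrative, not the actual Mesos test code.

```cpp
#include <future>
#include <string>
#include <thread>

// Returns the "containerizer" output only after the asynchronous run has
// completed, mirroring the proposed fix of capturing the call that signals
// completion instead of racing ahead to parse stdout.
std::string runAndWait() {
  std::promise<std::string> finished;
  std::future<std::string> output = finished.get_future();

  // Simulates the containerizer producing its stdout asynchronously.
  std::thread containerizer([&finished]() {
    finished.set_value("mapped port: 31000");
  });

  // Blocking here removes the race: parsing and cleanup only happen
  // once the run is known to be done.
  std::string result = output.get();
  containerizer.join();
  return result;
}
```

In the real test the same effect would come from a gmock expectation on the run call whose satisfaction the test awaits before inspecting output.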
[jira] [Updated] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Schlicht updated MESOS-4025: Assignee: (was: Jan Schlicht) > SlaveRecoveryTest/0.GCExecutor is flaky. > > > Key: MESOS-4025 > URL: https://issues.apache.org/jira/browse/MESOS-4025 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Till Toenshoff > Labels: test > > Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based > on 0.26.0-rc1. > Testsuite was run as root. > {noformat} > sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1 > {noformat} > {noformat} > [ RUN ] SlaveRecoveryTest/0.GCExecutor > I1130 16:49:16.336833 1032 exec.cpp:136] Version: 0.26.0 > I1130 16:49:16.345212 1049 exec.cpp:210] Executor registered on slave > dde9fd4e-b016-4a99-9081-b047e9df9afa-S0 > Registered executor on ubuntu14 > Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114 > sh -c 'sleep 1000' > Forked command at 1057 > ../../src/tests/mesos.cpp:779: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave': > Device or resource busy > *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are > using GNU date *** > PC: @ 0x1443e9a testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; > stack trace: *** > @ 0x7f1be92b80b7 os::Linux::chained_handler() > @ 0x7f1be92bc219 JVM_handle_linux_signal > @ 0x7f1bf7bbc340 (unknown) > @ 0x1443e9a testing::UnitTest::AddTestPartResult() > @ 0x1438b99 testing::internal::AssertHelper::operator=() > @ 0xf0b3bb > mesos::internal::tests::ContainerizerTest<>::TearDown() > @ 0x1461882 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145c6f8 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x143de4a testing::Test::Run() > @ 0x143e584 testing::TestInfo::Run() > @ 
0x143ebca testing::TestCase::Run() > @ 0x1445312 testing::internal::UnitTestImpl::RunAllTests() > @ 0x14624a7 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145d26e > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x14440ae testing::UnitTest::Run() > @ 0xd15cd4 RUN_ALL_TESTS() > @ 0xd158c1 main > @ 0x7f1bf7808ec5 (unknown) > @ 0x913009 (unknown) > {noformat} > My Vagrantfile generator: > {noformat} > #!/usr/bin/env bash > cat << EOF > Vagrantfile > # -*- mode: ruby -*-" > > # vi: set ft=ruby : > Vagrant.configure(2) do |config| > # Disable shared folder to prevent certain kernel module dependencies. > config.vm.synced_folder ".", "/vagrant", disabled: true > config.vm.box = "bento/ubuntu-14.04" > config.vm.hostname = "${PLATFORM_NAME}" > config.vm.provider "virtualbox" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > vb.customize ["modifyvm", :id, "--nictype1", "virtio"] > vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"] > vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"] > end > config.vm.provider "vmware_fusion" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > end > config.vm.provision "file", source: "../test.sh", destination: "~/test.sh" > config.vm.provision "shell", inline: <<-SHELL > sudo apt-get update > sudo apt-get -y install openjdk-7-jdk autoconf libtool > sudo apt-get -y install build-essential python-dev python-boto \ > libcurl4-nss-dev libsasl2-dev maven \ > libapr1-dev libsvn-dev libssl-dev libevent-dev > sudo apt-get -y install git > sudo wget -qO- https://get.docker.com/ | sh > SHELL > end > EOF > {noformat} > The problem is kicking in frequently in my tests - I'd say > 10% but less > than 50%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3581) License headers show up all over doxygen documentation.
[ https://issues.apache.org/jira/browse/MESOS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Bannier updated MESOS-3581: Shepherd: Michael Park (was: Bernd Mathiske) > License headers show up all over doxygen documentation. > --- > > Key: MESOS-3581 > URL: https://issues.apache.org/jira/browse/MESOS-3581 > Project: Mesos > Issue Type: Documentation > Components: documentation >Affects Versions: 0.24.1 >Reporter: Benjamin Bannier >Assignee: Benjamin Bannier >Priority: Minor > Labels: mesosphere > > Currently license headers are commented in something resembling Javadoc style, > {code} > /** > * Licensed ... > {code} > Since we use Javadoc-style comment blocks for doxygen documentation all > license headers appear in the generated documentation, potentially and likely > hiding the actual documentation. > Using {{/*}} to start the comment blocks would be enough to hide them from > doxygen, but would likely also result in a largish (though mostly > uninteresting) patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
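The distinction the ticket relies on is that doxygen treats a comment opening with two asterisks as documentation for the following entity, while a plain C-style comment is ignored. A small illustrative sketch (the function names are made up for the example):

```cpp
/**
 * Parsed by doxygen: a Javadoc-style block (opening with two asterisks)
 * is attached to the entity that follows, which is why license headers
 * written this way leak into the generated documentation.
 */
int documented() { return 1; }

/*
 * Ignored by doxygen: a plain C-style block (opening with a single
 * asterisk) is an ordinary comment, so starting license headers this
 * way would hide them from the generated docs.
 */
int undocumented() { return 2; }
```

This is why the proposed fix is purely mechanical: changing only the opening delimiter of each license header, at the cost of a large but uninteresting patch.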
[jira] [Commented] (MESOS-4020) Introduce filter for non-revocable resources in `Resources`
[ https://issues.apache.org/jira/browse/MESOS-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033562#comment-15033562 ] Michael Park commented on MESOS-4020: - {noformat} commit ffbed5ea3bb059ed6dd8830d8a6acc5195ed3683 Author: Alexander Rukletsov Date: Tue Dec 1 06:50:41 2015 -0500 Updated codebase to use `nonRevocable()` where appropriate. Review: https://reviews.apache.org/r/40756 {noformat} {noformat} commit dba67f5dd3d99f26b3d7331efd96706a0be905dd Author: Alexander Rukletsov Date: Tue Dec 1 06:39:46 2015 -0500 Introduced filter for non-revocable resources. Review: https://reviews.apache.org/r/40755 {noformat} > Introduce filter for non-revocable resources in `Resources` > --- > > Key: MESOS-4020 > URL: https://issues.apache.org/jira/browse/MESOS-4020 > Project: Mesos > Issue Type: Improvement >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov >Priority: Minor > Labels: mesosphere > Fix For: 0.27.0 > > > The {{Resources}} class defines some handy filters, like {{revocable()}}, > {{unreserved()}}, and so on. This ticket proposes to add one more: > {{nonRevocable()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
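For illustration, a minimal sketch of what such a filter amounts to, using a simplified stand-in for the resource type (the real Mesos {{Resources}} class wraps protobufs and differs in detail; this is not the committed implementation):

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Simplified stand-in for mesos::Resource (the real type is a protobuf
// message with a RevocableInfo sub-message rather than a plain flag).
struct Resource {
  bool revocable;
};

// Sketch of a nonRevocable() filter: like revocable() or unreserved(),
// it is just a predicate applied over the resource collection.
std::vector<Resource> nonRevocable(const std::vector<Resource>& resources) {
  std::vector<Resource> result;
  std::copy_if(resources.begin(), resources.end(), std::back_inserter(result),
               [](const Resource& r) { return !r.revocable; });
  return result;
}
```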
[jira] [Updated] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Schlicht updated MESOS-4025: Sprint: Mesosphere Sprint 23 Story Points: 3 Labels: test (was: ) > SlaveRecoveryTest/0.GCExecutor is flaky. > > > Key: MESOS-4025 > URL: https://issues.apache.org/jira/browse/MESOS-4025 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Till Toenshoff >Assignee: Jan Schlicht > Labels: test > > Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based > on 0.26.0-rc1. > Testsuite was run as root. > {noformat} > sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1 > {noformat} > {noformat} > [ RUN ] SlaveRecoveryTest/0.GCExecutor > I1130 16:49:16.336833 1032 exec.cpp:136] Version: 0.26.0 > I1130 16:49:16.345212 1049 exec.cpp:210] Executor registered on slave > dde9fd4e-b016-4a99-9081-b047e9df9afa-S0 > Registered executor on ubuntu14 > Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114 > sh -c 'sleep 1000' > Forked command at 1057 > ../../src/tests/mesos.cpp:779: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave': > Device or resource busy > *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are > using GNU date *** > PC: @ 0x1443e9a testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; > stack trace: *** > @ 0x7f1be92b80b7 os::Linux::chained_handler() > @ 0x7f1be92bc219 JVM_handle_linux_signal > @ 0x7f1bf7bbc340 (unknown) > @ 0x1443e9a testing::UnitTest::AddTestPartResult() > @ 0x1438b99 testing::internal::AssertHelper::operator=() > @ 0xf0b3bb > mesos::internal::tests::ContainerizerTest<>::TearDown() > @ 0x1461882 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145c6f8 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x143de4a 
testing::Test::Run() > @ 0x143e584 testing::TestInfo::Run() > @ 0x143ebca testing::TestCase::Run() > @ 0x1445312 testing::internal::UnitTestImpl::RunAllTests() > @ 0x14624a7 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145d26e > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x14440ae testing::UnitTest::Run() > @ 0xd15cd4 RUN_ALL_TESTS() > @ 0xd158c1 main > @ 0x7f1bf7808ec5 (unknown) > @ 0x913009 (unknown) > {noformat} > My Vagrantfile generator; > {noformat} > #!/usr/bin/env bash > cat << EOF > Vagrantfile > # -*- mode: ruby -*-" > > # vi: set ft=ruby : > Vagrant.configure(2) do |config| > # Disable shared folder to prevent certain kernel module dependencies. > config.vm.synced_folder ".", "/vagrant", disabled: true > config.vm.box = "bento/ubuntu-14.04" > config.vm.hostname = "${PLATFORM_NAME}" > config.vm.provider "virtualbox" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > vb.customize ["modifyvm", :id, "--nictype1", "virtio"] > vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"] > vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"] > end > config.vm.provider "vmware_fusion" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > end > config.vm.provision "file", source: "../test.sh", destination: "~/test.sh" > config.vm.provision "shell", inline: <<-SHELL > sudo apt-get update > sudo apt-get -y install openjdk-7-jdk autoconf libtool > sudo apt-get -y install build-essential python-dev python-boto \ > libcurl4-nss-dev libsasl2-dev maven \ > libapr1-dev libsvn-dev libssl-dev libevent-dev > sudo apt-get -y install git > sudo wget -qO- https://get.docker.com/ | sh > SHELL > end > EOF > {noformat} > The problem is kicking in frequently in my tests - I'ld say > 10% but less > than 50%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Schlicht reassigned MESOS-4025: --- Assignee: Jan Schlicht > SlaveRecoveryTest/0.GCExecutor is flaky. > > > Key: MESOS-4025 > URL: https://issues.apache.org/jira/browse/MESOS-4025 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Till Toenshoff >Assignee: Jan Schlicht > > Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based > on 0.26.0-rc1. > Testsuite was run as root. > {noformat} > sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1 > {noformat} > {noformat} > [ RUN ] SlaveRecoveryTest/0.GCExecutor > I1130 16:49:16.336833 1032 exec.cpp:136] Version: 0.26.0 > I1130 16:49:16.345212 1049 exec.cpp:210] Executor registered on slave > dde9fd4e-b016-4a99-9081-b047e9df9afa-S0 > Registered executor on ubuntu14 > Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114 > sh -c 'sleep 1000' > Forked command at 1057 > ../../src/tests/mesos.cpp:779: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave': > Device or resource busy > *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are > using GNU date *** > PC: @ 0x1443e9a testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; > stack trace: *** > @ 0x7f1be92b80b7 os::Linux::chained_handler() > @ 0x7f1be92bc219 JVM_handle_linux_signal > @ 0x7f1bf7bbc340 (unknown) > @ 0x1443e9a testing::UnitTest::AddTestPartResult() > @ 0x1438b99 testing::internal::AssertHelper::operator=() > @ 0xf0b3bb > mesos::internal::tests::ContainerizerTest<>::TearDown() > @ 0x1461882 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145c6f8 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x143de4a testing::Test::Run() > @ 0x143e584 
testing::TestInfo::Run() > @ 0x143ebca testing::TestCase::Run() > @ 0x1445312 testing::internal::UnitTestImpl::RunAllTests() > @ 0x14624a7 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145d26e > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x14440ae testing::UnitTest::Run() > @ 0xd15cd4 RUN_ALL_TESTS() > @ 0xd158c1 main > @ 0x7f1bf7808ec5 (unknown) > @ 0x913009 (unknown) > {noformat} > My Vagrantfile generator; > {noformat} > #!/usr/bin/env bash > cat << EOF > Vagrantfile > # -*- mode: ruby -*-" > > # vi: set ft=ruby : > Vagrant.configure(2) do |config| > # Disable shared folder to prevent certain kernel module dependencies. > config.vm.synced_folder ".", "/vagrant", disabled: true > config.vm.box = "bento/ubuntu-14.04" > config.vm.hostname = "${PLATFORM_NAME}" > config.vm.provider "virtualbox" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > vb.customize ["modifyvm", :id, "--nictype1", "virtio"] > vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"] > vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"] > end > config.vm.provider "vmware_fusion" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > end > config.vm.provision "file", source: "../test.sh", destination: "~/test.sh" > config.vm.provision "shell", inline: <<-SHELL > sudo apt-get update > sudo apt-get -y install openjdk-7-jdk autoconf libtool > sudo apt-get -y install build-essential python-dev python-boto \ > libcurl4-nss-dev libsasl2-dev maven \ > libapr1-dev libsvn-dev libssl-dev libevent-dev > sudo apt-get -y install git > sudo wget -qO- https://get.docker.com/ | sh > SHELL > end > EOF > {noformat} > The problem is kicking in frequently in my tests - I'ld say > 10% but less > than 50%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
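The "Device or resource busy" failure in the teardown above typically means processes were still attached to the cgroup when the fixture tried to remove it. A common mitigation is to retry the removal until the lingering tasks have exited; a sketch of that retry pattern, with the destroy operation injected as a callable so it can stand in for a call like {{cgroups::destroy}} (hypothetical helper, not the Mesos implementation):

```cpp
#include <functional>

// Retry a destroy operation that can transiently fail with EBUSY while
// tasks are still exiting. The operation is injected so the pattern can
// be demonstrated without touching a real cgroup hierarchy.
bool destroyWithRetries(const std::function<bool()>& tryDestroy,
                        int attempts) {
  for (int i = 0; i < attempts; ++i) {
    if (tryDestroy()) {
      return true;  // Removal succeeded, e.g. once the last task exited.
    }
  }
  return false;  // Still busy after all attempts; surface the failure.
}
```

A real fix would also need to pause between attempts and, if the cgroup stays busy, report which PIDs are still attached.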
[jira] [Assigned] (MESOS-4030) DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping is flaky
[ https://issues.apache.org/jira/browse/MESOS-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Bannier reassigned MESOS-4030: --- Assignee: Benjamin Bannier > DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping is flaky > --- > > Key: MESOS-4030 > URL: https://issues.apache.org/jira/browse/MESOS-4030 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.26.0 > Environment: [Ubuntu > 14|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh], > 0.26.0 RC (wip) enable-ssl & enable-libevent, root test-run >Reporter: Till Toenshoff >Assignee: Benjamin Bannier > Labels: flaky, flaky-test > > {noformat} > [ RUN ] DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping > I1201 02:18:00.325283 18931 leveldb.cpp:176] Opened db in 3.877576ms > I1201 02:18:00.326195 18931 leveldb.cpp:183] Compacted db in 831923ns > I1201 02:18:00.326288 18931 leveldb.cpp:198] Created db iterator in 21460ns > I1201 02:18:00.326305 18931 leveldb.cpp:204] Seeked to beginning of db in > 1431ns > I1201 02:18:00.326316 18931 leveldb.cpp:273] Iterated through 0 keys in the > db in 178ns > I1201 02:18:00.326354 18931 replica.cpp:780] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I1201 02:18:00.327128 18952 recover.cpp:449] Starting replica recovery > I1201 02:18:00.327481 18948 recover.cpp:475] Replica is in EMPTY status > I1201 02:18:00.328354 18945 replica.cpp:676] Replica in EMPTY status received > a broadcasted recover request from (88123)@127.0.1.1:45788 > I1201 02:18:00.328660 18950 recover.cpp:195] Received a recover response from > a replica in EMPTY status > I1201 02:18:00.329139 18951 recover.cpp:566] Updating replica status to > STARTING > I1201 02:18:00.330413 18949 master.cpp:367] Master > 9577131b-f0b1-47bd-8f88-f5edbf2f026d (ubuntu14) started on 127.0.1.1:45788 > I1201 02:18:00.330474 18949 master.cpp:369] Flags at startup: --acls="" > --allocation_interval="1secs" 
--allocator="HierarchicalDRF" > --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/dHFLJX/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="25secs" --registry_strict="true" > --root_submissions="true" --slave_ping_timeout="15secs" > --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/dHFLJX/master" > --zk_session_timeout="10secs" > I1201 02:18:00.330662 18949 master.cpp:414] Master only allowing > authenticated frameworks to register > I1201 02:18:00.330670 18949 master.cpp:419] Master only allowing > authenticated slaves to register > I1201 02:18:00.330682 18949 credentials.hpp:37] Loading credentials for > authentication from '/tmp/dHFLJX/credentials' > I1201 02:18:00.330950 18945 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 1.585892ms > I1201 02:18:00.331248 18945 replica.cpp:323] Persisted replica status to > STARTING > I1201 02:18:00.330968 18949 master.cpp:458] Using default 'crammd5' > authenticator > I1201 02:18:00.331681 18949 master.cpp:495] Authorization enabled > I1201 02:18:00.331717 18945 recover.cpp:475] Replica is in STARTING status > I1201 02:18:00.332875 18947 replica.cpp:676] Replica in STARTING status > received a broadcasted recover request from (88124)@127.0.1.1:45788 > I1201 02:18:00.44 18947 recover.cpp:195] Received a recover response from > a replica in STARTING status > I1201 02:18:00.333760 18950 recover.cpp:566] Updating replica status to VOTING > I1201 02:18:00.333875 18945 master.cpp:1606] The newly elected leader is > master@127.0.1.1:45788 with id 
9577131b-f0b1-47bd-8f88-f5edbf2f026d > I1201 02:18:00.334624 18951 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 307292ns > I1201 02:18:00.334652 18951 replica.cpp:323] Persisted replica status to > VOTING > I1201 02:18:00.334656 18945 master.cpp:1619] Elected as the leading master! > I1201 02:18:00.334758 18951 recover.cpp:580] Successfully joined the Paxos > group > I1201 02:18:00.334933 18945 master.cpp:1379] Recovering from registrar > I1201 02:18:00.335108 18951 recover.cpp:464] Recover process terminated > I1201 02:18:00.335183 18951 registrar.cpp:309] Recovering registrar > I1201 02:18:00.335577 18950 log.cpp:661] Attempting to start the writer > I1201 02:18:00.336777 18952 replica.cpp:496] Replica received implicit > promise reques
[jira] [Commented] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033479#comment-15033479 ] Jan Schlicht commented on MESOS-4025: - {{sudo ./bin/mesos-tests.sh --gtest_repeat=1 --gtest_break_on_failure --gtest_filter="*ROOT_DOCKER_DockerHealthStatusChange:SlaveRecoveryTest*GCExecutor"}} also triggers the failure. Seems that there's some problem during clean-up of the HealthCheckTest fixture. > SlaveRecoveryTest/0.GCExecutor is flaky. > > > Key: MESOS-4025 > URL: https://issues.apache.org/jira/browse/MESOS-4025 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Till Toenshoff > > Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based > on 0.26.0-rc1. > Testsuite was run as root. > {noformat} > sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1 > {noformat} > {noformat} > [ RUN ] SlaveRecoveryTest/0.GCExecutor > I1130 16:49:16.336833 1032 exec.cpp:136] Version: 0.26.0 > I1130 16:49:16.345212 1049 exec.cpp:210] Executor registered on slave > dde9fd4e-b016-4a99-9081-b047e9df9afa-S0 > Registered executor on ubuntu14 > Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114 > sh -c 'sleep 1000' > Forked command at 1057 > ../../src/tests/mesos.cpp:779: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave': > Device or resource busy > *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are > using GNU date *** > PC: @ 0x1443e9a testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; > stack trace: *** > @ 0x7f1be92b80b7 os::Linux::chained_handler() > @ 0x7f1be92bc219 JVM_handle_linux_signal > @ 0x7f1bf7bbc340 (unknown) > @ 0x1443e9a testing::UnitTest::AddTestPartResult() > @ 0x1438b99 testing::internal::AssertHelper::operator=() > @ 0xf0b3bb > 
mesos::internal::tests::ContainerizerTest<>::TearDown() > @ 0x1461882 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145c6f8 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x143de4a testing::Test::Run() > @ 0x143e584 testing::TestInfo::Run() > @ 0x143ebca testing::TestCase::Run() > @ 0x1445312 testing::internal::UnitTestImpl::RunAllTests() > @ 0x14624a7 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145d26e > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x14440ae testing::UnitTest::Run() > @ 0xd15cd4 RUN_ALL_TESTS() > @ 0xd158c1 main > @ 0x7f1bf7808ec5 (unknown) > @ 0x913009 (unknown) > {noformat} > My Vagrantfile generator; > {noformat} > #!/usr/bin/env bash > cat << EOF > Vagrantfile > # -*- mode: ruby -*-" > > # vi: set ft=ruby : > Vagrant.configure(2) do |config| > # Disable shared folder to prevent certain kernel module dependencies. > config.vm.synced_folder ".", "/vagrant", disabled: true > config.vm.box = "bento/ubuntu-14.04" > config.vm.hostname = "${PLATFORM_NAME}" > config.vm.provider "virtualbox" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > vb.customize ["modifyvm", :id, "--nictype1", "virtio"] > vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"] > vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"] > end > config.vm.provider "vmware_fusion" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > end > config.vm.provision "file", source: "../test.sh", destination: "~/test.sh" > config.vm.provision "shell", inline: <<-SHELL > sudo apt-get update > sudo apt-get -y install openjdk-7-jdk autoconf libtool > sudo apt-get -y install build-essential python-dev python-boto \ > libcurl4-nss-dev libsasl2-dev maven \ > libapr1-dev libsvn-dev libssl-dev libevent-dev > sudo apt-get -y install git > sudo wget -qO- https://get.docker.com/ | sh > SHELL > end > EOF > {noformat} > The problem is kicking in frequently in my tests - I'ld say > 10% 
but less > than 50%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2857) FetcherCacheTest.LocalCachedExtract is flaky.
[ https://issues.apache.org/jira/browse/MESOS-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033399#comment-15033399 ] Benjamin Bannier commented on MESOS-2857: - Comparing with the original log in this report, this appears to be a different issue. >From the log it appears as if everything happened as expected, only that the >test ran into our default timeout when waiting for a status update; without >verbose libprocess logs I am tempted to attribute this issue to very high >system load. > FetcherCacheTest.LocalCachedExtract is flaky. > - > > Key: MESOS-2857 > URL: https://issues.apache.org/jira/browse/MESOS-2857 > Project: Mesos > Issue Type: Bug > Components: fetcher, test >Reporter: Benjamin Mahler >Assignee: Benjamin Bannier > Labels: flaky-test, mesosphere > > From jenkins: > {noformat} > [ RUN ] FetcherCacheTest.LocalCachedExtract > Using temporary directory '/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj' > I0610 20:04:48.591573 24561 leveldb.cpp:176] Opened db in 3.512525ms > I0610 20:04:48.592456 24561 leveldb.cpp:183] Compacted db in 828630ns > I0610 20:04:48.592512 24561 leveldb.cpp:198] Created db iterator in 32992ns > I0610 20:04:48.592531 24561 leveldb.cpp:204] Seeked to beginning of db in > 8967ns > I0610 20:04:48.592545 24561 leveldb.cpp:273] Iterated through 0 keys in the > db in 7762ns > I0610 20:04:48.592604 24561 replica.cpp:744] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0610 20:04:48.593438 24587 recover.cpp:449] Starting replica recovery > I0610 20:04:48.593698 24587 recover.cpp:475] Replica is in EMPTY status > I0610 20:04:48.595641 24580 replica.cpp:641] Replica in EMPTY status received > a broadcasted recover request > I0610 20:04:48.596086 24590 recover.cpp:195] Received a recover response from > a replica in EMPTY status > I0610 20:04:48.596607 24590 recover.cpp:566] Updating replica status to > STARTING > I0610 20:04:48.597507 24590 leveldb.cpp:306] 
Persisting metadata (8 bytes) to > leveldb took 717888ns > I0610 20:04:48.597535 24590 replica.cpp:323] Persisted replica status to > STARTING > I0610 20:04:48.597697 24590 recover.cpp:475] Replica is in STARTING status > I0610 20:04:48.599165 24584 replica.cpp:641] Replica in STARTING status > received a broadcasted recover request > I0610 20:04:48.599434 24584 recover.cpp:195] Received a recover response from > a replica in STARTING status > I0610 20:04:48.599915 24590 recover.cpp:566] Updating replica status to VOTING > I0610 20:04:48.600545 24590 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 432335ns > I0610 20:04:48.600574 24590 replica.cpp:323] Persisted replica status to > VOTING > I0610 20:04:48.600659 24590 recover.cpp:580] Successfully joined the Paxos > group > I0610 20:04:48.600797 24590 recover.cpp:464] Recover process terminated > I0610 20:04:48.602905 24594 master.cpp:363] Master > 20150610-200448-3875541420-32907-24561 (dbade881e927) started on > 172.17.0.231:32907 > I0610 20:04:48.602957 24594 master.cpp:365] Flags at startup: --acls="" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" > --credentials="/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/credentials" > --framework_sorter="drf" --help="false" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="25secs" --registry_strict="true" > --root_submissions="true" --slave_reregister_timeout="10mins" > --user_sorter="drf" --version="false" > --webui_dir="/mesos/mesos-0.23.0/_inst/share/mesos/webui" > --work_dir="/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/master" > --zk_session_timeout="10secs" > I0610 20:04:48.603374 24594 master.cpp:410] Master only allowing > authenticated frameworks to 
register > I0610 20:04:48.603392 24594 master.cpp:415] Master only allowing > authenticated slaves to register > I0610 20:04:48.603404 24594 credentials.hpp:37] Loading credentials for > authentication from > '/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/credentials' > I0610 20:04:48.603751 24594 master.cpp:454] Using default 'crammd5' > authenticator > I0610 20:04:48.604928 24594 master.cpp:491] Authorization enabled > I0610 20:04:48.606034 24593 hierarchical.hpp:309] Initialized hierarchical > allocator process > I0610 20:04:48.606106 24593 whitelist_watcher.cpp:79] No whitelist given > I0610 20:04:48.607430 24594 master.cpp:1476] The newly elected leader is > master@172.17.0.231:32907 with id 20150610-200448-3875541420-32907-24561 > I0610 20:04:48.60
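The comment above attributes the failure to the test's fixed status-update timeout expiring under high system load. The general shape of such a wait is a poll against a deadline; a sketch of the pattern (hypothetical helper, not the actual libprocess await machinery):

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Poll a condition until it holds or a deadline passes. Under heavy load
// the condition may simply need longer than the budget allows, which is
// how an otherwise healthy run turns into a "flaky" timeout failure.
bool waitUntil(const std::function<bool()>& condition,
               std::chrono::milliseconds timeout) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (!condition()) {
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;  // Budget exhausted.
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  return true;
}
```

This is why a loaded CI host produces timeouts without any log line indicating an actual error: the awaited event arrives, just too late.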
[jira] [Comment Edited] (MESOS-3548) Investigate federations of Mesos masters
[ https://issues.apache.org/jira/browse/MESOS-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15031948#comment-15031948 ] Elouan Keryell-Even edited comment on MESOS-3548 at 12/1/15 8:14 AM: - My team is also interested in multi-cluster management with Mesos. We have set up a test architecture consisting of 2 separate clusters, with one mesos master managing both of them. The use case we are interested in is to have both clusters collaborating, each one being able to borrow a few slaves from the other when facing a load peak (this is indeed "bursting"). I think this would imply that each cluster is managed by its own Mesos master. One of the solutions we thought about for the resource borrowing was to have the two masters communicate to temporarily lend available resources. Elouan KERYELL-EVEN Software engineer @ Atos Integration Toulouse, France was (Author: winstonsurechill): My team is also interested in multi-cluster management with Mesos. For now we have set up a test architecture consisting of 2 separate clusters, with one mesos master managing both of them. The use case we are interested in is to have multiple clusters collaborating, each one being able to borrow a few slaves from another when facing a load peak (this is indeed "bursting"). I think that would imply that each cluster is managed by one Mesos master, and that the various masters could communicate in some way or another for the resource lending/borrowing. 
Elouan KERYELL-EVEN Software engineer @ Atos Integration Toulouse, France > Investigate federations of Mesos masters > > > Key: MESOS-3548 > URL: https://issues.apache.org/jira/browse/MESOS-3548 > Project: Mesos > Issue Type: Improvement >Reporter: Neil Conway > Labels: federation, mesosphere, multi-dc > > In a large Mesos installation, the operator might want to ensure that even if > the Mesos masters are inaccessible or failed, new tasks can still be > scheduled (across multiple different frameworks). HA masters are only a > partial solution here: the masters might still be inaccessible due to a > correlated failure (e.g., Zookeeper misconfiguration/human error). > To support this, we could support the notion of "hierarchies" or > "federations" of Mesos masters. In a Mesos installation with 10k machines, > the operator might configure 10 Mesos masters (each of which might be HA) to > manage 1k machines each. Then an additional "meta-Master" would manage the > allocation of cluster resources to the 10 masters. Hence, the failure of any > individual master would impact 1k machines at most. The meta-master might not > have a lot of work to do: e.g., it might be limited to occasionally > reallocating cluster resources among the 10 masters, or ensuring that newly > added cluster resources are allocated among the masters as appropriate. > Hence, the failure of the meta-master would not prevent any of the individual > masters from scheduling new tasks. A single framework instance probably > wouldn't be able to use more resources than have been assigned to a single > Master, but that seems like a reasonable restriction. > This feature might also be a good fit for a multi-datacenter deployment of > Mesos: each Mesos master instance would manage a single DC. Naturally, > reducing the traffic between frameworks and the meta-master would be > important for performance reasons in a configuration like this. 
> Operationally, this might be simpler if Mesos processes were self-hosting > ([MESOS-3547]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
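The meta-master described in MESOS-3548 is, at its core, a coarse partitioner: it only decides which master owns which machines, so its failure never blocks the per-master schedulers. A toy sketch of that partitioning idea (all names here are hypothetical illustrations, not Mesos APIs):

```python
# Toy illustration of the MESOS-3548 "meta-master" idea: partition a
# 10k-machine cluster across 10 masters so that losing any one master
# affects at most 1k machines. Hypothetical names, not Mesos code.

def assign(machines, masters):
    """Spread machines across masters as evenly as possible (round-robin)."""
    allocation = {m: [] for m in masters}
    for i, machine in enumerate(machines):
        allocation[masters[i % len(masters)]].append(machine)
    return allocation


def add_machines(allocation, new_machines):
    """Route newly added cluster resources to the least-loaded master --
    the occasional rebalancing work the meta-master would perform."""
    for machine in new_machines:
        least = min(allocation, key=lambda m: len(allocation[m]))
        allocation[least].append(machine)
    return allocation


machines = ["agent-%d" % i for i in range(10000)]
masters = ["master-%d" % i for i in range(10)]
allocation = assign(machines, masters)
# Each of the 10 masters manages exactly 1000 machines.
print(max(len(v) for v in allocation.values()))  # 1000
```

The point of the sketch is that the allocation step is infrequent and stateless enough that the meta-master sits outside the scheduling critical path, matching the ticket's claim that its failure would not prevent individual masters from scheduling new tasks.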