[jira] [Comment Edited] (MESOS-3059) Allow http endpoint to dynamically change the slave attributes

2015-11-23 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022795#comment-15022795
 ] 

Chris edited comment on MESOS-3059 at 11/23/15 7:19 PM:


Has there been any recent work on this capability?


was (Author: ct.clmsn):
Has there been any work on this capability?

> Allow http endpoint to dynamically change the slave attributes
> --
>
> Key: MESOS-3059
> URL: https://issues.apache.org/jira/browse/MESOS-3059
> Project: Mesos
>  Issue Type: Wish
>Reporter: Nitin
>
> It is well understood that changing the attributes dynamically is not safe 
> without a restart, because the slave itself may not know which old framework 
> tasks running on it depended on the previous attributes. 
> However, a total restart deletes a lot of other history. We need to support 
> dynamic attribute changes with a soft restart. 
> It would be good to expose a REST endpoint, either on the slave or on 
> mesos-master, which directly changes the state in ZooKeeper.
> USE-CASE
> We use slave attributes/roles to direct framework scheduling to specific 
> slaves according to each framework's requirements. The Mesos scheduler only 
> creates offers on the basis of resources.
> In our use case, we categorize our Spark frameworks or jobs run with a 
> framework (like Marathon) based on multiple factors. We want jobs or 
> frameworks belonging to one category to run in their own cluster of 
> resources, and we want to dynamically manage the slaves into these logical 
> sub-clusters.
> Since the number of jobs that will be submitted, and when, is very dynamic, 
> it makes sense to be able to dynamically assign roles or attributes to 
> slaves. It is not possible to gauge the requirements at cluster provisioning 
> time, and static role or attribute assignment leads to sub-optimal use of 
> the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3995) Document Mesos release, upgrade, deprecation policy

2015-11-23 Thread Neil Conway (JIRA)
Neil Conway created MESOS-3995:
--

 Summary: Document Mesos release, upgrade, deprecation policy
 Key: MESOS-3995
 URL: https://issues.apache.org/jira/browse/MESOS-3995
 Project: Mesos
  Issue Type: Documentation
Reporter: Neil Conway


We should ensure the Mesos website and documentation have clear answers to the 
following questions:

[ For end-users ]
* Which is the "best" version of Mesos to download?
* How often are new releases made? Do those releases contain new features, bug 
fixes, breaking changes, or all of the above?
* Which versions of Mesos can be run simultaneously in a mixed cluster?

[ For Mesos core developers ]
* When (and how) should features be deprecated?
* Which versions of Mesos do I need to test with when adding a new feature?

[ For framework developers ]
* How quickly are Mesos features deprecated?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3113) Add resource usage section to containerizer documentation

2015-11-23 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-3113:

Assignee: (was: Gilbert Song)

> Add resource usage section to containerizer documentation
> -
>
> Key: MESOS-3113
> URL: https://issues.apache.org/jira/browse/MESOS-3113
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Niklas Quarfot Nielsen
>  Labels: docathon, documentation, mesosphere
>
> Currently, the containerizer documentation doesn't touch upon the usage() API 
> and how to interpret the collected statistics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3953) DockerTest.ROOT_DOCKER_CheckPortResource fails.

2015-11-23 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022835#comment-15022835
 ] 

Timothy Chen commented on MESOS-3953:
-

I can't reproduce this on a CentOS box either. Till, where are you running this 
test? And is there anything else running on the VM?

> DockerTest.ROOT_DOCKER_CheckPortResource fails.
> ---
>
> Key: MESOS-3953
> URL: https://issues.apache.org/jira/browse/MESOS-3953
> Project: Mesos
>  Issue Type: Bug
> Environment: CentOS Linux release 7.1.1503 (Core),
> gcc (GCC) 4.8.3,
> Docker version 1.9.0, build 76d6bc9
>Reporter: Till Toenshoff
>Assignee: Timothy Chen
>
> The following is happening on my CentOS 7 installation (100% reproducible).
> {noformat}
> [ RUN  ] DockerTest.ROOT_DOCKER_CheckPortResource
> I1118 08:18:50.336110 20979 docker.cpp:684] Running docker -H 
> unix:///var/run/docker.sock rm -f -v mesos-docker-port-resource-test
> I1118 08:18:50.413763 20979 resources.cpp:474] Parsing resources as JSON 
> failed: ports:[9998-];ports:[10001-11000]
> Trying semicolon-delimited string format instead
> I1118 08:18:50.414670 20979 resources.cpp:474] Parsing resources as JSON 
> failed: ports:[9998-];ports:[1-11000]
> Trying semicolon-delimited string format instead
> I1118 08:18:50.415073 20979 docker.cpp:564] Running docker -H 
> unix:///var/run/docker.sock run -e MESOS_SANDBOX=/mnt/mesos/sandbox -e 
> MESOS_CONTAINER_NAME=mesos-docker-port-resource-test -v 
> /tmp/DockerTest_ROOT_DOCKER_CheckPortResource_4e34OB:/mnt/mesos/sandbox --net 
> bridge -p 1:80 --name mesos-docker-port-resource-test busybox true
> ../../src/tests/containerizer/docker_tests.cpp:338: Failure
> (run).failure(): Container exited on error: exited with status 1
> I1118 08:18:50.717136 20979 docker.cpp:842] Running docker -H 
> unix:///var/run/docker.sock ps -a
> I1118 08:18:50.819042 20999 docker.cpp:723] Running docker -H 
> unix:///var/run/docker.sock inspect mesos-docker-port-resource-test
> I1118 08:18:50.924579 20979 docker.cpp:684] Running docker -H 
> unix:///var/run/docker.sock rm -f -v 
> 67781b79c7641a6450c3ddb4ba13112b6f5a50060eac3f65cac3ad57a2a527ea
> [  FAILED  ] DockerTest.ROOT_DOCKER_CheckPortResource
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3996) libprocess: document when, why defer() is necessary

2015-11-23 Thread Neil Conway (JIRA)
Neil Conway created MESOS-3996:
--

 Summary: libprocess: document when, why defer() is necessary
 Key: MESOS-3996
 URL: https://issues.apache.org/jira/browse/MESOS-3996
 Project: Mesos
  Issue Type: Documentation
Reporter: Neil Conway
Priority: Minor


Current rules around this are pretty confusing and undocumented, as evidenced 
by some recent bugs in this area.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3059) Allow http endpoint to dynamically change the slave attributes

2015-11-23 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022795#comment-15022795
 ] 

Chris commented on MESOS-3059:
--

Has there been any work on this capability?

> Allow http endpoint to dynamically change the slave attributes
> --
>
> Key: MESOS-3059
> URL: https://issues.apache.org/jira/browse/MESOS-3059
> Project: Mesos
>  Issue Type: Wish
>Reporter: Nitin
>
> It is well understood that changing the attributes dynamically is not safe 
> without a restart, because the slave itself may not know which old framework 
> tasks running on it depended on the previous attributes. 
> However, a total restart deletes a lot of other history. We need to support 
> dynamic attribute changes with a soft restart. 
> It would be good to expose a REST endpoint, either on the slave or on 
> mesos-master, which directly changes the state in ZooKeeper.
> USE-CASE
> We use slave attributes/roles to direct framework scheduling to specific 
> slaves according to each framework's requirements. The Mesos scheduler only 
> creates offers on the basis of resources.
> In our use case, we categorize our Spark frameworks or jobs run with a 
> framework (like Marathon) based on multiple factors. We want jobs or 
> frameworks belonging to one category to run in their own cluster of 
> resources, and we want to dynamically manage the slaves into these logical 
> sub-clusters.
> Since the number of jobs that will be submitted, and when, is very dynamic, 
> it makes sense to be able to dynamically assign roles or attributes to 
> slaves. It is not possible to gauge the requirements at cluster provisioning 
> time, and static role or attribute assignment leads to sub-optimal use of 
> the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3514) Serialize Docker registry responses as Protobuf

2015-11-23 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022824#comment-15022824
 ] 

Gilbert Song commented on MESOS-3514:
-

Resolving this issue as invalid, because it partially duplicates MESOS-2972. 
Our plan has changed to refactoring the registry client and registry puller 
instead. Please see MESOS-3994.

> Serialize Docker registry responses as Protobuf
> ---
>
> Key: MESOS-3514
> URL: https://issues.apache.org/jira/browse/MESOS-3514
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Timothy Chen
>Assignee: Gilbert Song
>
> We should read all responses into protobuf to avoid a lot of the JSON 
> boilerplate code.
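
A minimal sketch of what that could look like with stout's JSON-to-protobuf 
conversion; {{ManifestResponse}} is a hypothetical message type standing in 
for a real registry response definition:

{code}
#include <string>

#include <stout/error.hpp>
#include <stout/json.hpp>
#include <stout/protobuf.hpp>
#include <stout/try.hpp>

// Parse a raw registry response body straight into a protobuf message,
// replacing hand-written field-by-field JSON extraction.
Try<ManifestResponse> parseManifest(const std::string& body)
{
  Try<JSON::Object> json = JSON::parse<JSON::Object>(body);
  if (json.isError()) {
    return Error("Failed to parse JSON: " + json.error());
  }

  return protobuf::parse<ManifestResponse>(json.get());
}
{code}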



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3113) Add resource usage section to containerizer documentation

2015-11-23 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-3113:

Sprint: Mesosphere Sprint 21  (was: Mesosphere Sprint 21, Mesosphere Sprint 
22)

> Add resource usage section to containerizer documentation
> -
>
> Key: MESOS-3113
> URL: https://issues.apache.org/jira/browse/MESOS-3113
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Niklas Quarfot Nielsen
>  Labels: docathon, documentation, mesosphere
>
> Currently, the containerizer documentation doesn't touch upon the usage() API 
> and how to interpret the collected statistics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3113) Add resource usage section to containerizer documentation

2015-11-23 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-3113:

Labels: documentation  (was: docathon documentation mesosphere)

> Add resource usage section to containerizer documentation
> -
>
> Key: MESOS-3113
> URL: https://issues.apache.org/jira/browse/MESOS-3113
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Niklas Quarfot Nielsen
>  Labels: documentation
>
> Currently, the containerizer documentation doesn't touch upon the usage() API 
> and how to interpret the collected statistics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-3794) Master should not store arbitrarily sized data in ExecutorInfo

2015-11-23 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-3794:
--

Assignee: James Peach

> Master should not store arbitrarily sized data in ExecutorInfo
> --
>
> Key: MESOS-3794
> URL: https://issues.apache.org/jira/browse/MESOS-3794
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Joseph Wu
>Assignee: James Peach
>Priority: Critical
>  Labels: mesosphere
>
> From a comment in [MESOS-3771]:
> Master should not be storing the {{data}} fields from {{ExecutorInfo}}.  We 
> currently [store the entire 
> object|https://github.com/apache/mesos/blob/master/src/master/master.hpp#L262-L271],
>  which means master would be at high risk of OOM-ing if a bunch of executors 
> were started with big {{data}} blobs.
> * Master should scrub out unneeded bloat from {{ExecutorInfo}} before storing 
> it (sketched below).
> * We can use an alternate internal object, like we do for {{TaskInfo}} vs 
> {{Task}}; see 
> [this|https://github.com/apache/mesos/blob/master/src/messages/messages.proto#L39-L41].
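
A minimal sketch of the first option; the {{scrub}} helper and its placement 
are hypothetical, not existing master code:

{code}
#include <mesos/mesos.pb.h>

// Hypothetical helper: drop the arbitrarily sized 'data' blob (and any
// other bloat the master never reads) before an ExecutorInfo is kept in
// the master's in-memory bookkeeping.
static mesos::ExecutorInfo scrub(const mesos::ExecutorInfo& executorInfo)
{
  mesos::ExecutorInfo scrubbed = executorInfo;
  scrubbed.clear_data(); // The master never needs the executor's data blob.
  return scrubbed;
}
{code}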



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3996) libprocess: document when, why defer() is necessary

2015-11-23 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3996:
--
Description: 
Current rules around this are pretty confusing and undocumented, as evidenced 
by some recent bugs in this area.

Some example snippets in the Mesos source code that resulted from this 
confusion and are indeed bugs:

1. 
https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/provisioner/docker/registry_client.cpp#L754
{code}
return doHttpGet(blobURL, None(), true, true, None())
.then([this, blobURLPath, digest, filePath](
const http::Response& response) -> Future<size_t> {
  Try<int> fd = os::open(
  filePath.value,
  O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC,
  S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
{code}
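
For contrast, a minimal standalone sketch (not the registry client itself) of 
the deferred form; without {{defer()}}, a continuation like the one above can 
run on whatever thread satisfies the future and race with other accesses to 
member state:

{code}
#include <process/defer.hpp>
#include <process/future.hpp>
#include <process/process.hpp>

using process::Future;

class CounterProcess : public process::Process<CounterProcess>
{
public:
  Future<int> increment(Future<int> f)
  {
    // defer(self(), ...) turns the continuation into a dispatch back to
    // this process, so it is serialized with every other method call on
    // it; a bare lambda capturing 'this' would have no such guarantee.
    return f.then(process::defer(self(), [this](int value) {
      count_ += value;
      return count_;
    }));
  }

private:
  int count_ = 0;
};
{code}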


  was:Current rules around this are pretty confusing and undocumented, as 
evidenced by some recent bugs in this area.


> libprocess: document when, why defer() is necessary
> ---
>
> Key: MESOS-3996
> URL: https://issues.apache.org/jira/browse/MESOS-3996
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Neil Conway
>Priority: Minor
>  Labels: documentation, libprocess, mesosphere
>
> Current rules around this are pretty confusing and undocumented, as evidenced 
> by some recent bugs in this area.
> Some example snippets in the Mesos source code that resulted from this 
> confusion and are indeed bugs:
> 1. 
> https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/provisioner/docker/registry_client.cpp#L754
> {code}
> return doHttpGet(blobURL, None(), true, true, None())
> .then([this, blobURLPath, digest, filePath](
> const http::Response& response) -> Future<size_t> {
>   Try<int> fd = os::open(
>   filePath.value,
>   O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC,
>   S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3993) Fix up future.hpp after MSVC update 2

2015-11-23 Thread Alex Clemmer (JIRA)
Alex Clemmer created MESOS-3993:
---

 Summary: Fix up future.hpp after MSVC update 2
 Key: MESOS-3993
 URL: https://issues.apache.org/jira/browse/MESOS-3993
 Project: Mesos
  Issue Type: Bug
Reporter: Alex Clemmer
Assignee: Alex Clemmer


See the TODOs in future.hpp.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3993) Fix up future.hpp after MSVC update 2

2015-11-23 Thread Alex Clemmer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Clemmer updated MESOS-3993:

Labels: libprocess  (was: )

> Fix up future.hpp after MSVC update 2
> -
>
> Key: MESOS-3993
> URL: https://issues.apache.org/jira/browse/MESOS-3993
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: libprocess
>
> See the TODOs in future.hpp.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3993) Fix up future.hpp after MSVC update 2

2015-11-23 Thread Alex Clemmer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Clemmer updated MESOS-3993:

Component/s: libprocess

> Fix up future.hpp after MSVC update 2
> -
>
> Key: MESOS-3993
> URL: https://issues.apache.org/jira/browse/MESOS-3993
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: libprocess
>
> See the TODOs in future.hpp.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3997) Switch to fixed-point for resources

2015-11-23 Thread Neil Conway (JIRA)
Neil Conway created MESOS-3997:
--

 Summary: Switch to fixed-point for resources
 Key: MESOS-3997
 URL: https://issues.apache.org/jira/browse/MESOS-3997
 Project: Mesos
  Issue Type: Improvement
  Components: allocation, master
Reporter: Neil Conway


Using floating point for resources is problematic, because roundoff and 
precision errors when doing resource math can produce unexpected results.

Instead, we should probably adopt a fixed-point representation: e.g., CPU 
resources will be measured as an integer number of fractional CPUs (e.g., 25 
deci-CPUs == 2.5 CPUs).
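
A minimal sketch of the conversion boundary, assuming milli-CPU granularity 
purely for illustration (the actual granularity would be a design decision):

{code}
#include <cmath>
#include <cstdint>

// Store CPU quantities as an integral count of milli-CPUs rather than a
// double. Rounding happens once at this boundary; all subsequent resource
// math is exact integer arithmetic and cannot drift.
static int64_t toMilliCpus(double cpus)
{
  return static_cast<int64_t>(std::llround(cpus * 1000.0));
}

static double fromMilliCpus(int64_t milliCpus)
{
  return static_cast<double>(milliCpus) / 1000.0;
}
{code}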



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3998) resource test failure

2015-11-23 Thread David Robinson (JIRA)
David Robinson created MESOS-3998:
-

 Summary: resource test failure
 Key: MESOS-3998
 URL: https://issues.apache.org/jira/browse/MESOS-3998
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: David Robinson


Encountered this test failure when building mesos on CentOS 7 via 
[devtoolset-3|https://www.softwarecollections.org/en/scls/rhscl/devtoolset-3/].

{code}
DEBUG: In file included from tests/resources_tests.cpp:23:0:
DEBUG: ../3rdparty/libprocess/3rdparty/gmock-1.7.0/gtest/include/gtest/gtest.h: 
In instantiation of 'testing::AssertionResult 
testing::internal::CmpHelperEQ(const char*, const char*, const T1&, const T2&) 
[with T1 = int; T2 = long unsigned int]':
DEBUG: 
../3rdparty/libprocess/3rdparty/gmock-1.7.0/gtest/include/gtest/gtest.h:1485:30:
   required from 'static testing::AssertionResult 
testing::internal::EqHelper::Compare(const char*, const 
char*, const T1&, const T2&) [with T1 = int; T2 = long unsigned int; bool 
lhs_is_null_literal = false]'
DEBUG: tests/resources_tests.cpp:219:5:   required from here
DEBUG: 
../3rdparty/libprocess/3rdparty/gmock-1.7.0/gtest/include/gtest/gtest.h:1448:16:
 error: comparison between signed and unsigned integer expressions 
[-Werror=sign-compare]
DEBUG:if (expected == actual) {
DEBUG: ^
DEBUG: g++ -DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\" 
-DPACKAGE_VERSION=\"0.26.0-rc1\" -DPACKAGE_STRING=\"mesos\ 0.26.0-rc1\" 
-DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" 
-DVERSION=\"0.26.0-rc1\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 
-DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 
-DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 
-DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD_PRIO_INHERIT=1 
-DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 
-DHAVE_LIBAPR_1=1 -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 
-DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -DHAVE_LIBNL_3=1 
-DHAVE_NETLINK_NETLINK_H=1 -DHAVE_LIBNL_ROUTE_3=1 
-DHAVE_NETLINK_ROUTE_LINK_VETH_H=1 -DHAVE_LIBNL_IDIAG_3=1 
-DWITH_NETWORK_ISOLATOR=1 -DMESOS_HAS_JAVA=1 -DHAVE_PYTHON=\"2.7\" 
-DMESOS_HAS_PYTHON=1 -I.   -Wall -Werror -DLIBDIR=\"/usr/local/lib64\" 
-DPKGLIBEXECDIR=\"/usr/local/libexec/mesos\" 
-DPKGDATADIR=\"/usr/local/share/mesos\" -I../include 
-I../3rdparty/libprocess/include 
-I../3rdparty/libprocess/3rdparty/stout/include -I../include -I../include/mesos 
-I../3rdparty/libprocess/3rdparty/boost-1.53.0 
-I../3rdparty/libprocess/3rdparty/picojson-1.3.0 -DPICOJSON_USE_INT64 
-D__STDC_FORMAT_MACROS -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
-I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
-I../3rdparty/libprocess/3rdparty/glog-0.3.3/src -I../3rdparty/leveldb/include 
-I../3rdparty/zookeeper-3.4.5/src/c/include 
-I../3rdparty/zookeeper-3.4.5/src/c/generated 
-I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
-DSOURCE_DIR=\"/builddir/build/BUILD/mesos-0.26.0\" 
-DBUILD_DIR=\"/builddir/build/BUILD/mesos-0.26.0\" 
-I../3rdparty/libprocess/3rdparty/gmock-1.7.0/gtest/include 
-I../3rdparty/libprocess/3rdparty/gmock-1.7.0/include 
-I/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.91-2.6.2.1.el7_1.x86_64/include 
-I/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.91-2.6.2.1.el7_1.x86_64/include/linux 
-DZOOKEEPER_VERSION=\"3.4.5\" -I/usr/include/libnl3 -I/usr/include/subversion-1 
-I/usr/include/apr-1 -I/usr/include/apr-1.0  -pthread -O2 -g -pipe -Wall 
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong 
--param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic 
-Wno-unused-local-typedefs -Wno-maybe-uninitialized -std=c++11 -c -o 
tests/mesos_tests-scheduler_driver_tests.o `test -f 
'tests/scheduler_driver_tests.cpp' || echo './'`tests/scheduler_driver_tests.cpp
DEBUG: g++ -DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\" 
-DPACKAGE_VERSION=\"0.26.0-rc1\" -DPACKAGE_STRING=\"mesos\ 0.26.0-rc1\" 
-DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" 
-DVERSION=\"0.26.0-rc1\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 
-DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 
-DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 
-DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD_PRIO_INHERIT=1 
-DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 
-DHAVE_LIBAPR_1=1 -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 
-DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -DHAVE_LIBNL_3=1 
-DHAVE_NETLINK_NETLINK_H=1 -DHAVE_LIBNL_ROUTE_3=1 
-DHAVE_NETLINK_ROUTE_LINK_VETH_H=1 -DHAVE_LIBNL_IDIAG_3=1 
-DWITH_NETWORK_ISOLATOR=1 -DMESOS_HAS_JAVA=1 -DHAVE_PYTHON=\"2.7\" 
-DMESOS_HAS_PYTHON=1 -I.   -Wall -Werror -DLIBDIR=\"/usr/local/lib64\" 
-DPKGLIBEXECDIR=\"/usr/local/libexec/mesos\" 
-DPKGDATADIR=\"/usr/local/share/mesos\" -I../include 

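The diagnostic is a plain signed/unsigned mismatch in an {{EXPECT_EQ}} at 
tests/resources_tests.cpp:219. A minimal sketch of the usual fix; the vector 
and expected value below are assumptions for illustration:

{code}
#include <vector>

#include <gtest/gtest.h>

TEST(SignCompareExample, ExpectEq)
{
  std::vector<int> resources = {1, 2};

  // Rejected under -Werror=sign-compare: 'int' vs 'long unsigned int',
  // exactly the instantiation reported above.
  // EXPECT_EQ(2, resources.size());

  // Compare like with like: an unsigned literal matches size_type.
  EXPECT_EQ(2u, resources.size());
}
{code}
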
[jira] [Commented] (MESOS-3552) Check failed: result.cpus() == cpus() && result.mem() == mem() && result.disk() == disk() && result.ports() == ports()

2015-11-23 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022891#comment-15022891
 ] 

Neil Conway commented on MESOS-3552:


I opened MESOS-3997 for the longer-term issue that we should switch to fixed 
point for resources.

> Check failed: result.cpus() == cpus() && result.mem() == mem() && 
> result.disk() == disk() && result.ports() == ports() 
> ---
>
> Key: MESOS-3552
> URL: https://issues.apache.org/jira/browse/MESOS-3552
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.24.1
>Reporter: Mandeep Chadha
>Assignee: Mandeep Chadha
>
> The result.cpus() == cpus() check fails due to a ( double == double ) 
> comparison problem. 
> Root cause: 
> The framework requested a 0.1 cpu reservation for the first task. So far so 
> good. The next Reserve operation led to double arithmetic that produced the 
> following values:
>  results.cpus() : 23.9964472863211995 cpus() : 24
> The double arithmetic left results.cpus() at 23.9964472863211995, and hence 
> the check ( result.cpus() == cpus() ) failed.
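
A minimal standalone sketch reproducing this class of failure; the loop count 
is arbitrary:

{code}
#include <cstdio>

int main()
{
  double cpus = 0.0;

  // Accumulate 0.1 CPUs 240 times. 0.1 has no exact binary
  // representation, so rounding error accumulates and the total is not
  // exactly 24.0.
  for (int i = 0; i < 240; i++) {
    cpus += 0.1;
  }

  printf("%.20f\n", cpus);                     // Not exactly 24.0.
  printf("%s\n", cpus == 24.0 ? "yes" : "no"); // "no": exact equality,
                                               // like the CHECK, fails.
  return 0;
}
{code}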



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3552) CHECK failure due to floating point precision on reservation request

2015-11-23 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-3552:
---
Description: 
The result.cpus() == cpus() check fails due to a ( double == double ) 
comparison problem. 


Root cause: 

The framework requested a 0.1 cpu reservation for the first task. So far so 
good. The next Reserve operation led to double arithmetic that produced the 
following values:

 results.cpus() : 23.9964472863211995 cpus() : 24

The double arithmetic left results.cpus() at 23.9964472863211995, and hence 
the check ( result.cpus() == cpus() ) failed.




  was:

The result.cpus() == cpus() check fails due to a ( double == double ) 
comparison problem. 


Root cause: 

The framework requested a 0.1 cpu reservation for the first task. So far so 
good. The next Reserve operation led to double arithmetic that produced the 
following values:

 results.cpus() : 23.9964472863211995 cpus() : 24

The double arithmetic left results.cpus() at 23.9964472863211995, and hence 
the check ( result.cpus() == cpus() ) failed.




Summary: CHECK failure due to floating point precision on reservation 
request  (was: Check failed: result.cpus() == cpus() && result.mem() == mem() 
&& result.disk() == disk() && result.ports() == ports() )

> CHECK failure due to floating point precision on reservation request
> 
>
> Key: MESOS-3552
> URL: https://issues.apache.org/jira/browse/MESOS-3552
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.24.1
>Reporter: Mandeep Chadha
>Assignee: Mandeep Chadha
>
> The result.cpus() == cpus() check fails due to a ( double == double ) 
> comparison problem. 
> Root cause: 
> The framework requested a 0.1 cpu reservation for the first task. So far so 
> good. The next Reserve operation led to double arithmetic that produced the 
> following values:
>  results.cpus() : 23.9964472863211995 cpus() : 24
> The double arithmetic left results.cpus() at 23.9964472863211995, and hence 
> the check ( result.cpus() == cpus() ) failed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3949) User CGroup Isolation tests fail on Centos 6.

2015-11-23 Thread Alexander Rojas (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022949#comment-15022949
 ] 

Alexander Rojas commented on MESOS-3949:


[~marco-mesos]: even if they never passed, it is a good idea to check why 
they didn't.

My first theory was that the CentOS 6 kernel version (2.6) was the reason for 
the failure, so I set up two machines with CentOS 6.7 and upgraded the kernel 
on one of them to 3.10 (the minimum version required for the latest Docker). 
The test still failed on that machine. I will do some heavy debugging tomorrow 
morning.

> User CGroup Isolation tests fail on Centos 6.
> -
>
> Key: MESOS-3949
> URL: https://issues.apache.org/jira/browse/MESOS-3949
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Affects Versions: 0.26.0
> Environment: CentOS 6.6, gcc 4.8.1, on vagrant libvirt, 16GB, 8 CPUs,
> ../configure --enable-libevent --enable-ssl
>Reporter: Bernd Mathiske
>Assignee: Alexander Rojas
>  Labels: mesosphere
>
> UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup and 
> UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup fail on CentOS 6.6 with 
> similar output when libevent and SSL are enabled.
> {noformat}
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup" --verbose
> {noformat}
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from UserCgroupIsolatorTest/0, where TypeParam = 
> mesos::internal::slave::CgroupsMemIsolatorProcess
> userdel: user 'mesos.test.unprivileged.user' does not exist
> [ RUN  ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup
> I1118 16:53:35.273717 30249 mem.cpp:605] Started listening for OOM events for 
> container 867a829e-4a26-43f5-86e0-938bf1f47688
> I1118 16:53:35.274538 30249 mem.cpp:725] Started listening on low memory 
> pressure events for container 867a829e-4a26-43f5-86e0-938bf1f47688
> I1118 16:53:35.275164 30249 mem.cpp:725] Started listening on medium memory 
> pressure events for container 867a829e-4a26-43f5-86e0-938bf1f47688
> I1118 16:53:35.275784 30249 mem.cpp:725] Started listening on critical memory 
> pressure events for container 867a829e-4a26-43f5-86e0-938bf1f47688
> I1118 16:53:35.276448 30249 mem.cpp:356] Updated 'memory.soft_limit_in_bytes' 
> to 1GB for container 867a829e-4a26-43f5-86e0-938bf1f47688
> I1118 16:53:35.277331 30249 mem.cpp:391] Updated 'memory.limit_in_bytes' to 
> 1GB for container 867a829e-4a26-43f5-86e0-938bf1f47688
> -bash: 
> /sys/fs/cgroup/memory/mesos/867a829e-4a26-43f5-86e0-938bf1f47688/cgroup.procs:
>  No such file or directory
> mkdir: cannot create directory 
> `/sys/fs/cgroup/memory/mesos/867a829e-4a26-43f5-86e0-938bf1f47688/user': No 
> such file or directory
> ../../src/tests/containerizer/isolator_tests.cpp:1307: Failure
> Value of: os::system( "su - " + UNPRIVILEGED_USERNAME + " -c 'mkdir " + 
> path::join(flags.cgroups_hierarchy, userCgroup) + "'")
>   Actual: 256
> Expected: 0
> -bash: 
> /sys/fs/cgroup/memory/mesos/867a829e-4a26-43f5-86e0-938bf1f47688/user/cgroup.procs:
>  No such file or directory
> ../../src/tests/containerizer/isolator_tests.cpp:1316: Failure
> Value of: os::system( "su - " + UNPRIVILEGED_USERNAME + " -c 'echo $$ >" + 
> path::join(flags.cgroups_hierarchy, userCgroup, "cgroup.procs") + "'")
>   Actual: 256
> Expected: 0
> [  FAILED  ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup, where 
> TypeParam = mesos::internal::slave::CgroupsMemIsolatorProcess (149 ms)
> {noformat}
> {noformat}
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup" --verbose
> {noformat}
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from UserCgroupIsolatorTest/1, where TypeParam = 
> mesos::internal::slave::CgroupsCpushareIsolatorProcess
> userdel: user 'mesos.test.unprivileged.user' does not exist
> [ RUN  ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup
> I1118 17:01:00.550706 30357 cpushare.cpp:392] Updated 'cpu.shares' to 1024 
> (cpus 1) for container e57f4343-1a97-4b44-b347-803be47ace80
> -bash: 
> /sys/fs/cgroup/cpuacct/mesos/e57f4343-1a97-4b44-b347-803be47ace80/cgroup.procs:
>  No such file or directory
> mkdir: cannot create directory 
> `/sys/fs/cgroup/cpuacct/mesos/e57f4343-1a97-4b44-b347-803be47ace80/user': No 
> such file or directory
> ../../src/tests/containerizer/isolator_tests.cpp:1307: Failure
> Value of: os::system( "su - " + UNPRIVILEGED_USERNAME + " -c 'mkdir " + 
> path::join(flags.cgroups_hierarchy, userCgroup) + "'")
>   Actual: 256
> Expected: 0
> -bash: 
> /sys/fs/cgroup/cpuacct/mesos/e57f4343-1a97-4b44-b347-803be47ace80/user/cgroup.procs:

[jira] [Commented] (MESOS-1563) Failed to configure on FreeBSD

2015-11-23 Thread David Forsythe (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021817#comment-15021817
 ] 

David Forsythe commented on MESOS-1563:
---

[~idownes] Any chance of getting your eyes back on this some time soon?

> Failed to configure on FreeBSD
> --
>
> Key: MESOS-1563
> URL: https://issues.apache.org/jira/browse/MESOS-1563
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.19.0
> Environment: FreeBSD-10/stable
>Reporter: Dmitry Sivachenko
>
> When trying to configure mesos on FreeBSD, I get the following error:
> configure: Setting up build environment for x86_64 freebsd10.0
> configure: error: "Mesos is currently unsupported on your platform."
> Why? Is there anything really Linux-specific inside? It's written in Java 
> after all.
> And MacOS is supported, but it is rather close to FreeBSD.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3874) Investigate recovery for the Hierarchical allocator

2015-11-23 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-3874:
---
Summary: Investigate recovery for the Hierarchical allocator  (was: 
Implement recovery in the Hierarchical allocator)

> Investigate recovery for the Hierarchical allocator
> ---
>
> Key: MESOS-3874
> URL: https://issues.apache.org/jira/browse/MESOS-3874
> Project: Mesos
>  Issue Type: Task
>  Components: allocation
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> The built-in Hierarchical allocator should implement recovery (in the 
> presence of quota).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2157) Add /master/slaves and /master/frameworks/{framework}/tasks/{task} endpoints

2015-11-23 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-2157:
--
Assignee: (was: Alexander Rojas)

> Add /master/slaves and /master/frameworks/{framework}/tasks/{task} endpoints
> 
>
> Key: MESOS-2157
> URL: https://issues.apache.org/jira/browse/MESOS-2157
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Niklas Quarfot Nielsen
>Priority: Trivial
>  Labels: mesosphere, newbie
>
> master/state.json exports the entire state of the cluster and can, for large 
> clusters, become massive (tens of megabytes of JSON).
> Often, a client only needs information about subsets of the entire state, for 
> example all connected slaves, or information (registration info, tasks, etc.) 
> belonging to a particular framework.
> We can partition state.json into many smaller endpoints, but for starters, 
> being able to get slave information and task information per framework would 
> be useful.
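
A minimal sketch of how one such smaller endpoint could be wired up with 
libprocess routing; the class name and handler body are assumptions, not 
actual master code:

{code}
#include <process/future.hpp>
#include <process/http.hpp>
#include <process/process.hpp>

using process::Future;

class MasterLikeProcess : public process::Process<MasterLikeProcess>
{
protected:
  void initialize() override
  {
    // Serves only the slave subset of what state.json contains today.
    route("/slaves", "Information about registered slaves",
          &MasterLikeProcess::slaves);
  }

private:
  Future<process::http::Response> slaves(
      const process::http::Request& request)
  {
    // A real implementation would serialize just the slave records.
    return process::http::OK("{\"slaves\": []}");
  }
};
{code}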



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3985) Tests for rescinding offers for quota

2015-11-23 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-3985:
--

 Summary: Tests for rescinding offers for quota
 Key: MESOS-3985
 URL: https://issues.apache.org/jira/browse/MESOS-3985
 Project: Mesos
  Issue Type: Task
  Components: master
Reporter: Alexander Rukletsov
Assignee: Alexander Rukletsov






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3912) Rescind offers in order to satisfy quota

2015-11-23 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008407#comment-15008407
 ] 

Alexander Rukletsov edited comment on MESOS-3912 at 11/23/15 10:17 AM:
---

https://reviews.apache.org/r/40351


was (Author: alexr):
https://reviews.apache.org/r/40351
https://reviews.apache.org/r/40396/

> Rescind offers in order to satisfy quota
> 
>
> Key: MESOS-3912
> URL: https://issues.apache.org/jira/browse/MESOS-3912
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> When a quota request comes in, we may need to rescind a certain number of 
> outstanding offers in order to satisfy it. Because resources are allocated in 
> the allocator, there can be a race between rescinding and allocating. This 
> race makes it hard to determine the exact number of offers that should be 
> rescinded in the master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-3579) FetcherCacheTest.LocalUncachedExtract is flaky

2015-11-23 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-3579:
---

Assignee: Benjamin Bannier  (was: Bernd Mathiske)

> FetcherCacheTest.LocalUncachedExtract is flaky
> --
>
> Key: MESOS-3579
> URL: https://issues.apache.org/jira/browse/MESOS-3579
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, test
>Reporter: Anand Mazumdar
>Assignee: Benjamin Bannier
>  Labels: flaky-test, mesosphere
>
> From ASF CI:
> https://builds.apache.org/job/Mesos/866/COMPILER=clang,CONFIGURATION=--verbose,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/console
> {code}
> [ RUN  ] FetcherCacheTest.LocalUncachedExtract
> Using temporary directory '/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA'
> I0925 19:15:39.541198 27410 leveldb.cpp:176] Opened db in 3.43934ms
> I0925 19:15:39.542362 27410 leveldb.cpp:183] Compacted db in 1.136184ms
> I0925 19:15:39.542428 27410 leveldb.cpp:198] Created db iterator in 35866ns
> I0925 19:15:39.542448 27410 leveldb.cpp:204] Seeked to beginning of db in 
> 8807ns
> I0925 19:15:39.542459 27410 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 6325ns
> I0925 19:15:39.542505 27410 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0925 19:15:39.543143 27438 recover.cpp:449] Starting replica recovery
> I0925 19:15:39.543393 27438 recover.cpp:475] Replica is in EMPTY status
> I0925 19:15:39.544373 27436 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0925 19:15:39.544791 27433 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0925 19:15:39.545284 27433 recover.cpp:566] Updating replica status to 
> STARTING
> I0925 19:15:39.546155 27436 master.cpp:376] Master 
> c8bf1c95-50f4-4832-a570-c560f0b466ae (f57fd4291168) started on 
> 172.17.1.195:41781
> I0925 19:15:39.546257 27433 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 747249ns
> I0925 19:15:39.546288 27433 replica.cpp:323] Persisted replica status to 
> STARTING
> I0925 19:15:39.546483 27434 recover.cpp:475] Replica is in STARTING status
> I0925 19:15:39.546187 27436 master.cpp:378] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" 
> --credentials="/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.26.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/master" 
> --zk_session_timeout="10secs"
> I0925 19:15:39.546567 27436 master.cpp:423] Master only allowing 
> authenticated frameworks to register
> I0925 19:15:39.546617 27436 master.cpp:428] Master only allowing 
> authenticated slaves to register
> I0925 19:15:39.546632 27436 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/credentials'
> I0925 19:15:39.546931 27436 master.cpp:467] Using default 'crammd5' 
> authenticator
> I0925 19:15:39.547044 27436 master.cpp:504] Authorization enabled
> I0925 19:15:39.547276 27441 whitelist_watcher.cpp:79] No whitelist given
> I0925 19:15:39.547320 27434 hierarchical.hpp:468] Initialized hierarchical 
> allocator process
> I0925 19:15:39.547471 27438 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0925 19:15:39.548318 27443 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0925 19:15:39.549067 27435 recover.cpp:566] Updating replica status to VOTING
> I0925 19:15:39.549115 27440 master.cpp:1603] The newly elected leader is 
> master@172.17.1.195:41781 with id c8bf1c95-50f4-4832-a570-c560f0b466ae
> I0925 19:15:39.549162 27440 master.cpp:1616] Elected as the leading master!
> I0925 19:15:39.549190 27440 master.cpp:1376] Recovering from registrar
> I0925 19:15:39.549342 27434 registrar.cpp:309] Recovering registrar
> I0925 19:15:39.549666 27430 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 418187ns
> I0925 19:15:39.549753 27430 replica.cpp:323] Persisted replica status to 
> VOTING
> I0925 19:15:39.550089 27442 

[jira] [Assigned] (MESOS-2857) FetcherCacheTest.LocalCachedExtract is flaky.

2015-11-23 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-2857:
---

Assignee: Benjamin Bannier  (was: Bernd Mathiske)

> FetcherCacheTest.LocalCachedExtract is flaky.
> -
>
> Key: MESOS-2857
> URL: https://issues.apache.org/jira/browse/MESOS-2857
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, test
>Reporter: Benjamin Mahler
>Assignee: Benjamin Bannier
>  Labels: flaky-test, mesosphere
>
> From jenkins:
> {noformat}
> [ RUN  ] FetcherCacheTest.LocalCachedExtract
> Using temporary directory '/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj'
> I0610 20:04:48.591573 24561 leveldb.cpp:176] Opened db in 3.512525ms
> I0610 20:04:48.592456 24561 leveldb.cpp:183] Compacted db in 828630ns
> I0610 20:04:48.592512 24561 leveldb.cpp:198] Created db iterator in 32992ns
> I0610 20:04:48.592531 24561 leveldb.cpp:204] Seeked to beginning of db in 
> 8967ns
> I0610 20:04:48.592545 24561 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 7762ns
> I0610 20:04:48.592604 24561 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0610 20:04:48.593438 24587 recover.cpp:449] Starting replica recovery
> I0610 20:04:48.593698 24587 recover.cpp:475] Replica is in EMPTY status
> I0610 20:04:48.595641 24580 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0610 20:04:48.596086 24590 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0610 20:04:48.596607 24590 recover.cpp:566] Updating replica status to 
> STARTING
> I0610 20:04:48.597507 24590 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 717888ns
> I0610 20:04:48.597535 24590 replica.cpp:323] Persisted replica status to 
> STARTING
> I0610 20:04:48.597697 24590 recover.cpp:475] Replica is in STARTING status
> I0610 20:04:48.599165 24584 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0610 20:04:48.599434 24584 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0610 20:04:48.599915 24590 recover.cpp:566] Updating replica status to VOTING
> I0610 20:04:48.600545 24590 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 432335ns
> I0610 20:04:48.600574 24590 replica.cpp:323] Persisted replica status to 
> VOTING
> I0610 20:04:48.600659 24590 recover.cpp:580] Successfully joined the Paxos 
> group
> I0610 20:04:48.600797 24590 recover.cpp:464] Recover process terminated
> I0610 20:04:48.602905 24594 master.cpp:363] Master 
> 20150610-200448-3875541420-32907-24561 (dbade881e927) started on 
> 172.17.0.231:32907
> I0610 20:04:48.602957 24594 master.cpp:365] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --credentials="/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/credentials" 
> --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.23.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/master" 
> --zk_session_timeout="10secs"
> I0610 20:04:48.603374 24594 master.cpp:410] Master only allowing 
> authenticated frameworks to register
> I0610 20:04:48.603392 24594 master.cpp:415] Master only allowing 
> authenticated slaves to register
> I0610 20:04:48.603404 24594 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/credentials'
> I0610 20:04:48.603751 24594 master.cpp:454] Using default 'crammd5' 
> authenticator
> I0610 20:04:48.604928 24594 master.cpp:491] Authorization enabled
> I0610 20:04:48.606034 24593 hierarchical.hpp:309] Initialized hierarchical 
> allocator process
> I0610 20:04:48.606106 24593 whitelist_watcher.cpp:79] No whitelist given
> I0610 20:04:48.607430 24594 master.cpp:1476] The newly elected leader is 
> master@172.17.0.231:32907 with id 20150610-200448-3875541420-32907-24561
> I0610 20:04:48.607466 24594 master.cpp:1489] Elected as the leading master!
> I0610 20:04:48.607481 24594 master.cpp:1259] Recovering from registrar
> I0610 20:04:48.607712 24594 registrar.cpp:313] Recovering registrar
> I0610 20:04:48.608543 24588 log.cpp:661] Attempting to start the writer
> I0610 20:04:48.610231 24588 replica.cpp:477] Replica 

[jira] [Updated] (MESOS-3775) MasterAllocatorTest.SlaveLost is slow

2015-11-23 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3775:

Assignee: Jan Schlicht

> MasterAllocatorTest.SlaveLost is slow
> -
>
> Key: MESOS-3775
> URL: https://issues.apache.org/jira/browse/MESOS-3775
> Project: Mesos
>  Issue Type: Bug
>  Components: technical debt, test
>Reporter: Alexander Rukletsov
>Assignee: Jan Schlicht
>Priority: Minor
>  Labels: mesosphere, tech-debt
>
> The {{MasterAllocatorTest.SlaveLost}} test takes more than {{5s}} to 
> complete. A brief look into the code hints that the stopped agent does not 
> quit immediately (and hence its resources are not released by the allocator) 
> because [it waits for the executor to 
> terminate|https://github.com/apache/mesos/blob/master/src/tests/master_allocator_tests.cpp#L717].
> The {{5s}} timeout comes from the {{EXECUTOR_SHUTDOWN_GRACE_PERIOD}} agent 
> constant.
> Possible solutions:
> * Do not wait until the stopped agent quits (can be flaky, needs deeper 
> analysis).
> * Decrease the agent's {{executor_shutdown_grace_period}} flag (see the 
> sketch below).
> * Terminate the executor faster (this may require some refactoring, since the 
> executor driver is created in the {{TestContainerizer}} and we do not have 
> direct access to it).
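
A minimal sketch of the second option in test-setup style; the flag name is 
real, the surrounding test scaffolding is elided:

{code}
#include <stout/duration.hpp>

// Inside a test fixture that provides CreateSlaveFlags():
slave::Flags flags = CreateSlaveFlags();

// Default is 5secs; shrinking it lets the stopped agent's executor be
// shut down (and the agent's resources recovered) almost immediately.
flags.executor_shutdown_grace_period = Milliseconds(50);
{code}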



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3818) Line wrapping for "--help" output

2015-11-23 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021936#comment-15021936
 ] 

Neil Conway edited comment on MESOS-3818 at 11/23/15 10:47 AM:
---

My vote is to keep things simple, and just adjust the newlines in the existing 
help strings. I'm concerned that trying to do line-wrapping adds more 
complexity than is justified (especially if you want to handle special-cases 
like JSON example text, etc.). If we fix the newlines manually once, it 
shouldn't be that hard to keep the output looking reasonable in the future.


was (Author: neilc):
My vote is to keep things simple, and just adjust the newlines in the existing 
help strings. I'm concerned that trying to do line-wrapping adds more 
complexity than is justified. If we fix the newlines manually once, it 
shouldn't be that hard to keep the output looking reasonable in the future.

> Line wrapping for "--help" output
> -
>
> Key: MESOS-3818
> URL: https://issues.apache.org/jira/browse/MESOS-3818
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Neil Conway
>Assignee: Shuai Lin
>Priority: Trivial
>  Labels: mesosphere, newbie
>
> The output of `mesos-slave --help`, `mesos-master --help`, and perhaps other 
> programs has very inconsistent line wrapping: different help text fragments 
> are wrapped at very different column numbers, which harms readability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3818) Line wrapping for "--help" output

2015-11-23 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021936#comment-15021936
 ] 

Neil Conway commented on MESOS-3818:


My vote is to keep things simple, and just adjust the newlines in the existing 
help strings. I'm concerned that trying to do line-wrapping adds more 
complexity than is justified. If we fix the newlines manually once, it 
shouldn't be that hard to keep the output looking reasonable in the future.

> Line wrapping for "--help" output
> -
>
> Key: MESOS-3818
> URL: https://issues.apache.org/jira/browse/MESOS-3818
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Neil Conway
>Assignee: Shuai Lin
>Priority: Trivial
>  Labels: mesosphere, newbie
>
> The output of `mesos-slave --help`, `mesos-master --help`, and perhaps other 
> programs has very inconsistent line wrapping: different help text fragments 
> are wrapped at very different column numbers, which harms readability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3231) Implement http::AuthenticatorManager and http::Authenticator

2015-11-23 Thread Alexander Rojas (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964906#comment-14964906
 ] 

Alexander Rojas edited comment on MESOS-3231 at 11/23/15 10:57 AM:
---

h3. Reviews

# -[r/37998/|https://reviews.apache.org/r/37998/]- (committed): Made 
ProcessManager::handle() a void returning method.
# -[r/39472/|https://reviews.apache.org/r/39472/]- (discarded): Added the 
helper container InheritanceTree where nodes inherit values from their 
ancestors.
# [r/37999/|https://reviews.apache.org/r/37999/] (committed): Implemented 
http::AuthenticatorManager.


was (Author: arojas):
h3. Reviews

# [r/37998/|https://reviews.apache.org/r/37998/]: Made ProcessManager::handle() 
a void returning method.
# [r/39472/|https://reviews.apache.org/r/39472/]: Added the helper container 
InheritanceTree where nodes inherit values from their ancestors.
# [r/37999/|https://reviews.apache.org/r/37999/]: Implemented 
http::AuthenticatorManager.

> Implement http::AuthenticatorManager and http::Authenticator
> 
>
> Key: MESOS-3231
> URL: https://issues.apache.org/jira/browse/MESOS-3231
> Project: Mesos
>  Issue Type: Improvement
>  Components: security
>Reporter: Alexander Rojas
>Assignee: Alexander Rojas
>  Labels: mesosphere, security
>
> As proposed in the document [Mesos HTTP Authentication 
> Design|https://docs.google.com/document/d/1kM3_f7DSqXcE2MuERrLTGp_XMC6ss2wmpkNYDCY5rOM],
>  a {{process::http::AuthenticatorManager}} and 
> {{process::http::Authenticator}} are needed.
> The {{process::http::AuthenticatorManager}} takes care of the logic which is 
> common for all authenticators, while the {{process::http::Authenticator}} 
> implements specific authentication schemes (for more details, please head to 
> the design doc).
> Tests will be needed too.
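
A minimal sketch of one possible shape for the authenticator interface, 
written against the design doc rather than committed code; the method names 
and result type here are assumptions:

{code}
#include <string>

#include <process/future.hpp>
#include <process/http.hpp>

#include <stout/option.hpp>

// Sketch: each authenticator implements one scheme; scheme-independent
// logic (challenge generation, realm handling, etc.) lives in the
// AuthenticatorManager.
class Authenticator
{
public:
  virtual ~Authenticator() {}

  // The scheme this authenticator implements, e.g. "Basic".
  virtual std::string scheme() const = 0;

  // Inspects the request's credentials; a None() result would make the
  // manager reply with this scheme's challenge.
  virtual process::Future<Option<std::string>> authenticate(
      const process::http::Request& request) = 0;
};
{code}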



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3579) FetcherCacheTest.LocalUncachedExtract is flaky

2015-11-23 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3579:

Sprint: Mesosphere Sprint 23

> FetcherCacheTest.LocalUncachedExtract is flaky
> --
>
> Key: MESOS-3579
> URL: https://issues.apache.org/jira/browse/MESOS-3579
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, test
>Reporter: Anand Mazumdar
>Assignee: Benjamin Bannier
>  Labels: flaky-test, mesosphere
>
> From ASF CI:
> https://builds.apache.org/job/Mesos/866/COMPILER=clang,CONFIGURATION=--verbose,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/console
> {code}
> [ RUN  ] FetcherCacheTest.LocalUncachedExtract
> Using temporary directory '/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA'
> I0925 19:15:39.541198 27410 leveldb.cpp:176] Opened db in 3.43934ms
> I0925 19:15:39.542362 27410 leveldb.cpp:183] Compacted db in 1.136184ms
> I0925 19:15:39.542428 27410 leveldb.cpp:198] Created db iterator in 35866ns
> I0925 19:15:39.542448 27410 leveldb.cpp:204] Seeked to beginning of db in 
> 8807ns
> I0925 19:15:39.542459 27410 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 6325ns
> I0925 19:15:39.542505 27410 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0925 19:15:39.543143 27438 recover.cpp:449] Starting replica recovery
> I0925 19:15:39.543393 27438 recover.cpp:475] Replica is in EMPTY status
> I0925 19:15:39.544373 27436 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0925 19:15:39.544791 27433 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0925 19:15:39.545284 27433 recover.cpp:566] Updating replica status to 
> STARTING
> I0925 19:15:39.546155 27436 master.cpp:376] Master 
> c8bf1c95-50f4-4832-a570-c560f0b466ae (f57fd4291168) started on 
> 172.17.1.195:41781
> I0925 19:15:39.546257 27433 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 747249ns
> I0925 19:15:39.546288 27433 replica.cpp:323] Persisted replica status to 
> STARTING
> I0925 19:15:39.546483 27434 recover.cpp:475] Replica is in STARTING status
> I0925 19:15:39.546187 27436 master.cpp:378] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" 
> --credentials="/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.26.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/master" 
> --zk_session_timeout="10secs"
> I0925 19:15:39.546567 27436 master.cpp:423] Master only allowing 
> authenticated frameworks to register
> I0925 19:15:39.546617 27436 master.cpp:428] Master only allowing 
> authenticated slaves to register
> I0925 19:15:39.546632 27436 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/credentials'
> I0925 19:15:39.546931 27436 master.cpp:467] Using default 'crammd5' 
> authenticator
> I0925 19:15:39.547044 27436 master.cpp:504] Authorization enabled
> I0925 19:15:39.547276 27441 whitelist_watcher.cpp:79] No whitelist given
> I0925 19:15:39.547320 27434 hierarchical.hpp:468] Initialized hierarchical 
> allocator process
> I0925 19:15:39.547471 27438 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0925 19:15:39.548318 27443 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0925 19:15:39.549067 27435 recover.cpp:566] Updating replica status to VOTING
> I0925 19:15:39.549115 27440 master.cpp:1603] The newly elected leader is 
> master@172.17.1.195:41781 with id c8bf1c95-50f4-4832-a570-c560f0b466ae
> I0925 19:15:39.549162 27440 master.cpp:1616] Elected as the leading master!
> I0925 19:15:39.549190 27440 master.cpp:1376] Recovering from registrar
> I0925 19:15:39.549342 27434 registrar.cpp:309] Recovering registrar
> I0925 19:15:39.549666 27430 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 418187ns
> I0925 19:15:39.549753 27430 replica.cpp:323] Persisted replica status to 
> VOTING
> I0925 19:15:39.550089 27442 recover.cpp:580] Successfully joined 

[jira] [Updated] (MESOS-3775) MasterAllocatorTest.SlaveLost is slow

2015-11-23 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht updated MESOS-3775:

Sprint: Mesosphere Sprint 23

> MasterAllocatorTest.SlaveLost is slow
> -
>
> Key: MESOS-3775
> URL: https://issues.apache.org/jira/browse/MESOS-3775
> Project: Mesos
>  Issue Type: Bug
>  Components: technical debt, test
>Reporter: Alexander Rukletsov
>Assignee: Jan Schlicht
>Priority: Minor
>  Labels: mesosphere, tech-debt
>
> The {{MasterAllocatorTest.SlaveLost}} test takes more than {{5s}} to complete. A 
> brief look into the code hints that the stopped agent does not quit 
> immediately (and hence its resources are not released by the allocator) 
> because [it waits for the executor to 
> terminate|https://github.com/apache/mesos/blob/master/src/tests/master_allocator_tests.cpp#L717].
>  The {{5s}} timeout comes from the {{EXECUTOR_SHUTDOWN_GRACE_PERIOD}} agent constant.
> Possible solutions:
> * Do not wait until the stopped agent quits (can be flaky, needs deeper 
> analysis).
> * Decrease the agent's {{executor_shutdown_grace_period}} flag.
> * Terminate the executor faster (this may require some refactoring since the 
> executor driver is created in the {{TestContainerizer}} and we do not have 
> direct access to it).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2307) Add a Future state for gone processes

2015-11-23 Thread Alexander Rojas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rojas updated MESOS-2307:
---
Sprint: Mesosphere Sprint 23

> Add a Future state for gone processes
> -
>
> Key: MESOS-2307
> URL: https://issues.apache.org/jira/browse/MESOS-2307
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Alexander Rukletsov
>
> If the libprocess process is terminated, we can still dispatch calls to it as 
> long as we have a {{UPID}}. In this case the future will be pending forever. 
> Instead, it would be better to introduce a separate state for such a case, e.g. 
> {{Disconnected}} or {{Abandoned}}.
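
For illustration, a minimal stand-alone sketch of the problem against the public libprocess API (exact headers and helpers may differ per version):

{code}
#include <cassert>
#include <string>

#include <process/dispatch.hpp>
#include <process/future.hpp>
#include <process/process.hpp>

using namespace process;

class EchoProcess : public Process<EchoProcess>
{
public:
  std::string echo(const std::string& s) { return s; }
};

int main()
{
  EchoProcess echo;
  PID<EchoProcess> pid = spawn(&echo);

  terminate(pid);
  wait(pid);

  // Dispatching to the terminated process still "succeeds", but the
  // returned future never transitions out of PENDING -- the gap this
  // ticket proposes to close with a Disconnected/Abandoned state.
  Future<std::string> future = dispatch(pid, &EchoProcess::echo, "hello");
  assert(future.isPending());

  return 0;
}
{code}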



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2307) Add a Future state for gone processes

2015-11-23 Thread Alexander Rojas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rojas reassigned MESOS-2307:
--

Assignee: Alexander Rojas

> Add a Future state for gone processes
> -
>
> Key: MESOS-2307
> URL: https://issues.apache.org/jira/browse/MESOS-2307
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rojas
>
> If the libprocess process is terminated, we can still dispatch calls to it as 
> long as we have a {{UPID}}. In this case the future will be pending forever. 
> Instead, it would be better to introduce a separate state for such a case, e.g. 
> {{Disconnected}} or {{Abandoned}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3231) Implement http::AuthenticatorManager and http::Authenticator

2015-11-23 Thread Alexander Rojas (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964906#comment-14964906
 ] 

Alexander Rojas edited comment on MESOS-3231 at 11/23/15 10:58 AM:
---

h3. Reviews

# -[r/37998/|https://reviews.apache.org/r/37998/]- (committed): Made 
ProcessManager::handle() a void-returning method.
# -[r/39472/|https://reviews.apache.org/r/39472/]- (discarded): Added the 
helper container InheritanceTree where nodes inherit values from their 
ancestors.
# -[r/37999/|https://reviews.apache.org/r/37999/]- (committed): Summary:.


was (Author: arojas):
h3. Reviews

# -[r/37998/|https://reviews.apache.org/r/37998/]- (committed): Made 
ProcessManager::handle() a void-returning method.
# -[r/39472/|https://reviews.apache.org/r/39472/]- (discarded): Added the 
helper container InheritanceTree where nodes inherit values from their 
ancestors.
# [r/37999/|https://reviews.apache.org/r/37999/] (committed): Summary:.

> Implement http::AuthenticatorManager and http::Authenticator
> 
>
> Key: MESOS-3231
> URL: https://issues.apache.org/jira/browse/MESOS-3231
> Project: Mesos
>  Issue Type: Improvement
>  Components: security
>Reporter: Alexander Rojas
>Assignee: Alexander Rojas
>  Labels: mesosphere, security
>
> As proposed in the document [Mesos HTTP Authentication 
> Design|https://docs.google.com/document/d/1kM3_f7DSqXcE2MuERrLTGp_XMC6ss2wmpkNYDCY5rOM],
>  a {{process::http::AuthenticatorManager}} and 
> {{process::http::Authenticator}} are needed.
> The {{process::http::AuthenticatorManager}} takes care of the logic which is 
> common for all authenticators, while the {{process::http::Authenticator}} 
> implements specific authentication schemes (for more details, please head to 
> the design doc).
> Tests will be needed too.
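
For orientation, a rough sketch of that split; the names and signatures here are illustrative, not the final interface from the design doc:

{code}
#include <string>

#include <process/future.hpp>
#include <process/http.hpp>

#include <stout/option.hpp>

// One Authenticator per scheme; the AuthenticatorManager (not shown)
// owns scheme selection, realm lookup, and the common 401 plumbing.
class Authenticator
{
public:
  virtual ~Authenticator() {}

  // The HTTP authentication scheme this implements, e.g. "Basic".
  virtual std::string scheme() const = 0;

  // Yields the authenticated principal, or None if the manager should
  // reply "401 Unauthorized" with a WWW-Authenticate challenge.
  virtual process::Future<Option<std::string>> authenticate(
      const process::http::Request& request) = 0;
};
{code}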



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3233) Allow developers to decide whether a HTTP endpoint should use authentication

2015-11-23 Thread Alexander Rojas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rojas updated MESOS-3233:
---
Shepherd: Benjamin Mahler  (was: Bernd Mathiske)
  Sprint: Mesosphere Sprint 23

> Allow developers to decide whether a HTTP endpoint should use authentication
> 
>
> Key: MESOS-3233
> URL: https://issues.apache.org/jira/browse/MESOS-3233
> Project: Mesos
>  Issue Type: Improvement
>  Components: security
>Reporter: Alexander Rojas
>Assignee: Alexander Rojas
>  Labels: mesosphere, security
>
> Once HTTP Authentication is enabled, developers should be allowed to decide 
> which endpoints should require authentication.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3977) http::_operation() creates unnecessary filter, rescinds unnecessarily

2015-11-23 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021958#comment-15021958
 ] 

Neil Conway commented on MESOS-3977:


I believe that logic is ensuring that, if an outstanding offer shares no 
resources in common with the required resources, we don't try to rescind the 
offer. I.e., it doesn't seem related, as far as I can tell.

> http::_operation() creates unnecessary filter, rescinds unnecessarily
> -
>
> Key: MESOS-3977
> URL: https://issues.apache.org/jira/browse/MESOS-3977
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>Priority: Minor
>  Labels: mesosphere, reservations
>
> This function is used by the /reserve, /unreserve, /create-volume, and 
> /destroy-volume endpoints. It has a few warts:
> 1. It installs a 5-second filter when rescinding an offer. However, the 
> cluster state might change so that the filter is actually undesirable. For 
> example, this scenario:
> * Create DR, make offer
> * Create PV => rescinds previous offer, sets filter, makes offer
> * Destroy PV => rescinds previous offer
> After the last step, we'll wait 5 seconds for the filter to expire before 
> re-offering the DR.
> 2. If there are sufficient available resources at the target slave, we don't 
> actually need to rescind any offers in the first place. However, _operation() 
> rescinds offers unconditionally.
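
A sketch of the check point 2 calls for, with resources collapsed to a single scalar (say, CPUs) purely for illustration; the real code would operate on the Resources class and pick offers to rescind individually:

{code}
// Returns true only when the agent's unoffered headroom cannot satisfy
// the requested operation, i.e. when rescinding is actually necessary.
bool needToRescind(double required,
                   double agentTotal,
                   double allocated,
                   double offered)
{
  // Resources on the agent that are neither in use nor sitting in an
  // outstanding offer.
  const double availableUnoffered = agentTotal - allocated - offered;

  // Only rescind (and only then consider installing a filter) when the
  // unoffered headroom is insufficient on its own.
  return required > availableUnoffered;
}
{code}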



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3718) Implement Quota support in allocator

2015-11-23 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971338#comment-14971338
 ] 

Alexander Rukletsov edited comment on MESOS-3718 at 11/23/15 10:02 AM:
---

https://reviews.apache.org/r/39399/
https://reviews.apache.org/r/39400/
https://reviews.apache.org/r/40551/
https://reviews.apache.org/r/39450/


was (Author: alexr):
https://reviews.apache.org/r/39399/
https://reviews.apache.org/r/39400/
https://reviews.apache.org/r/39401/
https://reviews.apache.org/r/39450/

> Implement Quota support in allocator
> 
>
> Key: MESOS-3718
> URL: https://issues.apache.org/jira/browse/MESOS-3718
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> The built-in Hierarchical DRF allocator should support Quota. This includes 
> (but is not limited to): adding, updating, removing, and satisfying quota; 
> avoiding both overcommitting resources and handing them to non-quota'ed roles 
> in the presence of master failover.
> A [design doc for Quota support in 
> Allocator|https://issues.apache.org/jira/browse/MESOS-2937] provides an 
> overview of a feature set required to be implemented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3984) Tests for quota support in the hierarchical allocator

2015-11-23 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-3984:
--

 Summary: Tests for quota support in the hierarchical allocator
 Key: MESOS-3984
 URL: https://issues.apache.org/jira/browse/MESOS-3984
 Project: Mesos
  Issue Type: Task
  Components: allocation, test
Reporter: Alexander Rukletsov
Assignee: Alexander Rukletsov






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3965) Ensure resources in `QuotaInfo` protobuf do not contain `role`

2015-11-23 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-3965:
---
Shepherd: Joris Van Remoortere
  Sprint: Mesosphere Sprint 23

> Ensure resources in `QuotaInfo` protobuf do not contain `role`
> --
>
> Key: MESOS-3965
> URL: https://issues.apache.org/jira/browse/MESOS-3965
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> The {{QuotaInfo}} protobuf currently stores per-role quotas, including 
> {{Resource}} objects. These resources are neither statically nor dynamically 
> reserved, hence they may not contain the {{role}} field. We should ensure this 
> field is unset, as well as update the validation routine for {{QuotaInfo}}.
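
A minimal sketch of the validation rule, using a simplified stand-in for the {{Resource}} message rather than the real protobuf:

{code}
#include <string>
#include <vector>

struct Resource
{
  std::string name;
  bool hasRole;  // whether the `role` field is set
};

// Returns true iff no resource in the quota request sets `role`, which
// is what the updated QuotaInfo validation should enforce.
bool validateQuotaResources(const std::vector<Resource>& resources)
{
  for (const Resource& resource : resources) {
    if (resource.hasRole) {
      return false;
    }
  }
  return true;
}
{code}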



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3986) Tests for allocator recovery.

2015-11-23 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-3986:
--

 Summary: Tests for allocator recovery.
 Key: MESOS-3986
 URL: https://issues.apache.org/jira/browse/MESOS-3986
 Project: Mesos
  Issue Type: Task
  Components: allocation
Reporter: Alexander Rukletsov
Assignee: Alexander Rukletsov






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3986) Tests for allocator recovery

2015-11-23 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-3986:
---
Summary: Tests for allocator recovery  (was: Tests for allocator recovery.)

> Tests for allocator recovery
> 
>
> Key: MESOS-3986
> URL: https://issues.apache.org/jira/browse/MESOS-3986
> Project: Mesos
>  Issue Type: Task
>  Components: allocation
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-3940) /reserve and /unreserve should be permissive under a master without authentication.

2015-11-23 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway reassigned MESOS-3940:
--

Assignee: Neil Conway

> /reserve and /unreserve should be permissive under a master without 
> authentication.
> ---
>
> Key: MESOS-3940
> URL: https://issues.apache.org/jira/browse/MESOS-3940
> Project: Mesos
>  Issue Type: Bug
>Reporter: Michael Park
>Assignee: Neil Conway
>  Labels: mesosphere, reservations
>
> Currently, the {{/reserve}} and {{/unreserve}} endpoints do not work without 
> authentication enabled on the master. When authentication is disabled on the 
> master, these endpoints should just be permissive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3826) Add an optional unique identifier for resource reservations

2015-11-23 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021979#comment-15021979
 ] 

Neil Conway commented on MESOS-3826:


You don't _need_ an ID: the status quo is that your framework needs to compare 
the resources it is offered with the resources it has tried to reserve, and 
then it should either make additional reservations or unreserve duplicate 
reservations as necessary.

Adding reservation request IDs (likely along with some notion of 
"reconciliation" for those IDs) might make this simpler, though.

> Add an optional unique identifier for resource reservations
> ---
>
> Key: MESOS-3826
> URL: https://issues.apache.org/jira/browse/MESOS-3826
> Project: Mesos
>  Issue Type: Improvement
>  Components: general
>Reporter: Sargun Dhillon
>Assignee: Guangya Liu
>Priority: Minor
>  Labels: mesosphere, reservations
>
> Thanks to the resource reservation primitives, frameworks can reserve 
> resources. These reservations are per role, which means multiple frameworks 
> can share reservations. This can get very hairy, as multiple reservations can 
> occur on each agent. 
> It would be nice to be able to optionally, uniquely identify reservations by 
> ID, much like persistent volumes are today. This could be done by adding a 
> new protobuf field, such as Resource.ReservationInfo.id, that if set upon 
> reservation time, would come back when the reservation is advertised.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3940) /reserve and /unreserve should be permissive under a master without authentication.

2015-11-23 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-3940:
---
Labels: authentication mesosphere reservations  (was: mesosphere 
reservations)

> /reserve and /unreserve should be permissive under a master without 
> authentication.
> ---
>
> Key: MESOS-3940
> URL: https://issues.apache.org/jira/browse/MESOS-3940
> Project: Mesos
>  Issue Type: Bug
>Reporter: Michael Park
>Assignee: Neil Conway
>  Labels: authentication, mesosphere, reservations
>
> Currently, the {{/reserve}} and {{/unreserve}} endpoints do not work without 
> authentication enabled on the master. When authentication is disabled on the 
> master, these endpoints should just be permissive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3775) MasterAllocatorTest.SlaveLost is slow

2015-11-23 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht updated MESOS-3775:

Sprint:   (was: Mesosphere Sprint 23)

> MasterAllocatorTest.SlaveLost is slow
> -
>
> Key: MESOS-3775
> URL: https://issues.apache.org/jira/browse/MESOS-3775
> Project: Mesos
>  Issue Type: Bug
>  Components: technical debt, test
>Reporter: Alexander Rukletsov
>Priority: Minor
>  Labels: mesosphere, tech-debt
>
> The {{MasterAllocatorTest.SlaveLost}} test takes more than {{5s}} to complete. A 
> brief look into the code hints that the stopped agent does not quit 
> immediately (and hence its resources are not released by the allocator) 
> because [it waits for the executor to 
> terminate|https://github.com/apache/mesos/blob/master/src/tests/master_allocator_tests.cpp#L717].
>  The {{5s}} timeout comes from the {{EXECUTOR_SHUTDOWN_GRACE_PERIOD}} agent constant.
> Possible solutions:
> * Do not wait until the stopped agent quits (can be flaky, needs deeper 
> analysis).
> * Decrease the agent's {{executor_shutdown_grace_period}} flag.
> * Terminate the executor faster (this may require some refactoring since the 
> executor driver is created in the {{TestContainerizer}} and we do not have 
> direct access to it).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3988) Implicit roles

2015-11-23 Thread Neil Conway (JIRA)
Neil Conway created MESOS-3988:
--

 Summary: Implicit roles
 Key: MESOS-3988
 URL: https://issues.apache.org/jira/browse/MESOS-3988
 Project: Mesos
  Issue Type: Improvement
Reporter: Neil Conway


At present, Mesos uses a static list of roles that are configured when the 
master starts up. This places some severe limitations on how roles can be used 
(e.g., changing the set of roles requires restarting all the masters).

As an alternative (or a precursor) to implementing full-blown dynamic roles, we 
could instead relax the concept of roles, so that:
* frameworks can register with any role (subject to ACLs/authz)
* reservations can be made for any role

Open questions, at least to me:
* This would mean weights cannot be configured dynamically. Is that okay?
* Is this feature useful enough without dynamic ACL changes?
* If we implement this (+ dynamic ACLs), do we also need dynamic roles?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3988) Implicit roles

2015-11-23 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022024#comment-15022024
 ] 

Guangya Liu commented on MESOS-3988:


One concern with this is that roles could get out of control in a Mesos cluster, 
because end users may enter misspelled role names.

If we go in this direction, then for the weight issue, perhaps we can let a 
framework set a role plus a weight, i.e. "role:weight", and define some policy 
for deriving the effective weight when the same role is configured with 
different weights, such as {{sum(weight)/n}}, or using the biggest or smallest 
weight, etc.; a sketch of the averaging policy follows.
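
For instance, the {{sum(weight)/n}} policy could look like this (a sketch; the default weight of 1.0 is an assumption, not current Mesos behavior):

{code}
#include <numeric>
#include <vector>

// Average the weights that different frameworks configured for the
// same role; fall back to an assumed default when none were set.
double combinedWeight(const std::vector<double>& weights)
{
  if (weights.empty()) {
    return 1.0;  // assumed default weight
  }

  const double sum = std::accumulate(weights.begin(), weights.end(), 0.0);
  return sum / weights.size();  // sum(weight) / n
}
{code}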

> Implicit roles
> --
>
> Key: MESOS-3988
> URL: https://issues.apache.org/jira/browse/MESOS-3988
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Neil Conway
>  Labels: mesosphere, roles
>
> At present, Mesos uses a static list of roles that are configured when the 
> master starts up. This places some severe limitations on how roles can be 
> used (e.g., changing the set of roles requires restarting all the masters).
> As an alternative (or a precursor) to implementing full-blown dynamic roles, 
> we could instead relax the concept of roles, so that:
> * frameworks can register with any role (subject to ACLs/authz)
> * reservations can be made for any role
> Open questions, at least to me:
> * This would mean weights cannot be configured dynamically. Is that okay?
> * Is this feature useful enough without dynamic ACL changes?
> * If we implement this (+ dynamic ACLs), do we also need dynamic roles?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3940) /reserve and /unreserve should be permissive under a master without authentication.

2015-11-23 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022040#comment-15022040
 ] 

Anand Mazumdar commented on MESOS-3940:
---

[~neilc] Wouldn't it be a good idea to wait for MESOS-3233? Once that is 
implemented, all you need to do is remove the boilerplate code inside the 
handler function that tries to extract the {{Authorization}} header itself.

> /reserve and /unreserve should be permissive under a master without 
> authentication.
> ---
>
> Key: MESOS-3940
> URL: https://issues.apache.org/jira/browse/MESOS-3940
> Project: Mesos
>  Issue Type: Bug
>Reporter: Michael Park
>Assignee: Neil Conway
>  Labels: authentication, mesosphere, reservations
>
> Currently, the {{/reserve}} and {{/unreserve}} endpoints do not work without 
> authentication enabled on the master. When authentication is disabled on the 
> master, these endpoints should just be permissive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3940) /reserve and /unreserve should be permissive under a master without authentication.

2015-11-23 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3940:
--
Shepherd: Michael Park

> /reserve and /unreserve should be permissive under a master without 
> authentication.
> ---
>
> Key: MESOS-3940
> URL: https://issues.apache.org/jira/browse/MESOS-3940
> Project: Mesos
>  Issue Type: Bug
>Reporter: Michael Park
>Assignee: Neil Conway
>  Labels: authentication, mesosphere, reservations
>
> Currently, the {{/reserve}} and {{/unreserve}} endpoints do not work without 
> authentication enabled on the master. When authentication is disabled on the 
> master, these endpoints should just be permissive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3987) /create-volume, /destroy-volume should be permissive under a master without authentication.

2015-11-23 Thread Neil Conway (JIRA)
Neil Conway created MESOS-3987:
--

 Summary: /create-volume, /destroy-volume should be permissive 
under a master without authentication.
 Key: MESOS-3987
 URL: https://issues.apache.org/jira/browse/MESOS-3987
 Project: Mesos
  Issue Type: Bug
Reporter: Neil Conway


See MESOS-3940 for details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3775) MasterAllocatorTest.SlaveLost is slow

2015-11-23 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht updated MESOS-3775:

Assignee: (was: Jan Schlicht)

> MasterAllocatorTest.SlaveLost is slow
> -
>
> Key: MESOS-3775
> URL: https://issues.apache.org/jira/browse/MESOS-3775
> Project: Mesos
>  Issue Type: Bug
>  Components: technical debt, test
>Reporter: Alexander Rukletsov
>Priority: Minor
>  Labels: mesosphere, tech-debt
>
> The {{MasterAllocatorTest.SlaveLost}} test takes more than {{5s}} to complete. A 
> brief look into the code hints that the stopped agent does not quit 
> immediately (and hence its resources are not released by the allocator) 
> because [it waits for the executor to 
> terminate|https://github.com/apache/mesos/blob/master/src/tests/master_allocator_tests.cpp#L717].
>  The {{5s}} timeout comes from the {{EXECUTOR_SHUTDOWN_GRACE_PERIOD}} agent constant.
> Possible solutions:
> * Do not wait until the stopped agent quits (can be flaky, needs deeper 
> analysis).
> * Decrease the agent's {{executor_shutdown_grace_period}} flag.
> * Terminate the executor faster (this may require some refactoring since the 
> executor driver is created in the {{TestContainerizer}} and we do not have 
> direct access to it).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3977) http::_operation() creates unnecessary filter, rescinds unnecessarily

2015-11-23 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022055#comment-15022055
 ] 

Guangya Liu commented on MESOS-3977:


I see. Yes, we should check whether the slave has enough resources before 
rescinding offers, instead of pessimistically rescinding them.

> http::_operation() creates unnecessary filter, rescinds unnecessarily
> -
>
> Key: MESOS-3977
> URL: https://issues.apache.org/jira/browse/MESOS-3977
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>Priority: Minor
>  Labels: mesosphere, reservations
>
> This function is used by the /reserve, /unreserve, /create-volume, and 
> /destroy-volume endpoints. It has a few warts:
> 1. It installs a 5-second filter when rescinding an offer. However, the 
> cluster state might change so that the filter is actually undesirable. For 
> example, this scenario:
> * Create DR, make offer
> * Create PV => rescinds previous offer, sets filter, makes offer
> * Destroy PV => rescinds previous offer
> After the last step, we'll wait 5 seconds for the filter to expire before 
> re-offering the DR.
> 2. If there are sufficient available resources at the target slave, we don't 
> actually need to rescind any offers in the first place. However, _operation() 
> rescinds offers unconditionally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3026) ProcessTest.Cache fails and hangs

2015-11-23 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-3026:
--
Priority: Minor  (was: Blocker)

> ProcessTest.Cache fails and hangs
> -
>
> Key: MESOS-3026
> URL: https://issues.apache.org/jira/browse/MESOS-3026
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
> Environment: ubuntu 15.04/ ubuntu 14.04.2
> clang-3.6 / gcc 4.8.2
>Reporter: Joris Van Remoortere
>Priority: Minor
>  Labels: libprocess, mesosphere, tests
>
> {code}
> [ RUN  ] ProcessTest.Cache
> ../../../3rdparty/libprocess/src/tests/process_tests.cpp:1726: Failure
> Value of: response.get().status
>   Actual: "200 OK"
> Expected: "304 Not Modified"
> [  FAILED  ] ProcessTest.Cache (1 ms)
> {code}
> The tests then finish running, but the gtest framework fails to terminate and 
> uses 100% CPU.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3983) Tests for quota request validation

2015-11-23 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-3983:
--

 Summary: Tests for quota request validation
 Key: MESOS-3983
 URL: https://issues.apache.org/jira/browse/MESOS-3983
 Project: Mesos
  Issue Type: Task
  Components: master, test
Reporter: Alexander Rukletsov
Assignee: Joerg Schad


Tests should include:
* JSON validation;
* Absence of irrelevant fields;
* Semantic validation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3026) ProcessTest.Cache fails and hangs

2015-11-23 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-3026:
--
Assignee: (was: Alexander Rojas)

> ProcessTest.Cache fails and hangs
> -
>
> Key: MESOS-3026
> URL: https://issues.apache.org/jira/browse/MESOS-3026
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
> Environment: ubuntu 15.04/ ubuntu 14.04.2
> clang-3.6 / gcc 4.8.2
>Reporter: Joris Van Remoortere
>Priority: Blocker
>  Labels: libprocess, mesosphere, tests
>
> {code}
> [ RUN  ] ProcessTest.Cache
> ../../../3rdparty/libprocess/src/tests/process_tests.cpp:1726: Failure
> Value of: response.get().status
>   Actual: "200 OK"
> Expected: "304 Not Modified"
> [  FAILED  ] ProcessTest.Cache (1 ms)
> {code}
> The tests then finish running, but the gtest framework fails to terminate and 
> uses 100% CPU.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3982) Tests for Quota

2015-11-23 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-3982:
--

 Summary: Tests for Quota
 Key: MESOS-3982
 URL: https://issues.apache.org/jira/browse/MESOS-3982
 Project: Mesos
  Issue Type: Epic
  Components: test
Reporter: Alexander Rukletsov
Assignee: Alexander Rukletsov


We need quite a few tests for the quota feature. They span multiple 
subsystems:
* Request validation;
* Capacity heuristic, rescinding resources;
* Master failover and recovery;
* Registry;
* Allocator;
* Functionality and quota guarantees;
* Integration tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-708) Static files missing "Last-Modified" HTTP headers

2015-11-23 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-708:
-
Assignee: (was: Alexander Rojas)

> Static files missing "Last-Modified" HTTP headers
> -
>
> Key: MESOS-708
> URL: https://issues.apache.org/jira/browse/MESOS-708
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess, webui
>Affects Versions: 0.13.0
>Reporter: Ross Allen
>  Labels: mesosphere
>
> Static assets served by the Mesos master don't return "Last-Modified" HTTP 
> headers. That means clients receive a 200 status code and re-download assets 
> on every page request even if the assets haven't changed. Because Angular JS 
> does most of the work, the downloading happens only when you navigate to the 
> Mesos master in your browser or use the browser's refresh.
> Example header for "mesos.css":
> HTTP/1.1 200 OK
> Date: Thu, 26 Sep 2013 17:18:52 GMT
> Content-Length: 1670
> Content-Type: text/css
> Clients sometimes use the "Date" header for the same effect as 
> "Last-Modified", but the date is always the time of the response from the 
> server, i.e. it changes on every request and makes the assets look new every 
> time.
> The "Last-Modified" header should be added and should be the last modified 
> time of the file. On subsequent requests for the same files, the master 
> should return 304 responses with no content rather than 200 with the full 
> files. It could save clients a lot of download time since Mesos assets are 
> rather heavyweight.
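
A sketch of the intended request/response logic, with the HTTP machinery reduced to simple structs (the real fix would live in libprocess's static file serving):

{code}
#include <string>

struct Request { std::string ifModifiedSince; };  // "If-Modified-Since" header
struct Response
{
  int status;
  std::string lastModified;  // "Last-Modified" header
  std::string body;
};

// `mtime` is the file's last-modified time, already formatted as an
// HTTP date; `contents` is the file's data.
Response serveStatic(const Request& request,
                     const std::string& mtime,
                     const std::string& contents)
{
  Response response;
  response.lastModified = mtime;  // always advertise "Last-Modified"

  if (request.ifModifiedSince == mtime) {
    response.status = 304;  // Not Modified: empty body, client re-uses its cache
  } else {
    response.status = 200;
    response.body = contents;  // full file on a first or stale request
  }

  return response;
}
{code}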



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3718) Implement Quota support in allocator

2015-11-23 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971338#comment-14971338
 ] 

Alexander Rukletsov edited comment on MESOS-3718 at 11/23/15 10:08 AM:
---

https://reviews.apache.org/r/39399/
https://reviews.apache.org/r/39400/
https://reviews.apache.org/r/40551/


was (Author: alexr):
https://reviews.apache.org/r/39399/
https://reviews.apache.org/r/39400/
https://reviews.apache.org/r/40551/
https://reviews.apache.org/r/39450/

> Implement Quota support in allocator
> 
>
> Key: MESOS-3718
> URL: https://issues.apache.org/jira/browse/MESOS-3718
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> The built-in Hierarchical DRF allocator should support Quota. This includes 
> (but is not limited to): adding, updating, removing, and satisfying quota; 
> avoiding both overcommitting resources and handing them to non-quota'ed roles 
> in the presence of master failover.
> A [design doc for Quota support in 
> Allocator|https://issues.apache.org/jira/browse/MESOS-2937] provides an 
> overview of a feature set required to be implemented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3981) Implement recovery in the Hierarchical allocator

2015-11-23 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-3981:
--

 Summary: Implement recovery in the Hierarchical allocator
 Key: MESOS-3981
 URL: https://issues.apache.org/jira/browse/MESOS-3981
 Project: Mesos
  Issue Type: Task
  Components: allocation
Reporter: Alexander Rukletsov
Assignee: Alexander Rukletsov


The built-in Hierarchical allocator should implement recovery (in the 
presence of quota).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3720) Tests for Quota support in master

2015-11-23 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-3720:
---
Description: 
Allocator-agnostic tests for quota support in the master. They can be divided 
into several groups:
* Heuristic check;
* Master failover;
* Functionality and quota guarantees.

  was:
Allocator-agnostic tests for quota support in the master. They can be divided 
into several groups:
* Request validation;
* Satisfiability validation;
* Master failover;
* Persisting in the registry;
* Functionality and quota guarantees.


> Tests for Quota support in master
> -
>
> Key: MESOS-3720
> URL: https://issues.apache.org/jira/browse/MESOS-3720
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> Allocator-agnostic tests for quota support in the master. They can be divided 
> into several groups:
> * Heuristic check;
> * Master failover;
> * Functionality and quota guarantees.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3984) Tests for quota support in `allocate()` function.

2015-11-23 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-3984:
---
Summary: Tests for quota support in `allocate()` function.  (was: Tests for 
quota support in the hierarchical allocator)

> Tests for quota support in `allocate()` function.
> -
>
> Key: MESOS-3984
> URL: https://issues.apache.org/jira/browse/MESOS-3984
> Project: Mesos
>  Issue Type: Task
>  Components: allocation, test
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3940) /reserve and /unreserve should be permissive under a master without authentication.

2015-11-23 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022106#comment-15022106
 ] 

Guangya Liu commented on MESOS-3940:


It's a good idea. [~anandmazumdar] [~arojas], could you please give more detail 
on how MESOS-3233 works, maybe with an example? Both this issue and MESOS-3987 
could benefit from it.

> /reserve and /unreserve should be permissive under a master without 
> authentication.
> ---
>
> Key: MESOS-3940
> URL: https://issues.apache.org/jira/browse/MESOS-3940
> Project: Mesos
>  Issue Type: Bug
>Reporter: Michael Park
>Assignee: Neil Conway
>  Labels: authentication, mesosphere, reservations
>
> Currently, the {{/reserve}} and {{/unreserve}} endpoints do not work without 
> authentication enabled on the master. When authentication is disabled on the 
> master, these endpoints should just be permissive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3940) /reserve and /unreserve should be permissive under a master without authentication.

2015-11-23 Thread Alexander Rojas (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022113#comment-15022113
 ] 

Alexander Rojas commented on MESOS-3940:


Under MESOS-3233, endpoints which require authentication will provide a 
[realm|http://tools.ietf.org/html/rfc1945#section-11] when they call {{route}}. 
The handler function also differs in that it now requires an additional 
parameter of {{const Option<string>& principal}}. If authentication is 
turned off or no authenticators were set, the principal is {{None}}; otherwise 
it is, well, the principal.

The logic of what to do when no principal is provided is left to the handler 
itself. So I guess the ACL handling should itself enable the permissive 
behavior; see the sketch below.
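
A sketch of how a handler could use that parameter (the {{authorized()}} helper is hypothetical, standing in for the real ACL check):

{code}
#include <string>

#include <process/future.hpp>
#include <process/http.hpp>

#include <stout/option.hpp>

using process::Future;
using process::http::Forbidden;
using process::http::OK;
using process::http::Request;
using process::http::Response;

// Hypothetical authorization check, standing in for the ACL logic.
bool authorized(const std::string& principal) { return true; }

Future<Response> reserve(const Request& request,
                         const Option<std::string>& principal)
{
  if (principal.isNone()) {
    // Authentication is disabled or no authenticator is set:
    // be permissive, as this ticket proposes.
    return OK();
  }

  if (!authorized(principal.get())) {
    return Forbidden();
  }

  return OK();
}
{code}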

> /reserve and /unreserve should be permissive under a master without 
> authentication.
> ---
>
> Key: MESOS-3940
> URL: https://issues.apache.org/jira/browse/MESOS-3940
> Project: Mesos
>  Issue Type: Bug
>Reporter: Michael Park
>Assignee: Neil Conway
>  Labels: authentication, mesosphere, reservations
>
> Currently, the {{/reserve}} and {{/unreserve}} endpoints do not work without 
> authentication enabled on the master. When authentication is disabled on the 
> master, these endpoints should just be permissive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3969) Failing 'make distcheck' on Debian 8, somehow SSL-related.

2015-11-23 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-3969:
--
Assignee: Joseph Wu  (was: Joris Van Remoortere)

> Failing 'make distcheck' on Debian 8, somehow SSL-related.
> --
>
> Key: MESOS-3969
> URL: https://issues.apache.org/jira/browse/MESOS-3969
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
> Environment: Debian 8, gcc 4.9.2, Docker 1.9.0, vagrant, libvirt
> Vagrantfile see MESOS-3957
>Reporter: Bernd Mathiske
>Assignee: Joseph Wu
>  Labels: build, build-failure, mesosphere
>
> As non-root: make distcheck.
> {noformat}
> /bin/mkdir -p '/home/vagrant/mesos/build/mesos-0.26.0/_inst/bin'
> /bin/bash ../libtool --mode=install /usr/bin/install -c mesos-local mesos-log 
> mesos mesos-execute mesos-resolve 
> '/home/vagrant/mesos/build/mesos-0.26.0/_inst/bin'
> libtool: install: /usr/bin/install -c .libs/mesos-local 
> /home/vagrant/mesos/build/mesos-0.26.0/_inst/bin/mesos-local
> libtool: install: /usr/bin/install -c .libs/mesos-log 
> /home/vagrant/mesos/build/mesos-0.26.0/_inst/bin/mesos-log
> libtool: install: /usr/bin/install -c .libs/mesos 
> /home/vagrant/mesos/build/mesos-0.26.0/_inst/bin/mesos
> libtool: install: /usr/bin/install -c .libs/mesos-execute 
> /home/vagrant/mesos/build/mesos-0.26.0/_inst/bin/mesos-execute
> libtool: install: /usr/bin/install -c .libs/mesos-resolve 
> /home/vagrant/mesos/build/mesos-0.26.0/_inst/bin/mesos-resolve
> Traceback (most recent call last):
> File "", line 1, in 
> File 
> "/home/vagrant/mesos/build/mesos-0.26.0/build/3rdparty/pip-1.5.6/pip/__init_.py",
>  line 11, in 
> from pip.vcs import git, mercurial, subversion, bazaar # noqa
> File 
> "/home/vagrant/mesos/build/mesos-0.26.0/_build/3rdparty/pip-1.5.6/pip/vcs/mercurial.py",
>  line 9, in 
> from pip.download import path_to_url
> File 
> "/home/vagrant/mesos/build/mesos-0.26.0/_build/3rdparty/pip-1.5.6/pip/download.py",
>  line 22, in 
> from pip._vendor import requests, six
> File 
> "/home/vagrant/mesos/build/mesos-0.26.0/build/3rdparty/pip-1.5.6/pip/_vendor/requests/__init_.py",
>  line 53, in 
> from .packages.urllib3.contrib import pyopenssl
> File 
> "/home/vagrant/mesos/build/mesos-0.26.0/_build/3rdparty/pip-1.5.6/pip/_vendor/requests/packages/urllib3/contrib/pyopenssl.py",
>  line 70, in 
> ssl.PROTOCOL_SSLv3: OpenSSL.SSL.SSLv3_METHOD,
> AttributeError: 'module' object has no attribute 'PROTOCOL_SSLv3'
> Traceback (most recent call last):
> File "", line 1, in 
> File "/home/vagrant/mesos/build/mesos-0.26.0/_build/3rd
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3579) FetcherCacheTest.LocalUncachedExtract is flaky

2015-11-23 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022170#comment-15022170
 ] 

Benjamin Bannier commented on MESOS-3579:
-

Some extra code was added in mid-October to log additional information on 
the fetcher (which looks like the culprit here) in case of failure. We should 
gather more information once it fails again.

> FetcherCacheTest.LocalUncachedExtract is flaky
> --
>
> Key: MESOS-3579
> URL: https://issues.apache.org/jira/browse/MESOS-3579
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, test
>Reporter: Anand Mazumdar
>Assignee: Benjamin Bannier
>  Labels: flaky-test, mesosphere
>
> From ASF CI:
> https://builds.apache.org/job/Mesos/866/COMPILER=clang,CONFIGURATION=--verbose,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/console
> {code}
> [ RUN  ] FetcherCacheTest.LocalUncachedExtract
> Using temporary directory '/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA'
> I0925 19:15:39.541198 27410 leveldb.cpp:176] Opened db in 3.43934ms
> I0925 19:15:39.542362 27410 leveldb.cpp:183] Compacted db in 1.136184ms
> I0925 19:15:39.542428 27410 leveldb.cpp:198] Created db iterator in 35866ns
> I0925 19:15:39.542448 27410 leveldb.cpp:204] Seeked to beginning of db in 
> 8807ns
> I0925 19:15:39.542459 27410 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 6325ns
> I0925 19:15:39.542505 27410 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0925 19:15:39.543143 27438 recover.cpp:449] Starting replica recovery
> I0925 19:15:39.543393 27438 recover.cpp:475] Replica is in EMPTY status
> I0925 19:15:39.544373 27436 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0925 19:15:39.544791 27433 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0925 19:15:39.545284 27433 recover.cpp:566] Updating replica status to 
> STARTING
> I0925 19:15:39.546155 27436 master.cpp:376] Master 
> c8bf1c95-50f4-4832-a570-c560f0b466ae (f57fd4291168) started on 
> 172.17.1.195:41781
> I0925 19:15:39.546257 27433 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 747249ns
> I0925 19:15:39.546288 27433 replica.cpp:323] Persisted replica status to 
> STARTING
> I0925 19:15:39.546483 27434 recover.cpp:475] Replica is in STARTING status
> I0925 19:15:39.546187 27436 master.cpp:378] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" 
> --credentials="/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.26.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/master" 
> --zk_session_timeout="10secs"
> I0925 19:15:39.546567 27436 master.cpp:423] Master only allowing 
> authenticated frameworks to register
> I0925 19:15:39.546617 27436 master.cpp:428] Master only allowing 
> authenticated slaves to register
> I0925 19:15:39.546632 27436 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/credentials'
> I0925 19:15:39.546931 27436 master.cpp:467] Using default 'crammd5' 
> authenticator
> I0925 19:15:39.547044 27436 master.cpp:504] Authorization enabled
> I0925 19:15:39.547276 27441 whitelist_watcher.cpp:79] No whitelist given
> I0925 19:15:39.547320 27434 hierarchical.hpp:468] Initialized hierarchical 
> allocator process
> I0925 19:15:39.547471 27438 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0925 19:15:39.548318 27443 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0925 19:15:39.549067 27435 recover.cpp:566] Updating replica status to VOTING
> I0925 19:15:39.549115 27440 master.cpp:1603] The newly elected leader is 
> master@172.17.1.195:41781 with id c8bf1c95-50f4-4832-a570-c560f0b466ae
> I0925 19:15:39.549162 27440 master.cpp:1616] Elected as the leading master!
> I0925 19:15:39.549190 27440 master.cpp:1376] Recovering from registrar
> I0925 19:15:39.549342 27434 registrar.cpp:309] Recovering registrar
> I0925 19:15:39.549666 27430 leveldb.cpp:306] 

[jira] [Updated] (MESOS-3988) Implicit roles

2015-11-23 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-3988:
---
  Sprint:   (was: Mesosphere Sprint 23)
Story Points:   (was: 8)
   Epic Name: Implicit Roles

> Implicit roles
> --
>
> Key: MESOS-3988
> URL: https://issues.apache.org/jira/browse/MESOS-3988
> Project: Mesos
>  Issue Type: Epic
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere, roles
>
> At present, Mesos uses a static list of roles that are configured when the 
> master starts up. This places some severe limitations on how roles can be 
> used (e.g., changing the set of roles requires restarting all the masters).
> As an alternative (or a precursor) to implementing full-blown dynamic roles, 
> we could instead relax the concept of roles, so that:
> * frameworks can register with any role (subject to ACLs/authz)
> * reservations can be made for any role
> Open questions, at least to me:
> * This would mean weights cannot be configured dynamically. Is that okay?
> * Is this feature useful enough without dynamic ACL changes?
> * If we implement this (+ dynamic ACLs), do we also need dynamic roles?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3939) ubsan error in net::IP::create(sockaddr const&): misaligned address

2015-11-23 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-3939:
---
Shepherd: Joris Van Remoortere

> ubsan error in net::IP::create(sockaddr const&): misaligned address
> ---
>
> Key: MESOS-3939
> URL: https://issues.apache.org/jira/browse/MESOS-3939
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Minor
>  Labels: mesosphere, ubsan
>
> Running ubsan from GCC 5.2 on the current Mesos unit tests yields this, among 
> other problems:
> {noformat}
> /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/ip.hpp:230:56: 
> runtime error: reference binding to misaligned address 0x0199629c for 
> type 'const struct sockaddr_storage', which requires 8 byte alignment
> 0x0199629c: note: pointer points here
>   00 00 00 00 02 00 00 00  ff ff ff 00 00 00 00 00  00 00 00 00 00 00 00 00  
> 00 00 00 00 00 00 00 00
>   ^
> #0 0x5950cb in net::IP::create(sockaddr const&) 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x5950cb)
> #1 0x5970cd in 
> net::IPNetwork::fromLinkDevice(std::__cxx11::basic_string std::char_traits, std::allocator > const&, int) 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x5970cd)
> #2 0x58e006 in NetTest_LinkDevice_Test::TestBody() 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x58e006)
> #3 0x85abd5 in void 
> testing::internal::HandleSehExceptionsInMethodIfSupported void>(testing::Test*, void (testing::Test::*)(), char const*) 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x85abd5)
> #4 0x848abc in void 
> testing::internal::HandleExceptionsInMethodIfSupported void>(testing::Test*, void (testing::Test::*)(), char const*) 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x848abc)
> #5 0x7e2755 in testing::Test::Run() 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x7e2755)
> #6 0x7e44a0 in testing::TestInfo::Run() 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x7e44a0)
> #7 0x7e5ffa in testing::TestCase::Run() 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x7e5ffa)
> #8 0x7ffe21 in testing::internal::UnitTestImpl::RunAllTests() 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x7ffe21)
> #9 0x85d7a5 in bool 
> testing::internal::HandleSehExceptionsInMethodIfSupported  bool>(testing::internal::UnitTestImpl*, bool 
> (testing::internal::UnitTestImpl::*)(), char const*) 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x85d7a5)
> #10 0x84b37a in bool 
> testing::internal::HandleExceptionsInMethodIfSupported  bool>(testing::internal::UnitTestImpl*, bool 
> (testing::internal::UnitTestImpl::*)(), char const*) 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x84b37a)
> #11 0x7f8a4a in testing::UnitTest::Run() 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x7f8a4a)
> #12 0x608a96 in RUN_ALL_TESTS() 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x608a96)
> #13 0x60896b in main 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x60896b)
> #14 0x7fd0f0c7fa3f in __libc_start_main 
> (/lib/x86_64-linux-gnu/libc.so.6+0x20a3f)
> #15 0x4145c8 in _start 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x4145c8)
> {noformat}
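
A distilled, stand-alone reproduction of the pattern being flagged (not the actual stout code):

{code}
#include <sys/socket.h>

// Binding a sockaddr_storage reference (8-byte alignment) to memory
// that is only guaranteed the alignment of a smaller view is exactly
// the "reference binding to misaligned address" ubsan reports.
const sockaddr_storage& asStorage(const sockaddr& addr)
{
  return reinterpret_cast<const sockaddr_storage&>(addr);
}

int main()
{
  char buffer[sizeof(sockaddr_storage) + 4] = {};

  // An offset into a char buffer is typically only 4-byte aligned here.
  const sockaddr* addr = reinterpret_cast<const sockaddr*>(buffer + 4);

  const sockaddr_storage& storage = asStorage(*addr);
  (void)storage;  // ubsan flags the reference binding above at runtime

  return 0;
}
{code}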



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-3851) Investigate recent crashes in Command Executor

2015-11-23 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio reassigned MESOS-3851:
--

Assignee: Anand Mazumdar  (was: Benjamin Mahler)

> Investigate recent crashes in Command Executor
> --
>
> Key: MESOS-3851
> URL: https://issues.apache.org/jira/browse/MESOS-3851
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere
>
> Since https://reviews.apache.org/r/38900, i.e. updating the CommandExecutor to 
> support rootfs, some tests have been showing frequent crashes due to 
> assertion violations.
> {{FetcherCacheTest.SimpleEviction}} failed due to the following log:
> {code}
> I1107 19:36:46.360908 30657 slave.cpp:1793] Sending queued task '3' to 
> executor ''3' of framework 7d94c7fb-8950-4bcf-80c1-46112292dcd6- at 
> executor(1)@172.17.5.200:33871'
> I1107 19:36:46.363682  1236 exec.cpp:297] 
> I1107 19:36:46.373569  1245 exec.cpp:210] Executor registered on slave 
> 7d94c7fb-8950-4bcf-80c1-46112292dcd6-S0
> @ 0x7f9f5a7db3fa  google::LogMessage::Fail()
> I1107 19:36:46.394081  1245 exec.cpp:222] Executor::registered took 395411ns
> @ 0x7f9f5a7db359  google::LogMessage::SendToLog()
> @ 0x7f9f5a7dad6a  google::LogMessage::Flush()
> @ 0x7f9f5a7dda9e  google::LogMessageFatal::~LogMessageFatal()
> @   0x48d00a  _CheckFatal::~_CheckFatal()
> @   0x49c99d  
> mesos::internal::CommandExecutorProcess::launchTask()
> @   0x4b3dd7  
> _ZZN7process8dispatchIN5mesos8internal22CommandExecutorProcessEPNS1_14ExecutorDriverERKNS1_8TaskInfoES5_S6_EEvRKNS_3PIDIT_EEMSA_FvT0_T1_ET2_T3_ENKUlPNS_11ProcessBaseEE_clESL_
> @   0x4c470c  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal22CommandExecutorProcessEPNS5_14ExecutorDriverERKNS5_8TaskInfoES9_SA_EEvRKNS0_3PIDIT_EEMSE_FvT0_T1_ET2_T3_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x7f9f5a761b1b  std::function<>::operator()()
> @ 0x7f9f5a749935  process::ProcessBase::visit()
> @ 0x7f9f5a74d700  process::DispatchEvent::visit()
> @   0x48e004  process::ProcessBase::serve()
> @ 0x7f9f5a745d21  process::ProcessManager::resume()
> @ 0x7f9f5a742f52  
> _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
> @ 0x7f9f5a74cf2c  
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
> @ 0x7f9f5a74cedc  
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
> @ 0x7f9f5a74ce6e  
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
> @ 0x7f9f5a74cdc5  
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
> @ 0x7f9f5a74cd5e  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
> @ 0x7f9f5624f1e0  (unknown)
> @ 0x7f9f564a8df5  start_thread
> @ 0x7f9f559b71ad  __clone
> I1107 19:36:46.551370 30656 containerizer.cpp:1257] Executor for container 
> '6553a617-6b4a-418d-9759-5681f45ff854' has exited
> I1107 19:36:46.551429 30656 containerizer.cpp:1074] Destroying container 
> '6553a617-6b4a-418d-9759-5681f45ff854'
> I1107 19:36:46.553869 30656 containerizer.cpp:1257] Executor for container 
> 'd2c1f924-c92a-453e-82b1-c294d09c4873' has exited
> {code}
> The reason seems to be a race in which the executor receives a 
> {{RunTaskMessage}} before the {{ExecutorRegisteredMessage}}, leading to the 
> {{CHECK_SOME(executorInfo)}} failure.
> Link to complete log: 
> https://issues.apache.org/jira/browse/MESOS-2831?focusedCommentId=14995535=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14995535
> Another related failure from {{ExamplesTest.PersistentVolumeFramework}}
> {code}
> @ 0x7f4f71529cbd  google::LogMessage::SendToLog()
> I1107 13:15:09.949987 31573 slave.cpp:2337] Status update manager 
> successfully handled status update acknowledgement (UUID: 
> 721c7316-5580-4636-a83a-098e3bd4ed1f) for task 
> ad90531f-d3d8-43f6-96f2-c81c4548a12d of framework 
> ac4ea54a-7d19-4e41-9ee3-1a761f8e5b0f-
> @ 0x7f4f715296ce  google::LogMessage::Flush()
> @ 0x7f4f7152c402  google::LogMessageFatal::~LogMessageFatal()
> @   0x48d00a  

[jira] [Created] (MESOS-4000) Implicit roles: Design Doc

2015-11-23 Thread Neil Conway (JIRA)
Neil Conway created MESOS-4000:
--

 Summary: Implicit roles: Design Doc
 Key: MESOS-4000
 URL: https://issues.apache.org/jira/browse/MESOS-4000
 Project: Mesos
  Issue Type: Task
  Components: master
Reporter: Neil Conway






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3975) SSL build of mesos causes flaky testsuite.

2015-11-23 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023229#comment-15023229
 ] 

Joris Van Remoortere commented on MESOS-3975:
-

[~tillt] Are they reproducible if you run the whole test suite, or not even 
then?
It would be great to at least rule out that these are SSL-specific, rather 
than that they happened to fail once while we happened to have SSL enabled.

> SSL build of mesos causes flaky testsuite.
> --
>
> Key: MESOS-3975
> URL: https://issues.apache.org/jira/browse/MESOS-3975
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
> Environment: CentOS 7.1, Kernel 3.10.0-229.20.1.el7.x86_64, gcc 
> 4.8.3, Docker 1.9
>Reporter: Till Toenshoff
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
>
> When running the tests of an SSL build of Mesos on CentOS 7.1, I see spurious 
> test failures that are, so far, not reproducible.
> The following tests failed for me in complete runs but seemed fine when 
> run individually and repeatedly.
> {noformat}
> DockerTest.ROOT_DOCKER_CheckPortResource
> {noformat}
> {noformat}
> ContainerizerTest.ROOT_CGROUPS_BalloonFramework
> {noformat}
> {noformat}
> [ RUN  ] 
> LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystemCommandExecutor
> 2015-11-20 
> 19:08:38,826:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> + /home/vagrant/mesos/build/src/mesos-containerizer mount --help=false 
> --operation=make-rslave --path=/
> + grep -E 
> /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/.+
>  /proc/self/mountinfo
> + grep -v 2b98025c-74f1-41d2-b35a-ce2cdfae347e
> + cut '-d ' -f5
> + xargs --no-run-if-empty umount -l
> + mount -n --rbind 
> /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/provisioner/containers/2b98025c-74f1-41d2-b35a-ce2cdfae347e/backends/copy/rootfses/bed11080-474b-4c69-8e7f-0ab85e895b0d
>  
> /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/slaves/830e842e-c36a-4e4c-bff4-5b9568d7df12-S0/frameworks/830e842e-c36a-4e4c-bff4-5b9568d7df12-/executors/c735be54-c47f-4645-bfc1-2f4647e2cddb/runs/2b98025c-74f1-41d2-b35a-ce2cdfae347e/.rootfs
> Could not load cert file
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:354: Failure
> Value of: statusRunning.get().state()
>   Actual: TASK_FAILED
> Expected: TASK_RUNNING
> 2015-11-20 
> 19:08:42,164:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2015-11-20 
> 19:08:45,501:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2015-11-20 
> 19:08:48,837:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2015-11-20 
> 19:08:52,174:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:355: Failure
> Failed to wait 15secs for statusFinished
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:349: Failure
> Actual function call count doesn't match EXPECT_CALL(sched, 
> statusUpdate(&driver, _))...
>  Expected: to be called twice
>Actual: called once - unsatisfied and active
> 2015-11-20 
> 19:08:55,511:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> *** Aborted at 1448046536 (unix time) try "date -d @1448046536" if you are 
> using GNU date ***
> PC: @0x0 (unknown)
> *** SIGSEGV (@0x0) received by PID 21380 (TID 0x7fa1549e68c0) from PID 0; 
> stack trace: ***
> @ 0x7fa141796fbb (unknown)
> @ 0x7fa14179b341 (unknown)
> @ 0x7fa14f096130 (unknown)
> {noformat}
> Vagrantfile generator:
> {noformat}
> cat << EOF > Vagrantfile
> # -*- mode: ruby -*-" >
> # vi: set ft=ruby :
> Vagrant.configure(2) do |config|
>   # Disable shared folder to prevent certain kernel module dependencies.
>   config.vm.synced_folder ".", "/vagrant", disabled: true
>   config.vm.hostname = "centos71"
>   config.vm.box = "bento/centos-7.1"
>   config.vm.provider "virtualbox" do |vb|
> vb.memory = 16384
> vb.cpus = 8
>   end
>   config.vm.provider "vmware_fusion" do |vb|
> vb.memory = 9216

[jira] [Assigned] (MESOS-2980) Support execution configuration to be returned from provisioner

2015-11-23 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen reassigned MESOS-2980:
---

Assignee: Timothy Chen

> Support execution configuration to be returned from provisioner
> ---
>
> Key: MESOS-2980
> URL: https://issues.apache.org/jira/browse/MESOS-2980
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: mesosphere
>
> Image specs also include execution configuration (e.g., env, user, ports, 
> etc.).
> We should support passing that information from the image provisioner back 
> to the containerizer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3939) ubsan error in net::IP::create(sockaddr const&): misaligned address

2015-11-23 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-3939:
---
Sprint:   (was: Mesosphere Sprint 23)

> ubsan error in net::IP::create(sockaddr const&): misaligned address
> ---
>
> Key: MESOS-3939
> URL: https://issues.apache.org/jira/browse/MESOS-3939
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Minor
>  Labels: mesosphere, ubsan
>
> Running ubsan from GCC 5.2 on the current Mesos unit tests yields this, among 
> other problems:
> {noformat}
> /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/ip.hpp:230:56: 
> runtime error: reference binding to misaligned address 0x0199629c for 
> type 'const struct sockaddr_storage', which requires 8 byte alignment
> 0x0199629c: note: pointer points here
>   00 00 00 00 02 00 00 00  ff ff ff 00 00 00 00 00  00 00 00 00 00 00 00 00  
> 00 00 00 00 00 00 00 00
>   ^
> #0 0x5950cb in net::IP::create(sockaddr const&) 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x5950cb)
> #1 0x5970cd in 
> net::IPNetwork::fromLinkDevice(std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, int) 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x5970cd)
> #2 0x58e006 in NetTest_LinkDevice_Test::TestBody() 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x58e006)
> #3 0x85abd5 in void 
> testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, 
> void>(testing::Test*, void (testing::Test::*)(), char const*) 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x85abd5)
> #4 0x848abc in void 
> testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, 
> void>(testing::Test*, void (testing::Test::*)(), char const*) 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x848abc)
> #5 0x7e2755 in testing::Test::Run() 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x7e2755)
> #6 0x7e44a0 in testing::TestInfo::Run() 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x7e44a0)
> #7 0x7e5ffa in testing::TestCase::Run() 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x7e5ffa)
> #8 0x7ffe21 in testing::internal::UnitTestImpl::RunAllTests() 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x7ffe21)
> #9 0x85d7a5 in bool 
> testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, 
> bool>(testing::internal::UnitTestImpl*, bool 
> (testing::internal::UnitTestImpl::*)(), char const*) 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x85d7a5)
> #10 0x84b37a in bool 
> testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, 
> bool>(testing::internal::UnitTestImpl*, bool 
> (testing::internal::UnitTestImpl::*)(), char const*) 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x84b37a)
> #11 0x7f8a4a in testing::UnitTest::Run() 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x7f8a4a)
> #12 0x608a96 in RUN_ALL_TESTS() 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x608a96)
> #13 0x60896b in main 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x60896b)
> #14 0x7fd0f0c7fa3f in __libc_start_main 
> (/lib/x86_64-linux-gnu/libc.so.6+0x20a3f)
> #15 0x4145c8 in _start 
> (/home/vagrant/build-mesos-ubsan/3rdparty/libprocess/3rdparty/stout-tests+0x4145c8)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
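
For context, a UBSan build along the lines described above can be produced 
like this (a sketch; the exact compiler version and flags the reporter used 
may differ):
{noformat}
# Build and run the stout tests with GCC's undefined-behavior sanitizer:
../configure CC=gcc-5 CXX=g++-5 \
  CFLAGS="-fsanitize=undefined" CXXFLAGS="-fsanitize=undefined"
make check
{noformat}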


[jira] [Updated] (MESOS-3964) LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs and LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs_Big_Quota fail on Debian 8.

2015-11-23 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-3964:
-
Shepherd: Timothy Chen
Priority: Blocker  (was: Major)

> LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs and 
> LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs_Big_Quota fail on Debian 8.
> ---
>
> Key: MESOS-3964
> URL: https://issues.apache.org/jira/browse/MESOS-3964
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, test
>Affects Versions: 0.26.0
> Environment: Debian 8, gcc 4.9.2, Docker 1.9.0, vagrant, libvirt
> Vagrantfile: see MESOS-3957
>Reporter: Bernd Mathiske
>Assignee: Greg Mann
>Priority: Blocker
>  Labels: mesosphere
>
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs"
> {noformat}
> ...
> F1119 14:34:52.514742 30706 isolator_tests.cpp:455] CHECK_SOME(isolator): 
> Failed to find 'cpu.cfs_quota_us'. Your kernel might be too old to use the 
> CFS cgroups feature.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
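
The failing check above looks for {{cpu.cfs_quota_us}}. A quick way to verify 
CFS bandwidth support on a given box (a sketch; cgroup mount points vary by 
distro):
{noformat}
# Is CFS bandwidth control compiled into the kernel?
grep CONFIG_CFS_BANDWIDTH /boot/config-$(uname -r)

# Is the quota knob actually exposed by the cpu cgroup hierarchy?
ls /sys/fs/cgroup/cpu/cpu.cfs_quota_us
{noformat}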


[jira] [Updated] (MESOS-3964) LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs and LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs_Big_Quota fail on Debian 8.

2015-11-23 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-3964:
---
Story Points: 2

> LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs and 
> LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs_Big_Quota fail on Debian 8.
> ---
>
> Key: MESOS-3964
> URL: https://issues.apache.org/jira/browse/MESOS-3964
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, test
>Affects Versions: 0.26.0
> Environment: Debian 8, gcc 4.9.2, Docker 1.9.0, vagrant, libvirt
> Vagrantfile: see MESOS-3957
>Reporter: Bernd Mathiske
>Assignee: Greg Mann
>  Labels: mesosphere
>
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs"
> {noformat}
> ...
> F1119 14:34:52.514742 30706 isolator_tests.cpp:455] CHECK_SOME(isolator): 
> Failed to find 'cpu.cfs_quota_us'. Your kernel might be too old to use the 
> CFS cgroups feature.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3969) Failing 'make distcheck' on Debian 8, somehow SSL-related.

2015-11-23 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-3969:
---
Story Points: 3

> Failing 'make distcheck' on Debian 8, somehow SSL-related.
> --
>
> Key: MESOS-3969
> URL: https://issues.apache.org/jira/browse/MESOS-3969
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
> Environment: Debian 8, gcc 4.9.2, Docker 1.9.0, vagrant, libvirt
> Vagrantfile see MESOS-3957
>Reporter: Bernd Mathiske
>Assignee: Joseph Wu
>  Labels: build, build-failure, mesosphere
>
> As non-root: make distcheck.
> {noformat}
> /bin/mkdir -p '/home/vagrant/mesos/build/mesos-0.26.0/_inst/bin'
> /bin/bash ../libtool --mode=install /usr/bin/install -c mesos-local mesos-log 
> mesos mesos-execute mesos-resolve 
> '/home/vagrant/mesos/build/mesos-0.26.0/_inst/bin'
> libtool: install: /usr/bin/install -c .libs/mesos-local 
> /home/vagrant/mesos/build/mesos-0.26.0/_inst/bin/mesos-local
> libtool: install: /usr/bin/install -c .libs/mesos-log 
> /home/vagrant/mesos/build/mesos-0.26.0/_inst/bin/mesos-log
> libtool: install: /usr/bin/install -c .libs/mesos 
> /home/vagrant/mesos/build/mesos-0.26.0/_inst/bin/mesos
> libtool: install: /usr/bin/install -c .libs/mesos-execute 
> /home/vagrant/mesos/build/mesos-0.26.0/_inst/bin/mesos-execute
> libtool: install: /usr/bin/install -c .libs/mesos-resolve 
> /home/vagrant/mesos/build/mesos-0.26.0/_inst/bin/mesos-resolve
> Traceback (most recent call last):
> File "<string>", line 1, in <module>
> File 
> "/home/vagrant/mesos/build/mesos-0.26.0/_build/3rdparty/pip-1.5.6/pip/__init__.py",
>  line 11, in <module>
> from pip.vcs import git, mercurial, subversion, bazaar # noqa
> File 
> "/home/vagrant/mesos/build/mesos-0.26.0/_build/3rdparty/pip-1.5.6/pip/vcs/mercurial.py",
>  line 9, in <module>
> from pip.download import path_to_url
> File 
> "/home/vagrant/mesos/build/mesos-0.26.0/_build/3rdparty/pip-1.5.6/pip/download.py",
>  line 22, in <module>
> from pip._vendor import requests, six
> File 
> "/home/vagrant/mesos/build/mesos-0.26.0/_build/3rdparty/pip-1.5.6/pip/_vendor/requests/__init__.py",
>  line 53, in <module>
> from .packages.urllib3.contrib import pyopenssl
> File 
> "/home/vagrant/mesos/build/mesos-0.26.0/_build/3rdparty/pip-1.5.6/pip/_vendor/requests/packages/urllib3/contrib/pyopenssl.py",
>  line 70, in <module>
> ssl.PROTOCOL_SSLv3: OpenSSL.SSL.SSLv3_METHOD,
> AttributeError: 'module' object has no attribute 'PROTOCOL_SSLv3'
> Traceback (most recent call last):
> File "<string>", line 1, in <module>
> File "/home/vagrant/mesos/build/mesos-0.26.0/_build/3rd
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
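
The {{AttributeError}} above comes from pip's vendored pyopenssl shim assuming 
an {{ssl}} module that still exposes SSLv3, which newer OpenSSL builds drop. A 
one-line diagnostic (a sketch):
{noformat}
# Prints False on Python builds whose OpenSSL was compiled without SSLv3,
# which is exactly the configuration that trips the vendored module here.
python -c "import ssl; print(hasattr(ssl, 'PROTOCOL_SSLv3'))"
{noformat}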


[jira] [Updated] (MESOS-3973) Failing 'make distcheck' on Mac OS X 10.10.5, also 10.11.

2015-11-23 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-3973:
---
Story Points: 2

> Failing 'make distcheck' on Mac OS X 10.10.5, also 10.11.
> -
>
> Key: MESOS-3973
> URL: https://issues.apache.org/jira/browse/MESOS-3973
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.26.0
> Environment: Mac OS X 10.10.5, Clang 7.0.0.
>Reporter: Bernd Mathiske
>Assignee: Gilbert Song
>  Labels: build, build-failure, mesosphere
>
> Non-root 'make distcheck'.
> {noformat}
> ...
> [--] Global test environment tear-down
> [==] 826 tests from 113 test cases ran. (276624 ms total)
> [  PASSED  ] 826 tests.
>   YOU HAVE 6 DISABLED TESTS
> Making install in .
> make[3]: Nothing to be done for `install-exec-am'.
>  ../install-sh -c -d 
> '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/lib/pkgconfig'
>  /usr/bin/install -c -m 644 mesos.pc 
> '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/lib/pkgconfig'
> Making install in 3rdparty
> /Applications/Xcode.app/Contents/Developer/usr/bin/make  install-recursive
> Making install in libprocess
> Making install in 3rdparty
> /Applications/Xcode.app/Contents/Developer/usr/bin/make  install-recursive
> Making install in stout
> Making install in .
> make[9]: Nothing to be done for `install-exec-am'.
> make[9]: Nothing to be done for `install-data-am'.
> Making install in include
> make[9]: Nothing to be done for `install-exec-am'.
>  ../../../../../../3rdparty/libprocess/3rdparty/stout/install-sh -c -d 
> '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include'
>  ../../../../../../3rdparty/libprocess/3rdparty/stout/install-sh -c -d 
> '/Users/bernd/mesos/mesos/build/mesos-0.26.0/_inst/include/stout'
>  /usr/bin/install -c -m 644  
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/abort.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/attributes.hpp
>  
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/base64.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/bits.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/bytes.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/cache.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/check.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/duration.hpp
>  
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/dynamiclibrary.hpp
>  ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/error.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/exit.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/flags.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/foreach.hpp
>  
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/format.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/fs.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/gtest.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/gzip.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/hashmap.hpp
>  
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/hashset.hpp
>  
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/interval.hpp
>  ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/ip.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/json.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/lambda.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/linkedhashmap.hpp
>  ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/list.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/mac.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/multihashmap.hpp
>  
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/multimap.hpp
>  ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/net.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/none.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/nothing.hpp
>  
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/numify.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/path.hpp 
> ../../../../../../3rdparty/libprocess/3rdparty/stout/include/stout/preprocessor.hpp
>  

[jira] [Updated] (MESOS-3988) Implicit roles

2015-11-23 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-3988:
---
  Assignee: Neil Conway
Issue Type: Epic  (was: Improvement)

> Implicit roles
> --
>
> Key: MESOS-3988
> URL: https://issues.apache.org/jira/browse/MESOS-3988
> Project: Mesos
>  Issue Type: Epic
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere, roles
>
> At present, Mesos uses a static list of roles that are configured when the 
> master starts up. This places some severe limitations on how roles can be 
> used (e.g., changing the set of roles requires restarting all the masters).
> As an alternative (or a precursor) to implementing full-blown dynamic roles, 
> we could instead relax the concept of roles, so that:
> * frameworks can register with any role (subject to ACLs/authz)
> * reservations can be made for any role
> Open questions, at least to me:
> * This would mean weights cannot be configured dynamically. Is that okay?
> * Is this feature useful enough without dynamic ACL changes?
> * If we implement this (+ dynamic ACLs), do we also need dynamic roles?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3975) SSL build of mesos causes flaky testsuite.

2015-11-23 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023253#comment-15023253
 ] 

Joseph Wu commented on MESOS-3975:
--

It might also be worthwhile to check if the tests fail without {{--enable-ssl}}.
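
For instance (a sketch; the flags mirror the reported environment, minus SSL):
{noformat}
# Rebuild without SSL support and re-run the same tests for comparison:
../configure            # i.e. omit --enable-ssl (and --enable-libevent)
make check
sudo ./bin/mesos-tests.sh --gtest_filter="DockerTest.ROOT_DOCKER_CheckPortResource"
{noformat}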

> SSL build of mesos causes flaky testsuite.
> --
>
> Key: MESOS-3975
> URL: https://issues.apache.org/jira/browse/MESOS-3975
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
> Environment: CentOS 7.1, Kernel 3.10.0-229.20.1.el7.x86_64, gcc 
> 4.8.3, Docker 1.9
>Reporter: Till Toenshoff
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
>
> When running the tests of an SSL build of Mesos on CentOS 7.1, I see spurious 
> test failures that are, so far, not reproducible.
> The following tests did fail for me in complete runs but did seem fine when 
> running them individually, in repetition.  
> {noformat}
> DockerTest.ROOT_DOCKER_CheckPortResource
> {noformat}
> {noformat}
> ContainerizerTest.ROOT_CGROUPS_BalloonFramework
> {noformat}
> {noformat}
> [ RUN  ] 
> LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystemCommandExecutor
> 2015-11-20 
> 19:08:38,826:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> + /home/vagrant/mesos/build/src/mesos-containerizer mount --help=false 
> --operation=make-rslave --path=/
> + grep -E 
> /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/.+
>  /proc/self/mountinfo
> + grep -v 2b98025c-74f1-41d2-b35a-ce2cdfae347e
> + cut '-d ' -f5
> + xargs --no-run-if-empty umount -l
> + mount -n --rbind 
> /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/provisioner/containers/2b98025c-74f1-41d2-b35a-ce2cdfae347e/backends/copy/rootfses/bed11080-474b-4c69-8e7f-0ab85e895b0d
>  
> /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/slaves/830e842e-c36a-4e4c-bff4-5b9568d7df12-S0/frameworks/830e842e-c36a-4e4c-bff4-5b9568d7df12-/executors/c735be54-c47f-4645-bfc1-2f4647e2cddb/runs/2b98025c-74f1-41d2-b35a-ce2cdfae347e/.rootfs
> Could not load cert file
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:354: Failure
> Value of: statusRunning.get().state()
>   Actual: TASK_FAILED
> Expected: TASK_RUNNING
> 2015-11-20 
> 19:08:42,164:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2015-11-20 
> 19:08:45,501:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2015-11-20 
> 19:08:48,837:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2015-11-20 
> 19:08:52,174:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:355: Failure
> Failed to wait 15secs for statusFinished
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:349: Failure
> Actual function call count doesn't match EXPECT_CALL(sched, 
> statusUpdate(&driver, _))...
>  Expected: to be called twice
>Actual: called once - unsatisfied and active
> 2015-11-20 
> 19:08:55,511:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> *** Aborted at 1448046536 (unix time) try "date -d @1448046536" if you are 
> using GNU date ***
> PC: @0x0 (unknown)
> *** SIGSEGV (@0x0) received by PID 21380 (TID 0x7fa1549e68c0) from PID 0; 
> stack trace: ***
> @ 0x7fa141796fbb (unknown)
> @ 0x7fa14179b341 (unknown)
> @ 0x7fa14f096130 (unknown)
> {noformat}
> Vagrantfile generator:
> {noformat}
> cat << EOF > Vagrantfile
> # -*- mode: ruby -*-" >
> # vi: set ft=ruby :
> Vagrant.configure(2) do |config|
>   # Disable shared folder to prevent certain kernel module dependencies.
>   config.vm.synced_folder ".", "/vagrant", disabled: true
>   config.vm.hostname = "centos71"
>   config.vm.box = "bento/centos-7.1"
>   config.vm.provider "virtualbox" do |vb|
> vb.memory = 16384
> vb.cpus = 8
>   end
>   config.vm.provider "vmware_fusion" do |vb|
> vb.memory = 9216
> vb.cpus = 4
>   end
>   config.vm.provision "shell", inline: <<-SHELL
>  sudo yum -y update systemd
>  sudo yum install -y tar wget
>  sudo wget 
> 

[jira] [Commented] (MESOS-3994) Refactor registry client/puller to avoid JSON and struct

2015-11-23 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023262#comment-15023262
 ] 

Gilbert Song commented on MESOS-3994:
-

https://reviews.apache.org/r/39712/

> Refactor registry client/puller to avoid JSON and struct
> 
>
> Key: MESOS-3994
> URL: https://issues.apache.org/jira/browse/MESOS-3994
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>
> We should get rid of JSON objects and ad-hoc structs used as function return 
> types for message passing, using the methods provided by spec.hpp to replace 
> all unnecessary JSON messages and structs in the registry client and registry 
> puller. We should also remove all redundant checks in the registry client 
> that are already covered by spec validation. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3916) MasterMaintenanceTest.InverseOffersFilters is flaky

2015-11-23 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-3916:
---
Story Points: 3

> MasterMaintenanceTest.InverseOffersFilters is flaky
> ---
>
> Key: MESOS-3916
> URL: https://issues.apache.org/jira/browse/MESOS-3916
> Project: Mesos
>  Issue Type: Bug
> Environment: Ubuntu Wily 64 bit
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: flaky-test, maintenance, mesosphere
> Attachments: wily_maintenance_test_verbose.txt
>
>
> Verbose Logs:
> {code}
> [ RUN  ] MasterMaintenanceTest.InverseOffersFilters
> I1113 16:43:58.486469  8728 leveldb.cpp:176] Opened db in 2.360405ms
> I1113 16:43:58.486935  8728 leveldb.cpp:183] Compacted db in 407105ns
> I1113 16:43:58.486995  8728 leveldb.cpp:198] Created db iterator in 16221ns
> I1113 16:43:58.487030  8728 leveldb.cpp:204] Seeked to beginning of db in 
> 10935ns
> I1113 16:43:58.487046  8728 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 999ns
> I1113 16:43:58.487090  8728 replica.cpp:780] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1113 16:43:58.487735  8747 recover.cpp:449] Starting replica recovery
> I1113 16:43:58.488047  8747 recover.cpp:475] Replica is in EMPTY status
> I1113 16:43:58.488977  8745 replica.cpp:676] Replica in EMPTY status received 
> a broadcasted recover request from (58)@10.0.2.15:45384
> I1113 16:43:58.489452  8746 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1113 16:43:58.489712  8747 recover.cpp:566] Updating replica status to 
> STARTING
> I1113 16:43:58.490706  8742 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 745443ns
> I1113 16:43:58.490739  8742 replica.cpp:323] Persisted replica status to 
> STARTING
> I1113 16:43:58.490859  8742 recover.cpp:475] Replica is in STARTING status
> I1113 16:43:58.491786  8747 replica.cpp:676] Replica in STARTING status 
> received a broadcasted recover request from (59)@10.0.2.15:45384
> I1113 16:43:58.492542  8749 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1113 16:43:58.493221  8743 recover.cpp:566] Updating replica status to VOTING
> I1113 16:43:58.493710  8743 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 331874ns
> I1113 16:43:58.493767  8743 replica.cpp:323] Persisted replica status to 
> VOTING
> I1113 16:43:58.493868  8743 recover.cpp:580] Successfully joined the Paxos 
> group
> I1113 16:43:58.494119  8743 recover.cpp:464] Recover process terminated
> I1113 16:43:58.504369  8749 master.cpp:367] Master 
> d59449fc-5462-43c5-b935-e05563fdd4b6 (vagrant-ubuntu-wily-64) started on 
> 10.0.2.15:45384
> I1113 16:43:58.504438  8749 master.cpp:369] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="false" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/ZB7csS/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_slave_ping_timeouts="5" --quiet="false" 
> --recovery_slave_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="25secs" 
> --registry_strict="true" --root_submissions="true" 
> --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/ZB7csS/master" 
> --zk_session_timeout="10secs"
> I1113 16:43:58.504717  8749 master.cpp:416] Master allowing unauthenticated 
> frameworks to register
> I1113 16:43:58.504889  8749 master.cpp:419] Master only allowing 
> authenticated slaves to register
> I1113 16:43:58.504922  8749 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/ZB7csS/credentials'
> I1113 16:43:58.505497  8749 master.cpp:458] Using default 'crammd5' 
> authenticator
> I1113 16:43:58.505759  8749 master.cpp:495] Authorization enabled
> I1113 16:43:58.507638  8746 master.cpp:1606] The newly elected leader is 
> master@10.0.2.15:45384 with id d59449fc-5462-43c5-b935-e05563fdd4b6
> I1113 16:43:58.507693  8746 master.cpp:1619] Elected as the leading master!
> I1113 16:43:58.507720  8746 master.cpp:1379] Recovering from registrar
> I1113 16:43:58.507946  8749 registrar.cpp:309] Recovering registrar
> I1113 16:43:58.508561  8749 log.cpp:661] Attempting to start the writer
> I1113 16:43:58.510282  8747 replica.cpp:496] Replica received implicit 
> promise request from (60)@10.0.2.15:45384 with proposal 1
> I1113 16:43:58.510867  8747 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 

[jira] [Updated] (MESOS-3851) Investigate recent crashes in Command Executor

2015-11-23 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-3851:
---
Story Points: 2

> Investigate recent crashes in Command Executor
> --
>
> Key: MESOS-3851
> URL: https://issues.apache.org/jira/browse/MESOS-3851
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere
>
> Post https://reviews.apache.org/r/38900 i.e. updating CommandExecutor to 
> support rootfs. There seem to be some tests showing frequent crashes due to 
> assert violations.
> {{FetcherCacheTest.SimpleEviction}} failed due to the following log:
> {code}
> I1107 19:36:46.360908 30657 slave.cpp:1793] Sending queued task '3' to 
> executor ''3' of framework 7d94c7fb-8950-4bcf-80c1-46112292dcd6- at 
> executor(1)@172.17.5.200:33871'
> I1107 19:36:46.363682  1236 exec.cpp:297] 
> I1107 19:36:46.373569  1245 exec.cpp:210] Executor registered on slave 
> 7d94c7fb-8950-4bcf-80c1-46112292dcd6-S0
> @ 0x7f9f5a7db3fa  google::LogMessage::Fail()
> I1107 19:36:46.394081  1245 exec.cpp:222] Executor::registered took 395411ns
> @ 0x7f9f5a7db359  google::LogMessage::SendToLog()
> @ 0x7f9f5a7dad6a  google::LogMessage::Flush()
> @ 0x7f9f5a7dda9e  google::LogMessageFatal::~LogMessageFatal()
> @   0x48d00a  _CheckFatal::~_CheckFatal()
> @   0x49c99d  
> mesos::internal::CommandExecutorProcess::launchTask()
> @   0x4b3dd7  
> _ZZN7process8dispatchIN5mesos8internal22CommandExecutorProcessEPNS1_14ExecutorDriverERKNS1_8TaskInfoES5_S6_EEvRKNS_3PIDIT_EEMSA_FvT0_T1_ET2_T3_ENKUlPNS_11ProcessBaseEE_clESL_
> @   0x4c470c  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal22CommandExecutorProcessEPNS5_14ExecutorDriverERKNS5_8TaskInfoES9_SA_EEvRKNS0_3PIDIT_EEMSE_FvT0_T1_ET2_T3_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x7f9f5a761b1b  std::function<>::operator()()
> @ 0x7f9f5a749935  process::ProcessBase::visit()
> @ 0x7f9f5a74d700  process::DispatchEvent::visit()
> @   0x48e004  process::ProcessBase::serve()
> @ 0x7f9f5a745d21  process::ProcessManager::resume()
> @ 0x7f9f5a742f52  
> _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
> @ 0x7f9f5a74cf2c  
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
> @ 0x7f9f5a74cedc  
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
> @ 0x7f9f5a74ce6e  
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
> @ 0x7f9f5a74cdc5  
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
> @ 0x7f9f5a74cd5e  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
> @ 0x7f9f5624f1e0  (unknown)
> @ 0x7f9f564a8df5  start_thread
> @ 0x7f9f559b71ad  __clone
> I1107 19:36:46.551370 30656 containerizer.cpp:1257] Executor for container 
> '6553a617-6b4a-418d-9759-5681f45ff854' has exited
> I1107 19:36:46.551429 30656 containerizer.cpp:1074] Destroying container 
> '6553a617-6b4a-418d-9759-5681f45ff854'
> I1107 19:36:46.553869 30656 containerizer.cpp:1257] Executor for container 
> 'd2c1f924-c92a-453e-82b1-c294d09c4873' has exited
> {code}
> The reason seems to be a race between the executor receiving a 
> {{RunTaskMessage}} before {{ExecutorRegisteredMessage}} leading to the 
> {{CHECK_SOME(executorInfo)}} failure.
> Link to complete log: 
> https://issues.apache.org/jira/browse/MESOS-2831?focusedCommentId=14995535=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14995535
> Another related failure from {{ExamplesTest.PersistentVolumeFramework}}
> {code}
> @ 0x7f4f71529cbd  google::LogMessage::SendToLog()
> I1107 13:15:09.949987 31573 slave.cpp:2337] Status update manager 
> successfully handled status update acknowledgement (UUID: 
> 721c7316-5580-4636-a83a-098e3bd4ed1f) for task 
> ad90531f-d3d8-43f6-96f2-c81c4548a12d of framework 
> ac4ea54a-7d19-4e41-9ee3-1a761f8e5b0f-
> @ 0x7f4f715296ce  google::LogMessage::Flush()
> @ 0x7f4f7152c402  google::LogMessageFatal::~LogMessageFatal()
> @   0x48d00a  _CheckFatal::~_CheckFatal()
> @   0x49c99d  
> 

[jira] [Commented] (MESOS-2980) Support execution configuration to be returned from provisioner

2015-11-23 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023191#comment-15023191
 ] 

Timothy Chen commented on MESOS-2980:
-

But the image information comes from the provisioner, so we need to get that 
information back together with the execution config and pass it on to another 
isolator if we want to do that. 

> Support execution configuration to be returned from provisioner
> ---
>
> Key: MESOS-2980
> URL: https://issues.apache.org/jira/browse/MESOS-2980
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: mesosphere
>
> Image specs also include execution configuration (e.g., env, user, ports, 
> etc.).
> We should support passing that information from the image provisioner back 
> to the containerizer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MESOS-3975) SSL build of mesos causes flaky testsuite.

2015-11-23 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3975:
-
Comment: was deleted

(was: It might also be worthwhile to check if the tests fail without 
{{--enable-ssl}}.)

> SSL build of mesos causes flaky testsuite.
> --
>
> Key: MESOS-3975
> URL: https://issues.apache.org/jira/browse/MESOS-3975
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
> Environment: CentOS 7.1, Kernel 3.10.0-229.20.1.el7.x86_64, gcc 
> 4.8.3, Docker 1.9
>Reporter: Till Toenshoff
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
>
> When running the tests of an SSL build of Mesos on CentOS 7.1, I see spurious 
> test failures that are, so far, not reproducible.
> The following tests did fail for me in complete runs but did seem fine when 
> running them individually, in repetition.  
> {noformat}
> DockerTest.ROOT_DOCKER_CheckPortResource
> {noformat}
> {noformat}
> ContainerizerTest.ROOT_CGROUPS_BalloonFramework
> {noformat}
> {noformat}
> [ RUN  ] 
> LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystemCommandExecutor
> 2015-11-20 
> 19:08:38,826:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> + /home/vagrant/mesos/build/src/mesos-containerizer mount --help=false 
> --operation=make-rslave --path=/
> + grep -E 
> /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/.+
>  /proc/self/mountinfo
> + grep -v 2b98025c-74f1-41d2-b35a-ce2cdfae347e
> + cut '-d ' -f5
> + xargs --no-run-if-empty umount -l
> + mount -n --rbind 
> /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/provisioner/containers/2b98025c-74f1-41d2-b35a-ce2cdfae347e/backends/copy/rootfses/bed11080-474b-4c69-8e7f-0ab85e895b0d
>  
> /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/slaves/830e842e-c36a-4e4c-bff4-5b9568d7df12-S0/frameworks/830e842e-c36a-4e4c-bff4-5b9568d7df12-/executors/c735be54-c47f-4645-bfc1-2f4647e2cddb/runs/2b98025c-74f1-41d2-b35a-ce2cdfae347e/.rootfs
> Could not load cert file
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:354: Failure
> Value of: statusRunning.get().state()
>   Actual: TASK_FAILED
> Expected: TASK_RUNNING
> 2015-11-20 
> 19:08:42,164:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2015-11-20 
> 19:08:45,501:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2015-11-20 
> 19:08:48,837:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2015-11-20 
> 19:08:52,174:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:355: Failure
> Failed to wait 15secs for statusFinished
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:349: Failure
> Actual function call count doesn't match EXPECT_CALL(sched, 
> statusUpdate(&driver, _))...
>  Expected: to be called twice
>Actual: called once - unsatisfied and active
> 2015-11-20 
> 19:08:55,511:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> *** Aborted at 1448046536 (unix time) try "date -d @1448046536" if you are 
> using GNU date ***
> PC: @0x0 (unknown)
> *** SIGSEGV (@0x0) received by PID 21380 (TID 0x7fa1549e68c0) from PID 0; 
> stack trace: ***
> @ 0x7fa141796fbb (unknown)
> @ 0x7fa14179b341 (unknown)
> @ 0x7fa14f096130 (unknown)
> {noformat}
> Vagrantfile generator:
> {noformat}
> cat << EOF > Vagrantfile
> # -*- mode: ruby -*-" >
> # vi: set ft=ruby :
> Vagrant.configure(2) do |config|
>   # Disable shared folder to prevent certain kernel module dependencies.
>   config.vm.synced_folder ".", "/vagrant", disabled: true
>   config.vm.hostname = "centos71"
>   config.vm.box = "bento/centos-7.1"
>   config.vm.provider "virtualbox" do |vb|
> vb.memory = 16384
> vb.cpus = 8
>   end
>   config.vm.provider "vmware_fusion" do |vb|
> vb.memory = 9216
> vb.cpus = 4
>   end
>   config.vm.provision "shell", inline: <<-SHELL
>  sudo yum -y update systemd
>  sudo yum install -y tar wget
>  sudo wget 
> 

[jira] [Commented] (MESOS-3975) SSL build of mesos causes flaky testsuite.

2015-11-23 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023254#comment-15023254
 ] 

Joseph Wu commented on MESOS-3975:
--

It might also be worthwhile to check if the tests fail without {{--enable-ssl}}.

> SSL build of mesos causes flaky testsuite.
> --
>
> Key: MESOS-3975
> URL: https://issues.apache.org/jira/browse/MESOS-3975
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
> Environment: CentOS 7.1, Kernel 3.10.0-229.20.1.el7.x86_64, gcc 
> 4.8.3, Docker 1.9
>Reporter: Till Toenshoff
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
>
> When running the tests of an SSL build of Mesos on CentOS 7.1, I see spurious 
> test failures that are, so far, not reproducible.
> The following tests did fail for me in complete runs but did seem fine when 
> running them individually, in repetition.  
> {noformat}
> DockerTest.ROOT_DOCKER_CheckPortResource
> {noformat}
> {noformat}
> ContainerizerTest.ROOT_CGROUPS_BalloonFramework
> {noformat}
> {noformat}
> [ RUN  ] 
> LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystemCommandExecutor
> 2015-11-20 
> 19:08:38,826:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> + /home/vagrant/mesos/build/src/mesos-containerizer mount --help=false 
> --operation=make-rslave --path=/
> + grep -E 
> /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/.+
>  /proc/self/mountinfo
> + grep -v 2b98025c-74f1-41d2-b35a-ce2cdfae347e
> + cut '-d ' -f5
> + xargs --no-run-if-empty umount -l
> + mount -n --rbind 
> /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/provisioner/containers/2b98025c-74f1-41d2-b35a-ce2cdfae347e/backends/copy/rootfses/bed11080-474b-4c69-8e7f-0ab85e895b0d
>  
> /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/slaves/830e842e-c36a-4e4c-bff4-5b9568d7df12-S0/frameworks/830e842e-c36a-4e4c-bff4-5b9568d7df12-/executors/c735be54-c47f-4645-bfc1-2f4647e2cddb/runs/2b98025c-74f1-41d2-b35a-ce2cdfae347e/.rootfs
> Could not load cert file
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:354: Failure
> Value of: statusRunning.get().state()
>   Actual: TASK_FAILED
> Expected: TASK_RUNNING
> 2015-11-20 
> 19:08:42,164:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2015-11-20 
> 19:08:45,501:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2015-11-20 
> 19:08:48,837:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2015-11-20 
> 19:08:52,174:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:355: Failure
> Failed to wait 15secs for statusFinished
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:349: Failure
> Actual function call count doesn't match EXPECT_CALL(sched, 
> statusUpdate(&driver, _))...
>  Expected: to be called twice
>Actual: called once - unsatisfied and active
> 2015-11-20 
> 19:08:55,511:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> *** Aborted at 1448046536 (unix time) try "date -d @1448046536" if you are 
> using GNU date ***
> PC: @0x0 (unknown)
> *** SIGSEGV (@0x0) received by PID 21380 (TID 0x7fa1549e68c0) from PID 0; 
> stack trace: ***
> @ 0x7fa141796fbb (unknown)
> @ 0x7fa14179b341 (unknown)
> @ 0x7fa14f096130 (unknown)
> {noformat}
> Vagrantfile generator:
> {noformat}
> cat << EOF > Vagrantfile
> # -*- mode: ruby -*-" >
> # vi: set ft=ruby :
> Vagrant.configure(2) do |config|
>   # Disable shared folder to prevent certain kernel module dependencies.
>   config.vm.synced_folder ".", "/vagrant", disabled: true
>   config.vm.hostname = "centos71"
>   config.vm.box = "bento/centos-7.1"
>   config.vm.provider "virtualbox" do |vb|
> vb.memory = 16384
> vb.cpus = 8
>   end
>   config.vm.provider "vmware_fusion" do |vb|
> vb.memory = 9216
> vb.cpus = 4
>   end
>   config.vm.provision "shell", inline: <<-SHELL
>  sudo yum -y update systemd
>  sudo yum install -y tar wget
>  sudo wget 
> 

[jira] [Updated] (MESOS-3916) MasterMaintenanceTest.InverseOffersFilters is flaky

2015-11-23 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-3916:
---
Shepherd: Joris Van Remoortere

> MasterMaintenanceTest.InverseOffersFilters is flaky
> ---
>
> Key: MESOS-3916
> URL: https://issues.apache.org/jira/browse/MESOS-3916
> Project: Mesos
>  Issue Type: Bug
> Environment: Ubuntu Wily 64 bit
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: flaky-test, maintenance, mesosphere
> Attachments: wily_maintenance_test_verbose.txt
>
>
> Verbose Logs:
> {code}
> [ RUN  ] MasterMaintenanceTest.InverseOffersFilters
> I1113 16:43:58.486469  8728 leveldb.cpp:176] Opened db in 2.360405ms
> I1113 16:43:58.486935  8728 leveldb.cpp:183] Compacted db in 407105ns
> I1113 16:43:58.486995  8728 leveldb.cpp:198] Created db iterator in 16221ns
> I1113 16:43:58.487030  8728 leveldb.cpp:204] Seeked to beginning of db in 
> 10935ns
> I1113 16:43:58.487046  8728 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 999ns
> I1113 16:43:58.487090  8728 replica.cpp:780] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1113 16:43:58.487735  8747 recover.cpp:449] Starting replica recovery
> I1113 16:43:58.488047  8747 recover.cpp:475] Replica is in EMPTY status
> I1113 16:43:58.488977  8745 replica.cpp:676] Replica in EMPTY status received 
> a broadcasted recover request from (58)@10.0.2.15:45384
> I1113 16:43:58.489452  8746 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1113 16:43:58.489712  8747 recover.cpp:566] Updating replica status to 
> STARTING
> I1113 16:43:58.490706  8742 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 745443ns
> I1113 16:43:58.490739  8742 replica.cpp:323] Persisted replica status to 
> STARTING
> I1113 16:43:58.490859  8742 recover.cpp:475] Replica is in STARTING status
> I1113 16:43:58.491786  8747 replica.cpp:676] Replica in STARTING status 
> received a broadcasted recover request from (59)@10.0.2.15:45384
> I1113 16:43:58.492542  8749 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1113 16:43:58.493221  8743 recover.cpp:566] Updating replica status to VOTING
> I1113 16:43:58.493710  8743 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 331874ns
> I1113 16:43:58.493767  8743 replica.cpp:323] Persisted replica status to 
> VOTING
> I1113 16:43:58.493868  8743 recover.cpp:580] Successfully joined the Paxos 
> group
> I1113 16:43:58.494119  8743 recover.cpp:464] Recover process terminated
> I1113 16:43:58.504369  8749 master.cpp:367] Master 
> d59449fc-5462-43c5-b935-e05563fdd4b6 (vagrant-ubuntu-wily-64) started on 
> 10.0.2.15:45384
> I1113 16:43:58.504438  8749 master.cpp:369] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="false" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/ZB7csS/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_slave_ping_timeouts="5" --quiet="false" 
> --recovery_slave_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="25secs" 
> --registry_strict="true" --root_submissions="true" 
> --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/ZB7csS/master" 
> --zk_session_timeout="10secs"
> I1113 16:43:58.504717  8749 master.cpp:416] Master allowing unauthenticated 
> frameworks to register
> I1113 16:43:58.504889  8749 master.cpp:419] Master only allowing 
> authenticated slaves to register
> I1113 16:43:58.504922  8749 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/ZB7csS/credentials'
> I1113 16:43:58.505497  8749 master.cpp:458] Using default 'crammd5' 
> authenticator
> I1113 16:43:58.505759  8749 master.cpp:495] Authorization enabled
> I1113 16:43:58.507638  8746 master.cpp:1606] The newly elected leader is 
> master@10.0.2.15:45384 with id d59449fc-5462-43c5-b935-e05563fdd4b6
> I1113 16:43:58.507693  8746 master.cpp:1619] Elected as the leading master!
> I1113 16:43:58.507720  8746 master.cpp:1379] Recovering from registrar
> I1113 16:43:58.507946  8749 registrar.cpp:309] Recovering registrar
> I1113 16:43:58.508561  8749 log.cpp:661] Attempting to start the writer
> I1113 16:43:58.510282  8747 replica.cpp:496] Replica received implicit 
> promise request from (60)@10.0.2.15:45384 with proposal 1
> I1113 16:43:58.510867  8747 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb 

[jira] [Updated] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.

2015-11-23 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-3937:
---
Story Points: 2

> Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
> ---
>
> Key: MESOS-3937
> URL: https://issues.apache.org/jira/browse/MESOS-3937
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.26.0
> Environment: Ubuntu 14.04, gcc 4.8.4, Docker version 1.6.2
> 8 CPUs, 16 GB memory
> Vagrant, libvirt/Virtual Box or VMware
>Reporter: Bernd Mathiske
>Assignee: Timothy Chen
>  Labels: mesosphere
>
> {noformat}
> ../configure
> make check
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="DockerContainerizerTest.ROOT_DOCKER_Launch_Executor" --verbose
> {noformat}
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from DockerContainerizerTest
> I1117 15:08:09.265943 26380 leveldb.cpp:176] Opened db in 3.199666ms
> I1117 15:08:09.267761 26380 leveldb.cpp:183] Compacted db in 1.684873ms
> I1117 15:08:09.267902 26380 leveldb.cpp:198] Created db iterator in 58313ns
> I1117 15:08:09.267966 26380 leveldb.cpp:204] Seeked to beginning of db in 
> 4927ns
> I1117 15:08:09.267997 26380 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 1605ns
> I1117 15:08:09.268156 26380 replica.cpp:780] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1117 15:08:09.270148 26396 recover.cpp:449] Starting replica recovery
> I1117 15:08:09.272105 26396 recover.cpp:475] Replica is in EMPTY status
> I1117 15:08:09.275640 26396 replica.cpp:676] Replica in EMPTY status received 
> a broadcasted recover request from (4)@10.0.2.15:50088
> I1117 15:08:09.276578 26399 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1117 15:08:09.277600 26397 recover.cpp:566] Updating replica status to 
> STARTING
> I1117 15:08:09.279613 26396 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.016098ms
> I1117 15:08:09.279731 26396 replica.cpp:323] Persisted replica status to 
> STARTING
> I1117 15:08:09.280306 26399 recover.cpp:475] Replica is in STARTING status
> I1117 15:08:09.282181 26400 replica.cpp:676] Replica in STARTING status 
> received a broadcasted recover request from (5)@10.0.2.15:50088
> I1117 15:08:09.282552 26400 master.cpp:367] Master 
> 59c600f1-92ff-4926-9c84-073d9b81f68a (vagrant-ubuntu-trusty-64) started on 
> 10.0.2.15:50088
> I1117 15:08:09.283021 26400 master.cpp:369] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/40AlT8/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/40AlT8/master" 
> --zk_session_timeout="10secs"
> I1117 15:08:09.283920 26400 master.cpp:414] Master only allowing 
> authenticated frameworks to register
> I1117 15:08:09.283972 26400 master.cpp:419] Master only allowing 
> authenticated slaves to register
> I1117 15:08:09.284032 26400 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/40AlT8/credentials'
> I1117 15:08:09.282944 26401 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1117 15:08:09.284639 26401 recover.cpp:566] Updating replica status to VOTING
> I1117 15:08:09.285539 26400 master.cpp:458] Using default 'crammd5' 
> authenticator
> I1117 15:08:09.285995 26401 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.075466ms
> I1117 15:08:09.286062 26401 replica.cpp:323] Persisted replica status to 
> VOTING
> I1117 15:08:09.286200 26401 recover.cpp:580] Successfully joined the Paxos 
> group
> I1117 15:08:09.286471 26401 recover.cpp:464] Recover process terminated
> I1117 15:08:09.287303 26400 authenticator.cpp:520] Initializing server SASL
> I1117 15:08:09.289371 26400 master.cpp:495] Authorization enabled
> I1117 15:08:09.296018 26399 master.cpp:1606] The newly elected leader is 
> master@10.0.2.15:50088 with id 59c600f1-92ff-4926-9c84-073d9b81f68a
> I1117 15:08:09.296115 26399 master.cpp:1619] Elected as the leading master!
> I1117 15:08:09.296187 26399 

[jira] [Commented] (MESOS-3851) Investigate recent crashes in Command Executor

2015-11-23 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023186#comment-15023186
 ] 

Anand Mazumdar commented on MESOS-3851:
---

This should not be a blocker for 0.26. As [~bmahler] pointed out earlier in the 
thread, this race happens because we do a {{send}} without a {{link}}. Hence, 
this behavior has existed for quite some time; it's just that [~tnachen]'s 
changes to the command executor highlighted it.

> Investigate recent crashes in Command Executor
> --
>
> Key: MESOS-3851
> URL: https://issues.apache.org/jira/browse/MESOS-3851
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere
>
> Post https://reviews.apache.org/r/38900 i.e. updating CommandExecutor to 
> support rootfs. There seem to be some tests showing frequent crashes due to 
> assert violations.
> {{FetcherCacheTest.SimpleEviction}} failed due to the following log:
> {code}
> I1107 19:36:46.360908 30657 slave.cpp:1793] Sending queued task '3' to 
> executor ''3' of framework 7d94c7fb-8950-4bcf-80c1-46112292dcd6- at 
> executor(1)@172.17.5.200:33871'
> I1107 19:36:46.363682  1236 exec.cpp:297] 
> I1107 19:36:46.373569  1245 exec.cpp:210] Executor registered on slave 
> 7d94c7fb-8950-4bcf-80c1-46112292dcd6-S0
> @ 0x7f9f5a7db3fa  google::LogMessage::Fail()
> I1107 19:36:46.394081  1245 exec.cpp:222] Executor::registered took 395411ns
> @ 0x7f9f5a7db359  google::LogMessage::SendToLog()
> @ 0x7f9f5a7dad6a  google::LogMessage::Flush()
> @ 0x7f9f5a7dda9e  google::LogMessageFatal::~LogMessageFatal()
> @   0x48d00a  _CheckFatal::~_CheckFatal()
> @   0x49c99d  
> mesos::internal::CommandExecutorProcess::launchTask()
> @   0x4b3dd7  
> _ZZN7process8dispatchIN5mesos8internal22CommandExecutorProcessEPNS1_14ExecutorDriverERKNS1_8TaskInfoES5_S6_EEvRKNS_3PIDIT_EEMSA_FvT0_T1_ET2_T3_ENKUlPNS_11ProcessBaseEE_clESL_
> @   0x4c470c  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal22CommandExecutorProcessEPNS5_14ExecutorDriverERKNS5_8TaskInfoES9_SA_EEvRKNS0_3PIDIT_EEMSE_FvT0_T1_ET2_T3_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x7f9f5a761b1b  std::function<>::operator()()
> @ 0x7f9f5a749935  process::ProcessBase::visit()
> @ 0x7f9f5a74d700  process::DispatchEvent::visit()
> @   0x48e004  process::ProcessBase::serve()
> @ 0x7f9f5a745d21  process::ProcessManager::resume()
> @ 0x7f9f5a742f52  
> _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
> @ 0x7f9f5a74cf2c  
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
> @ 0x7f9f5a74cedc  
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
> @ 0x7f9f5a74ce6e  
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
> @ 0x7f9f5a74cdc5  
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
> @ 0x7f9f5a74cd5e  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
> @ 0x7f9f5624f1e0  (unknown)
> @ 0x7f9f564a8df5  start_thread
> @ 0x7f9f559b71ad  __clone
> I1107 19:36:46.551370 30656 containerizer.cpp:1257] Executor for container 
> '6553a617-6b4a-418d-9759-5681f45ff854' has exited
> I1107 19:36:46.551429 30656 containerizer.cpp:1074] Destroying container 
> '6553a617-6b4a-418d-9759-5681f45ff854'
> I1107 19:36:46.553869 30656 containerizer.cpp:1257] Executor for container 
> 'd2c1f924-c92a-453e-82b1-c294d09c4873' has exited
> {code}
> The reason seems to be a race in which the executor receives a 
> {{RunTaskMessage}} before the {{ExecutorRegisteredMessage}}, leading to the 
> {{CHECK_SOME(executorInfo)}} failure.
> Link to complete log: 
> https://issues.apache.org/jira/browse/MESOS-2831?focusedCommentId=14995535=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14995535
> Another related failure from {{ExamplesTest.PersistentVolumeFramework}}
> {code}
> @ 0x7f4f71529cbd  google::LogMessage::SendToLog()
> I1107 13:15:09.949987 31573 slave.cpp:2337] Status update manager 
> successfully handled status update acknowledgement (UUID: 
> 721c7316-5580-4636-a83a-098e3bd4ed1f) for task 
> 

[jira] [Updated] (MESOS-3949) User CGroup Isolation tests fail on Centos 6.

2015-11-23 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-3949:
---
Story Points: 3

> User CGroup Isolation tests fail on Centos 6.
> -
>
> Key: MESOS-3949
> URL: https://issues.apache.org/jira/browse/MESOS-3949
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Affects Versions: 0.26.0
> Environment: CentOS 6.6, gcc 4.8.1, on vagrant libvirt, 16GB, 8 CPUs,
> ../configure --enable-libevent --enable-ssl
>Reporter: Bernd Mathiske
>Assignee: Alexander Rojas
>  Labels: mesosphere
>
> UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup and 
> UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup fail on CentOS 6.6 with 
> similar output when libevent and SSL are enabled.
> {noformat}
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup" --verbose
> {noformat}
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from UserCgroupIsolatorTest/0, where TypeParam = 
> mesos::internal::slave::CgroupsMemIsolatorProcess
> userdel: user 'mesos.test.unprivileged.user' does not exist
> [ RUN  ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup
> I1118 16:53:35.273717 30249 mem.cpp:605] Started listening for OOM events for 
> container 867a829e-4a26-43f5-86e0-938bf1f47688
> I1118 16:53:35.274538 30249 mem.cpp:725] Started listening on low memory 
> pressure events for container 867a829e-4a26-43f5-86e0-938bf1f47688
> I1118 16:53:35.275164 30249 mem.cpp:725] Started listening on medium memory 
> pressure events for container 867a829e-4a26-43f5-86e0-938bf1f47688
> I1118 16:53:35.275784 30249 mem.cpp:725] Started listening on critical memory 
> pressure events for container 867a829e-4a26-43f5-86e0-938bf1f47688
> I1118 16:53:35.276448 30249 mem.cpp:356] Updated 'memory.soft_limit_in_bytes' 
> to 1GB for container 867a829e-4a26-43f5-86e0-938bf1f47688
> I1118 16:53:35.277331 30249 mem.cpp:391] Updated 'memory.limit_in_bytes' to 
> 1GB for container 867a829e-4a26-43f5-86e0-938bf1f47688
> -bash: 
> /sys/fs/cgroup/memory/mesos/867a829e-4a26-43f5-86e0-938bf1f47688/cgroup.procs:
>  No such file or directory
> mkdir: cannot create directory 
> `/sys/fs/cgroup/memory/mesos/867a829e-4a26-43f5-86e0-938bf1f47688/user': No 
> such file or directory
> ../../src/tests/containerizer/isolator_tests.cpp:1307: Failure
> Value of: os::system( "su - " + UNPRIVILEGED_USERNAME + " -c 'mkdir " + 
> path::join(flags.cgroups_hierarchy, userCgroup) + "'")
>   Actual: 256
> Expected: 0
> -bash: 
> /sys/fs/cgroup/memory/mesos/867a829e-4a26-43f5-86e0-938bf1f47688/user/cgroup.procs:
>  No such file or directory
> ../../src/tests/containerizer/isolator_tests.cpp:1316: Failure
> Value of: os::system( "su - " + UNPRIVILEGED_USERNAME + " -c 'echo $$ >" + 
> path::join(flags.cgroups_hierarchy, userCgroup, "cgroup.procs") + "'")
>   Actual: 256
> Expected: 0
> [  FAILED  ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup, where 
> TypeParam = mesos::internal::slave::CgroupsMemIsolatorProcess (149 ms)
> {noformat}
> {noformat}
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup" --verbose
> {noformat}
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from UserCgroupIsolatorTest/1, where TypeParam = 
> mesos::internal::slave::CgroupsCpushareIsolatorProcess
> userdel: user 'mesos.test.unprivileged.user' does not exist
> [ RUN  ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup
> I1118 17:01:00.550706 30357 cpushare.cpp:392] Updated 'cpu.shares' to 1024 
> (cpus 1) for container e57f4343-1a97-4b44-b347-803be47ace80
> -bash: 
> /sys/fs/cgroup/cpuacct/mesos/e57f4343-1a97-4b44-b347-803be47ace80/cgroup.procs:
>  No such file or directory
> mkdir: cannot create directory 
> `/sys/fs/cgroup/cpuacct/mesos/e57f4343-1a97-4b44-b347-803be47ace80/user': No 
> such file or directory
> ../../src/tests/containerizer/isolator_tests.cpp:1307: Failure
> Value of: os::system( "su - " + UNPRIVILEGED_USERNAME + " -c 'mkdir " + 
> path::join(flags.cgroups_hierarchy, userCgroup) + "'")
>   Actual: 256
> Expected: 0
> -bash: 
> /sys/fs/cgroup/cpuacct/mesos/e57f4343-1a97-4b44-b347-803be47ace80/user/cgroup.procs:
>  No such file or directory
> ../../src/tests/containerizer/isolator_tests.cpp:1316: Failure
> Value of: os::system( "su - " + UNPRIVILEGED_USERNAME + " -c 'echo $$ >" + 
> path::join(flags.cgroups_hierarchy, userCgroup, "cgroup.procs") + "'")
>   Actual: 256
> Expected: 0
> -bash: 
> /sys/fs/cgroup/cpu/mesos/e57f4343-1a97-4b44-b347-803be47ace80/cgroup.procs: 
> No such file or directory
> mkdir: cannot create directory 
> 

[jira] [Updated] (MESOS-3988) Implicit roles

2015-11-23 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-3988:
---
Story Points: 8  (was: 10)

> Implicit roles
> --
>
> Key: MESOS-3988
> URL: https://issues.apache.org/jira/browse/MESOS-3988
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Neil Conway
>  Labels: mesosphere, roles
>
> At present, Mesos uses a static list of roles that are configured when the 
> master starts up. This places some severe limitations on how roles can be 
> used (e.g., changing the set of roles requires restarting all the masters).
> As an alternative (or a precursor) to implementing full-blown dynamic roles, 
> we could instead relax the concept of roles, so that:
> * frameworks can register with any role (subject to ACLs/authz)
> * reservations can be made for any role
> Open questions, at least to me:
> * This would mean weights cannot be configured dynamically. Is that okay?
> * Is this feature useful enough without dynamic ACL changes?
> * If we implement this (+ dynamic ACLs), do we also need dynamic roles?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3975) SSL build of mesos causes flaky testsuite.

2015-11-23 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-3975:
---
Story Points: 5

> SSL build of mesos causes flaky testsuite.
> --
>
> Key: MESOS-3975
> URL: https://issues.apache.org/jira/browse/MESOS-3975
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
> Environment: CentOS 7.1, Kernel 3.10.0-229.20.1.el7.x86_64, gcc 
> 4.8.3, Docker 1.9
>Reporter: Till Toenshoff
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
>
> When running the tests of an SSL build of Mesos on CentOS 7.1, I see spurious 
> test failures that are, so far, not reproducible.
> The following tests failed for me in complete runs but seemed fine when run 
> individually, in repetition.
> {noformat}
> DockerTest.ROOT_DOCKER_CheckPortResource
> {noformat}
> {noformat}
> ContainerizerTest.ROOT_CGROUPS_BalloonFramework
> {noformat}
> {noformat}
> [ RUN  ] 
> LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystemCommandExecutor
> 2015-11-20 
> 19:08:38,826:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> + /home/vagrant/mesos/build/src/mesos-containerizer mount --help=false 
> --operation=make-rslave --path=/
> + grep -E 
> /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/.+
>  /proc/self/mountinfo
> + grep -v 2b98025c-74f1-41d2-b35a-ce2cdfae347e
> + cut '-d ' -f5
> + xargs --no-run-if-empty umount -l
> + mount -n --rbind 
> /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/provisioner/containers/2b98025c-74f1-41d2-b35a-ce2cdfae347e/backends/copy/rootfses/bed11080-474b-4c69-8e7f-0ab85e895b0d
>  
> /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystemCommandExecutor_Tz7P8c/slaves/830e842e-c36a-4e4c-bff4-5b9568d7df12-S0/frameworks/830e842e-c36a-4e4c-bff4-5b9568d7df12-/executors/c735be54-c47f-4645-bfc1-2f4647e2cddb/runs/2b98025c-74f1-41d2-b35a-ce2cdfae347e/.rootfs
> Could not load cert file
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:354: Failure
> Value of: statusRunning.get().state()
>   Actual: TASK_FAILED
> Expected: TASK_RUNNING
> 2015-11-20 
> 19:08:42,164:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2015-11-20 
> 19:08:45,501:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2015-11-20 
> 19:08:48,837:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2015-11-20 
> 19:08:52,174:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:355: Failure
> Failed to wait 15secs for statusFinished
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:349: Failure
> Actual function call count doesn't match EXPECT_CALL(sched, 
> statusUpdate(, _))...
>  Expected: to be called twice
>Actual: called once - unsatisfied and active
> 2015-11-20 
> 19:08:55,511:21380(0x7fa10d5f2700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:53444] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> *** Aborted at 1448046536 (unix time) try "date -d @1448046536" if you are 
> using GNU date ***
> PC: @0x0 (unknown)
> *** SIGSEGV (@0x0) received by PID 21380 (TID 0x7fa1549e68c0) from PID 0; 
> stack trace: ***
> @ 0x7fa141796fbb (unknown)
> @ 0x7fa14179b341 (unknown)
> @ 0x7fa14f096130 (unknown)
> {noformat}
> Vagrantfile generator:
> {noformat}
> cat << EOF > Vagrantfile
> # -*- mode: ruby -*-
> # vi: set ft=ruby :
> Vagrant.configure(2) do |config|
>   # Disable shared folder to prevent certain kernel module dependencies.
>   config.vm.synced_folder ".", "/vagrant", disabled: true
>   config.vm.hostname = "centos71"
>   config.vm.box = "bento/centos-7.1"
>   config.vm.provider "virtualbox" do |vb|
> vb.memory = 16384
> vb.cpus = 8
>   end
>   config.vm.provider "vmware_fusion" do |vb|
> vb.memory = 9216
> vb.cpus = 4
>   end
>   config.vm.provision "shell", inline: <<-SHELL
>  sudo yum -y update systemd
>  sudo yum install -y tar wget
>  sudo wget 
> http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo 
> -O 

[jira] [Commented] (MESOS-3946) Test for role management

2015-11-23 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023733#comment-15023733
 ] 

Klaus Ma commented on MESOS-3946:
-

I think we can use this ticket for the integration tests of the dynamic role EPIC :).

> Test for role management
> 
>
> Key: MESOS-3946
> URL: https://issues.apache.org/jira/browse/MESOS-3946
> Project: Mesos
>  Issue Type: Task
>Reporter: Yong Qiao Wang
>Assignee: Yong Qiao Wang
>
> Add tests for dynamic role configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3552) CHECK failure due to floating point precision on reservation request

2015-11-23 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023808#comment-15023808
 ] 

Klaus Ma commented on MESOS-3552:
-

After a discussion with [~jieyu] in MESOS-1187, I think we'd better introduce 
gtest's {{almostEqual()}} into our code to check doubles for equality. As a 
long-term solution, fixed-point arithmetic would be the option.
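
To illustrate both the failure mode and the tolerant comparison, here is a 
sketch (the helper name, tolerance, and scenario are assumptions for 
illustration, not the gtest ULP-based algorithm or the actual allocator code):

{code}
#include <algorithm>
#include <cmath>
#include <iostream>

// Sketch of an almost-equal check: compare within a small relative
// tolerance instead of using exact ==. The 1e-9 tolerance is illustrative.
static bool almostEqual(double a, double b)
{
  const double scale = std::max({std::fabs(a), std::fabs(b), 1.0});
  return std::fabs(a - b) <= 1e-9 * scale;
}

int main()
{
  // Simulate repeated 0.1-cpu reserve/unreserve arithmetic on 24 cpus.
  const double cpus = 24.0;
  double result = cpus;
  for (int i = 0; i < 10; ++i) result -= 0.1;
  for (int i = 0; i < 10; ++i) result += 0.1;

  std::cout << std::boolalpha;
  std::cout << "exact ==   : " << (result == cpus) << std::endl;  // Often false.
  std::cout << "almostEqual: " << almostEqual(result, cpus) << std::endl;  // true
  return 0;
}
{code}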

> CHECK failure due to floating point precision on reservation request
> 
>
> Key: MESOS-3552
> URL: https://issues.apache.org/jira/browse/MESOS-3552
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Mandeep Chadha
>Assignee: Mandeep Chadha
>
> The {{result.cpus() == cpus()}} check is failing due to a (double == double) 
> comparison problem.
> Root cause:
> The framework requested a 0.1 cpu reservation for the first task. So far so 
> good. The next Reserve operation led to double arithmetic that produced the 
> following values:
> results.cpus(): 23.9964472863211995, cpus(): 24
> So the check (result.cpus() == cpus()) failed, since 
> (23.9964472863211995 == 24) is false.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-3963) Move "using mesos::fetcher::FetcherInfo" into internal namespace in "fetcher.hpp"

2015-11-23 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma reassigned MESOS-3963:
---

Assignee: Klaus Ma

> Move "using mesos::fetcher::FetcherInfo" into internal namespace in 
> "fetcher.hpp"
> -
>
> Key: MESOS-3963
> URL: https://issues.apache.org/jira/browse/MESOS-3963
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Reporter: Klaus Ma
>Assignee: Klaus Ma
>Priority: Minor
>  Labels: newbie
>
> According to the Google C++ style guide, using-declarations in header files 
> should appear only inside internal namespaces. Grepping the header files, 
> only fetcher.hpp needs a patch.
> {quote}
> You may use a using-declaration anywhere in a .cc file (including in the 
> global namespace), and in functions, methods, classes, or within internal 
> namespaces in .h files.
> Do not use using-declarations in .h files except in explicitly marked 
> internal-only namespaces, because anything imported into a namespace in a .h 
> file becomes part of the public API exported by that file.
> {code}
> // OK in .cc files.
> // Must be in a function, method, internal namespace, or
> // class in .h files.
> using ::foo::bar;
> {code}
> {quote}
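> For illustration, a sketch of the intended change in fetcher.hpp (the exact 
> namespace nesting is an assumption):
> {code}
> // Before: at file scope in fetcher.hpp, exported to every includer.
> // using mesos::fetcher::FetcherInfo;
>
> // After: scoped to the internal namespace, per the style guide.
> namespace mesos {
> namespace internal {
> namespace slave {
>
> using mesos::fetcher::FetcherInfo;
>
> } // namespace slave {
> } // namespace internal {
> } // namespace mesos {
> {code}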



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3964) LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs and LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs_Big_Quota fail on Debian 8.

2015-11-23 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023324#comment-15023324
 ] 

Till Toenshoff commented on MESOS-3964:
---

You mean the CFS-related items?!

> LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs and 
> LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs_Big_Quota fail on Debian 8.
> ---
>
> Key: MESOS-3964
> URL: https://issues.apache.org/jira/browse/MESOS-3964
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, test
>Affects Versions: 0.26.0
> Environment: Debian 8, gcc 4.9.2, Docker 1.9.0, vagrant, libvirt
> Vagrantfile: see MESOS-3957
>Reporter: Bernd Mathiske
>Assignee: Greg Mann
>Priority: Blocker
>  Labels: mesosphere
>
> sudo ./bin/mesos-test.sh 
> --gtest_filter="LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs"
> {noformat}
> ...
> F1119 14:34:52.514742 30706 isolator_tests.cpp:455] CHECK_SOME(isolator): 
> Failed to find 'cpu.cfs_quota_us'. Your kernel might be too old to use the 
> CFS cgroups feature.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3988) Implicit roles

2015-11-23 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023391#comment-15023391
 ] 

Qian Zhang commented on MESOS-3988:
---

[~neilc], we already have an ongoing project for dynamic roles/weights: 
https://issues.apache.org/jira/browse/MESOS-3177. Does it meet your requirement?

> Implicit roles
> --
>
> Key: MESOS-3988
> URL: https://issues.apache.org/jira/browse/MESOS-3988
> Project: Mesos
>  Issue Type: Epic
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere, roles
>
> At present, Mesos uses a static list of roles that are configured when the 
> master starts up. This places some severe limitations on how roles can be 
> used (e.g., changing the set of roles requires restarting all the masters).
> As an alternative (or a precursor) to implementing full-blown dynamic roles, 
> we could instead relax the concept of roles, so that:
> * frameworks can register with any role (subject to ACLs/authz)
> * reservations can be made for any role
> Open questions, at least to me:
> * This would mean weights cannot be configured dynamically. Is that okay?
> * Is this feature useful enough without dynamic ACL changes?
> * If we implement this (+ dynamic ACLs), do we also need dynamic roles?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3949) User CGroup Isolation tests fail on Centos 6.

2015-11-23 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023507#comment-15023507
 ] 

Marco Massenzio commented on MESOS-3949:


{quote}
even if they didn't ever pass, it is a good idea to check why they didn't.
{quote}

I couldn't agree more!
Thanks for the investigative work, looking forward to learning what you find.

What I meant, however, was whether these tests' failure should block the 
{{0.26}} release; regardless of that, we should of course get to the bottom of 
the failure and determine whether the best course of action is to implement a 
fix (in the actual code and/or the test) or to disable the tests on some given 
platforms.

Thanks again for being "on the ball" :)

> User CGroup Isolation tests fail on Centos 6.
> -
>
> Key: MESOS-3949
> URL: https://issues.apache.org/jira/browse/MESOS-3949
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Affects Versions: 0.26.0
> Environment: CentOS 6.6, gcc 4.8.1, on vagrant libvirt, 16GB, 8 CPUs,
> ../configure --enable-libevent --enable-ssl
>Reporter: Bernd Mathiske
>Assignee: Alexander Rojas
>  Labels: mesosphere
>
> UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup and 
> UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup fail on CentOS 6.6 with 
> similar output when libevent and SSL are enabled.
> {noformat}
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup" --verbose
> {noformat}
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from UserCgroupIsolatorTest/0, where TypeParam = 
> mesos::internal::slave::CgroupsMemIsolatorProcess
> userdel: user 'mesos.test.unprivileged.user' does not exist
> [ RUN  ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup
> I1118 16:53:35.273717 30249 mem.cpp:605] Started listening for OOM events for 
> container 867a829e-4a26-43f5-86e0-938bf1f47688
> I1118 16:53:35.274538 30249 mem.cpp:725] Started listening on low memory 
> pressure events for container 867a829e-4a26-43f5-86e0-938bf1f47688
> I1118 16:53:35.275164 30249 mem.cpp:725] Started listening on medium memory 
> pressure events for container 867a829e-4a26-43f5-86e0-938bf1f47688
> I1118 16:53:35.275784 30249 mem.cpp:725] Started listening on critical memory 
> pressure events for container 867a829e-4a26-43f5-86e0-938bf1f47688
> I1118 16:53:35.276448 30249 mem.cpp:356] Updated 'memory.soft_limit_in_bytes' 
> to 1GB for container 867a829e-4a26-43f5-86e0-938bf1f47688
> I1118 16:53:35.277331 30249 mem.cpp:391] Updated 'memory.limit_in_bytes' to 
> 1GB for container 867a829e-4a26-43f5-86e0-938bf1f47688
> -bash: 
> /sys/fs/cgroup/memory/mesos/867a829e-4a26-43f5-86e0-938bf1f47688/cgroup.procs:
>  No such file or directory
> mkdir: cannot create directory 
> `/sys/fs/cgroup/memory/mesos/867a829e-4a26-43f5-86e0-938bf1f47688/user': No 
> such file or directory
> ../../src/tests/containerizer/isolator_tests.cpp:1307: Failure
> Value of: os::system( "su - " + UNPRIVILEGED_USERNAME + " -c 'mkdir " + 
> path::join(flags.cgroups_hierarchy, userCgroup) + "'")
>   Actual: 256
> Expected: 0
> -bash: 
> /sys/fs/cgroup/memory/mesos/867a829e-4a26-43f5-86e0-938bf1f47688/user/cgroup.procs:
>  No such file or directory
> ../../src/tests/containerizer/isolator_tests.cpp:1316: Failure
> Value of: os::system( "su - " + UNPRIVILEGED_USERNAME + " -c 'echo $$ >" + 
> path::join(flags.cgroups_hierarchy, userCgroup, "cgroup.procs") + "'")
>   Actual: 256
> Expected: 0
> [  FAILED  ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup, where 
> TypeParam = mesos::internal::slave::CgroupsMemIsolatorProcess (149 ms)
> {noformat}
> {noformat}
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup" --verbose
> {noformat}
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from UserCgroupIsolatorTest/1, where TypeParam = 
> mesos::internal::slave::CgroupsCpushareIsolatorProcess
> userdel: user 'mesos.test.unprivileged.user' does not exist
> [ RUN  ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup
> I1118 17:01:00.550706 30357 cpushare.cpp:392] Updated 'cpu.shares' to 1024 
> (cpus 1) for container e57f4343-1a97-4b44-b347-803be47ace80
> -bash: 
> /sys/fs/cgroup/cpuacct/mesos/e57f4343-1a97-4b44-b347-803be47ace80/cgroup.procs:
>  No such file or directory
> mkdir: cannot create directory 
> `/sys/fs/cgroup/cpuacct/mesos/e57f4343-1a97-4b44-b347-803be47ace80/user': No 
> such file or directory
> ../../src/tests/containerizer/isolator_tests.cpp:1307: Failure
> Value of: os::system( "su - " + UNPRIVILEGED_USERNAME + " -c 'mkdir " + 
> path::join(flags.cgroups_hierarchy, 

[jira] [Commented] (MESOS-3964) LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs and LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs_Big_Quota fail on Debian 8.

2015-11-23 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023372#comment-15023372
 ] 

Till Toenshoff commented on MESOS-3964:
---

It seems this always fails on "out-of-the-box" Debian kernels: all versions I 
tried lacked "CFS bandwidth control". Even their experimental kernel (4.3 
right now) does not seem to support it. So far, however, I have not found out 
why...

So unless we come up with documentation updates that explain how to get this 
to work on Debian 8, my vote here is to detect the presence of this kernel 
feature and to disable this test on systems that do not support it.

> LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs and 
> LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs_Big_Quota fail on Debian 8.
> ---
>
> Key: MESOS-3964
> URL: https://issues.apache.org/jira/browse/MESOS-3964
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, test
>Affects Versions: 0.26.0
> Environment: Debian 8, gcc 4.9.2, Docker 1.9.0, vagrant, libvirt
> Vagrantfile: see MESOS-3957
>Reporter: Bernd Mathiske
>Assignee: Greg Mann
>Priority: Blocker
>  Labels: mesosphere
>
> sudo ./bin/mesos-test.sh 
> --gtest_filter="LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs"
> {noformat}
> ...
> F1119 14:34:52.514742 30706 isolator_tests.cpp:455] CHECK_SOME(isolator): 
> Failed to find 'cpu.cfs_quota_us'. Your kernel might be too old to use the 
> CFS cgroups feature.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3964) LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs and LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs_Big_Quota fail on Debian 8.

2015-11-23 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023394#comment-15023394
 ] 

Greg Mann commented on MESOS-3964:
--

As [~tillt] suggests in the comments of MESOS-3978, a possible solution to this 
issue would be to perform a check for the existence of the CFS cgroups, 
something like the following:

{code}
$ ls -l /sys/fs/cgroup/cpu/cpu.cfs_quota_us
ls: cannot access /sys/fs/cgroup/cpu/cpu.cfs_quota_us: No such file or directory
$ echo $?
2
{code}

If this check fails, the test suite could filter out the offending tests. 
However, we should make sure that the absence of these cgroup controls won't 
cause problems for the isolator/containerizer code. I've sent an email out to 
the devlist suggesting this solution and soliciting feedback.
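
A corresponding in-process probe might look like this (a sketch only; the 
helper name and the filtering hook are assumptions, not the actual Mesos test 
code):

{code}
#include <sys/stat.h>

#include <iostream>

// Sketch: detect CFS bandwidth control by probing for cpu.cfs_quota_us,
// mirroring the shell check above.
static bool cfsQuotaEnabled()
{
  struct stat s;
  return ::stat("/sys/fs/cgroup/cpu/cpu.cfs_quota_us", &s) == 0;
}

int main()
{
  if (cfsQuotaEnabled()) {
    std::cout << "CFS bandwidth control present; run ROOT_CGROUPS_Cfs tests"
              << std::endl;
  } else {
    std::cout << "CFS bandwidth control missing; filter out ROOT_CGROUPS_Cfs"
              << std::endl;
  }
  return 0;
}
{code}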

> LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs and 
> LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs_Big_Quota fail on Debian 8.
> ---
>
> Key: MESOS-3964
> URL: https://issues.apache.org/jira/browse/MESOS-3964
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, test
>Affects Versions: 0.26.0
> Environment: Debian 8, gcc 4.9.2, Docker 1.9.0, vagrant, libvirt
> Vagrantfile: see MESOS-3957
>Reporter: Bernd Mathiske
>Assignee: Greg Mann
>Priority: Blocker
>  Labels: mesosphere
>
> sudo ./bin/mesos-test.sh 
> --gtest_filter="LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs"
> {noformat}
> ...
> F1119 14:34:52.514742 30706 isolator_tests.cpp:455] CHECK_SOME(isolator): 
> Failed to find 'cpu.cfs_quota_us'. Your kernel might be too old to use the 
> CFS cgroups feature.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3994) Refactor registry client/puller to avoid JSON and struct

2015-11-23 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-3994:

Sprint: Mesosphere Sprint 23

> Refactor registry client/puller to avoid JSON and struct
> 
>
> Key: MESOS-3994
> URL: https://issues.apache.org/jira/browse/MESOS-3994
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>
> We should get rid of all JSON objects and structs used for message passing 
> as function return types, using the methods provided by spec.hpp to refactor 
> away the unnecessary JSON messages and structs in the registry client and 
> registry puller. Also, remove all redundant checks in the registry client 
> that are already covered by spec validation.
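> As a self-contained illustration of the pattern (all names hypothetical, not 
> the actual registry client code): parse and validate once into a typed 
> message, so callers never re-check raw JSON fields.
> {code}
> #include <iostream>
> #include <memory>
> #include <string>
>
> struct ImageManifest          // Typed result; replaces a raw JSON object.
> {
>   std::string name;
>   int schemaVersion;
> };
>
> // Single validation point: returns nullptr on malformed input, so the
> // registry client/puller carry no redundant field checks of their own.
> std::unique_ptr<ImageManifest> parseManifest(const std::string& name,
>                                              int schemaVersion)
> {
>   if (name.empty() || schemaVersion != 2) {
>     return nullptr;
>   }
>   return std::unique_ptr<ImageManifest>(
>       new ImageManifest{name, schemaVersion});
> }
>
> int main()
> {
>   std::unique_ptr<ImageManifest> manifest =
>     parseManifest("library/busybox", 2);
>   if (manifest) {
>     std::cout << "Pulling " << manifest->name << std::endl;
>   }
>   return 0;
> }
> {code}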



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

