[jira] [Commented] (MESOS-9653) Allow framework to set `min_allocatable_resources` upon revival.

2019-03-21 Thread Tim Harper (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798357#comment-16798357
 ] 

Tim Harper commented on MESOS-9653:
---

This sounds like an easy optimization for both the Mesos allocator and 
frameworks.

One thing I wonder about is how this will be surfaced, so that it is clear why 
a framework isn't receiving any offers.

> Allow framework to set `min_allocatable_resources` upon revival.
> ---
>
> Key: MESOS-9653
> URL: https://issues.apache.org/jira/browse/MESOS-9653
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Meng Zhu
>Priority: Major
>  Labels: mesosphere, resource-management
>
> In MESOS-9523, we added a per-framework allocatable-resources matcher/filter 
> that frameworks can specify in their `FrameworkInfo` when subscribing. 
> These per-framework filters give frameworks some control over the shape of 
> their resource offers.
> Besides setting the filters when subscribing, a natural workflow is to set 
> them upon revival: frameworks can set the filters to the resource quantity 
> shape of the tasks they want to launch. If a framework specifies this in a 
> revive call, all existing filters (accumulated when declining offers) and the 
> current `min_allocatable_resources` filters will be cleared and replaced with 
> the newly specified `min_allocatable_resources` filter.
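A minimal C++ sketch of the proposed revive semantics, assuming a simplified model in which a set of resource quantities is a map from resource name to amount. {{Quantities}}, {{FrameworkFilters}}, {{allocatable}} and {{revive}} are hypothetical names, not the Mesos allocator API. The "at least one shape" check follows the documented meaning of {{min_allocatable_resources}}: resources are allocatable if they satisfy at least one of the specified quantity sets.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model (not the Mesos allocator API): a set of
// resource quantities, e.g. {"cpus": 2, "mem": 1024}.
using Quantities = std::map<std::string, double>;

// True if `available` has at least the amount of every resource in `shape`.
bool satisfies(const Quantities& available, const Quantities& shape) {
  for (const auto& [name, amount] : shape) {
    auto it = available.find(name);
    if (it == available.end() || it->second < amount) {
      return false;
    }
  }
  return true;
}

// Hypothetical per-framework filter state kept by the allocator.
struct FrameworkFilters {
  std::set<std::string> declinedAgents;    // filters accumulated by declining offers
  std::vector<Quantities> minAllocatable;  // min_allocatable_resources shapes

  // Resources on `agentId` are offered only if the agent is not filtered and
  // `available` satisfies at least one shape (no shapes: no restriction here).
  bool allocatable(const std::string& agentId, const Quantities& available) const {
    if (declinedAgents.count(agentId) > 0) {
      return false;
    }
    if (minAllocatable.empty()) {
      return true;
    }
    for (const Quantities& shape : minAllocatable) {
      if (satisfies(available, shape)) {
        return true;
      }
    }
    return false;
  }

  // Proposed revive semantics: clear every existing filter and replace the
  // min_allocatable_resources shapes with the newly specified ones.
  void revive(std::vector<Quantities> shapes) {
    declinedAgents.clear();
    minAllocatable = std::move(shapes);
  }
};
```

Because a revive replaces the shapes wholesale, a framework that revives with a large task shape will see no offers at all until some agent can fit that shape, which is exactly the situation the comment above asks to make visible.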



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-9269) Mesos UCR with Docker only Works on Host

2018-09-27 Thread Tim Harper (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631310#comment-16631310
 ] 

Tim Harper commented on MESOS-9269:
---

https://jira.mesosphere.com/browse/MARATHON-8448 has some relevant details

> Mesos UCR with Docker only Works on Host
> 
>
> Key: MESOS-9269
> URL: https://issues.apache.org/jira/browse/MESOS-9269
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, docker
>Affects Versions: 1.7.0
> Environment: Ubuntu 16.04
> Mesos 1.7.0
> Marathon 1.7.111
>Reporter: z s
>Priority: Major
>
> I'm having an issue setting up the `mesos-cni-port-mapper` to allow remote 
> connectivity.
> When I `curl :` from the machine itself, I get a response, but from a 
> remote machine the `curl` connection times out. I'm not sure what's wrong 
> with my route settings.
>  
> */var/lib/mesos/cni/config/mesos-bridge.json*
>  
> {code:java}
> {
>   "name": "mesos-bridge",
>   "type": "mesos-cni-port-mapper",
>   "excludeDevices": ["mesos-cni0"],
>   "chain": "MESOS-BRIDGE-PORT-MAPPER",
>   "delegate": {
>     "type": "bridge",
>     "bridge": "mesos-cni0",
>     "isGateway": true,
>     "ipMasq": true,
>     "ipam": {
>       "type": "host-local",
>       "subnet": "10.1.0.0/16",
>       "routes": [
>         { "dst": "0.0.0.0/0" }
>       ]
>     }
>   }
> }
> {code}
>  
> {code:java}
> $ route -n
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> 0.0.0.0         172.27.1.1      0.0.0.0         UG    0      0        0 ens3
> 10.1.0.0        0.0.0.0         255.255.0.0     U     0      0        0 mesos-cni0
> 172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
> 172.27.1.0      0.0.0.0         255.255.255.0   U     0      0        0 ens3
> {code}
> Any suggestions?
>  
>  





[jira] [Commented] (MESOS-9095) Consider including public protobuf definitions in generated jar

2018-07-19 Thread Tim Harper (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549695#comment-16549695
 ] 

Tim Harper commented on MESOS-9095:
---

Thank you for filing this, Benjamin. This will be really helpful.

Currently, Marathon does exactly what you describe: we copy the proto sources 
into our own code base and check in the generated code.

> Consider including public protobuf definitions in generated jar
> ---
>
> Key: MESOS-9095
> URL: https://issues.apache.org/jira/browse/MESOS-9095
> Project: Mesos
>  Issue Type: Improvement
>  Components: java api
>Reporter: Benjamin Bannier
>Priority: Major
>
> We currently do not package public proto sources alongside other resources in 
> the jar. This is inconsistent with what we do on the C++ side, e.g., for 
> packages or {{install rules}}.
> Frameworks seem to work around this by forking the required proto sources 
> into their own source code or (slightly less bad) fetching them from 
> potentially poorly versioned internet resources. Both approaches can lead to 
> complicated dependencies between the jar in use and the proto sources.
> We should include them in the jar we publish, e.g., by declaring them as 
> {{resources}}.





[jira] [Created] (MESOS-8629) GLIBCXX_3.4.21 required for Mesos Debian Jessie package, not available

2018-03-01 Thread Tim Harper (JIRA)
Tim Harper created MESOS-8629:
-

 Summary: GLIBCXX_3.4.21 required for Mesos Debian Jessie package, 
not available
 Key: MESOS-8629
 URL: https://issues.apache.org/jira/browse/MESOS-8629
 Project: Mesos
  Issue Type: Choose from below ...
Reporter: Tim Harper


h1. Overview

When I install the Mesos package for Debian Jessie using the following 
Dockerfile:

{code}
FROM debian:jessie

RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E56151BF && \
    echo "deb http://ftp.debian.org/debian jessie-backports main" >> /etc/apt/sources.list && \
    echo "deb http://repos.mesosphere.com/debian jessie-testing main" | tee -a /etc/apt/sources.list.d/mesosphere.list && \
    echo "deb http://repos.mesosphere.com/debian jessie main" | tee -a /etc/apt/sources.list.d/mesosphere.list && \
    apt-get update && \
    # this MUST be done first, unfortunately, because Mesos packages will create
    # folders that should be symlinks and break the python install process
    apt-get install python2.7-minimal -y && \
    apt-get install -y openjdk-8-jdk-headless openjdk-8-jre-headless ca-certificates-java=20161107~bpo8+1 && \
    apt-get install --no-install-recommends -y --force-yes mesos=1.5.0-2.0.1 && \
    # disable mesos-master and mesos-slave; we don't want them to start in this image
    systemctl disable mesos-master && \
    systemctl disable mesos-slave && \
    # jdk setup
    /var/lib/dpkg/info/ca-certificates-java.postinst configure && \
    ln -svT "/usr/lib/jvm/java-8-openjdk-$(dpkg --print-architecture)" /docker-java-home && \
    # jq / curl
    apt-get install -y procps curl jq=1.5* && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

ENV JAVA_HOME /docker-java-home

ENTRYPOINT ["/sbin/init"]
{code}

Mesos installs without errors. However, when I run the container, {{mesos-master}} fails to launch:

{code}
docker run --name mesos-agent-local-a --rm --privileged --label marathon-package-test \
  --cap-add SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
  --entrypoint /bin/bash -it marathon-package-test:mesos

root@c17342b33218:/# /usr/sbin/mesos-master
/usr/sbin/mesos-master: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /usr/sbin/mesos-master)
/usr/sbin/mesos-master: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /usr/local/lib/libmesos-1.5.0.so)
{code}

Debian Jessie does not ship a libstdc++6 new enough to provide 
GLIBCXX_3.4.21. Even after updating to the latest available library version, 
the newest version node is GLIBCXX_3.4.20:

{code}
root@c17342b33218:/# strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX_3.4
GLIBCXX_3.4
GLIBCXX_3.4.1
GLIBCXX_3.4.2
GLIBCXX_3.4.3
GLIBCXX_3.4.4
GLIBCXX_3.4.5
GLIBCXX_3.4.6
GLIBCXX_3.4.7
GLIBCXX_3.4.8
GLIBCXX_3.4.9
GLIBCXX_3.4.10
GLIBCXX_3.4.11
GLIBCXX_3.4.12
GLIBCXX_3.4.13
GLIBCXX_3.4.14
GLIBCXX_3.4.15
GLIBCXX_3.4.16
GLIBCXX_3.4.17
GLIBCXX_3.4.18
GLIBCXX_3.4.19
GLIBCXX_3.4.20
{code}
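The failing check can be sketched in C++ as a minimal illustration, assuming the input is {{strings ... | grep GLIBCXX_3.4}} output like the listing above. {{parseGlibcxx}} and {{providesGlibcxx}} are hypothetical helpers, not part of any Mesos or Debian tooling. libstdc++ keeps every older GLIBCXX_3.4.x version node when it adds a new one, so a node at or above the required version implies the required one is present; the Jessie list tops out at {{GLIBCXX_3.4.20}}, so a requirement of {{GLIBCXX_3.4.21}} cannot be met.

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Parses a symbol version such as "GLIBCXX_3.4.21" into {3, 4, 21}.
// Returns std::nullopt for anything that is not GLIBCXX_<digits>(.<digits>)*.
std::optional<std::vector<int>> parseGlibcxx(const std::string& token) {
  const std::string prefix = "GLIBCXX_";
  if (token.rfind(prefix, 0) != 0) {
    return std::nullopt;
  }
  std::vector<int> parts;
  std::istringstream in(token.substr(prefix.size()));
  std::string part;
  while (std::getline(in, part, '.')) {
    if (part.empty() || part.find_first_not_of("0123456789") != std::string::npos) {
      return std::nullopt;
    }
    parts.push_back(std::stoi(part));
  }
  if (parts.empty()) {
    return std::nullopt;
  }
  return parts;
}

// True if `stringsOutput` (one version per line) lists `required` or a newer
// node. Older nodes are kept when new ones are added, so newer implies present.
bool providesGlibcxx(const std::string& stringsOutput, const std::string& required) {
  const auto want = parseGlibcxx(required);
  if (!want) {
    return false;
  }
  std::istringstream lines(stringsOutput);
  std::string line;
  while (std::getline(lines, line)) {
    const auto have = parseGlibcxx(line);
    // std::vector compares lexicographically, so {3, 4, 20} < {3, 4, 21}.
    if (have && *have >= *want) {
      return true;
    }
  }
  return false;
}
```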

h2. Potential solutions

If the Debian Mesos build is going to require {{GLIBCXX_3.4.21}}, then it seems 
that the package for Debian Jessie should be withdrawn. However, this does not 
seem reasonable, as Debian Jessie is still supported and its long-term support 
continues through 2020.

Otherwise, the package should be rebuilt so that it does not require a GLIBCXX 
version newer than the one Jessie provides ({{GLIBCXX_3.4.20}}).







[jira] [Commented] (MESOS-8150) Attributes documentation indicates that sets are valid attribute types; code disagrees

2017-12-19 Thread Tim Harper (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297269#comment-16297269
 ] 

Tim Harper commented on MESOS-8150:
---

After re-reading the documentation, it is clear that set attributes are not 
supported. Closing.

> Attributes documentation indicates that sets are valid attribute types; code 
> disagrees
> --
>
> Key: MESOS-8150
> URL: https://issues.apache.org/jira/browse/MESOS-8150
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Tim Harper
>Priority: Minor
>
> On the [Mesos Attributes & 
> Resources|http://mesos.apache.org/documentation/latest/attributes-resources/] 
> page, it says:
> {quote}The types of values that are supported by Attributes and Resources in 
> Mesos are scalar, ranges, sets and text.{quote}
> However, the code for 1.4.x disagrees. Sets are not supported for attribute 
> types:
> https://github.com/apache/mesos/blob/1.4.0/src/common/attributes.cpp#L171
> https://github.com/apache/mesos/blob/1.4.0/src/common/attributes.cpp#L115-L128
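A simplified C++ model of the behavior described above, assuming the usual Mesos value syntax: values in square brackets are ranges, values in curly braces are sets, numbers are scalars, and anything else is text. {{ValueType}}, {{classify}} and {{attributeType}} are hypothetical and do not reproduce Mesos' actual parser; the point is only that a set value, although it parses as a value, is not an accepted attribute type.

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical, simplified classification of a Mesos value string; this is
// not Mesos' actual parser.
enum class ValueType { SCALAR, RANGES, SET, TEXT };

ValueType classify(const std::string& text) {
  if (text.size() >= 2 && text.front() == '[' && text.back() == ']') {
    return ValueType::RANGES;  // e.g. "[31000-32000]"
  }
  if (text.size() >= 2 && text.front() == '{' && text.back() == '}') {
    return ValueType::SET;     // e.g. "{a,b}"
  }
  if (!text.empty()) {
    // A scalar is a string that parses entirely as a floating-point number.
    char* end = nullptr;
    std::strtod(text.c_str(), &end);
    if (end == text.c_str() + text.size()) {
      return ValueType::SCALAR;
    }
  }
  return ValueType::TEXT;
}

// Attributes accept scalars, ranges and text; a set parses as a value but is
// rejected as an attribute type, which is the 1.4.x behavior the issue describes.
std::optional<ValueType> attributeType(const std::string& text) {
  const ValueType type = classify(text);
  if (type == ValueType::SET) {
    return std::nullopt;
  }
  return type;
}
```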





[jira] [Commented] (MESOS-8150) Attributes documentation indicates that sets are valid attribute types; code disagrees

2017-12-19 Thread Tim Harper (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297265#comment-16297265
 ] 

Tim Harper commented on MESOS-8150:
---

Related: https://issues.apache.org/jira/browse/MESOS-8150







[jira] [Updated] (MESOS-8171) Using a failoverTimeout of 0 with Mesos native scheduler client can result in infinite subscribe loop

2017-11-04 Thread Tim Harper (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Harper updated MESOS-8171:
--
Description: 
Over the past year, the Marathon team has been plagued by an issue that 
periodically hits our CI builds: the scheduler driver enters a tight loop, 
sending tens of thousands of SUBSCRIBE calls to the master per second. I turned 
on debug logging for the client and the server, and it pointed to an issue with 
the {{doReliableRegistration}} method in sched.cpp. Here are the logs:

{code}
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.099815 
13397 process.cpp:1383] libprocess is initialized on 127.0.1.1:60957 with 8 
worker threads
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.118237 
13397 logging.cpp:199] Logging to STDERR
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.128921 
13416 sched.cpp:232] Version: 1.4.0
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.151785 
13791 group.cpp:341] Group process (zookeeper-group(1)@127.0.1.1:60957) 
connected to ZooKeeper
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.151823 
13791 group.cpp:831] Syncing group operations: queue size (joins, cancels, 
datas) = (0, 0, 0)
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.151837 
13791 group.cpp:419] Trying to create path '/mesos' in ZooKeeper
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.152586 
13791 group.cpp:758] Found non-sequence node 'log_replicas' at '/mesos' in 
ZooKeeper
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.152662 
13791 detector.cpp:152] Detected a new leader: (id='0')
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.152762 
13791 group.cpp:700] Trying to get '/mesos/json.info_00' in ZooKeeper
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157148 
13791 zookeeper.cpp:262] A new leading master (UPID=master@172.16.10.95:32856) 
is detected
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157347 
13787 sched.cpp:336] New master detected at master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157557 
13787 sched.cpp:352] No credentials provided. Attempting to register without 
authentication
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157565 
13787 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157635 
13787 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.158979 
13785 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159029 
13785 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159265 
13790 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159303 
13790 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159479 
13786 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159521 
13786 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159622 
13788 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159658 
13788 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159749 
13789 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159785 
13789 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159878 
13792 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159916 
13792 sched.cpp:869] Will retry registration in 0ns if necessary
{code}

In Marathon, when we are running our tests, we set the failoverTimeout to 0 in 
order to cause the Mesos master to immediately forget about a framework when it 
disconnects.

On line 860 of sched.cpp, the maximum retry delay is capped at 1/10th of the 
failoverTimeout, which best explains why the logged delay is 0ns:

{code}
./mesos/src/sched/sched.cpp

 818 |   void doReliableRegistration(Duration maxBackoff)
 819 |   {

[jira] [Updated] (MESOS-8171) Using a failoverTimeout of 0 with Mesos native scheduler client can result in infinite subscribe loop

2017-11-04 Thread Tim Harper (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Harper updated MESOS-8171:
--
Affects Version/s: 1.1.3
   1.2.2
   1.3.1

> Using a failoverTimeout of 0 with Mesos native scheduler client can result in 
> infinite subscribe loop
> -
>
> Key: MESOS-8171
> URL: https://issues.apache.org/jira/browse/MESOS-8171
> Project: Mesos
>  Issue Type: Bug
>  Components: c++ api, java api, scheduler driver
>Affects Versions: 1.1.3, 1.2.2, 1.3.1, 1.4.0
>Reporter: Tim Harper
>Priority: Minor
>
> Over the past year, the Marathon team has been plagued by an issue that 
> periodically hits our CI builds: the scheduler driver enters a tight loop, 
> sending tens of thousands of SUBSCRIBE calls to the master per second. I 
> turned on debug logging for the client and the server, and it pointed to an 
> issue with the {{doReliableRegistration}} method in sched.cpp. Here are the logs:
> {code}
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.099815 13397 process.cpp:1383] libprocess is initialized on 
> 127.0.1.1:60957 with 8 worker threads
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.118237 13397 logging.cpp:199] Logging to STDERR
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.128921 13416 sched.cpp:232] Version: 1.4.0
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.151785 13791 group.cpp:341] Group process 
> (zookeeper-group(1)@127.0.1.1:60957) connected to ZooKeeper
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.151823 13791 group.cpp:831] Syncing group operations: queue size 
> (joins, cancels, datas) = (0, 0, 0)
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.151837 13791 group.cpp:419] Trying to create path '/mesos' in 
> ZooKeeper
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.152586 13791 group.cpp:758] Found non-sequence node 'log_replicas' 
> at '/mesos' in ZooKeeper
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.152662 13791 detector.cpp:152] Detected a new leader: (id='0')
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.152762 13791 group.cpp:700] Trying to get 
> '/mesos/json.info_00' in ZooKeeper
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.157148 13791 zookeeper.cpp:262] A new leading master 
> (UPID=master@172.16.10.95:32856) is detected
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.157347 13787 sched.cpp:336] New master detected at 
> master@172.16.10.95:32856
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.157557 13787 sched.cpp:352] No credentials provided. Attempting to 
> register without authentication
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.157565 13787 sched.cpp:836] Sending SUBSCRIBE call to 
> master@172.16.10.95:32856
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.157635 13787 sched.cpp:869] Will retry registration in 0ns if 
> necessary
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.158979 13785 sched.cpp:836] Sending SUBSCRIBE call to 
> master@172.16.10.95:32856
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159029 13785 sched.cpp:869] Will retry registration in 0ns if 
> necessary
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159265 13790 sched.cpp:836] Sending SUBSCRIBE call to 
> master@172.16.10.95:32856
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159303 13790 sched.cpp:869] Will retry registration in 0ns if 
> necessary
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159479 13786 sched.cpp:836] Sending SUBSCRIBE call to 
> master@172.16.10.95:32856
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159521 13786 sched.cpp:869] Will retry registration in 0ns if 
> necessary
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159622 13788 sched.cpp:836] Sending SUBSCRIBE call to 
> master@172.16.10.95:32856
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159658 13788 sched.cpp:869] Will retry registration in 0ns if 
> necessary
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159749 13789 sched.cpp:836] Sending SUBSCRIBE call to 
> master@172.16.10.95:32856
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159785 13789 

[jira] [Updated] (MESOS-8171) Using a failoverTimeout of 0 with Mesos native scheduler client can result in infinite subscribe loop

2017-11-04 Thread Tim Harper (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Harper updated MESOS-8171:
--
Description: 
Over the past year, the Marathon team has been plagued by an issue that 
periodically hits our CI builds: the scheduler driver enters a tight loop, 
sending tens of thousands of SUBSCRIBE calls to the master per second. I turned 
on debug logging for the client and the server, and it pointed to an issue with 
the {{doReliableRegistration}} method in sched.cpp. Here are the logs:

{code}
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.099815 
13397 process.cpp:1383] libprocess is initialized on 127.0.1.1:60957 with 8 
worker threads
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.118237 
13397 logging.cpp:199] Logging to STDERR
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.128921 
13416 sched.cpp:232] Version: 1.4.0
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.151785 
13791 group.cpp:341] Group process (zookeeper-group(1)@127.0.1.1:60957) 
connected to ZooKeeper
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.151823 
13791 group.cpp:831] Syncing group operations: queue size (joins, cancels, 
datas) = (0, 0, 0)
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.151837 
13791 group.cpp:419] Trying to create path '/mesos' in ZooKeeper
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.152586 
13791 group.cpp:758] Found non-sequence node 'log_replicas' at '/mesos' in 
ZooKeeper
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.152662 
13791 detector.cpp:152] Detected a new leader: (id='0')
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.152762 
13791 group.cpp:700] Trying to get '/mesos/json.info_00' in ZooKeeper
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157148 
13791 zookeeper.cpp:262] A new leading master (UPID=master@172.16.10.95:32856) 
is detected
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157347 
13787 sched.cpp:336] New master detected at master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157557 
13787 sched.cpp:352] No credentials provided. Attempting to register without 
authentication
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157565 
13787 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157635 
13787 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.158979 
13785 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159029 
13785 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159265 
13790 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159303 
13790 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159479 
13786 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159521 
13786 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159622 
13788 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159658 
13788 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159749 
13789 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159785 
13789 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159878 
13792 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159916 
13792 sched.cpp:869] Will retry registration in 0ns if necessary
{code}

In Marathon, when we are running our tests, we set the failoverTimeout to 0 in 
order to cause the Mesos master to immediately forget about a framework when it 
disconnects.

On line 860 of sched.cpp, the maximum retry delay is capped at 1/10th of the 
failoverTimeout, which best explains why the logged delay is 0ns:

{code}
/Users/tim/src/m8e/mesos/src/sched/sched.cpp

 818 |   void doReliableRegistration(Duration 

[jira] [Commented] (MESOS-8171) Using a failoverTimeout of 0 with Mesos native scheduler client can result in infinite subscribe loop

2017-11-04 Thread Tim Harper (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238837#comment-16238837
 ] 

Tim Harper commented on MESOS-8171:
---

Perhaps the ideal solution would be to ignore a failoverTimeout of 0?

{code}
// Only bound the backoff by the failover timeout when the timeout is positive.
if (duration.isSome() && duration.get() > Duration::zero()) {
  ...
}
{code}
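A minimal C++ sketch of why a failoverTimeout of 0 produces the 0ns retries in the logs, and what the guard above changes. It uses {{std::chrono}} instead of stout's {{Duration}}, and assumes the structure of {{doReliableRegistration()}} in the 1.4.0 sched.cpp: the backoff is capped by a maximum retry interval and by 1/10th of the failover timeout, the delay is a random fraction of that cap, and the next attempt starts from twice the capped value. {{boundBackoff}} and {{simulateDelays}} are hypothetical names, and the random draw is replaced by a fixed {{fraction}} so the result is deterministic.

```cpp
#include <algorithm>
#include <chrono>
#include <optional>
#include <vector>

using Millis = std::chrono::milliseconds;

// Hypothetical stand-in for the bound in doReliableRegistration(): cap the
// backoff at 1/10th of the failover timeout. With `guardZero` set, a timeout
// of 0 is ignored, as proposed in the comment above.
Millis boundBackoff(Millis maxBackoff, std::optional<Millis> failoverTimeout,
                    bool guardZero) {
  if (failoverTimeout.has_value() &&
      (!guardZero || *failoverTimeout > Millis::zero())) {
    maxBackoff = std::min<Millis>(maxBackoff, *failoverTimeout / 10);
  }
  return maxBackoff;
}

// Returns the delays of successive registration attempts. Each delay is a
// fraction of the bounded backoff (`fraction` replaces the random draw), and
// the next attempt starts from twice the bounded backoff.
std::vector<Millis> simulateDelays(Millis initialBackoff, Millis maxInterval,
                                   std::optional<Millis> failoverTimeout,
                                   bool guardZero, int attempts, double fraction) {
  std::vector<Millis> delays;
  Millis maxBackoff = initialBackoff;
  for (int i = 0; i < attempts; ++i) {
    const Millis bounded =
        boundBackoff(std::min(maxBackoff, maxInterval), failoverTimeout, guardZero);
    delays.push_back(std::chrono::duration_cast<Millis>(bounded * fraction));
    // With a failover timeout of 0 and no guard, `bounded` is 0, so this stays 0
    // and every retry is scheduled immediately: the tight SUBSCRIBE loop.
    maxBackoff = bounded * 2;
  }
  return delays;
}
```

Doubling a zero backoff never escapes zero, so without the guard the loop can only end when the master's response arrives; with the guard, a failoverTimeout of 0 simply falls back to the normal exponential backoff.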

> Using a failoverTimeout of 0 with Mesos native scheduler client can result in 
> infinite subscribe loop
> -
>
> Key: MESOS-8171
> URL: https://issues.apache.org/jira/browse/MESOS-8171
> Project: Mesos
>  Issue Type: Bug
>  Components: c++ api, java api, scheduler driver
>Affects Versions: 1.4.0
>Reporter: Tim Harper
>Priority: Minor
>
> Over the past year, the Marathon team has been plagued with an issue that 
> hits our CI builds periodically in which the scheduler driver enters a tight 
> loop, sending 10,000s of SUBSCRIBE calls to the master per second. I turned 
> on debug logging for the client and the server, and it pointed to an issue 
> with the {{doReliableRegistration}} method in sched.cpp. Here are the logs:
> {code}
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.099815 13397 process.cpp:1383] libprocess is initialized on 
> 127.0.1.1:60957 with 8 worker threads
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.118237 13397 logging.cpp:199] Logging to STDERR
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.128921 13416 sched.cpp:232] Version: 1.4.0
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.151785 13791 group.cpp:341] Group process 
> (zookeeper-group(1)@127.0.1.1:60957) connected to ZooKeeper
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.151823 13791 group.cpp:831] Syncing group operations: queue size 
> (joins, cancels, datas) = (0, 0, 0)
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.151837 13791 group.cpp:419] Trying to create path '/mesos' in 
> ZooKeeper
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.152586 13791 group.cpp:758] Found non-sequence node 'log_replicas' 
> at '/mesos' in ZooKeeper
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.152662 13791 detector.cpp:152] Detected a new leader: (id='0')
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.152762 13791 group.cpp:700] Trying to get 
> '/mesos/json.info_00' in ZooKeeper
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.157148 13791 zookeeper.cpp:262] A new leading master 
> (UPID=master@172.16.10.95:32856) is detected
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.157347 13787 sched.cpp:336] New master detected at 
> master@172.16.10.95:32856
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.157557 13787 sched.cpp:352] No credentials provided. Attempting to 
> register without authentication
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.157565 13787 sched.cpp:836] Sending SUBSCRIBE call to 
> master@172.16.10.95:32856
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.157635 13787 sched.cpp:869] Will retry registration in 0ns if 
> necessary
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.158979 13785 sched.cpp:836] Sending SUBSCRIBE call to 
> master@172.16.10.95:32856
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159029 13785 sched.cpp:869] Will retry registration in 0ns if 
> necessary
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159265 13790 sched.cpp:836] Sending SUBSCRIBE call to 
> master@172.16.10.95:32856
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159303 13790 sched.cpp:869] Will retry registration in 0ns if 
> necessary
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159479 13786 sched.cpp:836] Sending SUBSCRIBE call to 
> master@172.16.10.95:32856
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159521 13786 sched.cpp:869] Will retry registration in 0ns if 
> necessary
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159622 13788 sched.cpp:836] Sending SUBSCRIBE call to 
> master@172.16.10.95:32856
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159658 13788 sched.cpp:869] Will retry registration in 0ns if 
> necessary
> WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
> 05:39:39.159749 13789 sched.cpp:836] Sending SUBSCRIBE call to 
> master@172.16.10.95:32856
> WARN 

[jira] [Comment Edited] (MESOS-8171) Using a failoverTimeout of 0 with Mesos native scheduler client can result in infinite subscribe loop

2017-11-04 Thread Tim Harper (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238837#comment-16238837
 ] 

Tim Harper edited comment on MESOS-8171 at 11/4/17 7:50 AM:


It seems like perhaps an ideal solution would be to ignore 0?

{code}
if ( duration.isSome() && (duration.get() > Duration::zero()) ) {
  ...
}
{code}


was (Author: timcharper):
It seems like perhaps an ideal solution would be to ignore 0?

{code}
if ( (duration.isSome()) && (duration.get() > Duration::zero() ) {
  ...
}
{code}


[jira] [Updated] (MESOS-8171) Using a failoverTimeout of 0 with Mesos native scheduler client can result in infinite subscribe loop

2017-11-04 Thread Tim Harper (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Harper updated MESOS-8171:
--
Description: 
Over the past year, the Marathon team has been plagued with an issue that hits 
our CI builds periodically in which the scheduler driver enters a tight loop, 
sending 10,000s of SUBSCRIBE calls to the master per second. I turned on debug 
logging for the client and the server, and it pointed to an issue with the 
{{doReliableRegistration}} method in sched.cpp. Here are the logs:

{code}
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.099815 
13397 process.cpp:1383] libprocess is initialized on 127.0.1.1:60957 with 8 
worker threads
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.118237 
13397 logging.cpp:199] Logging to STDERR
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.128921 
13416 sched.cpp:232] Version: 1.4.0
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.151785 
13791 group.cpp:341] Group process (zookeeper-group(1)@127.0.1.1:60957) 
connected to ZooKeeper
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.151823 
13791 group.cpp:831] Syncing group operations: queue size (joins, cancels, 
datas) = (0, 0, 0)
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.151837 
13791 group.cpp:419] Trying to create path '/mesos' in ZooKeeper
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.152586 
13791 group.cpp:758] Found non-sequence node 'log_replicas' at '/mesos' in 
ZooKeeper
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.152662 
13791 detector.cpp:152] Detected a new leader: (id='0')
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.152762 
13791 group.cpp:700] Trying to get '/mesos/json.info_00' in ZooKeeper
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157148 
13791 zookeeper.cpp:262] A new leading master (UPID=master@172.16.10.95:32856) 
is detected
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157347 
13787 sched.cpp:336] New master detected at master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157557 
13787 sched.cpp:352] No credentials provided. Attempting to register without 
authentication
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157565 
13787 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157635 
13787 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.158979 
13785 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159029 
13785 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159265 
13790 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159303 
13790 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159479 
13786 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159521 
13786 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159622 
13788 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159658 
13788 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159749 
13789 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159785 
13789 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159878 
13792 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159916 
13792 sched.cpp:869] Will retry registration in 0ns if necessary
{code}

In Marathon, when we are running our tests, we set the failoverTimeout to 0 in 
order to cause the Mesos master to immediately forget about a framework when it 
disconnects.

On line 860 of sched.cpp, the retry-delay is set to 1/10th the failoverTimeout, 
which provides the best explanation for why the value is 0:

{code}
/Users/tim/src/m8e/mesos/src/sched/sched.cpp

 818 |   void doReliableRegistration(Duration 

[jira] [Created] (MESOS-8171) Using a failoverTimeout of 0 with Mesos native scheduler client can result in infinite subscribe loop

2017-11-04 Thread Tim Harper (JIRA)
Tim Harper created MESOS-8171:
-

 Summary: Using a failoverTimeout of 0 with Mesos native scheduler 
client can result in infinite subscribe loop
 Key: MESOS-8171
 URL: https://issues.apache.org/jira/browse/MESOS-8171
 Project: Mesos
  Issue Type: Bug
  Components: c++ api, java api, scheduler driver
Affects Versions: 1.4.0
Reporter: Tim Harper
Priority: Minor


Over the past year, the Marathon team has been plagued with an issue that hits 
our CI builds periodically in which the scheduler driver enters a tight loop, 
sending 10,000s of SUBSCRIBE calls to the master per second. I turned on debug 
logging for the client and the server, and it pointed to an issue with the 
{{doReliableRegistration}} method in sched.cpp. Here are the logs:

{code}
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.099815 
13397 process.cpp:1383] libprocess is initialized on 127.0.1.1:60957 with 8 
worker threads
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.118237 
13397 logging.cpp:199] Logging to STDERR
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.128921 
13416 sched.cpp:232] Version: 1.4.0
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.151785 
13791 group.cpp:341] Group process (zookeeper-group(1)@127.0.1.1:60957) 
connected to ZooKeeper
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.151823 
13791 group.cpp:831] Syncing group operations: queue size (joins, cancels, 
datas) = (0, 0, 0)
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.151837 
13791 group.cpp:419] Trying to create path '/mesos' in ZooKeeper
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.152586 
13791 group.cpp:758] Found non-sequence node 'log_replicas' at '/mesos' in 
ZooKeeper
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.152662 
13791 detector.cpp:152] Detected a new leader: (id='0')
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.152762 
13791 group.cpp:700] Trying to get '/mesos/json.info_00' in ZooKeeper
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157148 
13791 zookeeper.cpp:262] A new leading master (UPID=master@172.16.10.95:32856) 
is detected
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157347 
13787 sched.cpp:336] New master detected at master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157557 
13787 sched.cpp:352] No credentials provided. Attempting to register without 
authentication
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157565 
13787 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157635 
13787 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.158979 
13785 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159029 
13785 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159265 
13790 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159303 
13790 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159479 
13786 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159521 
13786 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159622 
13788 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159658 
13788 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159749 
13789 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159785 
13789 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159878 
13792 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159916 
13792 sched.cpp:869] Will retry registration in 0ns if necessary
{code}

In Marathon, when we are running our tests, we set the failoverTimeout to 0 in 
order to cause the Mesos master to immediately 

[jira] [Created] (MESOS-8150) Attributes documentation indicates that sets are valid attribute types; code disagrees

2017-10-30 Thread Tim Harper (JIRA)
Tim Harper created MESOS-8150:
-

 Summary: Attributes documentation indicates that sets are valid 
attribute types; code disagrees
 Key: MESOS-8150
 URL: https://issues.apache.org/jira/browse/MESOS-8150
 Project: Mesos
  Issue Type: Documentation
Reporter: Tim Harper
Priority: Minor


On the [Mesos Attributes & 
Resources|http://mesos.apache.org/documentation/latest/attributes-resources/] 
page, it says:

{quote}The types of values that are supported by Attributes and Resources in 
Mesos are scalar, ranges, sets and text.{quote}

However, the code for 1.4.x disagrees. Sets are not supported for attribute 
types:

https://github.com/apache/mesos/blob/1.4.0/src/common/attributes.cpp#L171

https://github.com/apache/mesos/blob/1.4.0/src/common/attributes.cpp#L115-L128
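For illustration only, here is a hypothetical sketch of the behaviour described above: a value that looks like a set is rejected when used as an attribute, while scalar, ranges and text values are accepted. The classifier is deliberately loose and is not the logic in src/common/attributes.cpp; the names ValueKind, classify and parseAttributeKind are invented for this example.

{code:cpp}
#include <cctype>
#include <optional>
#include <string>

// Hypothetical value kinds, loosely following the documented grammar.
enum class ValueKind { Scalar, Ranges, Set, Text };

// Deliberately loose classification by leading character and digits.
ValueKind classify(const std::string& raw) {
  if (!raw.empty() && raw.front() == '[') return ValueKind::Ranges;
  if (!raw.empty() && raw.front() == '{') return ValueKind::Set;
  bool numeric = !raw.empty();
  for (char c : raw) {
    if (!std::isdigit(static_cast<unsigned char>(c)) && c != '.') {
      numeric = false;
    }
  }
  return numeric ? ValueKind::Scalar : ValueKind::Text;
}

// Attributes accept scalar, ranges and text but reject sets, which is the
// behaviour reported for 1.4.x (resources, by contrast, do accept sets).
std::optional<ValueKind> parseAttributeKind(const std::string& raw) {
  ValueKind kind = classify(raw);
  if (kind == ValueKind::Set) return std::nullopt;
  return kind;
}
{code}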




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8148) Enforce text attribute value specification for zone and region values

2017-10-30 Thread Tim Harper (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Harper updated MESOS-8148:
--
Description: 
Mesos has a specification for characters allowed by attribute values:

http://mesos.apache.org/documentation/latest/attributes-resources/

The specification is as follows:

{code}
scalar : floatValue

floatValue : ( intValue ( "." intValue )? ) | ...

intValue : [0-9]+

range : "[" rangeValue ( "," rangeValue )* "]"

rangeValue : scalar "-" scalar

set : "{" text ( "," text )* "}"

text : [a-zA-Z0-9_/.-]
{code}
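As an illustration of how a validation layer might enforce this grammar, here is a minimal regex-based sketch (hypothetical; not Mesos or Marathon code). It assumes {{text}} means one or more of the listed characters, and it leaves out the elided {{| ...}} alternatives of {{floatValue}}.

{code:cpp}
#include <regex>
#include <string>

// Regex fragments for the productions above (ECMAScript syntax).
const std::string kInt = "[0-9]+";
const std::string kScalar = kInt + "(\\." + kInt + ")?";
const std::string kRangeValue = kScalar + "-" + kScalar;
const std::string kText = "[a-zA-Z0-9_/.-]+";  // assumed: one or more chars

// Whole-string match against a pattern.
bool matches(const std::string& value, const std::string& pattern) {
  return std::regex_match(value, std::regex(pattern));
}

bool isScalar(const std::string& v) { return matches(v, kScalar); }

// range : "[" rangeValue ( "," rangeValue )* "]"
bool isRange(const std::string& v) {
  return matches(v, "\\[" + kRangeValue + "(," + kRangeValue + ")*\\]");
}

// set : "{" text ( "," text )* "}"
bool isSet(const std::string& v) {
  return matches(v, "\\{" + kText + "(," + kText + ")*\\}");
}

bool isText(const std::string& v) { return matches(v, kText); }
{code}

Note that a value such as {{zone,a}} is not valid {{text}}, which is what lets the comma act unambiguously as the set separator.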

Marathon is [implementing IN and IS 
constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
 and includes plans to support further attribute types as it makes sense to do 
so (IE {{a,b IS b,a}}, {{5 IN [0-10]}}). In order to do 
this, Marathon has adopted the Mesos attribute value specification and will 
enforce it in the validation layer. As an example, it will be possible to write 
things like:

{code:java}
"constraints": [
  ["attribute", "IN", "{value-a,value-b,value-c}"]
]
{code}

Additionally, Marathon allows one to specify constraints on non-attribute 
properties, such as region, hostname, or zone. If somebody specified a zone 
value containing a comma, the user would not be able to use the Mesos set value 
type specification to describe a set of zones in which an app should be 
deployed. As a consequence, Marathon would take on additional complexity (IE: 
it would need to implement an escaping mechanism for this case).

Ideally, the character space would be constrained from the start. If the text 
type specification is sufficient, it seems simpler to reuse it than to create 
another one.
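The comma concern can be sketched as follows, assuming a hypothetical IN check that treats the constraint value as a Mesos-style set literal and splits it on commas with no escaping. The names setElements and satisfiesIn are invented for illustration and are not Marathon's implementation.

{code:cpp}
#include <sstream>
#include <string>
#include <vector>

// Splits a set literal such as "{a,b,c}" into its elements. Assumes the
// literal is already validated: it starts with '{', ends with '}', and uses
// commas only as separators (no escaping).
std::vector<std::string> setElements(const std::string& literal) {
  std::vector<std::string> out;
  std::stringstream body(literal.substr(1, literal.size() - 2));
  std::string item;
  while (std::getline(body, item, ',')) out.push_back(item);
  return out;
}

// IN check: true when the offered value equals one of the set's elements.
bool satisfiesIn(const std::string& offered, const std::string& literal) {
  for (const auto& element : setElements(literal)) {
    if (element == offered) return true;
  }
  return false;
}
{code}

A zone literally named {{us,east}} could never satisfy such a constraint, because the split turns it into two elements. Keeping zone and region values within the {{text}} character space avoids having to design an escaping scheme.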

  was:
Mesos has a specification for characters allowed by attribute values:

http://mesos.apache.org/documentation/latest/attributes-resources/

Marathon is [implementing IN and IS 
constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
 and includes plans to support further attribute types as it makes sense to do 
so (IE {{a,b IS b,a}}, {{5 IN [0-10]}}). In order to do 
this, Marathon has adopted the Mesos attribute value specification and will 
enforce it in the validation layer. As an example, it will be possible to write 
things like:

{code:java}
"constraints": [
  ["attribute", "IN", "{value-a,value-b,value-c}"]
]
{code}

Additionally, Marathon allows one to specify constraints on non-attribute 
properties, such as region, hostname, or zone. If somebody specified a zone 
value with a comma, then the user would not be able to use the Mesos set value 
type specification to describe a set of zones in which an app should be 
deployed, and, as a consequence, would result in additional complexity (IE: 
Marathon would need to implement an escaping mechanism for this case).

Ideally, the character space is confined to begin with. If the text type 
specification is sufficient, then, it seems simpler to re-use it rather than 
create another one.


> Enforce text attribute value specification for zone and region values
> -
>
> Key: MESOS-8148
> URL: https://issues.apache.org/jira/browse/MESOS-8148
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Tim Harper
>
> Mesos has a specification for characters allowed by attribute values:
> http://mesos.apache.org/documentation/latest/attributes-resources/
> The specification is as follows:
> {code}
> scalar : floatValue
> floatValue : ( intValue ( "." intValue )? ) | ...
> intValue : [0-9]+
> range : "[" rangeValue ( "," rangeValue )* "]"
> rangeValue : scalar "-" scalar
> set : "{" text ( "," text )* "}"
> text : [a-zA-Z0-9_/.-]
> {code}
> Marathon is [implementing IN and IS 
> constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
>  and includes plans to support further attribute types as it makes sense to 
> do so (IE {{a,b IS b,a}}, {{5 IN [0-10]}}). In order 
> to do this, Marathon has adopted the Mesos attribute value specification and 
> will enforce it in the validation layer. As an example, it will be possible 
> to write things like:
> {code:java}
> "constraints": [
>   ["attribute", "IN", "{value-a,value-b,value-c}"]
> ]
> {code}
> Additionally, Marathon allows one to specify constraints on non-attribute 
> properties, such as region, hostname, or zone. If somebody specified a zone 
> value with a comma, then the user would not be able to use the Mesos set 
> value type specification to describe a set of zones in which an app should be 
> deployed, and, as a consequence, would result in additional complexity (IE: 
> Marathon would need to implement an 

[jira] [Updated] (MESOS-8148) Enforce text attribute value specification for zone and region values

2017-10-30 Thread Tim Harper (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Harper updated MESOS-8148:
--
Description: 
Mesos has a specification for characters allowed by attribute values:

http://mesos.apache.org/documentation/latest/attributes-resources/

Marathon is [implementing IN and IS 
constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
 and includes plans to support further attribute types as it makes sense to do 
so (IE {{a,b IS b,a}}, {{5 IN [0-10]}}). In order to do 
this, Marathon has adopted the Mesos attribute value specification and will 
enforce it in the validation layer. As an example, it will be possible to write 
things like:

{code:java}
"constraints": [
  ["attribute", "IN", "{value-a,value-b,value-c}"]
]
{code}

Additionally, Marathon allows one to specify constraints on non-attribute 
properties, such as region, hostname, or zone. If somebody specified a zone 
value with a comma, then the user would not be able to use the Mesos set value 
type specification to describe a set of zones in which an app should be 
deployed, and, as a consequence, would result in additional complexity (IE: 
Marathon would need to implement an escaping mechanism for this case).

Ideally, the character space is confined to begin with. If the text type 
specification is sufficient, then, it seems simpler to re-use it rather than 
create another one.

  was:
Mesos has a specification for characters allowed by attribute values:

http://mesos.apache.org/documentation/latest/attributes-resources/

Marathon is [implementing IN and IS 
constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
 and includes plans to support further attribute types as it makes sense to do 
so (IE {{a,b IS b,a}}, {{5 IN [0-10]}}). In order to do 
this, Marathon has adopted the Mesos attribute value specification and will 
enforce it in the validation layer. As an example, it will be possible to write 
things like:

{code:java}
"constraints": [
  ["attribute", "IN", "{value-a,value-b,value-c}"]
]
{code}

Additionally, Marathon allows one to specify constraints on non-attribute 
properties, such as region, hostname, or zone. If somebody specified a zone 
value with a comma, then the user would not be able to use the Mesos set value 
type specification to describe a set of zones in which an app would be 
deployed, and, would result in additional complexity (IE: Marathon would need 
to implement an escaping mechanism for this case).

Ideally, the character space is confined to begin with. If the text type 
specification is sufficient, then, it seems simpler to re-use it rather than 
create another one.


> Enforce text attribute value specification for zone and region values
> -
>
> Key: MESOS-8148
> URL: https://issues.apache.org/jira/browse/MESOS-8148
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Tim Harper
>
> Mesos has a specification for characters allowed by attribute values:
> http://mesos.apache.org/documentation/latest/attributes-resources/
> Marathon is [implementing IN and IS 
> constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
>  and includes plans to support further attribute types as it makes sense to 
> do so (IE {{a,b IS b,a}}, {{5 IN [0-10]}}). In order 
> to do this, Marathon has adopted the Mesos attribute value specification and 
> will enforce it in the validation layer. As an example, it will be possible 
> to write things like:
> {code:java}
> "constraints": [
>   ["attribute", "IN", "{value-a,value-b,value-c}"]
> ]
> {code}
> Additionally, Marathon allows one to specify constraints on non-attribute 
> properties, such as region, hostname, or zone. If somebody specified a zone 
> value with a comma, then the user would not be able to use the Mesos set 
> value type specification to describe a set of zones in which an app should be 
> deployed, and, as a consequence, would result in additional complexity (IE: 
> Marathon would need to implement an escaping mechanism for this case).
> Ideally, the character space is confined to begin with. If the text type 
> specification is sufficient, then, it seems simpler to re-use it rather than 
> create another one.





[jira] [Updated] (MESOS-8148) Enforce text attribute value specification for zone and region values

2017-10-30 Thread Tim Harper (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Harper updated MESOS-8148:
--
Description: 
Mesos has a specification for characters allowed by attribute values:

http://mesos.apache.org/documentation/latest/attributes-resources/

Marathon is [implementing IN and IS 
constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
 and includes plans to support further attribute types as it makes sense to do 
so (IE {{a,b IS b,a}}, {{5 IN [0-10]}}). In order to do 
this, Marathon has adopted the Mesos attribute value specification and will 
enforce it in the validation layer. As an example, it will be possible to write 
things like:

{code:java}
"constraints": [
  ["attribute", "IN", "{value-a,value-b,value-c}"]
]
{code}

Additionally, Marathon allows one to specify constraints on non-attribute 
properties, such as region, hostname, or zone. If somebody specified a zone 
value with a comma, then the user would not be able to use the Mesos set value 
type specification to describe a set of zones in which an app would be 
deployed, and, would result in additional complexity (IE: Marathon would need 
to implement an escaping mechanism for this case).

Ideally, the character space is confined to begin with. If the text type 
specification is sufficient, then, it seems simpler to re-use it rather than 
create another one.

  was:
Mesos has a specification for characters allowed by attribute values:

http://mesos.apache.org/documentation/latest/attributes-resources/

Marathon is [implementing IN and IS 
constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
 and includes plans to support further attribute types as it makes sense to do 
so (IE {{\{a,b\} IS \{b,a\} }}, 5 IN [0-10]). In order to do this, Marathon has 
adopted the Mesos attribute value specification and will enforce it in the 
validation layer. As an example, it will be possible to write things like:

{code:java}
"constraints": [
  ["attribute", "IN", "{value-a,value-b,value-c}"]
]
{code}

Additionally, Marathon allows one to specify constraints on non-attribute
properties, such as region, hostname, or zone. If somebody specified a zone
value containing a comma, users would not be able to use the Mesos set value
type specification to describe the set of zones in which an app should be
deployed, and supporting it would add complexity (e.g. Marathon would need to
implement an escaping mechanism for this case).

Ideally, the character space should be confined from the start. If the text
type specification is sufficient, then it seems simpler to reuse it rather
than create another one.


> Enforce text attribute value specification for zone and region values
> -
>
> Key: MESOS-8148
> URL: https://issues.apache.org/jira/browse/MESOS-8148
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Tim Harper
>
> Mesos has a specification for characters allowed by attribute values:
> http://mesos.apache.org/documentation/latest/attributes-resources/
> Marathon is [implementing IN and IS 
> constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
> and includes plans to support further attribute types where it makes sense
> to do so (e.g. {{\{a,b\} IS \{b,a\}}}, {{5 IN [0-10]}}). To do this,
> Marathon has adopted the Mesos attribute value specification and will
> enforce it in the validation layer. For example, it will be possible to
> write things like:
> {code:java}
> "constraints": [
>   ["attribute", "IN", "{value-a,value-b,value-c}"]
> ]
> {code}
> Additionally, Marathon allows one to specify constraints on non-attribute
> properties, such as region, hostname, or zone. If somebody specified a zone
> value containing a comma, users would not be able to use the Mesos set value
> type specification to describe the set of zones in which an app should be
> deployed, and supporting it would add complexity (e.g. Marathon would need
> to implement an escaping mechanism for this case).
> Ideally, the character space should be confined from the start. If the text
> type specification is sufficient, then it seems simpler to reuse it rather
> than create another one.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8148) Enforce text attribute value specification for zone and region values

2017-10-30 Thread Tim Harper (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Harper updated MESOS-8148:
--
Description: 
Mesos has a specification for characters allowed by attribute values:

http://mesos.apache.org/documentation/latest/attributes-resources/

Marathon is [implementing IN and IS 
constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
and includes plans to support further attribute types where it makes sense to
do so (e.g. {{\{a,b\} IS \{b,a\} }}, 5 IN [0-10]). To do this, Marathon has
adopted the Mesos attribute value specification and will enforce it in the
validation layer. For example, it will be possible to write things like:

{code:java}
"constraints": [
  ["attribute", "IN", "{value-a,value-b,value-c}"]
]
{code}

Additionally, Marathon allows one to specify constraints on non-attribute
properties, such as region, hostname, or zone. If somebody specified a zone
value containing a comma, users would not be able to use the Mesos set value
type specification to describe the set of zones in which an app should be
deployed, and supporting it would add complexity (e.g. Marathon would need to
implement an escaping mechanism for this case).

Ideally, the character space should be confined from the start. If the text
type specification is sufficient, then it seems simpler to reuse it rather
than create another one.

  was:
Mesos has a specification for characters allowed by attribute values:

http://mesos.apache.org/documentation/latest/attributes-resources/

Marathon is [implementing IN and IS 
constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
and includes plans to support further attribute types where it makes sense to
do so (e.g. {a,b} IS {b,a}, 5 IN [0-10]). To do this, Marathon has adopted
the Mesos attribute value specification and will enforce it in the validation
layer. For example, it will be possible to write things like:

{code:java}
"constraints": [
  ["attribute", "IN", "{value-a,value-b,value-c}"]
]
{code}

Additionally, Marathon allows one to specify constraints on non-attribute
properties, such as region, hostname, or zone. If somebody specified a zone
value containing a comma, users would not be able to use the Mesos set value
type specification to describe the set of zones in which an app should be
deployed, and supporting it would add complexity (e.g. Marathon would need to
implement an escaping mechanism for this case).

Ideally, the character space should be confined from the start. If the text
type specification is sufficient, then it seems simpler to reuse it rather
than create another one.


> Enforce text attribute value specification for zone and region values
> -
>
> Key: MESOS-8148
> URL: https://issues.apache.org/jira/browse/MESOS-8148
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Tim Harper
>
> Mesos has a specification for characters allowed by attribute values:
> http://mesos.apache.org/documentation/latest/attributes-resources/
> Marathon is [implementing IN and IS 
> constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
> and includes plans to support further attribute types where it makes sense
> to do so (e.g. {{\{a,b\} IS \{b,a\} }}, 5 IN [0-10]). To do this, Marathon
> has adopted the Mesos attribute value specification and will enforce it in
> the validation layer. For example, it will be possible to write things like:
> {code:java}
> "constraints": [
>   ["attribute", "IN", "{value-a,value-b,value-c}"]
> ]
> {code}
> Additionally, Marathon allows one to specify constraints on non-attribute
> properties, such as region, hostname, or zone. If somebody specified a zone
> value containing a comma, users would not be able to use the Mesos set value
> type specification to describe the set of zones in which an app should be
> deployed, and supporting it would add complexity (e.g. Marathon would need
> to implement an escaping mechanism for this case).
> Ideally, the character space should be confined from the start. If the text
> type specification is sufficient, then it seems simpler to reuse it rather
> than create another one.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8148) Enforce text attribute value specification for zone and region values

2017-10-30 Thread Tim Harper (JIRA)
Tim Harper created MESOS-8148:
-

 Summary: Enforce text attribute value specification for zone and 
region values
 Key: MESOS-8148
 URL: https://issues.apache.org/jira/browse/MESOS-8148
 Project: Mesos
  Issue Type: Improvement
Reporter: Tim Harper


Mesos has a specification for characters allowed by attribute values:

http://mesos.apache.org/documentation/latest/attributes-resources/

Marathon is [implementing IN and IS 
constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub],
and includes plans to support further attribute types where it makes sense to
do so (e.g. {a,b} IS {b,a}, 5 IN [0-10]). To do this, Marathon has adopted
the Mesos attribute value specification and will enforce it in the validation
layer. For example, it will be possible to write things like:

{code:java}
"constraints": [
  ["attribute", "IN", "{value-a,value-b,value-c}"]
]
{code}

Additionally, Marathon allows one to specify constraints on non-attribute
properties, such as region, hostname, or zone. If somebody specified a zone
value containing a comma, users would not be able to use the Mesos set value
type specification to describe the set of zones in which an app should be
deployed, and supporting it would add complexity (e.g. Marathon would need to
implement an escaping mechanism for this case).

Ideally, the character space should be confined from the start. If the text
type specification is sufficient, then it seems simpler to reuse it rather
than create another one.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Issue Comment Deleted] (MESOS-5368) Consider introducing persistent agent ID

2017-07-26 Thread Tim Harper (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Harper updated MESOS-5368:
--
Comment: was deleted

(was: From a chat with Greg Mann, I understand that a patch for this has landed
in Mesos 1.4.x; I believe it is commit
{{cd6495e677ec74fd3f40b0dbf3b9654475308575}}.

As such, it seems this ticket should be updated to have a fix version of 1.4.0
and be marked as complete.)

> Consider introducing persistent agent ID
> 
>
> Key: MESOS-5368
> URL: https://issues.apache.org/jira/browse/MESOS-5368
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.2.1, 1.3.0
>Reporter: Neil Conway
>  Labels: mesosphere
>
> Currently, agent IDs identify a single "session" by an agent: that is, an 
> agent receives an agent ID when it registers with the master; it reuses that 
> agent ID if it disconnects and successfully reregisters; if the agent shuts 
> down and restarts, it registers anew and receives a new agent ID.
> It would be convenient to have a "persistent agent ID" that remains the same 
> for the duration of a given agent {{work_dir}}. This would mean that a given 
> persistent volume would not migrate between different persistent agent IDs 
> over time, for example (see MESOS-4894). If we supported permanently removing 
> an agent from the cluster (i.e., the {{work_dir}} and any volumes used by the 
> agent will never be reused), we could use the persistent agent ID to report 
> which agent has been removed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Issue Comment Deleted] (MESOS-5368) Consider introducing persistent agent ID

2017-07-26 Thread Tim Harper (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Harper updated MESOS-5368:
--
Comment: was deleted

(was: One particular pain of not having this feature is that it takes Mesos
longer than necessary to recognize that a task is definitely gone. Were we to
have persistent agent IDs, when the agent re-registered it could tell Mesos,
"I was asked to launch that task, and yes, it is definitely dead", whereas
right now the task is left in the unreachable state until Mesos gives up on the
agent.)




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-5368) Consider introducing persistent agent ID

2017-07-26 Thread Tim Harper (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102095#comment-16102095
 ] 

Tim Harper commented on MESOS-5368:
---

On a second reading, this ticket seems to be a duplicate of MESOS-6223 and
could probably be closed as such. MESOS-6223 has an up-to-date status.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-5368) Consider introducing persistent agent ID

2017-07-26 Thread Tim Harper (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102054#comment-16102054
 ] 

Tim Harper commented on MESOS-5368:
---

From a chat with Greg Mann, I understand that a patch for this has landed in
Mesos 1.4.x; I believe it is commit {{cd6495e677ec74fd3f40b0dbf3b9654475308575}}.

As such, it seems this ticket should be updated to have a fix version of 1.4.0
and be marked as complete.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (MESOS-5368) Consider introducing persistent agent ID

2017-07-26 Thread Tim Harper (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102011#comment-16102011
 ] 

Tim Harper edited comment on MESOS-5368 at 7/26/17 5:53 PM:


One particular pain of not having this feature is that it takes Mesos longer
than necessary to recognize that a task is definitely gone. Were we to have
persistent agent IDs, when the agent re-registered it could tell Mesos, "I was
asked to launch that task, and yes, it is definitely dead", whereas right now
the task is left in the unreachable state until Mesos gives up on the agent.


was (Author: timcharper):
One particular pain of not having this feature is that it takes Mesos longer
than necessary to recognize that a task is definitely gone. Were we to have
persistent agent IDs, when the agent re-registered it could tell Mesos, "yes,
that task is definitely dead", whereas right now the task is left in the
unreachable state until Mesos gives up on the agent.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (MESOS-5368) Consider introducing persistent agent ID

2017-07-26 Thread Tim Harper (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102011#comment-16102011
 ] 

Tim Harper edited comment on MESOS-5368 at 7/26/17 5:52 PM:


One particular pain of not having this feature is that it takes Mesos longer
than necessary to recognize that a task is definitely gone. Were we to have
persistent agent IDs, when the agent re-registered it could tell Mesos, "yes,
that task is definitely dead", whereas right now the task is left in the
unreachable state until Mesos gives up on the agent.


was (Author: timcharper):
One particular pain of not having this feature is that it takes Mesos longer
than necessary to recognize that a task is definitely gone. Were we to have
persistent agent IDs, when the agent re-registered it could tell Mesos, "yes,
that task is definitely dead", whereas right now the task is left perpetually
in the unreachable state until Mesos gives up on the agent.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-5368) Consider introducing persistent agent ID

2017-07-26 Thread Tim Harper (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Harper updated MESOS-5368:
--
Affects Version/s: 1.3.0




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-5368) Consider introducing persistent agent ID

2017-07-26 Thread Tim Harper (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Harper updated MESOS-5368:
--
Affects Version/s: 1.2.1




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-5368) Consider introducing persistent agent ID

2017-07-26 Thread Tim Harper (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102011#comment-16102011
 ] 

Tim Harper commented on MESOS-5368:
---

One particular pain of not having this feature is that it takes Mesos longer
than necessary to recognize that a task is definitely gone. Were we to have
persistent agent IDs, when the agent re-registered it could tell Mesos, "yes,
that task is definitely dead", whereas right now the task is left perpetually
in the unreachable state until Mesos gives up on the agent.

> Consider introducing persistent agent ID
> 
>
> Key: MESOS-5368
> URL: https://issues.apache.org/jira/browse/MESOS-5368
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Neil Conway
>  Labels: mesosphere
>
> Currently, agent IDs identify a single "session" by an agent: that is, an 
> agent receives an agent ID when it registers with the master; it reuses that 
> agent ID if it disconnects and successfully reregisters; if the agent shuts 
> down and restarts, it registers anew and receives a new agent ID.
> It would be convenient to have a "persistent agent ID" that remains the same 
> for the duration of a given agent {{work_dir}}. This would mean that a given 
> persistent volume would not migrate between different persistent agent IDs 
> over time, for example (see MESOS-4894). If we supported permanently removing 
> an agent from the cluster (i.e., the {{work_dir}} and any volumes used by the 
> agent will never be reused), we could use the persistent agent ID to report 
> which agent has been removed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7374) Running DOCKER images in Mesos Container Runtime without `linux/filesystem` isolation enabled renders host unusable

2017-04-09 Thread Tim Harper (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Harper updated MESOS-7374:
--
Description: 
If I run the pod below (using Marathon 1.4.2) against a Mesos agent started
with the flags below, the overlay filesystem replaces the system root mount,
effectively rendering the host unusable until reboot.

flags:

- {{--containerizers mesos,docker}}
- {{--image_providers APPC,DOCKER}}
- {{--isolation cgroups/cpu,cgroups/mem,docker/runtime}}

pod definition for Marathon:
{code:java}
{
  "id": "/simplepod",
  "scaling": { "kind": "fixed", "instances": 1 },
  "containers": [
    {
      "name": "sleep1",
      "exec": { "command": { "shell": "sleep 1000" } },
      "resources": { "cpus": 0.1, "mem": 32 },
      "image": {
        "id": "alpine",
        "kind": "DOCKER"
      }
    }
  ],
  "networks": [ {"mode": "host"} ]
}
{code}

Mesos should probably detect this configuration at startup or launch time and
avoid replacing the system root mount point.
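The check suggested above could look roughly like the following Python sketch. It is a hypothetical preflight check, not Mesos code. It assumes the relevant isolator is the one that Mesos agent flags spell `filesystem/linux`; the summary of this issue calls it `linux/filesystem`. Flag values are treated as comma-separated lists, the same form as the flags quoted in the description.

```python
def parse_list_flag(value: str) -> set:
    """Split a comma-separated flag value such as "cgroups/cpu,cgroups/mem"."""
    return {item.strip() for item in value.split(",") if item.strip()}


def check_agent_flags(containerizers: str, image_providers: str, isolation: str) -> list:
    """Return warnings when the Mesos containerizer can run DOCKER images
    without filesystem isolation (hypothetical preflight check)."""
    warnings = []
    providers = {provider.upper() for provider in parse_list_flag(image_providers)}
    isolators = parse_list_flag(isolation)
    uses_mesos_containerizer = "mesos" in parse_list_flag(containerizers)
    if uses_mesos_containerizer and "DOCKER" in providers:
        if "filesystem/linux" not in isolators:
            warnings.append(
                "--image_providers includes DOCKER but --isolation lacks filesystem/linux"
            )
    return warnings


if __name__ == "__main__":
    # The flags from this report trigger the warning.
    print(check_agent_flags("mesos,docker", "APPC,DOCKER",
                            "cgroups/cpu,cgroups/mem,docker/runtime"))
```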

  was:
If I run the pod below (using Marathon 1.4.2) against a Mesos agent started
with the flags below, the overlay filesystem replaces the system root mount,
effectively rendering the host unusable until reboot.

flags:

- {{--containerizers mesos,docker}}
- {{--image_providers APPC,DOCKER}}
- {{--isolation cgroups/cpu,cgroups/mem,docker/runtime}}

pod definition:
{code:java}
{
  "id": "/simplepod",
  "scaling": { "kind": "fixed", "instances": 1 },
  "containers": [
    {
      "name": "sleep1",
      "exec": { "command": { "shell": "sleep 1000" } },
      "resources": { "cpus": 0.1, "mem": 32 },
      "image": {
        "id": "alpine",
        "kind": "DOCKER"
      }
    }
  ],
  "networks": [ {"mode": "host"} ]
}
{code}

Mesos should probably check for this at startup or launch time.


> Running DOCKER images in Mesos Container Runtime without `linux/filesystem` 
> isolation enabled renders host unusable
> ---
>
> Key: MESOS-7374
> URL: https://issues.apache.org/jira/browse/MESOS-7374
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Affects Versions: 1.2.0
>Reporter: Tim Harper
>Priority: Minor
>
> If I run the pod below (using Marathon 1.4.2) against a Mesos agent started
> with the flags below, the overlay filesystem replaces the system root mount,
> effectively rendering the host unusable until reboot.
> flags:
> - {{--containerizers mesos,docker}}
> - {{--image_providers APPC,DOCKER}}
> - {{--isolation cgroups/cpu,cgroups/mem,docker/runtime}}
> pod definition for Marathon:
> {code:java}
> {
>   "id": "/simplepod",
>   "scaling": { "kind": "fixed", "instances": 1 },
>   "containers": [
>     {
>       "name": "sleep1",
>       "exec": { "command": { "shell": "sleep 1000" } },
>       "resources": { "cpus": 0.1, "mem": 32 },
>       "image": {
>         "id": "alpine",
>         "kind": "DOCKER"
>       }
>     }
>   ],
>   "networks": [ {"mode": "host"} ]
> }
> {code}
> Mesos should probably detect this configuration at startup or launch time
> and avoid replacing the system root mount point.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (MESOS-7374) Running DOCKER images in Mesos Container Runtime without `linux/filesystem` isolation enabled renders host unusable

2017-04-09 Thread Tim Harper (JIRA)
Tim Harper created MESOS-7374:
-

 Summary: Running DOCKER images in Mesos Container Runtime without 
`linux/filesystem` isolation enabled renders host unusable
 Key: MESOS-7374
 URL: https://issues.apache.org/jira/browse/MESOS-7374
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Affects Versions: 1.2.0
Reporter: Tim Harper
Priority: Minor


If I run the pod below (using Marathon 1.4.2) against a Mesos agent started
with the flags below, the overlay filesystem replaces the system root mount,
effectively rendering the host unusable until reboot.

flags:

- {{--containerizers mesos,docker}}
- {{--image_providers APPC,DOCKER}}
- {{--isolation cgroups/cpu,cgroups/mem,docker/runtime}}

pod definition:
{code:java}
{
  "id": "/simplepod",
  "scaling": { "kind": "fixed", "instances": 1 },
  "containers": [
    {
      "name": "sleep1",
      "exec": { "command": { "shell": "sleep 1000" } },
      "resources": { "cpus": 0.1, "mem": 32 },
      "image": {
        "id": "alpine",
        "kind": "DOCKER"
      }
    }
  ],
  "networks": [ {"mode": "host"} ]
}
{code}

Mesos should probably check for this at startup or launch time.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-6223) Allow agents to re-register post a host reboot

2017-03-01 Thread Tim Harper (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890850#comment-15890850
 ] 

Tim Harper commented on MESOS-6223:
---

This should help fix an issue we are seeing with tasks and reserved resources 
in Marathon:

https://github.com/mesosphere/marathon/issues/5284

In Marathon's case, when a resident task (one with reserved resources) becomes
unreachable because its node reboots, we never receive a terminal state for the
task, even though the host comes back online. We believe this is because,
during reconciliation, we send the old agent ID and the task ID, and Mesos
continually reports an unknown status. If the agent in question kept the same
agent ID, then an explicit reconciliation of that agent ID plus the task ID
should, I think, result in a status update that signals a definite terminal
state.
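Explicit reconciliation pairs each task ID with the agent ID on which the framework last saw that task. The Python sketch below only builds the JSON body of a `RECONCILE` call for the Mesos v1 scheduler HTTP API. In practice, that body would be POSTed to the master's `/api/v1/scheduler` endpoint together with the `Mesos-Stream-Id` header received on subscription. All IDs are made-up placeholders, and the HTTP request itself is left out. If the agent kept its ID across a reboot, the `agent_id` in this body would still identify it.

```python
import json


def reconcile_call(framework_id: str, tasks: list) -> dict:
    """Build a v1 scheduler RECONCILE call; tasks is a list of (task_id, agent_id)."""
    return {
        "framework_id": {"value": framework_id},
        "type": "RECONCILE",
        "reconcile": {
            "tasks": [
                {"task_id": {"value": task_id}, "agent_id": {"value": agent_id}}
                for task_id, agent_id in tasks
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical IDs: the agent ID is the one recorded before the reboot.
    body = reconcile_call("example-framework-id",
                          [("example-task-id", "example-agent-id-S0")])
    print(json.dumps(body, indent=2))
```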

> Allow agents to re-register post a host reboot
> --
>
> Key: MESOS-6223
> URL: https://issues.apache.org/jira/browse/MESOS-6223
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Reporter: Megha Sharma
>Assignee: Megha Sharma
>
> The agent doesn't recover its state after a host reboot; it registers with
> the master and gets a new SlaveID. With partition awareness, agents are now
> allowed to re-register after they have been marked Unreachable. Since the
> executors are terminated anyway when the agent reboots, there is no harm in
> letting the agent keep its SlaveID, re-register with the master, and
> reconcile the lost executors. This is a prerequisite for supporting
> persistent/restartable tasks in Mesos (MESOS-3545).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-6213) Build failure on macOS Sierra: Protobuf atomics deprecated.

2016-11-01 Thread Tim Harper (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15627854#comment-15627854
 ] 

Tim Harper commented on MESOS-6213:
---

As a workaround, you can run {{make CPPFLAGS="-Wno-deprecated-declarations"}}

> Build failure on macOS Sierra: Protobuf atomics deprecated.
> ---
>
> Key: MESOS-6213
> URL: https://issues.apache.org/jira/browse/MESOS-6213
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: Charles Allen
>
> Building on OSX is giving the following error.
> {code}
> In file included from ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops.h:184:
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops_internals_macosx.h:173:9: error: 'OSAtomicCompareAndSwap64Barrier' is deprecated: first deprecated in macOS 10.12 - Use std::atomic_compare_exchange_strong() from <atomic> instead [-Werror,-Wdeprecated-declarations]
>     if (OSAtomicCompareAndSwap64Barrier(
>         ^
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libkern/OSAtomicDeprecated.h:645:9: note: 'OSAtomicCompareAndSwap64Barrier' has been explicitly marked deprecated here
> bool    OSAtomicCompareAndSwap64Barrier( int64_t __oldValue, int64_t __newValue,
>         ^
> {code}
> Protobuf is not listed as a component, so I set it to {{build}}.
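The deprecation note points to {{std::atomic_compare_exchange_strong()}} from {{<atomic>}}. The following is a hedged sketch of that direction, not the fix that protobuf or Mesos actually shipped. It shows a compare-and-swap that returns the previously observed value, which is the convention of protobuf's {{atomicops}} compare-and-swap helpers. {{CompareAndSwapReturnPrevious}} is a hypothetical name, and the sketch assumes sequentially consistent ordering is an acceptable stand-in for the {{Barrier}} variant.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: if *ptr holds old_value, atomically replace it with
// new_value. Either way, return the value observed before the operation.
// std::atomic_compare_exchange_strong uses sequentially consistent ordering,
// standing in for the full barrier of the deprecated OSAtomic...Barrier call.
int64_t CompareAndSwapReturnPrevious(std::atomic<int64_t>* ptr,
                                     int64_t old_value,
                                     int64_t new_value) {
  int64_t expected = old_value;
  // On success, expected is left equal to old_value (the previous value).
  // On failure, the call stores the current value of *ptr into expected.
  std::atomic_compare_exchange_strong(ptr, &expected, new_value);
  return expected;
}
```

Unlike the {{-Wno-deprecated-declarations}} workaround, this removes the deprecated call rather than silencing the warning. The standard does not guarantee that {{std::atomic<int64_t>}} has the same representation as a plain {{int64_t}}, so a real change would likely also need to touch the types that protobuf's atomicops operate on.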



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5824) Include disk source information in stringification

2016-07-11 Thread Tim Harper (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371326#comment-15371326
 ] 

Tim Harper commented on MESOS-5824:
---

https://reviews.apache.org/r/49910/diff/1#index_header

> Include disk source information in stringification
> --
>
> Key: MESOS-5824
> URL: https://issues.apache.org/jira/browse/MESOS-5824
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Affects Versions: 0.28.2
>Reporter: Tim Harper
>Priority: Minor
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> Some frameworks (like kafka_mesos) ignore the Source field when trying to 
> reserve an offered mount or path persistent volume; the resulting error 
> message is bewildering:
> {code:none}
> Task uses more resources
> cpus(*):4; mem(*):4096; ports(*):[31000-31000]; disk(kafka, kafka)[kafka_0:data]:960679
> than available
> cpus(*):32; mem(*):256819;  ports(*):[31000-32000]; disk(kafka, kafka)[kafka_0:data]:960679;   disk(*):240169;
> {code}
> The stringification of disk resources should include source information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5824) Include disk source information in stringification

2016-07-11 Thread Tim Harper (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Harper updated MESOS-5824:
--
Attachment: (was: 0001-Output-disk-resource-source-information.patch)

> Include disk source information in stringification
> --
>
> Key: MESOS-5824
> URL: https://issues.apache.org/jira/browse/MESOS-5824
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Affects Versions: 0.28.2
>Reporter: Tim Harper
>Priority: Minor
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> Some frameworks (like kafka_mesos) ignore the Source field when trying to 
> reserve an offered mount or path persistent volume; the resulting error 
> message is bewildering:
> {code:none}
> Task uses more resources
> cpus(*):4; mem(*):4096; ports(*):[31000-31000]; disk(kafka, kafka)[kafka_0:data]:960679
> than available
> cpus(*):32; mem(*):256819;  ports(*):[31000-32000]; disk(kafka, kafka)[kafka_0:data]:960679;   disk(*):240169;
> {code}
> The stringification of disk resources should include source information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-5824) Include disk source information in stringification

2016-07-11 Thread Tim Harper (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Harper reassigned MESOS-5824:
-

Assignee: Tim Harper

> Include disk source information in stringification
> --
>
> Key: MESOS-5824
> URL: https://issues.apache.org/jira/browse/MESOS-5824
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Affects Versions: 0.28.2
>Reporter: Tim Harper
>Assignee: Tim Harper
>Priority: Minor
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> Some frameworks (like kafka_mesos) ignore the Source field when trying to 
> reserve an offered mount or path persistent volume; the resulting error 
> message is bewildering:
> {code:none}
> Task uses more resources
> cpus(*):4; mem(*):4096; ports(*):[31000-31000]; disk(kafka, kafka)[kafka_0:data]:960679
> than available
> cpus(*):32; mem(*):256819;  ports(*):[31000-32000]; disk(kafka, kafka)[kafka_0:data]:960679;   disk(*):240169;
> {code}
> The stringification of disk resources should include source information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5824) Include disk source information in stringification

2016-07-11 Thread Tim Harper (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371312#comment-15371312
 ] 

Tim Harper commented on MESOS-5824:
---

The main motivation for the fix is to clarify an incredibly awful and 
misleading error message that appears when out-of-date Mesos frameworks try to 
allocate a persistent volume.

> Include disk source information in stringification
> --
>
> Key: MESOS-5824
> URL: https://issues.apache.org/jira/browse/MESOS-5824
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Affects Versions: 0.28.2
>Reporter: Tim Harper
>Priority: Minor
>  Labels: mesosphere
> Fix For: 1.1.0
>
> Attachments: 0001-Output-disk-resource-source-information.patch
>
>
> Some frameworks (like kafka_mesos) ignore the Source field when trying to 
> reserve an offered mount or path persistent volume; the resulting error 
> message is bewildering:
> {code:none}
> Task uses more resources
> cpus(*):4; mem(*):4096; ports(*):[31000-31000]; disk(kafka, kafka)[kafka_0:data]:960679
> than available
> cpus(*):32; mem(*):256819;  ports(*):[31000-32000]; disk(kafka, kafka)[kafka_0:data]:960679;   disk(*):240169;
> {code}
> The stringification of disk resources should include source information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5714) Specify soname for libmesos.so to major release

2016-07-10 Thread Tim Harper (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15369655#comment-15369655
 ] 

Tim Harper commented on MESOS-5714:
---

If there is an implicit expectation that the API doesn't change in 
backwards-incompatible ways between point releases, then the {{soname}} should 
be updated to reflect that.

> Specify soname for libmesos.so to major release
> ---
>
> Key: MESOS-5714
> URL: https://issues.apache.org/jira/browse/MESOS-5714
> Project: Mesos
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 0.28.2
>Reporter: Tim Harper
>  Labels: build
>
> I've installed Mesos using the CentOS 7 package and am building the 
> Ceph-Mesos framework. When running {{ldd}}, I noticed that {{ceph-mesos}} 
> depends on an overly specific version of libmesos, which means that the 
> built framework will break on subsequent point releases.
> This seems to be because the {{soname}} for libmesos is set to a very 
> unforgiving value. If {{libmesos-0.28.2}} truly isn't ABI-compatible with 
> {{libmesos-0.28.x}}, then I suppose this is set correctly and this ticket 
> should be summarily closed, unfortunate as that would be.
> Here is the {{readelf}} output for {{libmesos}}:
> {code}
> [root@6e189e07b470 /]# readelf -d /usr/local/lib/libmesos-0.28.2.so
> Dynamic section at offset 0x194cd18 contains 43 entries:
>   Tag        Type                         Name/Value
>  0x0000000000000001 (NEEDED)             Shared library: [libcrypt.so.1]
>  0x0000000000000001 (NEEDED)             Shared library: [libexpat.so.1]
>  0x0000000000000001 (NEEDED)             Shared library: [libdb-5.3.so]
>  0x0000000000000001 (NEEDED)             Shared library: [libsasl2.so.3]
>  0x0000000000000001 (NEEDED)             Shared library: [libsvn_delta-1.so.0]
>  0x0000000000000001 (NEEDED)             Shared library: [libsvn_subr-1.so.0]
>  0x0000000000000001 (NEEDED)             Shared library: [libaprutil-1.so.0]
>  0x0000000000000001 (NEEDED)             Shared library: [libapr-1.so.0]
>  0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
>  0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
>  0x0000000000000001 (NEEDED)             Shared library: [libcurl.so.4]
>  0x0000000000000001 (NEEDED)             Shared library: [libz.so.1]
>  0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
>  0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
>  0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
>  0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
>  0x0000000000000001 (NEEDED)             Shared library: [ld-linux-x86-64.so.2]
>  0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
>  0x000000000000000e (SONAME)             Library soname: [libmesos-0.28.2.so]
>  0x000000000000000f (RPATH)              Library rpath: [/usr/lib/mesos]
>  0x000000000000000c (INIT)               0x92a1f0
>  0x000000000000000d (FINI)               0x13a8e94
>  0x0000000000000019 (INIT_ARRAY)         0x1ae8888
>  0x000000000000001b (INIT_ARRAYSZ)       1712 (bytes)
>  0x000000000000001a (FINI_ARRAY)         0x1ae8f38
>  0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
>  0x000000006ffffef5 (GNU_HASH)           0x228
>  0x0000000000000005 (STRTAB)             0x1b0be8
>  0x0000000000000006 (SYMTAB)             0x66a08
>  0x000000000000000a (STRSZ)              6130210 (bytes)
>  0x000000000000000b (SYMENT)             24 (bytes)
>  0x0000000000000003 (PLTGOT)             0x1b66000
>  0x0000000000000002 (PLTRELSZ)           387000 (bytes)
>  0x0000000000000014 (PLTREL)             RELA
>  0x0000000000000017 (JMPREL)             0x8cba38
>  0x0000000000000007 (RELA)               0x7a5018
>  0x0000000000000008 (RELASZ)             1206816 (bytes)
>  0x0000000000000009 (RELAENT)            24 (bytes)
>  0x000000006ffffffe (VERNEED)            0x7a4e38
>  0x000000006fffffff (VERNEEDNUM)         8
>  0x000000006ffffff0 (VERSYM)             0x78960a
>  0x000000006ffffff9 (RELACOUNT)          1357
>  0x0000000000000000 (NULL)               0x0
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5714) Specify soname for libmesos.so to major release

2016-07-10 Thread Tim Harper (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15369654#comment-15369654
 ] 

Tim Harper commented on MESOS-5714:
---

The Ceph Mesos framework is written in C++

> Specify soname for libmesos.so to major release
> ---
>
> Key: MESOS-5714
> URL: https://issues.apache.org/jira/browse/MESOS-5714
> Project: Mesos
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 0.28.2
>Reporter: Tim Harper
>  Labels: build
>
> I've installed Mesos using the CentOS 7 package and am building the 
> Ceph-Mesos framework. When running {{ldd}}, I noticed that {{ceph-mesos}} 
> depends on an overly specific version of libmesos, which means that the 
> built framework will break on subsequent point releases.
> This seems to be because the {{soname}} for libmesos is set to a very 
> unforgiving value. If {{libmesos-0.28.2}} truly isn't ABI-compatible with 
> {{libmesos-0.28.x}}, then I suppose this is set correctly and this ticket 
> should be summarily closed, unfortunate as that would be.
> Here is the {{readelf}} output for {{libmesos}}:
> {code}
> [root@6e189e07b470 /]# readelf -d /usr/local/lib/libmesos-0.28.2.so
> Dynamic section at offset 0x194cd18 contains 43 entries:
>   Tag        Type                         Name/Value
>  0x0000000000000001 (NEEDED)             Shared library: [libcrypt.so.1]
>  0x0000000000000001 (NEEDED)             Shared library: [libexpat.so.1]
>  0x0000000000000001 (NEEDED)             Shared library: [libdb-5.3.so]
>  0x0000000000000001 (NEEDED)             Shared library: [libsasl2.so.3]
>  0x0000000000000001 (NEEDED)             Shared library: [libsvn_delta-1.so.0]
>  0x0000000000000001 (NEEDED)             Shared library: [libsvn_subr-1.so.0]
>  0x0000000000000001 (NEEDED)             Shared library: [libaprutil-1.so.0]
>  0x0000000000000001 (NEEDED)             Shared library: [libapr-1.so.0]
>  0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
>  0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
>  0x0000000000000001 (NEEDED)             Shared library: [libcurl.so.4]
>  0x0000000000000001 (NEEDED)             Shared library: [libz.so.1]
>  0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
>  0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
>  0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
>  0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
>  0x0000000000000001 (NEEDED)             Shared library: [ld-linux-x86-64.so.2]
>  0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
>  0x000000000000000e (SONAME)             Library soname: [libmesos-0.28.2.so]
>  0x000000000000000f (RPATH)              Library rpath: [/usr/lib/mesos]
>  0x000000000000000c (INIT)               0x92a1f0
>  0x000000000000000d (FINI)               0x13a8e94
>  0x0000000000000019 (INIT_ARRAY)         0x1ae8888
>  0x000000000000001b (INIT_ARRAYSZ)       1712 (bytes)
>  0x000000000000001a (FINI_ARRAY)         0x1ae8f38
>  0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
>  0x000000006ffffef5 (GNU_HASH)           0x228
>  0x0000000000000005 (STRTAB)             0x1b0be8
>  0x0000000000000006 (SYMTAB)             0x66a08
>  0x000000000000000a (STRSZ)              6130210 (bytes)
>  0x000000000000000b (SYMENT)             24 (bytes)
>  0x0000000000000003 (PLTGOT)             0x1b66000
>  0x0000000000000002 (PLTRELSZ)           387000 (bytes)
>  0x0000000000000014 (PLTREL)             RELA
>  0x0000000000000017 (JMPREL)             0x8cba38
>  0x0000000000000007 (RELA)               0x7a5018
>  0x0000000000000008 (RELASZ)             1206816 (bytes)
>  0x0000000000000009 (RELAENT)            24 (bytes)
>  0x000000006ffffffe (VERNEED)            0x7a4e38
>  0x000000006fffffff (VERNEEDNUM)         8
>  0x000000006ffffff0 (VERSYM)             0x78960a
>  0x000000006ffffff9 (RELACOUNT)          1357
>  0x0000000000000000 (NULL)               0x0
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5824) Include disk source information in stringification

2016-07-08 Thread Tim Harper (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Harper updated MESOS-5824:
--
Attachment: 0001-Output-disk-resource-source-information.patch

Attached is the patch

> Include disk source information in stringification
> --
>
> Key: MESOS-5824
> URL: https://issues.apache.org/jira/browse/MESOS-5824
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Affects Versions: 0.28.2
>Reporter: Tim Harper
> Fix For: 0.28.3
>
> Attachments: 0001-Output-disk-resource-source-information.patch
>
>
> Some frameworks (like kafka_mesos) ignore the Source field when trying to 
> reserve an offered mount or path persistent volume; the resulting error 
> message is bewildering:
> {code:none}
> Task uses more resources
> cpus(*):4; mem(*):4096; ports(*):[31000-31000]; disk(kafka, kafka)[kafka_0:data]:960679
> than available
> cpus(*):32; mem(*):256819;  ports(*):[31000-32000]; disk(kafka, kafka)[kafka_0:data]:960679;   disk(*):240169;
> {code}
> The stringification of disk resources should include source information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5823) Include disk source information in stringification

2016-07-08 Thread Tim Harper (JIRA)
Tim Harper created MESOS-5823:
-

 Summary: Include disk source information in stringification
 Key: MESOS-5823
 URL: https://issues.apache.org/jira/browse/MESOS-5823
 Project: Mesos
  Issue Type: Improvement
  Components: stout
Affects Versions: 0.28.2
Reporter: Tim Harper
 Fix For: 0.28.3


Some frameworks (like kafka_mesos) ignore the Source field when trying to 
reserve an offered mount or path persistent volume; the resulting error message 
is bewildering:

{code:none}
Task uses more resources
cpus(*):4; mem(*):4096; ports(*):[31000-31000]; disk(kafka, kafka)[kafka_0:data]:960679
than available
cpus(*):32; mem(*):256819;  ports(*):[31000-32000]; disk(kafka, kafka)[kafka_0:data]:960679;   disk(*):240169;
{code}





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5824) Include disk source information in stringification

2016-07-08 Thread Tim Harper (JIRA)
Tim Harper created MESOS-5824:
-

 Summary: Include disk source information in stringification
 Key: MESOS-5824
 URL: https://issues.apache.org/jira/browse/MESOS-5824
 Project: Mesos
  Issue Type: Improvement
  Components: stout
Affects Versions: 0.28.2
Reporter: Tim Harper
 Fix For: 0.28.3


Some frameworks (like kafka_mesos) ignore the Source field when trying to 
reserve an offered mount or path persistent volume; the resulting error message 
is bewildering:

{code:none}
Task uses more resources
cpus(*):4; mem(*):4096; ports(*):[31000-31000]; disk(kafka, kafka)[kafka_0:data]:960679
than available
cpus(*):32; mem(*):256819;  ports(*):[31000-32000]; disk(kafka, kafka)[kafka_0:data]:960679;   disk(*):240169;
{code}

The stringification of disk resources should include source information.
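The following hypothetical C++ sketch shows why the message is bewildering. A requested disk with no source and an offered {{MOUNT}} disk stringify identically unless the source is printed. {{DiskResource}}, {{Stringify}}, the placement of the source inside the brackets, and the {{/mnt/data}} path are all assumptions for illustration. They are not the real Mesos types or the output format Mesos adopted.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical, simplified stand-in for a disk resource (not the Mesos
// protobuf types). source_type is empty for a plain disk, or e.g.
// "PATH" / "MOUNT" for an offered path or mount disk.
struct DiskResource {
  std::string role;
  std::string principal;
  std::string persistence_id;
  std::string container_path;
  std::string source_type;  // "" means no source
  std::string source_root;  // e.g. "/mnt/data" for a MOUNT source (assumed)
  int64_t megabytes;
};

// Formats like the ticket's message: disk(role, principal)[id:path]:size.
// With include_source, the source is prepended inside the brackets; that
// placement is an illustrative choice, not necessarily what Mesos does.
std::string Stringify(const DiskResource& d, bool include_source) {
  std::ostringstream out;
  out << "disk(" << d.role << ", " << d.principal << ")[";
  if (include_source && !d.source_type.empty()) {
    out << d.source_type << ":" << d.source_root << ",";
  }
  out << d.persistence_id << ":" << d.container_path << "]:" << d.megabytes;
  return out.str();
}
```

Without the source, a task's source-less disk and the offered mount disk print as the same {{disk(kafka, kafka)[kafka_0:data]:960679}}, even though the containment check treats them as different resources. Printing the source makes that difference visible in the error.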




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5714) Specify soname for libmesos.so to major release

2016-06-26 Thread Tim Harper (JIRA)
Tim Harper created MESOS-5714:
-

 Summary: Specify soname for libmesos.so to major release
 Key: MESOS-5714
 URL: https://issues.apache.org/jira/browse/MESOS-5714
 Project: Mesos
  Issue Type: Improvement
  Components: build
Affects Versions: 0.28.2
Reporter: Tim Harper


I've installed Mesos using the CentOS 7 package and am building the Ceph-Mesos 
framework. When running {{ldd}}, I noticed that {{ceph-mesos}} depends on an 
overly specific version of libmesos, which means that the built framework will 
break on subsequent point releases.

This seems to be because the {{soname}} for libmesos is set to a very 
unforgiving value. If {{libmesos-0.28.2}} truly isn't ABI-compatible with 
{{libmesos-0.28.x}}, then I suppose this is set correctly and this ticket 
should be summarily closed, unfortunate as that would be.

Here is the {{readelf}} output for {{libmesos}}:

{code}
[root@6e189e07b470 /]# readelf -d /usr/local/lib/libmesos-0.28.2.so

Dynamic section at offset 0x194cd18 contains 43 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libcrypt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libexpat.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libdb-5.3.so]
 0x0000000000000001 (NEEDED)             Shared library: [libsasl2.so.3]
 0x0000000000000001 (NEEDED)             Shared library: [libsvn_delta-1.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libsvn_subr-1.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libaprutil-1.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libapr-1.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libcurl.so.4]
 0x0000000000000001 (NEEDED)             Shared library: [libz.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [ld-linux-x86-64.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x000000000000000e (SONAME)             Library soname: [libmesos-0.28.2.so]
 0x000000000000000f (RPATH)              Library rpath: [/usr/lib/mesos]
 0x000000000000000c (INIT)               0x92a1f0
 0x000000000000000d (FINI)               0x13a8e94
 0x0000000000000019 (INIT_ARRAY)         0x1ae8888
 0x000000000000001b (INIT_ARRAYSZ)       1712 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x1ae8f38
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x228
 0x0000000000000005 (STRTAB)             0x1b0be8
 0x0000000000000006 (SYMTAB)             0x66a08
 0x000000000000000a (STRSZ)              6130210 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000003 (PLTGOT)             0x1b66000
 0x0000000000000002 (PLTRELSZ)           387000 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x8cba38
 0x0000000000000007 (RELA)               0x7a5018
 0x0000000000000008 (RELASZ)             1206816 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffe (VERNEED)            0x7a4e38
 0x000000006fffffff (VERNEEDNUM)         8
 0x000000006ffffff0 (VERSYM)             0x78960a
 0x000000006ffffff9 (RELACOUNT)          1357
 0x0000000000000000 (NULL)               0x0
{code}
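The ticket's point can be illustrated with a small, hypothetical C++ model. {{Soname}}, {{SonamePolicy}} and {{LoaderAccepts}} are illustrative names. The {{libmesos-0.28.so}} form is just one possible coarser policy; the ticket does not pin down a format. The loader check is simplified to an exact string match between the {{DT_NEEDED}} entry recorded at link time (the linked library's soname) and the soname of the installed library.

```cpp
#include <string>

// Hypothetical soname policies:
//  - kFullVersion: embed the whole release, e.g. libmesos-0.28.2.so
//    (the form in the SONAME line of the readelf output above)
//  - kMajorMinor: embed only the part assumed to promise ABI compatibility,
//    e.g. libmesos-0.28.so, so any 0.28.x release can satisfy it.
enum class SonamePolicy { kFullVersion, kMajorMinor };

std::string Soname(const std::string& name, const std::string& version,
                   SonamePolicy policy) {
  std::string v = version;
  if (policy == SonamePolicy::kMajorMinor) {
    // Keep everything before the second '.', e.g. "0.28.2" -> "0.28".
    std::string::size_type first = v.find('.');
    std::string::size_type second =
        first == std::string::npos ? std::string::npos : v.find('.', first + 1);
    if (second != std::string::npos) v = v.substr(0, second);
  }
  return "lib" + name + "-" + v + ".so";
}

// Simplified model of the dynamic loader: a binary's DT_NEEDED entry names the
// soname it was linked against, and only a library installed under exactly
// that name satisfies it.
bool LoaderAccepts(const std::string& needed, const std::string& installed_soname) {
  return needed == installed_soname;
}
```

Under the full-version policy, a binary linked against {{libmesos-0.28.2.so}} will not accept {{libmesos-0.28.3.so}}. Under the major/minor policy, both releases carry {{libmesos-0.28.so}}, so an upgrade within 0.28.x keeps working. That is only safe if 0.28.x releases really are ABI-compatible, which is the question the ticket raises.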



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)