[jira] [Commented] (MESOS-9653) Allow framework to set `min_allocatable_resources` upon revival.
[ https://issues.apache.org/jira/browse/MESOS-9653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798357#comment-16798357 ]

Tim Harper commented on MESOS-9653:
---

This sounds like an easy optimization for both the Mesos allocator and frameworks. One thing I wonder about: how will the reason a framework isn't getting any offers be exposed?

> Allow framework to set `min_allocatable_resources` upon revival.
> ---
>
> Key: MESOS-9653
> URL: https://issues.apache.org/jira/browse/MESOS-9653
> Project: Mesos
> Issue Type: Improvement
> Components: allocation
> Reporter: Meng Zhu
> Priority: Major
> Labels: mesosphere, resource-management
>
> In MESOS-9523, we added a per-framework allocatable resources matcher/filter,
> which frameworks can specify in their `FrameworkInfo` when subscribing.
> Frameworks can have some control over the shape of the resource offer via
> these per-framework filters.
> Besides setting the filters when subscribing, a natural workflow is to set
> these filters upon revival. Frameworks can set these filters to the resource
> quantity shape of the tasks they want to launch upon revival. If a framework
> specifies this in a revive call, all existing filters (accumulated when
> declining offers) and the current `min_allocatable_resources` filters will be
> cleared and replaced with the newly specified `min_allocatable_resources` filter.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
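The revive semantics proposed in MESOS-9653 can be sketched as a small state model. This is a simplified illustration and not the Mesos allocator: `ResourceQuantities`, `FrameworkFilterState`, and `revive` are hypothetical names. The behavior when a revive carries no new filter (decline filters cleared, existing minimum kept) is an assumption, since the description only covers the case where a new filter is specified.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a resource quantity shape, e.g. {"cpus": 2, "mem": 1024}.
using ResourceQuantities = std::map<std::string, double>;

// Hypothetical per-framework filter state kept by an allocator.
struct FrameworkFilterState {
  // Filters accumulated when the framework declined offers (modeled as agent IDs).
  std::vector<std::string> declineFilters;
  // The framework's current min_allocatable_resources filter.
  std::vector<ResourceQuantities> minAllocatableResources;
};

// Model of a revive call. If the call carries a new minimum, the accumulated
// decline filters and the current minimum are both replaced by it.
// Assumption: without a new minimum, only the decline filters are cleared.
void revive(FrameworkFilterState& state,
            const std::optional<std::vector<ResourceQuantities>>& newMinimum) {
  state.declineFilters.clear();
  if (newMinimum.has_value()) {
    state.minAllocatableResources = *newMinimum;
  }
}
```

A framework that wants to launch tasks needing 2 CPUs and 1024 MB would pass that shape on revive, so the model ends up with no decline filters and a single minimum describing the tasks it intends to launch.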
[jira] [Commented] (MESOS-9269) Mesos UCR with Docker only Works on Host
[ https://issues.apache.org/jira/browse/MESOS-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631310#comment-16631310 ]

Tim Harper commented on MESOS-9269:
---

https://jira.mesosphere.com/browse/MARATHON-8448 has some relevant details.

> Mesos UCR with Docker only Works on Host
>
>
> Key: MESOS-9269
> URL: https://issues.apache.org/jira/browse/MESOS-9269
> Project: Mesos
> Issue Type: Bug
> Components: agent, docker
> Affects Versions: 1.7.0
> Environment: Ubuntu 16.04
> Mesos 1.7.0
> Marathon 1.7.111
> Reporter: z s
> Priority: Major
>
> I'm having an issue setting up the `mesos-cni-port-mapper` to allow remote
> connectivity.
> When I `curl :` from the machine I get a response, but from a
> remote machine the `curl` connection times out. I'm not sure what's wrong with
> my route settings.
>
> */var/lib/mesos/cni/config/mesos-bridge.json*
>
> {code:java}
> {
>   "name" : "mesos-bridge",
>   "type" : "mesos-cni-port-mapper",
>   "excludeDevices" : ["mesos-cni0"],
>   "chain": "MESOS-BRIDGE-PORT-MAPPER",
>   "delegate": {
>     "type": "bridge",
>     "bridge": "mesos-cni0",
>     "isGateway": true,
>     "ipMasq": true,
>     "ipam": {
>       "type": "host-local",
>       "subnet": "10.1.0.0/16",
>       "routes": [
>         { "dst": "0.0.0.0/0" }
>       ]
>     }
>   }
> }
> {code}
>
> {code:java}
> $ route -n
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> 0.0.0.0         172.27.1.1      0.0.0.0         UG    0      0        0 ens3
> 10.1.0.0        0.0.0.0         255.255.0.0     U     0      0        0 mesos-cni0
> 172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
> 172.27.1.0      0.0.0.0         255.255.255.0   U     0      0        0 ens3
> {code}
> Any suggestions?
[jira] [Commented] (MESOS-9095) Consider including public protobuf definitions in generated jar
[ https://issues.apache.org/jira/browse/MESOS-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549695#comment-16549695 ]

Tim Harper commented on MESOS-9095:
---

Thank you for filing this, Benjamin. This will be really helpful. Currently, Marathon does what you describe: we copy the proto sources into our own code base and check in the generated code.

> Consider including public protobuf definitions in generated jar
> ---
>
> Key: MESOS-9095
> URL: https://issues.apache.org/jira/browse/MESOS-9095
> Project: Mesos
> Issue Type: Improvement
> Components: java api
> Reporter: Benjamin Bannier
> Priority: Major
>
> We currently do not package public proto sources alongside other resources in
> the jar. This is inconsistent with what we do, e.g., for packages or {{install
> rules}} on the C++ side.
> Frameworks seem to work around this by forking the required proto sources into
> their own source code, or (slightly better) fetching them from
> potentially poorly versioned internet resources. Both approaches can lead to
> complicated dependencies between the jar in use and the proto sources.
> We should include them in the jar we publish, e.g., by declaring them as
> {{resources}}.
[jira] [Created] (MESOS-8629) GLIBCXX_3.4.21 required for Mesos Debian Jessie package, not available
Tim Harper created MESOS-8629:
-

Summary: GLIBCXX_3.4.21 required for Mesos Debian Jessie package, not available
Key: MESOS-8629
URL: https://issues.apache.org/jira/browse/MESOS-8629
Project: Mesos
Issue Type: Choose from below ...
Reporter: Tim Harper

h1. Overview

Using the following Dockerfile, the Mesos package for Debian Jessie installs without error:

{code}
FROM debian:jessie

RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E56151BF && \
    echo "deb http://ftp.debian.org/debian jessie-backports main" >> /etc/apt/sources.list && \
    echo "deb http://repos.mesosphere.com/debian jessie-testing main" | tee -a /etc/apt/sources.list.d/mesosphere.list && \
    echo "deb http://repos.mesosphere.com/debian jessie main" | tee -a /etc/apt/sources.list.d/mesosphere.list && \
    apt-get update && \
    # this MUST be done first, unfortunately, because Mesos packages will create folders that should be symlinks and break the python install process
    apt-get install python2.7-minimal -y && \
    apt-get install -y openjdk-8-jdk-headless openjdk-8-jre-headless ca-certificates-java=20161107~bpo8+1 && \
    apt-get install --no-install-recommends -y --force-yes mesos=1.5.0-2.0.1 && \
    # disable mesos-master; we don't want to start in this image
    systemctl disable mesos-master && \
    systemctl disable mesos-slave && \
    # jdk setup
    /var/lib/dpkg/info/ca-certificates-java.postinst configure && \
    ln -svT "/usr/lib/jvm/java-8-openjdk-$(dpkg --print-architecture)" /docker-java-home && \
    # jq / curl
    apt-get install -y procps curl jq=1.5* && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

ENV JAVA_HOME /docker-java-home
ENTRYPOINT ["/sbin/init"]
{code}

However, when I run the container, Mesos will not launch:

{code}
docker run --name mesos-agent-local-a --rm --privileged --label marathon-package-test --cap-add SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro --entrypoint /bin/bash -it marathon-package-test:mesos
root@c17342b33218:/# /usr/sbin/mesos-master
/usr/sbin/mesos-master: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /usr/sbin/mesos-master)
/usr/sbin/mesos-master: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /usr/local/lib/libmesos-1.5.0.so)
{code}

Debian Jessie does not ship a version of libstdc++6 new enough to provide GLIBCXX_3.4.21. Even after updating to the latest available library version, the newest symbol version is GLIBCXX_3.4.20:

{code}
root@c17342b33218:/# strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX_3.4
GLIBCXX_3.4
GLIBCXX_3.4.1
GLIBCXX_3.4.2
GLIBCXX_3.4.3
GLIBCXX_3.4.4
GLIBCXX_3.4.5
GLIBCXX_3.4.6
GLIBCXX_3.4.7
GLIBCXX_3.4.8
GLIBCXX_3.4.9
GLIBCXX_3.4.10
GLIBCXX_3.4.11
GLIBCXX_3.4.12
GLIBCXX_3.4.13
GLIBCXX_3.4.14
GLIBCXX_3.4.15
GLIBCXX_3.4.16
GLIBCXX_3.4.17
GLIBCXX_3.4.18
GLIBCXX_3.4.19
GLIBCXX_3.4.20
{code}

h2. Potential solutions

If the Debian Mesos build is going to require {{GLIBCXX_3.4.21}}, then it seems that the package for Debian Jessie should be revoked. However, this does not seem reasonable, as Debian Jessie is still supported and its long-term support continues through 2020.

Otherwise, the package should be rebuilt with a more lenient GLIBCXX requirement.
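The check performed by hand above (listing the `GLIBCXX_*` tags and looking for 3.4.21) can be sketched as a small helper. This is only an illustration under stated assumptions: `parseGlibcxx`, `newestGlibcxx`, and `satisfies` are hypothetical names, the input is assumed to be the tag lines printed by `strings ... | grep GLIBCXX`, and the tags are assumed to be cumulative, so a library whose newest tag is at least the required one is treated as sufficient.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Parse a tag such as "GLIBCXX_3.4.21" into {3, 4, 21}.
// Returns an empty vector for anything that is not a purely numeric tag.
std::vector<int> parseGlibcxx(const std::string& tag) {
  const std::string prefix = "GLIBCXX_";
  if (tag.compare(0, prefix.size(), prefix) != 0) return {};
  std::vector<int> parts;
  std::istringstream in(tag.substr(prefix.size()));
  std::string field;
  while (std::getline(in, field, '.')) {
    const bool numeric = !field.empty() &&
        std::all_of(field.begin(), field.end(),
                    [](unsigned char c) { return std::isdigit(c) != 0; });
    if (!numeric) return {};
    parts.push_back(std::stoi(field));
  }
  return parts;
}

// Newest numeric tag in the list; vectors compare lexicographically,
// so {3, 4, 20} orders before {3, 4, 21}.
std::vector<int> newestGlibcxx(const std::vector<std::string>& tags) {
  std::vector<int> newest;
  for (const std::string& tag : tags) {
    newest = std::max(newest, parseGlibcxx(tag));
  }
  return newest;
}

// True if the newest available tag is at least the required one.
bool satisfies(const std::vector<std::string>& tags, const std::string& required) {
  const std::vector<int> need = parseGlibcxx(required);
  return !need.empty() && !(newestGlibcxx(tags) < need);
}
```

Fed the list from the report, `satisfies` is false for `GLIBCXX_3.4.21` and true for `GLIBCXX_3.4.20`, which matches the loader error shown above.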
[jira] [Commented] (MESOS-8150) Attributes documentation indicates that sets are valid attribute types; code disagrees
[ https://issues.apache.org/jira/browse/MESOS-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297269#comment-16297269 ]

Tim Harper commented on MESOS-8150:
---

After re-reading the documentation, it is clear that set attributes are not supported. Closing.

> Attributes documentation indicates that sets are valid attribute types; code
> disagrees
> --
>
> Key: MESOS-8150
> URL: https://issues.apache.org/jira/browse/MESOS-8150
> Project: Mesos
> Issue Type: Documentation
> Reporter: Tim Harper
> Priority: Minor
>
> On the [Mesos Attributes &
> Resources|http://mesos.apache.org/documentation/latest/attributes-resources/]
> page, it says:
> {quote}The types of values that are supported by Attributes and Resources in
> Mesos are scalar, ranges, sets and text.{quote}
> However, the code for 1.4.x disagrees. Sets are not supported for attribute
> types:
> https://github.com/apache/mesos/blob/1.4.0/src/common/attributes.cpp#L171
> https://github.com/apache/mesos/blob/1.4.0/src/common/attributes.cpp#L115-L128

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (MESOS-8150) Attributes documentation indicates that sets are valid attribute types; code disagrees
[ https://issues.apache.org/jira/browse/MESOS-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297265#comment-16297265 ] Tim Harper commented on MESOS-8150: --- Related: https://issues.apache.org/jira/browse/MESOS-8150
[jira] [Updated] (MESOS-8171) Using a failoverTimeout of 0 with Mesos native scheduler client can result in infinite subscribe loop
[ https://issues.apache.org/jira/browse/MESOS-8171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Harper updated MESOS-8171:
--

Description:

Over the past year, the Marathon team has been plagued by an issue that periodically hits our CI builds, in which the scheduler driver enters a tight loop, sending tens of thousands of SUBSCRIBE calls to the master per second. I turned on debug logging for the client and the server, and it pointed to an issue with the {{doReliableRegistration}} method in sched.cpp. Here are the logs:

{code}
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.099815 13397 process.cpp:1383] libprocess is initialized on 127.0.1.1:60957 with 8 worker threads
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.118237 13397 logging.cpp:199] Logging to STDERR
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.128921 13416 sched.cpp:232] Version: 1.4.0
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.151785 13791 group.cpp:341] Group process (zookeeper-group(1)@127.0.1.1:60957) connected to ZooKeeper
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.151823 13791 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.151837 13791 group.cpp:419] Trying to create path '/mesos' in ZooKeeper
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.152586 13791 group.cpp:758] Found non-sequence node 'log_replicas' at '/mesos' in ZooKeeper
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.152662 13791 detector.cpp:152] Detected a new leader: (id='0')
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.152762 13791 group.cpp:700] Trying to get '/mesos/json.info_00' in ZooKeeper
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157148 13791 zookeeper.cpp:262] A new leading master (UPID=master@172.16.10.95:32856) is detected
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157347 13787 sched.cpp:336] New master detected at master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157557 13787 sched.cpp:352] No credentials provided. Attempting to register without authentication
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157565 13787 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157635 13787 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.158979 13785 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159029 13785 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159265 13790 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159303 13790 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159479 13786 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159521 13786 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159622 13788 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159658 13788 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159749 13789 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159785 13789 sched.cpp:869] Will retry registration in 0ns if necessary
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159878 13792 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856
WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159916 13792 sched.cpp:869] Will retry registration in 0ns if necessary
{code}

In Marathon, when we run our tests, we set the failoverTimeout to 0 so that the Mesos master immediately forgets about a framework when it disconnects. On line 860 of sched.cpp, the retry delay is set to 1/10th of the failoverTimeout, which is the best explanation for why the value is 0:

{code}
./mesos/src/sched/sched.cpp
818 | void doReliableRegistration(Duration maxBackoff)
819 | {
[jira] [Updated] (MESOS-8171) Using a failoverTimeout of 0 with Mesos native scheduler client can result in infinite subscribe loop
[ https://issues.apache.org/jira/browse/MESOS-8171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Harper updated MESOS-8171: -- Affects Version/s: 1.1.3 1.2.2 1.3.1 > Using a failoverTimeout of 0 with Mesos native scheduler client can result in > infinite subscribe loop > - > > Key: MESOS-8171 > URL: https://issues.apache.org/jira/browse/MESOS-8171 > Project: Mesos > Issue Type: Bug > Components: c++ api, java api, scheduler driver >Affects Versions: 1.1.3, 1.2.2, 1.3.1, 1.4.0 >Reporter: Tim Harper >Priority: Minor
[jira] [Commented] (MESOS-8171) Using a failoverTimeout of 0 with Mesos native scheduler client can result in infinite subscribe loop
[ https://issues.apache.org/jira/browse/MESOS-8171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238837#comment-16238837 ] Tim Harper commented on MESOS-8171: --- It seems like perhaps an ideal solution would be to ignore 0? {code} if ( (duration.isSome()) && (duration.get() > Duration::zero() ) { ... } {code} > Using a failoverTimeout of 0 with Mesos native scheduler client can result in > infinite subscribe loop > - > > Key: MESOS-8171 > URL: https://issues.apache.org/jira/browse/MESOS-8171 > Project: Mesos > Issue Type: Bug > Components: c++ api, java api, scheduler driver >Affects Versions: 1.4.0 >Reporter: Tim Harper >Priority: Minor
[jira] [Comment Edited] (MESOS-8171) Using a failoverTimeout of 0 with Mesos native scheduler client can result in infinite subscribe loop
[ https://issues.apache.org/jira/browse/MESOS-8171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238837#comment-16238837 ] Tim Harper edited comment on MESOS-8171 at 11/4/17 7:50 AM: It seems like perhaps an ideal solution would be to ignore 0? {code} if ( duration.isSome() && (duration.get() > Duration::zero()) ) { ... } {code} was (Author: timcharper): It seems like perhaps an ideal solution would be to ignore 0? {code} if ( (duration.isSome()) && (duration.get() > Duration::zero() ) { ... } {code} > Using a failoverTimeout of 0 with Mesos native scheduler client can result in > infinite subscribe loop > - > > Key: MESOS-8171 > URL: https://issues.apache.org/jira/browse/MESOS-8171 > Project: Mesos > Issue Type: Bug > Components: c++ api, java api, scheduler driver >Affects Versions: 1.4.0 >Reporter: Tim Harper >Priority: Minor > > Over the past year, the Marathon team has been plagued with an issue that > hits our CI builds periodically in which the scheduler driver enters a tight > loop, sending 10,000s of SUBSCRIBE calls to the master per second. I turned > on debug logging for the client and the server, and it pointed to an issue > with the {{doReliableRegistration}} method in sched.cpp. 
Here's the logs: > {code} > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.099815 13397 process.cpp:1383] libprocess is initialized on > 127.0.1.1:60957 with 8 worker threads > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.118237 13397 logging.cpp:199] Logging to STDERR > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.128921 13416 sched.cpp:232] Version: 1.4.0 > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.151785 13791 group.cpp:341] Group process > (zookeeper-group(1)@127.0.1.1:60957) connected to ZooKeeper > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.151823 13791 group.cpp:831] Syncing group operations: queue size > (joins, cancels, datas) = (0, 0, 0) > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.151837 13791 group.cpp:419] Trying to create path '/mesos' in > ZooKeeper > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.152586 13791 group.cpp:758] Found non-sequence node 'log_replicas' > at '/mesos' in ZooKeeper > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.152662 13791 detector.cpp:152] Detected a new leader: (id='0') > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.152762 13791 group.cpp:700] Trying to get > '/mesos/json.info_00' in ZooKeeper > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.157148 13791 zookeeper.cpp:262] A new leading master > (UPID=master@172.16.10.95:32856) is detected > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.157347 13787 sched.cpp:336] New master detected at > master@172.16.10.95:32856 > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.157557 13787 sched.cpp:352] No credentials provided. 
Attempting to > register without authentication > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.157565 13787 sched.cpp:836] Sending SUBSCRIBE call to > master@172.16.10.95:32856 > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.157635 13787 sched.cpp:869] Will retry registration in 0ns if > necessary > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.158979 13785 sched.cpp:836] Sending SUBSCRIBE call to > master@172.16.10.95:32856 > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.159029 13785 sched.cpp:869] Will retry registration in 0ns if > necessary > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.159265 13790 sched.cpp:836] Sending SUBSCRIBE call to > master@172.16.10.95:32856 > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.159303 13790 sched.cpp:869] Will retry registration in 0ns if > necessary > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.159479 13786 sched.cpp:836] Sending SUBSCRIBE call to > master@172.16.10.95:32856 > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.159521 13786 sched.cpp:869] Will retry registration in 0ns if > necessary > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.159622 13788 sched.cpp:836] Sending SUBSCRIBE call to > master@172.16.10.95:32856 > WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 > 05:39:39.159658 13788
[jira] [Updated] (MESOS-8171) Using a failoverTimeout of 0 with Mesos native scheduler client can result in infinite subscribe loop
[ https://issues.apache.org/jira/browse/MESOS-8171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Harper updated MESOS-8171: -- Description: Over the past year, the Marathon team has been plagued with an issue that hits our CI builds periodically in which the scheduler driver enters a tight loop, sending 10,000s of SUBSCRIBE calls to the master per second. I turned on debug logging for the client and the server, and it pointed to an issue with the {{doReliableRegistration}} method in sched.cpp. Here's the logs: {code} WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.099815 13397 process.cpp:1383] libprocess is initialized on 127.0.1.1:60957 with 8 worker threads WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.118237 13397 logging.cpp:199] Logging to STDERR WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.128921 13416 sched.cpp:232] Version: 1.4.0 WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.151785 13791 group.cpp:341] Group process (zookeeper-group(1)@127.0.1.1:60957) connected to ZooKeeper WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.151823 13791 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0) WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.151837 13791 group.cpp:419] Trying to create path '/mesos' in ZooKeeper WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.152586 13791 group.cpp:758] Found non-sequence node 'log_replicas' at '/mesos' in ZooKeeper WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.152662 13791 detector.cpp:152] Detected a new leader: (id='0') WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.152762 13791 group.cpp:700] Trying to get '/mesos/json.info_00' in ZooKeeper WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157148 13791 
zookeeper.cpp:262] A new leading master (UPID=master@172.16.10.95:32856) is detected WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157347 13787 sched.cpp:336] New master detected at master@172.16.10.95:32856 WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157557 13787 sched.cpp:352] No credentials provided. Attempting to register without authentication WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157565 13787 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856 WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157635 13787 sched.cpp:869] Will retry registration in 0ns if necessary WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.158979 13785 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856 WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159029 13785 sched.cpp:869] Will retry registration in 0ns if necessary WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159265 13790 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856 WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159303 13790 sched.cpp:869] Will retry registration in 0ns if necessary WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159479 13786 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856 WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159521 13786 sched.cpp:869] Will retry registration in 0ns if necessary WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159622 13788 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856 WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159658 13788 sched.cpp:869] Will retry registration in 0ns if necessary WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159749 13789 sched.cpp:836] 
Sending SUBSCRIBE call to master@172.16.10.95:32856 WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159785 13789 sched.cpp:869] Will retry registration in 0ns if necessary WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159878 13792 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856 WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159916 13792 sched.cpp:869] Will retry registration in 0ns if necessary {code} In Marathon, when we are running our tests, we set the failoverTimeout to 0 in order to cause the Mesos master to immediately forget about a framework when it disconnects. On line 860 of sched.cpp, the retry-delay is set to 1/10th the failoverTimeout, which provides the best explanation for why the value is 0: {code} /Users/tim/src/m8e/mesos/src/sched/sched.cpp 818 | void doReliableRegistration(Duration
[jira] [Created] (MESOS-8171) Using a failoverTimeout of 0 with Mesos native scheduler client can result in infinite subscribe loop
Tim Harper created MESOS-8171: - Summary: Using a failoverTimeout of 0 with Mesos native scheduler client can result in infinite subscribe loop Key: MESOS-8171 URL: https://issues.apache.org/jira/browse/MESOS-8171 Project: Mesos Issue Type: Bug Components: c++ api, java api, scheduler driver Affects Versions: 1.4.0 Reporter: Tim Harper Priority: Minor Over the past year, the Marathon team has been plagued with an issue that hits our CI builds periodically in which the scheduler driver enters a tight loop, sending 10,000s of SUBSCRIBE calls to the master per second. I turned on debug logging for the client and the server, and it pointed to an issue with the {{doReliableRegistration}} method in sched.cpp. Here's the logs: {code} WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.099815 13397 process.cpp:1383] libprocess is initialized on 127.0.1.1:60957 with 8 worker threads WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.118237 13397 logging.cpp:199] Logging to STDERR WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.128921 13416 sched.cpp:232] Version: 1.4.0 WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.151785 13791 group.cpp:341] Group process (zookeeper-group(1)@127.0.1.1:60957) connected to ZooKeeper WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.151823 13791 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0) WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.151837 13791 group.cpp:419] Trying to create path '/mesos' in ZooKeeper WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.152586 13791 group.cpp:758] Found non-sequence node 'log_replicas' at '/mesos' in ZooKeeper WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.152662 13791 detector.cpp:152] Detected a new leader: (id='0') WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] 
I1104 05:39:39.152762 13791 group.cpp:700] Trying to get '/mesos/json.info_00' in ZooKeeper WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157148 13791 zookeeper.cpp:262] A new leading master (UPID=master@172.16.10.95:32856) is detected WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157347 13787 sched.cpp:336] New master detected at master@172.16.10.95:32856 WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157557 13787 sched.cpp:352] No credentials provided. Attempting to register without authentication WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157565 13787 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856 WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.157635 13787 sched.cpp:869] Will retry registration in 0ns if necessary WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.158979 13785 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856 WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159029 13785 sched.cpp:869] Will retry registration in 0ns if necessary WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159265 13790 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856 WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159303 13790 sched.cpp:869] Will retry registration in 0ns if necessary WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159479 13786 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856 WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159521 13786 sched.cpp:869] Will retry registration in 0ns if necessary WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159622 13788 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856 WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 
05:39:39.159658 13788 sched.cpp:869] Will retry registration in 0ns if necessary WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159749 13789 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856 WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159785 13789 sched.cpp:869] Will retry registration in 0ns if necessary WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159878 13792 sched.cpp:836] Sending SUBSCRIBE call to master@172.16.10.95:32856 WARN [05:39:39 EventsIntegrationTest-LocalMarathon-32858] I1104 05:39:39.159916 13792 sched.cpp:869] Will retry registration in 0ns if necessary {code} In Marathon, when we are running our tests, we set the failoverTimeout to 0 in order to cause the Mesos master to immediately
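The updated description of MESOS-8171 attributes the "Will retry registration in 0ns" lines to the retry delay being set to 1/10th of the failoverTimeout, and the comment on the issue suggests ignoring a timeout of 0. A minimal sketch of that guard follows. It is not the Mesos implementation: `retryDelay` and `fallback` are hypothetical names, and `std::chrono` with `std::optional` stands in for Mesos's own duration and option types.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

using std::chrono::nanoseconds;

// Hypothetical helper (not part of Mesos): derive the registration retry
// delay from an optional failover timeout.
nanoseconds retryDelay(
    const std::optional<nanoseconds>& failoverTimeout,
    nanoseconds fallback)
{
  // "Ignore 0": only derive the delay from a timeout that is set and positive.
  if (failoverTimeout.has_value() && *failoverTimeout > nanoseconds::zero()) {
    // One tenth of the failover timeout, as described in the issue.
    const nanoseconds derived = *failoverTimeout / 10;

    // Integer division can still round a very small timeout down to zero,
    // so fall through to the fallback in that case as well.
    if (derived > nanoseconds::zero()) {
      return derived;
    }
  }

  return fallback;
}
```

Under these assumptions, a failoverTimeout of 0 (as set in the Marathon tests) yields the fallback delay rather than 0ns, so the retry timer would not fire immediately.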
[jira] [Created] (MESOS-8150) Attributes documentation indicates that sets are valid attribute types; code disagrees
Tim Harper created MESOS-8150: - Summary: Attributes documentation indicates that sets are valid attribute types; code disagrees Key: MESOS-8150 URL: https://issues.apache.org/jira/browse/MESOS-8150 Project: Mesos Issue Type: Documentation Reporter: Tim Harper Priority: Minor On the [Mesos Attributes & Resources|http://mesos.apache.org/documentation/latest/attributes-resources/] page, it says: {quote}The types of values that are supported by Attributes and Resources in Mesos are scalar, ranges, sets and text.{quote} However, the code for 1.4.x disagrees. Sets are not supported for attribute types: https://github.com/apache/mesos/blob/1.4.0/src/common/attributes.cpp#L171 https://github.com/apache/mesos/blob/1.4.0/src/common/attributes.cpp#L115-L128 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8148) Enforce text attribute value specification for zone and region values
[ https://issues.apache.org/jira/browse/MESOS-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Harper updated MESOS-8148:
--
Description:
Mesos has a specification for characters allowed by attribute values: http://mesos.apache.org/documentation/latest/attributes-resources/

The specification is as follows:
{code}
scalar : floatValue
floatValue : ( intValue ( "." intValue )? ) | ...
intValue : [0-9]+
range : "[" rangeValue ( "," rangeValue )* "]"
rangeValue : scalar "-" scalar
set : "{" text ( "," text )* "}"
text : [a-zA-Z0-9_/.-]
{code}
Marathon is [implementing IN and IS constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub], and plans to support further attribute types as it makes sense to do so (e.g. {{a,b IS b,a}}, {{5 IN [0-10]}}). In order to do this, Marathon has adopted the Mesos attribute value specification and will enforce it in the validation layer. As an example, it will be possible to write things like:
{code:java}
"constraints": [
  ["attribute", "IN", "{value-a,value-b,value-c}"]
]
{code}
Additionally, Marathon allows one to specify constraints on non-attribute properties, such as region, hostname, or zone. If somebody specified a zone value with a comma, then the user would not be able to use the Mesos set value type specification to describe a set of zones in which an app should be deployed, and, as a consequence, this would result in additional complexity (i.e., Marathon would need to implement an escaping mechanism for this case).

Ideally, the character space is confined to begin with. If the text type specification is sufficient, then it seems simpler to re-use it rather than create another one.
was: Mesos has a specification for characters allowed by attribute values: http://mesos.apache.org/documentation/latest/attributes-resources/ Marathon is [implementing IN and IS constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub], and includes plans to support further attribute types as it makes sense to do so (IE {{a,b IS b,a}}, {{5 IN [0-10]}}). In order to do this, Marathon has adopted the Mesos attribute value specification and will enforce it in the validation layer. As an example, it will be possible to write things like: {code:java} "constraints": [ ["attribute", "IN", "{value-a,value-b,value-c}"] ] {code} Additionally, Marathon allows one to specify constraints on non-attribute properties, such as region, hostname, or zone. If somebody specified a zone value with a comma, then the user would not be able to use the Mesos set value type specification to describe a set of zones in which an app should be deployed, and, as a consequence, would result in additional complexity (IE: Marathon would need to implement an escaping mechanism for this case). Ideally, the character space is confined to begin with. It the text type specification is sufficient, then, it seems simpler to re-use it rather than create another one. > Enforce text attribute value specification for zone and region values > - > > Key: MESOS-8148 > URL: https://issues.apache.org/jira/browse/MESOS-8148 > Project: Mesos > Issue Type: Improvement >Reporter: Tim Harper > > Mesos has a specification for characters allowed by attribute values: > http://mesos.apache.org/documentation/latest/attributes-resources/ > The specification is as follows: > {code} > scalar : floatValue > floatValue : ( intValue ( "." intValue )? ) | ... 
> intValue : [0-9]+ > range : "[" rangeValue ( "," rangeValue )* "]" > rangeValue : scalar "-" scalar > set : "{" text ( "," text )* "}" > text : [a-zA-Z0-9_/.-] > {code} > Marathon is [implementing IN and IS > constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub], > and includes plans to support further attribute types as it makes sense to > do so (IE {{a,b IS b,a}}, {{5 IN [0-10]}}). In order > to do this, Marathon has adopted the Mesos attribute value specification and > will enforce it in the validation layer. As an example, it will be possible > to write things like: > {code:java} > "constraints": [ > ["attribute", "IN", "{value-a,value-b,value-c}"] > ] > {code} > Additionally, Marathon allows one to specify constraints on non-attribute > properties, such as region, hostname, or zone. If somebody specified a zone > value with a comma, then the user would not be able to use the Mesos set > value type specification to describe a set of zones in which an app should be > deployed, and, as a consequence, would result in additional complexity (IE: > Marathon would need to implement an
[jira] [Updated] (MESOS-8148) Enforce text attribute value specification for zone and region values
[ https://issues.apache.org/jira/browse/MESOS-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Harper updated MESOS-8148: -- Description: Mesos has a specification for characters allowed by attribute values: http://mesos.apache.org/documentation/latest/attributes-resources/ Marathon is [implementing IN and IS constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub], and includes plans to support further attribute types as it makes sense to do so (IE {{a,b IS b,a}}, {{5 IN [0-10]}}). In order to do this, Marathon has adopted the Mesos attribute value specification and will enforce it in the validation layer. As an example, it will be possible to write things like: {code:java} "constraints": [ ["attribute", "IN", "{value-a,value-b,value-c}"] ] {code} Additionally, Marathon allows one to specify constraints on non-attribute properties, such as region, hostname, or zone. If somebody specified a zone value with a comma, then the user would not be able to use the Mesos set value type specification to describe a set of zones in which an app should be deployed, and, as a consequence, would result in additional complexity (IE: Marathon would need to implement an escaping mechanism for this case). Ideally, the character space is confined to begin with. It the text type specification is sufficient, then, it seems simpler to re-use it rather than create another one. was: Mesos has a specification for characters allowed by attribute values: http://mesos.apache.org/documentation/latest/attributes-resources/ Marathon is [implementing IN and IS constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub], and includes plans to support further attribute types as it makes sense to do so (IE {{a,b IS b,a}}, {{5 IN [0-10]}}). 
In order to do this, Marathon has adopted the Mesos attribute value specification and will enforce it in the validation layer. As an example, it will be possible to write things like: {code:java} "constraints": [ ["attribute", "IN", "{value-a,value-b,value-c}"] ] {code} Additionally, Marathon allows one to specify constraints on non-attribute properties, such as region, hostname, or zone. If somebody specified a zone value with a comma, then the user would not be able to uses the Mesos set value type specification to describe a set of zones in which an app would be deployed, and, would result in additional complexity (IE: Marathon would need to implement an escaping mechanism for this case). Ideally, the character space is confined to begin with. It the text type specification is sufficient, then, it seems simpler to re-use it rather than create another one. > Enforce text attribute value specification for zone and region values > - > > Key: MESOS-8148 > URL: https://issues.apache.org/jira/browse/MESOS-8148 > Project: Mesos > Issue Type: Improvement >Reporter: Tim Harper > > Mesos has a specification for characters allowed by attribute values: > http://mesos.apache.org/documentation/latest/attributes-resources/ > Marathon is [implementing IN and IS > constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub], > and includes plans to support further attribute types as it makes sense to > do so (IE {{a,b IS b,a}}, {{5 IN [0-10]}}). In order > to do this, Marathon has adopted the Mesos attribute value specification and > will enforce it in the validation layer. As an example, it will be possible > to write things like: > {code:java} > "constraints": [ > ["attribute", "IN", "{value-a,value-b,value-c}"] > ] > {code} > Additionally, Marathon allows one to specify constraints on non-attribute > properties, such as region, hostname, or zone. 
If somebody specified a zone > value with a comma, then the user would not be able to use the Mesos set > value type specification to describe a set of zones in which an app should be > deployed, and, as a consequence, this would result in additional complexity (IE: > Marathon would need to implement an escaping mechanism for this case). > Ideally, the character space is confined to begin with. If the text type > specification is sufficient, then, it seems simpler to re-use it rather than > create another one.
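The concern about commas in zone values can be illustrated with a small validator for the `text` and `set` rules of the attribute value specification referenced in the description (`text : [a-zA-Z0-9_/.-]`, `set : "{" text ( "," text )* "}"`). This is a sketch under stated assumptions, not Mesos or Marathon code: `isValidText` and `parseSet` are hypothetical names, and `text` is assumed to mean one or more of the listed characters.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: true if `value` consists only of characters allowed
// by the `text` rule. Assumes "one or more" characters.
bool isValidText(const std::string& value)
{
  static const std::regex pattern("[a-zA-Z0-9_/.-]+");
  return std::regex_match(value, pattern);
}

// Hypothetical helper: parse "{a,b,c}" into its elements following the
// `set` rule. Returns std::nullopt if the braces are missing or any element
// is not valid `text`.
std::optional<std::vector<std::string>> parseSet(const std::string& value)
{
  if (value.size() < 2 || value.front() != '{' || value.back() != '}') {
    return std::nullopt;
  }

  const std::string body = value.substr(1, value.size() - 2);
  std::vector<std::string> elements;
  std::size_t start = 0;

  while (true) {
    // Split on commas; there is no escaping mechanism in the grammar.
    const std::size_t comma = body.find(',', start);
    const std::string element = body.substr(
        start, comma == std::string::npos ? std::string::npos : comma - start);

    if (!isValidText(element)) {
      return std::nullopt;
    }
    elements.push_back(element);

    if (comma == std::string::npos) {
      break;
    }
    start = comma + 1;
  }

  return elements;
}
```

With this sketch, a zone named `us-east,1a` is rejected by `isValidText`, and `{us-east,1a}` parses as two separate elements, which is the ambiguity the issue wants to rule out by confining zone and region values to the `text` character set.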
[jira] [Updated] (MESOS-8148) Enforce text attribute value specification for zone and region values
[ https://issues.apache.org/jira/browse/MESOS-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Harper updated MESOS-8148: -- Description: Mesos has a specification for characters allowed by attribute values: http://mesos.apache.org/documentation/latest/attributes-resources/ Marathon is [implementing IN and IS constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub], and includes plans to support further attribute types as it makes sense to do so (IE {{a,b IS b,a}}, {{5 IN [0-10]}}). In order to do this, Marathon has adopted the Mesos attribute value specification and will enforce it in the validation layer. As an example, it will be possible to write things like: {code:java} "constraints": [ ["attribute", "IN", "{value-a,value-b,value-c}"] ] {code} Additionally, Marathon allows one to specify constraints on non-attribute properties, such as region, hostname, or zone. If somebody specified a zone value with a comma, then the user would not be able to uses the Mesos set value type specification to describe a set of zones in which an app would be deployed, and, would result in additional complexity (IE: Marathon would need to implement an escaping mechanism for this case). Ideally, the character space is confined to begin with. It the text type specification is sufficient, then, it seems simpler to re-use it rather than create another one. was: Mesos has a specification for characters allowed by attribute values: http://mesos.apache.org/documentation/latest/attributes-resources/ Marathon is [implementing IN and IS constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub], and includes plans to support further attribute types as it makes sense to do so (IE {{\{a,b\} IS \{b,a\} }}, 5 IN [0-10]). 
In order to do this, Marathon has adopted the Mesos attribute value specification and will enforce it in the validation layer. As an example, it will be possible to write things like: {code:java} "constraints": [ ["attribute", "IN", "{value-a,value-b,value-c}"] ] {code} Additionally, Marathon allows one to specify constraints on non-attribute properties, such as region, hostname, or zone. If somebody specified a zone value with a comma, then the user would not be able to uses the Mesos set value type specification to describe a set of zones in which an app would be deployed, and, would result in additional complexity (IE: Marathon would need to implement an escaping mechanism for this case). Ideally, the character space is confined to begin with. It the text type specification is sufficient, then, it seems simpler to re-use it rather than create another one. > Enforce text attribute value specification for zone and region values > - > > Key: MESOS-8148 > URL: https://issues.apache.org/jira/browse/MESOS-8148 > Project: Mesos > Issue Type: Improvement >Reporter: Tim Harper > > Mesos has a specification for characters allowed by attribute values: > http://mesos.apache.org/documentation/latest/attributes-resources/ > Marathon is [implementing IN and IS > constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub], > and includes plans to support further attribute types as it makes sense to > do so (IE {{a,b IS b,a}}, {{5 IN [0-10]}}). In order > to do this, Marathon has adopted the Mesos attribute value specification and > will enforce it in the validation layer. As an example, it will be possible > to write things like: > {code:java} > "constraints": [ > ["attribute", "IN", "{value-a,value-b,value-c}"] > ] > {code} > Additionally, Marathon allows one to specify constraints on non-attribute > properties, such as region, hostname, or zone. 
If somebody specified a zone > value with a comma, then the user would not be able to use the Mesos set > value type specification to describe a set of zones in which an app would be > deployed, and this would result in additional complexity (IE: Marathon would need > to implement an escaping mechanism for this case). > Ideally, the character space is confined to begin with. If the text type > specification is sufficient, then, it seems simpler to re-use it rather than > create another one.
[jira] [Updated] (MESOS-8148) Enforce text attribute value specification for zone and region values
[ https://issues.apache.org/jira/browse/MESOS-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Harper updated MESOS-8148: -- Description: Mesos has a specification for characters allowed by attribute values: http://mesos.apache.org/documentation/latest/attributes-resources/ Marathon is [implementing IN and IS constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub], and includes plans to support further attribute types as it makes sense to do so (e.g. {{\{a,b\} IS \{b,a\} }}, 5 IN [0-10]). In order to do this, Marathon has adopted the Mesos attribute value specification and will enforce it in the validation layer. As an example, it will be possible to write things like: {code:java} "constraints": [ ["attribute", "IN", "{value-a,value-b,value-c}"] ] {code} Additionally, Marathon allows one to specify constraints on non-attribute properties, such as region, hostname, or zone. If somebody specified a zone value with a comma, then the user would not be able to use the Mesos set value type specification to describe a set of zones in which an app would be deployed, and this would result in additional complexity (i.e., Marathon would need to implement an escaping mechanism for this case). Ideally, the character space is confined to begin with. If the text type specification is sufficient, then it seems simpler to re-use it rather than create another one. was: Mesos has a specification for characters allowed by attribute values: http://mesos.apache.org/documentation/latest/attributes-resources/ Marathon is [implementing IN and IS constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub], and includes plans to support further attribute types as it makes sense to do so (e.g. {a,b} IS {b,a}, 5 IN [0-10]). 
In order to do this, Marathon has adopted the Mesos attribute value specification and will enforce it in the validation layer. As an example, it will be possible to write things like: {code:java} "constraints": [ ["attribute", "IN", "{value-a,value-b,value-c}"] ] {code} Additionally, Marathon allows one to specify constraints on non-attribute properties, such as region, hostname, or zone. If somebody specified a zone value with a comma, then the user would not be able to use the Mesos set value type specification to describe a set of zones in which an app would be deployed, and this would result in additional complexity (i.e., Marathon would need to implement an escaping mechanism for this case). Ideally, the character space is confined to begin with. If the text type specification is sufficient, then it seems simpler to re-use it rather than create another one. > Enforce text attribute value specification for zone and region values > - > > Key: MESOS-8148 > URL: https://issues.apache.org/jira/browse/MESOS-8148 > Project: Mesos > Issue Type: Improvement >Reporter: Tim Harper > > Mesos has a specification for characters allowed by attribute values: > http://mesos.apache.org/documentation/latest/attributes-resources/ > Marathon is [implementing IN and IS > constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub], > and includes plans to support further attribute types as it makes sense to > do so (e.g. {{\{a,b\} IS \{b,a\} }}, 5 IN [0-10]). In order to do this, > Marathon has adopted the Mesos attribute value specification and will enforce > it in the validation layer. As an example, it will be possible to write > things like: > {code:java} > "constraints": [ > ["attribute", "IN", "{value-a,value-b,value-c}"] > ] > {code} > Additionally, Marathon allows one to specify constraints on non-attribute > properties, such as region, hostname, or zone. 
If somebody specified a zone > value with a comma, then the user would not be able to use the Mesos set > value type specification to describe a set of zones in which an app would be > deployed, and this would result in additional complexity (i.e., Marathon would need > to implement an escaping mechanism for this case). > Ideally, the character space is confined to begin with. If the text type > specification is sufficient, then it seems simpler to re-use it rather than > create another one. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-8148) Enforce text attribute value specification for zone and region values
Tim Harper created MESOS-8148: - Summary: Enforce text attribute value specification for zone and region values Key: MESOS-8148 URL: https://issues.apache.org/jira/browse/MESOS-8148 Project: Mesos Issue Type: Improvement Reporter: Tim Harper Mesos has a specification for characters allowed by attribute values: http://mesos.apache.org/documentation/latest/attributes-resources/ Marathon is [implementing IN and IS constraints|https://docs.google.com/document/d/e/2PACX-1vSFvPol0pcHC2Web7EaNU0oSDS5wrOWSgFcmuslYBtISV2NB2JZ_D-B4wpWy_Vutaf08m2LX6WZVy6s/pub], and includes plans to support further attribute types as it makes sense to do so (e.g. {a,b} IS {b,a}, 5 IN [0-10]). In order to do this, Marathon has adopted the Mesos attribute value specification and will enforce it in the validation layer. As an example, it will be possible to write things like: {code:java} "constraints": [ ["attribute", "IN", "{value-a,value-b,value-c}"] ] {code} Additionally, Marathon allows one to specify constraints on non-attribute properties, such as region, hostname, or zone. If somebody specified a zone value with a comma, then the user would not be able to use the Mesos set value type specification to describe a set of zones in which an app would be deployed, and this would result in additional complexity (i.e., Marathon would need to implement an escaping mechanism for this case). Ideally, the character space is confined to begin with. If the text type specification is sufficient, then it seems simpler to re-use it rather than create another one. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
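The restriction proposed in this ticket can be illustrated with a small validator. This is only a minimal sketch in Python, not Marathon's or Mesos's actual validation code: it assumes the text character class is `[a-zA-Z0-9_/.-]`, as given in the attributes-resources documentation linked above, and the helper names `is_valid_text` and `parse_set` are hypothetical.

```python
import re

# Assumed character class for Mesos "text" values, taken from the
# attributes-resources documentation linked in the ticket.
TEXT_RE = re.compile(r"[a-zA-Z0-9_/.-]+")


def is_valid_text(value):
    """Return True if value consists only of allowed text characters."""
    return TEXT_RE.fullmatch(value) is not None


def parse_set(value):
    """Parse a set literal such as "{a,b,c}" into a Python set of text values.

    Raises ValueError if the braces are missing or an element is not valid text.
    """
    if not (value.startswith("{") and value.endswith("}")):
        raise ValueError("set value must be wrapped in braces: %r" % value)
    elements = value[1:-1].split(",")
    for element in elements:
        if not is_valid_text(element):
            raise ValueError("invalid text element: %r" % element)
    return set(elements)
```

Under these assumptions a zone named `us-east-1a` is accepted, while a zone named `zone,a` is rejected. That is the ambiguity the ticket describes: inside a set literal, `{zone,a}` would be read as two separate elements unless an escaping mechanism were added.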
[jira] [Issue Comment Deleted] (MESOS-5368) Consider introducing persistent agent ID
[ https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Harper updated MESOS-5368: -- Comment: was deleted (was: In a chat with Greg Mann, I understand a patch for this has landed in Mesos 1.4.x, which I believe is commit {{cd6495e677ec74fd3f40b0dbf3b9654475308575}} As such, it seems this ticket should be updated to have a fix version of 1.4.0, and be marked as complete.) > Consider introducing persistent agent ID > > > Key: MESOS-5368 > URL: https://issues.apache.org/jira/browse/MESOS-5368 > Project: Mesos > Issue Type: Improvement >Affects Versions: 1.2.1, 1.3.0 >Reporter: Neil Conway > Labels: mesosphere > > Currently, agent IDs identify a single "session" by an agent: that is, an > agent receives an agent ID when it registers with the master; it reuses that > agent ID if it disconnects and successfully reregisters; if the agent shuts > down and restarts, it registers anew and receives a new agent ID. > It would be convenient to have a "persistent agent ID" that remains the same > for the duration of a given agent {{work_dir}}. This would mean that a given > persistent volume would not migrate between different persistent agent IDs > over time, for example (see MESOS-4894). If we supported permanently removing > an agent from the cluster (i.e., the {{work_dir}} and any volumes used by the > agent will never be reused), we could use the persistent agent ID to report > which agent has been removed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Issue Comment Deleted] (MESOS-5368) Consider introducing persistent agent ID
[ https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Harper updated MESOS-5368: -- Comment: was deleted (was: One particular pain of not having this feature is it takes Mesos longer than necessary to recognize that a task is definitely gone. Were we to have persistent agent IDs, then, when the agent re-registered, it could tell Mesos, "I was asked to launch that task, and yes, it is definitely dead", whereas right now it is left in the unreachable state until Mesos gives up on the agent.) > Consider introducing persistent agent ID > > > Key: MESOS-5368 > URL: https://issues.apache.org/jira/browse/MESOS-5368 > Project: Mesos > Issue Type: Improvement >Affects Versions: 1.2.1, 1.3.0 >Reporter: Neil Conway > Labels: mesosphere > > Currently, agent IDs identify a single "session" by an agent: that is, an > agent receives an agent ID when it registers with the master; it reuses that > agent ID if it disconnects and successfully reregisters; if the agent shuts > down and restarts, it registers anew and receives a new agent ID. > It would be convenient to have a "persistent agent ID" that remains the same > for the duration of a given agent {{work_dir}}. This would mean that a given > persistent volume would not migrate between different persistent agent IDs > over time, for example (see MESOS-4894). If we supported permanently removing > an agent from the cluster (i.e., the {{work_dir}} and any volumes used by the > agent will never be reused), we could use the persistent agent ID to report > which agent has been removed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-5368) Consider introducing persistent agent ID
[ https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102095#comment-16102095 ] Tim Harper commented on MESOS-5368: --- With a second reading, this ticket seems like a duplicate of MESOS-6223, and seems like it could be closed as such. MESOS-6223 has an up-to-date status. > Consider introducing persistent agent ID > > > Key: MESOS-5368 > URL: https://issues.apache.org/jira/browse/MESOS-5368 > Project: Mesos > Issue Type: Improvement >Affects Versions: 1.2.1, 1.3.0 >Reporter: Neil Conway > Labels: mesosphere > > Currently, agent IDs identify a single "session" by an agent: that is, an > agent receives an agent ID when it registers with the master; it reuses that > agent ID if it disconnects and successfully reregisters; if the agent shuts > down and restarts, it registers anew and receives a new agent ID. > It would be convenient to have a "persistent agent ID" that remains the same > for the duration of a given agent {{work_dir}}. This would mean that a given > persistent volume would not migrate between different persistent agent IDs > over time, for example (see MESOS-4894). If we supported permanently removing > an agent from the cluster (i.e., the {{work_dir}} and any volumes used by the > agent will never be reused), we could use the persistent agent ID to report > which agent has been removed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-5368) Consider introducing persistent agent ID
[ https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102054#comment-16102054 ] Tim Harper commented on MESOS-5368: --- In a chat with Greg Mann, I understand a patch for this has landed in Mesos 1.4.x, which I believe is commit {{cd6495e677ec74fd3f40b0dbf3b9654475308575}} As such, it seems this ticket should be updated to have a fix version of 1.4.0, and be marked as complete. > Consider introducing persistent agent ID > > > Key: MESOS-5368 > URL: https://issues.apache.org/jira/browse/MESOS-5368 > Project: Mesos > Issue Type: Improvement >Affects Versions: 1.2.1, 1.3.0 >Reporter: Neil Conway > Labels: mesosphere > > Currently, agent IDs identify a single "session" by an agent: that is, an > agent receives an agent ID when it registers with the master; it reuses that > agent ID if it disconnects and successfully reregisters; if the agent shuts > down and restarts, it registers anew and receives a new agent ID. > It would be convenient to have a "persistent agent ID" that remains the same > for the duration of a given agent {{work_dir}}. This would mean that a given > persistent volume would not migrate between different persistent agent IDs > over time, for example (see MESOS-4894). If we supported permanently removing > an agent from the cluster (i.e., the {{work_dir}} and any volumes used by the > agent will never be reused), we could use the persistent agent ID to report > which agent has been removed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (MESOS-5368) Consider introducing persistent agent ID
[ https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102011#comment-16102011 ] Tim Harper edited comment on MESOS-5368 at 7/26/17 5:53 PM: One particular pain of not having this feature is it takes Mesos longer than necessary to recognize that a task is definitely gone. Were we to have persistent agent IDs, then, when the agent re-registered, it could tell Mesos, "I was asked to launch that task, and yes, it is definitely dead", whereas right now it is left in the unreachable state until Mesos gives up on the agent. was (Author: timcharper): One particular pain of not having this feature is it takes Mesos longer than necessary to recognize that a task is definitely gone. Were we to have persistent agent IDs, then, when the agent re-registered, it could tell Mesos, "yes, that task is definitely dead", whereas right now it is left in the unreachable state until Mesos gives up on the agent. > Consider introducing persistent agent ID > > > Key: MESOS-5368 > URL: https://issues.apache.org/jira/browse/MESOS-5368 > Project: Mesos > Issue Type: Improvement >Affects Versions: 1.2.1, 1.3.0 >Reporter: Neil Conway > Labels: mesosphere > > Currently, agent IDs identify a single "session" by an agent: that is, an > agent receives an agent ID when it registers with the master; it reuses that > agent ID if it disconnects and successfully reregisters; if the agent shuts > down and restarts, it registers anew and receives a new agent ID. > It would be convenient to have a "persistent agent ID" that remains the same > for the duration of a given agent {{work_dir}}. This would mean that a given > persistent volume would not migrate between different persistent agent IDs > over time, for example (see MESOS-4894). 
If we supported permanently removing > an agent from the cluster (i.e., the {{work_dir}} and any volumes used by the > agent will never be reused), we could use the persistent agent ID to report > which agent has been removed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (MESOS-5368) Consider introducing persistent agent ID
[ https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102011#comment-16102011 ] Tim Harper edited comment on MESOS-5368 at 7/26/17 5:52 PM: One particular pain of not having this feature is it takes Mesos longer than necessary to recognize that a task is definitely gone. Were we to have persistent agent IDs, then, when the agent re-registered, it could tell Mesos, "yes, that task is definitely dead", whereas right now it is left in the unreachable state until Mesos gives up on the agent. was (Author: timcharper): One particular pain of not having this feature is it takes Mesos longer than necessary to recognize that a task is definitely gone. Were we to have persistent agent IDs, then, when the agent re-registered, it could tell Mesos, "yes, that task is definitely dead", whereas right now it is left perpetually in the unreachable state until Mesos gives up on the agent. > Consider introducing persistent agent ID > > > Key: MESOS-5368 > URL: https://issues.apache.org/jira/browse/MESOS-5368 > Project: Mesos > Issue Type: Improvement >Affects Versions: 1.2.1, 1.3.0 >Reporter: Neil Conway > Labels: mesosphere > > Currently, agent IDs identify a single "session" by an agent: that is, an > agent receives an agent ID when it registers with the master; it reuses that > agent ID if it disconnects and successfully reregisters; if the agent shuts > down and restarts, it registers anew and receives a new agent ID. > It would be convenient to have a "persistent agent ID" that remains the same > for the duration of a given agent {{work_dir}}. This would mean that a given > persistent volume would not migrate between different persistent agent IDs > over time, for example (see MESOS-4894). 
If we supported permanently removing > an agent from the cluster (i.e., the {{work_dir}} and any volumes used by the > agent will never be reused), we could use the persistent agent ID to report > which agent has been removed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-5368) Consider introducing persistent agent ID
[ https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Harper updated MESOS-5368: -- Affects Version/s: 1.3.0 > Consider introducing persistent agent ID > > > Key: MESOS-5368 > URL: https://issues.apache.org/jira/browse/MESOS-5368 > Project: Mesos > Issue Type: Improvement >Affects Versions: 1.2.1, 1.3.0 >Reporter: Neil Conway > Labels: mesosphere > > Currently, agent IDs identify a single "session" by an agent: that is, an > agent receives an agent ID when it registers with the master; it reuses that > agent ID if it disconnects and successfully reregisters; if the agent shuts > down and restarts, it registers anew and receives a new agent ID. > It would be convenient to have a "persistent agent ID" that remains the same > for the duration of a given agent {{work_dir}}. This would mean that a given > persistent volume would not migrate between different persistent agent IDs > over time, for example (see MESOS-4894). If we supported permanently removing > an agent from the cluster (i.e., the {{work_dir}} and any volumes used by the > agent will never be reused), we could use the persistent agent ID to report > which agent has been removed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-5368) Consider introducing persistent agent ID
[ https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Harper updated MESOS-5368: -- Affects Version/s: 1.2.1 > Consider introducing persistent agent ID > > > Key: MESOS-5368 > URL: https://issues.apache.org/jira/browse/MESOS-5368 > Project: Mesos > Issue Type: Improvement >Affects Versions: 1.2.1, 1.3.0 >Reporter: Neil Conway > Labels: mesosphere > > Currently, agent IDs identify a single "session" by an agent: that is, an > agent receives an agent ID when it registers with the master; it reuses that > agent ID if it disconnects and successfully reregisters; if the agent shuts > down and restarts, it registers anew and receives a new agent ID. > It would be convenient to have a "persistent agent ID" that remains the same > for the duration of a given agent {{work_dir}}. This would mean that a given > persistent volume would not migrate between different persistent agent IDs > over time, for example (see MESOS-4894). If we supported permanently removing > an agent from the cluster (i.e., the {{work_dir}} and any volumes used by the > agent will never be reused), we could use the persistent agent ID to report > which agent has been removed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-5368) Consider introducing persistent agent ID
[ https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102011#comment-16102011 ] Tim Harper commented on MESOS-5368: --- One particular pain of not having this feature is it takes Mesos longer than necessary to recognize that a task is definitely gone. Were we to have persistent agent IDs, then, when the agent re-registered, it could tell Mesos, "yes, that task is definitely dead", whereas right now it is left perpetually in the unreachable state until Mesos gives up on the agent. > Consider introducing persistent agent ID > > > Key: MESOS-5368 > URL: https://issues.apache.org/jira/browse/MESOS-5368 > Project: Mesos > Issue Type: Improvement >Reporter: Neil Conway > Labels: mesosphere > > Currently, agent IDs identify a single "session" by an agent: that is, an > agent receives an agent ID when it registers with the master; it reuses that > agent ID if it disconnects and successfully reregisters; if the agent shuts > down and restarts, it registers anew and receives a new agent ID. > It would be convenient to have a "persistent agent ID" that remains the same > for the duration of a given agent {{work_dir}}. This would mean that a given > persistent volume would not migrate between different persistent agent IDs > over time, for example (see MESOS-4894). If we supported permanently removing > an agent from the cluster (i.e., the {{work_dir}} and any volumes used by the > agent will never be reused), we could use the persistent agent ID to report > which agent has been removed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7374) Running DOCKER images in Mesos Container Runtime without `linux/filesystem` isolation enabled renders host unusable
[ https://issues.apache.org/jira/browse/MESOS-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Harper updated MESOS-7374: -- Description: If I run the pod below (using Marathon 1.4.2) against a mesos agent that has the flags (also below), then the overlay filesystem replaces the system root mount, effectively rendering the host unusable until reboot. flags: - {{--containerizers mesos,docker}} - {{--image_providers APPC,DOCKER}} - {{--isolation cgroups/cpu,cgroups/mem,docker/runtime}} pod definition for Marathon: {code:java} { "id": "/simplepod", "scaling": { "kind": "fixed", "instances": 1 }, "containers": [ { "name": "sleep1", "exec": { "command": { "shell": "sleep 1000" } }, "resources": { "cpus": 0.1, "mem": 32 }, "image": { "id": "alpine", "kind": "DOCKER" } } ], "networks": [ {"mode": "host"} ] } {code} Mesos should probably check for this and avoid replacing the system root mount point at startup or launch time. was: If I run the pod below (using Marathon 1.4.2) against a mesos agent that has the flags (also below), then the overlay filesystem replaces the system root mount, effectively rendering the host unusable until reboot. flags: - {{--containerizers mesos,docker}} - {{--image_providers APPC,DOCKER}} - {{--isolation cgroups/cpu,cgroups/mem,docker/runtime}} pod definition: {code:java} { "id": "/simplepod", "scaling": { "kind": "fixed", "instances": 1 }, "containers": [ { "name": "sleep1", "exec": { "command": { "shell": "sleep 1000" } }, "resources": { "cpus": 0.1, "mem": 32 }, "image": { "id": "alpine", "kind": "DOCKER" } } ], "networks": [ {"mode": "host"} ] } {code} Mesos should probably check for this at startup or launch time. 
> Running DOCKER images in Mesos Container Runtime without `linux/filesystem` > isolation enabled renders host unusable > --- > > Key: MESOS-7374 > URL: https://issues.apache.org/jira/browse/MESOS-7374 > Project: Mesos > Issue Type: Bug > Components: isolation >Affects Versions: 1.2.0 >Reporter: Tim Harper >Priority: Minor > > If I run the pod below (using Marathon 1.4.2) against a mesos agent that has > the flags (also below), then the overlay filesystem replaces the system root > mount, effectively rendering the host unusable until reboot. > flags: > - {{--containerizers mesos,docker}} > - {{--image_providers APPC,DOCKER}} > - {{--isolation cgroups/cpu,cgroups/mem,docker/runtime}} > pod definition for Marathon: > {code:java} > { > "id": "/simplepod", > "scaling": { "kind": "fixed", "instances": 1 }, > "containers": [ > { > "name": "sleep1", > "exec": { "command": { "shell": "sleep 1000" } }, > "resources": { "cpus": 0.1, "mem": 32 }, > "image": { > "id": "alpine", > "kind": "DOCKER" > } > } > ], > "networks": [ {"mode": "host"} ] > } > {code} > Mesos should probably check for this and avoid replacing the system root > mount point at startup or launch time. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7374) Running DOCKER images in Mesos Container Runtime without `linux/filesystem` isolation enabled renders host unusable
Tim Harper created MESOS-7374: - Summary: Running DOCKER images in Mesos Container Runtime without `linux/filesystem` isolation enabled renders host unusable Key: MESOS-7374 URL: https://issues.apache.org/jira/browse/MESOS-7374 Project: Mesos Issue Type: Bug Components: isolation Affects Versions: 1.2.0 Reporter: Tim Harper Priority: Minor If I run the pod below (using Marathon 1.4.2) against a mesos agent that has the flags (also below), then the overlay filesystem replaces the system root mount, effectively rendering the host unusable until reboot. flags: - {{--containerizers mesos,docker}} - {{--image_providers APPC,DOCKER}} - {{--isolation cgroups/cpu,cgroups/mem,docker/runtime}} pod definition: {code:java} { "id": "/simplepod", "scaling": { "kind": "fixed", "instances": 1 }, "containers": [ { "name": "sleep1", "exec": { "command": { "shell": "sleep 1000" } }, "resources": { "cpus": 0.1, "mem": 32 }, "image": { "id": "alpine", "kind": "DOCKER" } } ], "networks": [ {"mode": "host"} ] } {code} Mesos should probably check for this at startup or launch time. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
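The startup check suggested in this ticket could look roughly like the following. This is a hypothetical Python sketch, not code from the Mesos agent: the function names are invented for illustration, and it assumes the isolator is spelled `filesystem/linux` in the `--isolation` flag (the ticket title writes it as `linux/filesystem`).

```python
def parse_flag_list(value):
    """Split a comma-separated agent flag value into a list of entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def rootfs_isolation_problem(image_providers, isolation):
    """Return a warning string for a risky flag combination, else None.

    Hypothetical check: image providers are configured, so containers may
    get their own root filesystem, but the filesystem isolator is missing.
    """
    providers = parse_flag_list(image_providers)
    isolators = parse_flag_list(isolation)
    if providers and "filesystem/linux" not in isolators:
        return (
            "image providers %s are enabled but 'filesystem/linux' is not "
            "in --isolation" % ",".join(providers)
        )
    return None
```

With the flags from the report (`--image_providers APPC,DOCKER` and `--isolation cgroups/cpu,cgroups/mem,docker/runtime`) this sketch returns a warning; adding `filesystem/linux` to the isolation list makes it return `None`.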
[jira] [Commented] (MESOS-6223) Allow agents to re-register post a host reboot
[ https://issues.apache.org/jira/browse/MESOS-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890850#comment-15890850 ] Tim Harper commented on MESOS-6223: --- This should help fix an issue we are seeing with tasks and reserved resources in Marathon: https://github.com/mesosphere/marathon/issues/5284 In Marathon's case, when a residential (has reserved resources) task becomes unreachable due to the node rebooting, we never receive a terminal state for the task even though the host reboots and comes back online. This is because, we believe, during reconciliation we send the old agent ID and the task ID, and Mesos continually reports an unknown status. Were the agent in question to keep the same agent ID, then an explicit reconciliation of that agent ID + the task ID, I think, should be able to result in a status update which signals definite terminality. > Allow agents to re-register post a host reboot > -- > > Key: MESOS-6223 > URL: https://issues.apache.org/jira/browse/MESOS-6223 > Project: Mesos > Issue Type: Improvement > Components: agent >Reporter: Megha Sharma >Assignee: Megha Sharma > > Agent doesn't recover its state post a host reboot, it registers with the > master and gets a new SlaveID. With partition awareness, the agents are now > allowed to re-register after they have been marked Unreachable. The executors > are anyway terminated on the agent when it reboots so there is no harm in > letting the agent keep its SlaveID, re-register with the master and reconcile > the lost executors. This is a pre-requisite for supporting > persistent/restartable tasks in Mesos (MESOS-3545). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-6213) Build failure on macOS Sierra: Protobuf atomics deprecated.
[ https://issues.apache.org/jira/browse/MESOS-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15627854#comment-15627854 ] Tim Harper commented on MESOS-6213: --- As a workaround, you can run {{make CPPFLAGS="-Wno-deprecated-declarations"}} > Build failure on macOS Sierra: Protobuf atomics deprecated. > --- > > Key: MESOS-6213 > URL: https://issues.apache.org/jira/browse/MESOS-6213 > Project: Mesos > Issue Type: Bug > Components: build >Reporter: Charles Allen > > Building on OSX is giving the following error. > {code} > In file included from > ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops.h:184: > ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops_internals_macosx.h:173:9: > error: 'OSAtomicCompareAndSwap64Barrier' is deprecated: first > deprecated in macOS 10.12 - Use std::atomic_compare_exchange_strong() > from <atomic> instead [-Werror,-Wdeprecated-declarations] > if (OSAtomicCompareAndSwap64Barrier( > ^ > /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libkern/OSAtomicDeprecated.h:645:9: > note: > 'OSAtomicCompareAndSwap64Barrier' has been explicitly marked deprecated > here > bool OSAtomicCompareAndSwap64Barrier( int64_t __oldValue, int64_t > __newValue, > ^ > {code} > Protobuf is not listed as a component so I just set it as {{build}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5824) Include disk source information in stringification
[ https://issues.apache.org/jira/browse/MESOS-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371326#comment-15371326 ] Tim Harper commented on MESOS-5824: --- https://reviews.apache.org/r/49910/diff/1#index_header > Include disk source information in stringification > -- > > Key: MESOS-5824 > URL: https://issues.apache.org/jira/browse/MESOS-5824 > Project: Mesos > Issue Type: Improvement > Components: stout >Affects Versions: 0.28.2 >Reporter: Tim Harper >Priority: Minor > Labels: mesosphere > Fix For: 1.1.0 > > > Some frameworks (like kafka_mesos) ignore the Source field when trying to > reserve an offered mount or path persistent volume; the resulting error > message is bewildering: > {code:none} > Task uses more resources > cpus(*):4; mem(*):4096; ports(*):[31000-31000]; disk(kafka, > kafka)[kafka_0:data]:960679 > than available > cpus(*):32; mem(*):256819; ports(*):[31000-32000]; disk(kafka, > kafka)[kafka_0:data]:960679; disk(*):240169; > {code} > The stringification of disk resources should include source information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5824) Include disk source information in stringification
[ https://issues.apache.org/jira/browse/MESOS-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Harper updated MESOS-5824: -- Attachment: (was: 0001-Output-disk-resource-source-information.patch) > Include disk source information in stringification > -- > > Key: MESOS-5824 > URL: https://issues.apache.org/jira/browse/MESOS-5824 > Project: Mesos > Issue Type: Improvement > Components: stout >Affects Versions: 0.28.2 >Reporter: Tim Harper >Priority: Minor > Labels: mesosphere > Fix For: 1.1.0 > > > Some frameworks (like kafka_mesos) ignore the Source field when trying to > reserve an offered mount or path persistent volume; the resulting error > message is bewildering: > {code:none} > Task uses more resources > cpus(*):4; mem(*):4096; ports(*):[31000-31000]; disk(kafka, > kafka)[kafka_0:data]:960679 > than available > cpus(*):32; mem(*):256819; ports(*):[31000-32000]; disk(kafka, > kafka)[kafka_0:data]:960679; disk(*):240169; > {code} > The stringification of disk resources should include source information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-5824) Include disk source information in stringification
[ https://issues.apache.org/jira/browse/MESOS-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Harper reassigned MESOS-5824: - Assignee: Tim Harper > Include disk source information in stringification > -- > > Key: MESOS-5824 > URL: https://issues.apache.org/jira/browse/MESOS-5824 > Project: Mesos > Issue Type: Improvement > Components: stout >Affects Versions: 0.28.2 >Reporter: Tim Harper >Assignee: Tim Harper >Priority: Minor > Labels: mesosphere > Fix For: 1.1.0 > > > Some frameworks (like kafka_mesos) ignore the Source field when trying to > reserve an offered mount or path persistent volume; the resulting error > message is bewildering: > {code:none} > Task uses more resources > cpus(*):4; mem(*):4096; ports(*):[31000-31000]; disk(kafka, > kafka)[kafka_0:data]:960679 > than available > cpus(*):32; mem(*):256819; ports(*):[31000-32000]; disk(kafka, > kafka)[kafka_0:data]:960679; disk(*):240169; > {code} > The stringification of disk resources should include source information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5824) Include disk source information in stringification
[ https://issues.apache.org/jira/browse/MESOS-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371312#comment-15371312 ] Tim Harper commented on MESOS-5824: --- The main motivation for the fix is to clarify an incredibly awful and misleading error message that happens when out-of-date mesos-frameworks try and allocate a persistent volume. > Include disk source information in stringification > -- > > Key: MESOS-5824 > URL: https://issues.apache.org/jira/browse/MESOS-5824 > Project: Mesos > Issue Type: Improvement > Components: stout >Affects Versions: 0.28.2 >Reporter: Tim Harper >Priority: Minor > Labels: mesosphere > Fix For: 1.1.0 > > Attachments: 0001-Output-disk-resource-source-information.patch > > > Some frameworks (like kafka_mesos) ignore the Source field when trying to > reserve an offered mount or path persistent volume; the resulting error > message is bewildering: > {code:none} > Task uses more resources > cpus(*):4; mem(*):4096; ports(*):[31000-31000]; disk(kafka, > kafka)[kafka_0:data]:960679 > than available > cpus(*):32; mem(*):256819; ports(*):[31000-32000]; disk(kafka, > kafka)[kafka_0:data]:960679; disk(*):240169; > {code} > The stringification of disk resources should include source information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
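To make the request concrete, here is a rough Python sketch of what a disk stringification with source information might look like. It is purely illustrative: the function `format_disk` and the `TYPE:root` prefix inside the brackets are hypothetical and are not necessarily the format adopted by the patch under review.

```python
def format_disk(role, principal, persistence_id, container_path, size,
                source_type=None, source_root=None):
    """Render a disk resource roughly in the style shown in the error above.

    If a source is given, it is printed first inside the brackets; this
    placement is an assumption made for illustration only.
    """
    parts = []
    if source_type is not None:
        parts.append("%s:%s" % (source_type, source_root))
    parts.append("%s:%s" % (persistence_id, container_path))
    return "disk(%s, %s)[%s]:%d" % (role, principal, ",".join(parts), size)
```

Without a source this reproduces the string from the error message, `disk(kafka, kafka)[kafka_0:data]:960679`, which is why the requested and available resources look identical there. With a source, for example `MOUNT` and a made-up root of `/mnt/data`, the two sides of the comparison would print differently and the mismatch would be visible.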
[jira] [Commented] (MESOS-5714) Specify soname for libmesos.so to major release
[ https://issues.apache.org/jira/browse/MESOS-5714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15369655#comment-15369655 ] Tim Harper commented on MESOS-5714: --- If there is an implicit expectation that the API doesn't change in backwards incompatible ways between point releases, then the configuration should be updated to reflect that. > Specify soname for libmesos.so to major release > --- > > Key: MESOS-5714 > URL: https://issues.apache.org/jira/browse/MESOS-5714 > Project: Mesos > Issue Type: Improvement > Components: build >Affects Versions: 0.28.2 >Reporter: Tim Harper > Labels: build > > I've installed mesos using the CentOS 7 package, and am building the > Ceph-Mesos framework. I've noticed when running {{ldd}} that {{ceph-mesos}} > is depending on too specific of a version of libmesos, which means that the > build will be broken on subsequent point releases. > This seems to be because the {{soname}} for libmesos is set to a very > unforgiving value. If {{libmesos-0.28.2}} truly isn't ABI compatible with > {{libmesos-0.28.x}}, then I suppose this is set correctly and this ticket > should be closed summarily, albeit unfortunate. 
> Here is the {{readelf}} output for {{libmesos}} > {code} > [root@6e189e07b470 /]# readelf -d /usr/local/lib/libmesos-0.28.2.so > Dynamic section at offset 0x194cd18 contains 43 entries: > TagType Name/Value > 0x0001 (NEEDED) Shared library: [libcrypt.so.1] > 0x0001 (NEEDED) Shared library: [libexpat.so.1] > 0x0001 (NEEDED) Shared library: [libdb-5.3.so] > 0x0001 (NEEDED) Shared library: [libsasl2.so.3] > 0x0001 (NEEDED) Shared library: [libsvn_delta-1.so.0] > 0x0001 (NEEDED) Shared library: [libsvn_subr-1.so.0] > 0x0001 (NEEDED) Shared library: [libaprutil-1.so.0] > 0x0001 (NEEDED) Shared library: [libapr-1.so.0] > 0x0001 (NEEDED) Shared library: [libpthread.so.0] > 0x0001 (NEEDED) Shared library: [libdl.so.2] > 0x0001 (NEEDED) Shared library: [libcurl.so.4] > 0x0001 (NEEDED) Shared library: [libz.so.1] > 0x0001 (NEEDED) Shared library: [librt.so.1] > 0x0001 (NEEDED) Shared library: [libstdc++.so.6] > 0x0001 (NEEDED) Shared library: [libm.so.6] > 0x0001 (NEEDED) Shared library: [libc.so.6] > 0x0001 (NEEDED) Shared library: > [ld-linux-x86-64.so.2] > 0x0001 (NEEDED) Shared library: [libgcc_s.so.1] > 0x000e (SONAME) Library soname: [libmesos-0.28.2.so] > 0x000f (RPATH) Library rpath: [/usr/lib/mesos] > 0x000c (INIT) 0x92a1f0 > 0x000d (FINI) 0x13a8e94 > 0x0019 (INIT_ARRAY) 0x1ae > 0x001b (INIT_ARRAYSZ) 1712 (bytes) > 0x001a (FINI_ARRAY) 0x1ae8f38 > 0x001c (FINI_ARRAYSZ) 8 (bytes) > 0x6ef5 (GNU_HASH) 0x228 > 0x0005 (STRTAB) 0x1b0be8 > 0x0006 (SYMTAB) 0x66a08 > 0x000a (STRSZ) 6130210 (bytes) > 0x000b (SYMENT) 24 (bytes) > 0x0003 (PLTGOT) 0x1b66000 > 0x0002 (PLTRELSZ) 387000 (bytes) > 0x0014 (PLTREL) RELA > 0x0017 (JMPREL) 0x8cba38 > 0x0007 (RELA) 0x7a5018 > 0x0008 (RELASZ) 1206816 (bytes) > 0x0009 (RELAENT)24 (bytes) > 0x6ffe (VERNEED)0x7a4e38 > 0x6fff (VERNEEDNUM) 8 > 0x6ff0 (VERSYM) 0x78960a > 0x6ff9 (RELACOUNT) 1357 > 0x (NULL) 0x0 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5714) Specify soname for libmesos.so to major release
[ https://issues.apache.org/jira/browse/MESOS-5714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15369654#comment-15369654 ] Tim Harper commented on MESOS-5714: --- The Ceph Mesos framework is written in C++ > Specify soname for libmesos.so to major release > --- > > Key: MESOS-5714 > URL: https://issues.apache.org/jira/browse/MESOS-5714 > Project: Mesos > Issue Type: Improvement > Components: build >Affects Versions: 0.28.2 >Reporter: Tim Harper > Labels: build > > I've installed mesos using the CentOS 7 package, and am building the > Ceph-Mesos framework. I've noticed when running {{ldd}} that {{ceph-mesos}} > is depending on too specific of a version of libmesos, which means that the > build will be broken on subsequent point releases. > This seems to be because the {{soname}} for libmesos is set to a very > unforgiving value. If {{libmesos-0.28.2}} truly isn't ABI compatible with > {{libmesos-0.28.x}}, then I suppose this is set correctly and this ticket > should be closed summarily, albeit unfortunate. 
[jira] [Updated] (MESOS-5824) Include disk source information in stringification
[ https://issues.apache.org/jira/browse/MESOS-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Harper updated MESOS-5824: -- Attachment: 0001-Output-disk-resource-source-information.patch Attached is the patch > Include disk source information in stringification > -- > > Key: MESOS-5824 > URL: https://issues.apache.org/jira/browse/MESOS-5824 > Project: Mesos > Issue Type: Improvement > Components: stout >Affects Versions: 0.28.2 >Reporter: Tim Harper > Fix For: 0.28.3 > > Attachments: 0001-Output-disk-resource-source-information.patch > > > Some frameworks (like kafka_mesos) ignore the Source field when trying to > reserve an offered mount or path persistent volume; the resulting error > message is bewildering: > {code:none} > Task uses more resources > cpus(*):4; mem(*):4096; ports(*):[31000-31000]; disk(kafka, > kafka)[kafka_0:data]:960679 > than available > cpus(*):32; mem(*):256819; ports(*):[31000-32000]; disk(kafka, > kafka)[kafka_0:data]:960679; disk(*):240169; > {code} > The stringification of disk resources should include source information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5823) Include disk source information in stringification
Tim Harper created MESOS-5823: - Summary: Include disk source information in stringification Key: MESOS-5823 URL: https://issues.apache.org/jira/browse/MESOS-5823 Project: Mesos Issue Type: Improvement Components: stout Affects Versions: 0.28.2 Reporter: Tim Harper Fix For: 0.28.3 Some frameworks (like kafka_mesos) ignore the Source field when trying to reserve an offered mount or path persistent volume; the resulting error message is bewildering: {code:none} Task uses more resources cpus(*):4; mem(*):4096; ports(*):[31000-31000]; disk(kafka, kafka)[kafka_0:data]:960679 than available cpus(*):32; mem(*):256819; ports(*):[31000-32000]; disk(kafka, kafka)[kafka_0:data]:960679; disk(*):240169; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5824) Include disk source information in stringification
Tim Harper created MESOS-5824: - Summary: Include disk source information in stringification Key: MESOS-5824 URL: https://issues.apache.org/jira/browse/MESOS-5824 Project: Mesos Issue Type: Improvement Components: stout Affects Versions: 0.28.2 Reporter: Tim Harper Fix For: 0.28.3 Some frameworks (like kafka_mesos) ignore the Source field when trying to reserve an offered mount or path persistent volume; the resulting error message is bewildering: {code:none} Task uses more resources cpus(*):4; mem(*):4096; ports(*):[31000-31000]; disk(kafka, kafka)[kafka_0:data]:960679 than available cpus(*):32; mem(*):256819; ports(*):[31000-32000]; disk(kafka, kafka)[kafka_0:data]:960679; disk(*):240169; {code} The stringification of disk resources should include source information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
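To illustrate the request above, here is a minimal Python sketch of a disk-resource stringifier that can include the source. It is not Mesos code (Mesos is C++, and the actual change is the patch attached to the ticket). The `DiskResource` class, its field names, the `/mnt/data` mount root, and the output format with the source are all assumptions made up for illustration. Only the source-less format is taken from the error message quoted in the ticket.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiskResource:
    # Hypothetical, simplified model of a disk resource; not the Mesos protobuf.
    role: str
    size_mb: int
    principal: Optional[str] = None
    persistence_id: Optional[str] = None
    container_path: Optional[str] = None
    source_type: Optional[str] = None  # e.g. "MOUNT" or "PATH"
    source_root: Optional[str] = None


def stringify(disk: DiskResource, include_source: bool = True) -> str:
    """Render a disk resource, optionally including its source."""
    text = "disk(" + disk.role
    if disk.principal is not None:
        text += ", " + disk.principal
    text += ")"

    details = []
    if include_source and disk.source_type is not None:
        source = disk.source_type
        if disk.source_root is not None:
            source += ":" + disk.source_root
        details.append(source)
    if disk.persistence_id is not None:
        details.append(f"{disk.persistence_id}:{disk.container_path}")

    if details:
        text += "[" + ",".join(details) + "]"
    return f"{text}:{disk.size_mb}"


# What the agent offers: a volume backed by a mount disk (root is made up).
offered = DiskResource("kafka", 960679, "kafka", "kafka_0", "data",
                       source_type="MOUNT", source_root="/mnt/data")
# What a framework that ignores the Source field sends back.
requested = DiskResource("kafka", 960679, "kafka", "kafka_0", "data")

print(stringify(requested))
print(stringify(offered, include_source=False))
print(stringify(offered))
```

Without the source, the offered and requested resources print identically, which is why the "uses more resources than available" message is so confusing. With the source included, the mismatch is visible in the text.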
[jira] [Created] (MESOS-5714) Specify soname for libmesos.so to major release
Tim Harper created MESOS-5714:

Summary: Specify soname for libmesos.so to major release
Key: MESOS-5714
URL: https://issues.apache.org/jira/browse/MESOS-5714
Project: Mesos
Issue Type: Improvement
Components: build
Affects Versions: 0.28.2
Reporter: Tim Harper

I've installed mesos using the CentOS 7 package, and am building the Ceph-Mesos framework. I've noticed when running {{ldd}} that {{ceph-mesos}} is depending on too specific of a version of libmesos, which means that the build will be broken on subsequent point releases.

This seems to be because the {{soname}} for libmesos is set to a very unforgiving value. If {{libmesos-0.28.2}} truly isn't ABI compatible with {{libmesos-0.28.x}}, then I suppose this is set correctly and this ticket should be closed summarily, albeit unfortunate.

Here is the {{readelf}} output for {{libmesos}}

{code}
[root@6e189e07b470 /]# readelf -d /usr/local/lib/libmesos-0.28.2.so

Dynamic section at offset 0x194cd18 contains 43 entries:
  Tag     Type            Name/Value
  0x0001  (NEEDED)        Shared library: [libcrypt.so.1]
  0x0001  (NEEDED)        Shared library: [libexpat.so.1]
  0x0001  (NEEDED)        Shared library: [libdb-5.3.so]
  0x0001  (NEEDED)        Shared library: [libsasl2.so.3]
  0x0001  (NEEDED)        Shared library: [libsvn_delta-1.so.0]
  0x0001  (NEEDED)        Shared library: [libsvn_subr-1.so.0]
  0x0001  (NEEDED)        Shared library: [libaprutil-1.so.0]
  0x0001  (NEEDED)        Shared library: [libapr-1.so.0]
  0x0001  (NEEDED)        Shared library: [libpthread.so.0]
  0x0001  (NEEDED)        Shared library: [libdl.so.2]
  0x0001  (NEEDED)        Shared library: [libcurl.so.4]
  0x0001  (NEEDED)        Shared library: [libz.so.1]
  0x0001  (NEEDED)        Shared library: [librt.so.1]
  0x0001  (NEEDED)        Shared library: [libstdc++.so.6]
  0x0001  (NEEDED)        Shared library: [libm.so.6]
  0x0001  (NEEDED)        Shared library: [libc.so.6]
  0x0001  (NEEDED)        Shared library: [ld-linux-x86-64.so.2]
  0x0001  (NEEDED)        Shared library: [libgcc_s.so.1]
  0x000e  (SONAME)        Library soname: [libmesos-0.28.2.so]
  0x000f  (RPATH)         Library rpath: [/usr/lib/mesos]
  0x000c  (INIT)          0x92a1f0
  0x000d  (FINI)          0x13a8e94
  0x0019  (INIT_ARRAY)    0x1ae
  0x001b  (INIT_ARRAYSZ)  1712 (bytes)
  0x001a  (FINI_ARRAY)    0x1ae8f38
  0x001c  (FINI_ARRAYSZ)  8 (bytes)
  0x6ef5  (GNU_HASH)      0x228
  0x0005  (STRTAB)        0x1b0be8
  0x0006  (SYMTAB)        0x66a08
  0x000a  (STRSZ)         6130210 (bytes)
  0x000b  (SYMENT)        24 (bytes)
  0x0003  (PLTGOT)        0x1b66000
  0x0002  (PLTRELSZ)      387000 (bytes)
  0x0014  (PLTREL)        RELA
  0x0017  (JMPREL)        0x8cba38
  0x0007  (RELA)          0x7a5018
  0x0008  (RELASZ)        1206816 (bytes)
  0x0009  (RELAENT)       24 (bytes)
  0x6ffe  (VERNEED)       0x7a4e38
  0x6fff  (VERNEEDNUM)    8
  0x6ff0  (VERSYM)        0x78960a
  0x6ff9  (RELACOUNT)     1357
  0x      (NULL)          0x0
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
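The check described in this ticket can be scripted. The following is a rough Python sketch that pulls the SONAME entry out of `readelf -d` output and flags a soname that embeds a full point-release version. The helper names `extract_soname` and `pins_point_release` are hypothetical, the regular expressions assume the line format shown in the ticket, and `libmesos.so.0` is a made-up value used only for contrast.

```python
import re
from typing import Optional

# Matches the SONAME line of `readelf -d` output as it appears in the ticket.
SONAME_RE = re.compile(r"\(SONAME\)\s+Library soname:\s+\[([^\]]+)\]")

# A soname such as libfoo-1.2.3.so pins dependents to one exact release.
POINT_RELEASE_RE = re.compile(r"-\d+\.\d+\.\d+\.so$")


def extract_soname(readelf_output: str) -> Optional[str]:
    """Return the soname from `readelf -d` output, or None if absent."""
    match = SONAME_RE.search(readelf_output)
    return match.group(1) if match else None


def pins_point_release(soname: str) -> bool:
    """True if the soname embeds a full major.minor.patch version."""
    return POINT_RELEASE_RE.search(soname) is not None


sample = "0x000e (SONAME) Library soname: [libmesos-0.28.2.so]"
soname = extract_soname(sample)
print(soname, pins_point_release(soname))
```

In practice the input would come from running `readelf -d` on the installed library. A binary linked against a library whose soname pins a point release will record that exact name as a NEEDED entry, which is the behaviour the ticket reports for `ceph-mesos`.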