[jira] [Resolved] (MESOS-2284) Slave cannot be registered while masters keep switching to another one.
[ https://issues.apache.org/jira/browse/MESOS-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hou Xiaokun resolved MESOS-2284.
--------------------------------
    Resolution: Fixed
    Fix Version/s: 0.21.0

Hi, I changed the quorum to 1. The slave can be displayed now! Thanks!

> Slave cannot be registered while masters keep switching to another one.
> ------------------------------------------------------------------------
>                 Key: MESOS-2284
>                 URL: https://issues.apache.org/jira/browse/MESOS-2284
>             Project: Mesos
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 0.20.1
>        Environment: Ubuntu 14.04
>           Reporter: Hou Xiaokun
>           Priority: Blocker
>            Fix For: 0.21.0
>
> I followed the instructions on the page http://mesosphere.com/docs/getting-started/datacenter/install/: set up two masters and one slave, with a quorum value of 2, and configured the IP addresses in the hostname files separately. Here is the log from the slave node:
>
> I0127 22:37:26.762953 1966 slave.cpp:627] No credentials provided. Attempting to register without authentication
> I0127 22:37:26.762985 1966 slave.cpp:638] Detecting new master
> I0127 22:37:26.763022 1966 status_update_manager.cpp:171] Pausing sending status updates
> I0127 22:38:06.683840 1962 slave.cpp:3321] Current usage 16.98%. Max allowed age: 5.111732713224155days
> I0127 22:38:26.986556 1966 slave.cpp:2623] master@10.27.17.135:5050 exited
> W0127 22:38:26.986675 1966 slave.cpp:2626] Master disconnected! Waiting for a new master to be elected
> I0127 22:38:34.909605 1963 detector.cpp:138] Detected a new leader: (id='2028')
> I0127 22:38:34.909811 1963 group.cpp:659] Trying to get '/mesos/info_002028' in ZooKeeper
> I0127 22:38:34.910909 1963 detector.cpp:433] A new leading master (UPID=master@10.27.16.214:5050) is detected
> I0127 22:38:34.910989 1963 slave.cpp:602] New master detected at master@10.27.16.214:5050
> I0127 22:38:34.93 1963 slave.cpp:627] No credentials provided. Attempting to register without authentication
> I0127 22:38:34.911144 1963 slave.cpp:638] Detecting new master
> I0127 22:38:34.911183 1963 status_update_manager.cpp:171] Pausing sending status updates
> I0127 22:39:06.684526 1964 slave.cpp:3321] Current usage 16.98%. Max allowed age: 5.111731773610567days
> I0127 22:39:35.231653 1963 slave.cpp:2623] master@10.27.16.214:5050 exited
> W0127 22:39:35.231869 1963 slave.cpp:2626] Master disconnected! Waiting for a new master to be elected
> I0127 22:39:42.761540 1964 detector.cpp:138] Detected a new leader: (id='2029')
> I0127 22:39:42.761732 1964 group.cpp:659] Trying to get '/mesos/info_002029' in ZooKeeper
> I0127 22:39:42.762914 1964 detector.cpp:433] A new leading master (UPID=master@10.27.17.135:5050) is detected
> I0127 22:39:42.762984 1964 slave.cpp:602] New master detected at master@10.27.17.135:5050
> I0127 22:39:42.763089 1964 slave.cpp:627] No credentials provided. Attempting to register without authentication
> I0127 22:39:42.763118 1964 slave.cpp:638] Detecting new master
> I0127 22:39:42.763155 1964 status_update_manager.cpp:171] Pausing sending status updates
[jira] [Commented] (MESOS-2276) Mesos-slave refuses to startup with many stopped docker containers
[ https://issues.apache.org/jira/browse/MESOS-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294907#comment-14294907 ]

Dr. Stefan Schimanski commented on MESOS-2276:
----------------------------------------------

I have changed the topic of this issue. The original issue is resolved; what remains is that mesos-slave should behave much more forgivingly when there are many stopped containers. Moreover, a proper error message would help to identify the problem.

> Mesos-slave refuses to startup with many stopped docker containers
> -------------------------------------------------------------------
>                 Key: MESOS-2276
>                 URL: https://issues.apache.org/jira/browse/MESOS-2276
>             Project: Mesos
>          Issue Type: Bug
>          Components: docker, slave
>   Affects Versions: 0.21.0, 0.21.1
>        Environment: Ubuntu 14.04 LTS, Mesosphere packages
>           Reporter: Dr. Stefan Schimanski
>
> The mesos-slave is launched as
>
> # /usr/local/sbin/mesos-slave --master=zk://10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181/mesos --ip=10.0.0.2 --log_dir=/var/log/mesos --attributes=node_id:srv002 --checkpoint --containerizers=docker --executor_registration_timeout=5mins --logging_level=INFO
>
> giving this output:
>
> I0127 19:26:32.674113 19880 logging.cpp:172] INFO level logging started!
> I0127 19:26:32.674741 19880 main.cpp:142] Build: 2014-11-22 05:29:57 by root
> I0127 19:26:32.674774 19880 main.cpp:144] Version: 0.21.0
> I0127 19:26:32.674799 19880 main.cpp:147] Git tag: 0.21.0
> I0127 19:26:32.674824 19880 main.cpp:151] Git SHA: ab8fa655d34e8e15a4290422df38a18db1c09b5b
> I0127 19:26:32.786731 19880 main.cpp:165] Starting Mesos slave
> 2015-01-27 19:26:32,786:19880(0x7fcf0cf9f700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
> 2015-01-27 19:26:32,786:19880(0x7fcf0cf9f700):ZOO_INFO@log_env@716: Client environment:host.name=srv002
> 2015-01-27 19:26:32,787:19880(0x7fcf0cf9f700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
> 2015-01-27 19:26:32,787:19880(0x7fcf0cf9f700):ZOO_INFO@log_env@724: Client environment:os.arch=3.13.0-44-generic
> 2015-01-27 19:26:32,787:19880(0x7fcf0cf9f700):ZOO_INFO@log_env@725: Client environment:os.version=#73-Ubuntu SMP Tue Dec 16 00:22:43 UTC 2014
> 2015-01-27 19:26:32,788:19880(0x7fcf0cf9f700):ZOO_INFO@log_env@733: Client environment:user.name=root
> 2015-01-27 19:26:32,788:19880(0x7fcf0cf9f700):ZOO_INFO@log_env@741: Client environment:user.home=/root
> 2015-01-27 19:26:32,788:19880(0x7fcf0cf9f700):ZOO_INFO@log_env@753: Client environment:user.dir=/root
> 2015-01-27 19:26:32,789:19880(0x7fcf0cf9f700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181 sessionTimeout=1 watcher=0x7fcf13592a0a sessionId=0 sessionPasswd=null context=0x7fceec0009e0 flags=0
> I0127 19:26:32.796588 19880 slave.cpp:169] Slave started on 1)@10.0.0.2:5051
> I0127 19:26:32.797345 19880 slave.cpp:289] Slave resources: cpus(*):8; mem(*):6960; disk(*):246731; ports(*):[31000-32000]
> I0127 19:26:32.798017 19880 slave.cpp:318] Slave hostname: srv002
> I0127 19:26:32.798076 19880 slave.cpp:319] Slave checkpoint: true
> 2015-01-27 19:26:32,800:19880(0x7fcf08f5c700):ZOO_INFO@check_events@1703: initiated connection to server [10.0.0.1:2181]
> I0127 19:26:32.808229 19886 state.cpp:33] Recovering state from '/tmp/mesos/meta'
> I0127 19:26:32.809090 19882 status_update_manager.cpp:197] Recovering status update manager
> I0127 19:26:32.809677 19887 docker.cpp:767] Recovering Docker containers
> 2015-01-27 19:26:32,821:19880(0x7fcf08f5c700):ZOO_INFO@check_events@1750: session establishment complete on server [10.0.0.1:2181], sessionId=0x14b2adf7a560106, negotiated timeout=1
> I0127 19:26:32.823292 19885 group.cpp:313] Group process (group(1)@10.0.0.2:5051) connected to ZooKeeper
> I0127 19:26:32.823443 19885 group.cpp:790] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
> I0127 19:26:32.823484 19885 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
> I0127 19:26:32.829711 19882 detector.cpp:138] Detected a new leader: (id='143')
> I0127 19:26:32.830559 19882 group.cpp:659] Trying to get '/mesos/info_000143' in ZooKeeper
> I0127 19:26:32.837913 19886 detector.cpp:433] A new leading master (UPID=master@10.0.0.1:5050) is detected
> Failed to perform recovery: Collect failed: Failed to create pipe: Too many open files
> To remedy this do as follows:
> Step 1: rm -f /tmp/mesos/meta/slaves/latest
>         This ensures slave doesn't recover old live executors.
> Step 2: Restart the slave.
>
> At /tmp/mesos/meta/slaves/latest there is nothing. The slave was part of a 3 node cluster before.
>
> When started as an upstart service, the process is relaunched all the time and a large number of defunct processes appear, like these ones:
>
> root 30321 0.0 0.0 13000 440 ? S 19:28 0:00 iptables --wait -L -n
> root 30322 0.0 0.0 396 ? S 19:28 0:00 sh -c docker inspect
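The "Failed to create pipe: Too many open files" failure means the slave exhausted its file-descriptor limit while setting up recovery for the many stopped containers. Independent of the friendlier behavior this ticket asks for, one immediate mitigation is to raise RLIMIT_NOFILE for the slave process (equivalently `ulimit -n` in the init script). A minimal sketch using the standard POSIX API; this is a generic workaround, not a fix from this ticket:

{code}
#include <sys/resource.h>
#include <cstdio>

// Raise the soft file-descriptor limit up to the hard limit so that
// recovering many containers does not fail pipe creation with
// "Too many open files" (EMFILE).
int main() {
  struct rlimit rl;
  if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
    perror("getrlimit");
    return 1;
  }
  std::printf("soft=%llu hard=%llu\n",
              (unsigned long long) rl.rlim_cur,
              (unsigned long long) rl.rlim_max);

  rl.rlim_cur = rl.rlim_max;  // bump soft limit to the hard ceiling
  if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
    perror("setrlimit");
    return 1;
  }
  return 0;
}
{code}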
[jira] [Updated] (MESOS-2276) Mesos-slave refuses to startup with many stopped docker containers
[ https://issues.apache.org/jira/browse/MESOS-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dr. Stefan Schimanski updated MESOS-2276:
-----------------------------------------
    Summary: Mesos-slave refuses to startup with many stopped docker containers  (was: Mesos-slave with containerizer Docker doesn't startup anymore)
[jira] [Reopened] (MESOS-2284) Slave cannot be registered while masters keep switching to another one.
[ https://issues.apache.org/jira/browse/MESOS-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Rukletsov reopened MESOS-2284:
----------------------------------------
[jira] [Resolved] (MESOS-2284) Slave cannot be registered while masters keep switching to another one.
[ https://issues.apache.org/jira/browse/MESOS-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Rukletsov resolved MESOS-2284.
----------------------------------------
    Resolution: Not a Problem
[jira] [Commented] (MESOS-354) oversubscribe resources
[ https://issues.apache.org/jira/browse/MESOS-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295839#comment-14295839 ]

Niklas Quarfot Nielsen commented on MESOS-354:
-----------------------------------------------

Oversubscription means many things and can be considered a subset of the currently ongoing effort on optimistic offers, where optimistic offers let the allocator offer resources:

- to multiple frameworks, to increase 'parallelism' (as opposed to the conservative/pessimistic scheme) and **increase task throughput**;
- as preemptable resources drawn from unallocated but reserved resources, to **limit reservation slack** (the difference between reserved and allocated resources).

A third (and equally important) case, which expands these scenarios, is oversubscription of _allocated_ resources, which limits the **usage slack** (the difference between allocated and used resources). There has been a lot of recent research showing the ability to reduce usage slack by 60% while maintaining the Service Level Objective (SLO) of latency-critical workloads (1). However, this kind of oversubscription needs policies and fine-tuning to make sure that best-effort tasks don't interfere with latency-critical ones. Therefore, we'd like to start a discussion on how such a system would look in Mesos. I will create a JIRA ticket (linking to this one) to start the conversation.

(1) http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/43017.pdf

> oversubscribe resources
> ------------------------
>                 Key: MESOS-354
>                 URL: https://issues.apache.org/jira/browse/MESOS-354
>             Project: Mesos
>          Issue Type: Story
>          Components: isolation, master, slave
>           Reporter: brian wickman
>           Priority: Minor
>        Attachments: mesos_virtual_offers.pdf
>
> This proposal is predicated upon offer revocation. The idea would be to add a new revoked status either by (1) piggybacking off an existing status update (TASK_LOST or TASK_KILLED) or (2) introducing a new status update TASK_REVOKED.
>
> In order to augment an offer with metadata about revocability, there are options:
> 1) Add a revocable boolean to the Offer and
>    a) offer only one type of Offer per slave at a particular time
>    b) offer both revocable and non-revocable resources at the same time but require frameworks to understand that Offers can contain overlapping resources
> 2) Add a revocable_resources field on the Offer which is a superset of the regular resources field. By consuming resources <= revocable_resources in a launchTask, the Task becomes a revocable task. If launching a task with resources, the Task is non-revocable.
>
> The use cases for revocable tasks are batch tasks (e.g. hadoop/pig/mapreduce) and non-revocable tasks are online higher-SLA tasks (e.g. services.)
>
> Consider a non-revocable task that asks for 4 cores, 8 GB RAM and 20 GB of disk. One of these resources is a rate (4 cpu seconds per second) and two of them are fixed values (8 GB and 20 GB respectively, though disk resources can be further broken down into spindles - fixed - and iops - a rate.) In practice, these are the maximum resources in the respective dimensions that this task will use. In reality, we provision tasks at some factor below peak, and only hit peak resource consumption in rare circumstances or perhaps at a diurnal peak.
>
> In the meantime, we stand to gain from offering some constant factor of the difference between (reserved - actual) of non-revocable tasks as revocable resources, depending upon our tolerance for revocable task churn. The main challenge is coming up with an accurate short / medium / long-term prediction of resource consumption based upon current behavior.
>
> In many cases it would be OK to be sloppy:
> * CPU / iops / network IO are rates (compressible) and can often be OK below guarantees for brief periods of time while task revocation takes place
> * Memory slack can be provided by enabling swap and dynamically setting swap paging boundaries. Should swap ever be activated, that would be a signal to revoke.
>
> The master / allocator would piggyback on the slave heartbeat mechanism to learn of the amount of revocable resources available at any point in time.
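To make the proposal's option 2 concrete, here is a minimal C++ sketch of the revocability decision it implies. All types and names here are invented stand-ins for illustration; the real proposal would extend the Offer protobuf rather than define new structs:

{code}
#include <map>
#include <string>

// Illustrative stand-in for a protobuf Resources message (not a Mesos type).
struct Resources {
  std::map<std::string, double> scalars;  // e.g. {"cpus", 4}, {"mem", 8192}

  // True if every scalar in 'other' fits within this set.
  bool contains(const Resources& other) const {
    for (const auto& [name, value] : other.scalars) {
      auto it = scalars.find(name);
      if (it == scalars.end() || it->second < value) return false;
    }
    return true;
  }
};

struct Offer {
  Resources resources;            // non-revocable guarantee
  Resources revocable_resources;  // superset: guarantee plus usage slack
};

// Option 2 from the proposal: a launch that fits within 'resources' stays
// non-revocable; one that only fits within 'revocable_resources' becomes
// a revocable task, eligible for TASK_REVOKED later.
bool isRevocable(const Offer& offer, const Resources& request) {
  return !offer.resources.contains(request) &&
         offer.revocable_resources.contains(request);
}
{code}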
[jira] [Commented] (MESOS-2232) Suppress MockAllocator::transformAllocation() warnings.
[ https://issues.apache.org/jira/browse/MESOS-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296089#comment-14296089 ]

Benjamin Mahler commented on MESOS-2232:
----------------------------------------

First two are committed:

{noformat}
commit ccd697df0b7e05b07dee75d53e0ff55d6884ba2f
Author: Benjamin Mahler <benjamin.mah...@gmail.com>
Date:   Fri Jan 16 12:13:01 2015 -0800

    Renamed MockAllocatorProcess to TestAllocatorProcess.

    Review: https://reviews.apache.org/r/29989
{noformat}

{noformat}
commit b7bb6696b5a78dbc896b4756b7d4123e86c01635
Author: Benjamin Mahler <benjamin.mah...@gmail.com>
Date:   Fri Jan 16 14:10:05 2015 -0800

    Updated TestAllocatorProcess to avoid the test warnings.

    Review: https://reviews.apache.org/r/29990
{noformat}

> Suppress MockAllocator::transformAllocation() warnings.
> --------------------------------------------------------
>                 Key: MESOS-2232
>                 URL: https://issues.apache.org/jira/browse/MESOS-2232
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>           Reporter: Alexander Rukletsov
>           Assignee: Benjamin Mahler
>           Priority: Minor
>
> After the feature for transforming allocated resources was added to the allocator, a number of warnings pop up in the allocator tests. Commits leading to this behaviour:
> {{dacc88292cc13d4b08fe8cda4df71110a96cb12a}}
> {{5a02d5bdc75d3b1149dcda519016374be06ec6bd}}
> corresponding reviews:
> https://reviews.apache.org/r/29083
> https://reviews.apache.org/r/29084
> Here is an example:
> {code}
> [ RUN      ] MasterAllocatorTest/0.FrameworkReregistersFirst
> GMOCK WARNING:
> Uninteresting mock function call - taking default action specified at:
> ../../../src/tests/mesos.hpp:719:
> Function call: transformAllocation(@0x7fd3bb5274d8 20150115-185632-1677764800-59671-44186-, @0x7fd3bb5274f8 20150115-185632-1677764800-59671-44186-S0, @0x1119140e0 16-byte object F0-5E 52-BB D3-7F 00-00 C0-5F 52-BB D3-7F 00-00)
> Stack trace:
> [       OK ] MasterAllocatorTest/0.FrameworkReregistersFirst (204 ms)
> {code}
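For reference, these "uninteresting mock function call" warnings are standard gmock noise for calls with no matching expectation, and the two usual ways to silence them are `testing::NiceMock` or an explicit catch-all expectation. A sketch with a simplified stand-in interface (not the actual Mesos MockAllocator or its transformAllocation signature):

{code}
#include <gmock/gmock.h>
#include <gtest/gtest.h>

// Simplified stand-in for the allocator interface under test.
class Allocator {
public:
  virtual ~Allocator() {}
  virtual void transformAllocation(int frameworkId, int slaveId) = 0;
};

class MockAllocator : public Allocator {
public:
  MOCK_METHOD2(transformAllocation, void(int frameworkId, int slaveId));
};

TEST(AllocatorTest, Example) {
  // NiceMock suppresses "uninteresting mock function call" warnings for
  // calls that have no matching EXPECT_CALL.
  testing::NiceMock<MockAllocator> allocator;

  // Alternatively, install a benign catch-all expectation explicitly:
  EXPECT_CALL(allocator, transformAllocation(testing::_, testing::_))
      .Times(testing::AnyNumber());

  allocator.transformAllocation(1, 2);  // No warning is emitted.
}
{code}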
[jira] [Comment Edited] (MESOS-2183) docker containerizer doesn't work when mesos-slave is running in a container
[ https://issues.apache.org/jira/browse/MESOS-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296106#comment-14296106 ]

Jay Buffington edited comment on MESOS-2183 at 1/29/15 12:10 AM:
-----------------------------------------------------------------

Hey [~tnachen], I read your doc at https://docs.google.com/document/d/1_1oLHXg_aHj_fYCzsjYwox9xvIYNAKIeVjO5BFxsUGI/edit# and it's not clear you address the issue I encountered. In my mesos-slave running in coreos I have it:
* running inside a pid namespace
* using the mounted /var/run/docker.sock to start a sibling container
* running docker inspect to get the pid it just launched
* it sees that the pid docker inspect reports
* it tries to determine the libprocess port based on that pid
* it doesn't see that pid since the pid docker inspect returns is only visible in the root namespace
* it does docker stop/kill because it incorrectly thinks the executor failed to start since it couldn't see the pid

I don't understand how your patch addresses that issue. Can you give me a summary of how it fixes this problem I've described?

> docker containerizer doesn't work when mesos-slave is running in a container
> -----------------------------------------------------------------------------
>                 Key: MESOS-2183
>                 URL: https://issues.apache.org/jira/browse/MESOS-2183
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, docker
>           Reporter: Jay Buffington
>           Assignee: Timothy Chen
>
> I've started running the mesos-slave process itself inside a docker container. I bind mount in the dockerd socket, so there is only one docker daemon running on the system. The mesos-slave process uses docker run to start an executor in another, sibling, container. It asks docker inspect what the pid of the executor running in the container is. Since the mesos-slave process is in its own pid namespace, it cannot see the pid for the executor in /proc. Therefore, it thinks the executor died and it does a docker kill.
>
> It looks like the executor pid is also used to determine what port the executor is listening on.
[jira] [Commented] (MESOS-2144) Segmentation Fault in ExamplesTest.LowLevelSchedulerPthread
[ https://issues.apache.org/jira/browse/MESOS-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296128#comment-14296128 ]

Cody Maloney commented on MESOS-2144:
-------------------------------------

Based on the addresses being at the low end of the address range, I'm guessing it is happening while running __cxa_exit (global static destruction) or some other system cleanup symbol, and this is during glibc doing something on mesos' behalf. Likely whatever that library is doesn't have symbols / is stripped, if it is coming from the Linux distribution.

Side note: backtraces from our code don't use the debugging info. But yeah, debugging definitely looks like it is enabled. And functions shouldn't be optimized out and the binary isn't stripped of symbols, so stack traces should have all the function symbols.

> Segmentation Fault in ExamplesTest.LowLevelSchedulerPthread
> ------------------------------------------------------------
>                 Key: MESOS-2144
>                 URL: https://issues.apache.org/jira/browse/MESOS-2144
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>   Affects Versions: 0.21.0
>           Reporter: Cody Maloney
>           Priority: Minor
>             Labels: flaky
>
> Occurred on a ReviewBot run of https://reviews.apache.org/r/28262/#review62333. The review doesn't touch code related to the test (and doesn't break libprocess in general).
>
> [ RUN      ] ExamplesTest.LowLevelSchedulerPthread
> ../../src/tests/script.cpp:83: Failure
> Failed
> low_level_scheduler_pthread_test.sh terminated with signal Segmentation fault
> [  FAILED  ] ExamplesTest.LowLevelSchedulerPthread (7561 ms)
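To illustrate the failure mode Cody is pointing at (an illustration only, not the diagnosed cause of this particular segfault): when one global object's destructor touches another global that may already have been destroyed, the crash surfaces during exit-time cleanup, with frames in the C runtime rather than in user code.

{code}
#include <cstdio>
#include <string>

// Destruction order of globals *across translation units* is unspecified.
// If ~Consumer runs after ~Registry (possible when the two objects live in
// different .cpp files), the dereference below is a use-after-destroy and
// the segfault shows up inside glibc's exit-handler machinery, where the
// backtrace has no user symbols.

struct Registry {
  std::string name = "registry";
  ~Registry() { std::printf("registry destroyed\n"); }
};

Registry registry;  // imagine this defined in another translation unit

struct Consumer {
  ~Consumer() {
    // Unsafe if 'registry' was destroyed first.
    std::printf("consumer sees: %s\n", registry.name.c_str());
  }
};

Consumer consumer;

int main() { return 0; }  // crash, if any, happens after main returns
{code}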
[jira] [Commented] (MESOS-2183) docker containerizer doesn't work when mesos-slave is running in a container
[ https://issues.apache.org/jira/browse/MESOS-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296106#comment-14296106 ]

Jay Buffington commented on MESOS-2183:
---------------------------------------

Hey [~tnachen], I read your doc at https://docs.google.com/document/d/1_1oLHXg_aHj_fYCzsjYwox9xvIYNAKIeVjO5BFxsUGI/edit# and it's not clear you address the issue I encountered. In my mesos-slave running in coreos I have it:
* running inside a pid namespace
* using the mounted /var/run/docker.sock to start a sibling container
* running docker inspect to get the pid it just launched
* it sees that the pid docker inspect reports
* it tries to determine the libprocess port based on that pid
* it doesn't see that pid since the pid docker inspect returns is only visible in the root namespace
* it does docker stop/kill because it incorrectly thinks the executor failed to start since it couldn't see the pid

I don't understand how your patch addresses that issue. Can you give me a summary of how it fixes this problem I've described?
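The visibility problem is easy to reproduce outside Mesos. A process in a child pid namespace cannot observe pids that exist only in the root namespace, so a liveness probe like the one sketched below (a simplification, not the actual slave code) reports a perfectly healthy executor as dead:

{code}
#include <cerrno>
#include <csignal>
#include <cstdio>
#include <cstdlib>

// Returns true if 'pid' is visible and alive from *this* pid namespace.
// kill(pid, 0) delivers no signal; it only performs the existence check.
bool alive(pid_t pid) {
  if (::kill(pid, 0) == 0) {
    return true;
  }
  // ESRCH means "no such process in our namespace". A container pid
  // obtained via 'docker inspect' against the host daemon may be alive
  // in the root namespace and still yield ESRCH here. EPERM, by
  // contrast, means the process exists but we may not signal it.
  return errno != ESRCH;
}

int main(int argc, char** argv) {
  if (argc != 2) return 1;
  pid_t pid = std::atoi(argv[1]);
  std::printf("pid %d alive here: %s\n", pid, alive(pid) ? "yes" : "no");
  return 0;
}
{code}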
[jira] [Commented] (MESOS-2162) Consider a C++ implementation of CoreOS AppContainer spec
[ https://issues.apache.org/jira/browse/MESOS-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295983#comment-14295983 ]

Steven Schlansker commented on MESOS-2162:
------------------------------------------

I would love to help out in any way I can, but I am not much of a C++ guy. But at the very least I would happily test it, or if you have other suggestions for how I can help...

> Consider a C++ implementation of CoreOS AppContainer spec
> ----------------------------------------------------------
>                 Key: MESOS-2162
>                 URL: https://issues.apache.org/jira/browse/MESOS-2162
>             Project: Mesos
>          Issue Type: Story
>          Components: containerization
>           Reporter: Dominic Hamon
>             Labels: mesosphere, twitter
>
> CoreOS have released a [specification|https://github.com/coreos/rocket/blob/master/app-container/SPEC.md] for a container abstraction as an alternative to Docker. They have also released a reference implementation, [rocket|https://coreos.com/blog/rocket/].
>
> We should consider a C++ implementation of the specification to have parity with the community and then use this implementation for our containerizer.
[jira] [Created] (MESOS-2289) Design doc for the HTTP API
Vinod Kone created MESOS-2289:
---------------------------------
             Summary: Design doc for the HTTP API
                 Key: MESOS-2289
                 URL: https://issues.apache.org/jira/browse/MESOS-2289
             Project: Mesos
          Issue Type: Task
            Reporter: Vinod Kone

This tracks the design of the HTTP API.
[jira] [Updated] (MESOS-2288) HTTP API for interacting with Mesos
[ https://issues.apache.org/jira/browse/MESOS-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kone updated MESOS-2288:
------------------------------
    Epic Name: HTTP API  (was: http api)

> HTTP API for interacting with Mesos
> ------------------------------------
>                 Key: MESOS-2288
>                 URL: https://issues.apache.org/jira/browse/MESOS-2288
>             Project: Mesos
>          Issue Type: Epic
>           Reporter: Vinod Kone
>
> Currently, Mesos frameworks (schedulers and executors) interact with Mesos (masters and slaves) via drivers provided by Mesos. While the driver has helped in providing some common functionality for all frameworks (master detection, authentication, validation, etc.), it has several drawbacks:
> -- Frameworks need to depend on a native library, which makes their build/deploy process cumbersome.
> -- Pure-language frameworks cannot use off-the-shelf libraries to interact with the undocumented API used by the driver.
> -- It makes it hard for developers to implement new APIs (lots of boilerplate code to write).
>
> This proposal is for Mesos to provide a well-documented public HTTP API that frameworks (and maybe operators) can use to interact with Mesos.
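The motivation is easier to see with a concrete interaction. The sketch below shows how a pure-language scheduler could register by POSTing a JSON call to a master endpoint over plain HTTP, with no native driver involved, using libcurl. Note the endpoint path and payload shape are assumptions for illustration; the actual protocol was still being designed (MESOS-2289):

{code}
#include <curl/curl.h>

// Hypothetical illustration only: '/api/v1/scheduler' and the JSON body
// are invented here, not the settled design.
int main() {
  curl_global_init(CURL_GLOBAL_DEFAULT);
  CURL* curl = curl_easy_init();
  if (curl == nullptr) return 1;

  const char* body =
      "{\"type\": \"SUBSCRIBE\","
      " \"subscribe\": {\"framework_info\":"
      " {\"user\": \"test\", \"name\": \"example\"}}}";

  struct curl_slist* headers = nullptr;
  headers = curl_slist_append(headers, "Content-Type: application/json");

  curl_easy_setopt(curl, CURLOPT_URL, "http://master:5050/api/v1/scheduler");
  curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
  curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);

  // The response would stream events (offers, status updates) back.
  CURLcode res = curl_easy_perform(curl);

  curl_slist_free_all(headers);
  curl_easy_cleanup(curl);
  curl_global_cleanup();
  return res == CURLE_OK ? 0 : 1;
}
{code}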
[jira] [Commented] (MESOS-2162) Consider a C++ implementation of CoreOS AppContainer spec
[ https://issues.apache.org/jira/browse/MESOS-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295944#comment-14295944 ]

Steven Schlansker commented on MESOS-2162:
------------------------------------------

This library may be a good starting point: https://github.com/cdaylward/libappc/
[jira] [Commented] (MESOS-2162) Consider a C++ implementation of CoreOS AppContainer spec
[ https://issues.apache.org/jira/browse/MESOS-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295978#comment-14295978 ]

Timothy Chen commented on MESOS-2162:
-------------------------------------

Hi Steven, that's what I think too. It's my plan to work on this but this quarter I won't have much time to do so. Are you interested in this? We could work together.
[jira] [Commented] (MESOS-2162) Consider a C++ implementation of CoreOS AppContainer spec
[ https://issues.apache.org/jira/browse/MESOS-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295996#comment-14295996 ]

Ian Downes commented on MESOS-2162:
-----------------------------------

I'll be working on this too, development and/or shepherding.
[jira] [Created] (MESOS-2291) Move executor driver validations to slave
Vinod Kone created MESOS-2291:
---------------------------------
             Summary: Move executor driver validations to slave
                 Key: MESOS-2291
                 URL: https://issues.apache.org/jira/browse/MESOS-2291
             Project: Mesos
          Issue Type: Task
            Reporter: Vinod Kone

With the HTTP API, the executor driver will no longer exist and hence all the validations should move to the slave.
[jira] [Created] (MESOS-2297) Add authentication support for HTTP API
Vinod Kone created MESOS-2297:
---------------------------------
             Summary: Add authentication support for HTTP API
                 Key: MESOS-2297
                 URL: https://issues.apache.org/jira/browse/MESOS-2297
             Project: Mesos
          Issue Type: Task
            Reporter: Vinod Kone

To start with, we will only support basic HTTP auth.
[jira] [Created] (MESOS-2295) Implement the Call endpoint on Slave
Vinod Kone created MESOS-2295:
---------------------------------
             Summary: Implement the Call endpoint on Slave
                 Key: MESOS-2295
                 URL: https://issues.apache.org/jira/browse/MESOS-2295
             Project: Mesos
          Issue Type: Task
            Reporter: Vinod Kone
[jira] [Created] (MESOS-2298) Provide master detection library/libraries for pure schedulers
Vinod Kone created MESOS-2298:
---------------------------------
             Summary: Provide master detection library/libraries for pure schedulers
                 Key: MESOS-2298
                 URL: https://issues.apache.org/jira/browse/MESOS-2298
             Project: Mesos
          Issue Type: Task
            Reporter: Vinod Kone

When schedulers start interacting with the Mesos master via HTTP endpoints, they need a way to detect the leading master. Ideally, Mesos provides master detection libraries in supported languages (Java and Python to start with) to make this easy for frameworks.
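For ZooKeeper-based clusters, the detection logic such a library would wrap is small: list the '/mesos' children and pick the 'info_...' ephemeral-sequential znode with the lowest sequence number, as seen in the slave logs earlier in this digest. A rough sketch against the ZooKeeper C client (the same client those logs show); the connection watcher and error handling are elided, and this is an illustration, not the actual detector code:

{code}
#include <zookeeper/zookeeper.h>

#include <algorithm>
#include <cstring>
#include <string>
#include <vector>

#include <unistd.h>

int main() {
  zhandle_t* zh = zookeeper_init(
      "10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181",
      nullptr, 10000, nullptr, nullptr, 0);
  sleep(1);  // crude: real code waits for ZOO_CONNECTED_STATE in a watcher

  String_vector children;
  if (zoo_get_children(zh, "/mesos", 0, &children) != ZOK) return 1;

  // Collect the candidate 'info_...' znodes; zero-padded sequence numbers
  // make the lexicographic minimum the numeric minimum (the leader).
  std::vector<std::string> infos;
  for (int i = 0; i < children.count; i++) {
    if (std::strncmp(children.data[i], "info_", 5) == 0) {
      infos.push_back(children.data[i]);
    }
  }
  deallocate_String_vector(&children);
  if (infos.empty()) return 1;

  std::string leader =
      "/mesos/" + *std::min_element(infos.begin(), infos.end());

  char buffer[4096];
  int len = sizeof(buffer);
  Stat stat;
  if (zoo_get(zh, leader.c_str(), 0, buffer, &len, &stat) != ZOK) return 1;
  // 'buffer' now holds the serialized MasterInfo of the leading master.

  zookeeper_close(zh);
  return 0;
}
{code}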
[jira] [Commented] (MESOS-2162) Consider a C++ implementation of CoreOS AppContainer spec
[ https://issues.apache.org/jira/browse/MESOS-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295939#comment-14295939 ]

Steven Schlansker commented on MESOS-2162:
------------------------------------------

Any possibility of getting this scheduled for an upcoming release?
[jira] [Commented] (MESOS-2162) Consider a C++ implementation of CoreOS AppContainer spec
[ https://issues.apache.org/jira/browse/MESOS-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295981#comment-14295981 ]

Steven Schlansker commented on MESOS-2162:
------------------------------------------

I would love to help out in any way I can, but I am not much of a C++ guy. But at the very least I would happily test it, or if you have other suggestions for how I can help...
[jira] [Issue Comment Deleted] (MESOS-2162) Consider a C++ implementation of CoreOS AppContainer spec
[ https://issues.apache.org/jira/browse/MESOS-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Schlansker updated MESOS-2162:
-------------------------------------
    Comment: was deleted

(was: I would love to help out in any way I can, but I am not much of a C++ guy. But at the very least I would happily test it, or if you have other suggestions for how I can help...)
[jira] [Updated] (MESOS-1127) Expose lower-level scheduler/executor API
[ https://issues.apache.org/jira/browse/MESOS-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kone updated MESOS-1127:
------------------------------
    Epic Name:   (was: HTTP API)
   Issue Type: Task  (was: Epic)

> Expose lower-level scheduler/executor API
> ------------------------------------------
>                 Key: MESOS-1127
>                 URL: https://issues.apache.org/jira/browse/MESOS-1127
>             Project: Mesos
>          Issue Type: Task
>          Components: framework
>           Reporter: Benjamin Hindman
>           Assignee: Benjamin Hindman
>             Labels: twitter
>
> The default scheduler/executor interface and implementation in Mesos have a few drawbacks:
> (1) The interface is fairly high-level, which makes it hard to do certain things, for example, handle events (callbacks) in batch. This can have a big impact on the performance of schedulers (for example, writing task updates that need to be persisted).
> (2) The implementation requires writing a lot of boilerplate JNI and native Python wrappers when adding additional API components.
>
> The plan is to provide a lower-level API that can easily be used to implement the higher-level API that is currently provided. This will also open the door to more easily building native-language Mesos libraries (i.e., not needing the C++ shim layer) and building new higher-level abstractions on top of the lower-level API.
[jira] [Created] (MESOS-2293) Implement the Call endpoint on master
Vinod Kone created MESOS-2293:
---------------------------------
             Summary: Implement the Call endpoint on master
                 Key: MESOS-2293
                 URL: https://issues.apache.org/jira/browse/MESOS-2293
             Project: Mesos
          Issue Type: Task
            Reporter: Vinod Kone
[jira] [Created] (MESOS-2292) Implement Call/Event protobufs for Executor
Vinod Kone created MESOS-2292:
---------------------------------
             Summary: Implement Call/Event protobufs for Executor
                 Key: MESOS-2292
                 URL: https://issues.apache.org/jira/browse/MESOS-2292
             Project: Mesos
          Issue Type: Task
            Reporter: Vinod Kone
[jira] [Created] (MESOS-2294) Implement the Events endpoint on master
Vinod Kone created MESOS-2294:
---------------------------------
             Summary: Implement the Events endpoint on master
                 Key: MESOS-2294
                 URL: https://issues.apache.org/jira/browse/MESOS-2294
             Project: Mesos
          Issue Type: Task
            Reporter: Vinod Kone
[jira] [Updated] (MESOS-2215) The Docker containerizer attempts to recover any task when checkpointing is enabled, not just docker tasks.
[ https://issues.apache.org/jira/browse/MESOS-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Niemitz updated MESOS-2215:
---------------------------------
    Description:

Once the slave restarts and recovers the task, I see this error in the log for all tasks that were recovered, every second or so. Note, these were NOT docker tasks:

W0113 16:01:00.790323 773142 monitor.cpp:213] Failed to get resource usage for container 7b729b89-dc7e-4d08-af97-8cd1af560a21 for executor thermos-1421085237813-slipstream-prod-agent-3-8f769514-1835-4151-90d0-3f55dcc940dd of framework 20150109-161713-715350282-5050-290797-: Failed to 'docker inspect mesos-7b729b89-dc7e-4d08-af97-8cd1af560a21': exit status = exited with status 1 stderr = Error: No such image or container: mesos-7b729b89-dc7e-4d08-af97-8cd1af560a21

However, the tasks themselves are still healthy and running. The slave was launched with --containerizers=mesos,docker

More info: it looks like the docker containerizer is a little too ambitious about recovering containers; again, this was not a docker task:

I0113 15:59:59.476145 773142 docker.cpp:814] Recovering container '7b729b89-dc7e-4d08-af97-8cd1af560a21' for executor 'thermos-1421085237813-slipstream-prod-agent-3-8f769514-1835-4151-90d0-3f55dcc940dd' of framework 20150109-161713-715350282-5050-290797-

Looking into the source, it looks like the problem is that the ComposingContainerizer runs recover in parallel, but neither the docker containerizer nor the mesos containerizer checks whether it should recover the task or not (because it was the one that launched it). Perhaps this needs to be written into the checkpoint somewhere?

> The Docker containerizer attempts to recover any task when checkpointing is enabled, not just docker tasks.
> -------------------------------------------------------------------------------------------------------------
>                 Key: MESOS-2215
>                 URL: https://issues.apache.org/jira/browse/MESOS-2215
>             Project: Mesos
>          Issue Type: Bug
>          Components: docker
>   Affects Versions: 0.21.0
>           Reporter: Steve Niemitz
>           Assignee: Timothy Chen
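One way to realize the "written into the checkpoint somewhere" idea from the description: persist which containerizer launched each container, and have each containerizer skip entries it does not own during recovery. A hypothetical sketch; the file name, helpers, and layout are invented, and this is not how Mesos ultimately addressed the bug:

{code}
#include <fstream>
#include <string>

// Hypothetical checkpoint helper: persist the launching containerizer's
// name alongside the executor's other checkpointed state.
void checkpointContainerizer(const std::string& runDir,
                             const std::string& name) {
  std::ofstream(runDir + "/containerizer") << name;
}

// During recovery, each containerizer consults the marker and skips
// containers launched by a different containerizer.
bool shouldRecover(const std::string& runDir, const std::string& self) {
  std::ifstream in(runDir + "/containerizer");
  std::string name;
  if (!(in >> name)) {
    // Legacy checkpoint without the marker: fall back to attempting
    // recovery, preserving the old (overly ambitious) behavior.
    return true;
  }
  return name == self;  // e.g. self == "docker" or self == "mesos"
}
{code}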
[jira] [Created] (MESOS-2290) Move all scheduler driver validations to master
Vinod Kone created MESOS-2290:
---------------------------------
             Summary: Move all scheduler driver validations to master
                 Key: MESOS-2290
                 URL: https://issues.apache.org/jira/browse/MESOS-2290
             Project: Mesos
          Issue Type: Task
            Reporter: Vinod Kone

With the HTTP API, the scheduler driver will no longer exist and hence all the validations should move to the master.
[jira] [Commented] (MESOS-1806) Substituting etcd or ReplicatedLog for Zookeeper
[ https://issues.apache.org/jira/browse/MESOS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296345#comment-14296345 ]

Cody Maloney commented on MESOS-1806:
-------------------------------------

https://reviews.apache.org/r/30194/
https://reviews.apache.org/r/30195/
https://reviews.apache.org/r/30393/
https://reviews.apache.org/r/30394/
https://reviews.apache.org/r/30395/
https://reviews.apache.org/r/30396/
https://reviews.apache.org/r/30397/
https://reviews.apache.org/r/30398/

> Substituting etcd or ReplicatedLog for Zookeeper
> -------------------------------------------------
>                 Key: MESOS-1806
>                 URL: https://issues.apache.org/jira/browse/MESOS-1806
>             Project: Mesos
>          Issue Type: Task
>           Reporter: Ed Ropple
>           Assignee: Cody Maloney
>           Priority: Minor
>
> adam_mesos  eropple: Could you also file a new JIRA for Mesos to drop ZK in favor of etcd or ReplicatedLog? Would love to get some momentum going on that one.
>
> Consider it filed. =)
[jira] [Updated] (MESOS-1825) Support the webui over HTTPS.
[ https://issues.apache.org/jira/browse/MESOS-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler updated MESOS-1825:
-----------------------------------
    Summary: Support the webui over HTTPS.  (was: support https link)

> Support the webui over HTTPS.
> ------------------------------
>                 Key: MESOS-1825
>                 URL: https://issues.apache.org/jira/browse/MESOS-1825
>             Project: Mesos
>          Issue Type: Bug
>          Components: webui
>           Reporter: Kien Pham
>           Priority: Minor
>             Labels: newbie
>
> Right now in the Mesos UI, links are hardcoded to http://. They should not be hardcoded, so that https links can also be supported.
> Ex: https://github.com/apache/mesos/blob/master/src/webui/master/static/js/controllers.js#L17
[jira] [Commented] (MESOS-1825) Support the webui over HTTPS.
[ https://issues.apache.org/jira/browse/MESOS-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296276#comment-14296276 ]

ASF GitHub Bot commented on MESOS-1825:
---------------------------------------

Github user bmahler commented on the pull request:
https://github.com/apache/mesos/pull/34#issuecomment-71957729

Thanks Arnaud! Nice, there will be built-in HTTPS support in Mesos at some point, you may want to chime in here: https://issues.apache.org/jira/browse/MESOS-1825
[jira] [Commented] (MESOS-2183) docker containerizer doesn't work when mesos-slave is running in a container
[ https://issues.apache.org/jira/browse/MESOS-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296300#comment-14296300 ]

Timothy Chen commented on MESOS-2183:
-------------------------------------

So I'm planning to leverage the --pid=host flag in docker 1.5, which won't clone a new pid namespace. With this you won't see the problems you are seeing. What I described in my doc is to handle recovery.
[jira] [Assigned] (MESOS-2228) SlaveTest.MesosExecutorGracefulShutdown is flaky
[ https://issues.apache.org/jira/browse/MESOS-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler reassigned MESOS-2228: -- Assignee: Benjamin Mahler (was: Alexander Rukletsov) {quote} (or is not being reaped) {quote} From the output, we're not seeing 'Terminated' in the output, which means that it's the SIGKILL reaching the pid, not the SIGTERM, no? Because of this, it doesn't seem like it's a reaping issue, anything I'm missing? {quote} From the logs it looks like a simple sleep task doesn't terminate {quote} Looks like this to me as well, these are VMs and we sometimes see strange blocking behavior. I've bumped the timeout for now and included a nicer error message. Please take a look: https://reviews.apache.org/r/30402/ SlaveTest.MesosExecutorGracefulShutdown is flaky Key: MESOS-2228 URL: https://issues.apache.org/jira/browse/MESOS-2228 Project: Mesos Issue Type: Bug Components: test Affects Versions: 0.22.0 Reporter: Vinod Kone Assignee: Benjamin Mahler Labels: twitter Observed this on internal CI {noformat} [ RUN ] SlaveTest.MesosExecutorGracefulShutdown Using temporary directory '/tmp/SlaveTest_MesosExecutorGracefulShutdown_AWdtVJ' I0124 08:14:04.399211 7926 leveldb.cpp:176] Opened db in 27.364056ms I0124 08:14:04.402632 7926 leveldb.cpp:183] Compacted db in 3.357646ms I0124 08:14:04.402691 7926 leveldb.cpp:198] Created db iterator in 23822ns I0124 08:14:04.402708 7926 leveldb.cpp:204] Seeked to beginning of db in 1913ns I0124 08:14:04.402716 7926 leveldb.cpp:273] Iterated through 0 keys in the db in 458ns I0124 08:14:04.402767 7926 replica.cpp:744] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned I0124 08:14:04.403728 7951 recover.cpp:449] Starting replica recovery I0124 08:14:04.404011 7951 recover.cpp:475] Replica is in EMPTY status I0124 08:14:04.407765 7950 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request I0124 08:14:04.408710 7951 recover.cpp:195] Received a recover response from a replica in EMPTY status I0124 08:14:04.419666 7951 recover.cpp:566] Updating replica status to STARTING I0124 08:14:04.429719 7953 master.cpp:262] Master 20150124-081404-16842879-47787-7926 (utopic) started on 127.0.1.1:47787 I0124 08:14:04.429790 7953 master.cpp:308] Master only allowing authenticated frameworks to register I0124 08:14:04.429802 7953 master.cpp:313] Master only allowing authenticated slaves to register I0124 08:14:04.429826 7953 credentials.hpp:36] Loading credentials for authentication from '/tmp/SlaveTest_MesosExecutorGracefulShutdown_AWdtVJ/credentials' I0124 08:14:04.430277 7953 master.cpp:357] Authorization enabled I0124 08:14:04.432682 7953 master.cpp:1219] The newly elected leader is master@127.0.1.1:47787 with id 20150124-081404-16842879-47787-7926 I0124 08:14:04.432816 7953 master.cpp:1232] Elected as the leading master! 
I0124 08:14:04.432894 7953 master.cpp:1050] Recovering from registrar I0124 08:14:04.433212 7950 registrar.cpp:313] Recovering registrar I0124 08:14:04.434226 7951 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 14.323302ms I0124 08:14:04.434270 7951 replica.cpp:323] Persisted replica status to STARTING I0124 08:14:04.434489 7951 recover.cpp:475] Replica is in STARTING status I0124 08:14:04.436164 7951 replica.cpp:641] Replica in STARTING status received a broadcasted recover request I0124 08:14:04.439368 7947 recover.cpp:195] Received a recover response from a replica in STARTING status I0124 08:14:04.440626 7947 recover.cpp:566] Updating replica status to VOTING I0124 08:14:04.443667 7947 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 2.698664ms I0124 08:14:04.443759 7947 replica.cpp:323] Persisted replica status to VOTING I0124 08:14:04.443925 7947 recover.cpp:580] Successfully joined the Paxos group I0124 08:14:04.444160 7947 recover.cpp:464] Recover process terminated I0124 08:14:04.444543 7949 log.cpp:660] Attempting to start the writer I0124 08:14:04.446331 7949 replica.cpp:477] Replica received implicit promise request with proposal 1 I0124 08:14:04.449329 7949 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 2.690453ms I0124 08:14:04.449388 7949 replica.cpp:345] Persisted promised to 1 I0124 08:14:04.450637 7947 coordinator.cpp:230] Coordinator attemping to fill missing position I0124 08:14:04.452271 7949 replica.cpp:378] Replica received explicit promise request for position 0 with proposal 2 I0124 08:14:04.455124 7949 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 2.593522ms I0124 08:14:04.455157 7949 replica.cpp:679] Persisted action at 0 I0124 08:14:04.456594
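For context on the SIGTERM/SIGKILL discussion in the comment above, a minimal sketch of the escalation pattern (an assumption for illustration, not Mesos's actual shutdown code, which is asynchronous and timer-driven):
{code}
// Sketch (assumed): graceful-shutdown escalation. Send SIGTERM, give
// the child a grace period to exit (the 'Terminated' case), then fall
// back to SIGKILL -- the case the test output above suggests.
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns true if `pid` (a child of this process) exited within the
// grace period after SIGTERM; false if SIGKILL was needed.
bool gracefulShutdown(pid_t pid, int gracePeriodSecs)
{
  ::kill(pid, SIGTERM);

  for (int i = 0; i < gracePeriodSecs * 10; i++) {
    if (::waitpid(pid, nullptr, WNOHANG) == pid) {
      return true;  // Reaped: SIGTERM was handled.
    }
    ::usleep(100 * 1000);  // Poll every 100ms.
  }

  ::kill(pid, SIGKILL);    // Grace period elapsed: escalate.
  ::waitpid(pid, nullptr, 0);
  return false;
}
{code}
A too-short grace period on a slow VM makes the SIGKILL branch fire spuriously, which is consistent with the timeout bump proposed in the review above.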
[jira] [Updated] (MESOS-2228) SlaveTest.MesosExecutorGracefulShutdown is flaky
[ https://issues.apache.org/jira/browse/MESOS-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler updated MESOS-2228: --- Labels: twitter (was: )
[jira] [Updated] (MESOS-2228) SlaveTest.MesosExecutorGracefulShutdown is flaky
[ https://issues.apache.org/jira/browse/MESOS-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler updated MESOS-2228: --- Sprint: Twitter Mesos Q1 Sprint 1
[jira] [Created] (MESOS-2286) Simplify the allocator architecture
Alexander Rukletsov created MESOS-2286: -- Summary: Simplify the allocator architecture Key: MESOS-2286 URL: https://issues.apache.org/jira/browse/MESOS-2286 Project: Mesos Issue Type: Improvement Reporter: Alexander Rukletsov -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2285) Eliminate dependency on master::Flags in Allocator
Alexander Rukletsov created MESOS-2285: -- Summary: Eliminate dependency on master::Flags in Allocator Key: MESOS-2285 URL: https://issues.apache.org/jira/browse/MESOS-2285 Project: Mesos Issue Type: Improvement Components: allocation Reporter: Alexander Rukletsov Priority: Minor {{Allocator}} extracts parameters from {{master::Flags}} during initialization. Currently, only the {{allocation_interval}} key from {{master::Flags}} is used. It makes sense to introduce a separate structure {{allocator::Options}} with the values relevant for allocation and eliminate the dependency on {{master::Flags}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
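As a sketch of what such a structure could look like (the name {{allocator::Options}} comes from the ticket; the field layout below is an illustrative assumption, not an agreed design):
{code}
// Hypothetical allocator::Options: the allocator is initialized from
// this narrow struct instead of reaching into master::Flags. Only
// allocation_interval is named by the ticket.
namespace mesos {
namespace allocator {

struct Options
{
  // Interval between allocation rounds, in seconds; today this is
  // read out of master::Flags::allocation_interval.
  double allocationIntervalSecs = 1.0;
};

} // namespace allocator
} // namespace mesos

// The master would then construct Options from its flags once, e.g.:
//
//   allocator::Options options;
//   options.allocationIntervalSecs = flags.allocation_interval.secs();
//   allocator->initialize(options, ...);
{code}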
[jira] [Created] (MESOS-2287) Document undocumented tests
Niklas Quarfot Nielsen created MESOS-2287: - Summary: Document undocumented tests Key: MESOS-2287 URL: https://issues.apache.org/jira/browse/MESOS-2287 Project: Mesos Issue Type: Improvement Reporter: Niklas Quarfot Nielsen Priority: Trivial We have an inconsistency in the way we document tests. It has become a rule of thumb to include a small blurb about the test. For example:
{code}
// This tests the 'active' field in slave entries from state.json. We
// first verify an active slave, deactivate it and verify that the
// 'active' field is false.
TEST_F(MasterTest, SlaveActiveEndpoint)
{
  // Start a master.
  Try<PID<Master>> master = StartMaster();
  ASSERT_SOME(master);
  ...
{code}
However, we still have many tests that haven't been documented. For example:
{code}
TEST_F(MasterTest, MetricsInStatsEndpoint)
{
  Try<PID<Master>> master = StartMaster();
  ASSERT_SOME(master);

  Future<process::http::Response> response =
    process::http::get(master.get(), "stats.json");
  ...
{code}
It would be great to do a scan and make sure all the tests are documented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
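To make the request concrete, the undocumented test above might gain a blurb along these lines (the wording is a suggestion, not from the ticket):
{code}
// This tests that master metrics (e.g. 'master/cpus_total') are
// exposed through the stats.json endpoint.
TEST_F(MasterTest, MetricsInStatsEndpoint)
{
  Try<PID<Master>> master = StartMaster();
  ASSERT_SOME(master);

  Future<process::http::Response> response =
    process::http::get(master.get(), "stats.json");
  ...
{code}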
[jira] [Updated] (MESOS-2286) Simplify the allocator architecture
[ https://issues.apache.org/jira/browse/MESOS-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-2286: --- Component/s: allocation Description: The allocator refactor [https://issues.apache.org/jira/browse/MESOS-2213] will distinguish between general allocators and Process-based ones. This introduces a chain of inheritance with a single real allocator at the bottom. Consider simplifying this architecture without making it harder to add new allocators. Priority: Minor (was: Major) Simplify the allocator architecture --- Key: MESOS-2286 URL: https://issues.apache.org/jira/browse/MESOS-2286 Project: Mesos Issue Type: Improvement Components: allocation Reporter: Alexander Rukletsov Priority: Minor The allocator refactor [https://issues.apache.org/jira/browse/MESOS-2213] will distinguish between general allocators and Process-based ones. This introduces a chain of inheritance with a single real allocator at the bottom. Consider simplifying this architecture without making it harder to add new allocators. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
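For readers following along, a schematic of the inheritance chain being described (class names follow the MESOS-2213 refactor; treat the exact shapes below as illustrative assumptions):
{code}
// Sketch (assumed shapes): the layering the ticket wants to simplify.
// An abstract interface, a templated Process-based adapter, and a
// single concrete allocator at the bottom of the chain.

// General allocator interface.
class Allocator
{
public:
  virtual ~Allocator() {}
  virtual void initialize(/* ... */) = 0;
};

// Process-based adapter: wraps an actor and dispatches calls onto it.
template <typename AllocatorProcess>
class MesosAllocator : public Allocator
{
public:
  void initialize(/* ... */) override
  {
    // dispatch(process, &AllocatorProcess::initialize, ...);
  }
};

// The one real allocator today, at the bottom of the chain.
class HierarchicalDRFAllocatorProcess
{
  // ...
};
{code}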