Re: Review Request 33514: Wired up --allocator flag in master.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33514/#review83764 --- Patch looks great! Reviews applied: [33513, 33514] All tests passed. - Mesos ReviewBot On May 14, 2015, 1:23 p.m., Alexander Rukletsov wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33514/ --- (Updated May 14, 2015, 1:23 p.m.) Review request for mesos, Kapil Arya, Niklas Nielsen, and Till Toenshoff. Bugs: MESOS-2597 https://issues.apache.org/jira/browse/MESOS-2597 Repository: mesos Description --- See summary. Diffs - src/master/flags.hpp 996cf38c88f9718e55e88d6e906b5e3d1989478a src/master/flags.cpp 5798989d3f135978ec3d5f714b1cd8d84180fc90 src/master/main.cpp 18f8c3178459da0cbf23a1817ec49cd9d3998bfd Diff: https://reviews.apache.org/r/33514/diff/ Testing --- make check (Mac OS 10.9.5, CentOS 7.0) Checked the whole pipeline works with a modularized allocator. The allocator is HierarchicalDRF packed into a library: https://github.com/rukletsov/mesos-modules/tree/master/external-allocator Master and slave have been pointed to an allocator module: ``` ./bin/mesos-master.sh --work_dir=m/work --modules=file://path/external-allocator.json --allocator=ExternalAllocatorModule ./bin/mesos-slave.sh --master=master-ip --work_dir=s/work ``` Slave successfully registers, its resources are offered to a simple task: ``` ./src/mesos-execute --command=sleep 1 --master=master-ip --name=alloc-test ``` Thanks, Alexander Rukletsov
Re: Review Request 29947: Fixed a race condition in hook tests for remove-executor hook.
On May 14, 2015, 12:38 p.m., Niklas Nielsen wrote: src/examples/test_hook_module.cpp, lines 146-150 https://reviews.apache.org/r/29947/diff/10/?file=958466#file958466line146 Looks a bit dense; can we break it up a bit? Ideally, this whole block would be replaced by a single statement that does all this behind the curtains. For that reason, I would prefer to keep these close together to give the impression that this is a single unit of work. If you prefer, I can add a TODO on top of it. - Kapil --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29947/#review83786 --- On May 13, 2015, 4:36 p.m., Kapil Arya wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29947/ --- (Updated May 13, 2015, 4:36 p.m.) Review request for mesos and Niklas Nielsen. Bugs: MESOS-2226 https://issues.apache.org/jira/browse/MESOS-2226 Repository: mesos Description --- There is currently no good way to synchronize between the test body and the hook code, so we wire a promise (owned by the test code). The technical debt is covered by the following JIRA issue: https://issues.apache.org/jira/browse/MESOS-2641 Diffs - src/examples/test_hook_module.cpp b25830ab6475f997422cfd2f60cc9a79e1acadfe src/messages/messages.proto 98d859f3db6013a2e155d838f590a0cde6dc5ed5 src/tests/hook_tests.cpp a65c0ab7c41ec3b7964f9d572381fa3e61746dc3 Diff: https://reviews.apache.org/r/29947/diff/ Testing --- make check Thanks, Kapil Arya
Re: Review Request 34225: Deleted travis YAML file.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34225/#review83812 --- Ship it! Ship It! - Cody Maloney On May 14, 2015, 6:21 p.m., Vinod Kone wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34225/ --- (Updated May 14, 2015, 6:21 p.m.) Review request for mesos and Ben Mahler. Repository: mesos Description --- Travis builds are not supported/maintained; so it doesn't make sense to have this file and TravisCI keep trying to build the project and send failure notifications to the IRC channel. We have moved to Docker based builds on ASF to test Mesos on different configurations. Diffs - .travis.yml 4991dd77a62b8bfc6045099ca99682b47aadaa97 Diff: https://reviews.apache.org/r/34225/diff/ Testing --- Thanks, Vinod Kone
Re: Review Request 32587: Stopped using RunTaskMessage.framework_id.
On May 12, 2015, 9:45 a.m., Niklas Nielsen wrote: Hey Kapil; what's blocking this patch? Adam B wrote: As part of the deprecation cycle, we need to wait until 0.24 for this patch. All the relevant 0.23 patches have landed. NOTE: This patch is only to be merged _ONLY_ after all the dependent patches have shipped, i.e. after 0.23.0 (tracked here: https://issues.apache.org/jira/browse/MESOS-2561) has released. Kapil Arya wrote: Exactly. As soon as 0.23.0 ships, we can do a rebase, review and commit. Let's close this patch for now then and reopen later. - Niklas --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32587/#review83423 --- On April 7, 2015, 10 a.m., Kapil Arya wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32587/ --- (Updated April 7, 2015, 10 a.m.) Review request for mesos, Adam B and Niklas Nielsen. Bugs: MESOS-2559 https://issues.apache.org/jira/browse/MESOS-2559 Repository: mesos Description --- FrameworkID can be retrieved from RunTaskMessage.framework. NOTE: This patch is only to be merged _ONLY_ after all the dependent patches have shipped, i.e. after 0.23.0 (tracked here: https://issues.apache.org/jira/browse/MESOS-2561) has released. Diffs - src/master/master.cpp dccd7c635da4b7031cd109bd84e7f17b31777ef1 src/slave/slave.cpp c7e65a6c095963feaa9d5fdbb519c68f8f761d16 src/tests/mesos.cpp 11e88330dd50913ab00da79054225d113541e83e Diff: https://reviews.apache.org/r/32587/diff/ Testing --- make check TODO: Test for upgrade path. Thanks, Kapil Arya
Re: Review Request 29947: Fixed a race condition in hook tests for remove-executor hook.
On Feb. 11, 2015, 5:27 a.m., Adam B wrote: src/tests/hook_tests.cpp, lines 302-305 https://reviews.apache.org/r/29947/diff/7/?file=837751#file837751line302 Did you consider just sending an explicit ShutdownExecutorMessage from the slave to the executor? Then you can wait around for the hook to complete without worrying about the master/slave/framework shutting down in the meantime. Kapil Arya wrote: I haven't considered the explicit message. I will try to find a prior example and adjust accordingly. Since we have already decoupled shutdown with successful execution of the hook, is this issue still relevant? - Kapil --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29947/#review71946 --- On May 13, 2015, 4:36 p.m., Kapil Arya wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29947/ --- (Updated May 13, 2015, 4:36 p.m.) Review request for mesos and Niklas Nielsen. Bugs: MESOS-2226 https://issues.apache.org/jira/browse/MESOS-2226 Repository: mesos Description --- There is currently no good way to synchronize between the test body and the hook code, so we wire a promise (owned by the test code). The technical debt is covered by the following JIRA issue: https://issues.apache.org/jira/browse/MESOS-2641 Diffs - src/examples/test_hook_module.cpp b25830ab6475f997422cfd2f60cc9a79e1acadfe src/messages/messages.proto 98d859f3db6013a2e155d838f590a0cde6dc5ed5 src/tests/hook_tests.cpp a65c0ab7c41ec3b7964f9d572381fa3e61746dc3 Diff: https://reviews.apache.org/r/29947/diff/ Testing --- make check Thanks, Kapil Arya
Re: Review Request 29947: Fixed a race condition in hook tests for remove-executor hook.
On May 14, 2015, 12:39 p.m., Niklas Nielsen wrote: src/examples/test_hook_module.cpp, line 37 https://reviews.apache.org/r/29947/diff/10/?file=958466#file958466line37 Why did you need this one? Without this we would have to use qualify HookExecuted with `internal::`. (Note that we don't enclose this file in `mesos::internal`). - Kapil --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29947/#review83787 --- On May 13, 2015, 4:36 p.m., Kapil Arya wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29947/ --- (Updated May 13, 2015, 4:36 p.m.) Review request for mesos and Niklas Nielsen. Bugs: MESOS-2226 https://issues.apache.org/jira/browse/MESOS-2226 Repository: mesos Description --- There is currently no good way to synchronize between the test body and the hook code, so we wire a promise (owned by the test code). The technical debt is covered by the following JIRA issue: https://issues.apache.org/jira/browse/MESOS-2641 Diffs - src/examples/test_hook_module.cpp b25830ab6475f997422cfd2f60cc9a79e1acadfe src/messages/messages.proto 98d859f3db6013a2e155d838f590a0cde6dc5ed5 src/tests/hook_tests.cpp a65c0ab7c41ec3b7964f9d572381fa3e61746dc3 Diff: https://reviews.apache.org/r/29947/diff/ Testing --- make check Thanks, Kapil Arya
Re: Review Request 33372: Added decorator documentation and described the semantic change in Mesos 0.23.0
On April 22, 2015, 1:53 a.m., Adam B wrote: docs/modules.md, lines 165-166 https://reviews.apache.org/r/33372/diff/2/?file=938440#file938440line165 I still think we need a note in upgrades.md since this is a hook API change when upgrading. Done :) - Niklas --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33372/#review81133 --- On May 14, 2015, 10:06 a.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33372/ --- (Updated May 14, 2015, 10:06 a.m.) Review request for mesos and Adam B. Bugs: MESOS-2622 https://issues.apache.org/jira/browse/MESOS-2622 Repository: mesos Description --- See summary. Diffs - docs/modules.md d0391ced9eac35e7ed371dfbd981675139fabaa5 docs/upgrades.md 2a15694607c079ad95ef6cf7f1490872ab9a5976 Diff: https://reviews.apache.org/r/33372/diff/ Testing --- Thanks, Niklas Nielsen
Re: Review Request 33372: Added decorator documentation and described the semantic change in Mesos 0.23.0
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33372/#review83803 --- Patch looks great! Reviews applied: [30961, 30962, 31016, 31028, 32948, 31017, 33372] All tests passed. - Mesos ReviewBot On May 14, 2015, 5:06 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33372/ --- (Updated May 14, 2015, 5:06 p.m.) Review request for mesos and Adam B. Bugs: MESOS-2622 https://issues.apache.org/jira/browse/MESOS-2622 Repository: mesos Description --- See summary. Diffs - docs/modules.md d0391ced9eac35e7ed371dfbd981675139fabaa5 docs/upgrades.md 2a15694607c079ad95ef6cf7f1490872ab9a5976 Diff: https://reviews.apache.org/r/33372/diff/ Testing --- Thanks, Niklas Nielsen
Re: Review Request 29947: Fixed a race condition in hook tests for remove-executor hook.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29947/#review83786 --- Ship it! Thanks Kapil! It looks SO much better!! LGTM src/examples/test_hook_module.cpp https://reviews.apache.org/r/29947/#comment134848 Looks a bit dense; can we break it up a bit? - Niklas Nielsen On May 13, 2015, 1:36 p.m., Kapil Arya wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29947/ --- (Updated May 13, 2015, 1:36 p.m.) Review request for mesos and Niklas Nielsen. Bugs: MESOS-2226 https://issues.apache.org/jira/browse/MESOS-2226 Repository: mesos Description --- There is currently no good way to synchronize between the test body and the hook code, so we wire a promise (owned by the test code). The technical debt is covered by the following JIRA issue: https://issues.apache.org/jira/browse/MESOS-2641 Diffs - src/examples/test_hook_module.cpp b25830ab6475f997422cfd2f60cc9a79e1acadfe src/messages/messages.proto 98d859f3db6013a2e155d838f590a0cde6dc5ed5 src/tests/hook_tests.cpp a65c0ab7c41ec3b7964f9d572381fa3e61746dc3 Diff: https://reviews.apache.org/r/29947/diff/ Testing --- make check Thanks, Kapil Arya
Re: Review Request 34136: Add ContainerImage protobuf.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34136/#review83823 --- include/mesos/mesos.proto https://reviews.apache.org/r/34136/#comment134887 BenH left a similar comment earlier last time when we looked at the new changes, and basically I think instead of calling it just ContainerImage we should make sure it's Appc specific, in case there are other formats we want to support in the future and now we have to deprecate and reintroduce new ones like we did the in past. - Timothy Chen On May 13, 2015, 12:47 a.m., Ian Downes wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34136/ --- (Updated May 13, 2015, 12:47 a.m.) Review request for mesos, Chi Zhang, Paul Brett, Timothy Chen, and Vinod Kone. Repository: mesos Description --- Add ContainerImage protobuf. Diffs - include/mesos/mesos.proto db4fc8c001dd68bc3b9ca83650170c4f26db18c7 Diff: https://reviews.apache.org/r/34136/diff/ Testing --- Thanks, Ian Downes
Re: Review Request 31505: Add flow classifiers for fq_codel on egress
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31505/#review83690 --- src/slave/containerizer/isolators/network/port_mapping.hpp https://reviews.apache.org/r/31505/#comment134860 Please add some comments about what this is. Consider using hashset instead of std::set. src/slave/containerizer/isolators/network/port_mapping.cpp https://reviews.apache.org/r/31505/#comment134858 Could you please add some comments about what this is? For example, what is container min flowid? What is host flowid? Why ICMP and ARP are sharing the same flowid? src/slave/containerizer/isolators/network/port_mapping.cpp https://reviews.apache.org/r/31505/#comment134733 What does that mean? What do you want to revise? You probably want to restructure the TODO like the following: ``` TODO(cwang): Maybe we can continue See details in MESOS-2370. ``` src/slave/containerizer/isolators/network/port_mapping.cpp https://reviews.apache.org/r/31505/#comment134859 The comments here is a little confusing. The flowIDs here is actually freeFlowIds, right? How about just calling it freeFlowIds? src/slave/containerizer/isolators/network/port_mapping.cpp https://reviews.apache.org/r/31505/#comment134863 We define variables when we want to use them. So you don't have to declare this variable here. src/slave/containerizer/isolators/network/port_mapping.cpp https://reviews.apache.org/r/31505/#comment134866 Can you move this code block after getting the vethClassifiers? src/slave/containerizer/isolators/network/port_mapping.cpp https://reviews.apache.org/r/31505/#comment134862 No need to use `filter::` prefix for `filter::Filter` as it's already included. src/slave/containerizer/isolators/network/port_mapping.cpp https://reviews.apache.org/r/31505/#comment134864 s/flowClassifiers/eth0EgressFilters/ src/slave/containerizer/isolators/network/port_mapping.cpp https://reviews.apache.org/r/31505/#comment134865 s/classifiers/vethIngressClassifiers/ src/slave/containerizer/isolators/network/port_mapping.cpp https://reviews.apache.org/r/31505/#comment134875 You probably also want to do some sanity check here to make sure all port ranges from a container points to the same flow on eth0 egress. Also, PortRange supports hashing, so you can probably pre-create a hashmapPortRange, classid mapping and lookup the hashmap later: ``` // Construct a port range to classid mapping from host eth0 egress. // TODO: Consider moving the following to a common place so that we // don't duplicate this logic for each container. ResultvectorFilterip::Classifier eth0EgressFilters = ip::filters(eth0, fq_codel::HANDLE); ... hashmapPortRange, uint16_t classids; foreach (const Filterip::Classifier filter, eth0EgressFilters.get()) { const OptionPortRange sourcePorts = filter.classifier().sourcePorts(); const Optionuint16_t classid = filter.classifier().classid(); if (sourcePorts.isNone()) { return Error(Missing source ports for filters on egress of + eth0); } if (classid.isNone()) { return Error(Missing classid for filters on egress of + eth0); } if (classids.contains(range) { return Error( Duplicated port range + stringify(range) + detected on egress of + eth0); } classids[range.get()] = classid.get(); } ... IntervalSetuint16_t nonEphemeralPorts; IntervalSetuint16_t ephemeralPorts; Optionuint16_t classid; foreach (...) { ... if (classids.contains(sourcePorts.get()) { if (classid.isNone()) { classid = classids.get(sourcePorts.get()); } else if (classid != classids.get(sourcePorts.get())) { return Error(A container is associated with multiple flows on eth0 egress); } } else if (classid.isSome()) { // This is the case where some port range of a container is assigned // to a flow while some isn't. This could happen if slave crashes // while those filters are created. However, this is OK for us because // ... LOG(WARNING) ...; } ... } ``` src/slave/containerizer/isolators/network/port_mapping.cpp https://reviews.apache.org/r/31505/#comment134885 info-flowId = getNextFlowId(); src/slave/containerizer/isolators/network/port_mapping.cpp https://reviews.apache.org/r/31505/#comment134886 s/flowId/info-flowId/ src/slave/containerizer/isolators/network/port_mapping.cpp https://reviews.apache.org/r/31505/#comment134889 Please passing in flowId and make 'addHostIPFilters' accept an Optionuint16_t flowId. src/slave/containerizer/isolators/network/port_mapping.cpp
Re: Review Request 31505: Add flow classifiers for fq_codel on egress
On May 14, 2015, 7:44 p.m., Jie Yu wrote: src/slave/containerizer/isolators/network/port_mapping.hpp, line 329 https://reviews.apache.org/r/31505/diff/5/?file=955871#file955871line329 Please add some comments about what this is. Consider using hashset instead of std::set. We don't need a hash here, we only need to get an unused ID from this set. - Cong --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31505/#review83690 --- On May 12, 2015, 12:14 a.m., Cong Wang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31505/ --- (Updated May 12, 2015, 12:14 a.m.) Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu. Bugs: MESOS-2422 https://issues.apache.org/jira/browse/MESOS-2422 Repository: mesos Description --- Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops. We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior. TODO: get some performance numbers Diffs - src/slave/containerizer/isolators/network/port_mapping.hpp c72fb47f60f40cda8d84a10497b9133f83cf018e src/slave/containerizer/isolators/network/port_mapping.cpp a4abaff30bb4646b1b1edfdbbc243c9e3f6851df Diff: https://reviews.apache.org/r/31505/diff/ Testing --- Manually start two mesos containers with netperf running side. Thanks, Cong Wang
Re: Review Request 31505: Add flow classifiers for fq_codel on egress
On May 14, 2015, 7:44 p.m., Jie Yu wrote: src/slave/containerizer/isolators/network/port_mapping.cpp, line 1767 https://reviews.apache.org/r/31505/diff/5/?file=955872#file955872line1767 No need to use `filter::` prefix for `filter::Filter` as it's already included. Without it I get: ../../src/slave/containerizer/isolators/network/port_mapping.cpp: In member function 'Trymesos::internal::slave::PortMappingIsolatorProcess::Info* mesos::internal::slave::PortMappingIsolatorProcess::_recover(pid_t)': ../../src/slave/containerizer/isolators/network/port_mapping.cpp:1766:38: error: template argument 1 is invalid ResultvectorFilterip::Classifier eth0EgressFilters = - Cong --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31505/#review83690 --- On May 12, 2015, 12:14 a.m., Cong Wang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31505/ --- (Updated May 12, 2015, 12:14 a.m.) Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu. Bugs: MESOS-2422 https://issues.apache.org/jira/browse/MESOS-2422 Repository: mesos Description --- Currently we do nothing on the host egress side. By default, kernel uses its own hash function to classify the packets to different TX queues (if the hardware interface supports multiqueue). So packets coming out of different containers could end up queueing in the same TX queue, in this case we saw buffer bloat on some TX queue caused packet drops. We need to isolation the egress traffic so that containers will not have interference with each other. The number of hardware TX queues is limited by hardware interface, usually not enough to map our container in 1:1 way, therefore we need some software solution. We choose fq_codel and use tc filters to classify packets coming out of different containers to different fq_codel flows, and the codel algorithm on each flow could also help us to reduce the buffer bloat. Note when the packets leave fq_codel, they still share the physical TX queue(s), this is however (almost) beyond what we can control, we have to rely on the kernel behavior. TODO: get some performance numbers Diffs - src/slave/containerizer/isolators/network/port_mapping.hpp c72fb47f60f40cda8d84a10497b9133f83cf018e src/slave/containerizer/isolators/network/port_mapping.cpp a4abaff30bb4646b1b1edfdbbc243c9e3f6851df Diff: https://reviews.apache.org/r/31505/diff/ Testing --- Manually start two mesos containers with netperf running side. Thanks, Cong Wang
Re: Review Request 33865: Added RevocableInfo message to Resource protobuf.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33865/ --- (Updated May 14, 2015, 11:57 p.m.) Review request for mesos, Jie Yu, Joris Van Remoortere, and Niklas Nielsen. Changes --- jie's comments. Bugs: MESOS-2691 https://issues.apache.org/jira/browse/MESOS-2691 Repository: mesos Description --- RevocableInfo currently supports OVERSUBSCRIBED resources. In the future it can be easily extended to use other types of revocable reosurces (e.g., resources allocated to other roles). Disabled the ability to use revocable resources for reservation or persistence because the semantics seem weird. We can enable it in the future if there is a use case for that. Diffs (updated) - include/mesos/mesos.proto 15f55a3c54be3475e77d561f106e00ea6e53c2fa src/common/resources.cpp 843a06d6c4d3e9ff0d1665360bae7c57bcfecb83 src/master/validation.cpp c3e96ae0e684f3f365e9aa365bccc953d32b0452 src/tests/resources_tests.cpp a7ec59ea217ad71f7d1e93ca6039d5b2491b3237 Diff: https://reviews.apache.org/r/33865/diff/ Testing --- make check Thanks, Vinod Kone
Re: Review Request 33865: Added RevocableInfo message to Resource protobuf.
On May 7, 2015, 12:38 a.m., Jie Yu wrote: include/mesos/mesos.proto, lines 454-458 https://reviews.apache.org/r/33865/diff/1/?file=950607#file950607line454 Chatted with Vinod offline (PS: Vinod is going to send out a summary of the discussion). In short, OVERSUBSCRIBED = (Allocated but unused) + (Unallocated) So, the type `OVERSUBSCRIBED` here is too general and ambiguate. You description of this review is interesting. Resources reserved for a different role but unused (or unallocated) can be revokable. The question is to whom should we send this revokable resources. For instance, if a framework under role A reserved some resources on the slave (resources' role == A) and not using any of them yet. Should we send revokable offers to framework under role A, or all frameworks. It's not clear yet and I think that's a policy issue. It would be better to let a modulized component (e.g., resources estimator) to control that so that we can potentially use different policies. If we decide to send revokable offer for those reserved resources to all frameworks, we could strip the role in the revokable resources (i.e., make them * resources). However, I don't know what type should we set for the RevokableInfo for the above case? How can we distinguish that with the case where the resources are from unreserved resources (i.e., anyone can use that)? As a result, I think maybe we should punt on the `type` here right now and just leave an empty RevokableInfo since it's not strictly needed? Fair point. I will remove the type for now. On May 7, 2015, 12:38 a.m., Jie Yu wrote: include/mesos/mesos.proto, line 455 https://reviews.apache.org/r/33865/diff/1/?file=950607#file950607line455 2 spaces indent please. N/A On May 7, 2015, 12:38 a.m., Jie Yu wrote: src/common/resources.cpp, lines 490-501 https://reviews.apache.org/r/33865/diff/1/?file=950608#file950608line490 Should we put these checks to master validation? As I discussed with Mpark week ago, the validations in Resources object should be minimal checks to ensure all methods in Resources are well behaved. Clearly, the check you have here is a bit high level and probably should be put in master validation? Moved to validation.cpp. Also, I think the last check in Resource::validate() (* resource cannot be dynamically reserved) should be moved to master validation? - Vinod --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33865/#review82763 --- On May 5, 2015, 9:13 p.m., Vinod Kone wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33865/ --- (Updated May 5, 2015, 9:13 p.m.) Review request for mesos, Jie Yu, Joris Van Remoortere, and Niklas Nielsen. Bugs: MESOS-2691 https://issues.apache.org/jira/browse/MESOS-2691 Repository: mesos Description --- RevocableInfo currently supports OVERSUBSCRIBED resources. In the future it can be easily extended to use other types of revocable reosurces (e.g., resources allocated to other roles). Disabled the ability to use revocable resources for reservation or persistence because the semantics seem weird. We can enable it in the future if there is a use case for that. Diffs - include/mesos/mesos.proto db4fc8c001dd68bc3b9ca83650170c4f26db18c7 src/common/resources.cpp 235930ff2dbb3ea49a3a0696dc070f2bd56fba4b src/tests/resources_tests.cpp a7ec59ea217ad71f7d1e93ca6039d5b2491b3237 Diff: https://reviews.apache.org/r/33865/diff/ Testing --- make check Thanks, Vinod Kone
Re: Review Request 34139: AppC image discovery.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34139/#review83825 --- src/slave/containerizer/provisioners/appc/discovery.hpp https://reviews.apache.org/r/34139/#comment134902 How about namespacing this under appc as well? mesos::internal::slave::Discovery seems pretty generic to me. src/slave/containerizer/provisioners/appc/discovery.cpp https://reviews.apache.org/r/34139/#comment134903 static? src/slave/containerizer/provisioners/appc/discovery.cpp https://reviews.apache.org/r/34139/#comment134904 Can you print the unsupported strategy? src/slave/flags.cpp https://reviews.apache.org/r/34139/#comment134900 Do you think we should namespace these flags with appc? Seems like appc so far is the only one that has a configurable discvoery mechanism and local dir - Timothy Chen On May 13, 2015, 12:47 a.m., Ian Downes wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34139/ --- (Updated May 13, 2015, 12:47 a.m.) Review request for mesos, Chi Zhang, Paul Brett, Timothy Chen, and Vinod Kone. Repository: mesos Description --- Local and simple discovery only. Diffs - src/Makefile.am 14bc976a7b6a656fb58085484d25c3de3cf0f693 src/slave/containerizer/provisioners/appc/discovery.hpp PRE-CREATION src/slave/containerizer/provisioners/appc/discovery.cpp PRE-CREATION src/slave/flags.hpp d3b1ce117fbb4e0b97852ef150b63f35cc991032 src/slave/flags.cpp d0932b04e3825abb6173efe0d1aee199aa356932 Diff: https://reviews.apache.org/r/34139/diff/ Testing --- Thanks, Ian Downes
Re: Review Request 29507: Added Configurable Slave Ping Timeouts
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29507/ --- (Updated May 14, 2015, 1:54 a.m.) Review request for mesos, Ben Mahler and Niklas Nielsen. Changes --- Rebased and added configuration/upgrade documentation. Bugs: MESOS-2110 https://issues.apache.org/jira/browse/MESOS-2110 Repository: mesos Description --- Added new --slave_ping_timeout and --max_slave_ping_timeouts flags to mesos-master to supplement the DEFAULT_SLAVE_PING_TIMEOUT (15secs) and DEFAULT_MAX_SLAVE_PING_TIMEOUTS (5). These can be extended if slaves are expected/allowed to be down for longer than a minute or two. Slave will receive master's ping timeout in SlaveRe[re]gisteredMessage. Beware that this affects recovery from network timeouts as well as actual slave node/process failover. Diffs (updated) - docs/configuration.md 54c4e31 docs/upgrades.md 2a15694 src/master/constants.hpp c386eab src/master/constants.cpp 9ee17e9 src/master/flags.hpp 996cf38 src/master/flags.cpp 5798989 src/master/master.cpp ec32cd6 src/messages/messages.proto 98d859f src/slave/constants.hpp fd1c1ab src/slave/constants.cpp 2a99b11 src/slave/slave.hpp adb52b5 src/slave/slave.cpp 1b17441 src/tests/fault_tolerance_tests.cpp 3a27d82 src/tests/partition_tests.cpp 1018e47 src/tests/slave_recovery_tests.cpp c036e9c src/tests/slave_tests.cpp 04e79ec Diff: https://reviews.apache.org/r/29507/diff/ Testing --- Manually tested slave failover/shutdown with master using different --slave_ping_timeout and --max_slave_ping_timeouts. Ran unit tests with shorter non-default values for ping timeouts. `make check` with new unit tests: ShortPingTimeoutUnreachableMaster and ShortPingTimeoutUnreachableSlave Thanks, Adam B
Re: Review Request 29507: Added Configurable Slave Ping Timeouts
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29507/ --- (Updated May 14, 2015, 3:01 a.m.) Review request for mesos, Ben Mahler and Niklas Nielsen. Changes --- Rebased Bugs: MESOS-2110 https://issues.apache.org/jira/browse/MESOS-2110 Repository: mesos Description --- Added new --slave_ping_timeout and --max_slave_ping_timeouts flags to mesos-master to supplement the DEFAULT_SLAVE_PING_TIMEOUT (15secs) and DEFAULT_MAX_SLAVE_PING_TIMEOUTS (5). These can be extended if slaves are expected/allowed to be down for longer than a minute or two. Slave will receive master's ping timeout in SlaveRe[re]gisteredMessage. Beware that this affects recovery from network timeouts as well as actual slave node/process failover. Diffs (updated) - docs/configuration.md 54c4e31 docs/upgrades.md 2a15694 src/master/constants.hpp c386eab src/master/constants.cpp 9ee17e9 src/master/flags.hpp 996cf38 src/master/flags.cpp 5798989 src/master/master.cpp eaea79d src/messages/messages.proto 19e2444 src/slave/constants.hpp df02043 src/slave/constants.cpp 07f699a src/slave/slave.hpp b62ed7b src/slave/slave.cpp 132f83e src/tests/partition_tests.cpp f7ee3ab src/tests/slave_recovery_tests.cpp c036e9c src/tests/slave_tests.cpp acae497 Diff: https://reviews.apache.org/r/29507/diff/ Testing --- Manually tested slave failover/shutdown with master using different --slave_ping_timeout and --max_slave_ping_timeouts. Ran unit tests with shorter non-default values for ping timeouts. `make check` with new unit tests: ShortPingTimeoutUnreachableMaster and ShortPingTimeoutUnreachableSlave Thanks, Adam B