Re: Review Request 33876: Added usages() to resource monitor
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33876/#review82627 --- Patch looks great! Reviews applied: [33875, 33876] All tests passed. - Mesos ReviewBot On May 6, 2015, 2:16 a.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33876/ --- (Updated May 6, 2015, 2:16 a.m.) Review request for mesos and Vinod Kone. Repository: mesos Description --- Added usages() to resource monitor to enable internal components tapping into resource statistics. Diffs - src/slave/monitor.hpp 69c60a10187f8ea617c6be9738b28e8103e0ed27 src/slave/monitor.cpp 398af010564e999b46e38560ba1e652261a9420c Diff: https://reviews.apache.org/r/33876/diff/ Testing --- make check Thanks, Niklas Nielsen
Review Request 33876: Added usages() to resource monitor
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33876/ --- Review request for mesos and Vinod Kone. Repository: mesos Description --- Added usages() to resource monitor to enable internal components tapping into resource statistics. Diffs - src/slave/monitor.hpp 69c60a10187f8ea617c6be9738b28e8103e0ed27 src/slave/monitor.cpp 398af010564e999b46e38560ba1e652261a9420c Diff: https://reviews.apache.org/r/33876/diff/ Testing --- make check Thanks, Niklas Nielsen
Re: Review Request 33865: Added RevocableInfo message to Resource protobuf.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33865/#review82594 --- Patch looks great! Reviews applied: [33865] All tests passed. - Mesos ReviewBot On May 5, 2015, 9:13 p.m., Vinod Kone wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33865/ --- (Updated May 5, 2015, 9:13 p.m.) Review request for mesos, Jie Yu, Joris Van Remoortere, and Niklas Nielsen. Bugs: MESOS-2691 https://issues.apache.org/jira/browse/MESOS-2691 Repository: mesos Description --- RevocableInfo currently supports OVERSUBSCRIBED resources. In the future it can be easily extended to use other types of revocable reosurces (e.g., resources allocated to other roles). Disabled the ability to use revocable resources for reservation or persistence because the semantics seem weird. We can enable it in the future if there is a use case for that. Diffs - include/mesos/mesos.proto db4fc8c001dd68bc3b9ca83650170c4f26db18c7 src/common/resources.cpp 235930ff2dbb3ea49a3a0696dc070f2bd56fba4b src/tests/resources_tests.cpp a7ec59ea217ad71f7d1e93ca6039d5b2491b3237 Diff: https://reviews.apache.org/r/33865/diff/ Testing --- make check Thanks, Vinod Kone
Re: Review Request 33505: Add state-summary endpoint to master.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33505/#review82596 --- Is there a ticket for this? Would love to get a higher level view of the motivation for the new endpoint and its format. A bit tough to tell what a sample response looks like without reading through this diff carefully. :) - Ben Mahler On May 5, 2015, 9:51 p.m., Joris Van Remoortere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33505/ --- (Updated May 5, 2015, 9:51 p.m.) Review request for mesos and Benjamin Hindman. Repository: mesos Description --- This exposes framework and slave statistics that are of high importance and aggregated per framework or slave respectively. The statistics will not be exact due to the circular buffer that stores completed tasks, but this was also an issue with state.json. Diffs - src/master/http.cpp fb448256d7ced1f47ea48ccfca2ae267bc26ef94 src/master/master.hpp 49ee050ca4d2b2c5f75ce864fcf6ae703dfdeadd src/master/master.cpp bee842557c8397428ca51e46faa182a391584be3 src/tests/master_tests.cpp bdfccb2427cba938dbbaa8e832255153172b0501 Diff: https://reviews.apache.org/r/33505/diff/ Testing --- make check Thanks, Joris Van Remoortere
Re: Review Request 33849: mesos: use standard macros for compiler and vendor detection
On May 5, 2015, 4:29 p.m., Cody Maloney wrote: configure.ac, line 575 https://reviews.apache.org/r/33849/diff/1/?file=950367#file950367line575 Can you check ax_cxx_compiler_version = Clang/LLVM 3.5 here? Or does that not work on both OS X and Linux? James Peach wrote: Hmm. I don't have a Linux system with clang 3.5 and I'm not quite sure what you are asking here. I do have a later version of OS X clang and this change works (and automatically disables unused local typedef warnings). I tried to leave the compiler logic alone in this patch since I don't have systems to test all the possible combinations. I agree that it would be desirable to separate the typedef warnings from the compiler versions (though I was confused that GCC and clang seem to have subtly different names for the same warning). I'm happy to knock up a separate patch for that if you like. James Peach wrote: If you're asking whether we can drop the ``-Wno-unused-local-typedef`` flags on later clangs, I'm not confident that I have the right set of build dependencies to answer that. I tested dropping that warning on my version of clang on OS X (later that 3.5) dropping; both boost and generated protobuf code still spew that warning. Cody Maloney wrote: Apple Clang gives a different version string when you give it `clang --version` than Linux clang Apple: ``` Apple LLVM version 6.1.0 (clang-602.0.49) (based on LLVM 3.6.0svn) Target: x86_64-apple-darwin14.3.0 Thread model: posix ``` Linux: ``` clang version 3.6.0 (tags/RELEASE_360/final) Target: x86_64-unknown-linux-gnu Thread model: posix ``` For the `-Wno-unused-local-typedef` - that is definitely a needed flag for some of the Clang + GCC versions we support. It's much more of Lets try compiling some code which checks if we'll generate a warning, then if the compiler is clang add `-Wno-unused-local-typedef`, if it is gcc add `-Wno-unused-local-typedefs` (Note the s). James Peach wrote: That sounds like you want to just unconditionally turn ``-Wunused-local-typedef`` off. Either if you test whether it works, then turn it off if it does, then the net effect is to always turn it off :) The warning didn't exist in Clang 3.5 though I think, so one of our platforms doesn't need it. Other than that one compiler+version combo though it is always on - Cody --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33849/#review82524 --- On May 5, 2015, 5:52 p.m., James Peach wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33849/ --- (Updated May 5, 2015, 5:52 p.m.) Review request for mesos, Benjamin Hindman, Cody Maloney, and Timothy St. Clair. Bugs: MESOS-2666 https://issues.apache.org/jira/browse/MESOS-2666 Repository: mesos Description --- Import the AX_COMPILER_VERSION and AX_COMPILER_VENDOR macros from the autoconf archive and use them to detect and configure the supported copmilers. Diffs - configure.ac 14658f6f6920549f1948b8bd8890c67bc067e3dd m4/ax_compiler_vendor.m4 PRE-CREATION m4/ax_compiler_version.m4 PRE-CREATION Diff: https://reviews.apache.org/r/33849/diff/ Testing --- Make check on CentOS 7 and Mac OS X 10.10.3. Verified that the default CentOS 6 GCC toolchain is still rejected as too old. Thanks, James Peach
Re: Review Request 32163: Added a function which checks if a json object is contained within another.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32163/ --- (Updated May 5, 2015, 6:26 p.m.) Review request for mesos, Benjamin Hindman, Bernd Mathiske, Joerg Schad, Marco Massenzio, Niklas Nielsen, and Till Toenshoff. Changes --- Updates comments, breaks containment test into smaller more localized tests. Bugs: MESOS-2510 https://issues.apache.org/jira/browse/MESOS-2510 Repository: mesos Description --- Adds a function which allows to perform comparison tests on subsets of json blobs. i.e. ```cpp JSON::Value expected = JSON::parse( { \key\ : true }).get(); // Returned json: // { // uptime : 45234.123, // key : true // } JSON::Value actual = bar(); // I'm only interested on the key entry and ignore the rest. EXPECT_TRUE(contains(actual, expected)); ``` Increasing readability for tests that include json. For more information on the reason of why this patch is needed, please check the JIRA entry. Diffs (updated) - 3rdparty/libprocess/3rdparty/stout/include/stout/json.hpp 334c898906018be6e663f53815abbe047806b95c 3rdparty/libprocess/3rdparty/stout/tests/json_tests.cpp f60d1bbe60f2e2b6460c06bba98e8b85ebb6a3f9 Diff: https://reviews.apache.org/r/32163/diff/ Testing --- make check Thanks, Alexander Rojas
Re: Review Request 32163: Added a function which checks if a json object is contained within another.
On May 4, 2015, 10:56 p.m., Niklas Nielsen wrote: 3rdparty/libprocess/3rdparty/stout/tests/json_tests.cpp, line 448 https://reviews.apache.org/r/32163/diff/10/?file=947348#file947348line448 should 'json' be capitalized in our comments? Yes, fixing it. - Alexander --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32163/#review82410 --- On May 5, 2015, 6:26 p.m., Alexander Rojas wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32163/ --- (Updated May 5, 2015, 6:26 p.m.) Review request for mesos, Benjamin Hindman, Bernd Mathiske, Joerg Schad, Marco Massenzio, Niklas Nielsen, and Till Toenshoff. Bugs: MESOS-2510 https://issues.apache.org/jira/browse/MESOS-2510 Repository: mesos Description --- Adds a function which allows to perform comparison tests on subsets of json blobs. i.e. ```cpp JSON::Value expected = JSON::parse( { \key\ : true }).get(); // Returned json: // { // uptime : 45234.123, // key : true // } JSON::Value actual = bar(); // I'm only interested on the key entry and ignore the rest. EXPECT_TRUE(contains(actual, expected)); ``` Increasing readability for tests that include json. For more information on the reason of why this patch is needed, please check the JIRA entry. Diffs - 3rdparty/libprocess/3rdparty/stout/include/stout/json.hpp 334c898906018be6e663f53815abbe047806b95c 3rdparty/libprocess/3rdparty/stout/tests/json_tests.cpp f60d1bbe60f2e2b6460c06bba98e8b85ebb6a3f9 Diff: https://reviews.apache.org/r/32163/diff/ Testing --- make check Thanks, Alexander Rojas
Re: Review Request 33849: mesos: use standard macros for compiler and vendor detection
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33849/#review82524 --- A couple nits, but structurally looks good to me overall. Mainly just some things I'd like to see cleaned up while we're here configure.ac https://reviews.apache.org/r/33849/#comment133264 Can you check ax_cxx_compiler_version = Clang/LLVM 3.5 here? Or does that not work on both OS X and Linux? configure.ac https://reviews.apache.org/r/33849/#comment133263 Not something you need to do here, but would be nice if we used more or less the same check for 'Wno-unused-local-typedef' for both clang and gcc since you can easily tell which compiler it is now and update the added flag. configure.ac https://reviews.apache.org/r/33849/#comment133265 I know this is a mess I left, but it's not overly clean here that we bundle the gcc 4.8 version warning with the unused local typedef flag adding, could you pull out the is = gcc 4.8 check to be its own line, then unconditionally adding '-Wno-unused-local-typedefs', or using the clang check mentioned above. - Cody Maloney On May 5, 2015, 3:53 p.m., James Peach wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33849/ --- (Updated May 5, 2015, 3:53 p.m.) Review request for mesos, Benjamin Hindman, Cody Maloney, and Timothy St. Clair. Bugs: MESOS-2666 https://issues.apache.org/jira/browse/MESOS-2666 Repository: mesos Description --- Import the AX_COMPILER_VERSION and AX_COMPILER_VENDOR macros from the autoconf archive and use them to detect and configure the supported copmilers. Diffs - configure.ac 14658f6f6920549f1948b8bd8890c67bc067e3dd m4/ax_compiler_vendor.m4 PRE-CREATION m4/ax_compiler_version.m4 PRE-CREATION Diff: https://reviews.apache.org/r/33849/diff/ Testing --- Make check on CentOS 7 and Mac OS X 10.10.3. Verified that the default CentOS 6 GCC toolchain is still rejected as too old. Thanks, James Peach
Re: Review Request 33849: mesos: use standard macros for compiler and vendor detection
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33849/#review82525 --- Patch looks great! Reviews applied: [33849] All tests passed. - Mesos ReviewBot On May 5, 2015, 3:53 p.m., James Peach wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33849/ --- (Updated May 5, 2015, 3:53 p.m.) Review request for mesos, Benjamin Hindman, Cody Maloney, and Timothy St. Clair. Bugs: MESOS-2666 https://issues.apache.org/jira/browse/MESOS-2666 Repository: mesos Description --- Import the AX_COMPILER_VERSION and AX_COMPILER_VENDOR macros from the autoconf archive and use them to detect and configure the supported copmilers. Diffs - configure.ac 14658f6f6920549f1948b8bd8890c67bc067e3dd m4/ax_compiler_vendor.m4 PRE-CREATION m4/ax_compiler_version.m4 PRE-CREATION Diff: https://reviews.apache.org/r/33849/diff/ Testing --- Make check on CentOS 7 and Mac OS X 10.10.3. Verified that the default CentOS 6 GCC toolchain is still rejected as too old. Thanks, James Peach
Re: Review Request 33849: mesos: use standard macros for compiler and vendor detection
On May 5, 2015, 4:29 p.m., Cody Maloney wrote: configure.ac, line 575 https://reviews.apache.org/r/33849/diff/1/?file=950367#file950367line575 Can you check ax_cxx_compiler_version = Clang/LLVM 3.5 here? Or does that not work on both OS X and Linux? James Peach wrote: Hmm. I don't have a Linux system with clang 3.5 and I'm not quite sure what you are asking here. I do have a later version of OS X clang and this change works (and automatically disables unused local typedef warnings). I tried to leave the compiler logic alone in this patch since I don't have systems to test all the possible combinations. I agree that it would be desirable to separate the typedef warnings from the compiler versions (though I was confused that GCC and clang seem to have subtly different names for the same warning). I'm happy to knock up a separate patch for that if you like. James Peach wrote: If you're asking whether we can drop the ``-Wno-unused-local-typedef`` flags on later clangs, I'm not confident that I have the right set of build dependencies to answer that. I tested dropping that warning on my version of clang on OS X (later that 3.5) dropping; both boost and generated protobuf code still spew that warning. Apple Clang gives a different version string when you give it `clang --version` than Linux clang Apple: ``` Apple LLVM version 6.1.0 (clang-602.0.49) (based on LLVM 3.6.0svn) Target: x86_64-apple-darwin14.3.0 Thread model: posix ``` Linux: ``` clang version 3.6.0 (tags/RELEASE_360/final) Target: x86_64-unknown-linux-gnu Thread model: posix ``` For the `-Wno-unused-local-typedef` - that is definitely a needed flag for some of the Clang + GCC versions we support. It's much more of Lets try compiling some code which checks if we'll generate a warning, then if the compiler is clang add `-Wno-unused-local-typedef`, if it is gcc add `-Wno-unused-local-typedefs` (Note the s). - Cody --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33849/#review82524 --- On May 5, 2015, 3:53 p.m., James Peach wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33849/ --- (Updated May 5, 2015, 3:53 p.m.) Review request for mesos, Benjamin Hindman, Cody Maloney, and Timothy St. Clair. Bugs: MESOS-2666 https://issues.apache.org/jira/browse/MESOS-2666 Repository: mesos Description --- Import the AX_COMPILER_VERSION and AX_COMPILER_VENDOR macros from the autoconf archive and use them to detect and configure the supported copmilers. Diffs - configure.ac 14658f6f6920549f1948b8bd8890c67bc067e3dd m4/ax_compiler_vendor.m4 PRE-CREATION m4/ax_compiler_version.m4 PRE-CREATION Diff: https://reviews.apache.org/r/33849/diff/ Testing --- Make check on CentOS 7 and Mac OS X 10.10.3. Verified that the default CentOS 6 GCC toolchain is still rejected as too old. Thanks, James Peach
Re: Review Request 33733: Fixed style error with template definitions withing common/parse.hpp
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33733/#review82563 --- LGTM - Joerg Schad On April 30, 2015, 10:45 p.m., Alexander Rojas wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33733/ --- (Updated April 30, 2015, 10:45 p.m.) Review request for mesos, Benjamin Hindman, Joerg Schad, Michael Park, and Till Toenshoff. Repository: mesos Description --- Replaces every instance of `template` to `template ` in the file `src/common/parse.hpp` Diffs - src/common/parse.hpp 547b32041f39f0ff0c38179b66a32b2239134abc Diff: https://reviews.apache.org/r/33733/diff/ Testing --- make check Thanks, Alexander Rojas
Re: Review Request 33849: mesos: use standard macros for compiler and vendor detection
On May 5, 2015, 4:29 p.m., Cody Maloney wrote: configure.ac, line 575 https://reviews.apache.org/r/33849/diff/1/?file=950367#file950367line575 Can you check ax_cxx_compiler_version = Clang/LLVM 3.5 here? Or does that not work on both OS X and Linux? James Peach wrote: Hmm. I don't have a Linux system with clang 3.5 and I'm not quite sure what you are asking here. I do have a later version of OS X clang and this change works (and automatically disables unused local typedef warnings). I tried to leave the compiler logic alone in this patch since I don't have systems to test all the possible combinations. I agree that it would be desirable to separate the typedef warnings from the compiler versions (though I was confused that GCC and clang seem to have subtly different names for the same warning). I'm happy to knock up a separate patch for that if you like. James Peach wrote: If you're asking whether we can drop the ``-Wno-unused-local-typedef`` flags on later clangs, I'm not confident that I have the right set of build dependencies to answer that. I tested dropping that warning on my version of clang on OS X (later that 3.5) dropping; both boost and generated protobuf code still spew that warning. Cody Maloney wrote: Apple Clang gives a different version string when you give it `clang --version` than Linux clang Apple: ``` Apple LLVM version 6.1.0 (clang-602.0.49) (based on LLVM 3.6.0svn) Target: x86_64-apple-darwin14.3.0 Thread model: posix ``` Linux: ``` clang version 3.6.0 (tags/RELEASE_360/final) Target: x86_64-unknown-linux-gnu Thread model: posix ``` For the `-Wno-unused-local-typedef` - that is definitely a needed flag for some of the Clang + GCC versions we support. It's much more of Lets try compiling some code which checks if we'll generate a warning, then if the compiler is clang add `-Wno-unused-local-typedef`, if it is gcc add `-Wno-unused-local-typedefs` (Note the s). James Peach wrote: That sounds like you want to just unconditionally turn ``-Wunused-local-typedef`` off. Either if you test whether it works, then turn it off if it does, then the net effect is to always turn it off :) Cody Maloney wrote: The warning didn't exist in Clang 3.5 though I think, so one of our platforms doesn't need it. Other than that one compiler+version combo though it is always on So it should be safe (and desirable) to do ``-Wno-unused-local-typedef`` ``-Wno-unknown-warning-option`` then? - James --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33849/#review82524 --- On May 5, 2015, 5:52 p.m., James Peach wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33849/ --- (Updated May 5, 2015, 5:52 p.m.) Review request for mesos, Benjamin Hindman, Cody Maloney, and Timothy St. Clair. Bugs: MESOS-2666 https://issues.apache.org/jira/browse/MESOS-2666 Repository: mesos Description --- Import the AX_COMPILER_VERSION and AX_COMPILER_VENDOR macros from the autoconf archive and use them to detect and configure the supported copmilers. Diffs - configure.ac 14658f6f6920549f1948b8bd8890c67bc067e3dd m4/ax_compiler_vendor.m4 PRE-CREATION m4/ax_compiler_version.m4 PRE-CREATION Diff: https://reviews.apache.org/r/33849/diff/ Testing --- Make check on CentOS 7 and Mac OS X 10.10.3. Verified that the default CentOS 6 GCC toolchain is still rejected as too old. Thanks, James Peach
Re: Review Request 33823: Protocol file with the HTTP API messages between executor and slave
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33823/#review82545 --- Can you add comments to the protobufs as we did for scheduler proto? include/mesos/executor/executor.proto https://reviews.apache.org/r/33823/#comment133293 s/RUN/LAUNCH/ include/mesos/executor/executor.proto https://reviews.apache.org/r/33823/#comment133288 Is FrameworkID not present in FrameworkInfo? include/mesos/executor/executor.proto https://reviews.apache.org/r/33823/#comment133289 Ditto. SlaveID should already be in SlaveInfo? include/mesos/executor/executor.proto https://reviews.apache.org/r/33823/#comment133290 Why FrameworkInfo? include/mesos/executor/executor.proto https://reviews.apache.org/r/33823/#comment133291 Why SlaveID and FrameworkID? include/mesos/executor/executor.proto https://reviews.apache.org/r/33823/#comment133314 Ditto. Kill these. include/mesos/executor/executor.proto https://reviews.apache.org/r/33823/#comment133315 Kill these? include/mesos/executor/executor.proto https://reviews.apache.org/r/33823/#comment133317 s/StatusUpdate/Update/ include/mesos/executor/executor.proto https://reviews.apache.org/r/33823/#comment133318 No corresponding Type for this? Also, how and when is this used? - Vinod Kone On May 4, 2015, 10:21 p.m., Alexander Rojas wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33823/ --- (Updated May 4, 2015, 10:21 p.m.) Review request for mesos, Isabel Jimenez, Marco Massenzio, and Vinod Kone. Repository: mesos Description --- See summary. Diffs - include/mesos/executor/executor.hpp PRE-CREATION include/mesos/executor/executor.proto PRE-CREATION src/Makefile.am 93c7c8a807a33ab639be6289535bbd32022aa85b Diff: https://reviews.apache.org/r/33823/diff/ Testing --- make check Thanks, Alexander Rojas
Re: Review Request 33249: Send statusUpdate to scheduler on containerizer launch failure
On April 21, 2015, 11:25 p.m., Jie Yu wrote: src/slave/slave.cpp, lines 3065-3078 https://reviews.apache.org/r/33249/diff/3/?file=938221#file938221line3065 Instead of doing that in your way, can we just try to make sure `containerizer-wait` here will return a failure (or a Termination with some reason) when `containerizer-launch` fails. In that way, the `executorTerminated` will properly send status updates to the slave (TASK_LOST/TASK_FAILED). Or am I missing something? Jie Yu wrote: OK, I think I got confused by the ticket. There are actually two problems here. The problem I am refering to is the fact that we don't send status update to the scheduler if containerizer launch fails until executor reregistration timeout happens. Since for docker containerizer, someone might use a very large timeout value, ideally, the slave should send a status update to the scheduler right after containerizer launch fails. After chat with Jay, the problem you guys are refering to is the fact that the scheduler cannot disinguish between the case where the task has failed vs. the case where the configuration of a task is not correct, because in both cases, the scheduler will receive a TASK_FAILED/TASK_LOST. Jie Yu wrote: To address the first problem, I think the simplest way is to add a containerizer-destroy(..) in executorLaunched when containerizer-launch fails. In that way, it's going to trigger containerizer-wait and thus send status update to the scheduler. Jie Yu wrote: Regarding the second problem, IMO, we should include a reason field in Termination (https://issues.apache.org/jira/browse/MESOS-2035) and let sendExecutorTerminatedStatusUpdate to propagate the termination reason to the scheduler. Timothy Chen wrote: Reason field sounds good, I think what you proposed makes sense, in docker containerizer at least we also need to make sure termination message is set correctly as currently it doesn't contain all the error information that we pass back to the launch future. Jay Buffington wrote: Sorry for the confusion. There are three problems that are all related. Yes, we need to send statusUpdates as soon as containerizer launch fails and yes, we need to set the reason so the scheduler can distinguish a slave failure from a bad request. However, my intention with this patch is not to address either of those two problems. My goal with this patch is to simply send the containerizer launch failure message back to the scheduler. I am using Aurora with the docker containerizer. There are a myriad of reasons that dockerd could fail to docker run a container. Currently, when that fails the user sees a useless and incorrect Abnormal Executor Termination message in the Aurora web ui. With this patch they see the stderr output of docker run. This way they can report meaningful error messages to the operations team. I can update this patch to address the other two issues, but the key is that when the containerizer launch fails we have to send a statusUpdate with a message that contains future.failure(). Before this patch we were only logging it. The scheduler needs to get that error message. Jie Yu wrote: Thanks for clarifying it, Jay! In fact, my proposal above should be able to solve the third problem cleanly. Check out `Slave::sendExecutorTerminatedStatusUpdate`. The containerizer should properly set the message and reason fields in the Termination protobuf (basically why the container gets terminated and what's the error message). The slave will forward the reason and message to the scheduler through status update. I chatted with BenM about this yesterday, and there are a couple of notes I want to write down here. 1. We probably need multiple levels for TaskStatus::Reason. In other words, we probably need a repeated Reason reasons field in status update message. For instance, for a containerizer launch failure, we probably need two reasons: 1) the top level reason REASON_EXECUTOR_TERMINATED; 2) the second level reason REASON_DOCKER_PULL_FAILURE; 2. We probably want to allow extension to TaskStatus::Reason enum. For example, some framework/executor may want to add customized reasons. We could leverage the protobuf extension support to achieve that (https://developers.google.com/protocol-buffers/docs/proto#extensions). 3. The semantics around Termination is broken currently and we need to clean it up. For instance, the following code is broken, ``` void Slave::sendExecutorTerminatedStatusUpdate(...) { ... if (termination.isReady() termination.get().killed()) { taskState = TASK_FAILED; // TODO(dhamon): MESOS-2035: Add 'reason' to containerizer::Termination. reason = TaskStatus::REASON_MEMORY_LIMIT; }
Re: Review Request 31504: Add a basic filter to match all packets
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31504/ --- (Updated May 5, 2015, 6:37 p.m.) Review request for mesos, Chi Zhang, Ian Downes, and Jie Yu. Changes --- Address review comments Bugs: MESOS-2422 https://issues.apache.org/jira/browse/MESOS-2422 Repository: mesos Description --- We need a default filter which has the lowest priority and can match all packets that are not matched by the previous filters, so that no packet will escape (otherwise it would be dropped by fq_codel). Because basic filter can accept protocol as a parameter, the arp filter which is based on basic can move to this as well. Diffs (updated) - src/Makefile.am 93c7c8a807a33ab639be6289535bbd32022aa85b src/linux/routing/filter/basic.hpp PRE-CREATION src/linux/routing/filter/basic.cpp PRE-CREATION src/tests/routing_tests.cpp ce583b59bf9fb2ef855aa82ab6083ea11b138e55 Diff: https://reviews.apache.org/r/31504/diff/ Testing --- Run the testcase. Thanks, Cong Wang
Re: Review Request 33249: Send statusUpdate to scheduler on containerizer launch failure
On April 21, 2015, 4:25 p.m., Jie Yu wrote: src/slave/slave.cpp, lines 3065-3078 https://reviews.apache.org/r/33249/diff/3/?file=938221#file938221line3065 Instead of doing that in your way, can we just try to make sure `containerizer-wait` here will return a failure (or a Termination with some reason) when `containerizer-launch` fails. In that way, the `executorTerminated` will properly send status updates to the slave (TASK_LOST/TASK_FAILED). Or am I missing something? Jie Yu wrote: OK, I think I got confused by the ticket. There are actually two problems here. The problem I am refering to is the fact that we don't send status update to the scheduler if containerizer launch fails until executor reregistration timeout happens. Since for docker containerizer, someone might use a very large timeout value, ideally, the slave should send a status update to the scheduler right after containerizer launch fails. After chat with Jay, the problem you guys are refering to is the fact that the scheduler cannot disinguish between the case where the task has failed vs. the case where the configuration of a task is not correct, because in both cases, the scheduler will receive a TASK_FAILED/TASK_LOST. Jie Yu wrote: To address the first problem, I think the simplest way is to add a containerizer-destroy(..) in executorLaunched when containerizer-launch fails. In that way, it's going to trigger containerizer-wait and thus send status update to the scheduler. Jie Yu wrote: Regarding the second problem, IMO, we should include a reason field in Termination (https://issues.apache.org/jira/browse/MESOS-2035) and let sendExecutorTerminatedStatusUpdate to propagate the termination reason to the scheduler. Timothy Chen wrote: Reason field sounds good, I think what you proposed makes sense, in docker containerizer at least we also need to make sure termination message is set correctly as currently it doesn't contain all the error information that we pass back to the launch future. Jay Buffington wrote: Sorry for the confusion. There are three problems that are all related. Yes, we need to send statusUpdates as soon as containerizer launch fails and yes, we need to set the reason so the scheduler can distinguish a slave failure from a bad request. However, my intention with this patch is not to address either of those two problems. My goal with this patch is to simply send the containerizer launch failure message back to the scheduler. I am using Aurora with the docker containerizer. There are a myriad of reasons that dockerd could fail to docker run a container. Currently, when that fails the user sees a useless and incorrect Abnormal Executor Termination message in the Aurora web ui. With this patch they see the stderr output of docker run. This way they can report meaningful error messages to the operations team. I can update this patch to address the other two issues, but the key is that when the containerizer launch fails we have to send a statusUpdate with a message that contains future.failure(). Before this patch we were only logging it. The scheduler needs to get that error message. Jie Yu wrote: Thanks for clarifying it, Jay! In fact, my proposal above should be able to solve the third problem cleanly. Check out `Slave::sendExecutorTerminatedStatusUpdate`. The containerizer should properly set the message and reason fields in the Termination protobuf (basically why the container gets terminated and what's the error message). The slave will forward the reason and message to the scheduler through status update. I chatted with BenM about this yesterday, and there are a couple of notes I want to write down here. 1. We probably need multiple levels for TaskStatus::Reason. In other words, we probably need a repeated Reason reasons field in status update message. For instance, for a containerizer launch failure, we probably need two reasons: 1) the top level reason REASON_EXECUTOR_TERMINATED; 2) the second level reason REASON_DOCKER_PULL_FAILURE; 2. We probably want to allow extension to TaskStatus::Reason enum. For example, some framework/executor may want to add customized reasons. We could leverage the protobuf extension support to achieve that (https://developers.google.com/protocol-buffers/docs/proto#extensions). 3. The semantics around Termination is broken currently and we need to clean it up. For instance, the following code is broken, ``` void Slave::sendExecutorTerminatedStatusUpdate(...) { ... if (termination.isReady() termination.get().killed()) { taskState = TASK_FAILED; // TODO(dhamon): MESOS-2035: Add 'reason' to containerizer::Termination. reason = TaskStatus::REASON_MEMORY_LIMIT; }
Review Request 33865: Added RevocableInfo message to Resource protobuf.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33865/ --- Review request for mesos, Jie Yu, Joris Van Remoortere, and Niklas Nielsen. Bugs: MESOS-2691 https://issues.apache.org/jira/browse/MESOS-2691 Repository: mesos Description --- RevocableInfo currently supports OVERSUBSCRIBED resources. In the future it can be easily extended to use other types of revocable reosurces (e.g., resources allocated to other roles). Disabled the ability to use revocable resources for reservation or persistence because the semantics seem weird. We can enable it in the future if there is a use case for that. Diffs - include/mesos/mesos.proto db4fc8c001dd68bc3b9ca83650170c4f26db18c7 src/common/resources.cpp 235930ff2dbb3ea49a3a0696dc070f2bd56fba4b src/tests/resources_tests.cpp a7ec59ea217ad71f7d1e93ca6039d5b2491b3237 Diff: https://reviews.apache.org/r/33865/diff/ Testing --- make check Thanks, Vinod Kone
Re: Review Request 33733: Fixed style error with template definitions withing common/parse.hpp
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33733/#review82585 --- Ship it! Could you fill us in on the reasoning and/or update the styleguide for this? Thanks a bunch :) - Till Toenshoff On April 30, 2015, 10:45 p.m., Alexander Rojas wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33733/ --- (Updated April 30, 2015, 10:45 p.m.) Review request for mesos, Benjamin Hindman, Joerg Schad, Michael Park, and Till Toenshoff. Repository: mesos Description --- Replaces every instance of `template` to `template ` in the file `src/common/parse.hpp` Diffs - src/common/parse.hpp 547b32041f39f0ff0c38179b66a32b2239134abc Diff: https://reviews.apache.org/r/33733/diff/ Testing --- make check Thanks, Alexander Rojas
Re: Review Request 33849: mesos: use standard macros for compiler and vendor detection
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33849/#review82586 --- Patch looks great! Reviews applied: [33849] All tests passed. - Mesos ReviewBot On May 5, 2015, 5:52 p.m., James Peach wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33849/ --- (Updated May 5, 2015, 5:52 p.m.) Review request for mesos, Benjamin Hindman, Cody Maloney, and Timothy St. Clair. Bugs: MESOS-2666 https://issues.apache.org/jira/browse/MESOS-2666 Repository: mesos Description --- Import the AX_COMPILER_VERSION and AX_COMPILER_VENDOR macros from the autoconf archive and use them to detect and configure the supported copmilers. Diffs - configure.ac 14658f6f6920549f1948b8bd8890c67bc067e3dd m4/ax_compiler_vendor.m4 PRE-CREATION m4/ax_compiler_version.m4 PRE-CREATION Diff: https://reviews.apache.org/r/33849/diff/ Testing --- Make check on CentOS 7 and Mac OS X 10.10.3. Verified that the default CentOS 6 GCC toolchain is still rejected as too old. Thanks, James Peach
Re: Review Request 33643: Add EMPTY to stout hashset
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33643/ --- (Updated May 5, 2015, 9:50 p.m.) Review request for mesos and Benjamin Hindman. Repository: mesos Description --- See summary. Diffs - 3rdparty/libprocess/3rdparty/stout/include/stout/hashset.hpp d2b74393b7c4b65477698d9c810dfe3c8673c2ab Diff: https://reviews.apache.org/r/33643/diff/ Testing --- make check Thanks, Joris Van Remoortere
Re: Review Request 33505: Add state-summary endpoint to master.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33505/ --- (Updated May 5, 2015, 9:51 p.m.) Review request for mesos and Benjamin Hindman. Repository: mesos Description --- This exposes framework and slave statistics that are of high importance and aggregated per framework or slave respectively. The statistics will not be exact due to the circular buffer that stores completed tasks, but this was also an issue with state.json. Diffs (updated) - src/master/http.cpp fb448256d7ced1f47ea48ccfca2ae267bc26ef94 src/master/master.hpp 49ee050ca4d2b2c5f75ce864fcf6ae703dfdeadd src/master/master.cpp bee842557c8397428ca51e46faa182a391584be3 src/tests/master_tests.cpp bdfccb2427cba938dbbaa8e832255153172b0501 Diff: https://reviews.apache.org/r/33505/diff/ Testing --- make check Thanks, Joris Van Remoortere