[jira] [Commented] (MESOS-2297) Add authentication support for HTTP API
[ https://issues.apache.org/jira/browse/MESOS-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612951#comment-14612951 ] Till Toenshoff commented on MESOS-2297: --- +1 Add authentication support for HTTP API --- Key: MESOS-2297 URL: https://issues.apache.org/jira/browse/MESOS-2297 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Isabel Jimenez Labels: mesosphere To start with, we will only support basic http auth. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1457) Process IDs should be required to be human-readable
[ https://issues.apache.org/jira/browse/MESOS-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617296#comment-14617296 ] Till Toenshoff commented on MESOS-1457: --- I was pointed to this older issue as the patches did not get committed. It seems Palak's solution is acceptable. It would be great if we could indeed get a comment into the ProcessBase constructor stating something like the proposed {noformat} // Please provide a process ID prefix to ease debugging (See MESOS-1457). {noformat} [~PalakPC] could you possibly propose the above in a review-request and rebase those other two patches so we can get them committed? Process IDs should be required to be human-readable Key: MESOS-1457 URL: https://issues.apache.org/jira/browse/MESOS-1457 Project: Mesos Issue Type: Improvement Components: libprocess Reporter: Dominic Hamon Assignee: Palak Choudhary Priority: Minor When debugging, it's very useful to understand which processes are getting timeslices. As such, the human-readable names that can be passed to {{ProcessBase}} are incredibly valuable, however they are currently optional. If the constructor of {{ProcessBase}} took a mandatory string, every process would get a human-readable name and debugging would be much easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-3170) 0.23 Build fails when compiling against -lsasl2 which has been statically linked
[ https://issues.apache.org/jira/browse/MESOS-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff reassigned MESOS-3170: - Assignee: Till Toenshoff 0.23 Build fails when compiling against -lsasl2 which has been statically linked Key: MESOS-3170 URL: https://issues.apache.org/jira/browse/MESOS-3170 Project: Mesos Issue Type: Bug Components: build Affects Versions: 0.23.0 Reporter: Chris Heller Assignee: Till Toenshoff Priority: Minor Labels: easyfix Fix For: 0.24.0 If the sasl library has been statically linked the check from CRAM-MD5 can fail, due to missing symbols. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1010) Python extension build is broken if gflags-dev is installed
[ https://issues.apache.org/jira/browse/MESOS-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693621#comment-14693621 ] Till Toenshoff commented on MESOS-1010: --- I have now reopened that review request - will also discuss it with some other committers today - stay tuned for more :) Python extension build is broken if gflags-dev is installed --- Key: MESOS-1010 URL: https://issues.apache.org/jira/browse/MESOS-1010 Project: Mesos Issue Type: Bug Components: build, python api Environment: Fedora 20, amd64, GCC: 4.8.2; OSX Yosemite, Apple LLVM 6.1.0 (~LLVM 3.6.0) Reporter: Nikita Vetoshkin Assignee: Greg Mann Labels: flaky-test, mesosphere In my environment mesos build from master results in broken python api module {{_mesos.so}}: {noformat} nekto0n@ya-darkstar ~/workspace/mesos/src/python $ PYTHONPATH=build/lib.linux-x86_64-2.7/ python -c 'import _mesos' Traceback (most recent call last): File "<string>", line 1, in <module> ImportError: /home/nekto0n/workspace/mesos/src/python/build/lib.linux-x86_64-2.7/_mesos.so: undefined symbol: _ZN6google14FlagRegistererC1EPKcS2_S2_S2_PvS3_ {noformat} Unmangled version of symbol looks like this: {noformat} google::FlagRegisterer::FlagRegisterer(char const*, char const*, char const*, char const*, void*, void*) {noformat} During {{./configure}} step {{glog}} finds {{gflags}} development files and starts using them, thus *implicitly* adding dependency on {{libgflags.so}}. This breaks the Python extension module and perhaps can break other mesos subsystems when moved to hosts without {{gflags}} installed. This task is done when the ExamplesTest.PythonFramework test passes on a system with gflags installed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2946) Authorizer Module: Interface design
[ https://issues.apache.org/jira/browse/MESOS-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-2946: -- Description: h4.Motivation Design an interface covering authorizer modules while staying minimal invasive in regards to changes on the existing {{LocalAuthorizer}} implementation. was: Motivation Design an interface covering authorizer modules while staying minimal invasive in regards to changes on the existing {{LocalAuthorizer}} implementation. Authorizer Module: Interface design --- Key: MESOS-2946 URL: https://issues.apache.org/jira/browse/MESOS-2946 Project: Mesos Issue Type: Improvement Reporter: Till Toenshoff h4.Motivation Design an interface covering authorizer modules while staying minimal invasive in regards to changes on the existing {{LocalAuthorizer}} implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2947) Authorizer Module: Implementation, Integration Tests
Till Toenshoff created MESOS-2947: - Summary: Authorizer Module: Implementation, Integration Tests Key: MESOS-2947 URL: https://issues.apache.org/jira/browse/MESOS-2947 Project: Mesos Issue Type: Improvement Reporter: Till Toenshoff h4.Motivation Provide an example authorizer module based on the {{LocalAuthorizer}} implementation. Make sure that such an authorizer module can be fully unit- and integration-tested within the mesos test suite. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2945) Create an Authorizer Module
[ https://issues.apache.org/jira/browse/MESOS-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-2945: -- Epic Name: Authorizer Module Create an Authorizer Module --- Key: MESOS-2945 URL: https://issues.apache.org/jira/browse/MESOS-2945 Project: Mesos Issue Type: Epic Reporter: Till Toenshoff h4. Motivation Allow third parties to quickly develop and plug in new authorizing methods. The modularized Authorizer API will lower the barrier for the community to provide new methods to Mesos. An example of such an additional, next-step module could be LDAP / AD-backed authorization. Alternative authorizing methods may bring in new dependencies that we don't want to force on all of our users. Mesos users may be required to use custom authorizing techniques due to strict security policies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2947) Authorizer Module: Implementation, Integration Tests
[ https://issues.apache.org/jira/browse/MESOS-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-2947: -- Labels: mesosphere module security (was: ) Authorizer Module: Implementation, Integration Tests -- Key: MESOS-2947 URL: https://issues.apache.org/jira/browse/MESOS-2947 Project: Mesos Issue Type: Improvement Reporter: Till Toenshoff Assignee: Till Toenshoff Labels: mesosphere, module, security h4.Motivation Provide an example authorizer module based on the {{LocalAuthorizer}} implementation. Make sure that such an authorizer module can be fully unit- and integration-tested within the mesos test suite. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2946) Authorizer Module: Interface design
Till Toenshoff created MESOS-2946: - Summary: Authorizer Module: Interface design Key: MESOS-2946 URL: https://issues.apache.org/jira/browse/MESOS-2946 Project: Mesos Issue Type: Improvement Reporter: Till Toenshoff Motivation Design an interface covering authorizer modules while staying minimal invasive in regards to changes on the existing {{LocalAuthorizer}} implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2946) Authorizer Module: Interface design
[ https://issues.apache.org/jira/browse/MESOS-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-2946: -- Description: h4.Motivation Design an interface covering authorizer modules while staying minimally invasive in regards to changes to the existing {{LocalAuthorizer}} implementation. was: h4.Motivation Design an interface covering authorizer modules while staying minimal invasive in regards to changes on the existing {{LocalAuthorizer}} implementation. Authorizer Module: Interface design --- Key: MESOS-2946 URL: https://issues.apache.org/jira/browse/MESOS-2946 Project: Mesos Issue Type: Improvement Reporter: Till Toenshoff h4.Motivation Design an interface covering authorizer modules while staying minimally invasive in regards to changes to the existing {{LocalAuthorizer}} implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2946) Authorizer Module: Interface design
[ https://issues.apache.org/jira/browse/MESOS-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-2946: -- Story Points: 2 Authorizer Module: Interface design --- Key: MESOS-2946 URL: https://issues.apache.org/jira/browse/MESOS-2946 Project: Mesos Issue Type: Improvement Reporter: Till Toenshoff Assignee: Till Toenshoff h4.Motivation Design an interface covering authorizer modules while staying minimally invasive in regards to changes to the existing {{LocalAuthorizer}} implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2945) Create an Authorizer Module
Till Toenshoff created MESOS-2945: - Summary: Create an Authorizer Module Key: MESOS-2945 URL: https://issues.apache.org/jira/browse/MESOS-2945 Project: Mesos Issue Type: Improvement Reporter: Till Toenshoff h4. Motivation Allow third parties to quickly develop and plug in new authorizing methods. The modularized Authorizer API will lower the barrier for the community to provide new methods to Mesos. An example of such an additional, next-step module could be LDAP / AD-backed authorization. Alternative authorizing methods may bring in new dependencies that we don't want to force on all of our users. Mesos users may be required to use custom authorizing techniques due to strict security policies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (MESOS-3173) Mark Path::basename, Path::dirname as const functions.
[ https://issues.apache.org/jira/browse/MESOS-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3173: -- Comment: was deleted (was: https://reviews.apache.org/r/36773/) Mark Path::basename, Path::dirname as const functions. -- Key: MESOS-3173 URL: https://issues.apache.org/jira/browse/MESOS-3173 Project: Mesos Issue Type: Improvement Components: stout Reporter: Jan Schlicht Assignee: Jan Schlicht Priority: Trivial Labels: easyfix, mesosphere The functions Path::basename and Path::dirname in stout/path.hpp are not marked const, although they could be. Marking them const would remove some ambiguities in the usage of these functions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1457) Process IDs should be required to be human-readable
[ https://issues.apache.org/jira/browse/MESOS-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658324#comment-14658324 ] Till Toenshoff commented on MESOS-1457: --- Shepherd will get assigned shortly. Process IDs should be required to be human-readable Key: MESOS-1457 URL: https://issues.apache.org/jira/browse/MESOS-1457 Project: Mesos Issue Type: Improvement Components: libprocess Reporter: Dominic Hamon Assignee: Palak Choudhary Priority: Minor When debugging, it's very useful to understand which processes are getting timeslices. As such, the human-readable names that can be passed to {{ProcessBase}} are incredibly valuable, however they are currently optional. If the constructor of {{ProcessBase}} took a mandatory string, every process would get a human-readable name and debugging would be much easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-830) ExamplesTest.JavaFramework is flaky
[ https://issues.apache.org/jira/browse/MESOS-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658434#comment-14658434 ] Till Toenshoff commented on MESOS-830: -- [~greggomann] I added some debug code into that macro which told me that pthread_rwlock_wrlock returned 22 (Invalid Argument) and from that I assumed that the mutex in question had gotten killed already. ExamplesTest.JavaFramework is flaky --- Key: MESOS-830 URL: https://issues.apache.org/jira/browse/MESOS-830 Project: Mesos Issue Type: Bug Components: test Reporter: Vinod Kone Assignee: Greg Mann Labels: flaky, mesosphere Identify the cause of the following test failure: [ RUN ] ExamplesTest.JavaFramework Using temporary directory '/tmp/ExamplesTest_JavaFramework_wSc7u8' Enabling authentication for the framework I1120 15:13:39.820032 1681264640 master.cpp:285] Master started on 172.25.133.171:52576 I1120 15:13:39.820180 1681264640 master.cpp:299] Master ID: 201311201513-2877626796-52576-3234 I1120 15:13:39.820194 1681264640 master.cpp:302] Master only allowing authenticated frameworks to register! I1120 15:13:39.821197 1679654912 slave.cpp:112] Slave started on 1)@172.25.133.171:52576 I1120 15:13:39.821795 1679654912 slave.cpp:212] Slave resources: cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] I1120 15:13:39.822855 1682337792 slave.cpp:112] Slave started on 2)@172.25.133.171:52576 I1120 15:13:39.823652 1682337792 slave.cpp:212] Slave resources: cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] I1120 15:13:39.825330 1679118336 master.cpp:744] The newly elected leader is master@172.25.133.171:52576 I1120 15:13:39.825445 1679118336 master.cpp:748] Elected as the leading master! 
I1120 15:13:39.825907 1681264640 state.cpp:33] Recovering state from '/tmp/ExamplesTest_JavaFramework_wSc7u8/0/meta' I1120 15:13:39.826127 1681264640 status_update_manager.cpp:180] Recovering status update manager I1120 15:13:39.826331 1681801216 process_isolator.cpp:317] Recovering isolator I1120 15:13:39.826738 1682874368 slave.cpp:2743] Finished recovery I1120 15:13:39.827747 1682337792 state.cpp:33] Recovering state from '/tmp/ExamplesTest_JavaFramework_wSc7u8/1/meta' I1120 15:13:39.827945 1680191488 slave.cpp:112] Slave started on 3)@172.25.133.171:52576 I1120 15:13:39.828415 1682337792 status_update_manager.cpp:180] Recovering status update manager I1120 15:13:39.828608 1680728064 sched.cpp:260] Authenticating with master master@172.25.133.171:52576 I1120 15:13:39.828606 1680191488 slave.cpp:212] Slave resources: cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] I1120 15:13:39.828680 1682874368 slave.cpp:497] New master detected at master@172.25.133.171:52576 I1120 15:13:39.828765 1682337792 process_isolator.cpp:317] Recovering isolator I1120 15:13:39.829828 1680728064 sched.cpp:229] Detecting new master I1120 15:13:39.830288 1679654912 authenticatee.hpp:100] Initializing client SASL I1120 15:13:39.831635 1680191488 state.cpp:33] Recovering state from '/tmp/ExamplesTest_JavaFramework_wSc7u8/2/meta' I1120 15:13:39.831991 1679118336 status_update_manager.cpp:158] New master detected at master@172.25.133.171:52576 I1120 15:13:39.832042 1682874368 slave.cpp:524] Detecting new master I1120 15:13:39.832314 1682337792 slave.cpp:2743] Finished recovery I1120 15:13:39.832309 1681264640 master.cpp:1266] Attempting to register slave on vkone.local at slave(1)@172.25.133.171:52576 I1120 15:13:39.832929 1680728064 status_update_manager.cpp:180] Recovering status update manager I1120 15:13:39.833371 1681801216 slave.cpp:497] New master detected at master@172.25.133.171:52576 I1120 15:13:39.833273 1681264640 master.cpp:2513] Adding slave 
201311201513-2877626796-52576-3234-0 at vkone.local with cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] I1120 15:13:39.833595 1680728064 process_isolator.cpp:317] Recovering isolator I1120 15:13:39.833859 1681801216 slave.cpp:524] Detecting new master I1120 15:13:39.833861 1682874368 status_update_manager.cpp:158] New master detected at master@172.25.133.171:52576 I1120 15:13:39.834092 1680191488 slave.cpp:542] Registered with master master@172.25.133.171:52576; given slave ID 201311201513-2877626796-52576-3234-0 I1120 15:13:39.834486 1681264640 master.cpp:1266] Attempting to register slave on vkone.local at slave(2)@172.25.133.171:52576 I1120 15:13:39.834549 1681264640 master.cpp:2513] Adding slave 201311201513-2877626796-52576-3234-1 at vkone.local with cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] I1120 15:13:39.834750 1680191488 slave.cpp:555] Checkpointing SlaveInfo to
[jira] [Updated] (MESOS-3170) 0.23 Build fails when compiling against -lsasl2 which has been statically linked
[ https://issues.apache.org/jira/browse/MESOS-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3170: -- Shepherd: Till Toenshoff 0.23 Build fails when compiling against -lsasl2 which has been statically linked Key: MESOS-3170 URL: https://issues.apache.org/jira/browse/MESOS-3170 Project: Mesos Issue Type: Bug Components: build Affects Versions: 0.23.0 Reporter: Chris Heller Assignee: Chris Heller Priority: Minor Labels: easyfix Fix For: 0.24.0 If the sasl library has been statically linked the check from CRAM-MD5 can fail, due to missing symbols. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3260) SchedulerTest.* are broken on OSX and CentOS
Till Toenshoff created MESOS-3260: - Summary: SchedulerTest.* are broken on OSX and CentOS Key: MESOS-3260 URL: https://issues.apache.org/jira/browse/MESOS-3260 Project: Mesos Issue Type: Bug Environment: OSX 10.10.5 (14F6a), Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn) Reporter: Till Toenshoff Priority: Blocker Running a plain configure and make check on OSX currently leads to the following: {noformat} [ RUN ] SchedulerTest.Subscribe ../../src/tests/scheduler_tests.cpp:168: Failure Value of: event.get().type() Actual: HEARTBEAT Expected: Event::SUBSCRIBED Which is: SUBSCRIBED ../../src/tests/scheduler_tests.cpp:169: Failure Value of: event.get().subscribed().framework_id() Actual: Expected: id Which is: 20150813-222454-347252928-56290-60707- [ FAILED ] SchedulerTest.Subscribe (183 ms) [ RUN ] SchedulerTest.TaskRunning ../../src/tests/scheduler_tests.cpp:227: Failure Value of: event.get().type() Actual: HEARTBEAT Expected: Event::OFFERS Which is: OFFERS ../../src/tests/scheduler_tests.cpp:228: Failure Expected: (0) != (event.get().offers().offers().size()), actual: 0 vs 0 [libprotobuf FATAL ../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/google/protobuf/repeated_field.h:824] CHECK failed: (index) < (size()): ../../src/tests/scheduler_tests.cpp:237: Failure Actual function call count doesn't match EXPECT_CALL(containerizer, update(_, _))... Expected: to be called at least once Actual: never called - unsatisfied and active ../../src/tests/scheduler_tests.cpp:233: Failure Actual function call count doesn't match EXPECT_CALL(exec, launchTask(_, _))... Expected: to be called once Actual: never called - unsatisfied and active ../../src/tests/scheduler_tests.cpp:230: Failure Actual function call count doesn't match EXPECT_CALL(exec, registered(_, _, _, _))... Expected: to be called once Actual: never called - unsatisfied and active unknown file: Failure C++ exception with description CHECK failed: (index) < (size()): thrown in the test body. 
*** Aborted at 1439497494 (unix time) try "date -d @1439497494" if you are using GNU date *** PC: @ 0x7fb2c0f20490 (unknown) *** SIGBUS (@0x7fb2c0f20490) received by PID 60707 (TID 0x7fff7a876300) stack trace: *** @ 0x7fff8a77ef1a _sigtramp @ 0x7fff532c9990 (unknown) @ 0x10d3bcedb mesos::internal::tests::MesosTest::ShutdownSlaves() @ 0x10d3bce75 mesos::internal::tests::MesosTest::Shutdown() @ 0x10d3b7d47 mesos::internal::tests::MesosTest::TearDown() @ 0x10dbc8283 testing::internal::HandleSehExceptionsInMethodIfSupported() @ 0x10dbafab7 testing::internal::HandleExceptionsInMethodIfSupported() @ 0x10db6f8ba testing::Test::Run() @ 0x10db70deb testing::TestInfo::Run() @ 0x10db71ab7 testing::TestCase::Run() @ 0x10db804b3 testing::internal::UnitTestImpl::RunAllTests() @ 0x10dbc4fe3 testing::internal::HandleSehExceptionsInMethodIfSupported() @ 0x10dbb1ea7 testing::internal::HandleExceptionsInMethodIfSupported() @ 0x10db800b0 testing::UnitTest::Run() @ 0x10d10c8d1 RUN_ALL_TESTS() @ 0x10d108b87 main @ 0x7fff8da765c9 start Bus error: 10 {noformat} Results on CentOS look similar. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3149) Use setuptools to install python cli package
[ https://issues.apache.org/jira/browse/MESOS-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3149: -- Shepherd: Till Toenshoff Use setuptools to install python cli package Key: MESOS-3149 URL: https://issues.apache.org/jira/browse/MESOS-3149 Project: Mesos Issue Type: Task Reporter: haosdent Assignee: haosdent mesos-ps/mesos-cat, which depend on src/cli/python/mesos, do not work on OSX because src/cli/python is not installed to sys.path. It's time to finish this TODO. {code} # Add 'src/cli/python' to PYTHONPATH. # TODO(benh): Remove this if/when we install the 'mesos' module via # PIP and setuptools. PYTHONPATH=@abs_top_srcdir@/src/cli/python:${PYTHONPATH} {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2697) Add a /teardown endpoint on master to teardown a framework
[ https://issues.apache.org/jira/browse/MESOS-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14642691#comment-14642691 ] Till Toenshoff commented on MESOS-2697: --- commit 90f3fec71535bdf9c0cd5fc90c62e19a86b92470 Author: Joerg Schad jo...@mesosphere.io Date: Mon Jul 27 14:17:12 2015 +0200 Updated Authorization documentation to use /teardown endpoint. With Mesos 0.23 the /shutdown endpoint has been deprecated in favor of the /teardown endpoint. See MESOS-2697 for details. Review: https://reviews.apache.org/r/36774 Add a /teardown endpoint on master to teardown a framework -- Key: MESOS-2697 URL: https://issues.apache.org/jira/browse/MESOS-2697 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Vinod Kone Fix For: 0.23.0 We plan to rename /shutdown endpoint to /teardown to be compatible with the new API. /shutdown will be deprecated in 0.24.0 or later. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3785) Use URI content modification time to trigger fetcher cache updates.
[ https://issues.apache.org/jira/browse/MESOS-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3785: -- Target Version/s: 0.27.0 (was: 0.26.0) > Use URI content modification time to trigger fetcher cache updates. > --- > > Key: MESOS-3785 > URL: https://issues.apache.org/jira/browse/MESOS-3785 > Project: Mesos > Issue Type: Improvement > Components: fetcher >Reporter: Bernd Mathiske >Assignee: Benjamin Bannier > Labels: mesosphere > > Instead of using checksums to trigger fetcher cache updates, we can for > starters use the content modification time (mtime), which is available for a > number of download protocols, e.g. HTTP and HDFS. > Proposal: Instead of just fetching the content size, we fetch both size and > mtime together. As before, if there is no size, then caching fails and we > fall back on direct downloading to the sandbox. > Assuming a size is given, we compare the mtime from the fetch URI with the > mtime known to the cache. If it differs, we update the cache. (As a defensive > measure, a difference in size should also trigger an update.) > Not having an mtime available at the fetch URI is simply treated as a unique > valid mtime value that differs from all others. This means that when > initially there is no mtime, cache content remains valid until there is one. > Thereafter, a new lack of an mtime invalidates the cache once. In other > words: any change from no mtime to having one or back is the same as > encountering a new mtime. > Note that this scheme does not require any new protobuf fields. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3851) Investigate recent crashes in Command Executor
[ https://issues.apache.org/jira/browse/MESOS-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998699#comment-14998699 ] Till Toenshoff commented on MESOS-3851: --- I will be committing the workaround patch Tim has provided https://reviews.apache.org/r/40107/ (thanks a bunch [~tnachen]!) shortly after running a final check on it. > Investigate recent crashes in Command Executor > -- > > Key: MESOS-3851 > URL: https://issues.apache.org/jira/browse/MESOS-3851 > Project: Mesos > Issue Type: Bug > Components: containerization >Reporter: Anand Mazumdar >Priority: Blocker > Labels: mesosphere > > Post https://reviews.apache.org/r/38900 i.e. updating CommandExecutor to > support rootfs. There seem to be some tests showing frequent crashes due to > assert violations. > {{FetcherCacheTest.SimpleEviction}} failed due to the following log: > {code} > I1107 19:36:46.360908 30657 slave.cpp:1793] Sending queued task '3' to > executor ''3' of framework 7d94c7fb-8950-4bcf-80c1-46112292dcd6- at > executor(1)@172.17.5.200:33871' > I1107 19:36:46.363682 1236 exec.cpp:297] > I1107 19:36:46.373569 1245 exec.cpp:210] Executor registered on slave > 7d94c7fb-8950-4bcf-80c1-46112292dcd6-S0 > @ 0x7f9f5a7db3fa google::LogMessage::Fail() > I1107 19:36:46.394081 1245 exec.cpp:222] Executor::registered took 395411ns > @ 0x7f9f5a7db359 google::LogMessage::SendToLog() > @ 0x7f9f5a7dad6a google::LogMessage::Flush() > @ 0x7f9f5a7dda9e google::LogMessageFatal::~LogMessageFatal() > @ 0x48d00a _CheckFatal::~_CheckFatal() > @ 0x49c99d > mesos::internal::CommandExecutorProcess::launchTask() > @ 0x4b3dd7 > _ZZN7process8dispatchIN5mesos8internal22CommandExecutorProcessEPNS1_14ExecutorDriverERKNS1_8TaskInfoES5_S6_EEvRKNS_3PIDIT_EEMSA_FvT0_T1_ET2_T3_ENKUlPNS_11ProcessBaseEE_clESL_ > @ 0x4c470c > 
_ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal22CommandExecutorProcessEPNS5_14ExecutorDriverERKNS5_8TaskInfoES9_SA_EEvRKNS0_3PIDIT_EEMSE_FvT0_T1_ET2_T3_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ > @ 0x7f9f5a761b1b std::function<>::operator()() > @ 0x7f9f5a749935 process::ProcessBase::visit() > @ 0x7f9f5a74d700 process::DispatchEvent::visit() > @ 0x48e004 process::ProcessBase::serve() > @ 0x7f9f5a745d21 process::ProcessManager::resume() > @ 0x7f9f5a742f52 > _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_ > @ 0x7f9f5a74cf2c > _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE > @ 0x7f9f5a74cedc > _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_ > @ 0x7f9f5a74ce6e > _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE > @ 0x7f9f5a74cdc5 > _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv > @ 0x7f9f5a74cd5e > _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv > @ 0x7f9f5624f1e0 (unknown) > @ 0x7f9f564a8df5 start_thread > @ 0x7f9f559b71ad __clone > I1107 19:36:46.551370 30656 containerizer.cpp:1257] Executor for container > '6553a617-6b4a-418d-9759-5681f45ff854' has exited > I1107 19:36:46.551429 30656 containerizer.cpp:1074] Destroying container > '6553a617-6b4a-418d-9759-5681f45ff854' > I1107 19:36:46.553869 30656 containerizer.cpp:1257] Executor for container > 'd2c1f924-c92a-453e-82b1-c294d09c4873' has exited > {code} > The reason seems to be a race between the executor receiving a > {{RunTaskMessage}} before 
{{ExecutorRegisteredMessage}} leading to the > {{CHECK_SOME(executorInfo)}} failure. > Link to complete log: > https://issues.apache.org/jira/browse/MESOS-2831?focusedCommentId=14995535&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14995535 > Another related failure from {{ExamplesTest.PersistentVolumeFramework}} > {code} > @ 0x7f4f71529cbd google::LogMessage::SendToLog() > I1107 13:15:09.949987 31573 slave.cpp:2337] Status update manager > successfully handled status update acknowledgement (UUID: > 721c7316-5580-4636-a83a-098e3bd4ed1f) for task > ad90531f-d3d8-43f6-96f2-c81c4548a12d of framework > ac4ea54a-7d19-4e41-9ee3-1a761f8e5b0f- > @ 0x7f4f715296ce google::LogMessage::Flush() > @
[jira] [Updated] (MESOS-3581) License headers show up all over doxygen documentation.
[ https://issues.apache.org/jira/browse/MESOS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3581: -- Target Version/s: (was: 0.26.0) > License headers show up all over doxygen documentation. > --- > > Key: MESOS-3581 > URL: https://issues.apache.org/jira/browse/MESOS-3581 > Project: Mesos > Issue Type: Documentation > Components: documentation >Affects Versions: 0.24.1 >Reporter: Benjamin Bannier >Assignee: Benjamin Bannier >Priority: Minor > Labels: mesosphere > > Currently license headers are commented in something resembling Javadoc style, > {code} > /** > * Licensed ... > {code} > Since we use Javadoc-style comment blocks for doxygen documentation all > license headers appear in the generated documentation, potentially and likely > hiding the actual documentation. > Using {{/*}} to start the comment blocks would be enough to hide them from > doxygen, but would likely also result in a largish (though mostly > uninteresting) patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3802) Clear the suppressed flag when deactivating a framework
[ https://issues.apache.org/jira/browse/MESOS-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3802: -- Target Version/s: (was: 0.26.0) > Clear the suppressed flag when deactivating a framework > --- > > Key: MESOS-3802 > URL: https://issues.apache.org/jira/browse/MESOS-3802 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Guangya Liu >Assignee: Guangya Liu > > When a framework is deactivated, the suppressed flag is not cleared; this > prevents the framework from getting resources immediately after it is > reactivated. We should clear this flag when deactivating the framework. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3418) Factor out V1 API test helper functions
[ https://issues.apache.org/jira/browse/MESOS-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3418: -- Target Version/s: 0.27.0 (was: 0.26.0) > Factor out V1 API test helper functions > --- > > Key: MESOS-3418 > URL: https://issues.apache.org/jira/browse/MESOS-3418 > Project: Mesos > Issue Type: Improvement >Reporter: Joris Van Remoortere >Assignee: Guangya Liu > Labels: beginner, mesosphere, newbie, v1_api > > We currently have some helper functionality for V1 API tests. This is copied > in a few test files. > Factor this out into a common place once the API is stabilized. > {code} > // Helper class for using EXPECT_CALL since the Mesos scheduler API > // is callback based. > class Callbacks > { > public: > MOCK_METHOD0(connected, void(void)); > MOCK_METHOD0(disconnected, void(void)); > MOCK_METHOD1(received, void(const std::queue<Event>&)); > }; > {code} > {code} > // Enqueues all received events into a libprocess queue. > // TODO(jmlvanre): Factor this common code out of tests into V1 > // helper. > ACTION_P(Enqueue, queue) > { > std::queue<Event> events = arg0; > while (!events.empty()) { > // Note that we currently drop HEARTBEATs because most of these tests > // are not designed to deal with heartbeats. > // TODO(vinod): Implement DROP_HTTP_CALLS that can filter heartbeats. > if (events.front().type() == Event::HEARTBEAT) { > VLOG(1) << "Ignoring HEARTBEAT event"; > } else { > queue->put(events.front()); > } > events.pop(); > } > } > {code} > We can also update the helpers in {{/tests/mesos.hpp}} to support the V1 API. > This would let us get rid of lines like: > {code} > v1::TaskInfo taskInfo = evolve(createTask(devolve(offer), "", > DEFAULT_EXECUTOR_ID)); > {code} > In favor of: > {code} > v1::TaskInfo taskInfo = createTask(offer, "", DEFAULT_EXECUTOR_ID); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3851) Investigate recent crashes in Command Executor
[ https://issues.apache.org/jira/browse/MESOS-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14998731#comment-14998731 ] Till Toenshoff commented on MESOS-3851: --- The following commit fixes the crash - we may still want to find the reason for the race condition, and hence I will not close this ticket but will remove the target version (0.26.0) to unblock 0.26.0. {noformat} commit b6d4b28a4c9ca717ad8be5bbc27e40c005fc51ad Author: Timothy Chen Date: Tue Nov 10 15:46:17 2015 +0100 Removed unused checks in command executor. Review: https://reviews.apache.org/r/40107 {noformat} > Investigate recent crashes in Command Executor > -- > > Key: MESOS-3851 > URL: https://issues.apache.org/jira/browse/MESOS-3851 > Project: Mesos > Issue Type: Bug > Components: containerization >Reporter: Anand Mazumdar >Priority: Blocker > Labels: mesosphere > > Post https://reviews.apache.org/r/38900, i.e. updating the CommandExecutor to > support rootfs, there seem to be some tests showing frequent crashes due to > assert violations. 
> {{FetcherCacheTest.SimpleEviction}} failed due to the following log: > {code} > I1107 19:36:46.360908 30657 slave.cpp:1793] Sending queued task '3' to > executor ''3' of framework 7d94c7fb-8950-4bcf-80c1-46112292dcd6- at > executor(1)@172.17.5.200:33871' > I1107 19:36:46.363682 1236 exec.cpp:297] > I1107 19:36:46.373569 1245 exec.cpp:210] Executor registered on slave > 7d94c7fb-8950-4bcf-80c1-46112292dcd6-S0 > @ 0x7f9f5a7db3fa google::LogMessage::Fail() > I1107 19:36:46.394081 1245 exec.cpp:222] Executor::registered took 395411ns > @ 0x7f9f5a7db359 google::LogMessage::SendToLog() > @ 0x7f9f5a7dad6a google::LogMessage::Flush() > @ 0x7f9f5a7dda9e google::LogMessageFatal::~LogMessageFatal() > @ 0x48d00a _CheckFatal::~_CheckFatal() > @ 0x49c99d > mesos::internal::CommandExecutorProcess::launchTask() > @ 0x4b3dd7 > _ZZN7process8dispatchIN5mesos8internal22CommandExecutorProcessEPNS1_14ExecutorDriverERKNS1_8TaskInfoES5_S6_EEvRKNS_3PIDIT_EEMSA_FvT0_T1_ET2_T3_ENKUlPNS_11ProcessBaseEE_clESL_ > @ 0x4c470c > _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal22CommandExecutorProcessEPNS5_14ExecutorDriverERKNS5_8TaskInfoES9_SA_EEvRKNS0_3PIDIT_EEMSE_FvT0_T1_ET2_T3_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ > @ 0x7f9f5a761b1b std::function<>::operator()() > @ 0x7f9f5a749935 process::ProcessBase::visit() > @ 0x7f9f5a74d700 process::DispatchEvent::visit() > @ 0x48e004 process::ProcessBase::serve() > @ 0x7f9f5a745d21 process::ProcessManager::resume() > @ 0x7f9f5a742f52 > _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_ > @ 0x7f9f5a74cf2c > _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE > @ 0x7f9f5a74cedc > _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_ > @ 0x7f9f5a74ce6e > 
_ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE > @ 0x7f9f5a74cdc5 > _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv > @ 0x7f9f5a74cd5e > _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv > @ 0x7f9f5624f1e0 (unknown) > @ 0x7f9f564a8df5 start_thread > @ 0x7f9f559b71ad __clone > I1107 19:36:46.551370 30656 containerizer.cpp:1257] Executor for container > '6553a617-6b4a-418d-9759-5681f45ff854' has exited > I1107 19:36:46.551429 30656 containerizer.cpp:1074] Destroying container > '6553a617-6b4a-418d-9759-5681f45ff854' > I1107 19:36:46.553869 30656 containerizer.cpp:1257] Executor for container > 'd2c1f924-c92a-453e-82b1-c294d09c4873' has exited > {code} > The reason seems to be a race between the executor receiving a > {{RunTaskMessage}} before {{ExecutorRegisteredMessage}} leading to the > {{CHECK_SOME(executorInfo)}} failure. > Link to complete log: > https://issues.apache.org/jira/browse/MESOS-2831?focusedCommentId=14995535=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14995535 > Another related failure from {{ExamplesTest.PersistentVolumeFramework}} > {code} > @ 0x7f4f71529cbd google::LogMessage::SendToLog() > I1107 13:15:09.949987 31573 slave.cpp:2337] Status
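The race described above - a {{RunTaskMessage}} arriving before the {{ExecutorRegisteredMessage}} - is a classic ordering hazard; a defensive fix is to buffer early launches until registration completes instead of CHECK-failing. The sketch below illustrates that pattern only; it is not the actual {{CommandExecutorProcess}} code, and all names are invented for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative sketch: buffer task launches that race ahead of the
// registration message, and flush them once registration completes.
class BufferingExecutor
{
public:
  void registered() {
    registered_ = true;
    for (const std::string& task : pending_) {
      run(task);
    }
    pending_.clear();
  }

  void launchTask(const std::string& task) {
    if (!registered_) {
      // Instead of CHECK-failing on missing registration state
      // (the CHECK_SOME(executorInfo) above), defer the launch.
      pending_.push_back(task);
      return;
    }
    run(task);
  }

  const std::vector<std::string>& launched() const { return launched_; }

private:
  void run(const std::string& task) { launched_.push_back(task); }

  bool registered_ = false;
  std::vector<std::string> pending_;
  std::vector<std::string> launched_;
};
```

With this shape, the delivery order of the two messages no longer matters: a launch that arrives first is simply replayed after registration.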
[jira] [Updated] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
[ https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3937: -- Description: {noformat} ../configure make check sudo ./bin/mesos-tests.sh --gtest_filter="DockerContainerizerTest.ROOT_DOCKER_Launch_Executor" --verbose {noformat} {noformat} [==] Running 1 test from 1 test case. [--] Global test environment set-up. [--] 1 test from DockerContainerizerTest I1117 15:08:09.265943 26380 leveldb.cpp:176] Opened db in 3.199666ms I1117 15:08:09.267761 26380 leveldb.cpp:183] Compacted db in 1.684873ms I1117 15:08:09.267902 26380 leveldb.cpp:198] Created db iterator in 58313ns I1117 15:08:09.267966 26380 leveldb.cpp:204] Seeked to beginning of db in 4927ns I1117 15:08:09.267997 26380 leveldb.cpp:273] Iterated through 0 keys in the db in 1605ns I1117 15:08:09.268156 26380 replica.cpp:780] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I1117 15:08:09.270148 26396 recover.cpp:449] Starting replica recovery I1117 15:08:09.272105 26396 recover.cpp:475] Replica is in EMPTY status I1117 15:08:09.275640 26396 replica.cpp:676] Replica in EMPTY status received a broadcasted recover request from (4)@10.0.2.15:50088 I1117 15:08:09.276578 26399 recover.cpp:195] Received a recover response from a replica in EMPTY status I1117 15:08:09.277600 26397 recover.cpp:566] Updating replica status to STARTING I1117 15:08:09.279613 26396 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 1.016098ms I1117 15:08:09.279731 26396 replica.cpp:323] Persisted replica status to STARTING I1117 15:08:09.280306 26399 recover.cpp:475] Replica is in STARTING status I1117 15:08:09.282181 26400 replica.cpp:676] Replica in STARTING status received a broadcasted recover request from (5)@10.0.2.15:50088 I1117 15:08:09.282552 26400 master.cpp:367] Master 59c600f1-92ff-4926-9c84-073d9b81f68a (vagrant-ubuntu-trusty-64) started on 10.0.2.15:50088 I1117 15:08:09.283021 26400 master.cpp:369] 
Flags at startup: --acls="" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/40AlT8/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="25secs" --registry_strict="true" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/40AlT8/master" --zk_session_timeout="10secs" I1117 15:08:09.283920 26400 master.cpp:414] Master only allowing authenticated frameworks to register I1117 15:08:09.283972 26400 master.cpp:419] Master only allowing authenticated slaves to register I1117 15:08:09.284032 26400 credentials.hpp:37] Loading credentials for authentication from '/tmp/40AlT8/credentials' I1117 15:08:09.282944 26401 recover.cpp:195] Received a recover response from a replica in STARTING status I1117 15:08:09.284639 26401 recover.cpp:566] Updating replica status to VOTING I1117 15:08:09.285539 26400 master.cpp:458] Using default 'crammd5' authenticator I1117 15:08:09.285995 26401 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 1.075466ms I1117 15:08:09.286062 26401 replica.cpp:323] Persisted replica status to VOTING I1117 15:08:09.286200 26401 recover.cpp:580] Successfully joined the Paxos group I1117 15:08:09.286471 26401 recover.cpp:464] Recover process terminated I1117 15:08:09.287303 26400 authenticator.cpp:520] Initializing server SASL I1117 15:08:09.289371 26400 master.cpp:495] Authorization enabled I1117 15:08:09.296018 26399 master.cpp:1606] The newly elected leader is master@10.0.2.15:50088 with 
id 59c600f1-92ff-4926-9c84-073d9b81f68a I1117 15:08:09.296115 26399 master.cpp:1619] Elected as the leading master! I1117 15:08:09.296187 26399 master.cpp:1379] Recovering from registrar I1117 15:08:09.296717 26397 registrar.cpp:309] Recovering registrar I1117 15:08:09.298842 26396 log.cpp:661] Attempting to start the writer I1117 15:08:09.301563 26394 replica.cpp:496] Replica received implicit promise request from (6)@10.0.2.15:50088 with proposal 1 I1117 15:08:09.302561 26394 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 922719ns I1117 15:08:09.302635 26394 replica.cpp:345] Persisted promised to 1 I1117 15:08:09.303755 26394 coordinator.cpp:240] Coordinator attempting to fill missing positions I1117 15:08:09.306161 26394 replica.cpp:391] Replica received explicit promise request from (7)@10.0.2.15:50088 for position 0 with proposal 2 I1117 15:08:09.306972 26394
[jira] [Commented] (MESOS-3583) Introduce sessions in HTTP Scheduler API Subscribed Responses
[ https://issues.apache.org/jira/browse/MESOS-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992311#comment-14992311 ] Till Toenshoff commented on MESOS-3583: --- [~anandmazumdar] shall we push this towards 0.27.0? > Introduce sessions in HTTP Scheduler API Subscribed Responses > - > > Key: MESOS-3583 > URL: https://issues.apache.org/jira/browse/MESOS-3583 > Project: Mesos > Issue Type: Task >Reporter: Anand Mazumdar > Labels: mesosphere, tech-debt > > Currently, the HTTP Scheduler API has no concept of Sessions aka > {{SessionID}} or a {{TokenID}}. This is useful in some failure scenarios. As > of now, if a framework fails over and then subscribes again with the same > {{FrameworkID}} with the {{force}} option set. The Mesos master would > subscribe it. > If the previous instance of the framework/scheduler tries to send a Call , > e.g. {{Call::KILL}} with the same previous {{FrameworkID}} set, it would be > still accepted by the master leading to erroneously killing a task. > This is possible because we do not have a way currently of distinguishing > connections. It used to work in the previous driver implementation due to the > master also performing a {{UPID}} check to verify if they matched and only > then allowing the call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
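The session idea in MESOS-3583 can be reduced to a small invariant: each (re-)subscription mints a fresh token, and a call is honored only if it carries the token of the *current* subscription. The following is a hypothetical sketch of that invariant - the class and method names are invented and do not correspond to the Mesos master's API.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical session table: subscribing (or force-resubscribing)
// invalidates the token held by any previous scheduler instance.
class SessionTable
{
public:
  // Returns the token for the new (current) subscription.
  int subscribe(const std::string& frameworkId) {
    return sessions_[frameworkId] = ++counter_;
  }

  // A call (e.g. KILL) is accepted only from the current session,
  // even if the FrameworkID matches.
  bool accept(const std::string& frameworkId, int token) const {
    auto it = sessions_.find(frameworkId);
    return it != sessions_.end() && it->second == token;
  }

private:
  int counter_ = 0;
  std::unordered_map<std::string, int> sessions_;
};
```

This is exactly the distinction the old driver got for free from the {{UPID}} check: a stale instance of the scheduler can no longer erroneously kill tasks.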
[jira] [Updated] (MESOS-3460) Update Java Test Framework Support QuiesceOffer and reviveOffer
[ https://issues.apache.org/jira/browse/MESOS-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3460: -- Target Version/s: 0.27.0 (was: 0.26.0) > Update Java Test Framework Support QuiesceOffer and reviveOffer > --- > > Key: MESOS-3460 > URL: https://issues.apache.org/jira/browse/MESOS-3460 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Guangya Liu > Fix For: 0.26.0 > > > This is a follow-up for https://reviews.apache.org/r/38120/ ; we need to add > Java framework support for quiesceOffers -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3461) Update Python Test Framework Support QuiesceOffer and reviveOffer
[ https://issues.apache.org/jira/browse/MESOS-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3461: -- Target Version/s: 0.27.0 (was: 0.26.0) > Update Python Test Framework Support QuiesceOffer and reviveOffer > - > > Key: MESOS-3461 > URL: https://issues.apache.org/jira/browse/MESOS-3461 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Guangya Liu > Fix For: 0.26.0 > > > This is a follow-up for https://reviews.apache.org/r/38121/ ; we need to add > Python framework support for quiesceOffers -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3756) Generalized HTTP Authentication Modules
[ https://issues.apache.org/jira/browse/MESOS-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3756: -- Target Version/s: 0.27.0 (was: 0.26.0) > Generalized HTTP Authentication Modules > --- > > Key: MESOS-3756 > URL: https://issues.apache.org/jira/browse/MESOS-3756 > Project: Mesos > Issue Type: Task > Components: modules >Reporter: Bernd Mathiske >Assignee: Alexander Rojas > > Libprocess is going to factor out an authentication interface: MESOS-3231 > Here we propose that Mesos can provide implementations for this interface as > Mesos modules. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3461) Update Python Test Framework Support QuiesceOffer and reviveOffer
[ https://issues.apache.org/jira/browse/MESOS-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3461: -- Fix Version/s: (was: 0.26.0) > Update Python Test Framework Support QuiesceOffer and reviveOffer > - > > Key: MESOS-3461 > URL: https://issues.apache.org/jira/browse/MESOS-3461 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Guangya Liu > > This is a follow-up for https://reviews.apache.org/r/38121/ ; we need to add > Python framework support for quiesceOffers -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3460) Update Java Test Framework Support QuiesceOffer and reviveOffer
[ https://issues.apache.org/jira/browse/MESOS-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3460: -- Fix Version/s: (was: 0.26.0) > Update Java Test Framework Support QuiesceOffer and reviveOffer > --- > > Key: MESOS-3460 > URL: https://issues.apache.org/jira/browse/MESOS-3460 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Guangya Liu > > This is a follow-up for https://reviews.apache.org/r/38120/ ; we need to add > Java framework support for quiesceOffers -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2295) Implement the Call endpoint on Slave
[ https://issues.apache.org/jira/browse/MESOS-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992314#comment-14992314 ] Till Toenshoff commented on MESOS-2295: --- [~anandmazumdar] Seems this issue got resolved via the subtasks, correct? > Implement the Call endpoint on Slave > > > Key: MESOS-2295 > URL: https://issues.apache.org/jira/browse/MESOS-2295 > Project: Mesos > Issue Type: Task >Reporter: Vinod Kone >Assignee: Anand Mazumdar > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3810) Must be able to use NetworkInfo with mesos-executor
[ https://issues.apache.org/jira/browse/MESOS-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3810: -- Assignee: Spike Curtis > Must be able to use NetworkInfo with mesos-executor > --- > > Key: MESOS-3810 > URL: https://issues.apache.org/jira/browse/MESOS-3810 > Project: Mesos > Issue Type: Bug > Components: containerization, slave >Affects Versions: 0.25.0 >Reporter: Spike Curtis >Assignee: Spike Curtis >Priority: Blocker > > ContainerInfo with included NetworkInfo can appear in one of two places > during a task launch: in the ExecutorInfo.container, or if using the > mesos-executor (aka command executor), within the TaskInfo.container. > Mesos 0.25.0 correctly supports the former, but not the latter. In that > case, the MesosContainerizer fails the task launch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3558) Make the CommandExecutor use the Executor Library speaking HTTP
[ https://issues.apache.org/jira/browse/MESOS-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3558: -- Target Version/s: 0.27.0 (was: 0.26.0) > Make the CommandExecutor use the Executor Library speaking HTTP > --- > > Key: MESOS-3558 > URL: https://issues.apache.org/jira/browse/MESOS-3558 > Project: Mesos > Issue Type: Task >Reporter: Anand Mazumdar > Labels: mesosphere > > Instead of using the {{MesosExecutorDriver}} , we should make the > {{CommandExecutor}} in {{src/launcher/executor.cpp}} use the new Executor > HTTP Library that we create in {{MESOS-3550}}. > This would act as a good validation of the {{HTTP API}} implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3688) Get Container Name information when launching a container task
[ https://issues.apache.org/jira/browse/MESOS-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3688: -- Target Version/s: 0.27.0 (was: 0.26.0) > Get Container Name information when launching a container task > -- > > Key: MESOS-3688 > URL: https://issues.apache.org/jira/browse/MESOS-3688 > Project: Mesos > Issue Type: Improvement > Components: containerization >Affects Versions: 0.24.1 >Reporter: Raffaele Di Fazio >Assignee: Kapil Arya > Labels: mesosphere > > We want to get the Docker Name (or Docker ID, or both) when launching a > container task with mesos. The container name is generated by mesos itself > (i.e. mesos-77e5fde6-83e7-4618-a2dd-d5b10f2b4d25, obtained with "docker ps") > and it would be nice to expose this information to frameworks so that this > information can be used, for example by Marathon to give this information to > users via a REST API. > To go a bit in depth with our use case, we have files created by fluentd > logdriver that are named with Docker Name or Docker ID (full or short) and we > need a mapping for the users of the REST API and thus the first step is to > make this information available from mesos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3849) Corrected style in Makefiles
[ https://issues.apache.org/jira/browse/MESOS-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3849: -- Target Version/s: 0.26.0 > Corrected style in Makefiles > > > Key: MESOS-3849 > URL: https://issues.apache.org/jira/browse/MESOS-3849 > Project: Mesos > Issue Type: Bug > Components: build >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov > Labels: mesosphere > > Order of files in Makefiles is not strictly alphabetic -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3835) Expose framework principal through state.json/state
[ https://issues.apache.org/jira/browse/MESOS-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3835: -- Affects Version/s: (was: 0.26.0) Fix Version/s: (was: 0.26.0) > Expose framework principal through state.json/state > --- > > Key: MESOS-3835 > URL: https://issues.apache.org/jira/browse/MESOS-3835 > Project: Mesos > Issue Type: Wish > Components: master >Reporter: Sargun Dhillon >Assignee: Guangya Liu >Priority: Trivial > > We would like to expose the framework principal through the Master > /state.json or /state. This is for the purposes of both debugging (from the > operator perspective). This could be used for inspection during the process > of creating, or modifying ACLs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3736) Support docker local store pull same image simultaneously
[ https://issues.apache.org/jira/browse/MESOS-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995392#comment-14995392 ] Till Toenshoff commented on MESOS-3736: --- [~gilbert] I have bumped this to 0.27.0 as the RR seems to be WIP and we would love to cut 0.26.0 very soon. > Support docker local store pull same image simultaneously > -- > > Key: MESOS-3736 > URL: https://issues.apache.org/jira/browse/MESOS-3736 > Project: Mesos > Issue Type: Improvement >Reporter: Gilbert Song >Assignee: Gilbert Song > Labels: mesosphere > > The current local store implements get() using the local puller. When the same > docker image is requested multiple times concurrently, the local puller untars > the image tarball once per request and copies each result to the same directory, > which wastes time and computation. The local store/puller should do this work > only for the first request; simultaneous requests should wait on the promised > future and obtain the result once the first pull finishes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
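The wait-on-the-promised-future behavior described in the issue is a standard request-deduplication pattern: the first caller starts the expensive pull and publishes a shared future; concurrent callers for the same image block on that future instead of repeating the work. A minimal sketch, using only the standard library (not the actual Mesos provisioner code, and with invented names):

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <map>
#include <mutex>
#include <string>

// Illustrative pull cache: at most one pull per image is ever executed;
// every request for that image shares the result.
class PullCache
{
public:
  explicit PullCache(std::function<std::string(const std::string&)> pull)
    : pull_(std::move(pull)) {}

  std::string get(const std::string& image) {
    std::shared_future<std::string> future;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = pulls_.find(image);
      if (it == pulls_.end()) {
        // First request: kick off the pull and publish the future.
        it = pulls_.emplace(
            image,
            std::async(std::launch::deferred, pull_, image).share()).first;
      }
      future = it->second;
    }
    // Later callers land here and simply wait for the first pull.
    return future.get();
  }

private:
  std::function<std::string(const std::string&)> pull_;
  std::mutex mutex_;
  std::map<std::string, std::shared_future<std::string>> pulls_;
};
```

The map lookup and insertion happen under the lock, but the (slow) pull itself runs outside it, so concurrent requests for *different* images are not serialized against each other.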
[jira] [Updated] (MESOS-3736) Support docker local store pull same image simultaneously
[ https://issues.apache.org/jira/browse/MESOS-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3736: -- Target Version/s: 0.27.0 (was: 0.26.0) > Support docker local store pull same image simultaneously > -- > > Key: MESOS-3736 > URL: https://issues.apache.org/jira/browse/MESOS-3736 > Project: Mesos > Issue Type: Improvement >Reporter: Gilbert Song >Assignee: Gilbert Song > Labels: mesosphere > > The current local store implements get() using the local puller. When the same > docker image is requested multiple times concurrently, the local puller untars > the image tarball once per request and copies each result to the same directory, > which wastes time and computation. The local store/puller should do this work > only for the first request; simultaneous requests should wait on the promised > future and obtain the result once the first pull finishes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3554) Allocator changes trigger large re-compiles
[ https://issues.apache.org/jira/browse/MESOS-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995399#comment-14995399 ] Till Toenshoff commented on MESOS-3554: --- [~jvanremoortere] from the status of this issue it seems this is not entirely resolved - shall we bump it up to 0.27.0, so that this does not block 0.26.0? > Allocator changes trigger large re-compiles > --- > > Key: MESOS-3554 > URL: https://issues.apache.org/jira/browse/MESOS-3554 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Joris Van Remoortere >Assignee: Joris Van Remoortere > Labels: mesosphere > > Due to the templatized nature of the allocator, even small changes trigger > large recompiles of the code-base. This makes iterating on changes expensive > for developers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3838) Put authorize logic for teardown into a common function
[ https://issues.apache.org/jira/browse/MESOS-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3838: -- Target Version/s: 0.27.0 (was: 0.26.0) > Put authorize logic for teardown into a common function > --- > > Key: MESOS-3838 > URL: https://issues.apache.org/jira/browse/MESOS-3838 > Project: Mesos > Issue Type: Bug >Reporter: Guangya Liu >Assignee: Guangya Liu > > Mesos now has {{authorizeTask}} and {{authorizeFramework}}, and may have > {{authorizeReserveResource}} and {{authorizeUnReserveResource}} later. > However, {{Master::Http::teardown()}} currently embeds its authorization > logic inline; it would be better to put the authorization logic for teardown > into a common function {{authorizeTeardown()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3802) Clear the suppressed flag when deactive a framework
[ https://issues.apache.org/jira/browse/MESOS-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3802: -- Fix Version/s: (was: 0.26.0) > Clear the suppressed flag when deactive a framework > --- > > Key: MESOS-3802 > URL: https://issues.apache.org/jira/browse/MESOS-3802 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Guangya Liu >Assignee: Guangya Liu > > When a framework is deactivated, the suppressed flag is not cleared, and this > means the framework cannot get resources immediately after it is reactivated; we > should clear this flag when deactivating the framework. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3035) As a Developer I would like a standard way to run a Subprocess in libprocess
[ https://issues.apache.org/jira/browse/MESOS-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3035: -- Target Version/s: 0.27.0 (was: 0.26.0) > As a Developer I would like a standard way to run a Subprocess in libprocess > > > Key: MESOS-3035 > URL: https://issues.apache.org/jira/browse/MESOS-3035 > Project: Mesos > Issue Type: Story > Components: libprocess >Reporter: Marco Massenzio >Assignee: Marco Massenzio > > As part of MESOS-2830 and MESOS-2902 I have been researching the ability to > run a {{Subprocess}} and capture the {{stdout / stderr}} along with the exit > status code. > {{process::subprocess()}} offers much of the functionality, but in a way that > still requires a lot of handiwork on the developer's part; we would like to > further abstract away the ability to just pass a string, an optional set of > command-line arguments and then collect the output of the command (bonus: > without blocking). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
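The abstraction wished for in MESOS-3035 - pass a command string, get back captured output and an exit status, without blocking - can be sketched with plain POSIX {{popen()}} plus a future. This is not the {{process::subprocess()}} API from libprocess; the names below are invented, and for simplicity this sketch captures only stdout.

```cpp
#include <cassert>
#include <cstdio>
#include <future>
#include <string>

// Result of running a command: its exit status and captured stdout.
struct CommandResult
{
  int status;
  std::string out;
};

// Run `command` through the shell, collecting stdout and the status
// returned by pclose(). Blocking variant.
CommandResult runCommand(const std::string& command)
{
  CommandResult result{-1, ""};
  FILE* pipe = popen(command.c_str(), "r");
  if (pipe == nullptr) {
    return result;
  }

  char buffer[256];
  while (fgets(buffer, sizeof(buffer), pipe) != nullptr) {
    result.out += buffer;
  }
  result.status = pclose(pipe);
  return result;
}

// Non-blocking variant: the caller holds a future and collects the
// output whenever convenient.
std::future<CommandResult> runCommandAsync(const std::string& command)
{
  return std::async(std::launch::async, runCommand, command);
}
```

The point of the ticket is precisely that developers should get something of this shape from libprocess itself, rather than re-assembling it from the lower-level pieces each time.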
[jira] [Updated] (MESOS-3835) Expose framework principal through state.json/state
[ https://issues.apache.org/jira/browse/MESOS-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3835: -- Target Version/s: 0.27.0 (was: 0.26.0) > Expose framework principal through state.json/state > --- > > Key: MESOS-3835 > URL: https://issues.apache.org/jira/browse/MESOS-3835 > Project: Mesos > Issue Type: Wish > Components: master >Reporter: Sargun Dhillon >Assignee: Guangya Liu >Priority: Trivial > > We would like to expose the framework principal through the Master > /state.json or /state. This is for the purposes of both debugging (from the > operator perspective). This could be used for inspection during the process > of creating, or modifying ACLs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3841) Master HTTP API support to get the leader
[ https://issues.apache.org/jira/browse/MESOS-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3841: -- Fix Version/s: (was: 0.26.0) > Master HTTP API support to get the leader > - > > Key: MESOS-3841 > URL: https://issues.apache.org/jira/browse/MESOS-3841 > Project: Mesos > Issue Type: Improvement > Components: HTTP API >Reporter: Cosmin Lehene > > There's currently no good way to query the current leader of the master ensemble. > Some workarounds are to get the leader from {{/state.json}} (and parse it from > leader@ip) or to grep it from {{master/redirect}}. > The scheduler API does an HTTP redirect, but that requires an HTTP POST > coming from a framework as well > {{POST /api/v1/scheduler HTTP/1.1}} > There should be a lightweight API call to get the current master. > This could be part of a more granular representation (REST) of the current > state.json. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2077) Ensure that TASK_LOSTs for a hard slave drain (SIGUSR1) include a Reason.
[ https://issues.apache.org/jira/browse/MESOS-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-2077: -- Target Version/s: 0.27.0 (was: 0.26.0) > Ensure that TASK_LOSTs for a hard slave drain (SIGUSR1) include a Reason. > - > > Key: MESOS-2077 > URL: https://issues.apache.org/jira/browse/MESOS-2077 > Project: Mesos > Issue Type: Improvement > Components: master, slave >Reporter: Benjamin Mahler >Assignee: Guangya Liu > Labels: twitter > > For maintenance, sometimes operators will force the drain of a slave (via > SIGUSR1), when deemed safe (e.g. non-critical tasks running) and/or necessary > (e.g. bad hardware). > To eliminate alerting noise, we'd like to add a 'Reason' that expresses the > forced drain of the slave, so that these are not considered to be a generic > slave removal TASK_LOST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3769) Agent logs are misleading during agent shutdown
[ https://issues.apache.org/jira/browse/MESOS-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3769: -- Fix Version/s: (was: 0.26.0) > Agent logs are misleading during agent shutdown > --- > > Key: MESOS-3769 > URL: https://issues.apache.org/jira/browse/MESOS-3769 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.26.0 >Reporter: Alexander Rukletsov >Assignee: Guangya Liu >Priority: Minor > Labels: newbie > > When analyzing output of the {{MasterAllocatorTest.SlaveLost}} test I spotted > following logs: > {noformat} > I1020 18:18:09.026553 237658112 status_update_manager.cpp:322] Received > status update TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for > task 0 of framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b- > I1020 18:18:09.026845 234438656 slave.cpp:3090] Forwarding the update > TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for task 0 of > framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b- to > master@172.18.6.110:62507 > I1020 18:18:09.026973 234438656 slave.cpp:651] ; unregistering and shutting > down > I1020 18:18:09.027007 234438656 slave.cpp:2016] Asked to shut down framework > 7aff439d-307c-486b-9c0d-c2a47ddbda5b- by @0.0.0.0:0 > I1020 18:18:09.027019 234438656 slave.cpp:2041] Shutting down framework > 7aff439d-307c-486b-9c0d-c2a47ddbda5b- > {noformat} > It looks like {{Slave::shutdown()}} uses wrong assumptions about possible > execution paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3420) Resolve shutdown semantics for Machine/Down
[ https://issues.apache.org/jira/browse/MESOS-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993475#comment-14993475 ] Till Toenshoff commented on MESOS-3420: --- [~klaus1982] we are preparing to tag 0.26.0 - shall we push this one to 0.27.0? > Resolve shutdown semantics for Machine/Down > --- > > Key: MESOS-3420 > URL: https://issues.apache.org/jira/browse/MESOS-3420 > Project: Mesos > Issue Type: Task >Reporter: Joris Van Remoortere >Assignee: Klaus Ma > Labels: maintenance, mesosphere > > When an operator uses the {{machine/down}} endpoint, the master sends a > shutdown message to the agent. > We need to discuss and resolve the semantics that we want regarding the > operators and frameworks knowing when their tasks are terminated. > One option is to explicitly remove the agent from the master which will send > the {{TASK_LOST}} updates and {{SlaveLostMessage}} directly from the master. > The concern around this is that during a network partition, or if the agent > was down at the time, that these tasks could still be running. > This is a general problem related to task life-times being dissociated with > that life-time of the agent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3339) Implement filtering mechanism for (Scheduler API Events) Testing
[ https://issues.apache.org/jira/browse/MESOS-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993480#comment-14993480 ] Till Toenshoff commented on MESOS-3339: --- [~anandmazumdar] we are preparing to tag the 0.26.0 release. The posted RR seems to be work-in-progress. Shall we push this one to 0.27.0? > Implement filtering mechanism for (Scheduler API Events) Testing > > > Key: MESOS-3339 > URL: https://issues.apache.org/jira/browse/MESOS-3339 > Project: Mesos > Issue Type: Task > Components: test >Reporter: Anand Mazumdar >Assignee: Anand Mazumdar > Labels: mesosphere > > Currently, our testing infrastructure does not have a mechanism for > filtering/dropping HTTP events of a particular type from the Scheduler API > response stream. We need a {{DROP_HTTP_CALLS}} abstraction that can help us > to filter a particular event type. > {code} > // Enqueues all received events into a libprocess queue. > ACTION_P(Enqueue, queue) > { > std::queue<Event> events = arg0; > while (!events.empty()) { > // Note that we currently drop HEARTBEATs because most of these tests > // are not designed to deal with heartbeats. > // TODO(vinod): Implement DROP_HTTP_CALLS that can filter heartbeats. > if (events.front().type() == Event::HEARTBEAT) { > VLOG(1) << "Ignoring HEARTBEAT event"; > } else { > queue->put(events.front()); > } > events.pop(); > } > } > {code} > This helper code is duplicated in at least two places currently, Scheduler > Library/Maintenance Primitives tests. > - The solution can be as trivial as moving this helper function to a common > test-header. > - Implement a {{DROP_HTTP_CALLS}} similar to what we do for other protobufs > via {{DROP_CALLS}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
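Stripped of the gmock machinery, the proposed {{DROP_HTTP_CALLS}} mechanism is just "drain the event stream through a drop predicate". The sketch below shows that core idea with a stand-in enum; {{Event}} here is not the Mesos v1 protobuf, and the function name is invented.

```cpp
#include <cassert>
#include <functional>
#include <queue>

// Stand-in for the v1 scheduler event types.
enum class Event { SUBSCRIBED, OFFERS, HEARTBEAT, UPDATE };

// Drains `events` into a fresh queue, keeping only those events the
// `drop` predicate does not match - the DROP_HTTP_CALLS-style
// filtering reduced to its essence.
std::queue<Event> enqueueFiltered(
    std::queue<Event> events,
    const std::function<bool(const Event&)>& drop)
{
  std::queue<Event> queue;
  while (!events.empty()) {
    if (!drop(events.front())) {
      queue.push(events.front());
    }
    events.pop();
  }
  return queue;
}
```

Parameterizing the predicate is what makes the helper reusable: the hard-coded HEARTBEAT check in the duplicated {{ACTION_P}} becomes one particular instantiation.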
[jira] [Commented] (MESOS-3832) Scheduler HTTP API does not redirect to leading master
[ https://issues.apache.org/jira/browse/MESOS-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993578#comment-14993578 ] Till Toenshoff commented on MESOS-3832: --- [~drexin] are you working on this one? Because we are preparing to tag 0.26.0 - shall we push this one to 0.27.0? > Scheduler HTTP API does not redirect to leading master > -- > > Key: MESOS-3832 > URL: https://issues.apache.org/jira/browse/MESOS-3832 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Affects Versions: 0.24.0, 0.24.1, 0.25.0 >Reporter: Dario Rexin >Assignee: Dario Rexin > Labels: newbie > > The documentation for the Scheduler HTTP API says: > {quote}If requests are made to a non-leading master a “HTTP 307 Temporary > Redirect” will be received with the “Location” header pointing to the leading > master.{quote} > While the redirect functionality has been implemented, it was not actually > used in the handler for the HTTP API. > A probable fix could be: > - Check if the current master is the leading master. > - If not, invoke the existing {{redirect}} method in {{src/master/http.cpp}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
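The proposed fix boils down to answering requests on a non-leading master with a 307 and a `Location` header. A minimal sketch of that check; the `Response` struct and `redirectToLeader` name are illustrative stand-ins, not the actual libprocess types (`process::http::Response` etc.) or the real {{redirect}} method in {{src/master/http.cpp}}:

```cpp
#include <map>
#include <string>

// Illustrative stand-in for an HTTP response.
struct Response {
  std::string status;
  std::map<std::string, std::string> headers;
};

// If this master is not leading, answer with "307 Temporary Redirect"
// pointing at the leader, as the Scheduler HTTP API docs promise.
Response redirectToLeader(bool isLeader, const std::string& leaderUrl) {
  Response response;
  if (isLeader) {
    response.status = "200 OK";  // handle the request normally
  } else {
    response.status = "307 Temporary Redirect";
    response.headers["Location"] = leaderUrl;
  }
  return response;
}
```

A 307 (rather than 301/302) matters here because it obliges the client to repeat the same method and body against the leader, which is what a scheduler POSTing calls needs.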
[jira] [Commented] (MESOS-3405) Add JSON::protobuf for google::protobuf::RepeatedPtrField.
[ https://issues.apache.org/jira/browse/MESOS-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993583#comment-14993583 ] Till Toenshoff commented on MESOS-3405: --- [~mcypark] seems the RRs are ready to commit, no? We would love to cut the 0.26.0 release and hence need to get this committed or pushed to 0.27.0. > Add JSON::protobuf for google::protobuf::RepeatedPtrField. > -- > > Key: MESOS-3405 > URL: https://issues.apache.org/jira/browse/MESOS-3405 > Project: Mesos > Issue Type: Task > Components: stout >Reporter: Michael Park >Assignee: Klaus Ma > > Currently, {{stout/protobuf.hpp}} provides a {{JSON::Protobuf}} utility which > converts a {{google::protobuf::Message}} into a {{JSON::Object}}. > We should add the support for {{google::protobuf::RepeatedPtrField}} by > introducing overloaded functions. > {code} > namespace JSON { > Object protobuf(const google::protobuf::Message& message) > { > Object object; > /* Move the body of JSON::Protobuf constructor here. */ > return object; > } > template <typename T> > Array protobuf(const google::protobuf::RepeatedPtrField<T>& repeated) > { > static_assert(std::is_convertible<T*, google::protobuf::Message*>::value, > "T must be a google::protobuf::Message"); > JSON::Array array; > array.values.reserve(repeated.size()); > foreach (const T& elem, repeated) { > array.values.push_back(JSON::Protobuf(elem)); > } > return array; > } > } > {code} > The new {{RepeatedPtrField}} version can be used in at least the following > places: > * {{src/common/http.cpp}} > * {{src/master/http.cpp}} > * {{src/slave/containerizer/mesos/containerizer.cpp}} > * {{src/tests/reservation_endpoints_tests.cpp}} > * {{src/tests/resources_tests.cpp}}: {{ResourcesTest.ParsingFromJSON}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
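The overload pattern proposed above (one conversion for a single message, one templated conversion for a repeated field, constrained by a {{static_assert}}) can be sketched self-containedly with stand-in types; `Message` and `toJson` here are illustrative, not protobuf's or stout's real API:

```cpp
#include <string>
#include <type_traits>
#include <vector>

// Stand-in for a protobuf message type.
struct Message { std::string name; };

// Single-message conversion, analogous to JSON::protobuf(const Message&).
std::string toJson(const Message& message) {
  return "{\"name\":\"" + message.name + "\"}";
}

// Repeated-field overload, analogous to the RepeatedPtrField<T> version:
// the static_assert constrains T exactly as in the proposal above.
template <typename T>
std::string toJson(const std::vector<T>& repeated) {
  static_assert(std::is_convertible<T*, Message*>::value,
                "T must be a Message");
  std::string array = "[";
  for (size_t i = 0; i < repeated.size(); ++i) {
    if (i > 0) array += ",";
    array += toJson(repeated[i]);  // reuse the single-message overload
  }
  return array + "]";
}
```

Overload resolution picks the right conversion at each call site, which is why the call sites listed in the issue need no changes beyond passing the repeated field directly.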
[jira] [Commented] (MESOS-3063) Add an example framework using dynamic reservation
[ https://issues.apache.org/jira/browse/MESOS-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993587#comment-14993587 ] Till Toenshoff commented on MESOS-3063: --- [~klaus1982] shall we push this towards 0.27.0? > Add an example framework using dynamic reservation > -- > > Key: MESOS-3063 > URL: https://issues.apache.org/jira/browse/MESOS-3063 > Project: Mesos > Issue Type: Task >Reporter: Michael Park >Assignee: Klaus Ma > Labels: mesosphere, persistent-volumes > > An example framework using dynamic reservation should be added > # to test dynamic reservations further, and > # to be used as a reference for those who want to use the dynamic reservation > feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2077) Ensure that TASK_LOSTs for a hard slave drain (SIGUSR1) include a Reason.
[ https://issues.apache.org/jira/browse/MESOS-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-2077: -- Affects Version/s: (was: 0.26.0) Fix Version/s: (was: 0.26.0) > Ensure that TASK_LOSTs for a hard slave drain (SIGUSR1) include a Reason. > - > > Key: MESOS-2077 > URL: https://issues.apache.org/jira/browse/MESOS-2077 > Project: Mesos > Issue Type: Improvement > Components: master, slave >Reporter: Benjamin Mahler >Assignee: Guangya Liu > Labels: twitter > > For maintenance, sometimes operators will force the drain of a slave (via > SIGUSR1), when deemed safe (e.g. non-critical tasks running) and/or necessary > (e.g. bad hardware). > To eliminate alerting noise, we'd like to add a 'Reason' that expresses the > forced drain of the slave, so that these are not considered to be a generic > slave removal TASK_LOST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3838) Put authorize logic for teardown into a common function
[ https://issues.apache.org/jira/browse/MESOS-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3838: -- Affects Version/s: (was: 0.26.0) Fix Version/s: (was: 0.26.0) > Put authorize logic for teardown into a common function > --- > > Key: MESOS-3838 > URL: https://issues.apache.org/jira/browse/MESOS-3838 > Project: Mesos > Issue Type: Bug >Reporter: Guangya Liu >Assignee: Guangya Liu > > Mesos now has {{authorizeTask}} and {{authorizeFramework}}, and may gain > {{authorizeReserveResource}} and {{authorizeUnReserveResource}} later. > Currently {{Master::Http::teardown()}} inlines its authorization logic; it would > be better to move that logic into a common function {{authorizeTeardown()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
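The suggested refactor is to pull the inline checks out of the endpoint handler into one reusable predicate. A sketch of that shape; the parameter types and the whitelist logic here are illustrative assumptions, not the real ACL-protobuf-based checks in {{src/master/http.cpp}}:

```cpp
#include <set>
#include <string>

// Hypothetical shape of the proposed common helper: the teardown endpoint
// handler delegates its principal check here instead of inlining it.
bool authorizeTeardown(
    const std::string& principal,
    const std::string& frameworkOwner,
    const std::set<std::string>& authorizedPrincipals) {
  // A principal may tear down its own frameworks, or any framework if it
  // is explicitly authorized (illustrative policy, not Mesos' real ACLs).
  return principal == frameworkOwner ||
         authorizedPrincipals.count(principal) > 0;
}
```

The benefit is the same as for {{authorizeTask}}/{{authorizeFramework}}: one tested function, reusable by any future endpoint that needs teardown authorization.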
[jira] [Commented] (MESOS-3728) Libprocess: Flaky behavior on test suite when finalizing.
[ https://issues.apache.org/jira/browse/MESOS-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956605#comment-14956605 ] Till Toenshoff commented on MESOS-3728: --- Just got a slightly different trace {noformat} E1014 11:54:17.246136 4820992 process.cpp:1914] Failed to shutdown socket with fd 15: Socket is not connected Assertion failed: (ec == 0), function unlock, file /BuildRoot/Library/Caches/com.apple.xbs/Sources/libcxx/libcxx-120.1/src/mutex.cpp, line 45. E1014 11:54:17.246408 4820992 process.cpp:1914] Failed to shutdown socket with fd 17: Socket is not connected *** Aborted at 1444816457 (unix time) try "date -d @1444816457" if you are using GNU date *** PC: @ 0x7fff94e370ae __pthread_kill *** SIGABRT (@0x7fff94e370ae) received by PID 12800 (TID 0x7fff7a208000) stack trace: *** @ 0x7fff9410d52a _sigtramp @0x1011587f0 _ZZ11synchronizeINSt3__115recursive_mutexEE12SynchronizedIT_EPS3_ENKUlPS1_E_clES6_ @ 0x7fff9c71237b abort @ 0x7fff9c6d99c4 __assert_rtn @ 0x7fff8a27afc8 std::__1::mutex::unlock() @0x1015329a9 _ZZ11synchronizeINSt3__15mutexEE12SynchronizedIT_EPS3_ENKUlPS1_E0_clES6_ @0x101532988 _ZZ11synchronizeINSt3__15mutexEE12SynchronizedIT_EPS3_ENUlPS1_E0_8__invokeES6_ @0x101532a40 Synchronized<>::~Synchronized() @0x1014f90f5 Synchronized<>::~Synchronized() @0x10150d45c Gate::empty() @0x1014f02cc process::ProcessManager::wait() @0x1014f42b6 process::wait() @0x100ed67ee process::wait() @0x1014e9c82 process::ProcessManager::~ProcessManager() @0x1014da875 process::ProcessManager::~ProcessManager() @0x1014da848 process::finalize() @0x100f3f72e main @ 0x7fff9ba305ad start make[5]: *** [check-local] Abort trap: 6 make[4]: *** [check-am] Error 2 make[3]: *** [check-recursive] Error 1 make[2]: *** [check-recursive] Error 1 make[1]: *** [check] Error 2 make: *** [check-recursive] Error 1 {noformat} > Libprocess: Flaky behavior on test suite when finalizing. 
> - > > Key: MESOS-3728 > URL: https://issues.apache.org/jira/browse/MESOS-3728 > Project: Mesos > Issue Type: Bug > Environment: OS 10.11.1 Beta (15B30a), > Apple LLVM version 7.0.0 (clang-700.0.72) >Reporter: Till Toenshoff > > The issue manifests in the following stacktrace. Triggering the issue is not > too hard on my machine - fails in more than 10% of all attempts. > {noformat} > [--] Global test environment tear-down > [==] 148 tests from 22 test cases ran. (1323 ms total) > [ PASSED ] 148 tests. > YOU HAVE 2 DISABLED TESTS > Assertion failed: (ec == 0), function unlock, file > /BuildRoot/Library/Caches/com.apple.xbs/Sources/libcxx/libcxx-120.1/src/mutex.cpp, > line 45. > *** Aborted at 1444816067 (unix time) try "date -d @1444816067" if you are > using GNU date *** > PC: @ 0x7fff94e370ae __pthread_kill > *** SIGABRT (@0x7fff94e370ae) received by PID 11537 (TID 0x70104000) > stack trace: *** > @ 0x7fff9410d52a _sigtramp > @ 0x701034d8 (unknown) > @ 0x7fff9c71237b abort > @ 0x7fff9c6d99c4 __assert_rtn > @ 0x7fff8a27afc8 std::__1::mutex::unlock() > @0x102b109a9 > _ZZ11synchronizeINSt3__15mutexEE12SynchronizedIT_EPS3_ENKUlPS1_E0_clES6_ > @0x102b10988 > _ZZ11synchronizeINSt3__15mutexEE12SynchronizedIT_EPS3_ENUlPS1_E0_8__invokeES6_ > @0x102b10a40 Synchronized<>::~Synchronized() > @0x102ad70f5 Synchronized<>::~Synchronized() > @0x102aeb45c Gate::empty() > @0x102ace2cc process::ProcessManager::wait() > @0x102ad22b6 process::wait() > @0x1024b47ee process::wait() > @0x1029b56a6 process::http::Connection::Data::~Data() > @0x1029b55d5 process::http::Connection::Data::~Data() > @0x1029a71bc std::__1::__shared_ptr_emplace<>::__on_zero_shared() > @ 0x7fff8a27acb8 std::__1::__shared_weak_count::__release_shared() > @0x1024ba50f std::__1::shared_ptr<>::~shared_ptr() > @0x1024ba4d5 std::__1::shared_ptr<>::~shared_ptr() > @0x1024ba4b5 process::http::Connection::~Connection() > @0x1024a2cc5 process::http::Connection::~Connection() > @0x10297b81d > 
_ZZZN7process4http8internal7requestERKNS0_7RequestEbENK3$_2clENS0_10ConnectionEENKUlvE_clEv > @0x10297b6fd > _ZN7process20AsyncExecutorProcess7executeIZZNS_4http8internal7requestERKNS2_7RequestEbENK3$_2clENS2_10ConnectionEEUlvE_EE7NothingRKT_PN5boost9enable_ifINSE_7is_voidINSt3__19result_ofIFSB_vEE4typeEEEvE4typeE > @0x10297d8be >
[jira] [Updated] (MESOS-3728) Libprocess: Flaky behavior on test suite when finalizing.
[ https://issues.apache.org/jira/browse/MESOS-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3728: -- Summary: Libprocess: Flaky behavior on test suite when finalizing. (was: Libprocess: Flaky behavior on test suite is finalizing.) > Libprocess: Flaky behavior on test suite when finalizing. > - > > Key: MESOS-3728 > URL: https://issues.apache.org/jira/browse/MESOS-3728 > Project: Mesos > Issue Type: Bug > Environment: OS 10.11.1 Beta (15B30a), > Apple LLVM version 7.0.0 (clang-700.0.72) >Reporter: Till Toenshoff > > The issue manifests in the following stacktrace. Triggering the issue is not > too hard on my machine - fails in more than 10% of all attempts. > {noformat} > [--] Global test environment tear-down > [==] 148 tests from 22 test cases ran. (1323 ms total) > [ PASSED ] 148 tests. > YOU HAVE 2 DISABLED TESTS > Assertion failed: (ec == 0), function unlock, file > /BuildRoot/Library/Caches/com.apple.xbs/Sources/libcxx/libcxx-120.1/src/mutex.cpp, > line 45. 
> *** Aborted at 1444816067 (unix time) try "date -d @1444816067" if you are > using GNU date *** > PC: @ 0x7fff94e370ae __pthread_kill > *** SIGABRT (@0x7fff94e370ae) received by PID 11537 (TID 0x70104000) > stack trace: *** > @ 0x7fff9410d52a _sigtramp > @ 0x701034d8 (unknown) > @ 0x7fff9c71237b abort > @ 0x7fff9c6d99c4 __assert_rtn > @ 0x7fff8a27afc8 std::__1::mutex::unlock() > @0x102b109a9 > _ZZ11synchronizeINSt3__15mutexEE12SynchronizedIT_EPS3_ENKUlPS1_E0_clES6_ > @0x102b10988 > _ZZ11synchronizeINSt3__15mutexEE12SynchronizedIT_EPS3_ENUlPS1_E0_8__invokeES6_ > @0x102b10a40 Synchronized<>::~Synchronized() > @0x102ad70f5 Synchronized<>::~Synchronized() > @0x102aeb45c Gate::empty() > @0x102ace2cc process::ProcessManager::wait() > @0x102ad22b6 process::wait() > @0x1024b47ee process::wait() > @0x1029b56a6 process::http::Connection::Data::~Data() > @0x1029b55d5 process::http::Connection::Data::~Data() > @0x1029a71bc std::__1::__shared_ptr_emplace<>::__on_zero_shared() > @ 0x7fff8a27acb8 std::__1::__shared_weak_count::__release_shared() > @0x1024ba50f std::__1::shared_ptr<>::~shared_ptr() > @0x1024ba4d5 std::__1::shared_ptr<>::~shared_ptr() > @0x1024ba4b5 process::http::Connection::~Connection() > @0x1024a2cc5 process::http::Connection::~Connection() > @0x10297b81d > _ZZZN7process4http8internal7requestERKNS0_7RequestEbENK3$_2clENS0_10ConnectionEENKUlvE_clEv > @0x10297b6fd > _ZN7process20AsyncExecutorProcess7executeIZZNS_4http8internal7requestERKNS2_7RequestEbENK3$_2clENS2_10ConnectionEEUlvE_EE7NothingRKT_PN5boost9enable_ifINSE_7is_voidINSt3__19result_ofIFSB_vEE4typeEEEvE4typeE > @0x10297d8be > _ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKZZNS_4http8internal7requestERKNS3_7RequestEbENK3$_2clENS3_10ConnectionEEUlvE_PvSA_SD_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSI_FSF_T1_T2_ET3_T4_ENKUlPNS_11ProcessBaseEE_clEST_ > @0x10297d730 > 
_ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN7process8dispatchI7NothingNS3_20AsyncExecutorProcessERKZZNS3_4http8internal7requestERKNS7_7RequestEbENK3$_2clENS7_10ConnectionEEUlvE_PvSE_SH_EENS3_6FutureIT_EERKNS3_3PIDIT0_EEMSM_FSJ_T1_T2_ET3_T4_EUlPNS3_11ProcessBaseEE_SX_EEEvDpOT_ > @0x10297d3fc > _ZNSt3__110__function6__funcIZN7process8dispatchI7NothingNS2_20AsyncExecutorProcessERKZZNS2_4http8internal7requestERKNS6_7RequestEbENK3$_2clENS6_10ConnectionEEUlvE_PvSD_SG_EENS2_6FutureIT_EERKNS2_3PIDIT0_EEMSL_FSI_T1_T2_ET3_T4_EUlPNS2_11ProcessBaseEE_NS_9allocatorISX_EEFvSW_EEclEOSW_ > @0x102aec69f std::__1::function<>::operator()() > @0x102acef4f process::ProcessBase::visit() > @0x102b0f7de process::DispatchEvent::visit() > @0x1023417d1 process::ProcessBase::serve() > @0x102acbce1 process::ProcessManager::resume() > @0x102ad6a4c > process::ProcessManager::init_threads()::$_1::operator()() > make[5]: *** [check-local] Abort trap: 6 > make[4]: *** [check-am] Error 2 > make[3]: *** [check-recursive] Error 1 > make[2]: *** [check-recursive] Error 1 > make[1]: *** [check] Error 2 > make: *** [check-recursive] Error 1 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3728) Libprocess: Flaky behavior on test suite is finalizing.
Till Toenshoff created MESOS-3728: - Summary: Libprocess: Flaky behavior on test suite is finalizing. Key: MESOS-3728 URL: https://issues.apache.org/jira/browse/MESOS-3728 Project: Mesos Issue Type: Bug Environment: OS 10.11.1 Beta (15B30a), Apple LLVM version 7.0.0 (clang-700.0.72) Reporter: Till Toenshoff The issue manifests in the following stacktrace. Triggering the issue is not too hard on my machine - fails in more than 10% of all attempts. {noformat} [--] Global test environment tear-down [==] 148 tests from 22 test cases ran. (1323 ms total) [ PASSED ] 148 tests. YOU HAVE 2 DISABLED TESTS Assertion failed: (ec == 0), function unlock, file /BuildRoot/Library/Caches/com.apple.xbs/Sources/libcxx/libcxx-120.1/src/mutex.cpp, line 45. *** Aborted at 1444816067 (unix time) try "date -d @1444816067" if you are using GNU date *** PC: @ 0x7fff94e370ae __pthread_kill *** SIGABRT (@0x7fff94e370ae) received by PID 11537 (TID 0x70104000) stack trace: *** @ 0x7fff9410d52a _sigtramp @ 0x701034d8 (unknown) @ 0x7fff9c71237b abort @ 0x7fff9c6d99c4 __assert_rtn @ 0x7fff8a27afc8 std::__1::mutex::unlock() @0x102b109a9 _ZZ11synchronizeINSt3__15mutexEE12SynchronizedIT_EPS3_ENKUlPS1_E0_clES6_ @0x102b10988 _ZZ11synchronizeINSt3__15mutexEE12SynchronizedIT_EPS3_ENUlPS1_E0_8__invokeES6_ @0x102b10a40 Synchronized<>::~Synchronized() @0x102ad70f5 Synchronized<>::~Synchronized() @0x102aeb45c Gate::empty() @0x102ace2cc process::ProcessManager::wait() @0x102ad22b6 process::wait() @0x1024b47ee process::wait() @0x1029b56a6 process::http::Connection::Data::~Data() @0x1029b55d5 process::http::Connection::Data::~Data() @0x1029a71bc std::__1::__shared_ptr_emplace<>::__on_zero_shared() @ 0x7fff8a27acb8 std::__1::__shared_weak_count::__release_shared() @0x1024ba50f std::__1::shared_ptr<>::~shared_ptr() @0x1024ba4d5 std::__1::shared_ptr<>::~shared_ptr() @0x1024ba4b5 process::http::Connection::~Connection() @0x1024a2cc5 process::http::Connection::~Connection() @0x10297b81d 
_ZZZN7process4http8internal7requestERKNS0_7RequestEbENK3$_2clENS0_10ConnectionEENKUlvE_clEv @0x10297b6fd _ZN7process20AsyncExecutorProcess7executeIZZNS_4http8internal7requestERKNS2_7RequestEbENK3$_2clENS2_10ConnectionEEUlvE_EE7NothingRKT_PN5boost9enable_ifINSE_7is_voidINSt3__19result_ofIFSB_vEE4typeEEEvE4typeE @0x10297d8be _ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKZZNS_4http8internal7requestERKNS3_7RequestEbENK3$_2clENS3_10ConnectionEEUlvE_PvSA_SD_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSI_FSF_T1_T2_ET3_T4_ENKUlPNS_11ProcessBaseEE_clEST_ @0x10297d730 _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN7process8dispatchI7NothingNS3_20AsyncExecutorProcessERKZZNS3_4http8internal7requestERKNS7_7RequestEbENK3$_2clENS7_10ConnectionEEUlvE_PvSE_SH_EENS3_6FutureIT_EERKNS3_3PIDIT0_EEMSM_FSJ_T1_T2_ET3_T4_EUlPNS3_11ProcessBaseEE_SX_EEEvDpOT_ @0x10297d3fc _ZNSt3__110__function6__funcIZN7process8dispatchI7NothingNS2_20AsyncExecutorProcessERKZZNS2_4http8internal7requestERKNS6_7RequestEbENK3$_2clENS6_10ConnectionEEUlvE_PvSD_SG_EENS2_6FutureIT_EERKNS2_3PIDIT0_EEMSL_FSI_T1_T2_ET3_T4_EUlPNS2_11ProcessBaseEE_NS_9allocatorISX_EEFvSW_EEclEOSW_ @0x102aec69f std::__1::function<>::operator()() @0x102acef4f process::ProcessBase::visit() @0x102b0f7de process::DispatchEvent::visit() @0x1023417d1 process::ProcessBase::serve() @0x102acbce1 process::ProcessManager::resume() @0x102ad6a4c process::ProcessManager::init_threads()::$_1::operator()() make[5]: *** [check-local] Abort trap: 6 make[4]: *** [check-am] Error 2 make[3]: *** [check-recursive] Error 1 make[2]: *** [check-recursive] Error 1 make[1]: *** [check] Error 2 make: *** [check-recursive] Error 1 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3769) Agent logs are misleading during agent shutdown
[ https://issues.apache.org/jira/browse/MESOS-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3769: -- Shepherd: Till Toenshoff > Agent logs are misleading during agent shutdown > --- > > Key: MESOS-3769 > URL: https://issues.apache.org/jira/browse/MESOS-3769 > Project: Mesos > Issue Type: Bug >Reporter: Alexander Rukletsov >Priority: Minor > > When analyzing output of the {{MasterAllocatorTest.SlaveLost}} test I spotted > the following logs: > {noformat} > I1020 18:18:09.026553 237658112 status_update_manager.cpp:322] Received > status update TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for > task 0 of framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b- > I1020 18:18:09.026845 234438656 slave.cpp:3090] Forwarding the update > TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for task 0 of > framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b- to > master@172.18.6.110:62507 > I1020 18:18:09.026973 234438656 slave.cpp:651] ; unregistering and shutting > down > I1020 18:18:09.027007 234438656 slave.cpp:2016] Asked to shut down framework > 7aff439d-307c-486b-9c0d-c2a47ddbda5b- by @0.0.0.0:0 > I1020 18:18:09.027019 234438656 slave.cpp:2041] Shutting down framework > 7aff439d-307c-486b-9c0d-c2a47ddbda5b- > {noformat} > It looks like {{Slave::shutdown()}} uses wrong assumptions about possible > execution paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3769) Agent logs are misleading during agent shutdown
[ https://issues.apache.org/jira/browse/MESOS-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3769: -- Labels: newbie (was: ) Looks like all code-paths in {{void Slave::shutdown(const UPID& from, const string& message)}} should check for {{message.empty()}} before trying to log it. > Agent logs are misleading during agent shutdown > --- > > Key: MESOS-3769 > URL: https://issues.apache.org/jira/browse/MESOS-3769 > Project: Mesos > Issue Type: Bug >Reporter: Alexander Rukletsov >Priority: Minor > Labels: newbie > > When analyzing output of the {{MasterAllocatorTest.SlaveLost}} test I spotted > the following logs: > {noformat} > I1020 18:18:09.026553 237658112 status_update_manager.cpp:322] Received > status update TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for > task 0 of framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b- > I1020 18:18:09.026845 234438656 slave.cpp:3090] Forwarding the update > TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for task 0 of > framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b- to > master@172.18.6.110:62507 > I1020 18:18:09.026973 234438656 slave.cpp:651] ; unregistering and shutting > down > I1020 18:18:09.027007 234438656 slave.cpp:2016] Asked to shut down framework > 7aff439d-307c-486b-9c0d-c2a47ddbda5b- by @0.0.0.0:0 > I1020 18:18:09.027019 234438656 slave.cpp:2041] Shutting down framework > 7aff439d-307c-486b-9c0d-c2a47ddbda5b- > {noformat} > It looks like {{Slave::shutdown()}} uses wrong assumptions about possible > execution paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
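The dangling "; unregistering and shutting down" line in the quoted log comes from unconditionally prepending an empty message. A sketch of the suggested fix; this `shutdownLogLine` helper is illustrative only, since `Slave::shutdown()` itself logs through glog's `LOG(INFO)` directly:

```cpp
#include <string>

// Only include the message portion when it is non-empty, so an empty
// `message` no longer produces a line starting with "; ".
std::string shutdownLogLine(const std::string& message) {
  if (message.empty()) {
    return "Asked to shut down; unregistering and shutting down";
  }
  return "Asked to shut down because '" + message +
         "'; unregistering and shutting down";
}
```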
[jira] [Updated] (MESOS-3113) Add resource usage section to containerizer documentation
[ https://issues.apache.org/jira/browse/MESOS-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3113: -- Shepherd: Till Toenshoff > Add resource usage section to containerizer documentation > - > > Key: MESOS-3113 > URL: https://issues.apache.org/jira/browse/MESOS-3113 > Project: Mesos > Issue Type: Documentation > Components: documentation >Reporter: Niklas Quarfot Nielsen >Assignee: Gilbert Song > Labels: docathon, documentaion, mesosphere > > Currently, the containerizer documentation doesn't touch upon the usage() API > and how to interpret the collected statistics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3030) Build failure on OS 10.11 using Xcode 7.
[ https://issues.apache.org/jira/browse/MESOS-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950168#comment-14950168 ] Till Toenshoff commented on MESOS-3030: --- The issue persists all the way to the OS X 10.11 release; I will propose a fix. > Build failure on OS 10.11 using Xcode 7. > > > Key: MESOS-3030 > URL: https://issues.apache.org/jira/browse/MESOS-3030 > Project: Mesos > Issue Type: Bug > Environment: OS 10.11 Beta (15A215h), Apple LLVM version 7.0.0 > (clang-700.0.57.2) >Reporter: Till Toenshoff > > When trying to build Mesos (recent master) on OS X El Capitan (public beta 1) > with Apple's clang distribution via Xcode 7 (beta 3) the following warnings > trigger build failures; > h6.Boost: unused-local-typedef > {noformat} > ../3rdparty/libprocess/3rdparty/boost-1.53.0/boost/tuple/detail/tuple_basic.hpp:228:31: > error: unused typedef 'cons_element' [-Werror,-Wunused-local-typedef] > typedef typename impl::type cons_element; > {noformat} > h6.CyrusSASL2: deprecated-declarations > {noformat} > distcc[57619] ERROR: compile > /Users/till/.ccache/tmp/authentica.stdout.lobomacpro2.fritz.box.48363.0QJikQ.ii > on localhost failed > ../../src/authentication/cram_md5/authenticatee.cpp:75:7: error: > 'sasl_dispose' is deprecated: first deprecated in OS X 10.11 > [-Werror,-Wdeprecated-declarations] > sasl_dispose(); > ^ > /usr/include/sasl/sasl.h:746:13: note: 'sasl_dispose' has been explicitly > marked deprecated here > extern void sasl_dispose(sasl_conn_t **pconn) > __attribute__((availability(macosx,introduced=10.0,deprecated=10.11))); > ^ > {noformat} > > A simple workaround is disabling those warnings for now; > {noformat} > export CXXFLAGS="-Wno-unused-local-typedef -Wno-deprecated-declarations" > export CCFLAGS="-Wno-unused-local-typedef -Wno-deprecated-declarations" > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-3030) Build failure on OS 10.11 using Xcode 7.
[ https://issues.apache.org/jira/browse/MESOS-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff reassigned MESOS-3030: - Assignee: Till Toenshoff > Build failure on OS 10.11 using Xcode 7. > > > Key: MESOS-3030 > URL: https://issues.apache.org/jira/browse/MESOS-3030 > Project: Mesos > Issue Type: Bug > Environment: OS 10.11 Beta (15A215h), Apple LLVM version 7.0.0 > (clang-700.0.57.2) >Reporter: Till Toenshoff >Assignee: Till Toenshoff > > When trying to build Mesos (recent master) on OS X El Capitan (public beta 1) > with apple's clang distribution via Xcode 7 (beta 3) the following warnings > trigger build failures; > h6.Boost: unused-local-typedef > {noformat} > ../3rdparty/libprocess/3rdparty/boost-1.53.0/boost/tuple/detail/tuple_basic.hpp:228:31: > error: unused typedef 'cons_element' [-Werror,-Wunused-local-typedef] > typedef typename impl::type cons_element; > {noformat} > h6.CyrusSASL2: deprecated-declarations > {noformat} > distcc[57619] ERROR: compile > /Users/till/.ccache/tmp/authentica.stdout.lobomacpro2.fritz.box.48363.0QJikQ.ii > on localhost failed > ../../src/authentication/cram_md5/authenticatee.cpp:75:7: error: > 'sasl_dispose' is deprecated: first deprecated in OS X 10.11 > [-Werror,-Wdeprecated-declarations] > sasl_dispose(); > ^ > /usr/include/sasl/sasl.h:746:13: note: 'sasl_dispose' has been explicitly > marked deprecated here > extern void sasl_dispose(sasl_conn_t **pconn) > __attribute__((availability(macosx,introduced=10.0,deprecated=10.11))); > ^ > {noformat} > > A simple workaround is disabling those warnings for now; > {noformat} > export CXXFLAGS="-Wno-unused-local-typedef -Wno-deprecated-declarations" > export CCFLAGS="-Wno-unused-local-typedef -Wno-deprecated-declarations" > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3600) unable to build with non-default protobuf
[ https://issues.apache.org/jira/browse/MESOS-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3600: -- Shepherd: Till Toenshoff > unable to build with non-default protobuf > - > > Key: MESOS-3600 > URL: https://issues.apache.org/jira/browse/MESOS-3600 > Project: Mesos > Issue Type: Bug > Components: build >Reporter: James Peach > > If I install a custom protobuf into {{/opt/protobuf}}, I should be able to > pass {{--with-protobuf=/opt/protobuf}} to configure the build to use it. > On OS X, this fails: > {code} > ... > checking google/protobuf/message.h usability... yes > checking google/protobuf/message.h presence... yes > checking for google/protobuf/message.h... yes > checking for _init in -lprotobuf... no > configure: error: cannot find protobuf > --- > You have requested the use of a non-bundled protobuf but no suitable > protobuf could be found. > You may want specify the location of protobuf by providing a prefix > path via --with-protobuf=DIR, or check that the path you provided is > correct if youre already doing this. > --- > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3608) optionally install test binaries
[ https://issues.apache.org/jira/browse/MESOS-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3608: -- Shepherd: Till Toenshoff > optionally install test binaries > > > Key: MESOS-3608 > URL: https://issues.apache.org/jira/browse/MESOS-3608 > Project: Mesos > Issue Type: Improvement > Components: build, test >Reporter: James Peach >Priority: Minor > > Many of the tests in Mesos could be described as integration tests, since > they have external dependencies on kernel features, installed tools, > permissions, etc. I'd like to be able to generate a {{mesos-tests}} RPM along > with my {{mesos}} RPM so that I can run the same tests in different > deployment environments. > I propose a new configuration option named {{--enable-test-tools}} that will > install the tests into {{libexec/mesos/tests}}. I'll also need to make some > minor changes to tests so that helper tools can be found in this location as > well as in the build directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2947) Authorizer Module: Implementation, Integration Tests
[ https://issues.apache.org/jira/browse/MESOS-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-2947: -- Sprint: Mesosphere Sprint 14 Authorizer Module: Implementation, Integration Tests -- Key: MESOS-2947 URL: https://issues.apache.org/jira/browse/MESOS-2947 Project: Mesos Issue Type: Improvement Reporter: Till Toenshoff Assignee: Alexander Rojas Labels: mesosphere, module, security h4.Motivation Provide an example authorizer module based on the {{LocalAuthorizer}} implementation. Make sure that such an authorizer module can be fully unit- and integration-tested within the Mesos test suite. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-2946) Authorizer Module: Interface design
[ https://issues.apache.org/jira/browse/MESOS-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613069#comment-14613069 ] Till Toenshoff edited comment on MESOS-2946 at 7/7/15 8:27 PM: --- h4.Status Quo As the current design stands, {{Authorizer}} is indeed an interface, but its default implementation is declared in the same header. Moreover, if one decides to create an alternative implementation for authorization, Mesos needs to be recompiled and all the places where the authorizer gets instantiated need to be updated. h4.Design Under the modularized version, the MVP for the {{Authorizer}} interface will look like: {code} class Authorizer { public: static Try<Authorizer*> create(const std::string& name); virtual ~Authorizer() {} virtual Try<Nothing> initialize(const Option<ACLs>& acls) = 0; virtual process::Future<bool> authorize( const ACL::RegisterFramework& request) = 0; virtual process::Future<bool> authorize( const ACL::RunTask& request) = 0; virtual process::Future<bool> authorize( const ACL::ShutdownFramework& request) = 0; protected: Authorizer() {} }; {code} Where {{Authorizer::create(const std::string&)}} is the factory function which will construct the default {{LocalAuthorizer}} if {{local}} is selected and will use the existing facilities within {{ModuleManager}} to load the appropriate module in any other case. In order to allow the {{LocalAuthorizer}} to play nicely with the general modules design, it needs a default constructor. This constraint leads to the existence of {{Authorizer::initialize(const Option<ACLs>&)}}, which is needed to pass initialization parameters to the {{LocalAuthorizer}}. Note that all other authorizers will use the {{ModuleManager}} mechanisms to pass initialization parameters. This follows the pattern used in the {{Authenticator}} module. The method {{Authorizer::initialize(const Option<ACLs>&)}} can be removed when we go to a modules-only implementation. 
All other methods remain unchanged from the original {{Authorizer}} interface. was (Author: arojas): h4.Status Quo As the current design stands, {{Authorizer}} is indeed an interface, but its default implementation is declared in the same header. Moreover, if one decides to create an alternative implementation for authorization, Mesos needs to be recompiled and all the places where the authorizer gets instantiated need to be updated. h4.Design Under the modularized version, the MVP for the {{Authorizer}} interface will look like: {code} class Authorizer { public: static Try<Authorizer*> create(const std::string& name); virtual ~Authorizer() {} virtual Try<Nothing> initialize(const Option<ACLs>& acls) = 0; virtual process::Future<bool> authorize( const ACL::RegisterFramework& request) = 0; virtual process::Future<bool> authorize( const ACL::RunTask& request) = 0; virtual process::Future<bool> authorize( const ACL::ShutdownFramework& request) = 0; protected: Authorizer() {} }; {code} Where {{Authorizer::create(const std::string&)}} is the factory function which will construct the default {{LocalAuthorizer}} if {{local}} is selected and will use the existing facilities within {{ModuleManager}} to load the appropriate module in any other case. In order to allow the {{LocalAuthorizer}} to play nicely with the general modules design, it needs a default constructor. This constraint leads to the existence of {{Authorizer::initialize(const Option<ACLs>&)}}, which is needed to pass initialization parameters to the {{LocalAuthorizer}}. Note that all other authorizers will use the {{ModuleManager}} mechanisms to pass initialization parameters. This follows the pattern used in the {{Authorizator}} module. The method {{Authorizer::initialize(const Option<ACLs>&)}} can be removed when we go to a modules-only implementation. All other methods remain unchanged from the original {{Authorizer}} interface. 
Authorizer Module: Interface design --- Key: MESOS-2946 URL: https://issues.apache.org/jira/browse/MESOS-2946 Project: Mesos Issue Type: Improvement Reporter: Till Toenshoff Assignee: Till Toenshoff Labels: mesosphere, module, security h4.Motivation Design an interface covering authorizer modules while staying minimally invasive with regard to changes to the existing {{LocalAuthorizer}} implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-708) Static files missing Last-Modified HTTP headers
[ https://issues.apache.org/jira/browse/MESOS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621394#comment-14621394 ] Till Toenshoff commented on MESOS-708: -- See MESOS-3026 for some serious problems of these patches. Static files missing Last-Modified HTTP headers - Key: MESOS-708 URL: https://issues.apache.org/jira/browse/MESOS-708 Project: Mesos Issue Type: Improvement Components: libprocess, webui Affects Versions: 0.13.0 Reporter: Ross Allen Assignee: Alexander Rojas Labels: mesosphere Static assets served by the Mesos master don't return Last-Modified HTTP headers. That means clients receive a 200 status code and re-download assets on every page request even if the assets haven't changed. Because Angular JS does most of the work, the downloading happens only when you navigate to Mesos master in your browser or use the browser's refresh. Example header for mesos.css: HTTP/1.1 200 OK Date: Thu, 26 Sep 2013 17:18:52 GMT Content-Length: 1670 Content-Type: text/css Clients sometimes use the Date header for the same effect as Last-Modified, but the date is always the time of the response from the server, i.e. it changes on every request and makes the assets look new every time. The Last-Modified header should be added and should be the last modified time of the file. On subsequent requests for the same files, the master should return 304 responses with no content rather than 200 with the full files. It could save clients a lot of download time since Mesos assets are rather heavyweight. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3026) ProcessTest.Cache fails and hangs
[ https://issues.apache.org/jira/browse/MESOS-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621389#comment-14621389 ] Till Toenshoff commented on MESOS-3026: --- commit dab0977d2c9649fd9a7235c82cfa5d944ca32214 Author: Till Toenshoff toensh...@me.com Date: Fri Jul 10 00:13:06 2015 +0200 Reverted commit for HTTP caching of static assets. This reverts commit d0300e1a47d1ba5d6714957fc258ab125fd53ed1. We identified several issues in this implementation and the most important one is described by MESOS-3026. ProcessTest.Cache fails and hangs - Key: MESOS-3026 URL: https://issues.apache.org/jira/browse/MESOS-3026 Project: Mesos Issue Type: Bug Components: libprocess Environment: ubuntu 15.04/ ubuntu 14.04.2 clang-3.6 / gcc 4.8.2 Reporter: Joris Van Remoortere Assignee: Alexander Rojas Priority: Blocker Labels: libprocess, tests {code} [ RUN ] ProcessTest.Cache ../../../3rdparty/libprocess/src/tests/process_tests.cpp:1726: Failure Value of: response.get().status Actual: 200 OK Expected: 304 Not Modified [ FAILED ] ProcessTest.Cache (1 ms) {code} The tests then finish running, but the gtest framework fails to terminate and uses 100% CPU. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3024) HTTP endpoint authN is enabled merely by specifying --credentials
[ https://issues.apache.org/jira/browse/MESOS-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621487#comment-14621487 ] Till Toenshoff commented on MESOS-3024: --- +1 for undesired coupling of master-provided credentials with authentication activation. Let's keep in mind that e.g. ticket-based authentication mechanisms do not require credentials. +1 for getting rid of the authentication requirement activation via the authenticate and authenticate_slaves flags and instead using the ACLs. HTTP endpoint authN is enabled merely by specifying --credentials - Key: MESOS-3024 URL: https://issues.apache.org/jira/browse/MESOS-3024 Project: Mesos Issue Type: Bug Components: master, security Reporter: Adam B Labels: authentication, http, mesosphere If I set `--credentials` on the master, framework and slave authentication are allowed, but not required. On the other hand, http authentication is now required for authenticated endpoints (currently only `/shutdown`). That means that I cannot enable framework or slave authentication without also enabling http endpoint authentication. This is undesirable. Framework and slave authentication have separate flags (`--authenticate` and `--authenticate_slaves`) to require authentication for each. It would be great if there were also such a flag for HTTP endpoint authentication. Or maybe we get rid of these flags altogether and rely on ACLs to determine which unauthenticated principals are even allowed to authenticate for each endpoint/action. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
[ https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009819#comment-15009819 ] Till Toenshoff commented on MESOS-3937: --- I have that test also failing (100%) on a vmware fusion box; exact same OS, compiler and docker configuration as tested by Bernd -- but not the same image / machine. > Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails. > --- > > Key: MESOS-3937 > URL: https://issues.apache.org/jira/browse/MESOS-3937 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 0.26.0 > Environment: Ubuntu 14.04, gcc 4.8.4, Docker version 1.6.2 >Reporter: Bernd Mathiske > > {noformat} > ../configure > make check > sudo ./bin/mesos-tests.sh > --gtest_filter="DockerContainerizerTest.ROOT_DOCKER_Launch_Executor" --verbose > {noformat} > {noformat} > [==] Running 1 test from 1 test case. > [--] Global test environment set-up. > [--] 1 test from DockerContainerizerTest > I1117 15:08:09.265943 26380 leveldb.cpp:176] Opened db in 3.199666ms > I1117 15:08:09.267761 26380 leveldb.cpp:183] Compacted db in 1.684873ms > I1117 15:08:09.267902 26380 leveldb.cpp:198] Created db iterator in 58313ns > I1117 15:08:09.267966 26380 leveldb.cpp:204] Seeked to beginning of db in > 4927ns > I1117 15:08:09.267997 26380 leveldb.cpp:273] Iterated through 0 keys in the > db in 1605ns > I1117 15:08:09.268156 26380 replica.cpp:780] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I1117 15:08:09.270148 26396 recover.cpp:449] Starting replica recovery > I1117 15:08:09.272105 26396 recover.cpp:475] Replica is in EMPTY status > I1117 15:08:09.275640 26396 replica.cpp:676] Replica in EMPTY status received > a broadcasted recover request from (4)@10.0.2.15:50088 > I1117 15:08:09.276578 26399 recover.cpp:195] Received a recover response from > a replica in EMPTY status > I1117 15:08:09.277600 26397 recover.cpp:566] Updating replica status to > STARTING > I1117 15:08:09.279613 
26396 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 1.016098ms > I1117 15:08:09.279731 26396 replica.cpp:323] Persisted replica status to > STARTING > I1117 15:08:09.280306 26399 recover.cpp:475] Replica is in STARTING status > I1117 15:08:09.282181 26400 replica.cpp:676] Replica in STARTING status > received a broadcasted recover request from (5)@10.0.2.15:50088 > I1117 15:08:09.282552 26400 master.cpp:367] Master > 59c600f1-92ff-4926-9c84-073d9b81f68a (vagrant-ubuntu-trusty-64) started on > 10.0.2.15:50088 > I1117 15:08:09.283021 26400 master.cpp:369] Flags at startup: --acls="" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/40AlT8/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="25secs" --registry_strict="true" > --root_submissions="true" --slave_ping_timeout="15secs" > --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/40AlT8/master" > --zk_session_timeout="10secs" > I1117 15:08:09.283920 26400 master.cpp:414] Master only allowing > authenticated frameworks to register > I1117 15:08:09.283972 26400 master.cpp:419] Master only allowing > authenticated slaves to register > I1117 15:08:09.284032 26400 credentials.hpp:37] Loading credentials for > authentication from '/tmp/40AlT8/credentials' > I1117 15:08:09.282944 26401 recover.cpp:195] Received a recover response from > a replica in STARTING status > I1117 15:08:09.284639 26401 recover.cpp:566] Updating replica status to VOTING > I1117 15:08:09.285539 26400 
master.cpp:458] Using default 'crammd5' > authenticator > I1117 15:08:09.285995 26401 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 1.075466ms > I1117 15:08:09.286062 26401 replica.cpp:323] Persisted replica status to > VOTING > I1117 15:08:09.286200 26401 recover.cpp:580] Successfully joined the Paxos > group > I1117 15:08:09.286471 26401 recover.cpp:464] Recover process terminated > I1117 15:08:09.287303 26400 authenticator.cpp:520] Initializing server SASL > I1117 15:08:09.289371 26400 master.cpp:495] Authorization enabled > I1117 15:08:09.296018 26399 master.cpp:1606] The newly elected leader is > master@10.0.2.15:50088 with id 59c600f1-92ff-4926-9c84-073d9b81f68a > I1117 15:08:09.296115 26399 master.cpp:1619] Elected
[jira] [Commented] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
[ https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011684#comment-15011684 ] Till Toenshoff commented on MESOS-3937: --- Yes, that one. Sorry for the confusion.
[jira] [Comment Edited] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
[ https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011644#comment-15011644 ] Till Toenshoff edited comment on MESOS-3937 at 11/18/15 6:58 PM: - Tim had a great hint and that fixed the problem for me; the docker executor image was outdated. A manual: {noformat}docker pull tnachen/test-executor{noformat} fixed the issue for me. Seems the reason for my problems was outdated proto code within the image.
[jira] [Commented] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
[ https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011644#comment-15011644 ] Till Toenshoff commented on MESOS-3937: --- Tim had a great hint and that fixed the problem for me; the docker executor image was outdated. A manual: {noformat}docker pull tnachen/docker-executor{noformat} fixed the issue for me. Seems the reason for my problems was outdated proto code within the image.
[jira] [Commented] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
[ https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009945#comment-15009945 ] Till Toenshoff commented on MESOS-3937: --- Are you saying that the lack of memory limitation on the container side would fail this test?
[jira] [Updated] (MESOS-3316) provisioner_backend_tests.cpp breaks the build on OSX
[ https://issues.apache.org/jira/browse/MESOS-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3316: -- Assignee: Yan Xu provisioner_backend_tests.cpp breaks the build on OSX - Key: MESOS-3316 URL: https://issues.apache.org/jira/browse/MESOS-3316 Project: Mesos Issue Type: Bug Reporter: Alexander Rojas Assignee: Yan Xu Priority: Blocker Labels: build-failure The test file makes an include of {{linux/fs.hpp}} which in turn includes {{mntent.h}} which is only available in linux. Building in OSX leads to: {noformat} g++ -DPACKAGE_NAME=\mesos\ -DPACKAGE_TARNAME=\mesos\ -DPACKAGE_VERSION=\0.25.0\ -DPACKAGE_STRING=\mesos\ 0.25.0\ -DPACKAGE_BUGREPORT=\\ -DPACKAGE_URL=\\ -DPACKAGE=\mesos\ -DVERSION=\0.25.0\ -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\.libs/\ -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -I. 
-I../../src -Wall -Werror -DLIBDIR=\/usr/local/lib\ -DPKGLIBEXECDIR=\/usr/local/libexec/mesos\ -DPKGDATADIR=\/usr/local/share/mesos\ -I../../include -I../../3rdparty/libprocess/include -I../../3rdparty/libprocess/3rdparty/stout/include -I../include -I../include/mesos -I../3rdparty/libprocess/3rdparty/boost-1.53.0 -I../3rdparty/libprocess/3rdparty/picojson-4f93734 -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include -I../3rdparty/zookeeper-3.4.5/src/c/generated -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src -DSOURCE_DIR=\/Users/alexander/Documents/workspace/pmesos/build/..\ -DBUILD_DIR=\/Users/alexander/Documents/workspace/pmesos/build\ -I../3rdparty/libprocess/3rdparty/gmock-1.7.0/gtest/include -I../3rdparty/libprocess/3rdparty/gmock-1.7.0/include -I/usr/local/opt/openssl/include -I/usr/local/opt/libevent/include -I/usr/local/opt/subversion/include/subversion-1 -I/usr/include/apr-1 -I/usr/include/apr-1.0 -D_THREAD_SAFE -pthread -g -O0 -std=c++11 -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -MT tests/containerizer/mesos_tests-provisioner_backend_tests.o -MD -MP -MF tests/containerizer/.deps/mesos_tests-provisioner_backend_tests.Tpo -c -o tests/containerizer/mesos_tests-provisioner_backend_tests.o `test -f 'tests/containerizer/provisioner_backend_tests.cpp' || echo '../../src/'`tests/containerizer/provisioner_backend_tests.cpp make[3]: Nothing to be done for `../../src/tests/balloon_framework_test.sh'. make[3]: Nothing to be done for `../../src/tests/event_call_framework_test.sh'. make[3]: Nothing to be done for `../../src/tests/java_exception_test.sh'. make[3]: Nothing to be done for `../../src/tests/java_framework_test.sh'. make[3]: Nothing to be done for `../../src/tests/java_log_test.sh'. make[3]: Nothing to be done for `../../src/tests/no_executor_framework_test.sh'. 
make[3]: Nothing to be done for `../../src/tests/persistent_volume_framework_test.sh'. make[3]: Nothing to be done for `../../src/tests/python_framework_test.sh'. make[3]: Nothing to be done for `../../src/tests/test_framework_test.sh'. In file included from ../../src/tests/containerizer/provisioner_backend_tests.cpp:28: ../../src/linux/fs.hpp:23:10: fatal error: 'mntent.h' file not found #include <mntent.h> ^ 1 error generated. make[3]: *** [tests/containerizer/mesos_tests-provisioner_backend_tests.o] Error 1 make[2]: *** [check-am] Error 2 make[1]: *** [check] Error 2 make: *** [check-recursive] Error 1 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1194) protobuf-JSON rendering doesn't validate
[ https://issues.apache.org/jira/browse/MESOS-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734530#comment-14734530 ] Till Toenshoff commented on MESOS-1194: --- [~Akanksha08] ping :) > protobuf-JSON rendering doesn't validate > --- > > Key: MESOS-1194 > URL: https://issues.apache.org/jira/browse/MESOS-1194 > Project: Mesos > Issue Type: Bug > Components: stout >Affects Versions: 0.19.0 >Reporter: Till Toenshoff >Assignee: Akanksha Agrawal >Priority: Minor > Labels: json, newbie, protobuf, stout > > When using JSON::Protobuf(Message&), the supplied protobuf is not checked for > being properly initialized, hence e.g. required fields could be missing. > It would be desirable to have a feedback mechanism in place for this > constructor - maybe this would do: > {noformat} > if (!message.IsInitialized()) { > std::cerr << "Protobuf not initialized: " << > message.InitializationErrorString() << std::endl; > abort(); > } > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3370) Deprecate the external containerizer
[ https://issues.apache.org/jira/browse/MESOS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740945#comment-14740945 ] Till Toenshoff commented on MESOS-3370: --- Yes, a vote would be the right approach here, but I am also +1 (with a small tear in my eyes ;) ). > Deprecate the external containerizer > > > Key: MESOS-3370 > URL: https://issues.apache.org/jira/browse/MESOS-3370 > Project: Mesos > Issue Type: Task >Reporter: Niklas Quarfot Nielsen > > To our knowledge, no one is using the external containerizer and we could > clean up code paths in the slave and containerizer interface (the dual > launch() signatures) > In a deprecation cycle, we can move this code into a module (dependent on > containerizer modules landing) and from there, move it into its own repo -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4065) slave FD for ZK tcp connection leaked to executor process
[ https://issues.apache.org/jira/browse/MESOS-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046570#comment-15046570 ] Till Toenshoff commented on MESOS-4065: --- From your results, this conclusion seems sensible to me. We should actually add a bug report in the ZooKeeper JIRA so it can be properly handled upstream (https://issues.apache.org/jira/browse/ZOOKEEPER/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel). Could you please take care of that [~jdef]? > slave FD for ZK tcp connection leaked to executor process > - > > Key: MESOS-4065 > URL: https://issues.apache.org/jira/browse/MESOS-4065 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.24.1, 0.25.0 >Reporter: James DeFelice > Labels: mesosphere, security > > {code} > core@ip-10-0-0-45 ~ $ ps auxwww|grep -e etcd > root 1432 99.3 0.0 202420 12928 ?Rsl 21:32 13:51 > ./etcd-mesos-executor -log_dir=./ > root 1450 0.4 0.1 38332 28752 ?Sl 21:32 0:03 ./etcd > --data-dir=etcd_data --name=etcd-1449178273 > --listen-peer-urls=http://10.0.0.45:1025 > --initial-advertise-peer-urls=http://10.0.0.45:1025 > --listen-client-urls=http://10.0.0.45:1026 > --advertise-client-urls=http://10.0.0.45:1026 > --initial-cluster=etcd-1449178273=http://10.0.0.45:1025,etcd-1449178271=http://10.0.2.95:1025,etcd-1449178272=http://10.0.2.216:1025 > --initial-cluster-state=existing > core 1651 0.0 0.0 6740 928 pts/0S+ 21:46 0:00 grep > --colour=auto -e etcd > core@ip-10-0-0-45 ~ $ sudo lsof -p 1432|grep -e 2181 > etcd-meso 1432 root 10u IPv4 21973 0t0TCP > ip-10-0-0-45.us-west-2.compute.internal:54016->ip-10-0-5-206.us-west-2.compute.internal:2181 > (ESTABLISHED) > core@ip-10-0-0-45 ~ $ ps auxwww|grep -e slave > root 1124 0.2 0.1 900496 25736 ?Ssl 21:11 0:04 > /opt/mesosphere/packages/mesos--52cbecde74638029c3ba0ac5e5ab81df8debf0fa/sbin/mesos-slave > core 1658 0.0 0.0 6740 832 pts/0S+ 21:46 0:00 grep > --colour=auto -e slave > core@ip-10-0-0-45 ~ $ sudo lsof -p 1124|grep -e 2181 > 
mesos-sla 1124 root 10u IPv4 21973 0t0TCP > ip-10-0-0-45.us-west-2.compute.internal:54016->ip-10-0-5-206.us-west-2.compute.internal:2181 > (ESTABLISHED) > {code} > I only tested against mesos 0.24.1 and 0.25.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4065) slave FD for ZK tcp connection leaked to executor process
[ https://issues.apache.org/jira/browse/MESOS-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046588#comment-15046588 ] Till Toenshoff commented on MESOS-4065: --- Some tool that has been rather useful for debugging such issues within Mesos; https://github.com/tillt/mesos/commit/d6982ece26121c599426e6b5c573e8d8afeff837 > slave FD for ZK tcp connection leaked to executor process > - > > Key: MESOS-4065 > URL: https://issues.apache.org/jira/browse/MESOS-4065 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.24.1, 0.25.0 >Reporter: James DeFelice > Labels: mesosphere, security > > {code} > core@ip-10-0-0-45 ~ $ ps auxwww|grep -e etcd > root 1432 99.3 0.0 202420 12928 ?Rsl 21:32 13:51 > ./etcd-mesos-executor -log_dir=./ > root 1450 0.4 0.1 38332 28752 ?Sl 21:32 0:03 ./etcd > --data-dir=etcd_data --name=etcd-1449178273 > --listen-peer-urls=http://10.0.0.45:1025 > --initial-advertise-peer-urls=http://10.0.0.45:1025 > --listen-client-urls=http://10.0.0.45:1026 > --advertise-client-urls=http://10.0.0.45:1026 > --initial-cluster=etcd-1449178273=http://10.0.0.45:1025,etcd-1449178271=http://10.0.2.95:1025,etcd-1449178272=http://10.0.2.216:1025 > --initial-cluster-state=existing > core 1651 0.0 0.0 6740 928 pts/0S+ 21:46 0:00 grep > --colour=auto -e etcd > core@ip-10-0-0-45 ~ $ sudo lsof -p 1432|grep -e 2181 > etcd-meso 1432 root 10u IPv4 21973 0t0TCP > ip-10-0-0-45.us-west-2.compute.internal:54016->ip-10-0-5-206.us-west-2.compute.internal:2181 > (ESTABLISHED) > core@ip-10-0-0-45 ~ $ ps auxwww|grep -e slave > root 1124 0.2 0.1 900496 25736 ?Ssl 21:11 0:04 > /opt/mesosphere/packages/mesos--52cbecde74638029c3ba0ac5e5ab81df8debf0fa/sbin/mesos-slave > core 1658 0.0 0.0 6740 832 pts/0S+ 21:46 0:00 grep > --colour=auto -e slave > core@ip-10-0-0-45 ~ $ sudo lsof -p 1124|grep -e 2181 > mesos-sla 1124 root 10u IPv4 21973 0t0TCP > ip-10-0-0-45.us-west-2.compute.internal:54016->ip-10-0-5-206.us-west-2.compute.internal:2181 > (ESTABLISHED) > {code} > I 
only tested against mesos 0.24.1 and 0.25.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-4025: -- Component/s: test > SlaveRecoveryTest/0.GCExecutor is flaky. > > > Key: MESOS-4025 > URL: https://issues.apache.org/jira/browse/MESOS-4025 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.26.0 >Reporter: Till Toenshoff > Labels: flaky, flaky-test, test > > Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based > on 0.26.0-rc1. > Testsuite was run as root. > {noformat} > sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1 > {noformat} > {noformat} > [ RUN ] SlaveRecoveryTest/0.GCExecutor > I1130 16:49:16.336833 1032 exec.cpp:136] Version: 0.26.0 > I1130 16:49:16.345212 1049 exec.cpp:210] Executor registered on slave > dde9fd4e-b016-4a99-9081-b047e9df9afa-S0 > Registered executor on ubuntu14 > Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114 > sh -c 'sleep 1000' > Forked command at 1057 > ../../src/tests/mesos.cpp:779: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave': > Device or resource busy > *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are > using GNU date *** > PC: @ 0x1443e9a testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; > stack trace: *** > @ 0x7f1be92b80b7 os::Linux::chained_handler() > @ 0x7f1be92bc219 JVM_handle_linux_signal > @ 0x7f1bf7bbc340 (unknown) > @ 0x1443e9a testing::UnitTest::AddTestPartResult() > @ 0x1438b99 testing::internal::AssertHelper::operator=() > @ 0xf0b3bb > mesos::internal::tests::ContainerizerTest<>::TearDown() > @ 0x1461882 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145c6f8 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x143de4a testing::Test::Run() > @ 0x143e584 
testing::TestInfo::Run() > @ 0x143ebca testing::TestCase::Run() > @ 0x1445312 testing::internal::UnitTestImpl::RunAllTests() > @ 0x14624a7 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145d26e > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x14440ae testing::UnitTest::Run() > @ 0xd15cd4 RUN_ALL_TESTS() > @ 0xd158c1 main > @ 0x7f1bf7808ec5 (unknown) > @ 0x913009 (unknown) > {noformat} > My Vagrantfile generator; > {noformat} > #!/usr/bin/env bash > cat << EOF > Vagrantfile > # -*- mode: ruby -*-" > > # vi: set ft=ruby : > Vagrant.configure(2) do |config| > # Disable shared folder to prevent certain kernel module dependencies. > config.vm.synced_folder ".", "/vagrant", disabled: true > config.vm.box = "bento/ubuntu-14.04" > config.vm.hostname = "${PLATFORM_NAME}" > config.vm.provider "virtualbox" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > vb.customize ["modifyvm", :id, "--nictype1", "virtio"] > vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"] > vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"] > end > config.vm.provider "vmware_fusion" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > end > config.vm.provision "file", source: "../test.sh", destination: "~/test.sh" > config.vm.provision "shell", inline: <<-SHELL > sudo apt-get update > sudo apt-get -y install openjdk-7-jdk autoconf libtool > sudo apt-get -y install build-essential python-dev python-boto \ > libcurl4-nss-dev libsasl2-dev maven \ > libapr1-dev libsvn-dev libssl-dev libevent-dev > sudo apt-get -y install git > sudo wget -qO- https://get.docker.com/ | sh > SHELL > end > EOF > {noformat} > The problem is kicking in frequently in my tests - I'ld say > 10% but less > than 50%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-4025: -- Labels: flaky flaky-test test (was: test) > SlaveRecoveryTest/0.GCExecutor is flaky. > > > Key: MESOS-4025 > URL: https://issues.apache.org/jira/browse/MESOS-4025 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.26.0 >Reporter: Till Toenshoff > Labels: flaky, flaky-test, test > > Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based > on 0.26.0-rc1. > Testsuite was run as root. > {noformat} > sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1 > {noformat} > {noformat} > [ RUN ] SlaveRecoveryTest/0.GCExecutor > I1130 16:49:16.336833 1032 exec.cpp:136] Version: 0.26.0 > I1130 16:49:16.345212 1049 exec.cpp:210] Executor registered on slave > dde9fd4e-b016-4a99-9081-b047e9df9afa-S0 > Registered executor on ubuntu14 > Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114 > sh -c 'sleep 1000' > Forked command at 1057 > ../../src/tests/mesos.cpp:779: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave': > Device or resource busy > *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are > using GNU date *** > PC: @ 0x1443e9a testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; > stack trace: *** > @ 0x7f1be92b80b7 os::Linux::chained_handler() > @ 0x7f1be92bc219 JVM_handle_linux_signal > @ 0x7f1bf7bbc340 (unknown) > @ 0x1443e9a testing::UnitTest::AddTestPartResult() > @ 0x1438b99 testing::internal::AssertHelper::operator=() > @ 0xf0b3bb > mesos::internal::tests::ContainerizerTest<>::TearDown() > @ 0x1461882 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145c6f8 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x143de4a 
testing::Test::Run() > @ 0x143e584 testing::TestInfo::Run() > @ 0x143ebca testing::TestCase::Run() > @ 0x1445312 testing::internal::UnitTestImpl::RunAllTests() > @ 0x14624a7 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x145d26e > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x14440ae testing::UnitTest::Run() > @ 0xd15cd4 RUN_ALL_TESTS() > @ 0xd158c1 main > @ 0x7f1bf7808ec5 (unknown) > @ 0x913009 (unknown) > {noformat} > My Vagrantfile generator; > {noformat} > #!/usr/bin/env bash > cat << EOF > Vagrantfile > # -*- mode: ruby -*-" > > # vi: set ft=ruby : > Vagrant.configure(2) do |config| > # Disable shared folder to prevent certain kernel module dependencies. > config.vm.synced_folder ".", "/vagrant", disabled: true > config.vm.box = "bento/ubuntu-14.04" > config.vm.hostname = "${PLATFORM_NAME}" > config.vm.provider "virtualbox" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > vb.customize ["modifyvm", :id, "--nictype1", "virtio"] > vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"] > vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"] > end > config.vm.provider "vmware_fusion" do |vb| > vb.memory = ${VAGRANT_MEM} > vb.cpus = ${VAGRANT_CPUS} > end > config.vm.provision "file", source: "../test.sh", destination: "~/test.sh" > config.vm.provision "shell", inline: <<-SHELL > sudo apt-get update > sudo apt-get -y install openjdk-7-jdk autoconf libtool > sudo apt-get -y install build-essential python-dev python-boto \ > libcurl4-nss-dev libsasl2-dev maven \ > libapr1-dev libsvn-dev libssl-dev libevent-dev > sudo apt-get -y install git > sudo wget -qO- https://get.docker.com/ | sh > SHELL > end > EOF > {noformat} > The problem is kicking in frequently in my tests - I'ld say > 10% but less > than 50%. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3586) MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and CGROUPS_ROOT_SlaveRecovery are flaky
[ https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3586: -- Labels: flaky flaky-test (was: ) > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and > CGROUPS_ROOT_SlaveRecovery are flaky > > > Key: MESOS-3586 > URL: https://issues.apache.org/jira/browse/MESOS-3586 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.24.0, 0.26.0 > Environment: Ubuntu 14.04, 3.13.0-32 generic > Debian 8, gcc 4.9.2 >Reporter: Miguel Bernadin > Labels: flaky, flaky-test > > I am install Mesos 0.24.0 on 4 servers which have very similar hardware and > software configurations. > After performing ../configure, make, and make check some servers have > completed successfully and other failed on test [ RUN ] > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics. > Is there something I should check in this test? > PERFORMED MAKE CHECK NODE-001 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0 > I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave > 20151005-143735-2393768202-35106-27900-S0 > Registered executor on svdidac038.techlabs.accenture.com > Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0 > Forked command at 38510 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > PERFORMED MAKE CHECK NODE-002 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0 > I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave > 20151005-143857-2360213770-50427-26325-S0 > Registered executor on svdidac039.techlabs.accenture.com > Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > Forked command at 37028 > ../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure > Expected: (usage.get().mem_medium_pressure_counter()) >= > 
(usage.get().mem_critical_pressure_counter()), actual: 5 vs 6 > 2015-10-05 > 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server > refused to accept the client > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3586) MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and CGROUPS_ROOT_SlaveRecovery are flaky
[ https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3586: -- Component/s: test > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics and > CGROUPS_ROOT_SlaveRecovery are flaky > > > Key: MESOS-3586 > URL: https://issues.apache.org/jira/browse/MESOS-3586 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.24.0, 0.26.0 > Environment: Ubuntu 14.04, 3.13.0-32 generic > Debian 8, gcc 4.9.2 >Reporter: Miguel Bernadin > Labels: flaky, flaky-test > > I am install Mesos 0.24.0 on 4 servers which have very similar hardware and > software configurations. > After performing ../configure, make, and make check some servers have > completed successfully and other failed on test [ RUN ] > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics. > Is there something I should check in this test? > PERFORMED MAKE CHECK NODE-001 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0 > I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave > 20151005-143735-2393768202-35106-27900-S0 > Registered executor on svdidac038.techlabs.accenture.com > Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0 > Forked command at 38510 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > PERFORMED MAKE CHECK NODE-002 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0 > I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave > 20151005-143857-2360213770-50427-26325-S0 > Registered executor on svdidac039.techlabs.accenture.com > Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > Forked command at 37028 > ../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure > Expected: (usage.get().mem_medium_pressure_counter()) >= > 
(usage.get().mem_critical_pressure_counter()), actual: 5 vs 6 > 2015-10-05 > 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server > refused to accept the client > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3608) optionally install test binaries
[ https://issues.apache.org/jira/browse/MESOS-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039700#comment-15039700 ] Till Toenshoff commented on MESOS-3608: --- Due to the 0.26.0 release process, things have been much delayed already - sorry for that. I will not be able to review this until Monday. > optionally install test binaries > > > Key: MESOS-3608 > URL: https://issues.apache.org/jira/browse/MESOS-3608 > Project: Mesos > Issue Type: Improvement > Components: build, test >Reporter: James Peach >Assignee: James Peach >Priority: Minor > > Many of the tests in Mesos could be described as integration tests, since > they have external dependencies on kernel features, installed tools, > permissions, etc. I'd like to be able to generate a {{mesos-tests}} RPM along > with my {{mesos}} RPM so that I can run the same tests in different > deployment environments. > I propose a new configuration option named {{--enable-test-tools}} that will > install the tests into {{libexec/mesos/tests}}. I'll also need to make some > minor changes to tests so that helper tools can be found in this location as > well as in the build directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4065) slave FD for ZK tcp connection leaked to executor process
[ https://issues.apache.org/jira/browse/MESOS-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15041767#comment-15041767 ] Till Toenshoff commented on MESOS-4065: --- What we see there is the fact that two processes (slave + executor) both use the same fd (10u) which likely is a bug. > slave FD for ZK tcp connection leaked to executor process > - > > Key: MESOS-4065 > URL: https://issues.apache.org/jira/browse/MESOS-4065 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.24.1, 0.25.0 >Reporter: James DeFelice > Labels: mesosphere, security > > {code} > core@ip-10-0-0-45 ~ $ ps auxwww|grep -e etcd > root 1432 99.3 0.0 202420 12928 ?Rsl 21:32 13:51 > ./etcd-mesos-executor -log_dir=./ > root 1450 0.4 0.1 38332 28752 ?Sl 21:32 0:03 ./etcd > --data-dir=etcd_data --name=etcd-1449178273 > --listen-peer-urls=http://10.0.0.45:1025 > --initial-advertise-peer-urls=http://10.0.0.45:1025 > --listen-client-urls=http://10.0.0.45:1026 > --advertise-client-urls=http://10.0.0.45:1026 > --initial-cluster=etcd-1449178273=http://10.0.0.45:1025,etcd-1449178271=http://10.0.2.95:1025,etcd-1449178272=http://10.0.2.216:1025 > --initial-cluster-state=existing > core 1651 0.0 0.0 6740 928 pts/0S+ 21:46 0:00 grep > --colour=auto -e etcd > core@ip-10-0-0-45 ~ $ sudo lsof -p 1432|grep -e 2181 > etcd-meso 1432 root 10u IPv4 21973 0t0TCP > ip-10-0-0-45.us-west-2.compute.internal:54016->ip-10-0-5-206.us-west-2.compute.internal:2181 > (ESTABLISHED) > core@ip-10-0-0-45 ~ $ ps auxwww|grep -e slave > root 1124 0.2 0.1 900496 25736 ?Ssl 21:11 0:04 > /opt/mesosphere/packages/mesos--52cbecde74638029c3ba0ac5e5ab81df8debf0fa/sbin/mesos-slave > core 1658 0.0 0.0 6740 832 pts/0S+ 21:46 0:00 grep > --colour=auto -e slave > core@ip-10-0-0-45 ~ $ sudo lsof -p 1124|grep -e 2181 > mesos-sla 1124 root 10u IPv4 21973 0t0TCP > ip-10-0-0-45.us-west-2.compute.internal:54016->ip-10-0-5-206.us-west-2.compute.internal:2181 > (ESTABLISHED) > {code} > I only tested against mesos 0.24.1 and 
0.25.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4045) NumifyTest.HexNumberTest fails
[ https://issues.apache.org/jira/browse/MESOS-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037918#comment-15037918 ] Till Toenshoff commented on MESOS-4045: --- Appears to be a gcc vs. clang issue -- or maybe even a libstdc++ vs. libc++ problem. Building on OSX using gcc works fine, but bugs out using clang. > NumifyTest.HexNumberTest fails > -- > > Key: MESOS-4045 > URL: https://issues.apache.org/jira/browse/MESOS-4045 > Project: Mesos > Issue Type: Bug > Components: stout > Environment: Mac OS X 10.11.1 >Reporter: Michael Park > > {noformat} > [==] Running 1 test from 1 test case. > [--] Global test environment set-up. > [--] 1 test from NumifyTest > [ RUN ] NumifyTest.HexNumberTest > ../../../../3rdparty/libprocess/3rdparty/stout/tests/numify_tests.cpp:44: > Failure > Value of: numify("0x10.9").isError() > Actual: false > Expected: true > [ FAILED ] NumifyTest.HexNumberTest (0 ms) > [--] 1 test from NumifyTest (0 ms total) > [--] Global test environment tear-down > [==] 1 test from 1 test case ran. (0 ms total) > [ PASSED ] 0 tests. > [ FAILED ] 1 test, listed below: > [ FAILED ] NumifyTest.HexNumberTest > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4061) Flaky tests: docker containerizer tests on debian 8 VM
[ https://issues.apache.org/jira/browse/MESOS-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038556#comment-15038556 ] Till Toenshoff commented on MESOS-4061: --- Just for the fun of it - same test on VMware Fusion: {noformat} real0m5.887s user0m0.028s sys 0m0.020s {noformat} > Flaky tests: docker containerizer tests on debian 8 VM > -- > > Key: MESOS-4061 > URL: https://issues.apache.org/jira/browse/MESOS-4061 > Project: Mesos > Issue Type: Bug > Environment: debian 8, vagrant, virtual box >Reporter: Jojy Varghese > > Following tests were failing for 0.26 rc3: > * DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping > * DockerContainerizerTest.ROOT_DOCKER_Recover > * DockerContainerizerTest.ROOT_DOCKER_SlaveRecoveryTaskContainer > * DockerContainerizerTest.ROOT_DOCKER_Launch_Executor > * DockerContainerizerTest.ROOT_DOCKER_Launch > * DockerContainerizerTest.ROOT_DOCKER_Usage > * DockerContainerizerTest.ROOT_DOCKER_Update > * DockerContainerizerTest.ROOT_DOCKER_SkipRecoverNonDocker > Note that this is not a comprehensive list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4061) Flaky tests: docker containerizer tests on debian 8 VM
[ https://issues.apache.org/jira/browse/MESOS-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038584#comment-15038584 ] Till Toenshoff commented on MESOS-4061: --- The more interesting test here actually would be re-running the same command (with a new name), as that would cut out the download of that busybox image. > Flaky tests: docker containerizer tests on debian 8 VM > -- > > Key: MESOS-4061 > URL: https://issues.apache.org/jira/browse/MESOS-4061 > Project: Mesos > Issue Type: Bug > Environment: debian 8, vagrant, virtual box >Reporter: Jojy Varghese > > Following tests were failing for 0.26 rc3: > * DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping > * DockerContainerizerTest.ROOT_DOCKER_Recover > * DockerContainerizerTest.ROOT_DOCKER_SlaveRecoveryTaskContainer > * DockerContainerizerTest.ROOT_DOCKER_Launch_Executor > * DockerContainerizerTest.ROOT_DOCKER_Launch > * DockerContainerizerTest.ROOT_DOCKER_Usage > * DockerContainerizerTest.ROOT_DOCKER_Update > * DockerContainerizerTest.ROOT_DOCKER_SkipRecoverNonDocker > Note that this is not a comprehensive list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3608) optionally install test binaries
[ https://issues.apache.org/jira/browse/MESOS-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3608: -- Labels: mesosphere (was: ) > optionally install test binaries > > > Key: MESOS-3608 > URL: https://issues.apache.org/jira/browse/MESOS-3608 > Project: Mesos > Issue Type: Improvement > Components: build, test >Reporter: James Peach >Assignee: James Peach >Priority: Minor > Labels: mesosphere > > Many of the tests in Mesos could be described as integration tests, since > they have external dependencies on kernel features, installed tools, > permissions, etc. I'd like to be able to generate a {{mesos-tests}} RPM along > with my {{mesos}} RPM so that I can run the same tests in different > deployment environments. > I propose a new configuration option named {{--enable-test-tools}} that will > install the tests into {{libexec/mesos/tests}}. I'll also need to make some > minor changes to tests so that helper tools can be found in this location as > well as in the build directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4012) Update documentation to reflect the addition of installable tests.
[ https://issues.apache.org/jira/browse/MESOS-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-4012: -- Labels: mesosphere (was: ) > Update documentation to reflect the addition of installable tests. > > > Key: MESOS-4012 > URL: https://issues.apache.org/jira/browse/MESOS-4012 > Project: Mesos > Issue Type: Documentation >Reporter: Till Toenshoff > Labels: mesosphere > > We may want to add the needed steps for administrators to create and run the > test-suite on anything other than the build machine. > One possible location could be {{docs/gettings-started.md}} for validating > the pre-requisites as described in that document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4091) Mesos protobuf message definition ContainerInfo skipped an index.
Till Toenshoff created MESOS-4091: - Summary: Mesos protobuf message definition ContainerInfo skipped an index. Key: MESOS-4091 URL: https://issues.apache.org/jira/browse/MESOS-4091 Project: Mesos Issue Type: Bug Reporter: Till Toenshoff Looking at {{include/mesos/mesos.proto}}: {noformat} /** * Describes a container configuration and allows extensible * configurations for different container implementations. */ message ContainerInfo { // All container implementation types. enum Type { DOCKER = 1; MESOS = 2; } message DockerInfo { // The docker image that is going to be passed to the registry. required string image = 1; // Network options. enum Network { HOST = 1; BRIDGE = 2; NONE = 3; } optional Network network = 2 [default = HOST]; message PortMapping { required uint32 host_port = 1; required uint32 container_port = 2; // Protocol to expose as (ie: tcp, udp). optional string protocol = 3; } repeated PortMapping port_mappings = 3; optional bool privileged = 4 [default = false]; // Allowing arbitrary parameters to be passed to docker CLI. // Note that anything passed to this field is not guaranteed // to be supported moving forward, as we might move away from // the docker CLI. repeated Parameter parameters = 5; // With this flag set to true, the docker containerizer will // pull the docker image from the registry even if the image // is already downloaded on the slave. optional bool force_pull_image = 6; } message MesosInfo { optional Image image = 1; } required Type type = 1; repeated Volume volumes = 2; optional string hostname = 4; // Only one of the following *Info messages should be set to match // the type. optional DockerInfo docker = 3; optional MesosInfo mesos = 5; // A list of network requests. A framework can request multiple IP addresses // for the container. repeated NetworkInfo network_infos = 7; } {noformat} Seems we are missing index 6 here. A quick history check revealed no intention to remove a former field 6 - hence this appears to be a bug. 
Checked via: {noformat} $ git log -L 1500,1515:include/mesos/mesos.proto {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
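If tag 6 is meant to stay unused, protobuf's {{reserved}} declaration would document the gap and make protoc reject any accidental reuse of the number later. A hedged sketch of the idea (illustrative only, not a proposed patch to mesos.proto):

```protobuf
message ContainerInfo {
  // ... existing fields with tags 1-5 and 7, as quoted above ...

  // Tag 6 was skipped and never shipped with a field; reserving it
  // prevents a future field from silently reusing the number with a
  // different meaning (which would break wire compatibility).
  reserved 6;
}
```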
[jira] [Updated] (MESOS-4015) Expose task / executor health in master & slave state.json
[ https://issues.apache.org/jira/browse/MESOS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-4015: -- Fix Version/s: (was: 0.27.0) 0.26.0 > Expose task / executor health in master & slave state.json > -- > > Key: MESOS-4015 > URL: https://issues.apache.org/jira/browse/MESOS-4015 > Project: Mesos > Issue Type: Improvement >Affects Versions: 0.25.0 >Reporter: Sargun Dhillon >Assignee: Artem Harutyunyan >Priority: Trivial > Labels: mesosphere > Fix For: 0.26.0 > > > Right now, if I specify a healthcheck for a task, the only way to get to it > is via the Task Status updates that come to the framework. Unfortunately, > this information isn't exposed in the state.json either in the slave or > master. It'd be ideal to have that information to enable tools like Mesos-DNS > to be health-aware. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4106) The health checker may fail to inform the executor to kill an unhealthy task after max_consecutive_failures.
[ https://issues.apache.org/jira/browse/MESOS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Toenshoff updated MESOS-4106:
----------------------------------
    Fix Version/s:     (was: 0.27.0)
                   0.26.0

> The health checker may fail to inform the executor to kill an unhealthy task
> after max_consecutive_failures.
> ----------------------------------------------------------------------------
>
>                 Key: MESOS-4106
>                 URL: https://issues.apache.org/jira/browse/MESOS-4106
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.20.0, 0.20.1, 0.21.1, 0.21.2, 0.22.1, 0.22.2, 0.23.0,
> 0.23.1, 0.24.0, 0.24.1, 0.25.0
>            Reporter: Benjamin Mahler
>            Assignee: Benjamin Mahler
>            Priority: Blocker
>             Fix For: 0.26.0
>
>
> This was reported by [~tan] experimenting with health checks. Many tasks were
> launched with the following health check, taken from the container
> stdout/stderr:
> {code}
> Launching health check process: /usr/local/libexec/mesos/mesos-health-check
> --executor=(1)@127.0.0.1:39629
> --health_check_json={"command":{"shell":true,"value":"false"},"consecutive_failures":1,"delay_seconds":0.0,"grace_period_seconds":1.0,"interval_seconds":1.0,"timeout_seconds":1.0}
> --task_id=sleepy-2
> {code}
> This should have led to all tasks getting killed due to
> {{\-\-consecutive_failures}} being set; however, only some tasks get killed,
> while others remain running.
> It turns out that the health check binary does a {{send}} and promptly exits.
> Unfortunately, this may lead to a message drop since libprocess may not have
> sent this message over the socket by the time the process exits.
> We work around this in the command executor with a manual sleep, which has
> been around since the svn days. See
> [here|https://github.com/apache/mesos/blob/0.14.0/src/launcher/executor.cpp#L288-L290].
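The race described in MESOS-4106 is generic to any asynchronous messaging layer: {{send}} returns once the message is queued, not once it is on the wire, so exiting immediately afterwards can silently drop it. Below is a toy illustration (not Mesos or libprocess code, which is C++; all names here are hypothetical) of that pattern and of the sleep-before-exit workaround the ticket mentions.

```python
import queue
import threading
import time

class AsyncSender:
    """Mimics a messaging layer whose send() returns before delivery:
    messages are queued and delivered from a background daemon thread."""

    def __init__(self, delivered):
        self.outbox = queue.Queue()
        self.delivered = delivered  # stands in for the remote endpoint
        threading.Thread(target=self._pump, daemon=True).start()

    def send(self, msg):
        self.outbox.put(msg)  # returns immediately; delivery is asynchronous

    def _pump(self):
        while True:
            msg = self.outbox.get()
            time.sleep(0.05)  # simulated network latency
            self.delivered.append(msg)

delivered = []
sender = AsyncSender(delivered)
sender.send("TaskHealthStatus(kill_task=True)")

# If the process exited right here, the daemon thread would die with it and
# the message could be lost -- the bug in the health check binary. The
# workaround in the command executor is to wait briefly before exiting:
time.sleep(0.2)
assert delivered == ["TaskHealthStatus(kill_task=True)"]
```

A sleep is of course only a heuristic; a flush/linger primitive that blocks until the outbox is drained would close the race deterministically, which is why the ticket treats the sleep as a workaround rather than a fix.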
[jira] [Commented] (MESOS-3686) General cleanup of documentation
[ https://issues.apache.org/jira/browse/MESOS-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060691#comment-15060691 ]

Till Toenshoff commented on MESOS-3686:
---------------------------------------

This commit https://github.com/apache/mesos/commit/b29ec4f110483555a5e1a65ef25a7ecc13e31b7f is the source, it seems.

This is the fix: https://reviews.apache.org/r/41463/

> General cleanup of documentation
> --------------------------------
>
>                 Key: MESOS-3686
>                 URL: https://issues.apache.org/jira/browse/MESOS-3686
>             Project: Mesos
>          Issue Type: Documentation
>          Components: documentation, general
>            Reporter: Chris Elsmore
>              Labels: documentation
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Part of the MesosCon Europe 2015 Hackathon!
> The current documentation is inconsistent and could do with a clean-up:
> * File names use a mix of hyphens and underscores; some start with 'mesos',
> some not.
> * A general clean-up of broken links, markdown tables, etc.
[jira] [Commented] (MESOS-3686) General cleanup of documentation
[ https://issues.apache.org/jira/browse/MESOS-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060636#comment-15060636 ]

Till Toenshoff commented on MESOS-3686:
---------------------------------------

Seems we have duplicate documentation now:

https://github.com/apache/mesos/blob/master/docs/mesos-documentation-guide.md
https://github.com/apache/mesos/blob/master/docs/documentation-guide.md

> General cleanup of documentation
> --------------------------------
>
>                 Key: MESOS-3686
>                 URL: https://issues.apache.org/jira/browse/MESOS-3686
>             Project: Mesos
>          Issue Type: Documentation
>          Components: documentation, general
>            Reporter: Chris Elsmore
>              Labels: documentation
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Part of the MesosCon Europe 2015 Hackathon!
> The current documentation is inconsistent and could do with a clean-up:
> * File names use a mix of hyphens and underscores; some start with 'mesos',
> some not.
> * A general clean-up of broken links, markdown tables, etc.
[jira] [Assigned] (MESOS-3742) Site needs to get updated as it still lists MesosCon Europe as an upcoming event
[ https://issues.apache.org/jira/browse/MESOS-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Toenshoff reassigned MESOS-3742:
-------------------------------------

    Assignee: Till Toenshoff

> Site needs to get updated as it still lists MesosCon Europe as an upcoming
> event
> --------------------------------------------------------------------------
>
>                 Key: MESOS-3742
>                 URL: https://issues.apache.org/jira/browse/MESOS-3742
>             Project: Mesos
>          Issue Type: Bug
>          Components: project website
>            Reporter: Till Toenshoff
>            Assignee: Till Toenshoff
>
> The Apache website needs to be updated as it still lists MesosCon Europe as
> an upcoming event.
> Even the registration page
> (http://events.linuxfoundation.org/events/mesoscon-europe/attend/register)
> still seems to accept registrations, something we might want to get fixed
> upstream.
[jira] [Updated] (MESOS-3844) getting started documentation has flaws, corrections suggested (http://mesos.apache.org/gettingstarted/)
[ https://issues.apache.org/jira/browse/MESOS-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Toenshoff updated MESOS-3844:
----------------------------------
    Shepherd: Till Toenshoff  (was: Benjamin Hindman)

> getting started documentation has flaws, corrections suggested
> (http://mesos.apache.org/gettingstarted/)
> --------------------------------------------------------------
>
>                 Key: MESOS-3844
>                 URL: https://issues.apache.org/jira/browse/MESOS-3844
>             Project: Mesos
>          Issue Type: Documentation
>          Components: documentation, project website, test
>    Affects Versions: 0.25.0
>         Environment: CentOS 7 AWS Linux image: AWS EC2 MarketPlace CentOS 7
> (x86_64) with Updates HVM (a t2.medium instance)
>            Reporter: Manne Laukkanen
>            Assignee: Kevin Klues
>            Priority: Trivial
>              Labels: build, documentation, mesosphere
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The getting started documentation, while having great virtues, has room for
> improvement:
> 1) The ordering is illogical for this part:
> "$ wget http://www.apache.org/dist/mesos/0.25.0/mesos-0.25.0.tar.gz
> $ tar -zxf mesos-0.25.0.tar.gz"
> ...then, later:
> "# Install a few utility tools
> $ sudo yum install -y tar wget"
> ...obviously using tar and wget is not possible before installing them.
> 2) Although vi is fine for many, having the utility tools line read:
> sudo yum install -y tar wget nano
> might make editing e.g. the WANDISCO repo file much easier for newbies.
> 3) The advice to launch Mesos with the localhost option
> ("./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/var/lib/mesos") will lead
> to a state where the Mesos UI cannot be reached on port :5050 in a
> production environment, e.g. on AWS EC2. Mentioning this would help, not
> hinder, deployment.
[jira] [Updated] (MESOS-4118) Update Getting Started for Mac OS X El Capitan
[ https://issues.apache.org/jira/browse/MESOS-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Toenshoff updated MESOS-4118:
----------------------------------
    Shepherd: Till Toenshoff

> Update Getting Started for Mac OS X El Capitan
> ----------------------------------------------
>
>                 Key: MESOS-4118
>                 URL: https://issues.apache.org/jira/browse/MESOS-4118
>             Project: Mesos
>          Issue Type: Documentation
>          Components: documentation
>         Environment: Mac OS X
>            Reporter: Kevin Klues
>            Assignee: Kevin Klues
>            Priority: Minor
>              Labels: documentation, mesosphere
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> This ticket pertains to the Getting Started guide on the Apache Mesos
> website.
> The current instructions for installing on Mac OS X only cover Yosemite. The
> instructions for El Capitan are identical except in the case of upgrading
> from Yosemite to El Capitan. Building after such an upgrade requires a
> trivial (but important) step which is non-obvious: you have to rerun
> 'xcode-select --install' after you complete the upgrade.
> Let's change the heading for installing on Mac OS X to say:
> Mac OS X Yosemite & El Capitan
> and then add a comment at the bottom of the section to point out that a
> rerun of 'xcode-select --install' is necessary after an upgrade from
> Yosemite to El Capitan.
[jira] [Updated] (MESOS-4134) Add note about tunneling in site-docker README
[ https://issues.apache.org/jira/browse/MESOS-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Toenshoff updated MESOS-4134:
----------------------------------
    Shepherd: Till Toenshoff

> Add note about tunneling in site-docker README
> ----------------------------------------------
>
>                 Key: MESOS-4134
>                 URL: https://issues.apache.org/jira/browse/MESOS-4134
>             Project: Mesos
>          Issue Type: Documentation
>          Components: documentation
>            Reporter: Kevin Klues
>            Assignee: Kevin Klues
>            Priority: Minor
>              Labels: documentation
>
> If we are running the site-docker container on a remote machine, we should
> set up a tunnel to localhost to view the site locally. The README should
> explain how to do so.