[jira] [Updated] (MESOS-2067) Add HTTP API to the master for maintenance operations.
[ https://issues.apache.org/jira/browse/MESOS-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-2067:
Sprint: Mesosphere Sprint 16

Add HTTP API to the master for maintenance operations.
Key: MESOS-2067
URL: https://issues.apache.org/jira/browse/MESOS-2067
Project: Mesos
Issue Type: Task
Components: master
Reporter: Benjamin Mahler
Assignee: Joseph Wu
Labels: mesosphere, twitter

Based on MESOS-1474, we'd like to provide an HTTP API on the master for the maintenance primitives in Mesos. For the MVP, we'll want something like this for manipulating the schedule:
{code}
/maintenance/schedule
  GET  - Returns the schedule, which will include the various maintenance windows.
  POST - Create or update the schedule with a JSON blob (see below).
/maintenance/status
  GET  - Returns a list of machines and their maintenance mode.
/maintenance/start
  POST - Transition a set of machines from Draining into Deactivated mode.
/maintenance/stop
  POST - Transition a set of machines from Deactivated into Normal mode.
{code}
(Note: Slashes in URLs might not be supported yet.)

A schedule might look like:
{code}
{
  "windows": [
    {
      "machines": [
        { "ip": "192.168.0.1" },
        { "hostname": "localhost" },
        ...
      ],
      "unavailability": {
        "start": 12345,   // Epoch seconds.
        "duration": 1000  // Seconds.
      }
    },
    ...
  ]
}
{code}
There should be firewall settings such that only those with access to the master can use these endpoints.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
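The schedule and mode semantics above can be made concrete with a small sketch. All names here ({{Window}}, {{isUnavailable}}, {{Mode}}, {{transition}}, {{startMaintenance}}, {{stopMaintenance}}) are hypothetical and not part of any Mesos API. The sketch assumes a window covers the half-open interval [start, start + duration) in epoch seconds, and that the start/stop endpoints are all-or-nothing: if any listed machine is not in the expected source mode, nothing changes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory model of one window from the proposed schedule JSON.
struct Window {
  std::vector<std::string> machines;  // Hostnames or IPs.
  std::int64_t start;                 // Epoch seconds.
  std::int64_t duration;              // Seconds.
};

// Assumption: a window covers the half-open interval [start, start + duration).
bool isUnavailable(const Window& window, std::int64_t now) {
  return now >= window.start && now < window.start + window.duration;
}

// Maintenance modes named in the proposal.
enum class Mode { Normal, Draining, Deactivated };

// Moves every listed machine from `from` to `to`. Assumed all-or-nothing:
// if any machine is unknown or not in `from`, nothing changes.
bool transition(std::map<std::string, Mode>& modes,
                const std::vector<std::string>& machines,
                Mode from, Mode to) {
  for (const std::string& machine : machines) {
    auto it = modes.find(machine);
    if (it == modes.end() || it->second != from) {
      return false;
    }
  }
  for (const std::string& machine : machines) {
    modes[machine] = to;
  }
  return true;
}

// POST /maintenance/start: Draining -> Deactivated.
bool startMaintenance(std::map<std::string, Mode>& modes,
                      const std::vector<std::string>& machines) {
  return transition(modes, machines, Mode::Draining, Mode::Deactivated);
}

// POST /maintenance/stop: Deactivated -> Normal.
bool stopMaintenance(std::map<std::string, Mode>& modes,
                     const std::vector<std::string>& machines) {
  return transition(modes, machines, Mode::Deactivated, Mode::Normal);
}
```

With the example window (start 12345, duration 1000), a machine is unavailable from second 12345 up to and including second 13344.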
[jira] [Updated] (MESOS-3072) Unify initialization of modularized components
[ https://issues.apache.org/jira/browse/MESOS-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rojas updated MESOS-3072:
Description:
h1. Introduction
As it stands right now, default implementations of modularized components are required to have a non-parametrized static {{create()}} method. This allows us to write tests which cover default implementations, and modules based on these default implementations, in a uniform way. For example, with the interface {{Foo}}:
{code}
class Foo
{
public:
  virtual ~Foo() {}

  virtual Future<int> hello() = 0;

protected:
  Foo() {}
};
{code}
With a default implementation:
{code}
class LocalFoo : public Foo
{
public:
  static Try<Foo*> create()
  {
    return new LocalFoo;
  }

  virtual Future<int> hello()
  {
    return 1;
  }
};
{code}
This allows us to create typed tests which look as follows:
{code}
typedef ::testing::Types<LocalFoo, tests::Module<Foo, TestLocalFoo>> FooTestTypes;

TYPED_TEST_CASE(FooTest, FooTestTypes);

TYPED_TEST(FooTest, ATest)
{
  Try<Foo*> foo = TypeParam::create();
  ASSERT_SOME(foo);

  AWAIT_CHECK_EQUAL(foo.get()->hello(), 1);
}
{code}
The test will be applied to each of the types in the template parameters of {{FooTestTypes}}. This allows us to test different implementations of an interface. In our code, it tests default implementations and a module which uses the same default implementation.

The class {{tests::Module<typename T, ModuleID N>}} needs a little explanation: it is a wrapper around {{ModuleManager}} which allows the tests to encode information about the requested module in the type itself, instead of passing a string to the factory method.
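To show why a uniform, parameterless {{create()}} matters for typed tests, here is a minimal self-contained sketch. It replaces gtest's {{TYPED_TEST}} with a plain function template and simplifies libprocess's {{Future<int>}}/{{Try<Foo*>}} to {{int}} and {{std::unique_ptr}}. {{FakeModule}} and {{helloFrom}} are hypothetical stand-ins for {{tests::Module<Foo, N>}} and a typed test body.

```cpp
#include <cassert>
#include <memory>

// Stand-in for the Foo interface; Future<int> is simplified to int.
class Foo {
 public:
  virtual ~Foo() = default;
  virtual int hello() = 0;
};

// Default implementation with the required parameterless factory.
class LocalFoo : public Foo {
 public:
  static std::unique_ptr<Foo> create() {
    return std::unique_ptr<Foo>(new LocalFoo());
  }
  int hello() override { return 1; }
};

// Hypothetical stand-in for tests::Module<Foo, N>: in Mesos the module would
// be looked up through ModuleManager; here it just forwards to the wrapped type.
template <typename Impl>
struct FakeModule {
  static std::unique_ptr<Foo> create() { return Impl::create(); }
};

// Plays the role of a TYPED_TEST body: it compiles only for types whose
// static create() takes no arguments, which is the constraint at issue.
template <typename T>
int helloFrom() {
  std::unique_ptr<Foo> foo = T::create();
  return foo ? foo->hello() : -1;
}
```

Instantiating {{helloFrom}} with a type whose only factory is {{create(int)}} fails to compile, which is the mismatch described under "The Problem".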
The wrapper around {{create()}}, the really important method, looks as follows:
{code}
template <typename T, ModuleID N>
Try<T*> tests::Module<T, N>::create()
{
  Try<std::string> moduleName = getModuleName(N);
  if (moduleName.isError()) {
    return Error(moduleName.error());
  }
  return mesos::modules::ModuleManager::create<T>(moduleName.get());
}
{code}
h1. The Problem
Consider the following implementation of {{Foo}}:
{code}
class ParameterFoo : public Foo
{
public:
  static Try<Foo*> create(int i)
  {
    return new ParameterFoo(i);
  }

  ParameterFoo(int i) : i_(i) {}

  virtual Future<int> hello()
  {
    return i_;
  }

private:
  int i_;
};
{code}
As can be seen, this implementation cannot be used as a default implementation, since its {{create()}} does not match the one of {{tests::Module}}: the two factory methods have different signatures. It is still common for objects to require initialization parameters; however, this constraint (keeping both interfaces alike) forces default implementations of modularized components to have default constructors, so the tests end up dictating the design of the interfaces.

Implementations which are supposed to be used only as modules, i.e. non-default implementations, are allowed to have constructor parameters, since their factory method has the following signature, and its job is to decode the parameters and call the appropriate constructor:
{code}
template <typename T>
T* Module<T>::create(const Parameters& params);
{code}
where {{Parameters}} is just an array of key-value string pairs whose interpretation is left to the specific module. Sadly, this call is wrapped by {{ModuleManager}}, which only allows module parameters to be passed from the command line and does not offer a programmatic way to feed construction parameters to modules.
h1. The Ugly Workaround
Given the requirement of a default constructor and a parameterless {{create()}} factory function, a common pattern (see [Authenticator|https://github.com/apache/mesos/blob/9d4ac11ed757aa5869da440dfe5343a61b07199a/include/mesos/authentication/authenticator.hpp]) has been introduced to feed construction parameters into default implementations. It adds an {{initialize()}} call to the public interface, so {{Foo}} becomes:
{code}
class Foo
{
public:
  virtual ~Foo() {}

  virtual Try<Nothing> initialize(const Option<int>& i) = 0;

  virtual Future<int> hello() = 0;

protected:
  Foo() {}
};
{code}
{{ParameterFoo}} will thus look as follows:
{code}
class ParameterFoo : public Foo
{
public:
  static Try<Foo*> create()
  {
    return new ParameterFoo;
  }

  ParameterFoo() : i_(None()) {}

  virtual Try<Nothing> initialize(const Option<int>& i)
  {
    if (i.isNone()) {
      return Error("Need value to initialize");
    }

    i_ = i;

    return Nothing();
  }

  virtual Future<int> hello()
  {
    if (i_.isNone()) {
      return Failure("Not initialized");
    }

    return i_.get();
  }

private:
  Option<int> i_;
};
{code}
Note that this {{initialize()}} method now has to be implemented by all descendants of {{Foo}}: even a {{DatabaseFoo}}, which takes its return value for {{hello()}} from a DB, will need to support {{int}} as an initialization parameter. The problem is more severe the more specific the parameter to
[jira] [Commented] (MESOS-3072) Unify initialization of modularized components
[ https://issues.apache.org/jira/browse/MESOS-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693604#comment-14693604 ] Alexander Rojas commented on MESOS-3072:

My solution for this problem looks as follows. Under this proposal, the first step is to overload the factory method {{create()}} of {{ModuleManager}} by adding the following:
{code}
template <typename T>
Try<T*> ModuleManager::create(
    const std::string& moduleName,
    const Parameters& params)
{
  synchronized (mutex) {
    if (!moduleBases.contains(moduleName)) {
      return Error("Module '" + moduleName + "' unknown");
    }

    Module<T>* module = (Module<T>*) moduleBases[moduleName];
    if (module->create == NULL) {
      return Error(
          "Error creating module instance for '" + moduleName + "': "
          "create() method not found");
    }

    std::string expectedKind = kind<T>();
    if (expectedKind != module->kind) {
      return Error(
          "Error creating module instance for '" + moduleName + "': "
          "module is of kind '" + module->kind + "', but the requested "
          "kind is '" + expectedKind + "'");
    }

    T* instance = module->create(params);
    if (instance == NULL) {
      return Error(
          "Error creating Module instance for '" + moduleName + "'");
    }

    return instance;
  }
}
{code}
Then the original {{create()}} method can be simplified to:
{code}
template <typename T>
Try<T*> ModuleManager::create(const std::string& moduleName)
{
  return create<T>(moduleName, moduleParameters[moduleName]);
}
{code}
{{Foo}} remains unchanged:
{code}
class Foo
{
public:
  virtual ~Foo() {}

  virtual Future<int> hello() = 0;

protected:
  Foo() {}
};
{code}
While {{ParameterFoo}} is kept simple, with only one extra factory function:
{code}
class ParameterFoo : public Foo
{
public:
  static Try<Foo*> create(int i)
  {
    return new ParameterFoo(i);
  }

  // Decodes the parameter "i" and forwards to create(int).
  // numify() is from stout/numify.hpp.
  static Try<Foo*> create(const Parameters& params)
  {
    Option<int> value = None();

    for (const Parameter& param : params.parameter()) {
      if (param.key() == "i") {
        Try<int> parsed = numify<int>(param.value());
        if (parsed.isError()) {
          return Error("Could not parse parameter 'i': " + parsed.error());
        }
        value = parsed.get();
      }
    }

    if (value.isNone()) {
      return Error("Missing parameter 'i'");
    }

    return create(value.get());
  }

  ParameterFoo(int i) : i_(i) {}

  virtual Future<int> hello()
  {
    return i_;
  }

private:
  int i_;
};
{code}
Some changes in {{tests::Module}} will be needed, adding an overload for {{create()}}:
{code}
template <typename T, ModuleID N>
Try<T*> tests::Module<T, N>::create(const Parameters& params)
{
  Try<std::string> moduleName = getModuleName(N);
  if (moduleName.isError()) {
    return Error(moduleName.error());
  }
  return mesos::modules::ModuleManager::create<T>(moduleName.get(), params);
}
{code}
The test can thus be written as:
{code}
typedef ::testing::Types<ParameterFoo, tests::Module<Foo, TestParameterFoo>>
  FooTestTypes;

TYPED_TEST_CASE(FooTest, FooTestTypes);

TYPED_TEST(FooTest, ATest)
{
  int fooValue = 1;

  // This part can go in the fixture set up.
  Parameters params;
  Parameter* param = params.add_parameter();
  param->set_key("i");
  param->set_value(std::to_string(fooValue));

  Try<Foo*> foo = TypeParam::create(params);
  ASSERT_SOME(foo);

  AWAIT_CHECK_EQUAL(foo.get()->hello(), fooValue);
}
{code}

Unify initialization of modularized components
Key: MESOS-3072
URL: https://issues.apache.org/jira/browse/MESOS-3072
Project: Mesos
Issue Type: Improvement
Components: modules
Affects Versions: 0.22.0, 0.22.1, 0.23.0
Reporter: Alexander Rojas
Labels: mesosphere
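The checks in the proposed {{ModuleManager::create(moduleName, params)}} (unknown name, missing factory, kind mismatch, null instance) and the parameter decoding in {{ParameterFoo}} can be exercised without Mesos in a small self-contained sketch. Everything here is a hypothetical stand-in: {{Parameters}} is modeled as a vector of key/value string pairs, the registry is a plain {{std::map}}, errors are reported through a string out-parameter instead of stout's {{Try}}, and the kind check is a string comparison as in the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for mesos::Parameters: key/value string pairs.
using Parameters = std::vector<std::pair<std::string, std::string>>;

class Foo {
 public:
  virtual ~Foo() = default;
  virtual int hello() const = 0;
};

class ParameterFoo : public Foo {
 public:
  explicit ParameterFoo(int i) : i_(i) {}
  int hello() const override { return i_; }

  // Module-style factory: decodes "i" and returns nullptr if it is missing
  // or is not a complete, in-range integer.
  static Foo* create(const Parameters& params) {
    for (const auto& param : params) {
      if (param.first != "i") {
        continue;
      }
      try {
        std::size_t processed = 0;
        int value = std::stoi(param.second, &processed);
        if (processed != param.second.size()) {
          return nullptr;  // Trailing characters, e.g. "4x".
        }
        return new ParameterFoo(value);
      } catch (const std::exception&) {
        return nullptr;  // Not a number, or out of range for int.
      }
    }
    return nullptr;  // "i" was not given.
  }

 private:
  int i_;
};

// Hypothetical registry entry: the module's kind plus its factory.
struct ModuleEntry {
  std::string kind;
  Foo* (*create)(const Parameters&);
};

// Mirrors the checks in the proposed ModuleManager::create(name, params).
// On failure, stores a message in *error and returns nullptr.
std::unique_ptr<Foo> createModule(
    const std::map<std::string, ModuleEntry>& modules,
    const std::string& name,
    const std::string& expectedKind,
    const Parameters& params,
    std::string* error) {
  auto it = modules.find(name);
  if (it == modules.end()) {
    *error = "Module '" + name + "' unknown";
    return nullptr;
  }
  const ModuleEntry& entry = it->second;
  if (entry.create == nullptr) {
    *error = "create() method not found for '" + name + "'";
    return nullptr;
  }
  if (entry.kind != expectedKind) {
    *error = "Module '" + name + "' is of kind '" + entry.kind +
             "', but the requested kind is '" + expectedKind + "'";
    return nullptr;
  }
  std::unique_ptr<Foo> instance(entry.create(params));
  if (!instance) {
    *error = "Error creating module instance for '" + name + "'";
  }
  return instance;
}
```

Because the parameters are an ordinary argument here, a test can build them programmatically, which is exactly what the command-line-only path through {{ModuleManager}} does not allow.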
[jira] [Commented] (MESOS-1010) Python extension build is broken if gflags-dev is installed
[ https://issues.apache.org/jira/browse/MESOS-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693621#comment-14693621 ] Till Toenshoff commented on MESOS-1010:

I have now reopened that review request - will also discuss it with some other committers today - stay tuned for more :)

Python extension build is broken if gflags-dev is installed
Key: MESOS-1010
URL: https://issues.apache.org/jira/browse/MESOS-1010
Project: Mesos
Issue Type: Bug
Components: build, python api
Environment: Fedora 20, amd64, GCC: 4.8.2; OSX Yosemite, Apple LLVM 6.1.0 (~LLVM 3.6.0)
Reporter: Nikita Vetoshkin
Assignee: Greg Mann
Labels: flaky-test, mesosphere

In my environment, building Mesos from master results in a broken Python API module {{_mesos.so}}:
{noformat}
nekto0n@ya-darkstar ~/workspace/mesos/src/python $ PYTHONPATH=build/lib.linux-x86_64-2.7/ python -c "import _mesos"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: /home/nekto0n/workspace/mesos/src/python/build/lib.linux-x86_64-2.7/_mesos.so: undefined symbol: _ZN6google14FlagRegistererC1EPKcS2_S2_S2_PvS3_
{noformat}
The unmangled version of the symbol looks like this:
{noformat}
google::FlagRegisterer::FlagRegisterer(char const*, char const*, char const*, char const*, void*, void*)
{noformat}
During the {{./configure}} step, {{glog}} finds the {{gflags}} development files and starts using them, thus *implicitly* adding a dependency on {{libgflags.so}}. This breaks the Python extension module, and could perhaps break other Mesos subsystems when they are moved to hosts without {{gflags}} installed.

This task is done when the ExamplesTest.PythonFramework test passes on a system with gflags installed.
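The unmangling step above (what {{c++filt}} does) can also be done programmatically. This is a minimal sketch using the Itanium C++ ABI demangler that GCC and Clang expose as {{abi::__cxa_demangle}} in {{<cxxabi.h>}}; that header is a toolchain extension rather than standard C++, so the sketch assumes a GCC- or Clang-compatible compiler. The helper names {{demangle}} and {{FreeDeleter}} are hypothetical.

```cpp
#include <cxxabi.h>

#include <cassert>
#include <cstdlib>
#include <memory>
#include <string>

// Releases buffers allocated by abi::__cxa_demangle, which uses malloc().
struct FreeDeleter {
  void operator()(char* p) const { std::free(p); }
};

// Returns the demangled form of an Itanium C++ ABI symbol name, or the input
// unchanged if it is not a valid mangled name.
std::string demangle(const std::string& mangled) {
  int status = 0;
  std::unique_ptr<char, FreeDeleter> result(
      abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status));
  return (status == 0 && result) ? std::string(result.get()) : mangled;
}
```

Passing the symbol from the {{ImportError}} to {{demangle}} yields the {{google::FlagRegisterer}} constructor signature quoted above, i.e. the missing symbol lives in {{libgflags}}, not in {{glog}} itself.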
[jira] [Updated] (MESOS-830) ExamplesTest.JavaFramework is flaky
[ https://issues.apache.org/jira/browse/MESOS-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-830:
External issue URL: https://reviews.apache.org/r/37415/

ExamplesTest.JavaFramework is flaky
Key: MESOS-830
URL: https://issues.apache.org/jira/browse/MESOS-830
Project: Mesos
Issue Type: Bug
Components: test
Reporter: Vinod Kone
Assignee: Greg Mann
Labels: flaky, mesosphere

Identify the cause of the following test failure:

[ RUN ] ExamplesTest.JavaFramework
Using temporary directory '/tmp/ExamplesTest_JavaFramework_wSc7u8'
Enabling authentication for the framework
I1120 15:13:39.820032 1681264640 master.cpp:285] Master started on 172.25.133.171:52576
I1120 15:13:39.820180 1681264640 master.cpp:299] Master ID: 201311201513-2877626796-52576-3234
I1120 15:13:39.820194 1681264640 master.cpp:302] Master only allowing authenticated frameworks to register!
I1120 15:13:39.821197 1679654912 slave.cpp:112] Slave started on 1)@172.25.133.171:52576
I1120 15:13:39.821795 1679654912 slave.cpp:212] Slave resources: cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000]
I1120 15:13:39.822855 1682337792 slave.cpp:112] Slave started on 2)@172.25.133.171:52576
I1120 15:13:39.823652 1682337792 slave.cpp:212] Slave resources: cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000]
I1120 15:13:39.825330 1679118336 master.cpp:744] The newly elected leader is master@172.25.133.171:52576
I1120 15:13:39.825445 1679118336 master.cpp:748] Elected as the leading master!
I1120 15:13:39.825907 1681264640 state.cpp:33] Recovering state from '/tmp/ExamplesTest_JavaFramework_wSc7u8/0/meta'
I1120 15:13:39.826127 1681264640 status_update_manager.cpp:180] Recovering status update manager
I1120 15:13:39.826331 1681801216 process_isolator.cpp:317] Recovering isolator
I1120 15:13:39.826738 1682874368 slave.cpp:2743] Finished recovery
I1120 15:13:39.827747 1682337792 state.cpp:33] Recovering state from '/tmp/ExamplesTest_JavaFramework_wSc7u8/1/meta'
I1120 15:13:39.827945 1680191488 slave.cpp:112] Slave started on 3)@172.25.133.171:52576
I1120 15:13:39.828415 1682337792 status_update_manager.cpp:180] Recovering status update manager
I1120 15:13:39.828608 1680728064 sched.cpp:260] Authenticating with master master@172.25.133.171:52576
I1120 15:13:39.828606 1680191488 slave.cpp:212] Slave resources: cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000]
I1120 15:13:39.828680 1682874368 slave.cpp:497] New master detected at master@172.25.133.171:52576
I1120 15:13:39.828765 1682337792 process_isolator.cpp:317] Recovering isolator
I1120 15:13:39.829828 1680728064 sched.cpp:229] Detecting new master
I1120 15:13:39.830288 1679654912 authenticatee.hpp:100] Initializing client SASL
I1120 15:13:39.831635 1680191488 state.cpp:33] Recovering state from '/tmp/ExamplesTest_JavaFramework_wSc7u8/2/meta'
I1120 15:13:39.831991 1679118336 status_update_manager.cpp:158] New master detected at master@172.25.133.171:52576
I1120 15:13:39.832042 1682874368 slave.cpp:524] Detecting new master
I1120 15:13:39.832314 1682337792 slave.cpp:2743] Finished recovery
I1120 15:13:39.832309 1681264640 master.cpp:1266] Attempting to register slave on vkone.local at slave(1)@172.25.133.171:52576
I1120 15:13:39.832929 1680728064 status_update_manager.cpp:180] Recovering status update manager
I1120 15:13:39.833371 1681801216 slave.cpp:497] New master detected at master@172.25.133.171:52576
I1120 15:13:39.833273 1681264640 master.cpp:2513] Adding slave 201311201513-2877626796-52576-3234-0 at vkone.local with cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000]
I1120 15:13:39.833595 1680728064 process_isolator.cpp:317] Recovering isolator
I1120 15:13:39.833859 1681801216 slave.cpp:524] Detecting new master
I1120 15:13:39.833861 1682874368 status_update_manager.cpp:158] New master detected at master@172.25.133.171:52576
I1120 15:13:39.834092 1680191488 slave.cpp:542] Registered with master master@172.25.133.171:52576; given slave ID 201311201513-2877626796-52576-3234-0
I1120 15:13:39.834486 1681264640 master.cpp:1266] Attempting to register slave on vkone.local at slave(2)@172.25.133.171:52576
I1120 15:13:39.834549 1681264640 master.cpp:2513] Adding slave 201311201513-2877626796-52576-3234-1 at vkone.local with cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000]
I1120 15:13:39.834750 1680191488 slave.cpp:555] Checkpointing SlaveInfo to '/tmp/ExamplesTest_JavaFramework_wSc7u8/0/meta/slaves/201311201513-2877626796-52576-3234-0/slave.info'
I1120 15:13:39.834875 1682874368 hierarchical_allocator_process.hpp:445] Added slave 201311201513-2877626796-52576-3234-0 (vkone.local) with cpus(*):4; mem(*):7168;
[jira] [Commented] (MESOS-830) ExamplesTest.JavaFramework is flaky
[ https://issues.apache.org/jira/browse/MESOS-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694183#comment-14694183 ] Greg Mann commented on MESOS-830:

There does not seem to be a good way to ensure completion of SchedulerDriver teardown before exiting the program, so I've settled for the not-so-good alternative of inserting a brief sleep before {{System.exit()}} is called. Review is here: https://reviews.apache.org/r/37415/

With the sleep, I can run the test several hundred times with no failures. This bug is also a bit dependent upon JDK installation: in OSX, it would fail nearly every time with the JVM 1.8 installed via {{brew cask install java}}, while it would fail only infrequently with the Oracle v7 or v8 JDK installed.
[jira] [Comment Edited] (MESOS-3201) Libev handle_async can deadlock with run_in_event_loop
[ https://issues.apache.org/jira/browse/MESOS-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694128#comment-14694128 ] Artem Harutyunyan edited comment on MESOS-3201 at 8/12/15 8:32 PM:

Hi [~vinodkone], [~benjaminhindman] wanted to get another Ship It on this one before he commits it. Could you please review?

was (Author: hartem):
Hi [~vinodkone]], [~benjaminhindman] wanted to get another Ship It on this one before he commits it. Could you please review?

Libev handle_async can deadlock with run_in_event_loop
Key: MESOS-3201
URL: https://issues.apache.org/jira/browse/MESOS-3201
Project: Mesos
Issue Type: Bug
Components: libprocess
Affects Versions: 0.23.0
Reporter: Joris Van Remoortere
Assignee: Joris Van Remoortere
Priority: Blocker
Labels: libprocess, mesosphere

Due to the arbitrary nature of the functions that are executed in {{handle_async}}, invoking them while holding (A) {{watchers_mutex}} can lead to deadlocks if another lock (B) is acquired before calling {{run_in_event_loop}} and (B) is also acquired within the arbitrary function: one thread then holds (B) and waits for (A) inside {{run_in_event_loop}}, while the event loop thread holds (A) inside {{handle_async}} and waits for (B).
{code}
==82679== Thread #10: lock order 0x60774F8 before 0x60768C0 violated
==82679==
==82679== Observed (incorrect) order is: acquisition of lock at 0x60768C0
==82679==    at 0x4C32145: pthread_mutex_lock (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==82679==    by 0x692C9B: __gthread_mutex_lock(pthread_mutex_t*) (gthr-default.h:748)
==82679==    by 0x6950BF: std::mutex::lock() (mutex:134)
==82679==    by 0x696219: Synchronized<std::mutex> synchronize<std::mutex>(std::mutex*)::{lambda(std::mutex*)#1}::operator()(std::mutex*) const (synchronized.hpp:58)
==82679==    by 0x696238: Synchronized<std::mutex> synchronize<std::mutex>(std::mutex*)::{lambda(std::mutex*)#1}::_FUN(std::mutex*) (synchronized.hpp:58)
==82679==    by 0x6984CF: Synchronized<std::mutex>::Synchronized(std::mutex*, void (*)(std::mutex*), void (*)(std::mutex*)) (synchronized.hpp:35)
==82679==    by 0x6962DE: Synchronized<std::mutex> synchronize<std::mutex>(std::mutex*) (synchronized.hpp:60)
==82679==    by 0x728FE1: process::handle_async(ev_loop*, ev_async*, int) (libev.cpp:48)
==82679==    by 0x761384: ev_invoke_pending (ev.c:2994)
==82679==    by 0x7643C4: ev_run (ev.c:3394)
==82679==    by 0x728E37: ev_loop (ev.h:826)
==82679==    by 0x729469: process::EventLoop::run() (libev.cpp:135)
==82679==
==82679== followed by a later acquisition of lock at 0x60774F8
==82679==    at 0x4C32145: pthread_mutex_lock (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==82679==    by 0x4C6F9D: __gthread_mutex_lock(pthread_mutex_t*) (gthr-default.h:748)
==82679==    by 0x4C6FED: __gthread_recursive_mutex_lock(pthread_mutex_t*) (gthr-default.h:810)
==82679==    by 0x4F5D3D: std::recursive_mutex::lock() (mutex:175)
==82679==    by 0x516513: Synchronized<std::recursive_mutex> synchronize<std::recursive_mutex>(std::recursive_mutex*)::{lambda(std::recursive_mutex*)#1}::operator()(std::recursive_mutex*) const (synchronized.hpp:58)
==82679==    by 0x516532: Synchronized<std::recursive_mutex> synchronize<std::recursive_mutex>(std::recursive_mutex*)::{lambda(std::recursive_mutex*)#1}::_FUN(std::recursive_mutex*) (synchronized.hpp:58)
==82679==    by 0x52E619: Synchronized<std::recursive_mutex>::Synchronized(std::recursive_mutex*, void (*)(std::recursive_mutex*), void (*)(std::recursive_mutex*)) (synchronized.hpp:35)
==82679==    by 0x5165D4: Synchronized<std::recursive_mutex> synchronize<std::recursive_mutex>(std::recursive_mutex*) (synchronized.hpp:60)
==82679==    by 0x6BF4E1: process::ProcessManager::use(process::UPID const&) (process.cpp:2127)
==82679==    by 0x6C2B8C: process::ProcessManager::terminate(process::UPID const&, bool, process::ProcessBase*) (process.cpp:2604)
==82679==    by 0x6C6C3C: process::terminate(process::UPID const&, bool) (process.cpp:3107)
==82679==    by 0x692B65: process::Latch::trigger() (latch.cpp:53)
{code}
This was introduced in https://github.com/apache/mesos/commit/849fc4d361e40062073324153ba97e98e294fdf2
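A common way to break this kind of lock-order cycle is to hold the queue mutex only long enough to move the pending functions out, and to invoke them after releasing it, so no arbitrary function ever runs under (A). The sketch below illustrates that pattern with {{std::mutex}} and {{std::function}}; the class {{AsyncQueue}} and its methods are hypothetical, this is not libprocess's {{libev.cpp}}, and it is not necessarily identical to the fix committed for this issue.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical stand-in for the function queue drained by handle_async.
class AsyncQueue {
 public:
  // Like run_in_event_loop: touches the queue only while holding the mutex.
  void enqueue(std::function<void()> f) {
    std::lock_guard<std::mutex> lock(mutex_);
    functions_.push(std::move(f));
  }

  // Like handle_async, but the functions run *after* the mutex is released,
  // so a function that takes another lock (B), or calls enqueue() itself,
  // cannot form a cycle with mutex_.
  void drain() {
    std::queue<std::function<void()>> pending;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      std::swap(pending, functions_);
    }
    while (!pending.empty()) {
      pending.front()();
      pending.pop();
    }
  }

 private:
  std::mutex mutex_;  // Plays the role of (A), watchers_mutex.
  std::queue<std::function<void()>> functions_;
};
```

Functions enqueued while {{drain()}} is running are picked up by the next {{drain()}} call rather than the current one.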
[jira] [Commented] (MESOS-2794) Implement filesystem isolators
[ https://issues.apache.org/jira/browse/MESOS-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694465#comment-14694465 ] Jie Yu commented on MESOS-2794:

commit fd9b283310ad93c02ca1cedea85065dc99b581d1
Author: Jie Yu yujie@gmail.com
Date: Wed Aug 12 16:32:46 2015 -0700
Added a persistent volume test for linux filesystem isolator to test case where the container does not specify a root filesystem.
Review: https://reviews.apache.org/r/37422

commit 1ff6e269b70164409fec359644a1674cf02947cb
Author: Jie Yu yujie@gmail.com
Date: Mon Aug 10 18:57:05 2015 -0700
Added a persistent volume test for linux filesystem isolator.
Review: https://reviews.apache.org/r/37334

commit 0b3cdecb41747563827128717e22351265786bc7
Author: Jie Yu yujie@gmail.com
Date: Mon Aug 10 17:23:45 2015 -0700
Added persistent volume support for linux filesystem isolator.
Review: https://reviews.apache.org/r/37330

commit 03b28b11e53fc098f838af5a6fec81c14a025dfb
Author: Jie Yu yujie@gmail.com
Date: Mon Aug 10 15:12:26 2015 -0700
Added filesystem isolator tests to test volumes from the host.
Review: https://reviews.apache.org/r/37322

commit 512f2bca2eb01d64c5c2c1d2f350862eaaae17f5
Author: Jie Yu yujie@gmail.com
Date: Fri Aug 7 16:29:10 2015 -0700
Added filesystem isolator tests to test volumes from sandbox.
Review: https://reviews.apache.org/r/37237

commit f55e36a5a7e0e00186bdcb3c1dc7601f9ba90bb0
Author: Jie Yu yujie@gmail.com
Date: Fri Aug 7 15:32:48 2015 -0700
Added the linux filesystem isolator.
Review: https://reviews.apache.org/r/37236

Implement filesystem isolators
Key: MESOS-2794
URL: https://issues.apache.org/jira/browse/MESOS-2794
Project: Mesos
Issue Type: Improvement
Components: isolation
Affects Versions: 0.22.1
Reporter: Ian Downes
Assignee: Jie Yu
Labels: twitter
Fix For: 0.24.0

Move persistent volume support from Mesos containerizer to separate filesystem isolators, including support for container rootfs, where possible. Use symlinks for POSIX systems without container rootfs. Use bind mounts for Linux with/without container rootfs.
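The last two sentences of the description amount to a small decision rule, encoded in the sketch below. The enum and function names are hypothetical, and returning "unsupported" for a non-Linux host that requests a container rootfs is an assumption based on the description's "where possible".

```cpp
#include <cassert>
#include <optional>

enum class VolumeMechanism { Symlink, BindMount };

// Hypothetical helper: picks how a persistent volume is exposed to a
// container. Returns std::nullopt for combinations the description does not
// cover (assumed unsupported).
std::optional<VolumeMechanism> chooseMechanism(bool isLinux, bool hasRootfs) {
  if (isLinux) {
    return VolumeMechanism::BindMount;  // Linux, with or without a rootfs.
  }
  if (!hasRootfs) {
    return VolumeMechanism::Symlink;  // Other POSIX systems, no rootfs.
  }
  return std::nullopt;  // Assumption: no rootfs support off Linux.
}
```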
[jira] [Commented] (MESOS-3201) Libev handle_async can deadlock with run_in_event_loop
[ https://issues.apache.org/jira/browse/MESOS-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694128#comment-14694128 ] Artem Harutyunyan commented on MESOS-3201:

Hi [~vinodkone]], [~benjaminhindman] wanted to get another Ship It on this one before he commits it. Could you please review?
[jira] [Updated] (MESOS-3254) Cgroup CHECK fails test harness
[ https://issues.apache.org/jira/browse/MESOS-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-3254: -- Description: CHECK in clean up of ContainerizerTest causes test harness to abort rather than fail or skip only perf related tests. [ RUN ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch [ OK ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch (628 ms) [--] 24 tests from SlaveRecoveryTest/0 (38986 ms total) [--] 4 tests from MesosContainerizerSlaveRecoveryTest [ RUN ] MesosContainerizerSlaveRecoveryTest.ResourceStatistics ../../src/tests/mesos.cpp:720: Failure cgroups::mount(hierarchy, subsystem): 'perf_event' is already attached to another hierarchy - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/tmp/mesos_test_cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-MesosContainerizerSlaveRecoveryTest.*). 
- F0811 17:23:43.874696 12955 mesos.cpp:774] CHECK_SOME(cgroups): '/tmp/mesos_test_cgroup/perf_event' is not a valid hierarchy *** Check failure stack trace: *** @ 0x7fb2fb4835fd google::LogMessage::Fail() @ 0x7fb2fb48543d google::LogMessage::SendToLog() @ 0x7fb2fb4831ec google::LogMessage::Flush() @ 0x7fb2fb485d39 google::LogMessageFatal::~LogMessageFatal() @ 0x4e3f98 _CheckFatal::~_CheckFatal() @ 0x82f25a mesos::internal::tests::ContainerizerTest::TearDown() @ 0xc030e3 testing::internal::HandleExceptionsInMethodIfSupported() @ 0xbf9050 testing::Test::Run() @ 0xbf912e testing::TestInfo::Run() @ 0xbf9235 testing::TestCase::Run() @ 0xbf94e8 testing::internal::UnitTestImpl::RunAllTests() @ 0xbf97a4 testing::UnitTest::Run() @ 0x4a9df3 main @ 0x7fb2f9371ec5 (unknown) @ 0x4b63ee (unknown) Build step 'Execute shell' marked build as failure was: CHECK in clean up of ContainerizerTest causes test harness to abort rather than fail or skip only perf related tests. [ RUN ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch [ OK ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch (628 ms) [--] 24 tests from SlaveRecoveryTest/0 (38986 ms total) [--] 4 tests from MesosContainerizerSlaveRecoveryTest [ RUN ] MesosContainerizerSlaveRecoveryTest.ResourceStatistics ../../src/tests/mesos.cpp:720: Failure cgroups::mount(hierarchy, subsystem): 'perf_event' is already attached to another hierarchy - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/tmp/mesos_test_cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-MesosContainerizerSlaveRecoveryTest.*). 
- F0811 17:23:43.874696 12955 mesos.cpp:774] CHECK_SOME(cgroups): '/tmp/mesos_test_cgroup/perf_event' is not a valid hierarchy *** Check failure stack trace: *** @ 0x7fb2fb4835fd google::LogMessage::Fail() @ 0x7fb2fb48543d google::LogMessage::SendToLog() @ 0x7fb2fb4831ec google::LogMessage::Flush() @ 0x7fb2fb485d39 google::LogMessageFatal::~LogMessageFatal() @ 0x4e3f98 _CheckFatal::~_CheckFatal() @ 0x82f25a mesos::internal::tests::ContainerizerTest::TearDown() @ 0xc030e3 testing::internal::HandleExceptionsInMethodIfSupported() @ 0xbf9050 testing::Test::Run() @ 0xbf912e testing::TestInfo::Run() @ 0xbf9235 testing::TestCase::Run() @ 0xbf94e8 testing::internal::UnitTestImpl::RunAllTests() @ 0xbf97a4 testing::UnitTest::Run() @ 0x4a9df3 main @ 0x7fb2f9371ec5 (unknown) @ 0x4b63ee (unknown) Build step 'Execute shell' marked build as failure Cgroup CHECK fails test harness --- Key: MESOS-3254 URL: https://issues.apache.org/jira/browse/MESOS-3254 Project: Mesos Issue Type: Bug Components: test Reporter: Paul Brett CHECK in clean up of ContainerizerTest causes test harness to abort rather than fail or skip only perf related tests. [ RUN ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch [ OK ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch (628 ms) [--] 24 tests from SlaveRecoveryTest/0
[jira] [Commented] (MESOS-2249) Mesos entities should be able to use IPv6 and IPv4 in the same time
[ https://issues.apache.org/jira/browse/MESOS-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14694572#comment-14694572 ] Parv Oberoi commented on MESOS-2249: Hi, I just wanted to understand what the priority for this issue is. Also, it would be great if I could get some pointers on where to start. Thanks, parv Mesos entities should be able to use IPv6 and IPv4 in the same time --- Key: MESOS-2249 URL: https://issues.apache.org/jira/browse/MESOS-2249 Project: Mesos Issue Type: Task Reporter: Evelina Dumitrescu Assignee: Evelina Dumitrescu Each Mesos entity should be able to bind on both IPv4 and IPv6 and let the entity that wants to connect decide which protocol to use. For example, one slave may want to use IPv4 and another IPv6, so the master should bind on both. Consequently, I propose having two Node fields in process.cpp, one for each type of endpoint. The IPv6 field should probably be an Option, because the stack might not support IPv6 (e.g., the kernel is not compiled with IPv6 support). Likewise, UPID will contain two Node fields, one for each protocol. For the HTTP endpoints, whenever a request is made, the entities should first try to connect over IPv4 and, if the connection fails, fall back to IPv6, or vice versa. We could let the user choose which policy to use. I think in this context it does not matter which protocol is used. I have seen this approach in various projects, e.g. http://www.perforce.com/perforce/r13.1/manuals/cmdref/env.P4PORT.html (tcp4to6: Attempt to listen/connect to an IPv4 address. If this fails, try IPv6. tcp6to4: Attempt to listen/connect to an IPv6 address. If this fails, try IPv4.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
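The user-selectable fallback policy described in this issue could look roughly like the sketch below. The `Policy` values borrow the Perforce `tcp4to6`/`tcp6to4` names; `familyOrder()` and `connectWithFallback()` are hypothetical helpers, and the actual connect attempt is passed in as a callable so the ordering logic stays independent of any libprocess socket API.

```cpp
#include <functional>
#include <optional>
#include <sys/socket.h>  // AF_INET, AF_INET6
#include <vector>

// Hypothetical policy names modeled on the Perforce P4PORT prefixes.
enum class Policy { TCP4TO6, TCP6TO4 };

// Address families to try, in order, for the given policy.
std::vector<int> familyOrder(Policy policy) {
  if (policy == Policy::TCP4TO6) {
    return {AF_INET, AF_INET6};
  }
  return {AF_INET6, AF_INET};
}

// Calls `attempt` once per family in policy order and returns the first
// family whose attempt succeeded, or std::nullopt if every attempt failed.
std::optional<int> connectWithFallback(
    Policy policy, const std::function<bool(int family)>& attempt) {
  for (int family : familyOrder(policy)) {
    if (attempt(family)) {
      return family;
    }
  }
  return std::nullopt;
}
```

Keeping the policy separate from the socket code means the same ordering could be reused for both binding and connecting, which is what the P4PORT prefixes cover.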
[jira] [Commented] (MESOS-2562) 0.24.0 release
[ https://issues.apache.org/jira/browse/MESOS-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14694564#comment-14694564 ] Klaus Ma commented on MESOS-2562: - [~vi...@twitter.com], I have a ticket for the Dynamic Reservation Example (MESOS-3063) that is currently in Reviewable state; will it also be included in 0.24? 0.24.0 release -- Key: MESOS-2562 URL: https://issues.apache.org/jira/browse/MESOS-2562 Project: Mesos Issue Type: Task Reporter: Kapil Arya Assignee: Vinod Kone The main feature of this release is going to be v1 (beta) release of the HTTP scheduler API (part of MESOS-2288 epic). Unresolved issues tracker: https://issues.apache.org/jira/issues/?jql=project%20%3D%20MESOS%20AND%20status%20!%3D%20Resolved%20AND%20%22Target%20Version%2Fs%22%20%3D%200.24.0%20ORDER%20BY%20status%20DESC -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2562) 0.24.0 release
[ https://issues.apache.org/jira/browse/MESOS-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14694589#comment-14694589 ] Klaus Ma commented on MESOS-2562: - Sure, I'll ping him to get it committed. Thanks. 0.24.0 release -- Key: MESOS-2562 URL: https://issues.apache.org/jira/browse/MESOS-2562 Project: Mesos Issue Type: Task Reporter: Kapil Arya Assignee: Vinod Kone The main feature of this release is going to be v1 (beta) release of the HTTP scheduler API (part of MESOS-2288 epic). Unresolved issues tracker: https://issues.apache.org/jira/issues/?jql=project%20%3D%20MESOS%20AND%20status%20!%3D%20Resolved%20AND%20%22Target%20Version%2Fs%22%20%3D%200.24.0%20ORDER%20BY%20status%20DESC -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2562) 0.24.0 release
[ https://issues.apache.org/jira/browse/MESOS-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14694579#comment-14694579 ] Vinod Kone commented on MESOS-2562: --- You should work with your shepherd to get it committed. Unfortunately I don't have cycles for reviewing non-HTTP API stuff. 0.24.0 release -- Key: MESOS-2562 URL: https://issues.apache.org/jira/browse/MESOS-2562 Project: Mesos Issue Type: Task Reporter: Kapil Arya Assignee: Vinod Kone The main feature of this release is going to be v1 (beta) release of the HTTP scheduler API (part of MESOS-2288 epic). Unresolved issues tracker: https://issues.apache.org/jira/issues/?jql=project%20%3D%20MESOS%20AND%20status%20!%3D%20Resolved%20AND%20%22Target%20Version%2Fs%22%20%3D%200.24.0%20ORDER%20BY%20status%20DESC -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3251) http::get API evaluates host wrongly
[ https://issues.apache.org/jira/browse/MESOS-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jojy Varghese updated MESOS-3251: - Labels: mesosphere (was: ) http::get API evaluates host wrongly -- Key: MESOS-3251 URL: https://issues.apache.org/jira/browse/MESOS-3251 Project: Mesos Issue Type: Bug Components: libprocess Reporter: Jojy Varghese Assignee: Jojy Varghese Labels: mesosphere Currently the libprocess HTTP API sets the Host header field from the peer socket address (IP:port). The problem is that the socket address might not be the actual HTTP server; it might just be a proxy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
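A minimal sketch of deriving the Host header from the request URL instead of the peer socket address, following the `uri-host [ ":" port ]` form from RFC 7230. The `Url` struct and `hostHeader()` function are hypothetical simplifications, not libprocess's own `http::URL` type or API.

```cpp
#include <cstdint>
#include <string>

// Hypothetical, simplified URL type; libprocess has its own http::URL.
struct Url {
  std::string scheme;  // "http" or "https"
  std::string host;    // host as the caller wrote it (name or IP literal)
  uint16_t port;
};

// Builds a Host header value from the URL the caller asked for, so a
// proxy's address never leaks into the header.
std::string hostHeader(const Url& url) {
  // IPv6 literals must be enclosed in brackets.
  const std::string host = url.host.find(':') != std::string::npos
                               ? "[" + url.host + "]"
                               : url.host;

  // The port may be omitted when it is the scheme's default.
  const bool defaultPort = (url.scheme == "http" && url.port == 80) ||
                           (url.scheme == "https" && url.port == 443);

  return defaultPort ? host : host + ":" + std::to_string(url.port);
}
```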
[jira] [Commented] (MESOS-3185) Refactor Subprocess logic in linux/perf.cpp to use common subroutine
[ https://issues.apache.org/jira/browse/MESOS-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14694498#comment-14694498 ] Paul Brett commented on MESOS-3185: --- Updated, reviews are: https://reviews.apache.org/r/37423/ https://reviews.apache.org/r/37424/ https://reviews.apache.org/r/37417/ https://reviews.apache.org/r/37416/ Refactor Subprocess logic in linux/perf.cpp to use common subroutine Key: MESOS-3185 URL: https://issues.apache.org/jira/browse/MESOS-3185 Project: Mesos Issue Type: Bug Components: slave Reporter: Paul Brett Assignee: Paul Brett MESOS-2834 will enhance the perf isolator to support the different output formats produced by different kernel versions. To achieve this, the isolator needs to execute the perf --version command. We should decompose the existing Subprocess handling in perf so that the implementation can be shared between the multiple uses of perf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
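The decomposition could be shaped like the sketch below: one helper that runs a command and captures its stdout, with each perf use reduced to a thin wrapper. It is a blocking POSIX `fork`/`execvp` sketch standing in for libprocess's asynchronous `Subprocess`, and `execute()` and `perfVersion()` are hypothetical names.

```cpp
#include <optional>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical shared helper: runs a command (no shell) and returns its
// stdout, or std::nullopt if it could not be run or exited non-zero.
std::optional<std::string> execute(const std::vector<std::string>& argv) {
  if (argv.empty()) return std::nullopt;

  // Build the exec argument array before forking.
  std::vector<char*> args;
  for (const std::string& arg : argv) {
    args.push_back(const_cast<char*>(arg.c_str()));
  }
  args.push_back(nullptr);

  int fds[2];
  if (pipe(fds) != 0) return std::nullopt;

  pid_t pid = fork();
  if (pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return std::nullopt;
  }

  if (pid == 0) {
    // Child: send stdout into the pipe, then replace the process image.
    dup2(fds[1], STDOUT_FILENO);
    close(fds[0]);
    close(fds[1]);
    execvp(args[0], args.data());
    _exit(127);  // only reached if exec failed
  }

  // Parent: read everything the child writes, then reap it.
  close(fds[1]);
  std::string output;
  char buffer[4096];
  ssize_t n;
  while ((n = read(fds[0], buffer, sizeof(buffer))) > 0) {
    output.append(buffer, static_cast<size_t>(n));
  }
  close(fds[0]);

  int status = 0;
  if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
      WEXITSTATUS(status) != 0) {
    return std::nullopt;
  }
  return output;
}

// Each perf use becomes a thin wrapper over the shared helper
// (hypothetical; the isolator would add a similar one for `perf stat`).
std::optional<std::string> perfVersion() {
  return execute({"perf", "--version"});
}
```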
[jira] [Commented] (MESOS-3201) Libev handle_async can deadlock with run_in_event_loop
[ https://issues.apache.org/jira/browse/MESOS-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14693771#comment-14693771 ] Vinod Kone commented on MESOS-3201: --- I'm assuming this is important for the 0.24.0 release? If yes, can you please commit it? Libev handle_async can deadlock with run_in_event_loop -- Key: MESOS-3201 URL: https://issues.apache.org/jira/browse/MESOS-3201 Project: Mesos Issue Type: Bug Components: libprocess Affects Versions: 0.23.0 Reporter: Joris Van Remoortere Assignee: Joris Van Remoortere Priority: Blocker Labels: libprocess, mesosphere -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2562) 0.24.0 release
[ https://issues.apache.org/jira/browse/MESOS-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-2562: -- Story Points: 5 0.24.0 release -- Key: MESOS-2562 URL: https://issues.apache.org/jira/browse/MESOS-2562 Project: Mesos Issue Type: Task Reporter: Kapil Arya Assignee: Vinod Kone The main feature of this release is going to be v1 (beta) release of the HTTP scheduler API (part of MESOS-2288 epic). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1010) Python extension build is broken if gflags-dev is installed
[ https://issues.apache.org/jira/browse/MESOS-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-1010: - Environment: Fedora 20, amd64, GCC: 4.8.2; OSX 10.10.4, Apple LLVM 6.1.0 (~LLVM 3.6.0) (was: Fedora 20, amd64, GCC: 4.8.2; OSX Yosemite, Apple LLVM 6.1.0 (~LLVM 3.6.0)) Python extension build is broken if gflags-dev is installed --- Key: MESOS-1010 URL: https://issues.apache.org/jira/browse/MESOS-1010 Project: Mesos Issue Type: Bug Components: build, python api Environment: Fedora 20, amd64, GCC: 4.8.2; OSX 10.10.4, Apple LLVM 6.1.0 (~LLVM 3.6.0) Reporter: Nikita Vetoshkin Assignee: Greg Mann Labels: flaky-test, mesosphere In my environment, building Mesos from master results in a broken Python API module {{_mesos.so}}:
{noformat}
nekto0n@ya-darkstar ~/workspace/mesos/src/python $ PYTHONPATH=build/lib.linux-x86_64-2.7/ python -c "import _mesos"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: /home/nekto0n/workspace/mesos/src/python/build/lib.linux-x86_64-2.7/_mesos.so: undefined symbol: _ZN6google14FlagRegistererC1EPKcS2_S2_S2_PvS3_
{noformat}
The demangled version of the symbol looks like this:
{noformat}
google::FlagRegisterer::FlagRegisterer(char const*, char const*, char const*, char const*, void*, void*)
{noformat}
During the {{./configure}} step, {{glog}} finds the {{gflags}} development files and starts using them, thus *implicitly* adding a dependency on {{libgflags.so}}. This breaks the Python extension module and may also break other Mesos subsystems when they are moved to hosts without {{gflags}} installed. This task is done when the ExamplesTest.PythonFramework test passes on a system with gflags installed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
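The mangled name in the ImportError can be turned back into the signature shown above with the Itanium C++ ABI demangler that GCC's libstdc++ exposes as `abi::__cxa_demangle` in `<cxxabi.h>` (the `c++filt` tool does the same from a shell). A minimal sketch; the `demangle()` wrapper itself is a hypothetical helper:

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// Returns the demangled form of an Itanium-ABI symbol name, or the input
// unchanged if it is not a valid mangled name.
std::string demangle(const std::string& mangled) {
  int status = 0;
  // __cxa_demangle allocates the result with malloc, so free() releases it.
  std::unique_ptr<char, void (*)(void*)> result(
      abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
      std::free);
  return (status == 0 && result) ? std::string(result.get()) : mangled;
}
```

Calling `demangle("_ZN6google14FlagRegistererC1EPKcS2_S2_S2_PvS3_")` yields the `google::FlagRegisterer` constructor signature quoted in the issue, which points at gflags as the library that `_mesos.so` expected to be linked against.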
[jira] [Updated] (MESOS-3149) Use setuptools to install python cli package
[ https://issues.apache.org/jira/browse/MESOS-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3149: -- Shepherd: Till Toenshoff Use setuptools to install python cli package Key: MESOS-3149 URL: https://issues.apache.org/jira/browse/MESOS-3149 Project: Mesos Issue Type: Task Reporter: haosdent Assignee: haosdent mesos-ps and mesos-cat, which depend on src/cli/python/mesos, do not work on OS X because src/cli/python is not installed onto sys.path. It's time to finish this TODO:
{code}
# Add 'src/cli/python' to PYTHONPATH.
# TODO(benh): Remove this if/when we install the 'mesos' module via
# PIP and setuptools.
PYTHONPATH=@abs_top_srcdir@/src/cli/python:${PYTHONPATH}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-830) ExamplesTest.JavaFramework is flaky
[ https://issues.apache.org/jira/browse/MESOS-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-830: Shepherd: Till Toenshoff ExamplesTest.JavaFramework is flaky --- Key: MESOS-830 URL: https://issues.apache.org/jira/browse/MESOS-830 Project: Mesos Issue Type: Bug Components: test Reporter: Vinod Kone Assignee: Greg Mann Labels: flaky, mesosphere Identify the cause of the following test failure: [ RUN ] ExamplesTest.JavaFramework Using temporary directory '/tmp/ExamplesTest_JavaFramework_wSc7u8' Enabling authentication for the framework I1120 15:13:39.820032 1681264640 master.cpp:285] Master started on 172.25.133.171:52576 I1120 15:13:39.820180 1681264640 master.cpp:299] Master ID: 201311201513-2877626796-52576-3234 I1120 15:13:39.820194 1681264640 master.cpp:302] Master only allowing authenticated frameworks to register! I1120 15:13:39.821197 1679654912 slave.cpp:112] Slave started on 1)@172.25.133.171:52576 I1120 15:13:39.821795 1679654912 slave.cpp:212] Slave resources: cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] I1120 15:13:39.822855 1682337792 slave.cpp:112] Slave started on 2)@172.25.133.171:52576 I1120 15:13:39.823652 1682337792 slave.cpp:212] Slave resources: cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] I1120 15:13:39.825330 1679118336 master.cpp:744] The newly elected leader is master@172.25.133.171:52576 I1120 15:13:39.825445 1679118336 master.cpp:748] Elected as the leading master! 
I1120 15:13:39.825907 1681264640 state.cpp:33] Recovering state from '/tmp/ExamplesTest_JavaFramework_wSc7u8/0/meta' I1120 15:13:39.826127 1681264640 status_update_manager.cpp:180] Recovering status update manager I1120 15:13:39.826331 1681801216 process_isolator.cpp:317] Recovering isolator I1120 15:13:39.826738 1682874368 slave.cpp:2743] Finished recovery I1120 15:13:39.827747 1682337792 state.cpp:33] Recovering state from '/tmp/ExamplesTest_JavaFramework_wSc7u8/1/meta' I1120 15:13:39.827945 1680191488 slave.cpp:112] Slave started on 3)@172.25.133.171:52576 I1120 15:13:39.828415 1682337792 status_update_manager.cpp:180] Recovering status update manager I1120 15:13:39.828608 1680728064 sched.cpp:260] Authenticating with master master@172.25.133.171:52576 I1120 15:13:39.828606 1680191488 slave.cpp:212] Slave resources: cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] I1120 15:13:39.828680 1682874368 slave.cpp:497] New master detected at master@172.25.133.171:52576 I1120 15:13:39.828765 1682337792 process_isolator.cpp:317] Recovering isolator I1120 15:13:39.829828 1680728064 sched.cpp:229] Detecting new master I1120 15:13:39.830288 1679654912 authenticatee.hpp:100] Initializing client SASL I1120 15:13:39.831635 1680191488 state.cpp:33] Recovering state from '/tmp/ExamplesTest_JavaFramework_wSc7u8/2/meta' I1120 15:13:39.831991 1679118336 status_update_manager.cpp:158] New master detected at master@172.25.133.171:52576 I1120 15:13:39.832042 1682874368 slave.cpp:524] Detecting new master I1120 15:13:39.832314 1682337792 slave.cpp:2743] Finished recovery I1120 15:13:39.832309 1681264640 master.cpp:1266] Attempting to register slave on vkone.local at slave(1)@172.25.133.171:52576 I1120 15:13:39.832929 1680728064 status_update_manager.cpp:180] Recovering status update manager I1120 15:13:39.833371 1681801216 slave.cpp:497] New master detected at master@172.25.133.171:52576 I1120 15:13:39.833273 1681264640 master.cpp:2513] Adding slave 
201311201513-2877626796-52576-3234-0 at vkone.local with cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] I1120 15:13:39.833595 1680728064 process_isolator.cpp:317] Recovering isolator I1120 15:13:39.833859 1681801216 slave.cpp:524] Detecting new master I1120 15:13:39.833861 1682874368 status_update_manager.cpp:158] New master detected at master@172.25.133.171:52576 I1120 15:13:39.834092 1680191488 slave.cpp:542] Registered with master master@172.25.133.171:52576; given slave ID 201311201513-2877626796-52576-3234-0 I1120 15:13:39.834486 1681264640 master.cpp:1266] Attempting to register slave on vkone.local at slave(2)@172.25.133.171:52576 I1120 15:13:39.834549 1681264640 master.cpp:2513] Adding slave 201311201513-2877626796-52576-3234-1 at vkone.local with cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] I1120 15:13:39.834750 1680191488 slave.cpp:555] Checkpointing SlaveInfo to '/tmp/ExamplesTest_JavaFramework_wSc7u8/0/meta/slaves/201311201513-2877626796-52576-3234-0/slave.info' I1120 15:13:39.834875 1682874368 hierarchical_allocator_process.hpp:445] Added slave 201311201513-2877626796-52576-3234-0 (vkone.local) with cpus(*):4; mem(*):7168; disk(*):481998;
[jira] [Updated] (MESOS-1010) Python extension build is broken if gflags-dev is installed
[ https://issues.apache.org/jira/browse/MESOS-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-1010: - Shepherd: Till Toenshoff Python extension build is broken if gflags-dev is installed --- Key: MESOS-1010 URL: https://issues.apache.org/jira/browse/MESOS-1010 Project: Mesos Issue Type: Bug Components: build, python api Environment: Fedora 20, amd64, GCC: 4.8.2; OSX Yosemite, Apple LLVM 6.1.0 (~LLVM 3.6.0) Reporter: Nikita Vetoshkin Assignee: Greg Mann Labels: flaky-test, mesosphere -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-3001) Create a demo HTTP API client
[ https://issues.apache.org/jira/browse/MESOS-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Jimenez reassigned MESOS-3001: - Assignee: Isabel Jimenez (was: Marco Massenzio) Create a demo HTTP API client --- Key: MESOS-3001 URL: https://issues.apache.org/jira/browse/MESOS-3001 Project: Mesos Issue Type: Bug Components: framework Reporter: Marco Massenzio Assignee: Isabel Jimenez Labels: mesosphere We want to create a simple demo HTTP API Client (in Java or Python) that can serve as an example framework for people who will want to use the new API for their Frameworks. The scope should be fairly limited (eg, launching a simple Container task?) but sufficient to exercise most of the new API endpoint messages/capabilities. Scope: TBD Non-Goals: - create a best-of-breed Framework to deliver any specific functionality; - create an Integration Test for the HTTP API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-2860) Create the basic infrastructure to handle /call endpoint
[ https://issues.apache.org/jira/browse/MESOS-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14607993#comment-14607993 ] Isabel Jimenez edited comment on MESOS-2860 at 8/12/15 5:30 PM: submitted: https://reviews.apache.org/r/36040/ https://reviews.apache.org/r/36360/ https://reviews.apache.org/r/36328/ https://reviews.apache.org/r/35934/ https://reviews.apache.org/r/35939/ https://reviews.apache.org/r/36073/ https://reviews.apache.org/r/36072/ https://reviews.apache.org/r/36402/ https://reviews.apache.org/r/37097 https://reviews.apache.org/r/36624/ discarded or split: https://reviews.apache.org/r/36217/ https://reviews.apache.org/r/36037/ was (Author: ijimenez): submitted: https://reviews.apache.org/r/36040/ https://reviews.apache.org/r/36360/ https://reviews.apache.org/r/36328/ https://reviews.apache.org/r/35934/ https://reviews.apache.org/r/35939/ https://reviews.apache.org/r/36073/ https://reviews.apache.org/r/36072/ reviewable: https://reviews.apache.org/r/36402/ https://reviews.apache.org/r/37097 https://reviews.apache.org/r/36624/ https://reviews.apache.org/r/36040/ discarded or split: https://reviews.apache.org/r/36217/ https://reviews.apache.org/r/36037/ Create the basic infrastructure to handle /call endpoint Key: MESOS-2860 URL: https://issues.apache.org/jira/browse/MESOS-2860 Project: Mesos Issue Type: Story Components: master Reporter: Marco Massenzio Assignee: Isabel Jimenez Labels: mesosphere This is the first basic step in ensuring the basic {{/call}} functionality: processing a {noformat} POST /call {noformat} and returning:
- {{202}} if all goes well;
- {{401}} if not authorized; and
- {{400}} if the request is malformed.
We'll get more sophisticated as the work progresses (e.g., supporting {{415}} if the content type is not of the right kind). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2497) Create synchronous validations for Calls
[ https://issues.apache.org/jira/browse/MESOS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14693963#comment-14693963 ] Isabel Jimenez commented on MESOS-2497: --- https://reviews.apache.org/r/37403/ https://reviews.apache.org/r/37405/ Create synchronous validations for Calls Key: MESOS-2497 URL: https://issues.apache.org/jira/browse/MESOS-2497 Project: Mesos Issue Type: Bug Reporter: Isabel Jimenez Assignee: Isabel Jimenez Labels: HTTP, mesosphere The /call endpoint will return a 202 Accepted code, but it has to perform some basic validation first. If validation fails, it will return a 4xx code. We have to create a mechanism that validates the request and sends back the appropriate code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
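The synchronous validation step for /call, mapping a request to 202 or a 4xx code before any asynchronous processing starts, might be sketched as follows. `CallRequest` and `validateCall()` are hypothetical; the authentication and body-parsing results are assumed to be computed earlier, and the set of accepted media types (`application/json`, `application/x-protobuf`) is an assumption for illustration.

```cpp
#include <set>
#include <string>

// Hypothetical, simplified view of an incoming POST /call request.
struct CallRequest {
  std::string contentType;
  bool authenticated;  // assumed: decided by an earlier authentication step
  bool bodyParses;     // assumed: did the body deserialize into a Call?
};

// Synchronous checks run before the call is accepted for processing.
int validateCall(const CallRequest& request) {
  // Assumed set of supported media types.
  static const std::set<std::string> kSupported = {
      "application/json", "application/x-protobuf"};

  if (!request.authenticated) {
    return 401;  // Unauthorized
  }
  if (kSupported.count(request.contentType) == 0) {
    return 415;  // Unsupported Media Type
  }
  if (!request.bodyParses) {
    return 400;  // Bad Request: malformed body
  }
  return 202;  // Accepted: the call is then processed asynchronously
}
```

The sketch returns 400 Bad Request for a malformed body, the standard HTTP status for a request the server cannot parse; the order of the checks is a design choice, not something the tickets specify.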
[jira] [Updated] (MESOS-2562) 0.24.0 release
[ https://issues.apache.org/jira/browse/MESOS-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-2562: -- Description: The main feature of this release is going to be v1 (beta) release of the HTTP scheduler API (part of MESOS-2288 epic). Unresolved issues tracker: https://issues.apache.org/jira/issues/?jql=project%20%3D%20MESOS%20AND%20status%20!%3D%20Resolved%20AND%20%22Target%20Version%2Fs%22%20%3D%200.24.0%20ORDER%20BY%20status%20DESC was:The main feature of this release is going to be v1 (beta) release of the HTTP scheduler API (part of MESOS-2288 epic). 0.24.0 release -- Key: MESOS-2562 URL: https://issues.apache.org/jira/browse/MESOS-2562 Project: Mesos Issue Type: Task Reporter: Kapil Arya Assignee: Vinod Kone The main feature of this release is going to be v1 (beta) release of the HTTP scheduler API (part of MESOS-2288 epic). Unresolved issues tracker: https://issues.apache.org/jira/issues/?jql=project%20%3D%20MESOS%20AND%20status%20!%3D%20Resolved%20AND%20%22Target%20Version%2Fs%22%20%3D%200.24.0%20ORDER%20BY%20status%20DESC -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3252) Ignore no statistics condition for containers with no qdisc
[ https://issues.apache.org/jira/browse/MESOS-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-3252: -- Sprint: Twitter Mesos Q3 Sprint 3 Story Points: 2 Ignore no statistics condition for containers with no qdisc --- Key: MESOS-3252 URL: https://issues.apache.org/jira/browse/MESOS-3252 Project: Mesos Issue Type: Bug Components: isolation Reporter: Paul Brett In PortMappingStatistics::execute, we log the following errors to stderr if the egress rate limiting qdiscs are not configured inside the container.
{code}
Failed to get the network statistics for the htb qdisc on eth0
Failed to get the network statistics for the fq_codel qdisc on eth0
{code}
This can occur either because of an error reading the qdisc (the statistics function returns an error) or because the qdisc does not exist (the function returns None). We should not log an error when the qdisc does not exist, since this is normal behaviour if the container is created without rate limiting. We do not want to gate this function on the slave rate limiting flag, since we would have to compare the behaviour against the flag value at the time the container was created. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
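The proposed behaviour separates three outcomes: an error (log it), None (qdisc absent, stay quiet), and a value (record the statistics). The sketch below models those outcomes with `std::variant` as a loose stand-in for stout's `Result<T>`; `QdiscStats`, `StatsError` and `messageToLog()` are hypothetical names.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <variant>

// Hypothetical statistics payload for one qdisc.
struct QdiscStats {
  uint64_t bytes;
};

struct StatsError {
  std::string message;
};

// Loosely mirrors a Result<T>: std::monostate = None (qdisc absent),
// QdiscStats = Some, StatsError = Error.
using StatsResult = std::variant<std::monostate, QdiscStats, StatsError>;

// Returns the message to log, if any. An absent qdisc is normal for a
// container created without rate limiting, so it produces no message.
std::optional<std::string> messageToLog(const StatsResult& result,
                                        const std::string& qdisc,
                                        const std::string& link) {
  if (const auto* error = std::get_if<StatsError>(&result)) {
    return "Failed to get the network statistics for the " + qdisc +
           " qdisc on " + link + ": " + error->message;
  }
  return std::nullopt;  // None or Some: nothing to report
}
```

Because the decision depends only on the returned outcome, it needs no knowledge of the slave's rate limiting flag, which matches the constraint stated in the issue.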
[jira] [Updated] (MESOS-3252) Ignore no statistics condition for containers with no qdisc
[ https://issues.apache.org/jira/browse/MESOS-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-3252: -- Assignee: Paul Brett Ignore no statistics condition for containers with no qdisc --- Key: MESOS-3252 URL: https://issues.apache.org/jira/browse/MESOS-3252 Project: Mesos Issue Type: Bug Components: isolation Reporter: Paul Brett Assignee: Paul Brett -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3001) Create a demo HTTP API client
[ https://issues.apache.org/jira/browse/MESOS-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Jimenez updated MESOS-3001: -- Description: We want to create a simple demo HTTP API Client (in Java, Python or Go) that can serve as an example framework for people who will want to use the new API for their Frameworks. The scope should be fairly limited (eg, launching a simple Container task?) but sufficient to exercise most of the new API endpoint messages/capabilities. Scope: TBD Non-Goals: - create a best-of-breed Framework to deliver any specific functionality; - create an Integration Test for the HTTP API. was: We want to create a simple demo HTTP API Client (in Java or Python) that can serve as an example framework for people who will want to use the new API for their Frameworks. The scope should be fairly limited (eg, launching a simple Container task?) but sufficient to exercise most of the new API endpoint messages/capabilities. Scope: TBD Non-Goals: - create a best-of-breed Framework to deliver any specific functionality; - create an Integration Test for the HTTP API. Create a demo HTTP API client --- Key: MESOS-3001 URL: https://issues.apache.org/jira/browse/MESOS-3001 Project: Mesos Issue Type: Bug Components: framework Reporter: Marco Massenzio Assignee: Isabel Jimenez Labels: mesosphere We want to create a simple demo HTTP API Client (in Java, Python or Go) that can serve as an example framework for people who will want to use the new API for their Frameworks. The scope should be fairly limited (eg, launching a simple Container task?) but sufficient to exercise most of the new API endpoint messages/capabilities. Scope: TBD Non-Goals: - create a best-of-breed Framework to deliver any specific functionality; - create an Integration Test for the HTTP API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)