-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56895/#review166284
-----------------------------------------------------------

A few superficial suggestions attached -- will take a closer look shortly.

I see this intermittently:

```
libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: recursive_mutex lock failed: Invalid argument
*** Aborted at 1487726754 (unix time) try "date -d @1487726754" if you are using GNU date ***
PC: @ 0x7fff9271cdd6 __pthread_kill
*** SIGABRT (@0x7fff9271cdd6) received by PID 90593 (TID 0x70000362d000) stack trace: ***
    @ 0x7fff927fbbba _sigtramp
    @ 0x41221cd4c (unknown)
    @ 0x7fff92682420 abort
    @ 0x7fff911dd85a abort_message
    @ 0x7fff91202c37 default_terminate_handler()
    @ 0x7fff91d0cf33 _objc_terminate()
    @ 0x7fff911ffd69 std::__terminate()
    @ 0x7fff911ff7de __cxa_throw
    @ 0x7fff911cd441 std::__1::__throw_system_error()
    @ 0x112631a49 _ZZ11synchronizeINSt3__115recursive_mutexEE12SynchronizedIT_EPS3_ENKUlPS1_E_clES6_
    @ 0x112631a28 _ZZ11synchronizeINSt3__115recursive_mutexEE12SynchronizedIT_EPS3_ENUlPS1_E_8__invokeES6_
    @ 0x112631af9 Synchronized<>::Synchronized()
    @ 0x1126319fd Synchronized<>::Synchronized()
    @ 0x1125fe49a synchronize<>()
    @ 0x1155690f2 process::ProcessBase::enqueue()
    @ 0x115583aad process::ProcessManager::deliver()
    @ 0x115583696 process::ProcessManager::deliver()
    @ 0x115591aa2 process::internal::dispatch()
    @ 0x11386f147 process::dispatch<>()
    @ 0x11386f04f _ZZN7process5delayIN5mesos8internal5slave5SlaveEEENS_5TimerERK8DurationRKNS_3PIDIT_EEMSA_FvvEENKUlvE_clEv
    @ 0x11386f00d _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN7process5delayIN5mesos8internal5slave5SlaveEEENS3_5TimerERK8DurationRKNS3_3PIDIT_EEMSE_FvvEEUlvE_EEEvDpOT_
    @ 0x11386edc9 _ZNSt3__110__function6__funcIZN7process5delayIN5mesos8internal5slave5SlaveEEENS2_5TimerERK8DurationRKNS2_3PIDIT_EEMSD_FvvEEUlvE_NS_9allocatorISJ_EEFvvEEclEv
    @ 0x11223765e std::__1::function<>::operator()()
    @ 0x11554e489 process::Timer::operator()()
    @ 0x11554e1c9 process::timedout()
    @ 0x11560df89 _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRNS_6__bindIPFvRKNS_4listIN7process5TimerENS_9allocatorIS6_EEEEEJRNS_12placeholders4__phILi1EEEEEESB_EEEvDpOT_
    @ 0x11560dc79 _ZNSt3__110__function6__funcINS_6__bindIPFvRKNS_4listIN7process5TimerENS_9allocatorIS5_EEEEEJRNS_12placeholders4__phILi1EEEEEENS6_ISH_EESB_EclESA_
    @ 0x112233591 std::__1::function<>::operator()()
    @ 0x115210067 process::clock::tick()
    @ 0x11521774f _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRNS_6__bindIRFvRKN7process4TimeEEJS7_EEEEEEvDpOT_
    @ 0x1152175c9 _ZNSt3__110__function6__funcINS_6__bindIRFvRKN7process4TimeEEJS6_EEENS_9allocatorIS9_EEFvvEEclEv
    @ 0x11223765e std::__1::function<>::operator()()
```

Seems like the review bot ran into a similar problem.


src/tests/slave_recovery_tests.cpp (line 2310)
<https://reviews.apache.org/r/56895/#comment238213>

    Can we rename `_ack` to something that identifies we're waiting for the _agent_ to see the status update acknowledgment?


src/tests/slave_recovery_tests.cpp (line 2401)
<https://reviews.apache.org/r/56895/#comment238206>

    Words in variable names should not be separated with underscores.


src/tests/slave_recovery_tests.cpp (line 2402)
<https://reviews.apache.org/r/56895/#comment238211>

    Seems like we should look up `state.frameworks[frameworkId].executors[executorId]` once and then reuse it.


src/tests/slave_recovery_tests.cpp (line 2405)
<https://reviews.apache.org/r/56895/#comment238205>

    Should probably be `EXPECT`, here and below.


src/tests/slave_recovery_tests.cpp (line 2420)
<https://reviews.apache.org/r/56895/#comment238204>

    Indentation.


- Neil Conway


On Feb. 21, 2017, 9:44 p.m., Megha Sharma wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56895/
> -----------------------------------------------------------
>
> (Updated Feb. 21, 2017, 9:44 p.m.)
>
>
> Review request for mesos, Neil Conway and Jiang Yan Xu.
>
>
> Bugs: MESOS-6223
>     https://issues.apache.org/jira/browse/MESOS-6223
>
>
> Repository: mesos
>
>
> Description
> -------
>
> With partition awareness, agents are now allowed to re-register
> after they have been marked Unreachable. The executors are
> terminated anyway when the agent reboots, so there is no harm in
> letting the agent keep its SlaveID, re-register with the master
> and reconcile the lost executors. This is a prerequisite for
> supporting persistent/restartable tasks in Mesos.
>
>
> Diffs
> -----
>
>   src/slave/slave.hpp 3b0aea4e3e9a17501077beccbccaab4abbe11af2
>   src/slave/slave.cpp 7564e8d39530794131dbbc928fcbc59fb65ef471
>   src/slave/state.hpp a497ce1f58fb8dc7718ee5bb10bc62dd7479efa5
>   src/slave/state.cpp f8e7cdd4df0a3c5d62d89edd11844527084f2baa
>   src/tests/slave_recovery_tests.cpp 0e295915fea0a7314e173857249bd8726eeccd76
>
> Diff: https://reviews.apache.org/r/56895/diff/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Megha Sharma
>
>
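
For the comment on src/tests/slave_recovery_tests.cpp (line 2402), here is a minimal sketch of the look-up-once idea. Everything in it is a hypothetical stand-in: `RecoveredState`, `RecoveredFramework`, `RecoveredExecutor`, the `hasLatestRun` field and the `findExecutor` helper are invented for illustration and keyed by plain strings; they are not the definitions in src/slave/state.hpp. Plain `assert` is used only to keep the sketch dependency-free, so treat this as an illustration of the pattern rather than a drop-in change.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified stand-ins for the recovered agent state.
// The real types in src/slave/state.hpp are richer and use ID types as keys.
struct RecoveredExecutor {
  bool hasLatestRun = false;  // Assumed field, for illustration only.
};

struct RecoveredFramework {
  std::map<std::string, RecoveredExecutor> executors;
};

struct RecoveredState {
  std::map<std::string, RecoveredFramework> frameworks;
};

// Performs the nested lookup once and returns a pointer to the executor
// entry, or nullptr if the framework or the executor is missing.
// Using find() avoids the default-inserting behavior of operator[].
const RecoveredExecutor* findExecutor(
    const RecoveredState& state,
    const std::string& frameworkId,
    const std::string& executorId)
{
  auto framework = state.frameworks.find(frameworkId);
  if (framework == state.frameworks.end()) {
    return nullptr;
  }

  auto executor = framework->second.executors.find(executorId);
  if (executor == framework->second.executors.end()) {
    return nullptr;
  }

  return &executor->second;
}

// Usage: look the executor up once, then reuse the result for every check.
void checkRecoveredExecutor(const RecoveredState& state)
{
  const RecoveredExecutor* executor =
    findExecutor(state, "framework", "executor");

  assert(executor != nullptr);
  assert(executor->hasLatestRun);
}
```

In the test itself, the single lookup result would be checked once and then reused for the later expectations, instead of repeating `state.frameworks[frameworkId].executors[executorId]` on each line.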