> On April 19, 2016, 2:35 p.m., Neil Conway wrote:
> > This patch does not solve the flakiness for me: the test failed once after 2
> > iterations, then again after 77 iterations. Verbose test log here:
> > https://gist.github.com/neilconway/e6134b4717ee022e7fc32a1f95619fa9
>
> haosdent huang wrote:
>     Thank you very much for testing! I saw you are using `vagrant@archlinux`. Could
>     you share your Vagrantfile with me so that I can try to reproduce this
>     locally?
```
I0420 00:33:13.497138 15400 http.cpp:313] HTTP GET for /master/state from 10.0.2.15:44478
Received task health update, healthy: true
I0420 00:33:13.502598 15400 slave.cpp:3201] Handling status update TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in health state healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000 from executor(1)@10.0.2.15:37107
I0420 00:33:13.504456 15400 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in health state healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000
I0420 00:33:13.505009 15400 slave.cpp:3599] Forwarding the update TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in health state healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000 to [email protected]:41408
I0420 00:33:13.505167 15400 slave.cpp:3509] Sending acknowledgement for status update TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in health state healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000 to executor(1)@10.0.2.15:37107
I0420 00:33:13.505524 15400 master.cpp:5069] Status update TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in health state healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000 from agent 7cf5923c-3d03-4ed6-826a-efa97f54e765-S0 at slave(76)@10.0.2.15:41408 (archlinux.vagrant.vm)
I0420 00:33:13.505602 15400 master.cpp:5117] Forwarding status update TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in health state healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000
I0420 00:33:13.505738 15400 master.cpp:6725] Updating the state of task 1 of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000 (latest state: TASK_RUNNING, status update state: TASK_RUNNING)
I0420 00:33:13.505985 15400 master.cpp:4224] Processing ACKNOWLEDGE call e19c76cc-096a-4398-b616-afb628b8e5b8 for task 1 of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000 (default) at [email protected]:41408 on agent 7cf5923c-3d03-4ed6-826a-efa97f54e765-S0
I0420 00:33:13.506142 15400 status_update_manager.cpp:392] Received status update acknowledgement (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000
rm: cannot remove '/tmp/1NKfr1': No such file or directory
I0420 00:33:13.508203 15400 http.cpp:178] HTTP GET for /slave(76)/state from 10.0.2.15:44482
../../mesos/src/tests/health_check_tests.cpp:647: Failure
Value of: (find).get()
Actual: 16-byte object <05-00 00-00 00-00 00-00 90-C4 2D-03 00-00 00-00>
Expected: false
Which is: false
*** Aborted at 1461076393 (unix time) try "date -d @1461076393" if you are using GNU date ***
PC: @ 0x1899ba0 testing::UnitTest::AddTestPartResult()
*** SIGSEGV (@0x0) received by PID 15381 (TID 0x7f0aa958a7c0) from PID 0; stack trace: ***
```

It looks like we got `true` here. Let me figure out how to fix this.


- haosdent


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46307/#review129534
-----------------------------------------------------------


On April 17, 2016, 5:15 p.m., haosdent huang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46307/
> -----------------------------------------------------------
> 
> (Updated April 17, 2016, 5:15 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Ben Mahler, Greg Mann, Neil Conway, and Timothy Chen.
> 
> 
> Bugs: MESOS-1802
>     https://issues.apache.org/jira/browse/MESOS-1802
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> We need to ignore subsequent status updates in the HealthStatusChange
> tests. In our test cases, we set `consecutive_failures` to 3 in the
> `HealthCheck` message definition. However, the `consecutiveFailures`
> counter in `mesos-health-check` is reset to 0 after a successful check,
> so we may continue to receive status updates before we stop the driver.
> 
> 
> Diffs
> -----
> 
>   src/tests/health_check_tests.cpp 1c4a554ab07731963a4a38e3ae40b0323bf317bb 
> 
> Diff: https://reviews.apache.org/r/46307/diff/
> 
> 
> Testing
> -------
> 
> I still could not reproduce the problem with the old code after repeated
> test runs, so there seems to be no way to verify whether my assumption is
> correct.
> 
> 
> Thanks,
> 
> haosdent huang
> 
>
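To make the counter behavior from the review description concrete, here is a minimal C++ sketch. The `FailureCounter` class and `updatesBeforeKill` function are hypothetical names invented for illustration; they are not taken from `mesos-health-check`. The sketch also assumes, for simplicity, that every check result yields one status update, which may not match the real executor exactly.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a health checker's failure counter. This is NOT the
// mesos-health-check source; it only mirrors the behavior described in the
// review request: failures are counted, and a success resets the counter.
class FailureCounter {
 public:
  explicit FailureCounter(int maxConsecutiveFailures)
    : maxConsecutiveFailures_(maxConsecutiveFailures) {
    assert(maxConsecutiveFailures > 0);
  }

  // Records one check result; returns true once the task should be killed.
  bool record(bool checkPassed) {
    if (checkPassed) {
      consecutiveFailures_ = 0;  // reset after a successful check
      return false;
    }
    ++consecutiveFailures_;
    return consecutiveFailures_ >= maxConsecutiveFailures_;
  }

 private:
  int maxConsecutiveFailures_;  // plays the role of `consecutive_failures`
  int consecutiveFailures_ = 0;
};

// Feeds a sequence of check results to the counter and returns how many
// results were processed (assumed here to each yield a status update)
// before a kill decision stopped further checks.
int updatesBeforeKill(const std::vector<bool>& checks, int maxFailures) {
  FailureCounter counter(maxFailures);
  int updates = 0;
  for (bool passed : checks) {
    ++updates;
    if (counter.record(passed)) {
      break;  // task killed: no more checks, no more updates
    }
  }
  return updates;
}
```

For example, with a limit of 3, a check that alternates between failing and passing (`{false, true, false, true, false, true}`) never reaches three consecutive failures, so `updatesBeforeKill` returns 6, one per check, while three failures in a row stop after 3. Under this model, a test that only asserts on the first few health changes has to tolerate extra updates arriving until the driver is stopped.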
