Re: Review Request 46307: Ignored subsequent status update in HealthStatusChange tests.

haosdent huang Tue, 19 Apr 2016 07:51:32 -0700


> On April 19, 2016, 2:35 p.m., Neil Conway wrote:
> > This patch does not solve the flakiness for me: failed once after 2 
> > iterations, then again after 77 iterations. Verbose test log here: 
> > https://gist.github.com/neilconway/e6134b4717ee022e7fc32a1f95619fa9
> 
> haosdent huang wrote:
>     Thank you very much for testing! I saw that you use `vagrant@archlinux`; 
> could you share your Vagrantfile with me, so that I can try to reproduce 
> this locally?


```
I0420 00:33:13.497138 15400 http.cpp:313] HTTP GET for /master/state from 
10.0.2.15:44478
Received task health update, healthy: true
I0420 00:33:13.502598 15400 slave.cpp:3201] Handling status update TASK_RUNNING 
(UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in health state healthy 
of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000 from 
executor(1)@10.0.2.15:37107
I0420 00:33:13.504456 15400 status_update_manager.cpp:320] Received status 
update TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in 
health state healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000
I0420 00:33:13.505009 15400 slave.cpp:3599] Forwarding the update TASK_RUNNING 
(UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in health state healthy 
of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000 to [email protected]:41408
I0420 00:33:13.505167 15400 slave.cpp:3509] Sending acknowledgement for status 
update TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in 
health state healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000 to 
executor(1)@10.0.2.15:37107
I0420 00:33:13.505524 15400 master.cpp:5069] Status update TASK_RUNNING (UUID: 
e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in health state healthy of 
framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000 from agent 
7cf5923c-3d03-4ed6-826a-efa97f54e765-S0 at slave(76)@10.0.2.15:41408 
(archlinux.vagrant.vm)
I0420 00:33:13.505602 15400 master.cpp:5117] Forwarding status update 
TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in health 
state healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000
I0420 00:33:13.505738 15400 master.cpp:6725] Updating the state of task 1 of 
framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000 (latest state: 
TASK_RUNNING, status update state: TASK_RUNNING)
I0420 00:33:13.505985 15400 master.cpp:4224] Processing ACKNOWLEDGE call 
e19c76cc-096a-4398-b616-afb628b8e5b8 for task 1 of framework 
7cf5923c-3d03-4ed6-826a-efa97f54e765-0000 (default) at 
[email protected]:41408 on agent 
7cf5923c-3d03-4ed6-826a-efa97f54e765-S0
I0420 00:33:13.506142 15400 status_update_manager.cpp:392] Received status 
update acknowledgement (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 
of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000
rm: cannot remove '/tmp/1NKfr1': No such file or directory
I0420 00:33:13.508203 15400 http.cpp:178] HTTP GET for /slave(76)/state from 
10.0.2.15:44482
../../mesos/src/tests/health_check_tests.cpp:647: Failure
Value of: (find).get()
  Actual: 16-byte object <05-00 00-00 00-00 00-00 90-C4 2D-03 00-00 00-00>
Expected: false
Which is: false
*** Aborted at 1461076393 (unix time) try "date -d @1461076393" if you are 
using GNU date ***
PC: @          0x1899ba0 testing::UnitTest::AddTestPartResult()
*** SIGSEGV (@0x0) received by PID 15381 (TID 0x7f0aa958a7c0) from PID 0; stack 
trace: ***

```
It looks like we get `true` here. Let me figure out how to fix this.


- haosdent


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46307/#review129534
-----------------------------------------------------------


On April 17, 2016, 5:15 p.m., haosdent huang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46307/
> -----------------------------------------------------------
> 
> (Updated April 17, 2016, 5:15 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Ben Mahler, Greg Mann, Neil 
> Conway, and Timothy Chen.
> 
> 
> Bugs: MESOS-1802
>     https://issues.apache.org/jira/browse/MESOS-1802
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> We need to ignore subsequent status updates in HealthStatusChange
> tests. In our test cases, we set `consecutive_failures` to 3 in the
> HealthCheck message definition. But the `consecutiveFailures` counter
> in `mesos-health-check` is reset to 0 after a successful check. It is
> therefore possible to continue to receive status updates before we
> stop the driver.
> 
> 
> Diffs
> -----
> 
>   src/tests/health_check_tests.cpp 1c4a554ab07731963a4a38e3ae40b0323bf317bb 
> 
> Diff: https://reviews.apache.org/r/46307/diff/
> 
> 
> Testing
> -------
> 
> # I still could not reproduce the problem with the old code after repeated 
> test runs, so there seems to be no way to verify whether my assumption is 
> correct.
> 
> 
> Thanks,
> 
> haosdent huang
> 
>
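
To make the reasoning in the quoted Description concrete, here is a minimal sketch of a failure counter that is reset by every successful check. It is only a toy model, not the `mesos-health-check` code: `HealthModel`, `check()` and `run()` are hypothetical names, and the rule that an update is emitted only when the health state changes is an assumption made for illustration.

```cpp
#include <vector>

// Hypothetical, simplified model of a health checker. The names are
// illustrative and are not taken from the Mesos sources.
struct HealthModel {
  int threshold;                // Plays the role of `consecutive_failures`.
  int consecutiveFailures = 0;  // Reset to 0 by any successful check.
  bool killed = false;          // Set once the threshold is reached.
  bool haveState = false;       // False until the first check result arrives.
  bool healthy = false;         // Last reported health state.
  std::vector<bool> updates;    // Health value carried by each emitted update.

  explicit HealthModel(int threshold_) : threshold(threshold_) {}

  // Feed one check result. Assumption: an update is emitted only when the
  // health state changes, or on the very first result.
  void check(bool ok) {
    if (killed) return;  // Nothing more happens once the task is killed.

    if (ok) {
      consecutiveFailures = 0;
    } else if (++consecutiveFailures >= threshold) {
      killed = true;
    }

    if (!haveState || healthy != ok) {
      haveState = true;
      healthy = ok;
      updates.push_back(ok);
    }
  }
};

// Runs a sequence of check results through a fresh model.
inline HealthModel run(int threshold, const std::vector<bool>& results) {
  HealthModel model(threshold);
  for (bool ok : results) model.check(ok);
  return model;
}
```

With a threshold of 3, feeding `run(3, {true, false, true, false, false})` never reaches the threshold, because the success in the middle resets the counter, yet the model keeps emitting updates as the state flips. A test that expects a fixed number of updates can therefore see extra ones before the driver is stopped.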
