Re: Review Request 58754: Altered the task command used in an agent test.

2017-04-28 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58754/#review173306
---



Patch looks great!

Reviews applied: [58754]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On April 28, 2017, 2:13 a.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58754/
> ---
> 
> (Updated April 28, 2017, 2:13 a.m.)
> 
> 
> Review request for mesos, Joseph Wu and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Previously, the test `SlaveTest.RestartSlaveRequireExecutorAuthentication`
> used the command 'cat' in an attempt to run a long-lived task. However,
> this command seems to yield a task that will terminate prematurely in some
> testing environments. This patch updates the task to use `sleep 120` instead.
> 
> 
> Diffs
> -
> 
>   src/tests/slave_tests.cpp 8c97dc6d088708d301dc3ccf90d413fd785b782f 
> 
> 
> Diff: https://reviews.apache.org/r/58754/diff/1/
> 
> 
> Testing
> ---
> 
> Run in CI to verify that the test is no longer flaky.
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 58754: Altered the task command used in an agent test.

2017-04-27 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58754/#review173284
---




src/tests/slave_tests.cpp
Lines 6255-6256 (original), 6255 (patched)


While this certainly removes the flakiness, I wonder if this is masking an 
underlying race condition in the containerizer.

From the logs I've seen, the `cat` command seems to be exiting due to a 
pipe closure.

In the past, commands like this would be launched sharing the stdin of the 
agent process (which in tests, is equal to the test process).  But after the 
introduction of the IO switchboard, there are more layers to consider:

1) If the container is launched with a `tty_info` (not the case in this 
test), the stdin will come from a TTY.
2) In local mode, the stdin is shared with the parent process.
3) In normal mode (this test), the stdin will be a pipe to the IO 
switchboard server process.

Perhaps, when the agent gets restarted in the test, it ends up killing the 
IO switchboard server somehow?  The agent restart is a semi-graceful shutdown, 
meaning it may call destructors.  In an actual agent restart, there may not be 
time to call destructors.

So TL;DR: Investigate if the IO Switchboard server is dying in some test 
runs.


- Joseph Wu


On April 27, 2017, 4:13 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58754/
> ---
> 
> (Updated April 27, 2017, 4:13 p.m.)
> 
> 
> Review request for mesos, Joseph Wu and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Previously, the test `SlaveTest.RestartSlaveRequireExecutorAuthentication`
> used the command 'cat' in an attempt to run a long-lived task. However,
> this command seems to yield a task that will terminate prematurely in some
> testing environments. This patch updates the task to use `sleep 120` instead.
> 
> 
> Diffs
> -
> 
>   src/tests/slave_tests.cpp 8c97dc6d088708d301dc3ccf90d413fd785b782f 
> 
> 
> Diff: https://reviews.apache.org/r/58754/diff/1/
> 
> 
> Testing
> ---
> 
> Run in CI to verify that the test is no longer flaky.
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 58754: Altered the task command used in an agent test.

2017-04-27 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58754/
---

(Updated April 27, 2017, 11:13 p.m.)


Review request for mesos, Joseph Wu and Vinod Kone.


Repository: mesos


Description (updated)
---

Previously, the test `SlaveTest.RestartSlaveRequireExecutorAuthentication`
used the command 'cat' in an attempt to run a long-lived task. However,
this command seems to yield a task that will terminate prematurely in some
testing environments. This patch updates the task to use `sleep 120` instead.


Diffs
-

  src/tests/slave_tests.cpp 8c97dc6d088708d301dc3ccf90d413fd785b782f 


Diff: https://reviews.apache.org/r/58754/diff/1/


Testing (updated)
---

Run in CI to verify that the test is no longer flaky.


Thanks,

Greg Mann