> On Oct. 5, 2017, 4:40 a.m., Gaston Kleiman wrote:
> > src/tests/default_executor_tests.cpp
> > Lines 1273-1277 (patched)
> > <https://reviews.apache.org/r/62775/diff/1/?file=1846112#file1846112line1273>
> >
> > You could use the `KillPolicyTestHelper` with `--sleep_duration=0`
> > instead of this.
> >
> > If you do that, you don't need to mark the test as `ROOT_INTERNET_CURL`
> > and it will be run more often.
>
> Qian Zhang wrote:
> Thanks, I have tried it, and found an issue about this test: When I use
> the `nginx:alpine` image and ran this test, I found:
> ```
> [ RUN ]
> MesosContainerizer/DefaultExecutorTest.ROOT_INTERNET_CURL_NoTransitionFromKillingToFinished/0
> ...
> I1006 06:32:32.615941 29160 default_executor.cpp:1004] Killing task
> 6eacb18a-69f6-48ca-8ffd-4134b75c8525 running in child container
> 10f2af6b-232d-485e-86a0-9bbdd419a280.f6c078ba-9c08-4cb0-80fc-18bc154477be
> with SIGTERM signal
> I1006 06:32:32.616163 29160 default_executor.cpp:1026] Scheduling
> escalation to SIGKILL in 3secs from now
> I1006 06:32:32.664510 29177 default_executor.cpp:185] Received
> ACKNOWLEDGED event
> I1006 06:32:32.700647 29180 default_executor.cpp:842] Child container
> 10f2af6b-232d-485e-86a0-9bbdd419a280.f6c078ba-9c08-4cb0-80fc-18bc154477be of
> task '6eacb18a-69f6-48ca-8ffd-4134b75c8525' in state TASK_KILLED terminated
> with signal Terminated
> ...
> ```
>
> And when I use `KillPolicyTestHelper` with `--sleep_duration=0` and run
> this test, I found:
> ```
> [ RUN ]
> MesosContainerizer/DefaultExecutorTest.ROOT_NoTransitionFromKillingToFinished/0
> I1006 06:38:32.138226 30407 default_executor.cpp:1004] Killing task
> 636f6b03-b083-4522-813e-aa2122e60915 running in child container
> c9af5166-6aee-4cb3-890e-2846fcb1179e.0994b090-5e88-4a36-a63a-3eacbbbabf89
> with SIGTERM signal
> I1006 06:38:32.138247 30407 default_executor.cpp:1026] Scheduling
> escalation to SIGKILL in 3secs from now
> I1006 06:38:32.187152 30421 default_executor.cpp:185] Received
> ACKNOWLEDGED event
> I1006 06:38:32.225972 30423 default_executor.cpp:842] Child container
> c9af5166-6aee-4cb3-890e-2846fcb1179e.0994b090-5e88-4a36-a63a-3eacbbbabf89 of
> task '636f6b03-b083-4522-813e-aa2122e60915' in state TASK_KILLED terminated
> with signal Killed
> ```
>
> So as you see, the child container was terminated with either `signal
> Terminated` (SIGTERM) or `signal Killed` (SIGKILL), this seems not correct,
> because when I run the same task group (`nginx:alpine` or
> `KillPolicyTestHelper`) with `mesos-execute`, I see:
> ```
> I1006 01:35:42.375380 12666 default_executor.cpp:1004] Killing task test1
> running in child container
> 814d48e9-4776-420a-8194-e61c820315be.921436e3-238e-4b86-a644-189c40d44da8
> with SIGTERM signal
> I1006 01:35:42.375432 12666 default_executor.cpp:1026] Scheduling
> escalation to SIGKILL in 3secs from now
> I1006 01:35:42.530655 12635 default_executor.cpp:842] Child container
> 814d48e9-4776-420a-8194-e61c820315be.921436e3-238e-4b86-a644-189c40d44da8 of
> task 'test1' in state TASK_KILLED exited with status 0
> ```
> This time the child container exited with status 0 rather than terminated
> with any signal, I think this is correct behavior. So it seems as long as I
> run the test, the task (child container) never exits with 0, this is not what
> we expect. I am not sure what's wrong with test, any comments?
>
> Gaston Kleiman wrote:
> Have you tried setting `CommandInfo.shell` to `false` and adding
> `--sleep_duration=0` to `CommandInfo.arguments`? If `CommandInfo.shell` is
> set to `true`, `sh` will receive the SIGTERM signal and exit instead of
> propagating it to the helper.
Yeah, I have already set it, here is the diff on top of this patch:
```
- v1::CommandInfo command;
- command.set_shell(false);
+ v1::CommandInfo commandInfo;
+ commandInfo.set_shell(false);
+ commandInfo.set_value(getTestHelperPath("test-helper"));
+ commandInfo.add_arguments("test-helper");
+ commandInfo.add_arguments(KillPolicyTestHelper::NAME);
+ commandInfo.add_arguments("--sleep_duration=1");
v1::TaskInfo taskInfo = v1::createTask(
agentId,
v1::Resources::parse("cpus:0.1;mem:32;disk:32").get(),
- command);
-
- // The "nginx:alpine" container returns an "EXIT_STATUS" of 0 on
- // receiving a SIGTERM.
- mesos::v1::Image image;
- image.set_type(mesos::v1::Image::DOCKER);
- image.mutable_docker()->set_name("nginx:alpine");
-
- mesos::v1::ContainerInfo* container = taskInfo.mutable_container();
- container->set_type(mesos::v1::ContainerInfo::MESOS);
- container->mutable_mesos()->mutable_image()->CopyFrom(image);
+ commandInfo);
```
After debugging, I found the nested container status file does not exist:
```
ls: cannot access
/tmp/MesosContainerizer_DefaultExecutorTest_ROOT_NoTransitionFromKillingToFinished_0_kztkSB/containers/c649ec7d-972c-4d4c-8c3a-14d6fdc44bce/containers/3cafd0f9-f8c7-49b9-8865-6c74f9483dc0/status:
No such file or directory
```
But this file exists when the task is running, I am not sure why it gets
deleted during the test. If this file does not exist, the nested container will
be considered as termiated by SIGKILL (see
https://github.com/apache/mesos/blob/1.4.0/src/slave/containerizer/mesos/containerizer.cpp#L2600).
- Qian
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62775/#review187126
-----------------------------------------------------------
On Oct. 4, 2017, 11:48 p.m., Qian Zhang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62775/
> -----------------------------------------------------------
>
> (Updated Oct. 4, 2017, 11:48 p.m.)
>
>
> Review request for mesos, Anand Mazumdar and Vinod Kone.
>
>
> Bugs: MESOS-7975
> https://issues.apache.org/jira/browse/MESOS-7975
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Added a test `ROOT_INTERNET_CURL_NoTransitionFromKillingToFinished`.
>
>
> Diffs
> -----
>
> src/tests/default_executor_tests.cpp
> 2b4c643b8fb2fc8f2a5e98984ae1c267f66885d1
>
>
> Diff: https://reviews.apache.org/r/62775/diff/1/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Qian Zhang
>
>