> On May 7, 2015, 4:53 p.m., Jie Yu wrote:
> > Overall, LGTM! The only reason I don't give a ship it here is because the 
> > tests. Could you please add an integration test in 
> > docker_containerizer_tests.cpp?

I'll add the tests tomorrow and update the diff.


> On May 7, 2015, 4:53 p.m., Jie Yu wrote:
> > src/examples/docker_no_executor_framework.cpp, line 49
> > <https://reviews.apache.org/r/33249/diff/5/?file=952542#file952542line49>
> >
> >     Why this change?

I was using this to test/develop, so it was noisy and unnecessary to have 5 
tasks started.  I'll revert.


> On May 7, 2015, 4:53 p.m., Jie Yu wrote:
> > src/examples/docker_no_executor_framework.cpp, lines 145-148
> > <https://reviews.apache.org/r/33249/diff/5/?file=952542#file952542line145>
> >
> >     This looks wiered to me. Are you testing the failure case here? If 
> > that's the case, could you please instead write a gtest based test (not in 
> > the example framework)?

Sure, I'll write test.


> On May 7, 2015, 4:53 p.m., Jie Yu wrote:
> > src/slave/containerizer/mesos/containerizer.cpp, line 586
> > <https://reviews.apache.org/r/33249/diff/5/?file=952547#file952547line586>
> >
> >     This changes the semantics a little bit. Previously, `killed` in the 
> > Termination will be set to false (framework will thus receive TASK_LOST). 
> > Now `killed` will be set to true (framework will receive TASK_FAILED).
> >     
> >     However, I think that's fine! The semantics around `killed` in 
> > Termination is a definitely weired to me. With a REASON field in the 
> > Termination, we probably can get rid of the `killed`. Could you please add 
> > a TODO in Termination protobuf?

Grrrrr... killed is required.  But since this is an internal protobuf does that 
matter?  Maybe we can just break backwards compatibility?

Adding Reason (or rather setting it) was much less straight forward than I 
expected it to be.  I talked to Jörg about this and he said he was willing to 
take it on next week.  I'll note it here and reference MESOS-2035.


- Jay


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33249/#review82940
-----------------------------------------------------------


On May 7, 2015, 11:57 a.m., Jay Buffington wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33249/
> -----------------------------------------------------------
> 
> (Updated May 7, 2015, 11:57 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Jie Yu, and Timothy Chen.
> 
> 
> Bugs: MESOS-2020 and MESOS-2656
>     https://issues.apache.org/jira/browse/MESOS-2020
>     https://issues.apache.org/jira/browse/MESOS-2656
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Send statusUpdate to scheduler on containerizer launch failure
> 
> This review is actually for three commits.  The three commits are separated 
> out
> on github here:
> https://github.com/apache/mesos/compare/master...jaybuff:report-launch-failure?expand=1
> 
> Commit #1:
> docker-no-executor-framework example should send ContainerInfo
> 
> docker-no-executor-framework stopped working when the protobuf message
> changed during development.  Apparently noone noticed because we don't
> run it as part of our test suite.  I used it to develop a fix for
> proper error reporting, so I fixed it.
> 
> Commit #2:
> serialize wait and destroy in containerizer
> 
> Containerizers keep track of all the containers they have created.  When
> a container is destroy()'ed the containerizer removes the container from
> its internal hashmap.  This means that wait() must be called before the
> container is destroyed.
> 
> When the docker containerizer fails because of a "docker pull" or
> "docker run" failure, we end up with a race condition where wait() does
> not properly return the Termination, so we ended up with a default
> failure message of "Abnormal Executor Termination" which isn't very
> useful.
> 
> This commit solves this issue by not destroying the container until
> after wait is called in the failure case.
> 
> Fixes MESOS-2656
> 
> Commit #3:
> send scheduler containerizer launch failure errors
> 
> If a method in the launch sequence returns a failure the error message
> wasn't making it back to the scheduler.
> 
> Fixes MESOS-2020
> 
> 
> Diffs
> -----
> 
>   src/examples/docker_no_executor_framework.cpp 
> d5385d9295d30a6039bab9938f3270006582d3da 
>   src/slave/containerizer/containerizer.hpp 
> 56c088abb036aa26c323c3142fd27ea29ed51d4f 
>   src/slave/containerizer/docker.hpp b25ec55bf3cd30d6e8a804d09d90c632a7d12e3f 
>   src/slave/containerizer/docker.cpp f9fc89ad7e3c853c3f9f6dcf9aa68e54dc1888c6 
>   src/slave/containerizer/mesos/containerizer.hpp 
> 5e5f13ed8a71ff9510b40b6032d00fd16d312622 
>   src/slave/containerizer/mesos/containerizer.cpp 
> f2587280dc0e1d566d2b856a80358c7b3896c603 
>   src/slave/slave.cpp c78ee3c9e7fc38ad364e83f4abe267e86bfbbc13 
> 
> Diff: https://reviews.apache.org/r/33249/diff/
> 
> 
> Testing
> -------
> 
> I used examples/docker_no_executor_framework.cpp and make check
> 
> 
> Thanks,
> 
> Jay Buffington
> 
>

Reply via email to