Re: Review Request 41285: Initialized `pid` to None() instead of `UPID()` in Slave.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/41285/ --- (Updated Jan. 19, 2016, 10:28 p.m.) Review request for mesos, Ben Mahler, Jie Yu, and Vinod Kone. Changes --- Updated Depends on and moved this out of the existing review chain. This should go in 0.27. Bugs: MESOS-3550 https://issues.apache.org/jira/browse/MESOS-3550 Repository: mesos Description --- This was earlier pointed out by Jie on https://reviews.apache.org/r/38874. Currently, most of our logic for finding the executor type is based on the fields `pid`/`http` in the `Executor` struct. We were erroneously setting the initial `pid` value to be `UPID()` instead of being `None()`. The value of `pid` being `UPID()` is only possible in this scenario: - We were unable to find the type of executor upon agent recovery i.e. no pid/http marker file was found. We then mark this special case as `pid=UPID()`. This special case helps us while recovery in the following ways: - If the agent crashed before checkpointing the pid file, both `executor->pid` and `executor->http` would be `None()`. This is similar to the case for HTTP based executors (pid/http being `None`). In order, to distinguish between these two cases, we set the `pid=UPID()`. This helps us in not shutting the HTTP executor accidently by destroying the container when the agent is recovered using `recovery=cleanup` - This special value also helps us to do better logging by correctly distinguishing between HTTP based executors and agent dying before checkpointing the pid/http marker file. ( Both have `pid/http` as `None`) Diffs - src/slave/slave.cpp 90d0fecd2d83fd174134870a577ac59d79c0006f Diff: https://reviews.apache.org/r/41285/diff/ Testing --- make check Thanks, Anand Mazumdar
Re: Review Request 41285: Initialized `pid` to None() instead of `UPID()` in Slave.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/41285/ --- (Updated Jan. 15, 2016, 3:15 a.m.) Review request for mesos, Ben Mahler, Jie Yu, and Vinod Kone. Changes --- Updated description. Bugs: MESOS-3550 https://issues.apache.org/jira/browse/MESOS-3550 Repository: mesos Description (updated) --- This was earlier pointed out by Jie on https://reviews.apache.org/r/38874. Currently, most of our logic for finding the executor type is based on the fields `pid`/`http` in the `Executor` struct. We were erroneously setting the initial `pid` value to be `UPID()` instead of being `None()`. The value of `pid` being `UPID()` is only possible in this scenario: - We were unable to find the type of executor upon agent recovery i.e. no pid/http marker file was found. We then mark this special case as `pid=UPID()`. This special case helps us while recovery in the following ways: - If the agent crashed before checkpointing the pid file, both `executor->pid` and `executor->http` would be `None()`. This is similar to the case for HTTP based executors (pid/http being `None`). In order, to distinguish between these two cases, we set the `pid=UPID()`. This helps us in not shutting the HTTP executor accidently by destroying the container when the agent is recovered using `recovery=cleanup` - This special value also helps us to do better logging by correctly distinguishing between HTTP based executors and agent dying before checkpointing the pid/http marker file. ( Both have `pid/http` as `None`) Diffs - src/slave/slave.cpp 90d0fecd2d83fd174134870a577ac59d79c0006f Diff: https://reviews.apache.org/r/41285/diff/ Testing --- make check Thanks, Anand Mazumdar
Re: Review Request 41285: Initialized `pid` to None() instead of `UPID()` in Slave.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/41285/ --- (Updated Jan. 12, 2016, 5:37 p.m.) Review request for mesos, Ben Mahler, Jie Yu, and Vinod Kone. Changes --- Modified summary. Summary (updated) - Initialized `pid` to None() instead of `UPID()` in Slave. Bugs: MESOS-3550 https://issues.apache.org/jira/browse/MESOS-3550 Repository: mesos Description --- This was earlier pointed out by Jie on https://reviews.apache.org/r/38874 . We never observed it as we had no tests for recovery of HTTP executors/receiving status updates for them. Diffs - src/slave/slave.cpp 90d0fecd2d83fd174134870a577ac59d79c0006f Diff: https://reviews.apache.org/r/41285/diff/ Testing --- make check Thanks, Anand Mazumdar
Re: Review Request 41285: Initialized `pid` to None() instead of `UPID()` in Slave
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/41285/ --- (Updated Jan. 12, 2016, 9:21 a.m.) Review request for mesos, Ben Mahler, Jie Yu, and Vinod Kone. Changes --- Rebased Bugs: MESOS-3550 https://issues.apache.org/jira/browse/MESOS-3550 Repository: mesos Description --- This was earlier pointed out by Jie on https://reviews.apache.org/r/38874 . We never observed it as we had no tests for recovery of HTTP executors/receiving status updates for them. Diffs (updated) - src/slave/slave.cpp 90d0fecd2d83fd174134870a577ac59d79c0006f Diff: https://reviews.apache.org/r/41285/diff/ Testing --- make check Thanks, Anand Mazumdar