Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-09-12 Thread Anand Mazumdar


> On Sept. 8, 2016, 9:14 a.m., Vinod Kone wrote:
> > src/slave/slave.cpp, line 2112
> > 
> >
> > #2085 explicitly calls stringify(). lets be consistent.
> > 
> > also this could've been better written as
> > 
> > ```
> > vector taskIds;
> > foreach (const TaskGroup& taskGroup, taskGroups) {
> >   vector taskIds_;
> >   foreach (const TaskInfo& task, taskGroup.tasks()) {
> > taskIds_.push_back(task.task_id());
> >   }
> >   taskIds.push_back(taskIds_);
> > }
> > 
> > out << stringify(taskIds);
> > ```
> 
> Vinod Kone wrote:
> also realized that master prints task group as a set whereas this code is 
> printing it as a vector! we need to be consistent.

hmm, we should fix the master to not print the task group as a set (sigh!)

The logical equivalent of a `RepeatedPtrField` is either a `vector`/`list`. 
Using a `set` is counter-intuitive.


- Anand


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51477/#review148161
---


On Sept. 8, 2016, 5:19 a.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51477/
> ---
> 
> (Updated Sept. 8, 2016, 5:19 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6076
> https://issues.apache.org/jira/browse/MESOS-6076
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This changes implements the `runTaskGroup()` handler on the
> agent ensuring that task group is sent atomically to the executor
> via the `LAUNCH_GROUP` event. It also refactors `runTask()`/`_runTask()`
> to go through a common handler function. Also, it ensures that all
> tasks in `framework->pending`/`queuedTasks` that are killed before
> running the task group result in all the tasks being killed.
> 
> Review: https://reviews.apache.org/r/51477/
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 4add4c0180ea56039e0d5009bad4d9346128bde6 
>   src/slave/slave.cpp 2da5a2986c427250664b2bf3456039f86e5c6079 
> 
> Diff: https://reviews.apache.org/r/51477/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-09-12 Thread Anand Mazumdar


> On Sept. 8, 2016, 2:28 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, line 2277
> > 
> >
> > just realized that you need to set proper command info for `DEFAULT` 
> > executor here? master is not going to fill it!
> > 
> > consequently, you need to ensure that in `doReliableregistration()` 
> > command info is unset when re-registering `DEFAULT` executor.

Good Catch! I would leave a TODO while committing and would fix it in a follow 
up JIRA issue/review.


- Anand


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51477/#review148184
---


On Sept. 8, 2016, 5:19 a.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51477/
> ---
> 
> (Updated Sept. 8, 2016, 5:19 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6076
> https://issues.apache.org/jira/browse/MESOS-6076
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This changes implements the `runTaskGroup()` handler on the
> agent ensuring that task group is sent atomically to the executor
> via the `LAUNCH_GROUP` event. It also refactors `runTask()`/`_runTask()`
> to go through a common handler function. Also, it ensures that all
> tasks in `framework->pending`/`queuedTasks` that are killed before
> running the task group result in all the tasks being killed.
> 
> Review: https://reviews.apache.org/r/51477/
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 4add4c0180ea56039e0d5009bad4d9346128bde6 
>   src/slave/slave.cpp 2da5a2986c427250664b2bf3456039f86e5c6079 
> 
> Diff: https://reviews.apache.org/r/51477/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-09-08 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51477/#review148184
---




src/slave/slave.cpp (line 2260)


just realized that you need to set proper command info for `DEFAULT` 
executor here? master is not going to fill it!

consequently, you need to ensure that in `doReliableregistration()` command 
info is unset when re-registering `DEFAULT` executor.


- Vinod Kone


On Sept. 8, 2016, 5:19 a.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51477/
> ---
> 
> (Updated Sept. 8, 2016, 5:19 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6076
> https://issues.apache.org/jira/browse/MESOS-6076
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This changes implements the `runTaskGroup()` handler on the
> agent ensuring that task group is sent atomically to the executor
> via the `LAUNCH_GROUP` event. It also refactors `runTask()`/`_runTask()`
> to go through a common handler function. Also, it ensures that all
> tasks in `framework->pending`/`queuedTasks` that are killed before
> running the task group result in all the tasks being killed.
> 
> Review: https://reviews.apache.org/r/51477/
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 4add4c0180ea56039e0d5009bad4d9346128bde6 
>   src/slave/slave.cpp 2da5a2986c427250664b2bf3456039f86e5c6079 
> 
> Diff: https://reviews.apache.org/r/51477/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-09-08 Thread Vinod Kone


> On Sept. 8, 2016, 9:14 a.m., Vinod Kone wrote:
> > src/slave/slave.cpp, line 2112
> > 
> >
> > #2085 explicitly calls stringify(). lets be consistent.
> > 
> > also this could've been better written as
> > 
> > ```
> > vector taskIds;
> > foreach (const TaskGroup& taskGroup, taskGroups) {
> >   vector taskIds_;
> >   foreach (const TaskInfo& task, taskGroup.tasks()) {
> > taskIds_.push_back(task.task_id());
> >   }
> >   taskIds.push_back(taskIds_);
> > }
> > 
> > out << stringify(taskIds);
> > ```

also realized that master prints task group as a set whereas this code is 
printing it as a vector! we need to be consistent.


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51477/#review148161
---


On Sept. 8, 2016, 5:19 a.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51477/
> ---
> 
> (Updated Sept. 8, 2016, 5:19 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6076
> https://issues.apache.org/jira/browse/MESOS-6076
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This changes implements the `runTaskGroup()` handler on the
> agent ensuring that task group is sent atomically to the executor
> via the `LAUNCH_GROUP` event. It also refactors `runTask()`/`_runTask()`
> to go through a common handler function. Also, it ensures that all
> tasks in `framework->pending`/`queuedTasks` that are killed before
> running the task group result in all the tasks being killed.
> 
> Review: https://reviews.apache.org/r/51477/
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 4add4c0180ea56039e0d5009bad4d9346128bde6 
>   src/slave/slave.cpp 2da5a2986c427250664b2bf3456039f86e5c6079 
> 
> Diff: https://reviews.apache.org/r/51477/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-09-08 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51477/#review148161
---


Fix it, then Ship it!





src/slave/slave.cpp (line 2102)


#2085 explicitly calls stringify(). lets be consistent.

also this could've been better written as

```
vector taskIds;
foreach (const TaskGroup& taskGroup, taskGroups) {
  vector taskIds_;
  foreach (const TaskInfo& task, taskGroup.tasks()) {
taskIds_.push_back(task.task_id());
  }
  taskIds.push_back(taskIds_);
}

out << stringify(taskIds);
```



src/slave/slave.cpp (line 2168)


no need for this if statement?



src/slave/slave.cpp (line 2199)


no need for this if statement?


- Vinod Kone


On Sept. 8, 2016, 5:19 a.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51477/
> ---
> 
> (Updated Sept. 8, 2016, 5:19 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6076
> https://issues.apache.org/jira/browse/MESOS-6076
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This changes implements the `runTaskGroup()` handler on the
> agent ensuring that task group is sent atomically to the executor
> via the `LAUNCH_GROUP` event. It also refactors `runTask()`/`_runTask()`
> to go through a common handler function. Also, it ensures that all
> tasks in `framework->pending`/`queuedTasks` that are killed before
> running the task group result in all the tasks being killed.
> 
> Review: https://reviews.apache.org/r/51477/
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 4add4c0180ea56039e0d5009bad4d9346128bde6 
>   src/slave/slave.cpp 2da5a2986c427250664b2bf3456039f86e5c6079 
> 
> Diff: https://reviews.apache.org/r/51477/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-09-08 Thread Guangya Liu


> On 九月 7, 2016, 10:17 a.m., Guangya Liu wrote:
> > src/slave/slave.hpp, line 1027
> > 
> >
> > How about const?
> 
> Anand Mazumdar wrote:
> Can you elaborate? Non member functions can't be marked `const`. Also, 
> did you mean `const` on the return type?
> 
> Guangya Liu wrote:
> Sorry for confuse, yes, what about the return type as const?
> 
> Anand Mazumdar wrote:
> hmm, I don't quite understand what would the `const` specifier on the 
> return type buy us. It's anyways a temporary rvalue that is being returned?

Thanks Anand, got it. We can keep this as unchanged.


> On 九月 7, 2016, 10:17 a.m., Guangya Liu wrote:
> > src/slave/slave.cpp, lines 1988-1989
> > 
> >
> > What about adding `executor state` to the log message?
> 
> Anand Mazumdar wrote:
> Why? Also, we weren't logging the state before either.
> 
> Guangya Liu wrote:
> We actually have the executor state in log message but not here, please 
> refer to https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L1824
> 
> Another point is that this is a new message that you introduced, and the 
> log message for `Executor::REGISTERING` and `Executor::RUNNING` are same, it 
> is better distinguish those log messages by adding the executor state here 
> for debugability.
> 
> Anand Mazumdar wrote:
> hmm, I got confused because it looked orthogonal to this change i.e. we 
> did not do it previously either.
> https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L1847
> https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L1860
> 
> 
> I would clean this up in a separate patch like I did for a few other 
> changes.

Sounds good, yes, I also saw those two lines, but I think that we need a clean 
up for this by adding the `executor state` here to improve debugability. Thanks.


- Guangya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51477/#review147994
---


On 九月 8, 2016, 5:19 a.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51477/
> ---
> 
> (Updated 九月 8, 2016, 5:19 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6076
> https://issues.apache.org/jira/browse/MESOS-6076
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This changes implements the `runTaskGroup()` handler on the
> agent ensuring that task group is sent atomically to the executor
> via the `LAUNCH_GROUP` event. It also refactors `runTask()`/`_runTask()`
> to go through a common handler function. Also, it ensures that all
> tasks in `framework->pending`/`queuedTasks` that are killed before
> running the task group result in all the tasks being killed.
> 
> Review: https://reviews.apache.org/r/51477/
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 4add4c0180ea56039e0d5009bad4d9346128bde6 
>   src/slave/slave.cpp 2da5a2986c427250664b2bf3456039f86e5c6079 
> 
> Diff: https://reviews.apache.org/r/51477/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-09-08 Thread Anand Mazumdar


> On Sept. 8, 2016, 2:54 a.m., Guangya Liu wrote:
> > src/slave/slave.cpp, line 2102
> > 
> >
> > How about also adding `[]` here?
> > 
> > ```
> > out << "tasks [" << stringify(taskIds) << "]";
> > ```

`stringify` for `std::vector` already does that.


- Anand


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51477/#review148132
---


On Sept. 8, 2016, 5:19 a.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51477/
> ---
> 
> (Updated Sept. 8, 2016, 5:19 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6076
> https://issues.apache.org/jira/browse/MESOS-6076
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This changes implements the `runTaskGroup()` handler on the
> agent ensuring that task group is sent atomically to the executor
> via the `LAUNCH_GROUP` event. It also refactors `runTask()`/`_runTask()`
> to go through a common handler function. Also, it ensures that all
> tasks in `framework->pending`/`queuedTasks` that are killed before
> running the task group result in all the tasks being killed.
> 
> Review: https://reviews.apache.org/r/51477/
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 4add4c0180ea56039e0d5009bad4d9346128bde6 
>   src/slave/slave.cpp 2da5a2986c427250664b2bf3456039f86e5c6079 
> 
> Diff: https://reviews.apache.org/r/51477/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-09-08 Thread Anand Mazumdar


> On Sept. 7, 2016, 10:17 a.m., Guangya Liu wrote:
> > src/slave/slave.hpp, line 1027
> > 
> >
> > How about const?
> 
> Anand Mazumdar wrote:
> Can you elaborate? Non member functions can't be marked `const`. Also, 
> did you mean `const` on the return type?
> 
> Guangya Liu wrote:
> Sorry for confuse, yes, what about the return type as const?

hmm, I don't quite understand what would the `const` specifier on the return 
type buy us. It's anyways a temporary rvalue that is being returned?


> On Sept. 7, 2016, 10:17 a.m., Guangya Liu wrote:
> > src/slave/slave.cpp, lines 1988-1989
> > 
> >
> > What about adding `executor state` to the log message?
> 
> Anand Mazumdar wrote:
> Why? Also, we weren't logging the state before either.
> 
> Guangya Liu wrote:
> We actually have the executor state in log message but not here, please 
> refer to https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L1824
> 
> Another point is that this is a new message that you introduced, and the 
> log message for `Executor::REGISTERING` and `Executor::RUNNING` are same, it 
> is better distinguish those log messages by adding the executor state here 
> for debugability.

hmm, I got confused because it looked orthogonal to this change i.e. we did not 
do it previously either.
https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L1847
https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L1860


I would clean this up in a separate patch like I did for a few other changes.


- Anand


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51477/#review147994
---


On Sept. 8, 2016, 5:19 a.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51477/
> ---
> 
> (Updated Sept. 8, 2016, 5:19 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6076
> https://issues.apache.org/jira/browse/MESOS-6076
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This changes implements the `runTaskGroup()` handler on the
> agent ensuring that task group is sent atomically to the executor
> via the `LAUNCH_GROUP` event. It also refactors `runTask()`/`_runTask()`
> to go through a common handler function. Also, it ensures that all
> tasks in `framework->pending`/`queuedTasks` that are killed before
> running the task group result in all the tasks being killed.
> 
> Review: https://reviews.apache.org/r/51477/
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 4add4c0180ea56039e0d5009bad4d9346128bde6 
>   src/slave/slave.cpp 2da5a2986c427250664b2bf3456039f86e5c6079 
> 
> Diff: https://reviews.apache.org/r/51477/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-09-07 Thread Anand Mazumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51477/
---

(Updated Sept. 8, 2016, 5:19 a.m.)


Review request for mesos and Vinod Kone.


Changes
---

Rebased due to `b2101157fd61bbe42c9536935ee9fda44a929ee9`


Bugs: MESOS-6076
https://issues.apache.org/jira/browse/MESOS-6076


Repository: mesos


Description
---

This changes implements the `runTaskGroup()` handler on the
agent ensuring that task group is sent atomically to the executor
via the `LAUNCH_GROUP` event. It also refactors `runTask()`/`_runTask()`
to go through a common handler function. Also, it ensures that all
tasks in `framework->pending`/`queuedTasks` that are killed before
running the task group result in all the tasks being killed.

Review: https://reviews.apache.org/r/51477/


Diffs (updated)
-

  src/slave/slave.hpp 4add4c0180ea56039e0d5009bad4d9346128bde6 
  src/slave/slave.cpp 2da5a2986c427250664b2bf3456039f86e5c6079 

Diff: https://reviews.apache.org/r/51477/diff/


Testing
---

make check


Thanks,

Anand Mazumdar



Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-09-07 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51477/#review148132
---




src/slave/slave.cpp (lines 1839 - 1840)


I think that we always prefer adding the blank space at the beginning as 
#1787, you can fix this in other patches.

```
"The checkpointed resources being used by the task or task group are"
" unknown to the agent",
```



src/slave/slave.cpp (lines 2079 - 2080)


new line here



src/slave/slave.cpp (line 2085)


How about also adding `[]` here?

```
out << "tasks [" << stringify(taskIds) << "]";
```


- Guangya Liu


On 九月 7, 2016, 10:13 p.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51477/
> ---
> 
> (Updated 九月 7, 2016, 10:13 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6076
> https://issues.apache.org/jira/browse/MESOS-6076
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This changes implements the `runTaskGroup()` handler on the
> agent ensuring that task group is sent atomically to the executor
> via the `LAUNCH_GROUP` event. It also refactors `runTask()`/`_runTask()`
> to go through a common handler function. Also, it ensures that all
> tasks in `framework->pending`/`queuedTasks` that are killed before
> running the task group result in all the tasks being killed.
> 
> Review: https://reviews.apache.org/r/51477/
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 4add4c0180ea56039e0d5009bad4d9346128bde6 
>   src/slave/slave.cpp 11664779ed78c0a5913598bb7dd1bb0e793d6b93 
> 
> Diff: https://reviews.apache.org/r/51477/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-09-07 Thread Guangya Liu


> On 九月 7, 2016, 10:17 a.m., Guangya Liu wrote:
> > src/slave/slave.hpp, line 1027
> > 
> >
> > How about const?
> 
> Anand Mazumdar wrote:
> Can you elaborate? Non member functions can't be marked `const`. Also, 
> did you mean `const` on the return type?

Sorry for confuse, yes, what about the return type as const?


> On 九月 7, 2016, 10:17 a.m., Guangya Liu wrote:
> > src/slave/slave.cpp, line 1711
> > 
> >
> > s/for/to
> 
> Anand Mazumdar wrote:
> Why? This looks fine to me.

I do not have strong intention for this, just depend on you, but seems we use 
`send  to ...` more frequently ;-)


> On 九月 7, 2016, 10:17 a.m., Guangya Liu wrote:
> > src/slave/slave.cpp, line 1748
> > 
> >
> > What about enhance the reason as `Task killed before it was launched 
> > due to one task killed in the task group`?
> 
> Anand Mazumdar wrote:
> We haven't been doing such fine grained error messaging on the Master 
> too. We might consider doing it in the future. I would file a JIRA for 
> posterity.

Sounds good, thanks Anand.


> On 九月 7, 2016, 10:17 a.m., Guangya Liu wrote:
> > src/slave/slave.cpp, lines 1988-1989
> > 
> >
> > What about adding `executor state` to the log message?
> 
> Anand Mazumdar wrote:
> Why? Also, we weren't logging the state before either.

We actually have the executor state in log message but not here, please refer 
to https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L1824

Another point is that this is a new message that you introduced, and the log 
message for `Executor::REGISTERING` and `Executor::RUNNING` are same, it is 
better distinguish those log messages by adding the executor state here for 
debugability.


- Guangya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51477/#review147994
---


On 九月 7, 2016, 10:13 p.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51477/
> ---
> 
> (Updated 九月 7, 2016, 10:13 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6076
> https://issues.apache.org/jira/browse/MESOS-6076
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This changes implements the `runTaskGroup()` handler on the
> agent ensuring that task group is sent atomically to the executor
> via the `LAUNCH_GROUP` event. It also refactors `runTask()`/`_runTask()`
> to go through a common handler function. Also, it ensures that all
> tasks in `framework->pending`/`queuedTasks` that are killed before
> running the task group result in all the tasks being killed.
> 
> Review: https://reviews.apache.org/r/51477/
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 4add4c0180ea56039e0d5009bad4d9346128bde6 
>   src/slave/slave.cpp 11664779ed78c0a5913598bb7dd1bb0e793d6b93 
> 
> Diff: https://reviews.apache.org/r/51477/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-09-07 Thread Anand Mazumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51477/#review148068
---



- Anand Mazumdar


On Sept. 6, 2016, 9:25 p.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51477/
> ---
> 
> (Updated Sept. 6, 2016, 9:25 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6076
> https://issues.apache.org/jira/browse/MESOS-6076
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This changes implements the `runTaskGroup()` handler on the
> agent ensuring that task group is sent atomically to the executor
> via the `LAUNCH_GROUP` event. It also refactors `runTask()`/`_runTask()`
> to go through a common handler function. Also, it ensures that all
> tasks in `framework->pending`/`queuedTasks` that are killed before
> running the task group result in all the tasks being killed.
> 
> Review: https://reviews.apache.org/r/51477/
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 4add4c0180ea56039e0d5009bad4d9346128bde6 
>   src/slave/slave.cpp 11664779ed78c0a5913598bb7dd1bb0e793d6b93 
> 
> Diff: https://reviews.apache.org/r/51477/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-09-07 Thread Anand Mazumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51477/
---

(Updated Sept. 7, 2016, 10:13 p.m.)


Review request for mesos and Vinod Kone.


Changes
---

Review comments from Vinod and Guangya.


Bugs: MESOS-6076
https://issues.apache.org/jira/browse/MESOS-6076


Repository: mesos


Description
---

This changes implements the `runTaskGroup()` handler on the
agent ensuring that task group is sent atomically to the executor
via the `LAUNCH_GROUP` event. It also refactors `runTask()`/`_runTask()`
to go through a common handler function. Also, it ensures that all
tasks in `framework->pending`/`queuedTasks` that are killed before
running the task group result in all the tasks being killed.

Review: https://reviews.apache.org/r/51477/


Diffs (updated)
-

  src/slave/slave.hpp 4add4c0180ea56039e0d5009bad4d9346128bde6 
  src/slave/slave.cpp 11664779ed78c0a5913598bb7dd1bb0e793d6b93 

Diff: https://reviews.apache.org/r/51477/diff/


Testing
---

make check


Thanks,

Anand Mazumdar



Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-09-07 Thread Anand Mazumdar


> On Sept. 7, 2016, 10:17 a.m., Guangya Liu wrote:
> > src/slave/slave.cpp, lines 3120-3135
> > 
> >
> > Can you please show more detail and update the comments here for which 
> > case will cause the `executor->queuedTasks` and 
> > `executor->queuedTaskGroups` have same taskId?

We currently store the tasks in queued task groups also in queued tasks.


> On Sept. 7, 2016, 10:17 a.m., Guangya Liu wrote:
> > src/slave/slave.cpp, line 2391
> > 
> >
> > Just a question here, we are killing task in task group, and we can 
> > even say here is killing a task group, but here the status is still 
> > `TASK_KILLED`, do we need to introduce a new `TASKGROUP_KILLED` for this?
> > 
> > Ditto for other places.

We might consider doing it later but for now this behavior is consistent with 
the behavior on the Master.


> On Sept. 7, 2016, 10:17 a.m., Guangya Liu wrote:
> > src/slave/slave.cpp, lines 1988-1989
> > 
> >
> > What about adding `executor state` to the log message?

Why? Also, we weren't logging the state before either.


> On Sept. 7, 2016, 10:17 a.m., Guangya Liu wrote:
> > src/slave/slave.cpp, line 1711
> > 
> >
> > s/for/to

Why? This looks fine to me.


> On Sept. 7, 2016, 10:17 a.m., Guangya Liu wrote:
> > src/slave/slave.hpp, line 1027
> > 
> >
> > How about const?

Can you elaborate? Non member functions can't be marked `const`. Also, did you 
mean `const` on the return type?


> On Sept. 7, 2016, 10:17 a.m., Guangya Liu wrote:
> > src/slave/slave.cpp, line 1748
> > 
> >
> > What about enhance the reason as `Task killed before it was launched 
> > due to one task killed in the task group`?

We haven't been doing such fine grained error messaging on the Master too. We 
might consider doing it in the future. I would file a JIRA for posterity.


> On Sept. 7, 2016, 10:17 a.m., Guangya Liu wrote:
> > src/slave/slave.cpp, lines 1825-1847
> > 
> >
> > I think the reason is not correct for other tasks in the task group.
> > 
> > What about sending out TASK_LOST reason separately as following logic:
> > 
> > if checkpoint failure:
> >   task lost with reason as checkpoint failure
> >   
> > if kill:
> >   foreach task but not the checkpoint failure task:
> >  task lost with reason as one task fail cause whole task groupp fail

See above comment.


> On Sept. 7, 2016, 10:17 a.m., Guangya Liu wrote:
> > src/slave/slave.cpp, lines 1866-1887
> > 
> >
> > ditto as above for updating `reasons` separately

See above comment.


- Anand


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51477/#review147994
---


On Sept. 6, 2016, 9:25 p.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51477/
> ---
> 
> (Updated Sept. 6, 2016, 9:25 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6076
> https://issues.apache.org/jira/browse/MESOS-6076
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This changes implements the `runTaskGroup()` handler on the
> agent ensuring that task group is sent atomically to the executor
> via the `LAUNCH_GROUP` event. It also refactors `runTask()`/`_runTask()`
> to go through a common handler function. Also, it ensures that all
> tasks in `framework->pending`/`queuedTasks` that are killed before
> running the task group result in all the tasks being killed.
> 
> Review: https://reviews.apache.org/r/51477/
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 4add4c0180ea56039e0d5009bad4d9346128bde6 
>   src/slave/slave.cpp 11664779ed78c0a5913598bb7dd1bb0e793d6b93 
> 
> Diff: https://reviews.apache.org/r/51477/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-09-07 Thread Anand Mazumdar


> On Sept. 7, 2016, 10:26 a.m., Vinod Kone wrote:
> > src/slave/slave.cpp, line 1616
> > 
> >
> > couldn't this have been?
> > 
> > ```
> > for (auto it = tasks.begin(); it != tasks.end(); ++it)
> > ```

Was trying to be consistent with the loop a couple of lines before. Would file 
another CR to clean it up too! 
https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L1562


> On Sept. 7, 2016, 10:26 a.m., Vinod Kone wrote:
> > src/slave/slave.cpp, line 1711
> > 
> >
> > you say "other" tasks but you send TASK_KILLED for "all" tasks in the 
> > group. can you remind me why we can't send TASK_KILLED for the killed task 
> > in `killTask()` right away and send TASK_KILLED for the rest of the tasks 
> > in the group here, similar to what we do in `Master::_accept()`?

I don't know _yet_ the reasoning for us doing this in `Master::_accept()`. It 
is pretty non-intuitive and leads to two code paths where we send the status 
update rather then just doing them all in one place as the current code in this 
CR does. 

Also, there is a relevant `TODO` in the code that wants to do the opposite in 
the Master!

```// TODO(bmahler): Do this killing when processing
// the `Kill` call, rather than doing it here.
```

I don't think we even have enough information in `Master::killTask()` to do 
this unless we start storing task group information on the Master. I would 
change the `TODO` in the master to reflect this i.e. send all the status 
updates in `_accept()` similar to what we are doing on the agent.


> On Sept. 7, 2016, 10:26 a.m., Vinod Kone wrote:
> > src/slave/slave.cpp, line 1752
> > 
> >
> > back ticks instead of quotes.

Other places in this function use quotes for this comment. We might want to do 
a sweep later.


> On Sept. 7, 2016, 10:26 a.m., Vinod Kone wrote:
> > src/slave/slave.cpp, line 1903
> > 
> >
> > back ticks.

It seems weird that we have a quote and a back tick in the same comment! We 
might want to do a clean sweep later.


> On Sept. 7, 2016, 10:26 a.m., Vinod Kone wrote:
> > src/slave/slave.hpp, lines 327-329
> > 
> >
> > kill this comment.

Would do in a separate CR.


> On Sept. 7, 2016, 10:26 a.m., Vinod Kone wrote:
> > src/slave/slave.cpp, line 1712
> > 
> >
> > if you follow `Master::_accept()` this should be a hashset of TaskIDs.

See earlier comment.


> On Sept. 7, 2016, 10:26 a.m., Vinod Kone wrote:
> > src/slave/slave.cpp, line 2182
> > 
> >
> > but `killTask` is not sending a TASK_KILLED update anymore?

It does so still for queued tasks.


> On Sept. 7, 2016, 10:26 a.m., Vinod Kone wrote:
> > src/slave/slave.cpp, lines 2332-2334
> > 
> >
> > s/bigger//
> > 
> > can you remind me why we don't send TASK_KILLED here? AFAICT `_run()` 
> > has information about the task group based on the argument passed to it, 
> > even if `framework->pending` doesn't itself have that info. if there is no 
> > good reason, lets make this consistent with how we do it in the master.

See earlier comment.


> On Sept. 7, 2016, 10:26 a.m., Vinod Kone wrote:
> > src/slave/slave.cpp, lines 3137-3159
> > 
> >
> > it's unfortunate that you have to make 2 duplicate 
> > containerizer->update calls here. is there any reason why `__run` work with 
> > both tasks and taskGroups arguments set?

Good Suggestion. The only reason I made `__run()` to not work with both tasks 
and task groups was to maintain consistency with `run()/_run()`. Looks like not 
doing that would allow us to get rid of this redundant code.


> On Sept. 7, 2016, 10:26 a.m., Vinod Kone wrote:
> > src/slave/slave.cpp, line 2115
> > 
> >
> > i think you need to also insert a space after each task group? it's 
> > probably best to store taskgroups as a vector of vectors and call stringify.

I modified it a bit to have a `,` after every task group.


> On Sept. 7, 2016, 10:26 a.m., Vinod Kone wrote:
> > src/slave/slave.cpp, lines 3123-3126
> > 
> >
> > you couldn't use the default assignment operator for LinkedHashMap?

hmm, seems like I ran into some issues with using the default assignment 
operator. It seems that we store the 

Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-09-07 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51477/#review148008
---



Another point I want to mention is without https://reviews.apache.org/r/51478/ 
, this patch will cause build failed. That's to say this patch is not atomic 
and I think that mesos always want an atomic patch. But it also ok if those two 
patches can be merged together.

- Guangya Liu


On 九月 6, 2016, 9:25 p.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51477/
> ---
> 
> (Updated 九月 6, 2016, 9:25 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6076
> https://issues.apache.org/jira/browse/MESOS-6076
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This changes implements the `runTaskGroup()` handler on the
> agent ensuring that task group is sent atomically to the executor
> via the `LAUNCH_GROUP` event. It also refactors `runTask()`/`_runTask()`
> to go through a common handler function. Also, it ensures that all
> tasks in `framework->pending`/`queuedTasks` that are killed before
> running the task group result in all the tasks being killed.
> 
> Review: https://reviews.apache.org/r/51477/
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 4add4c0180ea56039e0d5009bad4d9346128bde6 
>   src/slave/slave.cpp 11664779ed78c0a5913598bb7dd1bb0e793d6b93 
> 
> Diff: https://reviews.apache.org/r/51477/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-09-07 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51477/#review147990
---




src/slave/slave.hpp (lines 327 - 329)


kill this comment.



src/slave/slave.hpp (line 845)


we typically use 'get' instead of 'find' for these cases.



src/slave/slave.cpp (line 1615)


couldn't this have been?

```
for (auto it = tasks.begin(); it != tasks.end(); ++it)
```



src/slave/slave.cpp (line 1709)


you say "other" tasks but you send TASK_KILLED for "all" tasks in the 
group. can you remind me why we can't send TASK_KILLED for the killed task in 
`killTask()` right away and send TASK_KILLED for the rest of the tasks in the 
group here, similar to what we do in `Master::_accept()`?



src/slave/slave.cpp (line 1710)


if you follow `Master::_accept()` this should be a hashset of TaskIDs.



src/slave/slave.cpp (line 1750)


back ticks instead of quotes.



src/slave/slave.cpp (line 1788)


space before "directories"



src/slave/slave.cpp (line 1840)


s/task/task or task group/



src/slave/slave.cpp (line 1898)


back ticks.



src/slave/slave.cpp (line 2055)


extra space after CHECK



src/slave/slave.cpp (line 2058)


extra space after CHECK



src/slave/slave.cpp (line 2104)


i think you need to also insert a space after each task group? it's 
probably best to store taskgroups as a vector of vectors and call stringify.



src/slave/slave.cpp (line 2171)


but `killTask` is not sending a TASK_KILLED update anymore?



src/slave/slave.cpp (line 2198)


CHECK_SOME(taskGroups); ?



src/slave/slave.cpp (lines 2321 - 2323)


s/bigger//

can you remind me why we don't send TASK_KILLED here? AFAICT `_run()` has 
information about the task group based on the argument passed to it, even if 
`framework->pending` doesn't itself have that info. if there is no good reason, 
lets make this consistent with how we do it in the master.



src/slave/slave.cpp 


can you add a comment here for why we don't remove the framework here for 
posterity?



src/slave/slave.cpp (line 2360)


s/other tasks/tasks in the group/



src/slave/slave.cpp (line 2451)


s/task/task or task group/



src/slave/slave.cpp (line 2455)


why the return?



src/slave/slave.cpp (line 3063)


tasks or task groups.



src/slave/slave.cpp (lines 3102 - 3105)


you couldn't use the default assignment operator for LinkedHashMap?



src/slave/slave.cpp (lines 3116 - 3138)


it's unfortunate that you have to make 2 duplicate containerizer->update 
calls here. is there any reason why `__run` work with both tasks and taskGroups 
arguments set?



src/slave/slave.cpp (lines 3327 - 3367)


ditto. see comments above.



src/slave/slave.cpp (lines 6677 - 6687)


looks fancy, why not just do:

```
foreach (const TaskGroup& taskGroup, queuedTaskGroups) {
  foreach (const TaskInfo& taskInfo, taskGroup.tasks()) {
if (taskInfo.task_id() == taskId) {
  return taskGroup;  
}
  }
}

return None();

```


- Vinod Kone


On Sept. 6, 2016, 9:25 p.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51477/
> ---
> 
> (Updated Sept. 6, 2016, 9:25 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6076
> https://issues.apache.org/jira/browse/MESOS-6076
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This changes 

Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-09-07 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51477/#review147994
---




src/slave/slave.hpp (line 325)


s/tasks/tasks or task groups



src/slave/slave.hpp (line 326)


s/RunTaskMessages/RunTaskMessages or LaunchGroup



src/slave/slave.hpp (lines 327 - 329)


kill this?



src/slave/slave.hpp (line 845)


How about 
s/findQueuedTaskGroup/getQueuedTaskGroup



src/slave/slave.hpp (line 1027)


How about const?



src/slave/slave.cpp (line 1709)


s/for/to



src/slave/slave.cpp (line 1746)


What about enhance the reason as `Task killed before it was launched due to 
one task killed in the task group`?



src/slave/slave.cpp (lines 1823 - 1845)


I think the reason is not correct for other tasks in the task group.

What about sending out TASK_LOST reason separately as following logic:

if checkpoint failure:
  task lost with reason as checkpoint failure
  
if kill:
  foreach task but not the checkpoint failure task:
 task lost with reason as one task fail cause whole task groupp fail



src/slave/slave.cpp (lines 1863 - 1884)


ditto as above for updating `reasons` separately



src/slave/slave.cpp (lines 1980 - 1981)


What about adding `executor state` to the log message?



src/slave/slave.cpp (lines 2002 - 2003)


ditto



src/slave/slave.cpp (line 2370)


Just a question here, we are killing task in task group, and we can even 
say here is killing a task group, but here the status is still `TASK_KILLED`, 
do we need to introduce a new `TASKGROUP_KILLED` for this?

Ditto for other places.



src/slave/slave.cpp (lines 3099 - 3114)


Can you please show more detail and update the comments here for which case 
will cause the `executor->queuedTasks` and `executor->queuedTaskGroups` have 
same taskId?


- Guangya Liu


On 九月 6, 2016, 9:25 p.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51477/
> ---
> 
> (Updated 九月 6, 2016, 9:25 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6076
> https://issues.apache.org/jira/browse/MESOS-6076
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This changes implements the `runTaskGroup()` handler on the
> agent ensuring that task group is sent atomically to the executor
> via the `LAUNCH_GROUP` event. It also refactors `runTask()`/`_runTask()`
> to go through a common handler function. Also, it ensures that all
> tasks in `framework->pending`/`queuedTasks` that are killed before
> running the task group result in all the tasks being killed.
> 
> Review: https://reviews.apache.org/r/51477/
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 4add4c0180ea56039e0d5009bad4d9346128bde6 
>   src/slave/slave.cpp 11664779ed78c0a5913598bb7dd1bb0e793d6b93 
> 
> Diff: https://reviews.apache.org/r/51477/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-09-06 Thread Qian Zhang


> On Sept. 1, 2016, 1:48 p.m., Qian Zhang wrote:
> > src/slave/slave.cpp, line 2542
> > 
> >
> > Here we send a `KillTaskMessage` to executor, but the task to be killed 
> > may be part of a task group, so that means it is executor's responsibility 
> > to kill the tasks in the whole group, right? What if executor does not do 
> > it? Will we enforce it in agent?
> 
> Anand Mazumdar wrote:
> We have never enforced this on the agent (historically for the single 
> task case too)

OK, so that means here the expectation is the executor should cooperate: once 
the executor receive a `Kill` event for a task which is part of a task group, 
it is the executor's responsibility to kill all the tasks in that group, right?


- Qian


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51477/#review147401
---


On Sept. 7, 2016, 5:25 a.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51477/
> ---
> 
> (Updated Sept. 7, 2016, 5:25 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6076
> https://issues.apache.org/jira/browse/MESOS-6076
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This changes implements the `runTaskGroup()` handler on the
> agent ensuring that task group is sent atomically to the executor
> via the `LAUNCH_GROUP` event. It also refactors `runTask()`/`_runTask()`
> to go through a common handler function. Also, it ensures that all
> tasks in `framework->pending`/`queuedTasks` that are killed before
> running the task group result in all the tasks being killed.
> 
> Review: https://reviews.apache.org/r/51477/
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 4add4c0180ea56039e0d5009bad4d9346128bde6 
>   src/slave/slave.cpp 11664779ed78c0a5913598bb7dd1bb0e793d6b93 
> 
> Diff: https://reviews.apache.org/r/51477/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-09-06 Thread Anand Mazumdar


> On Sept. 2, 2016, 7:20 a.m., Qian Zhang wrote:
> > src/slave/slave.hpp, line 908
> > 
> >
> > Here the key of this map is all the task IDs of the `TaskGroupInfo`, 
> > but actually `TaskGroupInfo` already have such info in it (all its tasks), 
> > so it seems a bit duplicated, maybe `hashset 
> > queuedTaskGroups` should be enough?

hmm, `hashset` won't work since we need ordering guarrantees for sending queued 
tasks to the executor in the order they were received. Since, we do not have 
`LinkedHashSet`, I moved this to being `std::list` after discussing with Vinod.


- Anand


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51477/#review147632
---


On Sept. 6, 2016, 9:25 p.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51477/
> ---
> 
> (Updated Sept. 6, 2016, 9:25 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6076
> https://issues.apache.org/jira/browse/MESOS-6076
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This changes implements the `runTaskGroup()` handler on the
> agent ensuring that task group is sent atomically to the executor
> via the `LAUNCH_GROUP` event. It also refactors `runTask()`/`_runTask()`
> to go through a common handler function. Also, it ensures that all
> tasks in `framework->pending`/`queuedTasks` that are killed before
> running the task group result in all the tasks being killed.
> 
> Review: https://reviews.apache.org/r/51477/
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 4add4c0180ea56039e0d5009bad4d9346128bde6 
>   src/slave/slave.cpp 11664779ed78c0a5913598bb7dd1bb0e793d6b93 
> 
> Diff: https://reviews.apache.org/r/51477/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-09-06 Thread Anand Mazumdar


> On Sept. 1, 2016, 5:48 a.m., Qian Zhang wrote:
> > src/slave/slave.cpp, line 1882
> > 
> >
> > I do not think we need this, because if we get here, `kill` must be 
> > still `false`.

Good catch! I added an explicit invariant check here.


> On Sept. 1, 2016, 5:48 a.m., Qian Zhang wrote:
> > src/slave/slave.cpp, line 1951
> > 
> >
> > Not yours, but I think we do not need this blank line.

We might want to clean this up in a separate patch.


> On Sept. 1, 2016, 5:48 a.m., Qian Zhang wrote:
> > src/slave/slave.cpp, line 1754
> > 
> >
> > I think we can `break` from here.

hmm, we can't `break` here. Let's say we had a task group containing tasks: T1, 
T2 & T3. We put them in the pending queue in `_runTask()`.

A scheduler sends a `killTask()` for T2 before `__runTask()` is invoked. We 
remove T2 from pending in `killTask()`.

When we are in `__runTask`, pending is T1 & T3. If we break in the loop just 
after `T2`, we would leave behind `T3` still in `framework->pending`.


> On Sept. 1, 2016, 5:48 a.m., Qian Zhang wrote:
> > src/slave/slave.cpp, line 2542
> > 
> >
> > Here we send a `KillTaskMessage` to executor, but the task to be killed 
> > may be part of a task group, so that means it is executor's responsibility 
> > to kill the tasks in the whole group, right? What if executor does not do 
> > it? Will we enforce it in agent?

We have never enforced this on the agent (historically for the single task case 
too)


- Anand


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51477/#review147401
---


On Sept. 6, 2016, 9:25 p.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51477/
> ---
> 
> (Updated Sept. 6, 2016, 9:25 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6076
> https://issues.apache.org/jira/browse/MESOS-6076
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This changes implements the `runTaskGroup()` handler on the
> agent ensuring that task group is sent atomically to the executor
> via the `LAUNCH_GROUP` event. It also refactors `runTask()`/`_runTask()`
> to go through a common handler function. Also, it ensures that all
> tasks in `framework->pending`/`queuedTasks` that are killed before
> running the task group result in all the tasks being killed.
> 
> Review: https://reviews.apache.org/r/51477/
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 4add4c0180ea56039e0d5009bad4d9346128bde6 
>   src/slave/slave.cpp 11664779ed78c0a5913598bb7dd1bb0e793d6b93 
> 
> Diff: https://reviews.apache.org/r/51477/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-09-06 Thread Anand Mazumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51477/
---

(Updated Sept. 6, 2016, 9:25 p.m.)


Review request for mesos and Vinod Kone.


Changes
---

Review comments


Bugs: MESOS-6076
https://issues.apache.org/jira/browse/MESOS-6076


Repository: mesos


Description (updated)
---

This changes implements the `runTaskGroup()` handler on the
agent ensuring that task group is sent atomically to the executor
via the `LAUNCH_GROUP` event. It also refactors `runTask()`/`_runTask()`
to go through a common handler function. Also, it ensures that all
tasks in `framework->pending`/`queuedTasks` that are killed before
running the task group result in all the tasks being killed.

Review: https://reviews.apache.org/r/51477/


Diffs (updated)
-

  src/slave/slave.hpp 4add4c0180ea56039e0d5009bad4d9346128bde6 
  src/slave/slave.cpp 11664779ed78c0a5913598bb7dd1bb0e793d6b93 

Diff: https://reviews.apache.org/r/51477/diff/


Testing
---

make check


Thanks,

Anand Mazumdar



Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-09-02 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51477/#review147632
---




src/slave/slave.hpp (line 908)


Here the key of this map is all the task IDs of the `TaskGroupInfo`, but 
actually `TaskGroupInfo` already have such info in it (all its tasks), so it 
seems a bit duplicated, maybe `hashset queuedTaskGroups` should 
be enough?


- Qian Zhang


On Aug. 28, 2016, 1:44 a.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51477/
> ---
> 
> (Updated Aug. 28, 2016, 1:44 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6076
> https://issues.apache.org/jira/browse/MESOS-6076
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This changes implements the `runTaskGroup()` handler on the
> agent ensuring that task group is sent atomically to the executor
> via the `LAUNCH_GROUP` event. It also refactors `runTask()`/`_runTask()`
> to go through a common handler function. Also, it ensures that all
> tasks in `framework->pending`/`queuedTasks` that are killed before
> running the task group result in all the tasks being killed.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 4add4c0180ea56039e0d5009bad4d9346128bde6 
>   src/slave/slave.cpp 5d162d00a8a9a0e318c39bda93dd584f23c87987 
> 
> Diff: https://reviews.apache.org/r/51477/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-08-31 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51477/#review147401
---




src/slave/slave.cpp (line 1549)


I would suggest to change it to `ignoring running` in order to be 
consistent with the other warning message.



src/slave/slave.cpp (lines 1686 - 1690)


Should it be `CHECK_NE(task.isSome(), taskGroup.isSome())`?



src/slave/slave.cpp (line 1721)


s/run/running/



src/slave/slave.cpp (line 1752)


I think we can `break` from here.



src/slave/slave.cpp (lines 1777 - 1778)


Kill one blank line.



src/slave/slave.cpp (lines 1851 - 1852)


I think we need to `break` between these two `}` if kill == true.



src/slave/slave.cpp (line 1879)


I do not think we need this, because if we get here, `kill` must be still 
`false`.



src/slave/slave.cpp (line 1946)


Not yours, but I think we do not need this blank line.



src/slave/slave.cpp (lines 2091 - 2095)


Should it be `CHECK_NE(tasks.isSome(), taskGroups.isSome())`?



src/slave/slave.cpp (line 2521)


Here we send a `KillTaskMessage` to executor, but the task to be killed may 
be part of a task group, so that means it is executor's responsibility to kill 
the tasks in the whole group, right? What if executor does not do it? Will we 
enforce it in agent?


- Qian Zhang


On Aug. 28, 2016, 1:44 a.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51477/
> ---
> 
> (Updated Aug. 28, 2016, 1:44 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6076
> https://issues.apache.org/jira/browse/MESOS-6076
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This changes implements the `runTaskGroup()` handler on the
> agent ensuring that task group is sent atomically to the executor
> via the `LAUNCH_GROUP` event. It also refactors `runTask()`/`_runTask()`
> to go through a common handler function. Also, it ensures that all
> tasks in `framework->pending`/`queuedTasks` that are killed before
> running the task group result in all the tasks being killed.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 4add4c0180ea56039e0d5009bad4d9346128bde6 
>   src/slave/slave.cpp 5d162d00a8a9a0e318c39bda93dd584f23c87987 
> 
> Diff: https://reviews.apache.org/r/51477/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-08-30 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51477/#review147294
---




src/slave/slave.cpp (lines 1513 - 1517)


Should it be `CHECK_NE(task.isSome(), taskGroup.isSome())`?


- Qian Zhang


On Aug. 28, 2016, 1:44 a.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51477/
> ---
> 
> (Updated Aug. 28, 2016, 1:44 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6076
> https://issues.apache.org/jira/browse/MESOS-6076
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This changes implements the `runTaskGroup()` handler on the
> agent ensuring that task group is sent atomically to the executor
> via the `LAUNCH_GROUP` event. It also refactors `runTask()`/`_runTask()`
> to go through a common handler function. Also, it ensures that all
> tasks in `framework->pending`/`queuedTasks` that are killed before
> running the task group result in all the tasks being killed.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 4add4c0180ea56039e0d5009bad4d9346128bde6 
>   src/slave/slave.cpp 5d162d00a8a9a0e318c39bda93dd584f23c87987 
> 
> Diff: https://reviews.apache.org/r/51477/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 51477: Implemented `RunTaskGroupMessage` handler on the agent.

2016-08-27 Thread Anand Mazumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51477/
---

(Updated Aug. 27, 2016, 5:44 p.m.)


Review request for mesos and Vinod Kone.


Changes
---

Rebased.


Bugs: MESOS-6076
https://issues.apache.org/jira/browse/MESOS-6076


Repository: mesos


Description
---

This changes implements the `runTaskGroup()` handler on the
agent ensuring that task group is sent atomically to the executor
via the `LAUNCH_GROUP` event. It also refactors `runTask()`/`_runTask()`
to go through a common handler function. Also, it ensures that all
tasks in `framework->pending`/`queuedTasks` that are killed before
running the task group result in all the tasks being killed.


Diffs (updated)
-

  src/slave/slave.hpp 4add4c0180ea56039e0d5009bad4d9346128bde6 
  src/slave/slave.cpp 5d162d00a8a9a0e318c39bda93dd584f23c87987 

Diff: https://reviews.apache.org/r/51477/diff/


Testing
---

make check


Thanks,

Anand Mazumdar