Re: Review Request 64095: Added a generic actor to be used by status update managers.

2017-12-07 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64095/#review193204
---




src/status_update_manager/status_update_manager_process.hpp
Lines 117 (patched)


Nit: let's add a `that` for consistency. I can fix while committing.



src/status_update_manager/status_update_manager_process.hpp
Lines 121 (patched)


Nit: backticks. I'll fix while committing.



src/status_update_manager/status_update_manager_process.hpp
Lines 182 (patched)


Let's get rid of "framework=" here and below. Will fix while committing.



src/status_update_manager/status_update_manager_process.hpp
Lines 361-371 (patched)


Nit: Too many newlines between methods. I'll fix while committing.



src/status_update_manager/status_update_manager_process.hpp
Lines 377 (patched)


Nit: Let's iterate with a reference here. Will fix while committing.



src/status_update_manager/status_update_manager_process.hpp
Lines 509 (patched)


Nit: we can omit this comment. Will fix while committing.



src/status_update_manager/status_update_manager_process.hpp
Lines 805 (patched)


Using underscores like this doesn't really follow our style guide. Let's 
follow up with a patch later to fix this style throughout the class.


- Greg Mann


On Dec. 7, 2017, 1:22 a.m., Gaston Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64095/
> ---
> 
> (Updated Dec. 7, 2017, 1:22 a.m.)
> 
> 
> Review request for mesos and Greg Mann.
> 
> 
> Bugs: MESOS-8197
> https://issues.apache.org/jira/browse/MESOS-8197
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This actor handles the checkpointing, recovery, and retry of status
> updates.
> 
> It will initially be used by the offer operations status update
> manager, but it was designed and implemented so that it can replace
> the current implementation of the task status update manager.
> 
> 
> Diffs
> -
> 
>   src/slave/constants.hpp 2d07bce299e63b8492a5ecb08b29508a671bc14c 
>   src/status_update_manager/status_update_manager_process.hpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/64095/diff/10/
> 
> 
> Testing
> ---
> 
> Added new tests at the end of the chain. They passed 5000 times on GNU/Linux.
> 
> 
> Thanks,
> 
> Gaston Kleiman
> 
>



Re: Review Request 64095: Added a generic actor to be used by status update managers.

2017-12-06 Thread Gaston Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64095/
---

(Updated Dec. 6, 2017, 5:22 p.m.)


Review request for mesos and Greg Mann.


Changes
---

Addressed feedback.


Bugs: MESOS-8197
https://issues.apache.org/jira/browse/MESOS-8197


Repository: mesos


Description
---

This actor handles the checkpointing, recovery, and retry of status
updates.

It will initially be used by the offer operations status update
manager, but it was designed and implemented so that it can replace
the current implementation of the task status update manager.


Diffs (updated)
-

  src/slave/constants.hpp 2d07bce299e63b8492a5ecb08b29508a671bc14c 
  src/status_update_manager/status_update_manager_process.hpp PRE-CREATION 


Diff: https://reviews.apache.org/r/64095/diff/8/

Changes: https://reviews.apache.org/r/64095/diff/7-8/


Testing
---

Added new tests at the end of the chain. They passed 5000 times on GNU/Linux.


Thanks,

Gaston Kleiman



Re: Review Request 64095: Added a generic actor to be used by status update managers.

2017-12-06 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64095/#review193010
---




src/status_update_manager/status_update_manager_process.hpp
Lines 808-810 (patched)


We should really only do something like this in test code, if ever (a 
`.get()` without a CHECK or `if` guard). Adding the CHECK will get us a stack 
trace if things go wrong. Instead, here let's do:

```
Try statusUuid = UUID::fromBytes(update.status().status_uuid());
CHECK_SOME(statusUuid);
if (acknowledged.contains(statusUuid.get())) {
```

Here and elsewhere.


- Greg Mann


On Dec. 2, 2017, 2:44 a.m., Gaston Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64095/
> ---
> 
> (Updated Dec. 2, 2017, 2:44 a.m.)
> 
> 
> Review request for mesos and Greg Mann.
> 
> 
> Bugs: MESOS-8197
> https://issues.apache.org/jira/browse/MESOS-8197
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This actor handles the checkpointing, recovery, and retry of status
> updates.
> 
> It will initially be used by the offer operations status update
> manager, but it was designed and implemented so that it can replace
> the current implementation of the task status update manager.
> 
> 
> Diffs
> -
> 
>   src/status_update_manager/status_update_manager_process.hpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/64095/diff/7/
> 
> 
> Testing
> ---
> 
> Added new tests at the end of the chain. They passed 5000 times on GNU/Linux.
> 
> 
> Thanks,
> 
> Gaston Kleiman
> 
>



Re: Review Request 64095: Added a generic actor to be used by status update managers.

2017-12-05 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64095/#review192904
---




src/status_update_manager/status_update_manager_process.hpp
Lines 56-58 (patched)


Should we add "Recovering status update state after failover" to this list?

Also, should we explicitly say that the SUM is not responsible for garbage 
collection of completed streams?



src/status_update_manager/status_update_manager_process.hpp
Lines 67 (patched)


After looking at the recovery code, looks like we should replace: 
s/immediately after recovery/during recovery/



src/status_update_manager/status_update_manager_process.hpp
Lines 73-74 (patched)


I did a quick search and it looks like we usually indent 4 spaces for 
template parameters? Could you confirm and update if appropriate?



src/status_update_manager/status_update_manager_process.hpp
Lines 95 (patched)


Add a comment explaining this member?



src/status_update_manager/status_update_manager_process.hpp
Lines 110-111 (patched)


Place on same line as initialization list.



src/status_update_manager/status_update_manager_process.hpp
Lines 115-120 (patched)


Is this code necessary?



src/status_update_manager/status_update_manager_process.hpp
Lines 128 (patched)


s/a call-back//



src/status_update_manager/status_update_manager_process.hpp
Lines 152 (patched)


Is this the appropriate place for this comment? Perhaps it can be removed?



src/status_update_manager/status_update_manager_process.hpp
Lines 182 (patched)


s/frameworkId/`frameworkId`/

or

s/frameworkId/framework ID/



src/status_update_manager/status_update_manager_process.hpp
Lines 206 (patched)


I don't understand this comment?



src/status_update_manager/status_update_manager_process.hpp
Lines 227 (patched)


Backticks instead of quotes for consistency.



src/status_update_manager/status_update_manager_process.hpp
Lines 237-238 (patched)


Maybe we should leave a TODO either here or next to the constant 
declaration saying that we should move this constant into this file once 
MESOS-8296 is resolved?



src/status_update_manager/status_update_manager_process.hpp
Lines 268 (patched)


s/the status update stream for stream/status update stream/



src/status_update_manager/status_update_manager_process.hpp
Lines 268-269 (patched)


Indent 4 spaces.



src/status_update_manager/status_update_manager_process.hpp
Lines 285 (patched)


Can probably remove this comment.



src/status_update_manager/status_update_manager_process.hpp
Lines 311 (patched)


Period.



src/status_update_manager/status_update_manager_process.hpp
Lines 331-332 (patched)


The following is more consistent with our style guide:

```
const std::string message =
  "Failed to recover status update stream " +
  stringify(streamId) + ": " + result.error();
```



src/status_update_manager/status_update_manager_process.hpp
Lines 405 (patched)


Are we sure that this is being resent? Isn't it possible that the update 
was queued while the manager was paused?



src/status_update_manager/status_update_manager_process.hpp
Lines 429-432 (patched)


4 Space indent



src/status_update_manager/status_update_manager_process.hpp
Lines 431 (patched)


Let's not use the C-style cast here.

How about:
```
checkpoint ? Option(getPath(streamId)) : None());
```

I think you'll need to use `Option::none()` for the NONE if 
the compiler complains.



src/status_update_manager/status_update_manager_process.hpp
Lines 454-457 (patched)


This indentation is difficult to parse.

How about
```
Result> 

Re: Review Request 64095: Added a generic actor to be used by status update managers.

2017-12-01 Thread Gaston Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64095/
---

(Updated Dec. 1, 2017, 6:44 p.m.)


Review request for mesos and Greg Mann.


Changes
---

Changed some indentation to reduce "jaggedness".


Bugs: MESOS-8197
https://issues.apache.org/jira/browse/MESOS-8197


Repository: mesos


Description
---

This actor handles the checkpointing, recovery, and retry of status
updates.

It will initially be used by the offer operations status update
manager, but it was designed and implemented so that it can replace
the current implementation of the task status update manager.


Diffs (updated)
-

  src/status_update_manager/status_update_manager_process.hpp PRE-CREATION 


Diff: https://reviews.apache.org/r/64095/diff/5/

Changes: https://reviews.apache.org/r/64095/diff/4-5/


Testing
---

Added new tests at the end of the chain. They passed 5000 times on GNU/Linux.


Thanks,

Gaston Kleiman



Re: Review Request 64095: Added a generic actor to be used by status update managers.

2017-12-01 Thread Gaston Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64095/
---

(Updated Dec. 1, 2017, 6:12 p.m.)


Review request for mesos and Greg Mann.


Changes
---

Addressed feedback.


Bugs: MESOS-8197
https://issues.apache.org/jira/browse/MESOS-8197


Repository: mesos


Description
---

This actor handles the checkpointing, recovery, and retry of status
updates.

It will initially be used by the offer operations status update
manager, but it was designed and implemented so that it can replace
the current implementation of the task status update manager.


Diffs (updated)
-

  src/status_update_manager/status_update_manager_process.hpp PRE-CREATION 


Diff: https://reviews.apache.org/r/64095/diff/4/

Changes: https://reviews.apache.org/r/64095/diff/3-4/


Testing
---

Added new tests at the end of the chain. They passed 5000 times on GNU/Linux.


Thanks,

Gaston Kleiman



Re: Review Request 64095: Added a generic actor to be used by status update managers.

2017-12-01 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64095/#review192544
---




src/status_update_manager/status_update_manager_process.hpp
Lines 654 (patched)


Actually, after looking at this again I was thinking: perhaps this comment 
should be changed to tell readers that we are doing two things here: both 
building our in-memory streams and the state object which we return to the 
caller. WDYT?



src/status_update_manager/status_update_manager_process.hpp
Lines 666 (patched)


Could we use a `switch` here instead of `if`?



src/status_update_manager/status_update_manager_process.hpp
Lines 678 (patched)


s/.get()./->/



src/status_update_manager/status_update_manager_process.hpp
Lines 692 (patched)


As written, this is a bit opaque. I had to look up why we're seeking from 
the current position with an offset of zero. I now see that we're doing this in 
an attempt to return the current file position, ensuring that the current 
position is valid.

Could you clarify this? I could imagine doing it by renaming this variable 
from `lseek` to `currentPosition`, or putting a note in the comment above.



src/status_update_manager/status_update_manager_process.hpp
Lines 709-710 (patched)


Fits on one line.



src/status_update_manager/status_update_manager_process.hpp
Lines 717 (patched)


Could you get rid of this `else`?



src/status_update_manager/status_update_manager_process.hpp
Lines 725 (patched)


We could probably use a `std::pair` here instead. And maybe we can use 
`std::make_pair`, as long as it doesn't have issues with type deduction?



src/status_update_manager/status_update_manager_process.hpp
Lines 734 (patched)


s/duplicate/duplicate or has already been acknowledged/



src/status_update_manager/status_update_manager_process.hpp
Lines 747 (patched)


s/uuid/status_uuid/



src/status_update_manager/status_update_manager_process.hpp
Lines 754 (patched)


Could you remove the exclamation point?



src/status_update_manager/status_update_manager_process.hpp
Lines 780-783 (patched)


I know that in this context, using just `uuid` is not as ambiguous, but 
given the presence of multiple UUIDs in general I think it's better to be 
explicit. Could we use `status_uuid` for the function argument?



src/status_update_manager/status_update_manager_process.hpp
Lines 790-791 (patched)


Since we print the status UUID in the stream operator for the update type, 
we should be able to eliminate the explicit UUID in the log line here and not 
lose any information.



src/status_update_manager/status_update_manager_process.hpp
Lines 847-849 (patched)


I think it's more common for us to keep the empty function body on the same 
line:

```
: terminated(false), streamId(_streamId), path(_path), fd(_fd) {}
```

Here and elsewhere.



src/status_update_manager/status_update_manager_process.hpp
Lines 855 (patched)


s/its/it's/



src/status_update_manager/status_update_manager_process.hpp
Lines 873 (patched)


Ditto - use a `switch` here.



src/status_update_manager/status_update_manager_process.hpp
Lines 885-886 (patched)


Hmm this error message seems strange in the ACK case?



src/status_update_manager/status_update_manager_process.hpp
Lines 905 (patched)


Ditto - use a `switch` here.



src/status_update_manager/status_update_manager_process.hpp
Lines 907-908 (patched)


Let's pass framework ID into the creation method to eliminate the NONE case 
here.



src/status_update_manager/status_update_manager_process.hpp
Lines 912 (patched)


I think we can remove this comment.



src/status_update_manager/status_update_manager_process.hpp
Lines 923 (patched)


Can remove this comment.


- Greg Mann


On Dec. 1, 2017, 1:42 a.m., Gaston Kleiman 

Re: Review Request 64095: Added a generic actor to be used by status update managers.

2017-12-01 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64095/#review192528
---




src/status_update_manager/status_update_manager_process.hpp
Lines 588 (patched)


Could we switch to returning an `Owned` here instead of a raw pointer?



src/status_update_manager/status_update_manager_process.hpp
Lines 593-596 (patched)


What are the implications of this for the empty file case? It seems the 
caller must be aware that they should delete an empty file, but I'm not sure 
that they receive specific enough feedback for this.

Perhaps `recover` should remove the file if it's empty?



src/status_update_manager/status_update_manager_process.hpp
Lines 609-610 (patched)


Is there a reason you're not using `O_APPEND` here?



src/status_update_manager/status_update_manager_process.hpp
Lines 650 (patched)


You can probably remove this comment.



src/status_update_manager/status_update_manager_process.hpp
Lines 651 (patched)


s/update stream/updates/



src/status_update_manager/status_update_manager_process.hpp
Lines 654 (patched)


Given the log line above, I think this comment can be removed as well.


- Greg Mann


On Dec. 1, 2017, 1:42 a.m., Gaston Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64095/
> ---
> 
> (Updated Dec. 1, 2017, 1:42 a.m.)
> 
> 
> Review request for mesos and Greg Mann.
> 
> 
> Bugs: MESOS-8197
> https://issues.apache.org/jira/browse/MESOS-8197
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This actor handles the checkpointing, recovery, and retry of status
> updates.
> 
> It will initially be used by the offer operations status update
> manager, but it was designed and implemented so that it can replace
> the current implementation of the task status update manager.
> 
> 
> Diffs
> -
> 
>   src/status_update_manager/status_update_manager_process.hpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/64095/diff/3/
> 
> 
> Testing
> ---
> 
> Added new tests at the end of the chain. They passed 5000 times on GNU/Linux.
> 
> 
> Thanks,
> 
> Gaston Kleiman
> 
>



Re: Review Request 64095: Added a generic actor to be used by status update managers.

2017-11-30 Thread Gaston Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64095/#review192409
---



Result of the review session with Graag.


src/status_update_manager/status_update_manager_process.hpp
Lines 55-56 (patched)


Let's try making this a nested class. Maybe we can name it 
`StatusUpdateStream` and get rid of the typedef.



src/status_update_manager/status_update_manager_process.hpp
Lines 69 (patched)


Reference here a JIRA for the migration of the `TaskStatusUpdateManager`.

Say whether the manager will be paused or not on creation.



src/status_update_manager/status_update_manager_process.hpp
Lines 73 (patched)


Check what we do in our codebase regarding indentation of template 
parameters.



src/status_update_manager/status_update_manager_process.hpp
Lines 76 (patched)


Add a description.



src/status_update_manager/status_update_manager_process.hpp
Lines 87 (patched)


s/state/streams/



src/status_update_manager/status_update_manager_process.hpp
Lines 119 (patched)


s/getPath/_getPath/



src/status_update_manager/status_update_manager_process.hpp
Lines 122 (patched)


s/forward/_forwardCallback/



src/status_update_manager/status_update_manager_process.hpp
Lines 130-132 (patched)


s/If no stream ID/If checkpoint is `false`/



src/status_update_manager/status_update_manager_process.hpp
Lines 172-173 (patched)


Keep just the beginning.



src/status_update_manager/status_update_manager_process.hpp
Lines 178-179 (patched)


Forward the status update if this is at the front of the queue.
Subsequent status updates will be sent in 'acknowledgement()'.



src/status_update_manager/status_update_manager_process.hpp
Lines 223-235 (patched)


Move this to Stream::acknowledgement



src/status_update_manager/status_update_manager_process.hpp
Lines 246 (patched)


s/Duplicate status acknowledgement/Duplicate status update acknowledgement/



src/status_update_manager/status_update_manager_process.hpp
Lines 276 (patched)


Maybe "Recovers the status update manager's state using the supplied stream 
IDs."



src/status_update_manager/status_update_manager_process.hpp
Lines 280-281 (patched)


- The recovered state if successful.
- The recovered state, including the number of errors encountered, if 
`strict == false` and any of the streams couldn't be recovered.
- A `Failure` if `strict == true` and any of the streams couldn't be 
recovered.



src/status_update_manager/status_update_manager_process.hpp
Lines 288 (patched)


Let's call this just `state`.



src/status_update_manager/status_update_manager_process.hpp
Lines 307 (patched)


We want to increment `state.errors` if `strict=false`.



src/status_update_manager/status_update_manager_process.hpp
Lines 309 (patched)


// This can happen if the initial checkpoint of the stream didn't complete.



src/status_update_manager/status_update_manager_process.hpp
Lines 332 (patched)


s/the//



src/status_update_manager/status_update_manager_process.hpp
Lines 334 (patched)


I think that we don't need this one ;-)/



src/status_update_manager/status_update_manager_process.hpp
Lines 341-342 (patched)


```
LOG(INFO) << "Closing status update streams for framework"
  << " '" << frameworkId << "'";
```



src/status_update_manager/status_update_manager_process.hpp
Lines 346 (patched)


Split onto two different lines.



src/status_update_manager/status_update_manager_process.hpp
Lines 366-373 (patched)


We can use `stream->next()` here.



src/status_update_manager/status_update_manager_process.hpp
Lines 569 (patched)


We 

Re: Review Request 64095: Added a generic actor to be used by status update managers.

2017-11-30 Thread Gaston Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64095/
---

(Updated Nov. 30, 2017, 5:42 p.m.)


Review request for mesos and Greg Mann.


Changes
---

Result of the review session with Graag.


Bugs: MESOS-8197
https://issues.apache.org/jira/browse/MESOS-8197


Repository: mesos


Description (updated)
---

This actor handles the checkpointing, recovery, and retry of status
updates.

It will initially be used by the offer operations status update
manager, but it was designed and implemented so that it can replace
the current implementation of the task status update manager.


Diffs
-

  src/status_update_manager/status_update_manager_process.hpp PRE-CREATION 


Diff: https://reviews.apache.org/r/64095/diff/3/


Testing
---

Added new tests at the end of the chain. They passed 5000 times on GNU/Linux.


Thanks,

Gaston Kleiman



Re: Review Request 64095: Added a generic actor to be used by status update managers.

2017-11-30 Thread Gaston Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64095/
---

(Updated Nov. 30, 2017, 4:10 p.m.)


Review request for mesos and Greg Mann.


Changes
---

Changed the signature of the `recover` method.


Bugs: MESOS-8197
https://issues.apache.org/jira/browse/MESOS-8197


Repository: mesos


Description
---

This actor handles the checkpointing, recovery, and retry of status
updates, and will initially be used by the offer operations status
update manager.


Diffs (updated)
-

  src/status_update_manager/status_update_manager_process.hpp PRE-CREATION 


Diff: https://reviews.apache.org/r/64095/diff/3/

Changes: https://reviews.apache.org/r/64095/diff/2-3/


Testing
---

Added new tests at the end of the chain. They passed 5000 times on GNU/Linux.


Thanks,

Gaston Kleiman



Re: Review Request 64095: Added a generic actor to be used by status update managers.

2017-11-30 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64095/#review192379
---




src/status_update_manager/status_update_manager_process.hpp
Lines 57-61 (patched)


Could you provide a summary of the template parameters here to improve 
readability?


- Greg Mann


On Nov. 27, 2017, 7:52 p.m., Gaston Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64095/
> ---
> 
> (Updated Nov. 27, 2017, 7:52 p.m.)
> 
> 
> Review request for mesos and Greg Mann.
> 
> 
> Bugs: MESOS-8197
> https://issues.apache.org/jira/browse/MESOS-8197
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This actor handles the checkpointing, recovery, and retry of status
> updates, and will initially be used by the offer operations status
> update manager.
> 
> 
> Diffs
> -
> 
>   src/status_update_manager/status_update_manager_process.hpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/64095/diff/2/
> 
> 
> Testing
> ---
> 
> Added new tests at the end of the chain. They passed 5000 times on GNU/Linux.
> 
> 
> Thanks,
> 
> Gaston Kleiman
> 
>



Re: Review Request 64095: Added a generic actor to be used by status update managers.

2017-11-29 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64095/#review192171
---




src/status_update_manager/status_update_manager_process.hpp
Lines 689 (patched)


executor?


- Greg Mann


On Nov. 27, 2017, 7:52 p.m., Gaston Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64095/
> ---
> 
> (Updated Nov. 27, 2017, 7:52 p.m.)
> 
> 
> Review request for mesos and Greg Mann.
> 
> 
> Bugs: MESOS-8197
> https://issues.apache.org/jira/browse/MESOS-8197
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This actor handles the checkpointing, recovery, and retry of status
> updates, and will initially be used by the offer operations status
> update manager.
> 
> 
> Diffs
> -
> 
>   src/status_update_manager/status_update_manager_process.hpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/64095/diff/2/
> 
> 
> Testing
> ---
> 
> Added new tests at the end of the chain. They passed 5000 times on GNU/Linux.
> 
> 
> Thanks,
> 
> Gaston Kleiman
> 
>