Re: Review Request 64098: Send status updates when agent re-registers.

2017-12-12 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review193557
---


Ship it!




Ship It!

- Jiang Yan Xu


On Dec. 12, 2017, 4:54 a.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64098/
> ---
> 
> (Updated Dec. 12, 2017, 4:54 a.m.)
> 
> 
> Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6406
> https://issues.apache.org/jira/browse/MESOS-6406
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master will send task status updates to frameworks when an agent
> which has been previously removed by the master for being unreachable
> or is unknown to the master due to the garbage collection of
> the unreachable and gone agents in the registry re-registers.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp b10d0341276090bfa70aaa4fd6317a560e3334ea 
>   src/tests/master_allocator_tests.cpp 
> 10de6f0104a6a93ff5037d7e2ab5cf21d57fbec8 
>   src/tests/master_tests.cpp 7b087675200fe0ca69319b715997700a91d9f6b2 
>   src/tests/partition_tests.cpp 54ccf783682e7e6db0847b9a6313489b7f8181f8 
>   src/tests/persistent_volume_tests.cpp 
> f255382af957cfa66f5efaffcaf1082b83b35e58 
>   src/tests/slave_recovery_tests.cpp 253b0fc2ff7ec1f00937d42636151553c46d5175 
>   src/tests/upgrade_tests.cpp 0efaa586153564b20f7884023946a11635425ee4 
> 
> 
> Diff: https://reviews.apache.org/r/64098/diff/15/
> 
> 
> Testing
> ---
> 
> with make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 64098: Send status updates when agent re-registers.

2017-12-12 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review193544
---



FAIL: Some Mesos tests failed.

Reviews applied: `['64250', '64098']`

Failed command: `D:\DCOS\mesos\src\mesos-tests.exe --verbose`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64098

Relevant logs:

- 
[mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64098/logs/mesos-tests-stdout.log):

```

[--] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN  ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[   OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (2284 ms)
[--] 1 test from IsolationFlag/CpuIsolatorTest (2306 ms total)

[--] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN  ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[   OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (2265 ms)
[--] 1 test from IsolationFlag/MemoryIsolatorTest (2287 ms total)

[--] Global test environment tear-down
[==] 829 tests from 84 test cases ran. (305398 ms total)
[  PASSED  ] 819 tests.
[  FAILED  ] 10 tests, listed below:
[  FAILED  ] OfferOperationStatusUpdateManagerTest.UpdateAndAckNonTerminalUpdate
[  FAILED  ] OfferOperationStatusUpdateManagerTest.RecoverCheckpointedStream
[  FAILED  ] OfferOperationStatusUpdateManagerTest.RecoverEmptyFile
[  FAILED  ] OfferOperationStatusUpdateManagerTest.RecoverTerminatedStream
[  FAILED  ] OfferOperationStatusUpdateManagerTest.IgnoreDuplicateUpdate
[  FAILED  ] 
OfferOperationStatusUpdateManagerTest.IgnoreDuplicateUpdateAfterRecover
[  FAILED  ] OfferOperationStatusUpdateManagerTest.RejectDuplicateAck
[  FAILED  ] 
OfferOperationStatusUpdateManagerTest.RejectDuplicateAckAfterRecover
[  FAILED  ] 
OfferOperationStatusUpdateManagerTest.NonStrictRecoveryCorruptedFile
[  FAILED  ] SlaveTest.ResourceProviderPublishAll

10 FAILED TESTS
  YOU HAVE 204 DISABLED TESTS

```

- 
[mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64098/logs/mesos-tests-stderr.log):

```
I1212 15:36:49.166775  6504 master.cpp:10149] Updating the state of task 
12db25fd-0b6f-4f66-9b84-e1a1c13e46ff of framework 
d72ff96e-44f0-4c60-bf3f-16dff914edd0- (latest state: TASK_KILLED, status 
update state: TASK_KILLED)
I1212 15:36:49.1I1212 15:36:48.500752  7276 exec.cpp:162] Version: 1.5.0
I1212 15:36:48.523741   508 exec.cpp:237] Executor registered on agent 
d72ff96e-44f0-4c60-bf3f-16dff914edd0-S0
I1212 15:36:48.525737  4216 executor.cpp:171] Received SUBSCRIBED event
I1212 15:36:48.529757  4216 executor.cpp:175] Subscribed executor on 
build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I1212 15:36:48.530753  4216 executor.cpp:171] Received LAUNCH event
I1212 15:36:48.533756  4216 executor.cpp:637] Starting task 
12db25fd-0b6f-4f66-9b84-e1a1c13e46ff
I1212 15:36:48.606755  4216 executor.cpp:477] Running 
'D:\DCOS\mesos\src\mesos-containerizer.exe launch '
I1212 15:36:49.136749  4216 executor.cpp:650] Forked command at 6088
I1212 15:36:49.168748  7684 exec.cpp:435] Executor asked to shutdown
I1212 15:36:49.169749  8940 executor.cpp:171] Received SHUTDOWN event
I1212 15:36:49.169749  8940 executor.cpp:747] Shutting down
I1212 15:36:49.169749  8940 executor.cpp:854] Sending SIGTERM to process tree 
at pid 666775  5788 slave.cpp:3400] Shutting down framework 
d72ff96e-44f0-4c60-bf3f-16dff914edd0-
I1212 15:36:49.167755  5788 slave.cpp:6091] Shutting down executor 
'12db25fd-0b6f-4f66-9b84-e1a1c13e46ff' of framework 
d72ff96e-44f0-4c60-bf3f-16dff914edd0- at executor(1)@10.3.1.5:60853
I1212 15:36:49.168748  6504 master.cpp:10255] Removing task 
12db25fd-0b6f-4f66-9b84-e1a1c13e46ff with resources cpus(allocated: *):4; 
mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: 
*):[31000-32000] of framework d72ff96e-44f0-4c60-bf3f-16dff914edd0- on 
agent d72ff96e-44f0-4c60-bf3f-16dff914edd0-S0 at slave(326)@10.3.1.5:60832 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I1212 15:36:49.169749  5532 slave.cpp:909] Agent terminating
W1212 15:36:49.169749  5532 slave.cpp:3396] Ignoring shutdown framework 
d72ff96e-44f0-4c60-bf3f-16dff914edd0- because it is terminating
I1212 15:36:49.170748  6504 master.cpp:1305] Agent 
d72ff96e-44f0-4c60-bf3f-16dff914edd0-S0 at slave(326)@10.3.1.5:60832 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I1212 15:36:49.171747  6504 master.cpp:3364] Disconnecting agent 
d72ff96e-44f0-4c60-bf3f-16dff914edd0-S0 at slave(326)@10.3.1.5:60832 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I1212 15:36:49.171747  6504 master.cpp:3383] Deactivating agent 
d72ff96e-44f0-4c60-bf3f-16dff914edd0-S0 at slave(326)@10.3.1.5:60832 

Re: Review Request 64098: Send status updates when agent re-registers.

2017-12-12 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review193536
---



FAIL: Failed to apply the current review.

Failed command: `python.exe .\support\apply-reviews.py -n -r 64098`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64098

Relevant logs:

- 
[apply-review-64098-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64098/logs/apply-review-64098-stdout.log):

```
error: patch failed: src/master/master.cpp:6804
error: src/master/master.cpp: patch does not apply
```

- Mesos Reviewbot Windows


On Dec. 12, 2017, 6:24 p.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64098/
> ---
> 
> (Updated Dec. 12, 2017, 6:24 p.m.)
> 
> 
> Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6406
> https://issues.apache.org/jira/browse/MESOS-6406
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master will send task status updates to frameworks when an agent
> which has been previously removed by the master for being unreachable
> or is unknown to the master due to the garbage collection of
> the unreachable and gone agents in the registry re-registers.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp dfe60ef670edcaefa0c1241df2e2870f650fcf9e 
>   src/tests/master_allocator_tests.cpp 
> 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
>   src/tests/master_tests.cpp 57eae320a7a398527cd3623c89bf67f319a8e955 
>   src/tests/partition_tests.cpp 31ebfe1655438eceae74d72a223df03a9dbd282d 
>   src/tests/persistent_volume_tests.cpp 
> 4aa3c2e8b0f461cd78053707cff8bcb2e6f2b0d7 
>   src/tests/slave_recovery_tests.cpp f14c6ef69eb20a03454c8197df79b572a3c6d050 
>   src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 
> 
> 
> Diff: https://reviews.apache.org/r/64098/diff/14/
> 
> 
> Testing
> ---
> 
> with make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 64098: Send status updates when agent re-registers.

2017-12-12 Thread Megha Sharma

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/
---

(Updated Dec. 12, 2017, 12:54 p.m.)


Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.


Changes
---

tested with make check


Bugs: MESOS-6406
https://issues.apache.org/jira/browse/MESOS-6406


Repository: mesos


Description
---

Master will send task status updates to frameworks when an agent
which has been previously removed by the master for being unreachable
or is unknown to the master due to the garbage collection of
the unreachable and gone agents in the registry re-registers.


Diffs (updated)
-

  src/master/master.cpp dfe60ef670edcaefa0c1241df2e2870f650fcf9e 
  src/tests/master_allocator_tests.cpp 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
  src/tests/master_tests.cpp 57eae320a7a398527cd3623c89bf67f319a8e955 
  src/tests/partition_tests.cpp 31ebfe1655438eceae74d72a223df03a9dbd282d 
  src/tests/persistent_volume_tests.cpp 
4aa3c2e8b0f461cd78053707cff8bcb2e6f2b0d7 
  src/tests/slave_recovery_tests.cpp f14c6ef69eb20a03454c8197df79b572a3c6d050 
  src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 


Diff: https://reviews.apache.org/r/64098/diff/13/

Changes: https://reviews.apache.org/r/64098/diff/12-13/


Testing
---

with make check


Thanks,

Megha Sharma



Re: Review Request 64098: Send status updates when agent re-registers.

2017-12-11 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review193466
---



FAIL: Failed to apply the current review.

Failed command: `python.exe .\support\apply-reviews.py -n -r 64098`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64098

Relevant logs:

- 
[apply-review-64098-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64098/logs/apply-review-64098-stdout.log):

```
error: patch failed: src/master/master.cpp:6805
error: src/master/master.cpp: patch does not apply
```

- Mesos Reviewbot Windows


On Dec. 8, 2017, 6:42 p.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64098/
> ---
> 
> (Updated Dec. 8, 2017, 6:42 p.m.)
> 
> 
> Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6406
> https://issues.apache.org/jira/browse/MESOS-6406
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master will send task status updates to frameworks when an agent
> which has been previously removed by the master for being unreachable
> or is unknown to the master due to the garbage collection of
> the unreachable and gone agents in the registry re-registers.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp dfe60ef670edcaefa0c1241df2e2870f650fcf9e 
>   src/tests/master_allocator_tests.cpp 
> 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
>   src/tests/master_tests.cpp 57eae320a7a398527cd3623c89bf67f319a8e955 
>   src/tests/partition_tests.cpp 31ebfe1655438eceae74d72a223df03a9dbd282d 
>   src/tests/persistent_volume_tests.cpp 
> 4aa3c2e8b0f461cd78053707cff8bcb2e6f2b0d7 
>   src/tests/slave_recovery_tests.cpp f14c6ef69eb20a03454c8197df79b572a3c6d050 
>   src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 
> 
> 
> Diff: https://reviews.apache.org/r/64098/diff/12/
> 
> 
> Testing
> ---
> 
> with make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 64098: Send status updates when agent re-registers.

2017-12-11 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review193460
---



LGTM sans Ilya's comment about only use `replicated_log` flag where necessary.


src/master/master.cpp
Line 6807 (original), 6803-6804 (patched)


Move `Framework* framework = getFramework(frameworkId);` down into the if 
condition since it's used only in there?



src/tests/partition_tests.cpp
Lines 402-415 (original), 403-407 (patched)


Expectations above `detector.appoint(master.get()->pid);` everywhere as 
suggested in the last review.



src/tests/partition_tests.cpp
Lines 716-730 (original), 710-714 (patched)


Ditto.



src/tests/partition_tests.cpp
Lines 1030-1032 (original), 1015-1025 (patched)


Ditto.



src/tests/partition_tests.cpp
Lines 2297 (patched)


`statusFromAgent` sounds like it's actually sent by the agent. Maybe 
`finsihedStatusFromMaster` for this update and rename the other `finshedStatus` 
to `finsihedStatusFromAgent` makes it a bit easier to distinguish? (I do 
understand that you meant that the status is from the agent's reregister 
message but I think  `finsihedStatusFromMaster` vs `finsihedStatusFromAgent` is 
clearer).

Also, can we add a comment too? e.g., the first status is sent as part of 
the reregistation and the second is by the agent's status update manager after 
it is reregsitered.


- Jiang Yan Xu


On Dec. 8, 2017, 10:42 a.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64098/
> ---
> 
> (Updated Dec. 8, 2017, 10:42 a.m.)
> 
> 
> Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6406
> https://issues.apache.org/jira/browse/MESOS-6406
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master will send task status updates to frameworks when an agent
> which has been previously removed by the master for being unreachable
> or is unknown to the master due to the garbage collection of
> the unreachable and gone agents in the registry re-registers.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp dfe60ef670edcaefa0c1241df2e2870f650fcf9e 
>   src/tests/master_allocator_tests.cpp 
> 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
>   src/tests/master_tests.cpp 57eae320a7a398527cd3623c89bf67f319a8e955 
>   src/tests/partition_tests.cpp 31ebfe1655438eceae74d72a223df03a9dbd282d 
>   src/tests/persistent_volume_tests.cpp 
> 4aa3c2e8b0f461cd78053707cff8bcb2e6f2b0d7 
>   src/tests/slave_recovery_tests.cpp f14c6ef69eb20a03454c8197df79b572a3c6d050 
>   src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 
> 
> 
> Diff: https://reviews.apache.org/r/64098/diff/12/
> 
> 
> Testing
> ---
> 
> with make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 64098: Send status updates when agent re-registers.

2017-12-11 Thread Ilya Pronin


> On Dec. 7, 2017, 1:36 p.m., Ilya Pronin wrote:
> > src/tests/master_tests.cpp
> > Lines 7477 (patched)
> > 
> >
> > Ditto re replicated log based registry.
> 
> Megha Sharma wrote:
> I added this specifically to tests that have partitioned agents 
> re-registering to ensure that the additional status updates still happen with 
> the registry in use. That we don't have to look through the tests to see if 
> the master restarted or not and have the test always do the right thing. Also 
> I don't see any harm being done here by using registry, for one it keeps the 
> master configuration consistent across tests.
> 
> Megha Sharma wrote:
> Ilya, I am dropping these issues based on my above argument. Please feel 
> free to re-open them if needed.

I agree that using the replicated log based registry in these tests does no 
harm. Here instead of changing a specific test when it's required, you did a 
blanket change in a bunch of (now) unaffected tests. I just prefer to keep 
tests minimal to avoid accidentally hiding something. But that's just my 
preference, feel free to ignore it.


> On Dec. 7, 2017, 1:36 p.m., Ilya Pronin wrote:
> > src/tests/partition_tests.cpp
> > Lines 820 (patched)
> > 
> >
> > Ditto re replicated log based registry.
> 
> Megha Sharma wrote:
> Test description says that master did failover here 
> `PartitionedSlaveReregistrationMasterFailover`.

Yep, found it. Sorry.


> On Dec. 7, 2017, 1:36 p.m., Ilya Pronin wrote:
> > src/tests/upgrade_tests.cpp
> > Lines 106 (patched)
> > 
> >
> > Ditto re replicated log based registry.
> 
> Megha Sharma wrote:
> Yes it does, look for `StartMaster(masterFlags)` in this test, it will 
> appear twice.

Found it. Seems that my search skills failed me, sorry.


- Ilya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review193141
---


On Dec. 8, 2017, 10:42 a.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64098/
> ---
> 
> (Updated Dec. 8, 2017, 10:42 a.m.)
> 
> 
> Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6406
> https://issues.apache.org/jira/browse/MESOS-6406
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master will send task status updates to frameworks when an agent
> which has been previously removed by the master for being unreachable
> or is unknown to the master due to the garbage collection of
> the unreachable and gone agents in the registry re-registers.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp dfe60ef670edcaefa0c1241df2e2870f650fcf9e 
>   src/tests/master_allocator_tests.cpp 
> 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
>   src/tests/master_tests.cpp 57eae320a7a398527cd3623c89bf67f319a8e955 
>   src/tests/partition_tests.cpp 31ebfe1655438eceae74d72a223df03a9dbd282d 
>   src/tests/persistent_volume_tests.cpp 
> 4aa3c2e8b0f461cd78053707cff8bcb2e6f2b0d7 
>   src/tests/slave_recovery_tests.cpp f14c6ef69eb20a03454c8197df79b572a3c6d050 
>   src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 
> 
> 
> Diff: https://reviews.apache.org/r/64098/diff/12/
> 
> 
> Testing
> ---
> 
> with make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 64098: Send status updates when agent re-registers.

2017-12-08 Thread Megha Sharma

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/
---

(Updated Dec. 8, 2017, 6:42 p.m.)


Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.


Bugs: MESOS-6406
https://issues.apache.org/jira/browse/MESOS-6406


Repository: mesos


Description
---

Master will send task status updates to frameworks when an agent
which has been previously removed by the master for being unreachable
or is unknown to the master due to the garbage collection of
the unreachable and gone agents in the registry re-registers.


Diffs (updated)
-

  src/master/master.cpp dfe60ef670edcaefa0c1241df2e2870f650fcf9e 
  src/tests/master_allocator_tests.cpp 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
  src/tests/master_tests.cpp 57eae320a7a398527cd3623c89bf67f319a8e955 
  src/tests/partition_tests.cpp 31ebfe1655438eceae74d72a223df03a9dbd282d 
  src/tests/persistent_volume_tests.cpp 
4aa3c2e8b0f461cd78053707cff8bcb2e6f2b0d7 
  src/tests/slave_recovery_tests.cpp f14c6ef69eb20a03454c8197df79b572a3c6d050 
  src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 


Diff: https://reviews.apache.org/r/64098/diff/12/

Changes: https://reviews.apache.org/r/64098/diff/11-12/


Testing
---

with make check


Thanks,

Megha Sharma



Re: Review Request 64098: Send status updates when agent re-registers.

2017-12-07 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review193189
---



FAIL: Failed to apply the current review.

Failed command: `python.exe .\support\apply-reviews.py -n -r 64098`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64098

Relevant logs:

- 
[apply-review-64098-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64098/logs/apply-review-64098-stdout.log):

```
[master 3895bf542] Send status updates when agent re-registers.
 Author: Megha Sharma 
 7 files changed, 115 insertions(+), 92 deletions(-)
```

- Mesos Reviewbot Windows


On Dec. 2, 2017, 12:29 a.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64098/
> ---
> 
> (Updated Dec. 2, 2017, 12:29 a.m.)
> 
> 
> Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6406
> https://issues.apache.org/jira/browse/MESOS-6406
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master will send task status updates to frameworks when an agent
> which has been previously removed by the master for being unreachable
> or is unknown to the master due to the garbage collection of
> the unreachable and gone agents in the registry re-registers.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp dfe60ef670edcaefa0c1241df2e2870f650fcf9e 
>   src/tests/master_allocator_tests.cpp 
> 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
>   src/tests/master_tests.cpp 57eae320a7a398527cd3623c89bf67f319a8e955 
>   src/tests/partition_tests.cpp 31ebfe1655438eceae74d72a223df03a9dbd282d 
>   src/tests/persistent_volume_tests.cpp 
> 4aa3c2e8b0f461cd78053707cff8bcb2e6f2b0d7 
>   src/tests/slave_recovery_tests.cpp f14c6ef69eb20a03454c8197df79b572a3c6d050 
>   src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 
> 
> 
> Diff: https://reviews.apache.org/r/64098/diff/11/
> 
> 
> Testing
> ---
> 
> with make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 64098: Send status updates when agent re-registers.

2017-12-07 Thread Ilya Pronin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review193141
---




src/master/master.cpp
Lines 6806-6810 (patched)


Also, this message will be written to the log for every task on the agent. 
+1 for removing it.



src/master/master.cpp
Lines 6814 (patched)


`s/registration/re-registered/`



src/tests/master_tests.cpp
Lines 7021 (patched)


Why do you need the replicated log based registry here? The master is not 
failing over in this test. Did I miss something?



src/tests/master_tests.cpp
Lines 7477 (patched)


Ditto re replicated log based registry.



src/tests/partition_tests.cpp
Lines 173 (patched)


Ditto re replicated log based registry.



src/tests/partition_tests.cpp
Lines 538 (patched)


Ditto re replicated log based registry.



src/tests/partition_tests.cpp
Lines 820 (patched)


Ditto re replicated log based registry.



src/tests/partition_tests.cpp
Lines 2151 (patched)


Ditto re replicated log based registry.



src/tests/upgrade_tests.cpp
Lines 106 (patched)


Ditto re replicated log based registry.


- Ilya Pronin


On Dec. 1, 2017, 4:29 p.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64098/
> ---
> 
> (Updated Dec. 1, 2017, 4:29 p.m.)
> 
> 
> Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6406
> https://issues.apache.org/jira/browse/MESOS-6406
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master will send task status updates to frameworks when an agent
> which has been previously removed by the master for being unreachable
> or is unknown to the master due to the garbage collection of
> the unreachable and gone agents in the registry re-registers.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp dfe60ef670edcaefa0c1241df2e2870f650fcf9e 
>   src/tests/master_allocator_tests.cpp 
> 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
>   src/tests/master_tests.cpp 57eae320a7a398527cd3623c89bf67f319a8e955 
>   src/tests/partition_tests.cpp 31ebfe1655438eceae74d72a223df03a9dbd282d 
>   src/tests/persistent_volume_tests.cpp 
> 4aa3c2e8b0f461cd78053707cff8bcb2e6f2b0d7 
>   src/tests/slave_recovery_tests.cpp f14c6ef69eb20a03454c8197df79b572a3c6d050 
>   src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 
> 
> 
> Diff: https://reviews.apache.org/r/64098/diff/11/
> 
> 
> Testing
> ---
> 
> with make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 64098: Send status updates when agent re-registers.

2017-12-05 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review192895
---



Awesome! Only minor comments below.


src/master/master.cpp
Lines 6806 (patched)






src/master/master.cpp
Lines 6806-6810 (patched)


The fact that an unknown agent is allowed to register is already logged: 
https://github.com/apache/mesos/blob/c9462f4927cfffb1f3a90827467ded730c0f40b9/src/master/registry_operations.cpp#L133

Here I was orignally thinking we needed to add some information about the 
fact that the update is for an unknown agent. However since the logging in 
`forward` already logs `update.status().message()`, perhaps it's fine and so we 
don't need a separate log line after all. :)



src/master/master.cpp
Lines 6812-6814 (patched)


Our convention if the two conditions are on separate lines:

```
const string message = slaves.unreachable.contains(slaveInfo.id())
? "Unreachable agent re-reregistered"
: "Unknown agent reregistered";
```

Note that I s/re-registration/reregistered/ because it's more consistent 
with other status messages.



src/master/master.cpp
Line 6809 (original), 6816 (patched)


Should this line be the first thing done under the if condition since the 
rest of the logic is about sending status updates?



src/master/master.cpp
Lines 6833 (patched)


So we don't log anything when `framework == nullptr` but log a warning when 
`!framework->connected()`, should we treat the two cases the same?



src/master/master.cpp
Lines 6834-6835 (patched)


Let's log the message of the update as well (the same way `forward` does 
it).

FWIW I think it's probably the best to log it in `ostream& 
operator<<(ostream& stream, const StatusUpdate& update)` but it probably 
requires a separate commit to refactor.



src/master/master.cpp
Lines 6842-6843 (patched)


This comment comes from 
https://github.com/apache/mesos/commit/678b864cb78c74cc98b960f921d07869ce3300c5 
which I don't think is relevant anymore and can be removed.



src/tests/master_allocator_tests.cpp
Lines 1362 (patched)


Not yours but the use of `this->` is annoying here (we generally don't do 
it unless for disambiguation) so perhaps don't add more occurrences for now and 
keep the existing ocurrences for a future sweep.



src/tests/master_tests.cpp
Lines 2881-2883 (original), 2885-2887 (patched)


Can we also set the expectation for the new reason?



src/tests/master_tests.cpp
Lines 7188-7190 (patched)


You have put the expectations above `detector.appoint(master.get()->pid);` 
in the test above. 

I think it's a better and clearer alternative to prevent race conditions. 

Even though there's a 
`Clock::advance(agentFlags.registration_backoff_factor);` below to prevent it 
when the clock is paused, it's clearer and more foolproof if we don't insert 
the expectation between `detector.appoint(master.get()->pid);` and the clock 
advancement.

Plus 

```
detector.appoint(master.get()->pid);
Clock::advance(agentFlags.registration_backoff_factor);
```

can be thought of as the same group of statements so better to be kept 
together.

Here and below.



src/tests/master_tests.cpp
Lines 7198-7199 (patched)


Can we also set the expectation on the reason?



src/tests/master_tests.cpp
Lines 7661-7663 (patched)


Ditto on pulling it above `detector.appoint(master.get()->pid);`.



src/tests/master_tests.cpp
Lines 7669-7671 (patched)


Can we also set the expectation on the reason?

Can we do this for other places (when reasonable)?



src/tests/partition_tests.cpp
Line 2181 (original), 2146 (patched)


How come the commit hook didn't catch this over 80 char line?



src/tests/partition_tests.cpp
Lines 2299 (patched)


Here it seems that we'll receive two status updates, one from the agent and 
one sent by the master.

Setting `WillRepeatedly` here is probably racy due to 

```
EXPECT_CALL(sched, statusUpdate(, _))

Re: Review Request 64098: Send status updates when agent re-registers.

2017-12-04 Thread Megha Sharma


> On Dec. 2, 2017, 12:41 a.m., Megha Sharma wrote:
> > src/tests/upgrade_tests.cpp
> > Line 138 (original), 138 (patched)
> > 
> >
> > Sorry my bad looks like the fix for this test is still needed. I am not 
> > clear though what do you mean by ignoring the update? As I understand 
> > setting the expectation is unavoidable coz if we don't then the test fails 
> > claiming that previously set expectation here is being called more than 
> > once.
> 
> Megha Sharma wrote:
> ^^Ilya
> 
> Ilya Pronin wrote:
> Yeah, we'll most likely need to set an expectation, but we can make it 
> just `WillRepeatedly(Return())` and omit the future, because we don't really 
> care about that status update in those tests.

This expectation is not needed anymore in the test after using the registry.


- Megha


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review192607
---


On Dec. 2, 2017, 12:29 a.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64098/
> ---
> 
> (Updated Dec. 2, 2017, 12:29 a.m.)
> 
> 
> Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6406
> https://issues.apache.org/jira/browse/MESOS-6406
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master will send task status updates to frameworks when an agent
> which has been previously removed by the master for being unreachable
> or is unknown to the master due to the garbage collection of
> the unreachable and gone agents in the registry re-registers.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp dfe60ef670edcaefa0c1241df2e2870f650fcf9e 
>   src/tests/master_allocator_tests.cpp 
> 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
>   src/tests/master_tests.cpp 57eae320a7a398527cd3623c89bf67f319a8e955 
>   src/tests/partition_tests.cpp 31ebfe1655438eceae74d72a223df03a9dbd282d 
>   src/tests/persistent_volume_tests.cpp 
> 4aa3c2e8b0f461cd78053707cff8bcb2e6f2b0d7 
>   src/tests/slave_recovery_tests.cpp f14c6ef69eb20a03454c8197df79b572a3c6d050 
>   src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 
> 
> 
> Diff: https://reviews.apache.org/r/64098/diff/11/
> 
> 
> Testing
> ---
> 
> with make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 64098: Send status updates when agent re-registers.

2017-12-04 Thread Megha Sharma


> On Dec. 4, 2017, 6:43 p.m., Jiang Yan Xu wrote:
> > src/tests/master_allocator_tests.cpp
> > Lines 1452-1454 (original), 1452-1459 (patched)
> > 
> >
> > So these tests had to be changed becauase the master is not using the 
> > `replicated_log` registry flag so it cannot remember the agent...
> > 
> > I think the right change is to use `replicated_log` for the tests that 
> > don't make sense to change for this patch.
> > 
> > Note that the use of replicated log registrar breaks on Windows (see 
> > MESOS-5932) so we need to make sure such tests are disabled on Windows.
> > 
> > The way to do it is to check whether the file is in this list: 
> > https://github.com/apache/mesos/blob/42952093c081e494d36721bc6c56ab73069a82e4/src/tests/CMakeLists.txt#L152
> > 
> > If it is, then it is already not built on Windows.
> > 
> > If not, we can use `TEST_F_TEMP_DISABLED_ON_WINDOWS` as part of the 
> > test name to disable it.
> > 
> > Please review all the tests in this RR for this.
> 
> Megha Sharma wrote:
> Thank you Yan, Good find. Working on fixing the tests.

Updated the RR, verified that all tests but one (`partition_tests.cpp`) are 
present in the section `if (NOT WIN32)` of the mentioned file. I have added 
`TEST_F_TEMP_DISABLED_ON_WINDOWS` to individual tests in `partition_tests.cpp` 
which have been updated to use replicated_logs in this RR.


- Megha


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review192733
---


On Dec. 2, 2017, 12:29 a.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64098/
> ---
> 
> (Updated Dec. 2, 2017, 12:29 a.m.)
> 
> 
> Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6406
> https://issues.apache.org/jira/browse/MESOS-6406
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master will send task status updates to frameworks when an agent
> which has been previously removed by the master for being unreachable
> or is unknown to the master due to the garbage collection of
> the unreachable and gone agents in the registry re-registers.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp dfe60ef670edcaefa0c1241df2e2870f650fcf9e 
>   src/tests/master_allocator_tests.cpp 
> 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
>   src/tests/master_tests.cpp 57eae320a7a398527cd3623c89bf67f319a8e955 
>   src/tests/partition_tests.cpp 31ebfe1655438eceae74d72a223df03a9dbd282d 
>   src/tests/persistent_volume_tests.cpp 
> 4aa3c2e8b0f461cd78053707cff8bcb2e6f2b0d7 
>   src/tests/slave_recovery_tests.cpp f14c6ef69eb20a03454c8197df79b572a3c6d050 
>   src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 
> 
> 
> Diff: https://reviews.apache.org/r/64098/diff/11/
> 
> 
> Testing
> ---
> 
> with make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 64098: Send status updates when agent re-registers.

2017-12-04 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review192794
---



FAIL: Some Mesos libprocess-tests failed.

Reviews applied: `['64250', '64098']`

Failed command: 
`C:\DCOS\mesos\3rdparty\libprocess\src\tests\Debug\libprocess-tests.exe`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64098

Relevant logs:

- 
[libprocess-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64098/logs/libprocess-tests-stdout.log):

```
[   OK ] LimiterTest.THREADSAFE_Acquire (3 ms)
[ RUN  ] LimiterTest.THREADSAFE_DiscardMiddle
[   OK ] LimiterTest.THREADSAFE_DiscardMiddle (2 ms)
[ RUN  ] LimiterTest.THREADSAFE_DiscardLast
[   OK ] LimiterTest.THREADSAFE_DiscardLast (1 ms)
[--] 3 tests from LimiterTest (72 ms total)

[--] 4 tests from LoopTest
[ RUN  ] LoopTest.Sync
[   OK ] LoopTest.Sync (0 ms)
[ RUN  ] LoopTest.Async
[   OK ] LoopTest.Async (1 ms)
[ RUN  ] LoopTest.DiscardIterate
[   OK ] LoopTest.DiscardIterate (1 ms)
[ RUN  ] LoopTest.DiscardBody
[   OK ] LoopTest.DiscardBody (2 ms)
[--] 4 tests from LoopTest (106 ms total)

[--] 9 tests from MetricsTest
[ RUN  ] MetricsTest.Counter
[   OK ] MetricsTest.Counter (2 ms)
[ RUN  ] MetricsTest.THREADSAFE_Gauge
[   OK ] MetricsTest.THREADSAFE_Gauge (3 ms)
[ RUN  ] MetricsTest.Statistics
[   OK ] MetricsTest.Statistics (2 ms)
[ RUN  ] MetricsTest.THREADSAFE_Snapshot
[   OK ] MetricsTest.THREADSAFE_Snapshot (88 ms)
[ RUN  ] MetricsTest.THREADSAFE_SnapshotTimeout
C:\DCOS\mesos\mesos\3rdparty\libprocess\src\tests\metrics_tests.cpp(335): 
error: Failed to wait 15secs for response
```

- Mesos Reviewbot Windows


On Dec. 2, 2017, 12:29 a.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64098/
> ---
> 
> (Updated Dec. 2, 2017, 12:29 a.m.)
> 
> 
> Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6406
> https://issues.apache.org/jira/browse/MESOS-6406
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master will send task status updates to frameworks when an agent
> which has been previously removed by the master for being unreachable
> or is unknown to the master due to the garbage collection of
> the unreachable and gone agents in the registry re-registers.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp dfe60ef670edcaefa0c1241df2e2870f650fcf9e 
>   src/tests/master_allocator_tests.cpp 
> 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
>   src/tests/master_tests.cpp 57eae320a7a398527cd3623c89bf67f319a8e955 
>   src/tests/partition_tests.cpp 31ebfe1655438eceae74d72a223df03a9dbd282d 
>   src/tests/persistent_volume_tests.cpp 
> 4aa3c2e8b0f461cd78053707cff8bcb2e6f2b0d7 
>   src/tests/slave_recovery_tests.cpp f14c6ef69eb20a03454c8197df79b572a3c6d050 
>   src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 
> 
> 
> Diff: https://reviews.apache.org/r/64098/diff/9/
> 
> 
> Testing
> ---
> 
> with make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 64098: Send status updates when agent re-registers.

2017-12-04 Thread Ilya Pronin


> On Dec. 1, 2017, 4:41 p.m., Megha Sharma wrote:
> > src/tests/upgrade_tests.cpp
> > Line 138 (original), 138 (patched)
> > 
> >
> > Sorry my bad looks like the fix for this test is still needed. I am not 
> > clear though what do you mean by ignoring the update? As I understand 
> > setting the expectation is unavoidable coz if we don't then the test fails 
> > claiming that previously set expectation here is being called more than 
> > once.
> 
> Megha Sharma wrote:
> ^^Ilya

Yeah, we'll most likely need to set an expectation, but we can make it just 
`WillRepeatedly(Return())` and omit the future, because we don't really care 
about that status update in those tests.


- Ilya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review192607
---


On Dec. 1, 2017, 4:29 p.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64098/
> ---
> 
> (Updated Dec. 1, 2017, 4:29 p.m.)
> 
> 
> Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6406
> https://issues.apache.org/jira/browse/MESOS-6406
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master will send task status updates to frameworks when an agent
> which has been previously removed by the master for being unreachable
> or is unknown to the master due to the garbage collection of
> the unreachable and gone agents in the registry re-registers.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp dfe60ef670edcaefa0c1241df2e2870f650fcf9e 
>   src/tests/master_allocator_tests.cpp 
> 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
>   src/tests/master_tests.cpp 57eae320a7a398527cd3623c89bf67f319a8e955 
>   src/tests/partition_tests.cpp 31ebfe1655438eceae74d72a223df03a9dbd282d 
>   src/tests/persistent_volume_tests.cpp 
> 4aa3c2e8b0f461cd78053707cff8bcb2e6f2b0d7 
>   src/tests/slave_recovery_tests.cpp f14c6ef69eb20a03454c8197df79b572a3c6d050 
>   src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 
> 
> 
> Diff: https://reviews.apache.org/r/64098/diff/8/
> 
> 
> Testing
> ---
> 
> with make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 64098: Send status updates when agent re-registers.

2017-12-04 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review192733
---



Mainly looked at the reason for the test changes. Will take another look at the 
whole patch once it's fixed.


src/tests/master_allocator_tests.cpp
Lines 1452-1454 (original), 1452-1459 (patched)


So these tests had to be changed becauase the master is not using the 
`replicated_log` registry flag so it cannot remember the agent...

I think the right change is to use `replicated_log` for the tests that 
don't make sense to change for this patch.

Note that the use of replicated log registrar breaks on Windows (see 
MESOS-5932) so we need to make sure such tests are disabled on Windows.

The way to do it is to check whether the file is in this list: 
https://github.com/apache/mesos/blob/42952093c081e494d36721bc6c56ab73069a82e4/src/tests/CMakeLists.txt#L152

If it is, then it is already not built on Windows.

If not, we can use `TEST_F_TEMP_DISABLED_ON_WINDOWS` as part of the test 
name to disable it.

Please review all the tests in this RR for this.


- Jiang Yan Xu


On Dec. 1, 2017, 4:29 p.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64098/
> ---
> 
> (Updated Dec. 1, 2017, 4:29 p.m.)
> 
> 
> Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6406
> https://issues.apache.org/jira/browse/MESOS-6406
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master will send task status updates to frameworks when an agent
> which has been previously removed by the master for being unreachable
> or is unknown to the master due to the garbage collection of
> the unreachable and gone agents in the registry re-registers.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp dfe60ef670edcaefa0c1241df2e2870f650fcf9e 
>   src/tests/master_allocator_tests.cpp 
> 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
>   src/tests/master_tests.cpp 57eae320a7a398527cd3623c89bf67f319a8e955 
>   src/tests/partition_tests.cpp 31ebfe1655438eceae74d72a223df03a9dbd282d 
>   src/tests/persistent_volume_tests.cpp 
> 4aa3c2e8b0f461cd78053707cff8bcb2e6f2b0d7 
>   src/tests/slave_recovery_tests.cpp f14c6ef69eb20a03454c8197df79b572a3c6d050 
>   src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 
> 
> 
> Diff: https://reviews.apache.org/r/64098/diff/8/
> 
> 
> Testing
> ---
> 
> with make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 64098: Send status updates when agent re-registers.

2017-12-01 Thread Megha Sharma

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review192607
---




src/tests/upgrade_tests.cpp
Line 138 (original), 138 (patched)


Sorry my bad looks like the fix for this test is still needed. I am not 
clear though what do you mean by ignoring the update? As I understand setting 
the expectation is unavoidable coz if we don't then the test fails claiming 
that previously set expectation here is being called more than once.


- Megha Sharma


On Dec. 2, 2017, 12:29 a.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64098/
> ---
> 
> (Updated Dec. 2, 2017, 12:29 a.m.)
> 
> 
> Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6406
> https://issues.apache.org/jira/browse/MESOS-6406
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master will send task status updates to frameworks when an agent
> which has been previously removed by the master for being unreachable
> or is unknown to the master due to the garbage collection of
> the unreachable and gone agents in the registry re-registers.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp dfe60ef670edcaefa0c1241df2e2870f650fcf9e 
>   src/tests/master_allocator_tests.cpp 
> 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
>   src/tests/master_tests.cpp 57eae320a7a398527cd3623c89bf67f319a8e955 
>   src/tests/partition_tests.cpp 31ebfe1655438eceae74d72a223df03a9dbd282d 
>   src/tests/persistent_volume_tests.cpp 
> 4aa3c2e8b0f461cd78053707cff8bcb2e6f2b0d7 
>   src/tests/slave_recovery_tests.cpp f14c6ef69eb20a03454c8197df79b572a3c6d050 
>   src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 
> 
> 
> Diff: https://reviews.apache.org/r/64098/diff/8/
> 
> 
> Testing
> ---
> 
> with make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 64098: Send status updates when agent re-registers.

2017-12-01 Thread Megha Sharma

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/
---

(Updated Dec. 2, 2017, 12:29 a.m.)


Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.


Bugs: MESOS-6406
https://issues.apache.org/jira/browse/MESOS-6406


Repository: mesos


Description
---

Master will send task status updates to frameworks when an agent
which has been previously removed by the master for being unreachable
or is unknown to the master due to the garbage collection of
the unreachable and gone agents in the registry re-registers.


Diffs (updated)
-

  src/master/master.cpp dfe60ef670edcaefa0c1241df2e2870f650fcf9e 
  src/tests/master_allocator_tests.cpp 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
  src/tests/master_tests.cpp 57eae320a7a398527cd3623c89bf67f319a8e955 
  src/tests/partition_tests.cpp 31ebfe1655438eceae74d72a223df03a9dbd282d 
  src/tests/persistent_volume_tests.cpp 
4aa3c2e8b0f461cd78053707cff8bcb2e6f2b0d7 
  src/tests/slave_recovery_tests.cpp f14c6ef69eb20a03454c8197df79b572a3c6d050 
  src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 


Diff: https://reviews.apache.org/r/64098/diff/8/

Changes: https://reviews.apache.org/r/64098/diff/7-8/


Testing
---

with make check


Thanks,

Megha Sharma



Re: Review Request 64098: Send status updates when agent re-registers.

2017-12-01 Thread Megha Sharma

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/
---

(Updated Dec. 2, 2017, 12:12 a.m.)


Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.


Bugs: MESOS-6406
https://issues.apache.org/jira/browse/MESOS-6406


Repository: mesos


Description (updated)
---

Master will send task status updates to frameworks when an agent
which has been previously removed by the master for being unreachable
or is unknown to the master due to the garbage collection of
the unreachable and gone agents in the registry re-registers.


Diffs (updated)
-

  src/master/master.cpp dfe60ef670edcaefa0c1241df2e2870f650fcf9e 
  src/tests/master_allocator_tests.cpp 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
  src/tests/master_tests.cpp 57eae320a7a398527cd3623c89bf67f319a8e955 
  src/tests/partition_tests.cpp 31ebfe1655438eceae74d72a223df03a9dbd282d 
  src/tests/persistent_volume_tests.cpp 
4aa3c2e8b0f461cd78053707cff8bcb2e6f2b0d7 
  src/tests/slave_recovery_tests.cpp f14c6ef69eb20a03454c8197df79b572a3c6d050 
  src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 


Diff: https://reviews.apache.org/r/64098/diff/7/

Changes: https://reviews.apache.org/r/64098/diff/6-7/


Testing
---

with make check


Thanks,

Megha Sharma



Re: Review Request 64098: Send status updates when agent re-registers.

2017-12-01 Thread Megha Sharma


> On Dec. 1, 2017, 9:29 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp
> > Line 6808 (original), 6808 (patched)
> > 
> >
> > When considering the comment by Ilya in MESOS-6406 (i.e., what if 
> > agents GCed from the unreachable or gone list reregster?), seems like we 
> > can do this:
> > 
> > 1. Move down the line `slaves.recovered.erase(slaveInfo.id());` to 
> > after we process `recoveredTasks`.
> > 2. Instead of checking `slaves.unreachable.contains(slaveInfo.id()` we 
> > could check `!slaves.recovered.contains(slaveInfo.id()`
> > 3. Now we are sending status updates for two cases: reregistering 
> > unreachable agents or unknown agents (which could have been marked either 
> > unreachable or gone but we can't distiguish)
> > - We can distinguish unreachable and unknown in the task status 
> > message.
> > - We can probably log a warning about tasks from unknown agents.

Sounds good, thanks for fomalizing.


> On Dec. 1, 2017, 9:29 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp
> > Lines 6818 (patched)
> > 
> >
> > s/REASON_AGENT_REREGISTERED/REASON_SLAVE_REREGISTERED/

Thanks!


- Megha


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review192557
---


On Nov. 28, 2017, 12:55 a.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64098/
> ---
> 
> (Updated Nov. 28, 2017, 12:55 a.m.)
> 
> 
> Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6406
> https://issues.apache.org/jira/browse/MESOS-6406
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master will send task status updates to frameworks when an agent
> which has been previously removed by the master for being unreachable
> re-registers.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp dfe60ef670edcaefa0c1241df2e2870f650fcf9e 
>   src/tests/master_allocator_tests.cpp 
> 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
>   src/tests/master_tests.cpp 57eae320a7a398527cd3623c89bf67f319a8e955 
>   src/tests/partition_tests.cpp 31ebfe1655438eceae74d72a223df03a9dbd282d 
>   src/tests/persistent_volume_tests.cpp 
> 4aa3c2e8b0f461cd78053707cff8bcb2e6f2b0d7 
>   src/tests/slave_recovery_tests.cpp f14c6ef69eb20a03454c8197df79b572a3c6d050 
>   src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 
> 
> 
> Diff: https://reviews.apache.org/r/64098/diff/6/
> 
> 
> Testing
> ---
> 
> with make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 64098: Send status updates when agent re-registers.

2017-12-01 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review192557
---




src/master/master.cpp
Line 6808 (original), 6808 (patched)


When considering the comment by Ilya in MESOS-6406 (i.e., what if agents 
GCed from the unreachable or gone list reregster?), seems like we can do this:

1. Move down the line `slaves.recovered.erase(slaveInfo.id());` to after we 
process `recoveredTasks`.
2. Instead of checking `slaves.unreachable.contains(slaveInfo.id()` we 
could check `!slaves.recovered.contains(slaveInfo.id()`
3. Now we are sending status updates for two cases: reregistering 
unreachable agents or unknown agents (which could have been marked either 
unreachable or gone but we can't distiguish)
- We can distinguish unreachable and unknown in the task status message.
- We can probably log a warning about tasks from unknown agents.



src/master/master.cpp
Lines 6818 (patched)


s/REASON_AGENT_REREGISTERED/REASON_SLAVE_REREGISTERED/


- Jiang Yan Xu


On Nov. 27, 2017, 4:55 p.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64098/
> ---
> 
> (Updated Nov. 27, 2017, 4:55 p.m.)
> 
> 
> Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6406
> https://issues.apache.org/jira/browse/MESOS-6406
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master will send task status updates to frameworks when an agent
> which has been previously removed by the master for being unreachable
> re-registers.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp dfe60ef670edcaefa0c1241df2e2870f650fcf9e 
>   src/tests/master_allocator_tests.cpp 
> 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
>   src/tests/master_tests.cpp 57eae320a7a398527cd3623c89bf67f319a8e955 
>   src/tests/partition_tests.cpp 31ebfe1655438eceae74d72a223df03a9dbd282d 
>   src/tests/persistent_volume_tests.cpp 
> 4aa3c2e8b0f461cd78053707cff8bcb2e6f2b0d7 
>   src/tests/slave_recovery_tests.cpp f14c6ef69eb20a03454c8197df79b572a3c6d050 
>   src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 
> 
> 
> Diff: https://reviews.apache.org/r/64098/diff/5/
> 
> 
> Testing
> ---
> 
> with make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 64098: Send status updates when agent re-registers.

2017-12-01 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review192519
---



FAIL: Mesos libprocess-tests failed to build

Reviews applied: `['64250', '64098']`

Failed command: `cmake.exe --build . --target libprocess-tests --config Debug`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64098

Relevant logs:

- 
[libprocess-tests-build-cmake-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64098/logs/libprocess-tests-build-cmake-stdout.log):

```
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(62): warning 
C4996: 'GetVersionExW': was declared deprecated (compiling source file 
C:\DCOS\mesos\mesos\3rdparty\libprocess\src\tests\state_machine_tests.cpp) 
[C:\DCOS\mesos\3rdparty\libprocess\src\tests\libprocess-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(451): warning 
C4996: 'GetVersionExW': was declared deprecated (compiling source file 
C:\DCOS\mesos\mesos\3rdparty\libprocess\src\tests\state_machine_tests.cpp) 
[C:\DCOS\mesos\3rdparty\libprocess\src\tests\libprocess-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(805): warning 
C4244: 'argument': conversion from 'ULONG_PTR' to 'pid_t', possible loss of 
data (compiling source file 
C:\DCOS\mesos\mesos\3rdparty\libprocess\src\tests\state_machine_tests.cpp) 
[C:\DCOS\mesos\3rdparty\libprocess\src\tests\libprocess-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(62): warning 
C4996: 'GetVersionExW': was declared deprecated (compiling source file 
C:\DCOS\mesos\mesos\3rdparty\libprocess\src\tests\subprocess_tests.cpp) 
[C:\DCOS\mesos\3rdparty\libprocess\src\tests\libprocess-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(451): warning 
C4996: 'GetVersionExW': was declared deprecated (compiling source file 
C:\DCOS\mesos\mesos\3rdparty\libprocess\src\tests\subprocess_tests.cpp) 
[C:\DCOS\mesos\3rdparty\libprocess\src\tests\libprocess-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(62): warning 
C4996: 'GetVersionExW': was declared deprecated (compiling source file 
C:\DCOS\mesos\mesos\3rdparty\libprocess\src\tests\system_tests.cpp) 
[C:\DCOS\mesos\3rdparty\libprocess\src\tests\libprocess-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(451): warning 
C4996: 'GetVersionExW': was declared deprecated (compiling source file 
C:\DCOS\mesos\mesos\3rdparty\libprocess\src\tests\system_tests.cpp) 
[C:\DCOS\mesos\3rdparty\libprocess\src\tests\libprocess-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(805): warning 
C4244: 'argument': conversion from 'ULONG_PTR' to 'pid_t', possible loss of 
data (compiling source file 
C:\DCOS\mesos\mesos\3rdparty\libprocess\src\tests\system_tests.cpp) 
[C:\DCOS\mesos\3rdparty\libprocess\src\tests\libprocess-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(805): warning 
C4244: 'argument': conversion from 'ULONG_PTR' to 'pid_t', possible loss of 
data (compiling source file 
C:\DCOS\mesos\mesos\3rdparty\libprocess\src\tests\subprocess_tests.cpp) 
[C:\DCOS\mesos\3rdparty\libprocess\src\tests\libprocess-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(62): warning 
C4996: 'GetVersionExW': was declared deprecated (compiling source file 
C:\DCOS\mesos\mesos\3rdparty\libprocess\src\tests\time_tests.cpp) 
[C:\DCOS\mesos\3rdparty\libprocess\src\tests\libprocess-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(451): warning 
C4996: 'GetVersionExW': was declared deprecated (compiling source file 
C:\DCOS\mesos\mesos\3rdparty\libprocess\src\tests\time_tests.cpp) 
[C:\DCOS\mesos\3rdparty\libprocess\src\tests\libprocess-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(805): warning 
C4244: 'argument': conversion from 'ULONG_PTR' to 'pid_t', possible loss of 
data (compiling source file 
C:\DCOS\mesos\mesos\3rdparty\libprocess\src\tests\time_tests.cpp) 
[C:\DCOS\mesos\3rdparty\libprocess\src\tests\libprocess-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(62): warning 
C4996: 'GetVersionExW': was declared deprecated (compiling source file 
C:\DCOS\mesos\mesos\3rdparty\libprocess\src\tests\ssl_tests.cpp) 
[C:\DCOS\mesos\3rdparty\libprocess\src\tests\libprocess-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(451): warning 
C4996: 'GetVersionExW': was declared deprecated (compiling source file 
C:\DCOS\mesos\mesos\3rdparty\libprocess\src\tests\ssl_tests.cpp) 
[C:\DCOS\mesos\3rdparty\libprocess\src\tests\libprocess-tests.vcxproj]



Re: Review Request 64098: Send status updates when agent re-registers.

2017-11-30 Thread Jiang Yan Xu


> On Nov. 28, 2017, 11:22 a.m., Jiang Yan Xu wrote:
> > src/master/master.cpp
> > Lines 6788 (patched)
> > 
> >
> > I think this type of status updates could benefit from a distinct 
> > reason for to make it more strongly typed.
> > 
> > Could you add
> > 
> > ```
> > v1/mesos.proto: REASON_AGENT_REREGISTERED
> > mesos.proto: REASON_SLAVE_REREGISTERED
> > ```
> > 
> > and use it?
> > 
> > Note that the tag numbers in the proto enum need to match.
> > 
> > Also note that there are also documentation related changes for these 
> > fields: 
> > https://github.com/apache/mesos/blob/master/docs/task-state-reasons.md
> > 
> > It's fine to make it a separate patch preceding this one.
> 
> Ilya Pronin wrote:
> Won't the new reason break any frameworks?
> 
> Jiang Yan Xu wrote:
> How will it break? We add new reasons all the time. This is indeed a new 
> sitaution that Mesos master sends status updates, how do you think the 
> framework will assert the reason otherwise?
> 
> Ilya Pronin wrote:
> Sure, the new reason reflects what's going on much better. I was thinking 
> that a framework that uses the old libmesos will get a "default" status 
> update reason, which is `REASON_COMMAND_EXECUTOR_FAILED`, and may act 
> incorrectly. But it seems that Aurora will be fine with such change.
> 
> Megha Sharma wrote:
> Yeah it seems that the frameworks should be fine with the addition of 
> another reason, I guess that's how we have been adding new reasons all along. 
> The proto default behavior to interpret the unknown enum value as first one 
> defined, is of course a problem but I believe the frameworks must already 
> have some guards in place to avoid that.
> 
> Megha Sharma wrote:
> +1 to add the new reasons:
> 
> v1/mesos.proto: REASON_AGENT_REREGISTERED
> mesos.proto: REASON_SLAVE_REREGISTERED

I see. Yeah the fact that there's not a sensible default value for the reason 
field is terrible. 

At least `REASON_COMMAND_EXECUTOR_FAILED` is deprecated: 
https://github.com/apache/mesos/blob/master/docs/task-state-reasons.md

```
The reason `REASON_COMMAND_EXECUTOR_FAILED` is deprecated and will be removed
in the future. It should not be referenced by newly written code.
```

Hopefully we get rid of it soon and replace it with a `REASON_UNKONWN_REASON`.


- Jiang Yan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review192041
---


On Nov. 27, 2017, 4:55 p.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64098/
> ---
> 
> (Updated Nov. 27, 2017, 4:55 p.m.)
> 
> 
> Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6406
> https://issues.apache.org/jira/browse/MESOS-6406
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master will send task status updates to frameworks when an agent
> re-registers.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 2ddd67ada3731803b00883b6a1f32b20c1bb238f 
>   src/tests/master_allocator_tests.cpp 
> 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
>   src/tests/master_tests.cpp 9c450b9f592d9e09a468f537d9b500e97acc636b 
>   src/tests/partition_tests.cpp e49c474167076b4136a161ed29b11db9a13455a7 
>   src/tests/persistent_volume_tests.cpp 
> acfeac16884b00581a3523607ff26f44f6dca53a 
>   src/tests/slave_recovery_tests.cpp c864aa92d9ff128a89dbc25653385de25653f56a 
>   src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 
> 
> 
> Diff: https://reviews.apache.org/r/64098/diff/4/
> 
> 
> Testing
> ---
> 
> with make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 64098: Send status updates when agent re-registers.

2017-11-30 Thread Megha Sharma


> On Nov. 28, 2017, 7:22 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp
> > Lines 6788 (patched)
> > 
> >
> > I think this type of status updates could benefit from a distinct 
> > reason for to make it more strongly typed.
> > 
> > Could you add
> > 
> > ```
> > v1/mesos.proto: REASON_AGENT_REREGISTERED
> > mesos.proto: REASON_SLAVE_REREGISTERED
> > ```
> > 
> > and use it?
> > 
> > Note that the tag numbers in the proto enum need to match.
> > 
> > Also note that there are also documentation related changes for these 
> > fields: 
> > https://github.com/apache/mesos/blob/master/docs/task-state-reasons.md
> > 
> > It's fine to make it a separate patch preceding this one.
> 
> Ilya Pronin wrote:
> Won't the new reason break any frameworks?
> 
> Jiang Yan Xu wrote:
> How will it break? We add new reasons all the time. This is indeed a new 
> sitaution that Mesos master sends status updates, how do you think the 
> framework will assert the reason otherwise?
> 
> Ilya Pronin wrote:
> Sure, the new reason reflects what's going on much better. I was thinking 
> that a framework that uses the old libmesos will get a "default" status 
> update reason, which is `REASON_COMMAND_EXECUTOR_FAILED`, and may act 
> incorrectly. But it seems that Aurora will be fine with such change.

Yeah it seems that the frameworks should be fine with the addition of another 
reason, I guess that's how we have been adding new reasons all along. The proto 
default behavior to interpret the unknown enum value as first one defined, is 
of course a problem but I believe the frameworks must already have some guards 
in place to avoid that.


- Megha


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review192041
---


On Nov. 28, 2017, 12:55 a.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64098/
> ---
> 
> (Updated Nov. 28, 2017, 12:55 a.m.)
> 
> 
> Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6406
> https://issues.apache.org/jira/browse/MESOS-6406
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master will send task status updates to frameworks when an agent
> re-registers.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 2ddd67ada3731803b00883b6a1f32b20c1bb238f 
>   src/tests/master_allocator_tests.cpp 
> 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
>   src/tests/master_tests.cpp 9c450b9f592d9e09a468f537d9b500e97acc636b 
>   src/tests/partition_tests.cpp e49c474167076b4136a161ed29b11db9a13455a7 
>   src/tests/persistent_volume_tests.cpp 
> acfeac16884b00581a3523607ff26f44f6dca53a 
>   src/tests/slave_recovery_tests.cpp c864aa92d9ff128a89dbc25653385de25653f56a 
>   src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 
> 
> 
> Diff: https://reviews.apache.org/r/64098/diff/4/
> 
> 
> Testing
> ---
> 
> with make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 64098: Send status updates when agent re-registers.

2017-11-30 Thread Megha Sharma


> On Nov. 28, 2017, 7:01 p.m., Ilya Pronin wrote:
> > src/master/master.cpp
> > Lines 6789 (patched)
> > 
> >
> > I think this is not specific to unreachable agents. Can be an agent 
> > that was recovered after failover.

Ilya, I agree the reason needs to be changed based on whether or not the agent 
was unreachable. Also, Yan and I dicussed more about the agent re-registeration 
scenarios in which the master should do a status update. If the master 
undergoes a failover then the current approach will make the master do status 
updates for all tasks on re-registering agents which will make the make the 
critical path of agent re-registeration slower. One good alternative was to do 
status updates for only unreachable agents. Since master already sent a 
TASK_LOST/TASK_UNREACHABLE for these so there should definitely be a followup. 
Most of the frameworks today already do frequent reconciliations upon 
re-registering with master so doing explicit status updates for re-registering 
agents due to failover seemed a bit unnecessary. How do you feel about the 
changed approach?


- Megha


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review191979
---


On Nov. 28, 2017, 12:55 a.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64098/
> ---
> 
> (Updated Nov. 28, 2017, 12:55 a.m.)
> 
> 
> Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6406
> https://issues.apache.org/jira/browse/MESOS-6406
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master will send task status updates to frameworks when an agent
> re-registers.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 2ddd67ada3731803b00883b6a1f32b20c1bb238f 
>   src/tests/master_allocator_tests.cpp 
> 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
>   src/tests/master_tests.cpp 9c450b9f592d9e09a468f537d9b500e97acc636b 
>   src/tests/partition_tests.cpp e49c474167076b4136a161ed29b11db9a13455a7 
>   src/tests/persistent_volume_tests.cpp 
> acfeac16884b00581a3523607ff26f44f6dca53a 
>   src/tests/slave_recovery_tests.cpp c864aa92d9ff128a89dbc25653385de25653f56a 
>   src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 
> 
> 
> Diff: https://reviews.apache.org/r/64098/diff/4/
> 
> 
> Testing
> ---
> 
> with make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 64098: Send status updates when agent re-registers.

2017-11-28 Thread Ilya Pronin


> On Nov. 28, 2017, 11:22 a.m., Jiang Yan Xu wrote:
> > src/master/master.cpp
> > Lines 6788 (patched)
> > 
> >
> > I think this type of status updates could benefit from a distinct 
> > reason for to make it more strongly typed.
> > 
> > Could you add
> > 
> > ```
> > v1/mesos.proto: REASON_AGENT_REREGISTERED
> > mesos.proto: REASON_SLAVE_REREGISTERED
> > ```
> > 
> > and use it?
> > 
> > Note that the tag numbers in the proto enum need to match.
> > 
> > Also note that there are also documentation related changes for these 
> > fields: 
> > https://github.com/apache/mesos/blob/master/docs/task-state-reasons.md
> > 
> > It's fine to make it a separate patch preceding this one.
> 
> Ilya Pronin wrote:
> Won't the new reason break any frameworks?
> 
> Jiang Yan Xu wrote:
> How will it break? We add new reasons all the time. This is indeed a new 
> sitaution that Mesos master sends status updates, how do you think the 
> framework will assert the reason otherwise?

Sure, the new reason reflects what's going on much better. I was thinking that 
a framework that uses the old libmesos will get a "default" status update 
reason, which is `REASON_COMMAND_EXECUTOR_FAILED`, and may act incorrectly. But 
it seems that Aurora will be fine with such change.


- Ilya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review192041
---


On Nov. 27, 2017, 4:55 p.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64098/
> ---
> 
> (Updated Nov. 27, 2017, 4:55 p.m.)
> 
> 
> Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6406
> https://issues.apache.org/jira/browse/MESOS-6406
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master will send task status updates to frameworks when an agent
> re-registers.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 2ddd67ada3731803b00883b6a1f32b20c1bb238f 
>   src/tests/master_allocator_tests.cpp 
> 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
>   src/tests/master_tests.cpp 9c450b9f592d9e09a468f537d9b500e97acc636b 
>   src/tests/partition_tests.cpp e49c474167076b4136a161ed29b11db9a13455a7 
>   src/tests/persistent_volume_tests.cpp 
> acfeac16884b00581a3523607ff26f44f6dca53a 
>   src/tests/slave_recovery_tests.cpp c864aa92d9ff128a89dbc25653385de25653f56a 
>   src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 
> 
> 
> Diff: https://reviews.apache.org/r/64098/diff/4/
> 
> 
> Testing
> ---
> 
> with make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 64098: Send status updates when agent re-registers.

2017-11-28 Thread Jiang Yan Xu


> On Nov. 28, 2017, 11:22 a.m., Jiang Yan Xu wrote:
> > src/master/master.cpp
> > Lines 6788 (patched)
> > 
> >
> > I think this type of status updates could benefit from a distinct 
> > reason for to make it more strongly typed.
> > 
> > Could you add
> > 
> > ```
> > v1/mesos.proto: REASON_AGENT_REREGISTERED
> > mesos.proto: REASON_SLAVE_REREGISTERED
> > ```
> > 
> > and use it?
> > 
> > Note that the tag numbers in the proto enum need to match.
> > 
> > Also note that there are also documentation related changes for these 
> > fields: 
> > https://github.com/apache/mesos/blob/master/docs/task-state-reasons.md
> > 
> > It's fine to make it a separate patch preceding this one.
> 
> Ilya Pronin wrote:
> Won't the new reason break any frameworks?

How will it break? We add new reasons all the time. This is indeed a new 
sitaution that Mesos master sends status updates, how do you think the 
framework will assert the reason otherwise?


- Jiang Yan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review192041
---


On Nov. 27, 2017, 4:55 p.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64098/
> ---
> 
> (Updated Nov. 27, 2017, 4:55 p.m.)
> 
> 
> Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6406
> https://issues.apache.org/jira/browse/MESOS-6406
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master will send task status updates to frameworks when an agent
> re-registers.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 2ddd67ada3731803b00883b6a1f32b20c1bb238f 
>   src/tests/master_allocator_tests.cpp 
> 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
>   src/tests/master_tests.cpp 9c450b9f592d9e09a468f537d9b500e97acc636b 
>   src/tests/partition_tests.cpp e49c474167076b4136a161ed29b11db9a13455a7 
>   src/tests/persistent_volume_tests.cpp 
> acfeac16884b00581a3523607ff26f44f6dca53a 
>   src/tests/slave_recovery_tests.cpp c864aa92d9ff128a89dbc25653385de25653f56a 
>   src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 
> 
> 
> Diff: https://reviews.apache.org/r/64098/diff/4/
> 
> 
> Testing
> ---
> 
> with make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 64098: Send status updates when agent re-registers.

2017-11-28 Thread Ilya Pronin


> On Nov. 28, 2017, 11:22 a.m., Jiang Yan Xu wrote:
> > src/master/master.cpp
> > Lines 6788 (patched)
> > 
> >
> > I think this type of status updates could benefit from a distinct 
> > reason for to make it more strongly typed.
> > 
> > Could you add
> > 
> > ```
> > v1/mesos.proto: REASON_AGENT_REREGISTERED
> > mesos.proto: REASON_SLAVE_REREGISTERED
> > ```
> > 
> > and use it?
> > 
> > Note that the tag numbers in the proto enum need to match.
> > 
> > Also note that there are also documentation related changes for these 
> > fields: 
> > https://github.com/apache/mesos/blob/master/docs/task-state-reasons.md
> > 
> > It's fine to make it a separate patch preceding this one.

Won't the new reason break any frameworks?


- Ilya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review192041
---


On Nov. 27, 2017, 4:55 p.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64098/
> ---
> 
> (Updated Nov. 27, 2017, 4:55 p.m.)
> 
> 
> Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6406
> https://issues.apache.org/jira/browse/MESOS-6406
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master will send task status updates to frameworks when an agent
> re-registers.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 2ddd67ada3731803b00883b6a1f32b20c1bb238f 
>   src/tests/master_allocator_tests.cpp 
> 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
>   src/tests/master_tests.cpp 9c450b9f592d9e09a468f537d9b500e97acc636b 
>   src/tests/partition_tests.cpp e49c474167076b4136a161ed29b11db9a13455a7 
>   src/tests/persistent_volume_tests.cpp 
> acfeac16884b00581a3523607ff26f44f6dca53a 
>   src/tests/slave_recovery_tests.cpp c864aa92d9ff128a89dbc25653385de25653f56a 
>   src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 
> 
> 
> Diff: https://reviews.apache.org/r/64098/diff/4/
> 
> 
> Testing
> ---
> 
> with make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 64098: Send status updates when agent re-registers.

2017-11-28 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review192041
---



Reviewed the non-testing portion. Will continue to review the tests.


src/master/master.cpp
Lines 6783 (patched)


4 space indentation for function arguments.



src/master/master.cpp
Lines 6788 (patched)


I think this type of status updates could benefit from a distinct reason 
for to make it more strongly typed.

Could you add

```
v1/mesos.proto: REASON_AGENT_REREGISTERED
mesos.proto: REASON_SLAVE_REREGISTERED
```

and use it?

Note that the tag numbers in the proto enum need to match.

Also note that there are also documentation related changes for these 
fields: https://github.com/apache/mesos/blob/master/docs/task-state-reasons.md

It's fine to make it a separate patch preceding this one.



src/master/master.cpp
Lines 6789 (patched)


s/Agent/agent/



src/master/master.cpp
Lines 6791-6792 (patched)


I know that the styling of the ternary operator is terribly inconsistent 
but could you do this which I think it more widely used?

```
(task.has_executor_id()
? Option(task.executor_id()) : None()),
```

There will be a day when we just use clang tidy to fix everything. :)



src/master/master.cpp
Lines 6793-6797 (patched)


Should we actually assign these values?

```
  protobuf::getTaskHealth(*task),
  protobuf::getTaskCheckStatus(*task),
  None(),
  protobuf::getTaskContainerStatus(*task)
```

Note that the `None`s are default values so you don't have to specify 
trailing Nones.


- Jiang Yan Xu


On Nov. 27, 2017, 4:55 p.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64098/
> ---
> 
> (Updated Nov. 27, 2017, 4:55 p.m.)
> 
> 
> Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6406
> https://issues.apache.org/jira/browse/MESOS-6406
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master will send task status updates to frameworks when an agent
> re-registers.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 2ddd67ada3731803b00883b6a1f32b20c1bb238f 
>   src/tests/master_allocator_tests.cpp 
> 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
>   src/tests/master_tests.cpp 9c450b9f592d9e09a468f537d9b500e97acc636b 
>   src/tests/partition_tests.cpp e49c474167076b4136a161ed29b11db9a13455a7 
>   src/tests/persistent_volume_tests.cpp 
> acfeac16884b00581a3523607ff26f44f6dca53a 
>   src/tests/slave_recovery_tests.cpp c864aa92d9ff128a89dbc25653385de25653f56a 
>   src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 
> 
> 
> Diff: https://reviews.apache.org/r/64098/diff/3/
> 
> 
> Testing
> ---
> 
> with make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 64098: Send status updates when agent re-registers.

2017-11-28 Thread Ilya Pronin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review191979
---



Looks good to me overall. Great to see this patch coming.

Do we also need to send status updates when we re-register an already 
registered but disconnected agent (`reconcileKnownSlave()` method)? I was 
thinking about the scenario described in 
[MESOS-8185](https://issues.apache.org/jira/browse/MESOS-8185). It seems that 
we're safe as long as the master persists "kill task" requests in memory until 
it actually gets a terminal task status, and the agent retries status updates 
until they are acked. Am I correct?


src/master/master.cpp
Lines 6789 (patched)


I think this is not specific to unreachable agents. Can be an agent that 
was recovered after failover.



src/master/master.cpp
Lines 6790 (patched)


`REASON_RECONCILIATION`?



src/master/master.cpp
Lines 6800-6802 (patched)


Nit: There's double space between "disconnected" and "framework". Also, I 
think it will fit into 80 character line.



src/tests/persistent_volume_tests.cpp
Lines 1512-1515 (patched)


Should we just ignore this status update here? It doesn't seem related to 
the thing under test.



src/tests/upgrade_tests.cpp
Lines 219-221 (patched)


Ditto, should we just ignore this status update since it's not related to 
the thing test?


- Ilya Pronin


On Nov. 27, 2017, 4:55 p.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64098/
> ---
> 
> (Updated Nov. 27, 2017, 4:55 p.m.)
> 
> 
> Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6406
> https://issues.apache.org/jira/browse/MESOS-6406
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master will send task status updates to frameworks when an agent
> re-registers.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 2ddd67ada3731803b00883b6a1f32b20c1bb238f 
>   src/tests/master_allocator_tests.cpp 
> 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
>   src/tests/master_tests.cpp 9c450b9f592d9e09a468f537d9b500e97acc636b 
>   src/tests/partition_tests.cpp e49c474167076b4136a161ed29b11db9a13455a7 
>   src/tests/persistent_volume_tests.cpp 
> acfeac16884b00581a3523607ff26f44f6dca53a 
>   src/tests/slave_recovery_tests.cpp c864aa92d9ff128a89dbc25653385de25653f56a 
>   src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 
> 
> 
> Diff: https://reviews.apache.org/r/64098/diff/3/
> 
> 
> Testing
> ---
> 
> with make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 64098: Send status updates when agent re-registers.

2017-11-27 Thread Megha Sharma

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/
---

(Updated Nov. 28, 2017, 12:29 a.m.)


Review request for mesos, James Peach and Jiang Yan Xu.


Repository: mesos


Description
---

Master will send task status updates to frameworks when an agent
re-registers.


Diffs
-

  src/master/master.cpp 2ddd67ada3731803b00883b6a1f32b20c1bb238f 
  src/tests/master_allocator_tests.cpp 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
  src/tests/master_tests.cpp 9c450b9f592d9e09a468f537d9b500e97acc636b 
  src/tests/partition_tests.cpp e49c474167076b4136a161ed29b11db9a13455a7 
  src/tests/persistent_volume_tests.cpp 
acfeac16884b00581a3523607ff26f44f6dca53a 
  src/tests/slave_recovery_tests.cpp c864aa92d9ff128a89dbc25653385de25653f56a 
  src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 


Diff: https://reviews.apache.org/r/64098/diff/2/


Testing (updated)
---

with make check


Thanks,

Megha Sharma