Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-10-06 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/
---

(Updated Oct. 6, 2014, 11:53 a.m.)


Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.


Changes
---

Fix import order and checkstyle issues.


Bugs: AURORA-728
https://issues.apache.org/jira/browse/AURORA-728


Repository: aurora


Description
---

Prevent initial ZK timeouts from killing the executor. In addition, prevent 
uncaught exceptions from killing the executor.


Diffs (updated)
-

  src/main/python/apache/aurora/executor/aurora_executor.py 
79a24855b2a68271b7478395dfdadab8755c3af2 
  src/main/python/apache/aurora/executor/common/announcer.py 
c466da8d48bbc2aa227c2db157cab84665ad6602 
  src/test/python/apache/aurora/executor/common/test_announcer.py 
4f6e200ecb1a4ea7cb45acd466a57f19d5815326 

Diff: https://reviews.apache.org/r/25974/diff/


Testing
---

./pants src/test/python/apache/aurora/executor:executor-small
./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh


Thanks,

Zameer Manji



Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-10-06 Thread Kevin Sweeney

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review55547
---

Ship it!


Ship It!

- Kevin Sweeney


On Oct. 2, 2014, 6:44 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> ---
> 
> (Updated Oct. 2, 2014, 6:44 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
> https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent 
> uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py 
> c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 
> 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> ---
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-10-03 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review55402
---


Pinging Kevin for final ship it and commit.

- Zameer Manji


On Oct. 2, 2014, 6:44 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> ---
> 
> (Updated Oct. 2, 2014, 6:44 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
> https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent 
> uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py 
> c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 
> 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> ---
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-10-02 Thread Brian Wickman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review55313
---

Ship it!


Ship It!

- Brian Wickman


On Oct. 3, 2014, 1:44 a.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> ---
> 
> (Updated Oct. 3, 2014, 1:44 a.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
> https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent 
> uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py 
> c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 
> 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> ---
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-10-02 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/
---

(Updated Oct. 2, 2014, 6:44 p.m.)


Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.


Changes
---

Brian's feedback.


Bugs: AURORA-728
https://issues.apache.org/jira/browse/AURORA-728


Repository: aurora


Description
---

Prevent initial ZK timeouts from killing the executor. In addition, prevent 
uncaught exceptions from killing the executor.


Diffs (updated)
-

  src/main/python/apache/aurora/executor/aurora_executor.py 
79a24855b2a68271b7478395dfdadab8755c3af2 
  src/main/python/apache/aurora/executor/common/announcer.py 
c466da8d48bbc2aa227c2db157cab84665ad6602 
  src/test/python/apache/aurora/executor/common/test_announcer.py 
4f6e200ecb1a4ea7cb45acd466a57f19d5815326 

Diff: https://reviews.apache.org/r/25974/diff/


Testing
---

./pants src/test/python/apache/aurora/executor:executor-small
./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh


Thanks,

Zameer Manji



Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-10-02 Thread Brian Wickman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review55309
---



src/test/python/apache/aurora/executor/common/test_announcer.py


I'm confused -- nobody calls client.connected anymore, right?  In theory, 
this test will pass, except it will take 30-60 seconds to run since 
client_connect_event.wait(timeout=...) will be called with the health check 
timeout.


- Brian Wickman


On Oct. 2, 2014, 11:20 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> ---
> 
> (Updated Oct. 2, 2014, 11:20 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
> https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent 
> uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py 
> c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 
> 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> ---
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-10-02 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/
---

(Updated Oct. 2, 2014, 4:20 p.m.)


Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.


Changes
---

Brian's feedback.


Bugs: AURORA-728
https://issues.apache.org/jira/browse/AURORA-728


Repository: aurora


Description
---

Prevent initial ZK timeouts from killing the executor. In addition, prevent 
uncaught exceptions from killing the executor.


Diffs (updated)
-

  src/main/python/apache/aurora/executor/aurora_executor.py 
79a24855b2a68271b7478395dfdadab8755c3af2 
  src/main/python/apache/aurora/executor/common/announcer.py 
c466da8d48bbc2aa227c2db157cab84665ad6602 
  src/test/python/apache/aurora/executor/common/test_announcer.py 
4f6e200ecb1a4ea7cb45acd466a57f19d5815326 

Diff: https://reviews.apache.org/r/25974/diff/


Testing
---

./pants src/test/python/apache/aurora/executor:executor-small
./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh


Thanks,

Zameer Manji



Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-10-02 Thread Kevin Sweeney


> On Oct. 1, 2014, 4:34 p.m., Kevin Sweeney wrote:
> > src/main/python/apache/aurora/executor/common/announcer.py, line 228
> > 
> >
> > push this into `__start`, out of the constructor?
> > 
> > At least on the Java side we try to avoid doing any I/O in constructors 
> > (as they're more for wiring) but instead delegate to explicit lifecycle 
> > methods like `start`.
> 
> Zameer Manji wrote:
> I'd rather not. Keeping it in the constructor means it stays in the 
> from_assigned_task method code path. Moving it to the start method moves it 
> out of that code path and really changes when we do this sort of I/O.

is it common practice to do I/O in constructors elsewhere in this codebase? 
I'll defer to wickman here


- Kevin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review55161
---


On Sept. 30, 2014, 5:17 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> ---
> 
> (Updated Sept. 30, 2014, 5:17 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
> https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent 
> uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py 
> c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 
> 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> ---
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-10-02 Thread Zameer Manji


> On Oct. 1, 2014, 4:14 p.m., Brian Wickman wrote:
> > src/test/python/apache/aurora/executor/common/test_announcer.py, line 239
> > 
> >
> > out of scope for this review, but there's a new-ish pypi project called 
> > 'zake' that allows the kazoo client to be stubbed out with a mock zk tree 
> > (while preserving certain operations like create/delete.)  i'd love to 
> > explore using it here in the future.

It looks really nice for this sort of testing. It is definately out of the 
scope of this review. If we ever need to do more work on the Announcer/Service 
Discovery we should first migrate to zake.


- Zameer


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review55158
---


On Sept. 30, 2014, 5:17 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> ---
> 
> (Updated Sept. 30, 2014, 5:17 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
> https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent 
> uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py 
> c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 
> 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> ---
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-10-02 Thread Zameer Manji


> On Oct. 1, 2014, 4:34 p.m., Kevin Sweeney wrote:
> > src/main/python/apache/aurora/executor/common/announcer.py, line 228
> > 
> >
> > push this into `__start`, out of the constructor?
> > 
> > At least on the Java side we try to avoid doing any I/O in constructors 
> > (as they're more for wiring) but instead delegate to explicit lifecycle 
> > methods like `start`.

I'd rather not. Keeping it in the constructor means it stays in the 
from_assigned_task method code path. Moving it to the start method moves it out 
of that code path and really changes when we do this sort of I/O.


- Zameer


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review55161
---


On Sept. 30, 2014, 5:17 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> ---
> 
> (Updated Sept. 30, 2014, 5:17 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
> https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent 
> uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py 
> c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 
> 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> ---
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-10-01 Thread Kevin Sweeney

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review55161
---



src/main/python/apache/aurora/executor/aurora_executor.py


superfluous return



src/main/python/apache/aurora/executor/common/announcer.py


push this into `__start`, out of the constructor?

At least on the Java side we try to avoid doing any I/O in constructors (as 
they're more for wiring) but instead delegate to explicit lifecycle methods 
like `start`.


- Kevin Sweeney


On Sept. 30, 2014, 5:17 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> ---
> 
> (Updated Sept. 30, 2014, 5:17 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
> https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent 
> uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py 
> c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 
> 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> ---
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-10-01 Thread Brian Wickman


> On Oct. 1, 2014, 11:14 p.m., Brian Wickman wrote:
> >

btw overall this review looks great and is much cleaner than what i was 
envisioning.


- Brian


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review55158
---


On Oct. 1, 2014, 12:17 a.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> ---
> 
> (Updated Oct. 1, 2014, 12:17 a.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
> https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent 
> uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py 
> c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 
> 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> ---
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-10-01 Thread Brian Wickman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review55158
---



src/main/python/apache/aurora/executor/common/announcer.py


i think "if self.__connect_event.is_set()" is more accurate -- it's 
possible for it to be connected and then race with disconnection.  we just want 
to validate that it managed to connect initially.

this complicates your test a little bit since you do connected.set() right 
off the bat.  but you might be able to get around this by making a 
HealthCheckConfig that produces a tiny timeout.



src/test/python/apache/aurora/executor/common/test_announcer.py


out of scope for this review, but there's a new-ish pypi project called 
'zake' that allows the kazoo client to be stubbed out with a mock zk tree 
(while preserving certain operations like create/delete.)  i'd love to explore 
using it here in the future.


- Brian Wickman


On Oct. 1, 2014, 12:17 a.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> ---
> 
> (Updated Oct. 1, 2014, 12:17 a.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
> https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent 
> uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py 
> c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 
> 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> ---
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-09-30 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/
---

(Updated Sept. 30, 2014, 5:17 p.m.)


Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.


Changes
---

Brian's Feedback.


Bugs: AURORA-728
https://issues.apache.org/jira/browse/AURORA-728


Repository: aurora


Description
---

Prevent initial ZK timeouts from killing the executor. In addition, prevent 
uncaught exceptions from killing the executor.


Diffs (updated)
-

  src/main/python/apache/aurora/executor/aurora_executor.py 
79a24855b2a68271b7478395dfdadab8755c3af2 
  src/main/python/apache/aurora/executor/common/announcer.py 
c466da8d48bbc2aa227c2db157cab84665ad6602 
  src/test/python/apache/aurora/executor/common/test_announcer.py 
4f6e200ecb1a4ea7cb45acd466a57f19d5815326 

Diff: https://reviews.apache.org/r/25974/diff/


Testing (updated)
---

./pants src/test/python/apache/aurora/executor:executor-small
./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh


Thanks,

Zameer Manji



Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-09-29 Thread Zameer Manji


> On Sept. 29, 2014, 4:43 p.m., Brian Wickman wrote:
> > src/main/python/apache/aurora/executor/common/announcer.py, lines 112-116
> > 
> >
> > this data is available within the mesos_task struct created in 
> > from_assigned_task.  you should be getting the information from there as it 
> > may contain default values not required to be populated in the 
> > executorConfig json (in which case you'd be getting KeyErrors from 
> > make_serverset.)

done.


- Zameer


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review54916
---


On Sept. 29, 2014, 5:54 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> ---
> 
> (Updated Sept. 29, 2014, 5:54 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
> https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent 
> uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py 
> c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 
> 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> ---
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-09-29 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/
---

(Updated Sept. 29, 2014, 5:54 p.m.)


Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.


Changes
---

Brian's Feedback except for calling start_async()


Bugs: AURORA-728
https://issues.apache.org/jira/browse/AURORA-728


Repository: aurora


Description
---

Prevent initial ZK timeouts from killing the executor. In addition, prevent 
uncaught exceptions from killing the executor.


Diffs (updated)
-

  src/main/python/apache/aurora/executor/aurora_executor.py 
79a24855b2a68271b7478395dfdadab8755c3af2 
  src/main/python/apache/aurora/executor/common/announcer.py 
c466da8d48bbc2aa227c2db157cab84665ad6602 
  src/test/python/apache/aurora/executor/common/test_announcer.py 
4f6e200ecb1a4ea7cb45acd466a57f19d5815326 

Diff: https://reviews.apache.org/r/25974/diff/


Testing
---

./pants src/test/python/apache/aurora/executor:executor-small


Thanks,

Zameer Manji



Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-09-29 Thread Brian Wickman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review54916
---



src/main/python/apache/aurora/executor/common/announcer.py


this data is available within the mesos_task struct created in 
from_assigned_task.  you should be getting the information from there as it may 
contain default values not required to be populated in the executorConfig json 
(in which case you'd be getting KeyErrors from make_serverset.)



src/main/python/apache/aurora/executor/common/announcer.py


i feel like we should be calling start_async instead, though this 
complicates implementation slightly.

we want the creation of all the status managers to be non-blocking since 
the executor will be in a transient (TASK_STARTING) state, which can possibly 
timeout and go to TASK_LOST, which is less preferable than TASK_FAILED.

client.create_async returns a threading.Event.  we'd want (in a separate 
thread) to event.wait(timeout=timeout) and if not event.is_set(), set the 
.healthy property on the Announcer to StatusResult(..., mesos_pb2.TASK_FAILED)

alternately, you could ignore the threading.Event from create_async and 
bake that logic into the contract between Announcer<->ServerSetJoinThread.  i'm 
not sure which is simpler or cleaner.


- Brian Wickman


On Sept. 29, 2014, 9:46 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> ---
> 
> (Updated Sept. 29, 2014, 9:46 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
> https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent 
> uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py 
> c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 
> 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> ---
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-09-29 Thread Kevin Sweeney

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review54895
---



src/main/python/apache/aurora/executor/common/announcer.py


Do you need input sanitization here? Could this logic be hidden behind an 
abstraction? It's unclear to me why the serverset connection code needs live 
with the executor data parsing code.


- Kevin Sweeney


On Sept. 29, 2014, 2:46 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> ---
> 
> (Updated Sept. 29, 2014, 2:46 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
> https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent 
> uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py 
> c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 
> 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> ---
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-09-29 Thread Kevin Sweeney

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review54892
---



src/main/python/apache/aurora/executor/common/announcer.py


Some context for the error would be helpful here - e.g. log.error("Initial 
connection to ZooKeeper timed out: %s" % e).


- Kevin Sweeney


On Sept. 29, 2014, 2:46 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> ---
> 
> (Updated Sept. 29, 2014, 2:46 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
> https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent 
> uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py 
> c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 
> 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> ---
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-09-29 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/
---

(Updated Sept. 29, 2014, 2:46 p.m.)


Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.


Changes
---

Bill's Feedback


Bugs: AURORA-728
https://issues.apache.org/jira/browse/AURORA-728


Repository: aurora


Description
---

Prevent initial ZK timeouts from killing the executor. In addition, prevent 
uncaught exceptions from killing the executor.


Diffs (updated)
-

  src/main/python/apache/aurora/executor/aurora_executor.py 
79a24855b2a68271b7478395dfdadab8755c3af2 
  src/main/python/apache/aurora/executor/common/announcer.py 
c466da8d48bbc2aa227c2db157cab84665ad6602 
  src/test/python/apache/aurora/executor/common/test_announcer.py 
4f6e200ecb1a4ea7cb45acd466a57f19d5815326 

Diff: https://reviews.apache.org/r/25974/diff/


Testing
---

./pants src/test/python/apache/aurora/executor:executor-small


Thanks,

Zameer Manji



Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-09-29 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review54890
---


Pinging Brian and Kevin to this review.

- Zameer Manji


On Sept. 29, 2014, 2:46 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> ---
> 
> (Updated Sept. 29, 2014, 2:46 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
> https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent 
> uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py 
> c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 
> 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> ---
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-09-29 Thread Bill Farner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review54885
---

Ship it!



src/main/python/apache/aurora/executor/aurora_executor.py


TASK_FAILED in the message is redundant.


- Bill Farner


On Sept. 26, 2014, 9:24 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> ---
> 
> (Updated Sept. 26, 2014, 9:24 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
> https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent 
> uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py 
> c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 
> 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> ---
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-09-26 Thread Zameer Manji


> On Sept. 25, 2014, 4:29 p.m., Kevin Sweeney wrote:
> > src/test/python/apache/aurora/executor/common/test_announcer.py, line 264
> > 
> >
> > where is this magic number coming from? extract a constant and import 
> > that here?

This is the default initial health check + (max failures * health check). I'm 
unsure on the best way to fix this without duplicating the code it is trying to 
test.


- Zameer


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review54623
---


On Sept. 25, 2014, 2:39 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> ---
> 
> (Updated Sept. 25, 2014, 2:39 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
> https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent 
> uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py 
> c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 
> 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> ---
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-09-26 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/
---

(Updated Sept. 26, 2014, 2:24 p.m.)


Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.


Changes
---

Kevin's Feedback.


Bugs: AURORA-728
https://issues.apache.org/jira/browse/AURORA-728


Repository: aurora


Description
---

Prevent initial ZK timeouts from killing the executor. In addition, prevent 
uncaught exceptions from killing the executor.


Diffs (updated)
-

  src/main/python/apache/aurora/executor/aurora_executor.py 
79a24855b2a68271b7478395dfdadab8755c3af2 
  src/main/python/apache/aurora/executor/common/announcer.py 
c466da8d48bbc2aa227c2db157cab84665ad6602 
  src/test/python/apache/aurora/executor/common/test_announcer.py 
4f6e200ecb1a4ea7cb45acd466a57f19d5815326 

Diff: https://reviews.apache.org/r/25974/diff/


Testing
---

./pants src/test/python/apache/aurora/executor:executor-small


Thanks,

Zameer Manji



Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-09-25 Thread Kevin Sweeney

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review54623
---



src/main/python/apache/aurora/executor/common/announcer.py


add a trailing comma here for easy import-sorting



src/main/python/apache/aurora/executor/common/announcer.py


log this?



src/test/python/apache/aurora/executor/common/test_announcer.py


where is this magic number coming from? extract a constant and import that 
here?


- Kevin Sweeney


On Sept. 25, 2014, 2:39 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> ---
> 
> (Updated Sept. 25, 2014, 2:39 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
> https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent 
> uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py 
> c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 
> 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> ---
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

2014-09-25 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/
---

(Updated Sept. 25, 2014, 2:39 p.m.)


Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.


Changes
---

Brian's Feedback


Summary (updated)
-

Prevent initial ZK timeouts from killing the executor.


Bugs: AURORA-728
https://issues.apache.org/jira/browse/AURORA-728


Repository: aurora


Description (updated)
---

Prevent initial ZK timeouts from killing the executor. In addition, prevent 
uncaught exceptions from killing the executor.


Diffs (updated)
-

  src/main/python/apache/aurora/executor/aurora_executor.py 
79a24855b2a68271b7478395dfdadab8755c3af2 
  src/main/python/apache/aurora/executor/common/announcer.py 
c466da8d48bbc2aa227c2db157cab84665ad6602 
  src/test/python/apache/aurora/executor/common/test_announcer.py 
4f6e200ecb1a4ea7cb45acd466a57f19d5815326 

Diff: https://reviews.apache.org/r/25974/diff/


Testing
---

./pants src/test/python/apache/aurora/executor:executor-small


Thanks,

Zameer Manji