Re: Health Checks for Updates design review

2015-07-03 Thread Brian Brazil
On 2 July 2015 at 19:16, Maxim Khutornenko ma...@apache.org wrote:

 Hi Brian,

 This feature is on a back burner for now. It’s unlikely that we'll
 make any progress within the next few months.

 The design is mostly hashed out at this point, so if you feel you
 could contribute or even better take it over completely it would be
 absolutely awesome!


Thanks for the update. Unfortunately, that whole feature is a bit too big for
me to take on anytime soon.

Yours,
Brian



 Thanks,
 Maxim

 On Thu, Jul 2, 2015 at 9:27 AM, Brian Brazil brian.bra...@boxever.com
 wrote:
  On 5 May 2015 at 21:24, Maxim Khutornenko ma...@apache.org wrote:
 
  Hi,
 
  I have put together a design proposal for improving health-enabled job
  update performance. Please, review and leave your comments:
 
 
 
 https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit
 
 
  Hi,
 
  I'm looking to move some of our production jobs to Aurora, and this design
  is of interest as I want to allow slack in startup and for a few failures
  without greatly increasing update times.
  Is there a rough ETA for this, or maybe some smaller tasks I could help
  with? 1224 sounds easy for example, and I've already worked with that code.
 
  Thanks,
  Brian
 
 
 
 
  Thanks,
  Maxim
 



Re: Health Checks for Updates design review

2015-07-02 Thread Maxim Khutornenko
Hi Brian,

This feature is on the back burner for now. It’s unlikely that we'll
make any progress within the next few months.

The design is mostly hashed out at this point, so if you feel you
could contribute or, even better, take it over completely, it would be
absolutely awesome!

Thanks,
Maxim

On Thu, Jul 2, 2015 at 9:27 AM, Brian Brazil brian.bra...@boxever.com wrote:
 On 5 May 2015 at 21:24, Maxim Khutornenko ma...@apache.org wrote:

 Hi,

 I have put together a design proposal for improving health-enabled job
 update performance. Please, review and leave your comments:


 https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit


 Hi,

 I'm looking to move some of our production jobs to Aurora, and this design
 is of interest as I want to allow slack in startup and for a few failures
 without greatly increasing update times.
 Is there a rough ETA for this, or maybe some smaller tasks I could help
 with? 1224 sounds easy for example, and I've already worked with that code.

 Thanks,
 Brian




 Thanks,
 Maxim



Re: Health Checks for Updates design review

2015-07-02 Thread Brian Brazil
On 5 May 2015 at 21:24, Maxim Khutornenko ma...@apache.org wrote:

 Hi,

 I have put together a design proposal for improving health-enabled job
 update performance. Please, review and leave your comments:


 https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit


Hi,

I'm looking to move some of our production jobs to Aurora, and this design
is of interest as I want to allow slack in startup and for a few failures
without greatly increasing update times.
Is there a rough ETA for this, or maybe some smaller tasks I could help
with? 1224 sounds easy for example, and I've already worked with that code.

Thanks,
Brian




 Thanks,
 Maxim



Re: Health Checks for Updates design review

2015-05-06 Thread Erb, Stephan
Hi Maxim,

I am not keen on the potential risk of tasks getting stuck in STARTING. We 
perform auto-scaling of jobs, so there might be nobody around to notice and 
correct the problem in time.

How about keeping initial_interval_secs and just changing its meaning to a
grace period, so that health checks are triggered but errors are ignored during
this interval?

initial_interval_secs is then a user-configurable upper bound on when a job
is expected to be working. It can even be set rather high, because it won't
affect update performance.
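
To make the grace-period idea concrete, here is a minimal sketch of the semantics I have in mind. It is not Aurora code: HealthCheckPolicy, evaluate and the "HEALTHY"/"FAILED" labels are hypothetical names for illustration, and max_consecutive_failures is assumed to work as a simple failure budget once the grace period is over.

```python
from dataclasses import dataclass


@dataclass
class HealthCheckPolicy:
    # Hypothetical container; only initial_interval_secs comes from this thread.
    initial_interval_secs: float = 15.0  # reinterpreted here as a grace period
    max_consecutive_failures: int = 0    # assumed failure budget after the grace period


def evaluate(policy, results):
    """Classify a task from time-ordered (elapsed_secs, passed) check results.

    Returns "FAILED", "HEALTHY" or "STARTING" (labels are illustrative).
    """
    consecutive_failures = 0
    healthy = False
    for elapsed_secs, passed in results:
        if passed:
            # Any passing check resets the failure count and marks the task
            # healthy, no matter how long the grace period is.
            consecutive_failures = 0
            healthy = True
            continue
        if elapsed_secs < policy.initial_interval_secs:
            # The check still ran, but its failure is ignored inside the
            # grace period.
            continue
        consecutive_failures += 1
        if consecutive_failures > policy.max_consecutive_failures:
            return "FAILED"
    return "HEALTHY" if healthy else "STARTING"


policy = HealthCheckPolicy(initial_interval_secs=30, max_consecutive_failures=0)
print(evaluate(policy, [(5, False), (15, False), (25, True)]))
print(evaluate(policy, [(5, False), (35, False)]))
```

In this sketch a task counts as healthy on its first passing check, so a large initial_interval_secs does not slow an update down. A task that never passes is failed by the first counted failure after the grace period, so it cannot sit in STARTING indefinitely.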

What do you think?

Best Regards,
Stephan

From: Maxim Khutornenko ma...@apache.org
Sent: Tuesday, May 5, 2015 10:24 PM
To: dev@aurora.apache.org
Subject: Health Checks for Updates design review

Hi,

I have put together a design proposal for improving health-enabled job
update performance. Please, review and leave your comments:

https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit

Thanks,
Maxim

Re: Health Checks for Updates design review

2015-05-06 Thread Maxim Khutornenko
Thanks for your comment, Stephan. I have moved it into the doc to keep
discussion history in one place.

On Wed, May 6, 2015 at 1:33 AM, Erb, Stephan
stephan@blue-yonder.com wrote:
 Hi Maxim,

 I am not keen on the potential risk of tasks getting stuck in STARTING. We 
 perform auto-scaling of jobs, so there might be nobody around to notice and 
 correct the problem in time.

 How about keeping the initial_interval_secs and just change its meaning to be 
 grace period, so that health checks are triggered but errors ignored during 
 this interval.

 The initial_interval_secs is then a user-configurable upper bound of when a 
 job is meant to be working. It can even be set rather high, because it won't 
 affect the update performance.

 What do you think?

 Best Regards,
 Stephan
 
 From: Maxim Khutornenko ma...@apache.org
 Sent: Tuesday, May 5, 2015 10:24 PM
 To: dev@aurora.apache.org
 Subject: Health Checks for Updates design review

 Hi,

 I have put together a design proposal for improving health-enabled job
 update performance. Please, review and leave your comments:

 https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit

 Thanks,
 Maxim


Health Checks for Updates design review

2015-05-05 Thread Maxim Khutornenko
Hi,

I have put together a design proposal for improving health-enabled job
update performance. Please review and leave your comments:

https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit

Thanks,
Maxim