[jira] [Updated] (AURORA-1791) Commit ca683 is not backwards compatible.

2017-01-31 Thread Stephan Erb (JIRA)

 [ https://issues.apache.org/jira/browse/AURORA-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Erb updated AURORA-1791:

Fix Version/s: 0.17.0

> Commit ca683 is not backwards compatible.
> -
>
> Key: AURORA-1791
> URL: https://issues.apache.org/jira/browse/AURORA-1791
> Project: Aurora
> Issue Type: Bug
> Reporter: Zameer Manji
> Assignee: Kai Huang
>Priority: Blocker
> Fix For: 0.17.0
>
>
> The commit [ca683cb9e27bae76424a687bc6c3af5a73c501b9|https://github.com/apache/aurora/commit/ca683cb9e27bae76424a687bc6c3af5a73c501b9]
> is not backwards compatible. The last section of the commit message
> {quote}
> 4. Modified the Health Checker and redefined the meaning initial_interval_secs.
> {quote}
> has serious, unintended consequences.
> Consider the following health check config:
> {noformat}
>   initial_interval_secs: 10
>   interval_secs: 5
>   max_consecutive_failures: 1
> {noformat}
> On the 0.16.0 executor, no health checking occurs for the first 10 seconds,
> so the earliest a task can fail a health check is at the 10th second.
> On master, health checking starts right away, which means the task can fail
> within the first second since {{max_consecutive_failures}} is set to 1.
> This is not backwards compatible and needs to be fixed.
> I think a good solution would be to revert the change in meaning of
> {{initial_interval_secs}} and have the task transition into RUNNING when
> {{min_consecutive_successes}} is met.
> An investigation shows {{initial_interval_secs}} was set to 5 but the task 
> failed health checks right away:
> {noformat}
> D1011 19:52:13.295877 6 health_checker.py:107] Health checks enabled. Performing health check.
> D1011 19:52:13.306816 6 health_checker.py:126] Reset consecutive failures counter.
> D1011 19:52:13.307032 6 health_checker.py:132] Initial interval expired.
> W1011 19:52:13.307130 6 health_checker.py:135] Failed to reach minimum consecutive successes.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AURORA-1791) Commit ca683 is not backwards compatible.

2016-10-11 Thread Zameer Manji (JIRA)

 [ https://issues.apache.org/jira/browse/AURORA-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zameer Manji updated AURORA-1791:
-
Description: 
The commit [ca683cb9e27bae76424a687bc6c3af5a73c501b9|https://github.com/apache/aurora/commit/ca683cb9e27bae76424a687bc6c3af5a73c501b9]
is not backwards compatible. The last section of the commit message

{quote}
4. Modified the Health Checker and redefined the meaning initial_interval_secs.
{quote}

has serious, unintended consequences.

Consider the following health check config:
{noformat}
  initial_interval_secs: 10
  interval_secs: 5
  max_consecutive_failures: 1
{noformat}

On the 0.16.0 executor, no health checking occurs for the first 10 seconds,
so the earliest a task can fail a health check is at the 10th second.

On master, health checking starts right away, which means the task can fail
within the first second since {{max_consecutive_failures}} is set to 1.
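
To make the difference concrete, here is a minimal sketch of when checks would fire under each reading of {{initial_interval_secs}}, following the description above. It is an illustration only, not the executor's code: the function {{check_times}} and its {{wait_initial_interval}} flag are hypothetical, check duration is ignored, and the failure threshold itself is not modeled.

```python
def check_times(initial_interval_secs, interval_secs, count, wait_initial_interval):
    """Return the start times (seconds after task launch) of the first `count` checks.

    wait_initial_interval=True  models the 0.16.0 behavior described above:
                                no checks until initial_interval_secs has elapsed.
    wait_initial_interval=False models the behavior on master described above:
                                checks begin immediately.
    Hypothetical helper for illustration; check duration is ignored.
    """
    first = initial_interval_secs if wait_initial_interval else 0
    return [first + i * interval_secs for i in range(count)]


# Values taken from the example config: initial_interval_secs=10, interval_secs=5.
old_schedule = check_times(10, 5, count=3, wait_initial_interval=True)
new_schedule = check_times(10, 5, count=3, wait_initial_interval=False)
print("0.16.0:", old_schedule)
print("master:", new_schedule)
```

Under this model, the first check that can fail moves from the 10th second to task start.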

This is not backwards compatible and needs to be fixed.

I think a good solution would be to revert the change in meaning of
{{initial_interval_secs}} and have the task transition into RUNNING when
{{min_consecutive_successes}} is met.
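
A rough sketch of the second half of that proposal, assuming the task moves to RUNNING as soon as a required number of consecutive successful checks has been seen. The function {{first_running_check}} and the parameter {{required_successes}} are hypothetical names, and failure handling is deliberately left out.

```python
def first_running_check(results, required_successes):
    """Return the 1-based index of the check at which the task would become
    RUNNING, or None if the success streak is never long enough.

    `results` is a sequence of booleans, one per health check, in order.
    Hypothetical helper; it does not model failure thresholds or timing.
    """
    streak = 0
    for index, passed in enumerate(results, start=1):
        # A failed check resets the streak; a passed check extends it.
        streak = streak + 1 if passed else 0
        if streak >= required_successes:
            return index
    return None


# A task that fails its first check, then passes twice in a row.
print(first_running_check([False, True, True], required_successes=2))
```

In this model an early failed check only delays the transition to RUNNING; whether it should also count toward {{max_consecutive_failures}} is a separate decision that the sketch does not take.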

An investigation shows {{initial_interval_secs}} was set to 5 but the task 
failed health checks right away:

{noformat}
D1011 19:52:13.295877 6 health_checker.py:107] Health checks enabled. Performing health check.
D1011 19:52:13.306816 6 health_checker.py:126] Reset consecutive failures counter.
D1011 19:52:13.307032 6 health_checker.py:132] Initial interval expired.
W1011 19:52:13.307130 6 health_checker.py:135] Failed to reach minimum consecutive successes.
{noformat}
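
The timestamps in that excerpt show how quickly the failure was reported. The snippet below only does the arithmetic on the first and last log lines; the helper {{elapsed_ms}} is a hypothetical name, and the parsing assumes both timestamps fall on the same day.

```python
from datetime import datetime


def elapsed_ms(start, end):
    """Milliseconds between two HH:MM:SS.ffffff timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() * 1000


# "Health checks enabled" to "Failed to reach minimum consecutive successes".
gap = elapsed_ms("19:52:13.295877", "19:52:13.307130")
print(f"{gap:.3f} ms")
```

That is roughly 11 ms between the first health check and the failure, far short of the configured 5 seconds.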


  was:
The commit [ca683cb9e27bae76424a687bc6c3af5a73c501b9|https://github.com/apache/aurora/commit/ca683cb9e27bae76424a687bc6c3af5a73c501b9]
is not backwards compatible. The last section of the commit message

{quote}
4. Modified the Health Checker and redefined the meaning initial_interval_secs.
{quote}

has serious, unintended consequences.

Consider the following health check config:
{noformat}
  initial_interval_secs: 10
  interval_secs: 5
  max_consecutive_failures: 1
{noformat}

On the 0.16.0 executor, no health checking occurs for the first 10 seconds,
so the earliest a task can fail a health check is at the 10th second.

On master, health checking starts right away, which means the task can fail
within the first second since {{max_consecutive_failures}} is set to 1.

This is not backwards compatible and needs to be fixed.

I think a good solution would be to revert the change in meaning of
{{initial_interval_secs}} and have the task transition into RUNNING when
{{min_consecutive_successes}} is met.



> Commit ca683 is not backwards compatible.
> -
>
> Key: AURORA-1791
> URL: https://issues.apache.org/jira/browse/AURORA-1791
> Project: Aurora
> Issue Type: Bug
> Reporter: Zameer Manji
> Assignee: Kai Huang
>Priority: Blocker


