-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44926/#review124117
-----------------------------------------------------------




ambari-server/src/main/java/org/apache/ambari/server/actionmanager/HostRoleCommand.java
 (line 61)
<https://reviews.apache.org/r/44926/#comment186496>

    Today, startTime is allowed to be changed to -1 after it already had value. 
For this reason, I added a column called originalStartTime that can only be set 
once.



ambari-server/src/main/java/org/apache/ambari/server/agent/RetryActionMonitor.java
 (line 51)
<https://reviews.apache.org/r/44926/#comment186497>

    Switched to guava service :-)



ambari-server/src/main/java/org/apache/ambari/server/configuration/Configuration.java
 (line 1123)
<https://reviews.apache.org/r/44926/#comment186498>

    Feature will be turned off by default.



ambari-server/src/main/java/org/apache/ambari/server/upgrade/UpgradeCatalog240.java
 (line 643)
<https://reviews.apache.org/r/44926/#comment186499>

    Added column on ambari upgrade.


- Alejandro Fernandez


On March 17, 2016, 11:07 p.m., Alejandro Fernandez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44926/
> -----------------------------------------------------------
> 
> (Updated March 17, 2016, 11:07 p.m.)
> 
> 
> Review request for Ambari, Jonathan Hurley and Nate Cole.
> 
> 
> Bugs: AMBARI-15446
>     https://issues.apache.org/jira/browse/AMBARI-15446
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> When a failure occurs during RU/EU and the task transitions to HOLDING_FAILED 
> or HOLDING_TIMEDOUT, want Ambari to automatically retry up to up to x mins. 
> This is useful when a host goes down as Ambari is running a task on it.
> ambari.properties will have 1 new parameter. E.g,. 
> stack-upgrade.max_retry_timeout_mins=15 (by default, will not be present)
> If Ambari Server is restarted, it should be able to recover.
> Today, Action Scheduler increases the attempt_count whenever a task is 
> retried, but it requires resetting the start_time to -1. Because of this, we 
> cannot rely on the start_time property to know when to timeout after several 
> retries.
> 
> For the implementation, will add another thread to Ambari that will monitor 
> failed tasks only during active RU/EU and change the status back to PENDING 
> so that Action Scheduler can reschedule it.
> Luckily, any tasks in HOLDING_TIMEDOUT and HOLDING_FAILED states are 
> blocking, so no other stages are allowed to proceed.
> In order to know when a task was first started, will add a new property to 
> host_role_command table called original_start_time.
> 
> For the agents, we need to ensure that they always write out a response. On 
> the first heartbeat, it should send the status of its last command so we know 
> it failed and Ambari can retry.
> 
> 
> Diffs
> -----
> 
>   
> ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java
>  429f573 
>   
> ambari-server/src/main/java/org/apache/ambari/server/actionmanager/HostRoleCommand.java
>  2764b3f 
>   
> ambari-server/src/main/java/org/apache/ambari/server/agent/HeartBeatHandler.java
>  3a80803 
>   
> ambari-server/src/main/java/org/apache/ambari/server/agent/HeartbeatProcessor.java
>  a1a686a 
>   
> ambari-server/src/main/java/org/apache/ambari/server/agent/RetryActionMonitor.java
>  PRE-CREATION 
>   
> ambari-server/src/main/java/org/apache/ambari/server/configuration/Configuration.java
>  9404506 
>   
> ambari-server/src/main/java/org/apache/ambari/server/orm/dao/HostRoleCommandDAO.java
>  f5b1cb4 
>   
> ambari-server/src/main/java/org/apache/ambari/server/orm/entities/HostRoleCommandEntity.java
>  19f0602 
>   
> ambari-server/src/main/java/org/apache/ambari/server/topology/HostRequest.java
>  9eb514a 
>   
> ambari-server/src/main/java/org/apache/ambari/server/topology/LogicalRequest.java
>  82edbcf 
>   
> ambari-server/src/main/java/org/apache/ambari/server/upgrade/UpgradeCatalog240.java
>  a803f73 
>   ambari-server/src/main/resources/Ambari-DDL-MySQL-CREATE.sql 9b4810c 
>   ambari-server/src/main/resources/Ambari-DDL-Oracle-CREATE.sql cc3d197 
>   ambari-server/src/main/resources/Ambari-DDL-Postgres-CREATE.sql 07c786d 
>   ambari-server/src/main/resources/Ambari-DDL-Postgres-EMBEDDED-CREATE.sql 
> ab6dc93 
>   ambari-server/src/main/resources/Ambari-DDL-SQLAnywhere-CREATE.sql 8e91fde 
>   ambari-server/src/main/resources/Ambari-DDL-SQLServer-CREATE.sql 440ca44 
> 
> Diff: https://reviews.apache.org/r/44926/diff/
> 
> 
> Testing
> -------
> 
> Verified on a live cluster.
> 
> TODO: Still need to make more changes to the implementation, add the config, 
> switch to gauva service, add a column, and add unit tests.
> 
> 
> Thanks,
> 
> Alejandro Fernandez
> 
>

Reply via email to