-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44926/#review124173
-----------------------------------------------------------


Fix it, then Ship it!




I like it!  Just a couple of nits and a comment.


ambari-server/src/main/java/org/apache/ambari/server/agent/RetryActionMonitor.java
 (lines 164 - 165)
<https://reviews.apache.org/r/44926/#comment186608>

    Use {} syntax for logging



ambari-server/src/main/java/org/apache/ambari/server/agent/RetryActionMonitor.java
 (lines 211 - 216)
<https://reviews.apache.org/r/44926/#comment186609>

    Not a huge fan of string comparisons here.  Can we ignore server side 
actions (no host) instead or something?  We should probably do that anyway, and 
if we have command detail OR command names that specifically need it, let's add 
an ambari.properties CSV for it.



ambari-server/src/main/java/org/apache/ambari/server/configuration/Configuration.java
 (line 1123)
<https://reviews.apache.org/r/44926/#comment186610>

    NumberUtils.toInt() can be your friend :)


- Nate Cole


On March 17, 2016, 7:07 p.m., Alejandro Fernandez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44926/
> -----------------------------------------------------------
> 
> (Updated March 17, 2016, 7:07 p.m.)
> 
> 
> Review request for Ambari, Jonathan Hurley and Nate Cole.
> 
> 
> Bugs: AMBARI-15446
>     https://issues.apache.org/jira/browse/AMBARI-15446
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> When a failure occurs during RU/EU and the task transitions to HOLDING_FAILED 
> or HOLDING_TIMEDOUT, want Ambari to automatically retry up to up to x mins. 
> This is useful when a host goes down as Ambari is running a task on it.
> ambari.properties will have 1 new parameter. E.g,. 
> stack-upgrade.max_retry_timeout_mins=15 (by default, will not be present)
> If Ambari Server is restarted, it should be able to recover.
> Today, Action Scheduler increases the attempt_count whenever a task is 
> retried, but it requires resetting the start_time to -1. Because of this, we 
> cannot rely on the start_time property to know when to timeout after several 
> retries.
> 
> For the implementation, will add another thread to Ambari that will monitor 
> failed tasks only during active RU/EU and change the status back to PENDING 
> so that Action Scheduler can reschedule it.
> Luckily, any tasks in HOLDING_TIMEDOUT and HOLDING_FAILED states are 
> blocking, so no other stages are allowed to proceed.
> In order to know when a task was first started, will add a new property to 
> host_role_command table called original_start_time.
> 
> For the agents, we need to ensure that they always write out a response. On 
> the first heartbeat, it should send the status of its last command so we know 
> it failed and Ambari can retry.
> 
> 
> Diffs
> -----
> 
>   
> ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java
>  429f573 
>   
> ambari-server/src/main/java/org/apache/ambari/server/actionmanager/HostRoleCommand.java
>  2764b3f 
>   
> ambari-server/src/main/java/org/apache/ambari/server/agent/HeartBeatHandler.java
>  3a80803 
>   
> ambari-server/src/main/java/org/apache/ambari/server/agent/HeartbeatProcessor.java
>  a1a686a 
>   
> ambari-server/src/main/java/org/apache/ambari/server/agent/RetryActionMonitor.java
>  PRE-CREATION 
>   
> ambari-server/src/main/java/org/apache/ambari/server/configuration/Configuration.java
>  9404506 
>   
> ambari-server/src/main/java/org/apache/ambari/server/orm/dao/HostRoleCommandDAO.java
>  f5b1cb4 
>   
> ambari-server/src/main/java/org/apache/ambari/server/orm/entities/HostRoleCommandEntity.java
>  19f0602 
>   
> ambari-server/src/main/java/org/apache/ambari/server/topology/HostRequest.java
>  9eb514a 
>   
> ambari-server/src/main/java/org/apache/ambari/server/topology/LogicalRequest.java
>  82edbcf 
>   
> ambari-server/src/main/java/org/apache/ambari/server/upgrade/UpgradeCatalog240.java
>  a803f73 
>   ambari-server/src/main/resources/Ambari-DDL-MySQL-CREATE.sql 9b4810c 
>   ambari-server/src/main/resources/Ambari-DDL-Oracle-CREATE.sql cc3d197 
>   ambari-server/src/main/resources/Ambari-DDL-Postgres-CREATE.sql 07c786d 
>   ambari-server/src/main/resources/Ambari-DDL-Postgres-EMBEDDED-CREATE.sql 
> ab6dc93 
>   ambari-server/src/main/resources/Ambari-DDL-SQLAnywhere-CREATE.sql 8e91fde 
>   ambari-server/src/main/resources/Ambari-DDL-SQLServer-CREATE.sql 440ca44 
> 
> Diff: https://reviews.apache.org/r/44926/diff/
> 
> 
> Testing
> -------
> 
> Verified on a live cluster.
> 
> TODO: Still need to make more changes to the implementation, add the config, 
> switch to gauva service, add a column, and add unit tests.
> 
> 
> Thanks,
> 
> Alejandro Fernandez
> 
>

Reply via email to