Re: Review Request 49322: Failed task status during EU is wrongly reported as SKIPPED_FAILED instead of TIMED_OUT

2016-06-28 Thread Alejandro Fernandez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49322/#review139900
---


Ship it!

- Alejandro Fernandez


On June 28, 2016, 6:39 p.m., Jonathan Hurley wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49322/
> ---
> 
> (Updated June 28, 2016, 6:39 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Nate Cole, and Robert Levas.
> 
> 
> Bugs: AMBARI-17464
> https://issues.apache.org/jira/browse/AMBARI-17464
> 
> 
> Repository: ambari
> 
> 
> Description
> ---
> 
> # Deploy an HDP 2.4 cluster
> # Start an EU to 2.5.0.0 and wait for the EU to reach the backup Hive DB prompt
> # At this point, stop the Ambari agent on the "slave only" node (let's call 
> this host host1)
> # Proceed with the rest of the EU
> 
> - The task {{Stop Datanode}} (under the Stop Core Components for Core Services 
> upgrade group) shows as {{HOLDING_TIMEDOUT}} for host1
> - Hit 'Ignore and Proceed' to continue
> - The task Restart HDFS/DATANODE shows as {{SKIPPED_FAILED}} and the EU 
> continues to the next steps
> 
> Caused by AMBARI-15671: the code assumes that all tasks within a stage 
> follow the stage's auto-skip setting, which is wrong.
> 
> 
> Diffs
> ---
> 
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionScheduler.java 8c27d3c 
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/HostRoleCommand.java 2b9c10b 
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/Stage.java 3fbeef9 
>   ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionScheduler.java d2c7de9 
> 
> Diff: https://reviews.apache.org/r/49322/diff/
> 
> 
> Testing
> ---
> 
> Tests run: 4521, Failures: 0, Errors: 0, Skipped: 34
> 
> [INFO] BUILD SUCCESS
> [INFO] Total time: 41:08 min
> [INFO] Finished at: 2016-06-28T14:23:57-04:00
> [INFO] Final Memory: 37M/603M
> 
> 
> Thanks,
> 
> Jonathan Hurley
> 
>



Re: Review Request 49322: Failed task status during EU is wrongly reported as SKIPPED_FAILED instead of TIMED_OUT

2016-06-28 Thread Jonathan Hurley

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49322/
---

(Updated June 28, 2016, 2:39 p.m.)


Review request for Ambari, Alejandro Fernandez, Nate Cole, and Robert Levas.


Bugs: AMBARI-17464
https://issues.apache.org/jira/browse/AMBARI-17464


Repository: ambari


Description
---

# Deploy an HDP 2.4 cluster
# Start an EU to 2.5.0.0 and wait for the EU to reach the backup Hive DB prompt
# At this point, stop the Ambari agent on the "slave only" node (let's call 
this host host1)
# Proceed with the rest of the EU

- The task {{Stop Datanode}} (under the Stop Core Components for Core Services 
upgrade group) shows as {{HOLDING_TIMEDOUT}} for host1
- Hit 'Ignore and Proceed' to continue
- The task Restart HDFS/DATANODE shows as {{SKIPPED_FAILED}} and the EU 
continues to the next steps

Caused by AMBARI-15671: the code assumes that all tasks within a stage 
follow the stage's auto-skip setting, which is wrong.
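
The distinction described above can be sketched roughly as follows. This is only an illustration, not the actual patch: the classes `Stage` and `Task`, the field `autoSkipOnFailure`, both method names, and the reduced `Status` enum are hypothetical stand-ins invented for this example, and the real logic in ActionScheduler.java may differ.

```java
// Illustrative only: simplified stand-ins, not Ambari's real Stage or
// HostRoleCommand classes. All names here are assumptions for this sketch.
final class AutoSkipSketch {

    /** Reduced set of task statuses, just enough for this example. */
    enum Status { TIMEDOUT, SKIPPED_FAILED }

    /** Hypothetical stage carrying a stage-wide auto-skip flag. */
    static final class Stage {
        final boolean autoSkipOnFailure;

        Stage(boolean autoSkipOnFailure) {
            this.autoSkipOnFailure = autoSkipOnFailure;
        }
    }

    /** Hypothetical task carrying its own auto-skip flag. */
    static final class Task {
        final String name;
        final boolean autoSkipOnFailure;

        Task(String name, boolean autoSkipOnFailure) {
            this.name = name;
            this.autoSkipOnFailure = autoSkipOnFailure;
        }
    }

    /** The wrong assumption: every task inherits the stage's setting. */
    static Status statusOnTimeoutStageOnly(Stage stage, Task task) {
        return stage.autoSkipOnFailure ? Status.SKIPPED_FAILED : Status.TIMEDOUT;
    }

    /** The per-task check: skip only when the task itself allows it. */
    static Status statusOnTimeoutPerTask(Stage stage, Task task) {
        boolean skip = stage.autoSkipOnFailure && task.autoSkipOnFailure;
        return skip ? Status.SKIPPED_FAILED : Status.TIMEDOUT;
    }

    public static void main(String[] args) {
        Stage stage = new Stage(true);
        Task restart = new Task("Restart HDFS/DATANODE", false);
        System.out.println(statusOnTimeoutStageOnly(stage, restart));
        System.out.println(statusOnTimeoutPerTask(stage, restart));
    }
}
```

With a stage that allows auto-skip and a task that does not, the stage-only check reports the timed-out task as skipped, while the per-task check keeps it as timed out.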


Diffs
---

  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionScheduler.java 8c27d3c 
  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/HostRoleCommand.java 2b9c10b 
  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/Stage.java 3fbeef9 
  ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionScheduler.java d2c7de9 

Diff: https://reviews.apache.org/r/49322/diff/


Testing (updated)
---

Tests run: 4521, Failures: 0, Errors: 0, Skipped: 34

[INFO] BUILD SUCCESS
[INFO] Total time: 41:08 min
[INFO] Finished at: 2016-06-28T14:23:57-04:00
[INFO] Final Memory: 37M/603M


Thanks,

Jonathan Hurley



Re: Review Request 49322: Failed task status during EU is wrongly reported as SKIPPED_FAILED instead of TIMED_OUT

2016-06-28 Thread Robert Levas

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49322/#review139829
---


Ship it!

- Robert Levas


On June 28, 2016, 11:08 a.m., Jonathan Hurley wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49322/
> ---
> 
> (Updated June 28, 2016, 11:08 a.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Nate Cole, and Robert Levas.
> 
> 
> Bugs: AMBARI-17464
> https://issues.apache.org/jira/browse/AMBARI-17464
> 
> 
> Repository: ambari
> 
> 
> Description
> ---
> 
> # Deploy an HDP 2.4 cluster
> # Start an EU to 2.5.0.0 and wait for the EU to reach the backup Hive DB prompt
> # At this point, stop the Ambari agent on the "slave only" node (let's call 
> this host host1)
> # Proceed with the rest of the EU
> 
> - The task {{Stop Datanode}} (under the Stop Core Components for Core Services 
> upgrade group) shows as {{HOLDING_TIMEDOUT}} for host1
> - Hit 'Ignore and Proceed' to continue
> - The task Restart HDFS/DATANODE shows as {{SKIPPED_FAILED}} and the EU 
> continues to the next steps
> 
> Caused by AMBARI-15671: the code assumes that all tasks within a stage 
> follow the stage's auto-skip setting, which is wrong.
> 
> 
> Diffs
> ---
> 
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionScheduler.java 8c27d3c 
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/HostRoleCommand.java 2b9c10b 
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/Stage.java 3fbeef9 
>   ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionScheduler.java d2c7de9 
> 
> Diff: https://reviews.apache.org/r/49322/diff/
> 
> 
> Testing
> ---
> 
> mvn clean test
> 
> 
> Thanks,
> 
> Jonathan Hurley
> 
>



Re: Review Request 49322: Failed task status during EU is wrongly reported as SKIPPED_FAILED instead of TIMED_OUT

2016-06-28 Thread Jonathan Hurley


> On June 28, 2016, 11:43 a.m., Nate Cole wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionScheduler.java,
> >  lines 701-704
> > 
> >
> > Could also use db.getTask(long), but this implementation probably saves 
> > a potential DB hit on cache miss?

Yes, I wanted to ensure that we didn't hit the DB. Since these tasks are 
already retrieved and I just needed a simple boolean, it made sense to reuse 
what's in memory.
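
As a rough illustration of that trade-off, the sketch below reads the flag from tasks that are already in memory instead of issuing another lookup. It is not the code under review: the `Task` class, its `autoSkipOnFailure` field, and the helpers `indexById` and `isAutoSkip` are hypothetical names made up for this example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical names, not the actual ActionScheduler code.
final class InMemoryTaskLookupSketch {

    /** Hypothetical in-memory view of a task that was already retrieved. */
    static final class Task {
        final long taskId;
        final boolean autoSkipOnFailure;

        Task(long taskId, boolean autoSkipOnFailure) {
            this.taskId = taskId;
            this.autoSkipOnFailure = autoSkipOnFailure;
        }
    }

    /** Index the already-loaded tasks once so later lookups stay in memory. */
    static Map<Long, Task> indexById(List<Task> tasks) {
        Map<Long, Task> byId = new HashMap<>();
        for (Task task : tasks) {
            byId.put(task.taskId, task);
        }
        return byId;
    }

    /** Read the boolean from memory; an unknown task id counts as "do not skip". */
    static boolean isAutoSkip(Map<Long, Task> byId, long taskId) {
        Task task = byId.get(taskId);
        return task != null && task.autoSkipOnFailure;
    }
}
```

Treating an unknown id as "do not skip" is a design choice of this sketch; a real implementation might instead fall back to a lookup such as db.getTask(long).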


> On June 28, 2016, 11:43 a.m., Nate Cole wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionScheduler.java,
> >  lines 726-727
> > 
> >
> > Will this flood logs?  It's hard to get context on RB.

It shouldn't. It will only log at INFO when a task is rescheduled after 
timing out.
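
A minimal sketch of why this stays quiet, under assumptions: the log call sits on the reschedule path only, so a task that has not timed out, or one that has no attempts left, produces no output. The method `rescheduleIfTimedOut`, the `maxAttempts` parameter, and the counter are hypothetical, and java.util.logging is used here only to keep the example self-contained.

```java
import java.util.logging.Logger;

// Illustrative only: hypothetical names, not the actual ActionScheduler code.
final class RescheduleLogSketch {

    private static final Logger LOG =
        Logger.getLogger(RescheduleLogSketch.class.getName());

    /** Counts INFO lines emitted so the logging volume is easy to inspect. */
    static int infoLinesLogged = 0;

    /**
     * Returns true if the task is rescheduled. The INFO line is written only
     * on that path, so healthy or exhausted tasks log nothing here.
     */
    static boolean rescheduleIfTimedOut(String taskName, boolean timedOut,
                                        int attempt, int maxAttempts) {
        if (!timedOut || attempt >= maxAttempts) {
            return false;
        }
        LOG.info("Rescheduling timed-out task " + taskName
            + " (attempt " + (attempt + 1) + " of " + maxAttempts + ")");
        infoLinesLogged++;
        return true;
    }
}
```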


- Jonathan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49322/#review139807
---





Re: Review Request 49322: Failed task status during EU is wrongly reported as SKIPPED_FAILED instead of TIMED_OUT

2016-06-28 Thread Nate Cole

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49322/#review139807
---


Ship it!





ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionScheduler.java
 (lines 700 - 703)


Could also use db.getTask(long), but this implementation probably saves a 
potential DB hit on cache miss?



ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionScheduler.java
 (lines 725 - 726)


Will this flood logs?  It's hard to get context on RB.


- Nate Cole

