Re: Review Request 50865: Starting a Component After Pausing An Upgrade Can Take 9 Minutes

2016-08-06 Thread Jonathan Hurley

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50865/
---

(Updated Aug. 6, 2016, 1:42 p.m.)


Review request for Ambari, Alejandro Fernandez, Robert Levas, and Robert 
Nettleton.


Bugs: AMBARI-18052
https://issues.apache.org/jira/browse/AMBARI-18052


Repository: ambari


Description
---

The root cause appears to be how an upgrade is paused/resumed. The 
{{UpgradeResourceProvider}} loads the entire request into memory to iterate over 
it. In this case, the request contains about 11,000 {{HostRoleCommandEntity}} 
instances, each between 2 and 3MB. That means we're trying to load between 22 
and 33GB of data into memory.
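
As a quick check of the arithmetic, 11,000 entities at 2 to 3MB each works out 
to roughly 22 to 33GB. The sketch below only reproduces that estimate; the 
class and method names are hypothetical and are not part of the patch, and it 
assumes 1GB = 1000MB.

```java
// Back-of-the-envelope estimate of the heap needed to load a whole request.
// The entity count and per-entity sizes are the figures quoted above;
// the class and method names are made up for this illustration.
final class FootprintEstimate {

    /** Returns the approximate footprint in GB, assuming 1 GB = 1000 MB. */
    static double estimateGb(long entityCount, double megabytesPerEntity) {
        return entityCount * megabytesPerEntity / 1000.0;
    }

    public static void main(String[] args) {
        long commands = 11_000;
        // Lower and upper bounds for the 2-3MB range.
        System.out.println("low (GB):  " + estimateGb(commands, 2.0));
        System.out.println("high (GB): " + estimateGb(commands, 3.0));
    }
}
```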

This causes threads to die slowly, including scheduler threads, until the JVM 
can recover and start scheduling things again. 

The real question is _why_ each HRCEntity is so large. In many cases, the 
output includes information from HDFS, such as the state of SafeMode. These 
messages include the entire state of the system, which is captured to stdout.

The workaround here is to load only the necessary stages/tasks into memory, 
thereby greatly reducing the footprint.
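
To illustrate the idea (not the actual patch), here is a minimal sketch that 
selects stages by status from lightweight summaries, so that only the matching 
stages would need their tasks loaded. Everything in it is an assumption made 
for the example: the Status values, StageSummary, and stageIdsWithStatus are 
hypothetical names and do not reflect Ambari's real classes or DAO methods.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Illustrative only: an in-memory stand-in for a status-filtered query.
final class SelectiveStageLoad {

    // Hypothetical status values for this sketch; not Ambari's actual enum.
    enum Status { PENDING, IN_PROGRESS, COMPLETED, ABORTED }

    // Lightweight summary of a stage: no stdout/stderr payload attached.
    static final class StageSummary {
        final long stageId;
        final Status status;

        StageSummary(long stageId, Status status) {
            this.stageId = stageId;
            this.status = status;
        }
    }

    // Returns the ids of stages whose status is in the wanted set, in input order.
    // Only these stages would then have their (large) tasks loaded.
    static List<Long> stageIdsWithStatus(List<StageSummary> stages, Set<Status> wanted) {
        List<Long> ids = new ArrayList<>();
        for (StageSummary stage : stages) {
            if (wanted.contains(stage.status)) {
                ids.add(stage.stageId);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<StageSummary> stages = new ArrayList<>();
        stages.add(new StageSummary(1, Status.COMPLETED));
        stages.add(new StageSummary(2, Status.ABORTED));
        stages.add(new StageSummary(3, Status.PENDING));

        // Completed stages are skipped, so their output is never touched.
        System.out.println(stageIdsWithStatus(stages, EnumSet.of(Status.ABORTED, Status.PENDING)));
    }
}
```

In a real persistence layer the same filter would presumably be pushed down 
into the query rather than applied in memory; the in-memory loop here just 
keeps the example self-contained.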


Diffs (updated)
---

  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessor.java dcfe359
  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java b44dc78
  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionScheduler.java cdef06e
  ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeResourceProvider.java 255cbbb
  ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java 541b2e9
  ambari-server/src/main/java/org/apache/ambari/server/orm/entities/StageEntity.java 12ab568
  ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionScheduler.java 22ac60f

Diff: https://reviews.apache.org/r/50865/diff/


Testing (updated)
---

Added unit tests.


Thanks,

Jonathan Hurley



Re: Review Request 50865: Starting a Component After Pausing An Upgrade Can Take 9 Minutes

2016-08-05 Thread Robert Levas

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50865/#review144995
---


Ship it!

- Robert Levas


On Aug. 5, 2016, 4:37 p.m., Jonathan Hurley wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50865/
> ---
> 
> (Updated Aug. 5, 2016, 4:37 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Robert Levas, and Robert 
> Nettleton.
> 
> 
> Bugs: AMBARI-18052
> https://issues.apache.org/jira/browse/AMBARI-18052
> 
> 
> Repository: ambari
> 
> 
> Description
> ---
> 
> The root cause appears to be how an upgrade is paused/resumed. The 
> {{UpgradeResourceProvider}} loads the entire request into memory to iterate 
> over it. In this case, the request contains about 11,000 
> {{HostRoleCommandEntity}} instances, each between 2 and 3MB. That means we're 
> trying to load between 22 and 33GB of data into memory.
> 
> This causes threads to die slowly, including scheduler threads, until the JVM 
> can recover and start scheduling things again. 
> 
> The real question is _why_ each HRCEntity is so large. In many cases, the 
> output includes information from HDFS, such as the state of SafeMode. These 
> messages include the entire state of the system, which is captured to stdout.
> 
> The workaround here is to load only the necessary stages/tasks into memory, 
> thereby greatly reducing the footprint.
> 
> 
> Diffs
> ---
> 
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessor.java dcfe359
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java b44dc78
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionScheduler.java cdef06e
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeResourceProvider.java 255cbbb
>   ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java 541b2e9
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/StageEntity.java 12ab568
> 
> Diff: https://reviews.apache.org/r/50865/diff/
> 
> 
> Testing
> ---
> 
> PENDING
> 
> 
> Thanks,
> 
> Jonathan Hurley
> 
>



Re: Review Request 50865: Starting a Component After Pausing An Upgrade Can Take 9 Minutes

2016-08-05 Thread Robert Nettleton

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50865/#review144980
---


Ship it!

- Robert Nettleton


On Aug. 5, 2016, 8:37 p.m., Jonathan Hurley wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50865/
> ---
> 
> (Updated Aug. 5, 2016, 8:37 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Robert Levas, and Robert 
> Nettleton.
> 
> 
> Bugs: AMBARI-18052
> https://issues.apache.org/jira/browse/AMBARI-18052
> 
> 
> Repository: ambari
> 
> 
> Description
> ---
> 
> The root cause appears to be how an upgrade is paused/resumed. The 
> {{UpgradeResourceProvider}} loads the entire request into memory to iterate 
> over it. In this case, the request contains about 11,000 
> {{HostRoleCommandEntity}} instances, each between 2 and 3MB. That means we're 
> trying to load between 22 and 33GB of data into memory.
> 
> This causes threads to die slowly, including scheduler threads, until the JVM 
> can recover and start scheduling things again. 
> 
> The real question is _why_ each HRCEntity is so large. In many cases, the 
> output includes information from HDFS, such as the state of SafeMode. These 
> messages include the entire state of the system, which is captured to stdout.
> 
> The workaround here is to load only the necessary stages/tasks into memory, 
> thereby greatly reducing the footprint.
> 
> 
> Diffs
> ---
> 
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessor.java dcfe359
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java b44dc78
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionScheduler.java cdef06e
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeResourceProvider.java 255cbbb
>   ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java 541b2e9
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/StageEntity.java 12ab568
> 
> Diff: https://reviews.apache.org/r/50865/diff/
> 
> 
> Testing
> ---
> 
> PENDING
> 
> 
> Thanks,
> 
> Jonathan Hurley
> 
>