-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50865/
-----------------------------------------------------------

Review request for Ambari, Alejandro Fernandez, Robert Levas, and Robert 
Nettleton.


Bugs: AMBARI-18052
    https://issues.apache.org/jira/browse/AMBARI-18052


Repository: ambari


Description
-------

The root cause of this seems to be how an upgrade is paused/resumed. The 
{{UpgradeResourceProvider}} loads the entire request into memory to iterate 
over it. In this case, the request contains about 11,000 
{{HostRoleCommandEntity}} instances, each between 2 and 3MB. That means we're 
trying to load between 22 and 33GB of data into memory.
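
As a rough check on that figure, the arithmetic can be sketched as below. This 
is only an estimate: it assumes 1MB = 10^6 bytes and counts nothing but the 
entity payloads. The class and method names are made up for illustration and 
are not part of Ambari.

```java
public class FootprintEstimate {

    /** Total payload size in bytes; throws ArithmeticException on overflow. */
    static long estimateBytes(long entityCount, long bytesPerEntity) {
        return Math.multiplyExact(entityCount, bytesPerEntity);
    }

    public static void main(String[] args) {
        final long MB = 1_000_000L;      // assumption: decimal megabytes
        final long entityCount = 11_000; // approximate count from the description

        long low = estimateBytes(entityCount, 2 * MB);  // every entity at 2MB
        long high = estimateBytes(entityCount, 3 * MB); // every entity at 3MB

        System.out.printf("low=%d bytes, high=%d bytes%n", low, high);
    }
}
```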

This causes threads to die slowly, including scheduler threads, until the JVM 
can recover and start scheduling things again. 

The real question is _why_ each HRCEntity is so large. In many cases, the 
output includes information from HDFS, such as the state of SafeMode. These 
messages include the entire state of the system, which is captured to 
stdout.

The workaround here is to load only the necessary stages/tasks into memory, 
thereby greatly reducing the footprint.
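
To illustrate the general idea (not the actual change in this diff), here is a 
minimal, self-contained sketch of selecting only the tasks that matter and 
carrying forward just their IDs, so the large stdout payloads are never copied 
into the working set. {{TaskRow}} and {{findTaskIdsByStatus}} are hypothetical 
names and an in-memory list stands in for the database; none of this is 
Ambari's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SelectiveLoadSketch {

    /** Hypothetical stand-in for a persisted task; stdout is the large field. */
    static final class TaskRow {
        final long taskId;
        final String status;
        final String stdout;

        TaskRow(long taskId, String status, String stdout) {
            this.taskId = taskId;
            this.status = status;
            this.stdout = stdout;
        }
    }

    /**
     * Returns the IDs of tasks whose status is in the wanted list.
     * Only the small taskId values are collected; stdout is never read.
     */
    static List<Long> findTaskIdsByStatus(List<TaskRow> table, List<String> wanted) {
        List<Long> ids = new ArrayList<>();
        for (TaskRow row : table) {
            if (wanted.contains(row.status)) {
                ids.add(row.taskId);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<TaskRow> table = Arrays.asList(
            new TaskRow(1L, "COMPLETED", "...large output..."),
            new TaskRow(2L, "PENDING", "...large output..."),
            new TaskRow(3L, "HOLDING", "...large output..."));

        // Only unfinished tasks are relevant when pausing or resuming.
        List<Long> ids = findTaskIdsByStatus(table, Arrays.asList("PENDING", "HOLDING"));
        System.out.println(ids);
    }
}
```

In a real persistence layer the same effect would come from a query that 
selects only the needed columns or rows, rather than filtering after a full 
load.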


Diffs
-----

  
  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessor.java dcfe359 
  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java b44dc78 
  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionScheduler.java cdef06e 
  ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeResourceProvider.java 255cbbb 
  ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java 541b2e9 
  ambari-server/src/main/java/org/apache/ambari/server/orm/entities/StageEntity.java 12ab568 

Diff: https://reviews.apache.org/r/50865/diff/


Testing
-------

PENDING


Thanks,

Jonathan Hurley
