Re: Review Request 50865: Starting a Component After Pausing An Upgrade Can Take 9 Minutes
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50865/
---

(Updated Aug. 6, 2016, 1:42 p.m.)

Review request for Ambari, Alejandro Fernandez, Robert Levas, and Robert Nettleton.

Bugs: AMBARI-18052
    https://issues.apache.org/jira/browse/AMBARI-18052

Repository: ambari

Description
---

The root cause of this seems to be how an upgrade is paused/resumed. The {{UpgradeResourceProvider}} loads the entire request into memory to iterate over it. In this case, the request contains about 11,000 {{HostRoleCommandEntity}} instances, each between 2 and 3MB. That means we're trying to load between 22 and 33GB of data into memory.

This causes threads to die slowly, including scheduler threads, until the JVM can recover and start scheduling things again.

The real question is _why_ each HRCEntity is so large. In many cases, the output includes information from HDFS, such as the state of SafeMode. These messages include the entire state of the system, which is being captured to stdout.

The workaround here is to load only the necessary stages/tasks into memory, thereby reducing the footprint greatly.

Diffs (updated)
---

  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessor.java dcfe359
  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java b44dc78
  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionScheduler.java cdef06e
  ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeResourceProvider.java 255cbbb
  ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java 541b2e9
  ambari-server/src/main/java/org/apache/ambari/server/orm/entities/StageEntity.java 12ab568
  ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionScheduler.java 22ac60f

Diff: https://reviews.apache.org/r/50865/diff/

Testing (updated)
---

Added unit tests.

Thanks,

Jonathan Hurley
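
For readers following the thread, here is a minimal sketch of the arithmetic in the description and of the general idea of loading only what is needed. It is not the code in the diff. The class, enum, and method names are hypothetical, and "necessary" is assumed here to mean "not yet completed", which the description does not spell out.

```java
import java.util.ArrayList;
import java.util.List;

class UpgradeLoadSketch {

    // Hypothetical status values for illustration only.
    enum Status { PENDING, IN_PROGRESS, COMPLETED }

    // Hypothetical lightweight view of a stage: an id and a status only,
    // deliberately without the multi-megabyte output of its tasks.
    static final class StageSummary {
        final long stageId;
        final Status status;

        StageSummary(long stageId, Status status) {
            this.stageId = stageId;
            this.status = status;
        }
    }

    // Rough footprint, in decimal GB, of holding every task in memory at once.
    static double estimateGb(int taskCount, double mbPerTask) {
        return taskCount * mbPerTask / 1000.0;
    }

    // Assumption: only stages that have not completed need to be loaded in full.
    static List<Long> stageIdsToLoad(List<StageSummary> stages) {
        List<Long> ids = new ArrayList<>();
        for (StageSummary stage : stages) {
            if (stage.status != Status.COMPLETED) {
                ids.add(stage.stageId);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Figures from the description: about 11,000 entities at 2 to 3MB each.
        System.out.println("low estimate (GB):  " + estimateGb(11000, 2.0));
        System.out.println("high estimate (GB): " + estimateGb(11000, 3.0));

        List<StageSummary> stages = new ArrayList<>();
        stages.add(new StageSummary(1L, Status.COMPLETED));
        stages.add(new StageSummary(2L, Status.IN_PROGRESS));
        stages.add(new StageSummary(3L, Status.PENDING));
        System.out.println("stage ids to load: " + stageIdsToLoad(stages));
    }
}
```

The point of the sketch is that the selection works on small summaries, so the large task output is only ever fetched for the stages that survive the filter.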
Re: Review Request 50865: Starting a Component After Pausing An Upgrade Can Take 9 Minutes
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50865/#review144995
---

Ship it!

Ship It!

- Robert Levas

On Aug. 5, 2016, 4:37 p.m., Jonathan Hurley wrote:
>
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50865/
> ---
>
> (Updated Aug. 5, 2016, 4:37 p.m.)
>
> Review request for Ambari, Alejandro Fernandez, Robert Levas, and Robert
> Nettleton.
>
> Bugs: AMBARI-18052
>     https://issues.apache.org/jira/browse/AMBARI-18052
>
> Repository: ambari
>
> Description
> ---
>
> The root cause of this seems to be how an upgrade is paused/resumed. The
> {{UpgradeResourceProvider}} loads the entire request in memory to iterate
> over it. In this case, that contains about 11,000 {{HostRoleCommandEntity}}
> where each one is between 2 and 3MB. That means, that we're trying to load
> about 33GB of data into memory.
>
> This causes threads to die slowly, including scheduler threads, until the JVM
> can recover and start scheduling things again.
>
> The real question is _why_ each HRCEntity is so large. In many cases, the
> output includes information from HDFS, such as the state of SafeMode. These
> messages include the entire state of the system which is being captured to
> the stdout. I see two workarounds here:
>
> The workaround here is to only load the necessary stages/tasks into memory,
> thereby reducing the footprint greatly.
>
> Diffs
> ---
>
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessor.java dcfe359
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java b44dc78
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionScheduler.java cdef06e
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeResourceProvider.java 255cbbb
>   ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java 541b2e9
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/StageEntity.java 12ab568
>
> Diff: https://reviews.apache.org/r/50865/diff/
>
> Testing
> ---
>
> PENDING
>
> Thanks,
>
> Jonathan Hurley
>
Re: Review Request 50865: Starting a Component After Pausing An Upgrade Can Take 9 Minutes
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50865/#review144980
---

Ship it!

Ship It!

- Robert Nettleton

On Aug. 5, 2016, 8:37 p.m., Jonathan Hurley wrote:
>
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50865/
> ---
>
> (Updated Aug. 5, 2016, 8:37 p.m.)
>
> Review request for Ambari, Alejandro Fernandez, Robert Levas, and Robert
> Nettleton.
>
> Bugs: AMBARI-18052
>     https://issues.apache.org/jira/browse/AMBARI-18052
>
> Repository: ambari
>
> Description
> ---
>
> The root cause of this seems to be how an upgrade is paused/resumed. The
> {{UpgradeResourceProvider}} loads the entire request in memory to iterate
> over it. In this case, that contains about 11,000 {{HostRoleCommandEntity}}
> where each one is between 2 and 3MB. That means, that we're trying to load
> about 33GB of data into memory.
>
> This causes threads to die slowly, including scheduler threads, until the JVM
> can recover and start scheduling things again.
>
> The real question is _why_ each HRCEntity is so large. In many cases, the
> output includes information from HDFS, such as the state of SafeMode. These
> messages include the entire state of the system which is being captured to
> the stdout. I see two workarounds here:
>
> The workaround here is to only load the necessary stages/tasks into memory,
> thereby reducing the footprint greatly.
>
> Diffs
> ---
>
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessor.java dcfe359
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java b44dc78
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionScheduler.java cdef06e
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeResourceProvider.java 255cbbb
>   ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java 541b2e9
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/StageEntity.java 12ab568
>
> Diff: https://reviews.apache.org/r/50865/diff/
>
> Testing
> ---
>
> PENDING
>
> Thanks,
>
> Jonathan Hurley
>