-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58109/#review170773
-----------------------------------------------------------

Ship it!


ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java
Lines 184-185 (original), 153-154 (patched)
<https://reviews.apache.org/r/58109/#comment243656>

By using a more complex query, could we avoid making multiple calls to the DB to get the stage entities?

The following (non-JPA) query should do the trick once properly formatted for JPA. However, I am not sure whether all DBs would support it. Apparently PostgreSQL does, according to my test, and I know that MySQL does. I am not sure about the other databases that can be used with Ambari.

```
SELECT *
FROM stage s
INNER JOIN (
    SELECT s.request_id, MIN(s.stage_id) AS stage_id
    FROM stage s
    INNER JOIN host_role_command hrc
        ON (hrc.stage_id = s.stage_id AND hrc.request_id = s.request_id)
    WHERE hrc.status IN ('COMPLETED')
    GROUP BY s.request_id
    ORDER BY s.request_id
) AS foo
    ON (s.request_id = foo.request_id AND s.stage_id = foo.stage_id);
```


- Robert Levas


On March 31, 2017, 3:02 p.m., Jonathan Hurley wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58109/
> -----------------------------------------------------------
>
> (Updated March 31, 2017, 3:02 p.m.)
>
>
> Review request for Ambari, Alejandro Fernandez, Nate Cole, and Robert Levas.
>
>
> Bugs: BUG-20646
>     https://issues.apache.org/jira/browse/BUG-20646
>
>
> Repository: ambari
>
>
> Description
> -------
>
> When creating a massive request (a rolling upgrade on a cluster with 1000
> nodes), the size of the request seems to slow down the {{ActionScheduler}}.
> Each command was taking between 1 and 2 minutes to run (even server-side
> tasks).
>
> The cause of this can be seen in the following two stack traces:
>
> {code:title=ActionSchedulerImpl}
>   at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:84)
>   at org.apache.ambari.server.actionmanager.Stage.<init>(Stage.java:157)
>   at org.apache.ambari.server.actionmanager.StageFactoryImpl.createExisting(StageFactoryImpl.java:72)
>   at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getStagesInProgress(ActionDBAccessorImpl.java:303)
>   at org.apache.ambari.server.actionmanager.ActionScheduler.doWork(ActionScheduler.java:341)
>   at org.apache.ambari.server.actionmanager.ActionScheduler.run(ActionScheduler.java:302)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
>
> {code:title=Server Action Executor}
>   at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:700)
>   at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:84)
>   at org.apache.ambari.server.actionmanager.Stage.<init>(Stage.java:157)
>   at org.apache.ambari.server.actionmanager.StageFactoryImpl.createExisting(StageFactoryImpl.java:72)
>   at org.apache.ambari.server.actionmanager.Request.<init>(Request.java:199)
>   at org.apache.ambari.server.actionmanager.Request$$FastClassByGuice$$9071e03.newInstance(<generated>)
>   at com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
>   at com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:60)
>   at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
>   at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
>   at com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978)
>   at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
>   at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974)
>   at com.google.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:632)
>   at com.sun.proxy.$Proxy26.createExisting(Unknown Source)
>   at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getRequests(ActionDBAccessorImpl.java:784)
>   at org.apache.ambari.server.serveraction.ServerActionExecutor.cleanRequestShareDataContexts(ServerActionExecutor.java:259)
>   - locked <0x00007ff0a14083c8> (a java.util.HashMap)
>   at org.apache.ambari.server.serveraction.ServerActionExecutor.doWork(ServerActionExecutor.java:454)
>   at org.apache.ambari.server.serveraction.ServerActionExecutor$1.run(ServerActionExecutor.java:160)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
>
> It's clear from these stacks that every {{PENDING}} stage (roughly 15,000)
> was being loaded into memory every second, along with its accompanying
> tasks. This is unnecessary: these methods don't need all stages, just the
> _next_ stage of each request, because all stages are synchronous within a
> single request.
>
> The proposed solution is to fix the {{StageEntity.findByCommandStatuses}}
> call so it doesn't return every stage:
> {code}
> SELECT stage.requestid,
>        MIN(stage.stageid)
> FROM stageentity stage,
>      hostrolecommandentity hrc
> WHERE hrc.status IN :statuses
>   AND hrc.stageid = stage.stageid
>   AND hrc.requestid = stage.requestid
> GROUP BY stage.requestid
> {code}
>
>
> Diffs
> -----
>
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessor.java 9325d03
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java ab4feaa
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionScheduler.java 0984c5c
>   ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java 5151fb3
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/StageEntity.java f68338f
>   ambari-server/src/main/java/org/apache/ambari/server/serveraction/ServerActionExecutor.java b0be6b3
>   ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionDBAccessorImpl.java 81eef3b
>   ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionScheduler.java 2b5d2f3
>   ambari-server/src/test/java/org/apache/ambari/server/orm/dao/RequestDAOTest.java 9b62671
>   ambari-server/src/test/java/org/apache/ambari/server/serveraction/ServerActionExecutorTest.java 44d5b63
>   ambari-server/src/test/java/org/apache/ambari/server/state/services/RetryUpgradeActionServiceTest.java e2ce6e7
>
>
> Diff: https://reviews.apache.org/r/58109/diff/2/
>
>
> Testing
> -------
>
> Tests run: 4976, Failures: 0, Errors: 0, Skipped: 39
>
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 17:49 min
> [INFO] Finished at: 2017-03-31T12:58:22-04:00
> [INFO] Final Memory: 59M/664M
> [INFO] ------------------------------------------------------------------------
>
>
> Thanks,
>
> Jonathan Hurley
>
>
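
To illustrate the selection that both queries above are aiming for (the lowest stage ID per request, among stages that have a command in one of the given statuses), here is a minimal in-memory Java sketch. The `Command` class and the `nextStagePerRequest` method are hypothetical stand-ins invented for this example; they are not Ambari's entities or DAO methods, and the actual fix would express this as a JPA query rather than grouping on the Java side.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class NextStageSketch {

  /** Hypothetical stand-in for one host_role_command row; not Ambari's entity. */
  static final class Command {
    final long requestId;
    final long stageId;
    final String status;

    Command(long requestId, long stageId, String status) {
      this.requestId = requestId;
      this.stageId = stageId;
      this.status = status;
    }
  }

  /**
   * Mirrors "GROUP BY request_id" with "MIN(stage_id)": for each request,
   * keep the lowest stage ID that has at least one command whose status
   * is in the given set.
   */
  static Map<Long, Long> nextStagePerRequest(List<Command> commands, Set<String> statuses) {
    Map<Long, Long> next = new TreeMap<>();
    for (Command c : commands) {
      if (statuses.contains(c.status)) {
        // Keep the smaller stage ID when the request is already present.
        next.merge(c.requestId, c.stageId, Long::min);
      }
    }
    return next;
  }

  public static void main(String[] args) {
    List<Command> commands = Arrays.asList(
        new Command(1, 10, "COMPLETED"),
        new Command(1, 11, "PENDING"),
        new Command(1, 12, "PENDING"),
        new Command(2, 20, "PENDING"),
        new Command(2, 21, "PENDING"));
    Set<String> statuses = new HashSet<>(Arrays.asList("PENDING"));

    // Prints {1=11, 2=20}: one stage per request instead of all four PENDING stages.
    System.out.println(nextStagePerRequest(commands, statuses));
  }
}
```

The point of the sketch is only the shape of the result: one (request, stage) pair per request, which is what lets the caller load a handful of stages rather than every matching one.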
