> On March 31, 2017, 5:07 p.m., Robert Levas wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java
> > Lines 184-185 (original), 153-154 (patched)
> > <https://reviews.apache.org/r/58109/diff/2/?file=1682589#file1682589line190>
> >
> >     By using a more complex query, could we avoid making multiple calls to 
> > the DB to get the stage entities?
> >     
> >     The following (non-JPA) query should do the trick once properly 
> > formatted for JPA. However, I am not sure if all DBs would support it.  
> > Apparently PostgreSQL does, according to my test, and I know that MySQL does. 
> >  I am not sure about the other databases that can be used with Ambari. 
> >     
> >     ```
> >     SELECT *
> >     FROM stage s 
> >     INNER JOIN (
> >       SELECT s.request_id, MIN(s.stage_id) AS stage_id 
> >       FROM stage s 
> >       INNER JOIN host_role_command hrc ON (hrc.stage_id = s.stage_id AND hrc.request_id = s.request_id) 
> >       WHERE hrc.status  IN ('COMPLETED')
> >       GROUP BY s.request_id 
> >       ORDER BY s.request_id
> >     ) AS foo ON (s.request_id = foo.request_id and s.stage_id = foo.stage_id); 
> >     ```
> 
> Jonathan Hurley wrote:
>     This doesn't call into the database multiple times. The 2nd hit is a 
> cache-only lookup. I think when I was researching how to do this, that query 
> had problems on some databases... Namely, how do you get the entity from it 
> when the request_id is in the returned results?

I figured you would have looked at this approach... thanks for the clarification.
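
For anyone following along, here is a minimal sketch comparing the two query shapes discussed above: the aggregate that returns only (request_id, MIN(stage_id)) pairs, and the join that returns the matching stage rows directly. It uses Python's sqlite3 with a made-up two-table schema and sample rows; only the table and column names come from the query I posted, everything else is hypothetical. It filters on PENDING instead of COMPLETED purely for illustration, and it says nothing about how JPA or the other databases Ambari supports would handle the join form.

```python
import sqlite3

# Hypothetical, minimal schema: only the columns the queries in this thread touch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stage (
  request_id INTEGER,
  stage_id   INTEGER,
  PRIMARY KEY (request_id, stage_id)
);
CREATE TABLE host_role_command (
  task_id    INTEGER PRIMARY KEY,
  request_id INTEGER,
  stage_id   INTEGER,
  status     TEXT
);
""")

# Sample data (made up): request 1 has stages 1-3, request 2 has stages 1-2,
# with one task per stage. Only stage 1 of request 1 has completed.
stages = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
statuses = ["COMPLETED", "PENDING", "PENDING", "PENDING", "PENDING"]
conn.executemany("INSERT INTO stage VALUES (?, ?)", stages)
conn.executemany(
    "INSERT INTO host_role_command (request_id, stage_id, status) VALUES (?, ?, ?)",
    [(r, s, st) for (r, s), st in zip(stages, statuses)],
)

# Aggregate form: one (request_id, lowest matching stage_id) pair per request.
AGGREGATE = """
SELECT s.request_id, MIN(s.stage_id)
FROM stage s
INNER JOIN host_role_command hrc
  ON hrc.stage_id = s.stage_id AND hrc.request_id = s.request_id
WHERE hrc.status IN ('PENDING')
GROUP BY s.request_id
ORDER BY s.request_id
"""

# Join form: the same aggregate as a derived table, joined back to stage so the
# stage rows themselves come back in a single statement.
JOINED = """
SELECT s.request_id, s.stage_id
FROM stage s
INNER JOIN (
  SELECT s2.request_id, MIN(s2.stage_id) AS stage_id
  FROM stage s2
  INNER JOIN host_role_command hrc
    ON hrc.stage_id = s2.stage_id AND hrc.request_id = s2.request_id
  WHERE hrc.status IN ('PENDING')
  GROUP BY s2.request_id
) AS foo ON s.request_id = foo.request_id AND s.stage_id = foo.stage_id
ORDER BY s.request_id
"""

next_stage_ids = conn.execute(AGGREGATE).fetchall()
next_stage_rows = conn.execute(JOINED).fetchall()
print(next_stage_ids)
print(next_stage_rows)
```

On this sample data both statements pick out one stage per request rather than all four pending stages, which is the reduction the patch is after.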


- Robert


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58109/#review170773
-----------------------------------------------------------


On March 31, 2017, 9:16 p.m., Jonathan Hurley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58109/
> -----------------------------------------------------------
> 
> (Updated March 31, 2017, 9:16 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Nate Cole, and Robert Levas.
> 
> 
> Bugs: AMBARI-20646
>     https://issues.apache.org/jira/browse/AMBARI-20646
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> When creating a massive request (a rolling upgrade on a cluster with 1000 
> nodes), the size of the request seems to slow down the {{ActionScheduler}}. 
> Each command was taking between 1 and 2 minutes to run (even server-side 
> tasks). 
> 
> The cause of this can be seen in the following two stack traces:
> 
> {code:title=ActionSchedulerImpl}
>       at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:84)
>       at org.apache.ambari.server.actionmanager.Stage.<init>(Stage.java:157)
>       at org.apache.ambari.server.actionmanager.StageFactoryImpl.createExisting(StageFactoryImpl.java:72)
>       at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getStagesInProgress(ActionDBAccessorImpl.java:303)
>       at org.apache.ambari.server.actionmanager.ActionScheduler.doWork(ActionScheduler.java:341)
>       at org.apache.ambari.server.actionmanager.ActionScheduler.run(ActionScheduler.java:302)
>       at java.lang.Thread.run(Thread.java:745)
> {code}
> 
> {code:title=Server Action Executor}
>       at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:700)
>       at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:84)
>       at org.apache.ambari.server.actionmanager.Stage.<init>(Stage.java:157)
>       at org.apache.ambari.server.actionmanager.StageFactoryImpl.createExisting(StageFactoryImpl.java:72)
>       at org.apache.ambari.server.actionmanager.Request.<init>(Request.java:199)
>       at org.apache.ambari.server.actionmanager.Request$$FastClassByGuice$$9071e03.newInstance(<generated>)
>       at com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
>       at com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:60)
>       at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
>       at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
>       at com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978)
>       at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
>       at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974)
>       at com.google.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:632)
>       at com.sun.proxy.$Proxy26.createExisting(Unknown Source)
>       at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getRequests(ActionDBAccessorImpl.java:784)
>       at org.apache.ambari.server.serveraction.ServerActionExecutor.cleanRequestShareDataContexts(ServerActionExecutor.java:259)
>       - locked <0x00007ff0a14083c8> (a java.util.HashMap)
>       at org.apache.ambari.server.serveraction.ServerActionExecutor.doWork(ServerActionExecutor.java:454)
>       at org.apache.ambari.server.serveraction.ServerActionExecutor$1.run(ServerActionExecutor.java:160)
>       at java.lang.Thread.run(Thread.java:745)
> {code}
> 
> It's clear from these stacks that every {{PENDING}} stage (roughly 15,000), 
> along with its accompanying tasks, was being loaded into memory every second. 
> This makes no sense, as these methods don't need all stages, just the 
> _next_ stage, because all stages are synchronous within a single 
> request.
> 
> The proposed solution is to fix the {{StageEntity.findByCommandStatuses}} 
> call so it doesn't return every stage:
> {code}
> SELECT stage.requestid, 
>        MIN(stage.stageid) 
> FROM   stageentity stage, 
>        hostrolecommandentity hrc 
> WHERE  hrc.status IN :statuses 
>        AND hrc.stageid = stage.stageid 
>        AND hrc.requestid = stage.requestid 
> GROUP  BY stage.requestid 
> {code}
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessor.java 9325d03 
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java ab4feaa 
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionScheduler.java 0984c5c 
>   ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java 5151fb3 
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/StageEntity.java f68338f 
>   ambari-server/src/main/java/org/apache/ambari/server/serveraction/ServerActionExecutor.java b0be6b3 
>   ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionDBAccessorImpl.java 81eef3b 
>   ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionScheduler.java 2b5d2f3 
>   ambari-server/src/test/java/org/apache/ambari/server/orm/dao/RequestDAOTest.java 9b62671 
>   ambari-server/src/test/java/org/apache/ambari/server/serveraction/ServerActionExecutorTest.java 44d5b63 
>   ambari-server/src/test/java/org/apache/ambari/server/state/services/RetryUpgradeActionServiceTest.java e2ce6e7 
> 
> 
> Diff: https://reviews.apache.org/r/58109/diff/2/
> 
> 
> Testing
> -------
> 
> Tests run: 4976, Failures: 0, Errors: 0, Skipped: 39
> 
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 17:49 min
> [INFO] Finished at: 2017-03-31T12:58:22-04:00
> [INFO] Final Memory: 59M/664M
> [INFO] ------------------------------------------------------------------------
> 
> 
> Thanks,
> 
> Jonathan Hurley
> 
>
