Re: Review Request 58109: Large Long Running Requests Can Slow Down the ActionScheduler

2017-04-03 Thread Robert Levas


> On March 31, 2017, 5:07 p.m., Robert Levas wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java
> > Lines 184-185 (original), 153-154 (patched)
> > 
> >
> > By using a more complex query, could we avoid making multiple calls to 
> > the DB to get the stage entities?
> > 
> > The following (non-JPA) query should do the trick once properly 
> > formatted for JPA. However, I am not sure if all DBs would support it. 
> > Apparently PostgreSQL does, according to my test, and I know that MySQL does. 
> > I am not sure about the other databases that can be used with Ambari. 
> > 
> > ```
> > SELECT *
> > FROM stage s 
> > INNER JOIN (
> >   SELECT s.request_id, MIN(s.stage_id) AS stage_id 
> >   FROM stage s 
> >   INNER JOIN host_role_command hrc ON (hrc.stage_id = s.stage_id AND 
> > hrc.request_id = s.request_id) 
> >   WHERE hrc.status  IN ('COMPLETED')
> >   GROUP BY s.request_id 
> >   ORDER BY s.request_id
> > ) AS foo ON (s.request_id = foo.request_id and s.stage_id = 
> > foo.stage_id); 
> > ```
> 
> Jonathan Hurley wrote:
> This doesn't call into the database multiple times. The 2nd hit is a 
> cache-only lookup. When I was researching how to do this, I think that query 
> had problems on some databases... Namely, how do you get the entity from it 
> when the request_id is in the returned results?
> 
> Robert Levas wrote:
> I figured you would have looked at this approach... thanks for the 
> clarification.
> 
> Jonathan Hurley wrote:
> Sure - I also don't think that JPA supports multiple matches in the ON 
> clause in a subquery ... We can always open a Jira to investigate if there is 
> a better call. But considering this is about 10,000x more performant than 
> what was there, I'm OK if we only got 99% efficiency :)

There is no need to open a new JIRA to investigate.  I'll take 10k better 
performance as a consolation. ;)


- Robert


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58109/#review170773
---



Re: Review Request 58109: Large Long Running Requests Can Slow Down the ActionScheduler

2017-04-01 Thread Jonathan Hurley


> On March 31, 2017, 5:07 p.m., Robert Levas wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java
> > Lines 184-185 (original), 153-154 (patched)
> > 
> >
> > By using a more complex query, could we avoid making multiple calls to 
> > the DB to get the stage entities?
> > 
> > The following (non-JPA) query should do the trick once properly 
> > formatted for JPA. However, I am not sure if all DBs would support it. 
> > Apparently PostgreSQL does, according to my test, and I know that MySQL does. 
> > I am not sure about the other databases that can be used with Ambari. 
> > 
> > ```
> > SELECT *
> > FROM stage s 
> > INNER JOIN (
> >   SELECT s.request_id, MIN(s.stage_id) AS stage_id 
> >   FROM stage s 
> >   INNER JOIN host_role_command hrc ON (hrc.stage_id = s.stage_id AND 
> > hrc.request_id = s.request_id) 
> >   WHERE hrc.status  IN ('COMPLETED')
> >   GROUP BY s.request_id 
> >   ORDER BY s.request_id
> > ) AS foo ON (s.request_id = foo.request_id and s.stage_id = 
> > foo.stage_id); 
> > ```
> 
> Jonathan Hurley wrote:
> This doesn't call into the database multiple times. The 2nd hit is a 
> cache-only lookup. When I was researching how to do this, I think that query 
> had problems on some databases... Namely, how do you get the entity from it 
> when the request_id is in the returned results?
> 
> Robert Levas wrote:
> I figured you would have looked at this approach... thanks for the 
> clarification.

Sure - I also don't think that JPA supports multiple matches in the ON clause 
in a subquery ... We can always open a Jira to investigate if there is a better 
call. But considering this is about 10,000x more performant than what was 
there, I'm OK if we only got 99% efficiency :)


- Jonathan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58109/#review170773
---



Re: Review Request 58109: Large Long Running Requests Can Slow Down the ActionScheduler

2017-04-01 Thread Robert Levas


> On March 31, 2017, 5:07 p.m., Robert Levas wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java
> > Lines 184-185 (original), 153-154 (patched)
> > 
> >
> > By using a more complex query, could we avoid making multiple calls to 
> > the DB to get the stage entities?
> > 
> > The following (non-JPA) query should do the trick once properly 
> > formatted for JPA. However, I am not sure if all DBs would support it. 
> > Apparently PostgreSQL does, according to my test, and I know that MySQL does. 
> > I am not sure about the other databases that can be used with Ambari. 
> > 
> > ```
> > SELECT *
> > FROM stage s 
> > INNER JOIN (
> >   SELECT s.request_id, MIN(s.stage_id) AS stage_id 
> >   FROM stage s 
> >   INNER JOIN host_role_command hrc ON (hrc.stage_id = s.stage_id AND 
> > hrc.request_id = s.request_id) 
> >   WHERE hrc.status  IN ('COMPLETED')
> >   GROUP BY s.request_id 
> >   ORDER BY s.request_id
> > ) AS foo ON (s.request_id = foo.request_id and s.stage_id = 
> > foo.stage_id); 
> > ```
> 
> Jonathan Hurley wrote:
> This doesn't call into the database multiple times. The 2nd hit is a 
> cache-only lookup. When I was researching how to do this, I think that query 
> had problems on some databases... Namely, how do you get the entity from it 
> when the request_id is in the returned results?

I figured you would have looked at this approach... thanks for the clarification.


- Robert


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58109/#review170773
---



Re: Review Request 58109: Large Long Running Requests Can Slow Down the ActionScheduler

2017-03-31 Thread Jonathan Hurley

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58109/
---

(Updated March 31, 2017, 9:16 p.m.)


Review request for Ambari, Alejandro Fernandez, Nate Cole, and Robert Levas.


Bugs: AMBARI-20646
https://issues.apache.org/jira/browse/AMBARI-20646


Repository: ambari


Description
---

When creating a massive request (a rolling upgrade on a cluster with 1000 
nodes), the size of the request seems to slow down the {{ActionScheduler}}. 
Each command was taking between 1 and 2 minutes to run (even server-side tasks). 

The cause of this can be seen in the following two stack traces:

{code:title=ActionSchedulerImpl}
  at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:84)
  at org.apache.ambari.server.actionmanager.Stage.<init>(Stage.java:157)
  at org.apache.ambari.server.actionmanager.StageFactoryImpl.createExisting(StageFactoryImpl.java:72)
  at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getStagesInProgress(ActionDBAccessorImpl.java:303)
  at org.apache.ambari.server.actionmanager.ActionScheduler.doWork(ActionScheduler.java:341)
  at org.apache.ambari.server.actionmanager.ActionScheduler.run(ActionScheduler.java:302)
  at java.lang.Thread.run(Thread.java:745)
{code}

{code:title=Server Action Executor}
  at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:700)
  at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:84)
  at org.apache.ambari.server.actionmanager.Stage.<init>(Stage.java:157)
  at org.apache.ambari.server.actionmanager.StageFactoryImpl.createExisting(StageFactoryImpl.java:72)
  at org.apache.ambari.server.actionmanager.Request.<init>(Request.java:199)
  at org.apache.ambari.server.actionmanager.Request$$FastClassByGuice$$9071e03.newInstance()
  at com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
  at com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:60)
  at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
  at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
  at com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978)
  at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
  at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974)
  at com.google.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:632)
  at com.sun.proxy.$Proxy26.createExisting(Unknown Source)
  at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getRequests(ActionDBAccessorImpl.java:784)
  at org.apache.ambari.server.serveraction.ServerActionExecutor.cleanRequestShareDataContexts(ServerActionExecutor.java:259)
  - locked <0x7ff0a14083c8> (a java.util.HashMap)
  at org.apache.ambari.server.serveraction.ServerActionExecutor.doWork(ServerActionExecutor.java:454)
  at org.apache.ambari.server.serveraction.ServerActionExecutor$1.run(ServerActionExecutor.java:160)
  at java.lang.Thread.run(Thread.java:745)
{code}

It's clear from these stacks that every {{PENDING}} stage (roughly 15,000) was 
being loaded into memory every second (along with its accompanying tasks). 
This makes no sense, as these methods don't need all stages - just the _next_ 
stage. This is because stages within a single request run synchronously, one 
after another.

The proposed solution is to fix the {{StageEntity.findByCommandStatuses}} call 
so it doesn't return every stage:
{code}
SELECT stage.requestid, 
   MIN(stage.stageid) 
FROM   stageentity stage, 
   hostrolecommandentity hrc 
WHERE  hrc.status IN :statuses 
   AND hrc.stageid = stage.stageid 
   AND hrc.requestid = stage.requestid 
GROUP  BY stage.requestid 
{code}
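
As a rough illustration of what that query computes, here is a minimal in-memory sketch in plain Java. It is not the Ambari code: `CommandRow` and `NextStageSketch` are hypothetical names, and the list of rows stands in for the join of stages and host role commands. For each request it keeps only the lowest stage ID that still has a command in one of the given statuses, which mirrors the `GROUP BY` / `MIN` above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical stand-in for one host role command row (not an Ambari class). */
final class CommandRow {
  final long requestId;
  final long stageId;
  final String status;

  CommandRow(long requestId, long stageId, String status) {
    this.requestId = requestId;
    this.stageId = stageId;
    this.status = status;
  }
}

final class NextStageSketch {
  /**
   * For each request, returns the lowest stage ID that has at least one
   * command in the given statuses (the in-memory analogue of
   * GROUP BY requestId with MIN(stageId)).
   */
  static Map<Long, Long> nextStagePerRequest(List<CommandRow> commands, Set<String> statuses) {
    Map<Long, Long> next = new TreeMap<>();
    for (CommandRow command : commands) {
      if (statuses.contains(command.status)) {
        // Keep the smaller stage ID seen so far for this request.
        next.merge(command.requestId, command.stageId, Long::min);
      }
    }
    return next;
  }

  public static void main(String[] args) {
    List<CommandRow> commands = new ArrayList<>();
    commands.add(new CommandRow(1, 1, "COMPLETED"));
    commands.add(new CommandRow(1, 2, "PENDING"));
    commands.add(new CommandRow(1, 3, "PENDING"));
    commands.add(new CommandRow(2, 10, "PENDING"));
    commands.add(new CommandRow(2, 11, "PENDING"));

    // Prints {1=2, 2=10}: one "next" stage per request instead of four pending stages.
    System.out.println(nextStagePerRequest(commands, Collections.singleton("PENDING")));
  }
}
```

With roughly 15,000 pending stages spread over a handful of requests, a result of this shape has one entry per request rather than one per stage.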


Diffs
-

  
  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessor.java 9325d03 
  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java ab4feaa 
  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionScheduler.java 0984c5c 
  ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java 5151fb3 
  ambari-server/src/main/java/org/apache/ambari/server/orm/entities/StageEntity.java f68338f 
  ambari-server/src/main/java/org/apache/ambari/server/serveraction/ServerActionExecutor.java b0be6b3 
  ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionDBAccessorImpl.java 81eef3b 
  

Re: Review Request 58109: Large Long Running Requests Can Slow Down the ActionScheduler

2017-03-31 Thread Jonathan Hurley


> On March 31, 2017, 5:07 p.m., Robert Levas wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java
> > Lines 184-185 (original), 153-154 (patched)
> > 
> >
> > By using a more complex query, could we avoid making multiple calls to 
> > the DB to get the stage entities?
> > 
> > The following (non-JPA) query should do the trick once properly 
> > formatted for JPA. However, I am not sure if all DBs would support it. 
> > Apparently PostgreSQL does, according to my test, and I know that MySQL does. 
> > I am not sure about the other databases that can be used with Ambari. 
> > 
> > ```
> > SELECT *
> > FROM stage s 
> > INNER JOIN (
> >   SELECT s.request_id, MIN(s.stage_id) AS stage_id 
> >   FROM stage s 
> >   INNER JOIN host_role_command hrc ON (hrc.stage_id = s.stage_id AND 
> > hrc.request_id = s.request_id) 
> >   WHERE hrc.status  IN ('COMPLETED')
> >   GROUP BY s.request_id 
> >   ORDER BY s.request_id
> > ) AS foo ON (s.request_id = foo.request_id and s.stage_id = 
> > foo.stage_id); 
> > ```

This doesn't call into the database multiple times. The 2nd hit is a cache-only 
lookup. When I was researching how to do this, I think that query had problems 
on some databases... Namely, how do you get the entity from it when the 
request_id is in the returned results?
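
A minimal sketch of that two-step pattern, under stated assumptions: the aggregate query is taken to return (request ID, minimum stage ID) pairs, and each pair is then resolved by primary key against entities that are already cached. `StageLookupSketch`, `preload`, and `findByPK` are hypothetical names for illustration, and the map is a simplified stand-in for a JPA cache, not the actual `StageDAO` code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a DAO backed by an entity cache (not Ambari's StageDAO). */
final class StageLookupSketch {
  private final Map<String, String> cache = new HashMap<>();
  private int databaseLoads = 0;

  /** Pretends an earlier operation already placed this stage in the cache. */
  void preload(long requestId, long stageId) {
    String key = key(requestId, stageId);
    cache.put(key, "Stage(" + key + ")");
  }

  /** Primary-key lookup: served from the cache if present, otherwise counted as a database load. */
  String findByPK(long requestId, long stageId) {
    String key = key(requestId, stageId);
    String stage = cache.get(key);
    if (stage == null) {
      databaseLoads++;
      stage = "Stage(" + key + ")";
      cache.put(key, stage);
    }
    return stage;
  }

  /** Second step: turn each {requestId, minStageId} pair from the aggregate query into an entity. */
  List<String> resolve(List<long[]> idPairs) {
    List<String> stages = new ArrayList<>();
    for (long[] pair : idPairs) {
      stages.add(findByPK(pair[0], pair[1]));
    }
    return stages;
  }

  int databaseLoads() {
    return databaseLoads;
  }

  private static String key(long requestId, long stageId) {
    return requestId + ":" + stageId;
  }

  public static void main(String[] args) {
    StageLookupSketch dao = new StageLookupSketch();
    dao.preload(1, 2);
    dao.preload(2, 10);

    List<long[]> idPairs = new ArrayList<>();
    idPairs.add(new long[] {1, 2});
    idPairs.add(new long[] {2, 10});

    System.out.println(dao.resolve(idPairs));                      // [Stage(1:2), Stage(2:10)]
    System.out.println("database loads: " + dao.databaseLoads()); // database loads: 0
  }
}
```

In this sketch only the aggregate query would reach the database when the entities are already cached. Whether that holds in a real deployment depends on the JPA provider's cache configuration.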


- Jonathan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58109/#review170773
---



Re: Review Request 58109: Large Long Running Requests Can Slow Down the ActionScheduler

2017-03-31 Thread Robert Levas

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58109/#review170773
---


Ship it!




Ship It!


ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java
Lines 184-185 (original), 153-154 (patched)


By using a more complex query, could we avoid making multiple calls to the 
DB to get the stage entities?

The following (non-JPA) query should do the trick once properly formatted 
for JPA. However, I am not sure if all DBs would support it. Apparently 
PostgreSQL does, according to my test, and I know that MySQL does. I am not 
sure about the other databases that can be used with Ambari. 

```
SELECT *
FROM stage s 
INNER JOIN (
  SELECT s.request_id, MIN(s.stage_id) AS stage_id 
  FROM stage s 
  INNER JOIN host_role_command hrc
    ON (hrc.stage_id = s.stage_id AND hrc.request_id = s.request_id) 
  WHERE hrc.status  IN ('COMPLETED')
  GROUP BY s.request_id 
  ORDER BY s.request_id
) AS foo ON (s.request_id = foo.request_id and s.stage_id = foo.stage_id); 
```


- Robert Levas



Re: Review Request 58109: Large Long Running Requests Can Slow Down the ActionScheduler

2017-03-31 Thread Jonathan Hurley

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58109/
---

(Updated March 31, 2017, 3:02 p.m.)


Review request for Ambari, Alejandro Fernandez, Nate Cole, and Robert Levas.


Changes
---

Fixing a minor issue with scheduling the cache timer


Bugs: BUG-20646
https://issues.apache.org/jira/browse/BUG-20646


Repository: ambari


Description
---

When creating a massive request (a rolling upgrade on a cluster with 1000 
nodes), the size of the request seems to slow down the {{ActionScheduler}}. 
Each command was taking between 1 and 2 minutes to run (even server-side tasks). 

The cause of this can be seen in the following two stack traces:

{code:title=ActionSchedulerImpl}
  at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:84)
  at org.apache.ambari.server.actionmanager.Stage.<init>(Stage.java:157)
  at org.apache.ambari.server.actionmanager.StageFactoryImpl.createExisting(StageFactoryImpl.java:72)
  at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getStagesInProgress(ActionDBAccessorImpl.java:303)
  at org.apache.ambari.server.actionmanager.ActionScheduler.doWork(ActionScheduler.java:341)
  at org.apache.ambari.server.actionmanager.ActionScheduler.run(ActionScheduler.java:302)
  at java.lang.Thread.run(Thread.java:745)
{code}

{code:title=Server Action Executor}
  at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:700)
  at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:84)
  at org.apache.ambari.server.actionmanager.Stage.<init>(Stage.java:157)
  at org.apache.ambari.server.actionmanager.StageFactoryImpl.createExisting(StageFactoryImpl.java:72)
  at org.apache.ambari.server.actionmanager.Request.<init>(Request.java:199)
  at org.apache.ambari.server.actionmanager.Request$$FastClassByGuice$$9071e03.newInstance()
  at com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
  at com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:60)
  at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
  at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
  at com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978)
  at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
  at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974)
  at com.google.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:632)
  at com.sun.proxy.$Proxy26.createExisting(Unknown Source)
  at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getRequests(ActionDBAccessorImpl.java:784)
  at org.apache.ambari.server.serveraction.ServerActionExecutor.cleanRequestShareDataContexts(ServerActionExecutor.java:259)
  - locked <0x7ff0a14083c8> (a java.util.HashMap)
  at org.apache.ambari.server.serveraction.ServerActionExecutor.doWork(ServerActionExecutor.java:454)
  at org.apache.ambari.server.serveraction.ServerActionExecutor$1.run(ServerActionExecutor.java:160)
  at java.lang.Thread.run(Thread.java:745)
{code}

It's clear from these stacks that every {{PENDING}} stage (roughly 15,000) was 
being loaded into memory every second (along with its accompanying tasks). 
This makes no sense, as these methods don't need all stages - just the _next_ 
stage. This is because stages within a single request run synchronously, one 
after another.

The proposed solution is to fix the {{StageEntity.findByCommandStatuses}} call 
so it doesn't return every stage:
{code}
SELECT stage.requestid, 
   MIN(stage.stageid) 
FROM   stageentity stage, 
   hostrolecommandentity hrc 
WHERE  hrc.status IN :statuses 
   AND hrc.stageid = stage.stageid 
   AND hrc.requestid = stage.requestid 
GROUP  BY stage.requestid 
{code}


Diffs (updated)
-

  
  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessor.java 9325d03 
  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java ab4feaa 
  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionScheduler.java 0984c5c 
  ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java 5151fb3 
  ambari-server/src/main/java/org/apache/ambari/server/orm/entities/StageEntity.java f68338f 
  ambari-server/src/main/java/org/apache/ambari/server/serveraction/ServerActionExecutor.java b0be6b3 
  ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionDBAccessorImpl.java
 

Re: Review Request 58109: Large Long Running Requests Can Slow Down the ActionScheduler

2017-03-31 Thread Alejandro Fernandez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58109/#review170746
---


Ship it!

- Alejandro Fernandez


On March 31, 2017, 5:22 p.m., Jonathan Hurley wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58109/
> ---
> 
> (Updated March 31, 2017, 5:22 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Nate Cole, and Robert Levas.
> 
> 
> Bugs: BUG-20646
> https://issues.apache.org/jira/browse/BUG-20646
> 
> 
> Repository: ambari
> 
> 
> Description
> ---
> 
> When creating a massive request (a rolling upgrade on a cluster with 1000 
> nodes), the size of the request seems to slow down the {{ActionScheduler}}. 
> Each command was taking between 1 and 2 minutes to run (even server-side 
> tasks). 
> 
> The cause of this can be seen in the following two stack traces:
> 
> {code:title=ActionSchedulerImpl}
>   at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:84)
>   at org.apache.ambari.server.actionmanager.Stage.<init>(Stage.java:157)
>   at org.apache.ambari.server.actionmanager.StageFactoryImpl.createExisting(StageFactoryImpl.java:72)
>   at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getStagesInProgress(ActionDBAccessorImpl.java:303)
>   at org.apache.ambari.server.actionmanager.ActionScheduler.doWork(ActionScheduler.java:341)
>   at org.apache.ambari.server.actionmanager.ActionScheduler.run(ActionScheduler.java:302)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> 
> {code:title=Server Action Executor}
>   at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:700)
>   at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:84)
>   at org.apache.ambari.server.actionmanager.Stage.<init>(Stage.java:157)
>   at org.apache.ambari.server.actionmanager.StageFactoryImpl.createExisting(StageFactoryImpl.java:72)
>   at org.apache.ambari.server.actionmanager.Request.<init>(Request.java:199)
>   at org.apache.ambari.server.actionmanager.Request$$FastClassByGuice$$9071e03.newInstance(<generated>)
>   at com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
>   at com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:60)
>   at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
>   at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
>   at com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978)
>   at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
>   at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974)
>   at com.google.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:632)
>   at com.sun.proxy.$Proxy26.createExisting(Unknown Source)
>   at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getRequests(ActionDBAccessorImpl.java:784)
>   at org.apache.ambari.server.serveraction.ServerActionExecutor.cleanRequestShareDataContexts(ServerActionExecutor.java:259)
>   - locked <0x7ff0a14083c8> (a java.util.HashMap)
>   at org.apache.ambari.server.serveraction.ServerActionExecutor.doWork(ServerActionExecutor.java:454)
>   at org.apache.ambari.server.serveraction.ServerActionExecutor$1.run(ServerActionExecutor.java:160)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> 
> It's clear from these stacks that every {{PENDING}} stage (roughly 15,000) 
> was being loaded into memory every second, along with its accompanying 
> tasks. This is unnecessary: these methods don't need all stages, just the 
> _next_ stage, because all stages are synchronous within a single request.
> 
> The proposed solution is to fix the {{StageEntity.findByCommandStatuses}} 
> call so it doesn't return every stage:
> {code}
> SELECT stage.requestid, 
>        MIN(stage.stageid) 
> FROM   stageentity stage, 
>        hostrolecommandentity hrc 
> WHERE  hrc.status IN :statuses 
>        AND hrc.stageid = stage.stageid 
>        AND hrc.requestid = stage.requestid 
> GROUP  BY stage.requestid 
> {code}
> 
> 
> Diffs
> -
> 
>   
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessor.java 9325d03 
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java ab4feaa 
>   
> 

Re: Review Request 58109: Large Long Running Requests Can Slow Down the ActionScheduler

2017-03-31 Thread Jonathan Hurley

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58109/
---

(Updated March 31, 2017, 1:22 p.m.)


Review request for Ambari, Alejandro Fernandez, Nate Cole, and Robert Levas.


Bugs: BUG-20646
https://issues.apache.org/jira/browse/BUG-20646


Repository: ambari


Description
---

When creating a massive request (a rolling upgrade on a cluster with 1000 
nodes), the size of the request seems to slow down the {{ActionScheduler}}. 
Each command was taking between 1 and 2 minutes to run (even server-side tasks). 

The cause of this can be seen in the following two stack traces:

{code:title=ActionSchedulerImpl}
at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:84)
at org.apache.ambari.server.actionmanager.Stage.<init>(Stage.java:157)
at org.apache.ambari.server.actionmanager.StageFactoryImpl.createExisting(StageFactoryImpl.java:72)
at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getStagesInProgress(ActionDBAccessorImpl.java:303)
at org.apache.ambari.server.actionmanager.ActionScheduler.doWork(ActionScheduler.java:341)
at org.apache.ambari.server.actionmanager.ActionScheduler.run(ActionScheduler.java:302)
at java.lang.Thread.run(Thread.java:745)
{code}

{code:title=Server Action Executor}
at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:700)
at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:84)
at org.apache.ambari.server.actionmanager.Stage.<init>(Stage.java:157)
at org.apache.ambari.server.actionmanager.StageFactoryImpl.createExisting(StageFactoryImpl.java:72)
at org.apache.ambari.server.actionmanager.Request.<init>(Request.java:199)
at org.apache.ambari.server.actionmanager.Request$$FastClassByGuice$$9071e03.newInstance(<generated>)
at com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
at com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:60)
at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
at com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974)
at com.google.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:632)
at com.sun.proxy.$Proxy26.createExisting(Unknown Source)
at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getRequests(ActionDBAccessorImpl.java:784)
at org.apache.ambari.server.serveraction.ServerActionExecutor.cleanRequestShareDataContexts(ServerActionExecutor.java:259)
- locked <0x7ff0a14083c8> (a java.util.HashMap)
at org.apache.ambari.server.serveraction.ServerActionExecutor.doWork(ServerActionExecutor.java:454)
at org.apache.ambari.server.serveraction.ServerActionExecutor$1.run(ServerActionExecutor.java:160)
at java.lang.Thread.run(Thread.java:745)
{code}

It's clear from these stacks that every {{PENDING}} stage (roughly 15,000) was 
being loaded into memory every second, along with its accompanying tasks. 
This is unnecessary: these methods don't need all stages, just the _next_ 
stage, because all stages are synchronous within a single request.

The proposed solution is to fix the {{StageEntity.findByCommandStatuses}} call 
so it doesn't return every stage:
{code}
SELECT stage.requestid, 
       MIN(stage.stageid) 
FROM   stageentity stage, 
       hostrolecommandentity hrc 
WHERE  hrc.status IN :statuses 
       AND hrc.stageid = stage.stageid 
       AND hrc.requestid = stage.requestid 
GROUP  BY stage.requestid 
{code}


Diffs
-

  
  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessor.java 9325d03 
  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java ab4feaa 
  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionScheduler.java 0984c5c 
  ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java 5151fb3 
  ambari-server/src/main/java/org/apache/ambari/server/orm/entities/StageEntity.java f68338f 
  ambari-server/src/main/java/org/apache/ambari/server/serveraction/ServerActionExecutor.java b0be6b3 
  ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionDBAccessorImpl.java 81eef3b