[jira] [Commented] (YARN-9941) Opportunistic scheduler metrics should be reset during fail-over.

2020-07-20 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161710#comment-17161710
 ] 

Abhishek Modi commented on YARN-9941:
-

Sure [~BilwaST]. Feel free to take over. Thanks.

> Opportunistic scheduler metrics should be reset during fail-over.
> -
>
> Key: YARN-9941
> URL: https://issues.apache.org/jira/browse/YARN-9941
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8859) Add audit logs for router service

2020-01-17 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi reassigned YARN-8859:
---

Assignee: Minni Mittal  (was: Abhishek Modi)

> Add audit logs for router service
> -
>
> Key: YARN-8859
> URL: https://issues.apache.org/jira/browse/YARN-8859
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: router
>Reporter: Bibin Chundatt
>Assignee: Minni Mittal
>Priority: Major
>
> Similar to all other YARN services, RouterClientRMService and RouterWebServices 
> API/REST calls should have audit logging.
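
For illustration, a minimal sketch of the kind of audit helper this implies, modeled on the RMAuditLogger pattern used elsewhere in YARN; the RouterAuditLogger name and log format are assumptions, not the committed implementation.

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical helper following the RMAuditLogger pattern; names are assumed.
public final class RouterAuditLogger {
  private static final Logger LOG =
      LoggerFactory.getLogger(RouterAuditLogger.class);

  private RouterAuditLogger() {
  }

  // One tab-delimited audit line per successful router call.
  public static void logSuccess(String user, String operation, String target) {
    LOG.info("USER=" + user + "\tOPERATION=" + operation
        + "\tTARGET=" + target + "\tRESULT=SUCCESS");
  }

  // Failures also record a description of what went wrong.
  public static void logFailure(String user, String operation, String target,
      String description) {
    LOG.warn("USER=" + user + "\tOPERATION=" + operation + "\tTARGET=" + target
        + "\tRESULT=FAILURE\tDESCRIPTION=" + description);
  }
}
{code}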



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8529) Add timeout to RouterWebServiceUtil#invokeRMWebService

2020-01-10 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi reassigned YARN-8529:
---

Assignee: Minni Mittal  (was: Giovanni Matteo Fumarola)

> Add timeout to RouterWebServiceUtil#invokeRMWebService
> --
>
> Key: YARN-8529
> URL: https://issues.apache.org/jira/browse/YARN-8529
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Minni Mittal
>Priority: Major
> Attachments: YARN-8529.v1.patch, YARN-8529.v2.patch, 
> YARN-8529.v3.patch
>
>
> {{RouterWebServiceUtil#invokeRMWebService}} currently has a fixed timeout. 
> This should be configurable.
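
A minimal sketch of what a configurable timeout could look like with the Jersey 1.x client used by the Router web services; the property keys and defaults below are hypothetical placeholders, not the patch's values.

{code:java}
import com.sun.jersey.api.client.Client;
import org.apache.hadoop.conf.Configuration;

// Sketch only: the property keys and defaults are hypothetical placeholders.
public final class RouterWebClientFactory {
  public static Client create(Configuration conf) {
    Client client = Client.create();
    // Jersey 1.x timeouts are in milliseconds.
    client.setConnectTimeout(
        conf.getInt("yarn.router.webapp.connect-timeout-ms", 30000));
    client.setReadTimeout(
        conf.getInt("yarn.router.webapp.read-timeout-ms", 30000));
    return client;
  }
}
{code}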



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7898) [FederationStateStore] Create a proxy chain for FederationStateStore API in the Router

2020-01-10 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi reassigned YARN-7898:
---

Assignee: Minni Mittal  (was: Giovanni Matteo Fumarola)

> [FederationStateStore] Create a proxy chain for FederationStateStore API in 
> the Router
> --
>
> Key: YARN-7898
> URL: https://issues.apache.org/jira/browse/YARN-7898
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Minni Mittal
>Priority: Major
> Attachments: StateStoreProxy StressTest.jpg, 
> YARN-7898-YARN-7402.proto.patch, YARN-7898-YARN-7402.v1.patch, 
> YARN-7898-YARN-7402.v2.patch, YARN-7898-YARN-7402.v3.patch, 
> YARN-7898-YARN-7402.v4.patch, YARN-7898-YARN-7402.v5.patch, 
> YARN-7898-YARN-7402.v6.patch
>
>
> As detailed in the proposal in the umbrella JIRA, we are introducing a new 
> component that routes client requests to the appropriate FederationStateStore. 
> This JIRA tracks the creation of a proxy for FederationStateStore in the 
> Router.
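
A bare-bones sketch of the proxy-chain (chain-of-responsibility) shape being described; the interface and class names here are illustrative assumptions, not the Router's actual interceptor API.

{code:java}
// Illustrative chain-of-responsibility shape; all names are assumptions.
interface StateStoreInterceptor {
  void setNextInterceptor(StateStoreInterceptor next);
  String getPolicyConfiguration(String queue) throws Exception;
}

abstract class AbstractStateStoreInterceptor implements StateStoreInterceptor {
  private StateStoreInterceptor next;

  @Override
  public void setNextInterceptor(StateStoreInterceptor next) {
    this.next = next;
  }

  protected StateStoreInterceptor getNextInterceptor() {
    return next;
  }
}

// Each link can add caching, auditing, or retries before delegating onward;
// the final link in the chain would talk to the actual FederationStateStore.
class PassThroughStateStoreInterceptor extends AbstractStateStoreInterceptor {
  @Override
  public String getPolicyConfiguration(String queue) throws Exception {
    return getNextInterceptor().getPolicyConfiguration(queue);
  }
}
{code}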



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10080) Support show app id on localizer thread pool

2020-01-08 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011518#comment-17011518
 ] 

Abhishek Modi commented on YARN-10080:
--

Thanks [~cane] for working on this. The changes look good to me. I will wait 
for the Jenkins result.

> Support show app id on localizer thread pool
> 
>
> Key: YARN-10080
> URL: https://issues.apache.org/jira/browse/YARN-10080
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-10080-001.patch
>
>
> Currently, when we are troubleshooting a container localizer issue and want 
> to analyze a jstack with thread details, we cannot figure out which thread 
> is processing a given container. So I want to add the app id to the thread name.
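
A small sketch of the proposed naming, using Guava's ThreadFactoryBuilder as Hadoop does elsewhere; the exact name format is an assumption, not the patch's.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import com.google.common.util.concurrent.ThreadFactoryBuilder;

// Sketch: embed the app id in localizer thread names so a jstack dump shows
// which application each thread is localizing for. The format is assumed.
public final class LocalizerThreadPool {
  public static ExecutorService create(String appId, int nThreads) {
    return Executors.newFixedThreadPool(nThreads,
        new ThreadFactoryBuilder()
            .setNameFormat("ContainerLocalizer-" + appId + "-%d")
            .build());
  }
}
{code}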



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5542) Scheduling of opportunistic containers

2020-01-08 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011357#comment-17011357
 ] 

Abhishek Modi commented on YARN-5542:
-

[~brahmareddy] [~kkaranasos] - I have moved the remaining open JIRAs to 
YARN-10079. Should we close this JIRA now, as all the sub-tasks under it are 
completed?

> Scheduling of opportunistic containers
> --
>
> Key: YARN-5542
> URL: https://issues.apache.org/jira/browse/YARN-5542
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Konstantinos Karanasos
>Priority: Major
>
> This JIRA groups all efforts related to the scheduling of opportunistic 
> containers. 
> It includes the scheduling of opportunistic containers through the central RM 
> (YARN-5220), through distributed scheduling (YARN-2877), as well as the 
> scheduling of containers based on actual node utilization (YARN-1011) and the 
> container promotion/demotion (YARN-5085).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2886) Estimating waiting time in NM container queues

2020-01-08 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-2886:

Parent Issue: YARN-10079  (was: YARN-5542)

> Estimating waiting time in NM container queues
> --
>
> Key: YARN-2886
> URL: https://issues.apache.org/jira/browse/YARN-2886
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
>Priority: Major
>
> This JIRA is about estimating the waiting time of each NM queue.
> Having these estimates is crucial for the distributed scheduling of container 
> requests, as it allows the LocalRM to decide in which NMs to queue the 
> queueable container requests.
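
A back-of-the-envelope sketch of one way such an estimate could be computed (queue length times mean container duration, divided by the node's container slots); this is an illustrative assumption, not the estimator from this JIRA.

{code:java}
// Illustrative estimator only: wait ~ backlog * mean duration / parallelism.
public final class QueueWaitEstimator {
  public static long estimateWaitMs(int queuedContainers,
      long meanContainerDurationMs, int containerSlots) {
    if (containerSlots <= 0) {
      return Long.MAX_VALUE; // node cannot run containers at all
    }
    // Queued containers drain containerSlots at a time.
    return (queuedContainers * meanContainerDurationMs) / containerSlots;
  }
}
{code}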



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7604) Fix some minor typos in the opportunistic container logging

2020-01-08 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-7604:

Parent Issue: YARN-10079  (was: YARN-5542)

> Fix some minor typos in the opportunistic container logging
> ---
>
> Key: YARN-7604
> URL: https://issues.apache.org/jira/browse/YARN-7604
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.9.0
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Trivial
> Attachments: YARN-7604.01.patch
>
>
> Fix some minor text issues. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5414) Integrate NodeQueueLoadMonitor with ClusterNodeTracker

2020-01-08 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-5414:

Parent Issue: YARN-10079  (was: YARN-5542)

> Integrate NodeQueueLoadMonitor with ClusterNodeTracker
> --
>
> Key: YARN-5414
> URL: https://issues.apache.org/jira/browse/YARN-5414
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: container-queuing, distributed-scheduling, scheduler
>Reporter: Arun Suresh
>Assignee: Abhishek Modi
>Priority: Major
>
> The {{ClusterNodeTracker}} tracks the state of cluster nodes and provides 
> convenience methods like sort and filter.
> The {{NodeQueueLoadMonitor}} should use the {{ClusterNodeTracker}} instead of 
> maintaining its own data structure of node information.
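
To make the convenience concrete, a sketch of the kind of sort the tracker can centralize; NodeInfo is a stand-in type for illustration, not the ClusterNodeTracker API.

{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// NodeInfo is a stand-in type for illustration, not the actual tracker API.
class NodeInfo {
  String nodeId;
  int queuedOpportunisticContainers;
}

class NodeSorter {
  // The kind of "sort" convenience the tracker can provide centrally,
  // instead of each monitor keeping its own sorted structure.
  static List<NodeInfo> leastQueuedFirst(List<NodeInfo> nodes) {
    return nodes.stream()
        .sorted(Comparator.comparingInt(n -> n.queuedOpportunisticContainers))
        .collect(Collectors.toList());
  }
}
{code}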



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5688) Make allocation of opportunistic containers asynchronous

2020-01-08 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-5688:

Parent Issue: YARN-10079  (was: YARN-5542)

> Make allocation of opportunistic containers asynchronous
> 
>
> Key: YARN-5688
> URL: https://issues.apache.org/jira/browse/YARN-5688
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Abhishek Modi
>Priority: Major
>
> In the current implementation of the 
> {{OpportunisticContainerAllocatorAMService}}, we synchronously perform the 
> allocation of opportunistic containers. This results in "blocking" the 
> service at the RM when scheduling the opportunistic containers.
> The {{OpportunisticContainerAllocator}} should instead run asynchronously on 
> a separate thread.
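
A minimal sketch of the shape of that change: hand allocation work to a dedicated thread so the AM-facing service returns immediately. Names are illustrative assumptions.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative shape of the change: the service thread submits allocation
// work and returns; the single worker thread performs the actual allocation.
class AsyncOpportunisticAllocator {
  private final ExecutorService allocatorThread =
      Executors.newSingleThreadExecutor();

  void allocateAsync(Runnable allocationWork) {
    allocatorThread.submit(allocationWork);
  }

  void stop() {
    allocatorThread.shutdown();
  }
}
{code}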



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9941) Opportunistic scheduler metrics should be reset during fail-over.

2020-01-08 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9941:

Parent Issue: YARN-10079  (was: YARN-5542)

> Opportunistic scheduler metrics should be reset during fail-over.
> -
>
> Key: YARN-9941
> URL: https://issues.apache.org/jira/browse/YARN-9941
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9390) Add support for configurable Resource Calculator in Opportunistic Scheduler.

2020-01-08 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9390:

Parent Issue: YARN-10079  (was: YARN-5542)

> Add support for configurable Resource Calculator in Opportunistic Scheduler.
> 
>
> Key: YARN-9390
> URL: https://issues.apache.org/jira/browse/YARN-9390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9390.001.patch
>
>
> Right now, the Opportunistic scheduler uses a hard-coded DominantResourceCalculator 
> and there is no option to change it to another resource calculator. This JIRA 
> is to make the resource calculator configurable for the Opportunistic scheduler.
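
A sketch of the usual way a configurable calculator is wired up in Hadoop, via Configuration.getClass and ReflectionUtils; the property key here is a hypothetical placeholder, not the one added by the patch.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;

// The property key below is a hypothetical placeholder, not the patch's key.
public final class OpportunisticCalculatorFactory {
  public static ResourceCalculator create(Configuration conf) {
    Class<? extends ResourceCalculator> clazz = conf.getClass(
        "yarn.opportunistic.resource-calculator.class",
        DominantResourceCalculator.class, ResourceCalculator.class);
    return ReflectionUtils.newInstance(clazz, conf);
  }
}
{code}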



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10079) Scheduling of opportunistic containers - Phase 2

2020-01-08 Thread Abhishek Modi (Jira)
Abhishek Modi created YARN-10079:


 Summary: Scheduling of opportunistic containers - Phase 2
 Key: YARN-10079
 URL: https://issues.apache.org/jira/browse/YARN-10079
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Abhishek Modi


This JIRA groups all efforts related to improving the scheduling of 
opportunistic containers.

Phase 1 of this was done as part of YARN-5542.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5542) Scheduling of opportunistic containers

2020-01-08 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011350#comment-17011350
 ] 

Abhishek Modi commented on YARN-5542:
-

[~kkaranasos] [~brahmareddy] I will move the remaining open JIRAs to a new JIRA, 
and then we should be good to close this. We have completed the current set of 
improvements.

> Scheduling of opportunistic containers
> --
>
> Key: YARN-5542
> URL: https://issues.apache.org/jira/browse/YARN-5542
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Konstantinos Karanasos
>Priority: Major
>
> This JIRA groups all efforts related to the scheduling of opportunistic 
> containers. 
> It includes the scheduling of opportunistic containers through the central RM 
> (YARN-5220), through distributed scheduling (YARN-2877), as well as the 
> scheduling of containers based on actual node utilization (YARN-1011) and the 
> container promotion/demotion (YARN-5085).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10040) DistributedShell test failure on X86 and ARM

2019-12-31 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006070#comment-17006070
 ] 

Abhishek Modi commented on YARN-10040:
--

I looked into it, and Opportunistic Scheduling is working fine. I checked by 
running distributed shell jobs locally with opportunistic containers, and they 
complete successfully. The issue is that 
"yarn.nodemanager.opportunistic-containers-max-queue-length" is not being set 
while bringing up the NodeManagers in the test, and thus no NodeManager accepts 
Opportunistic containers.

Prior to YARN-9697, an NM would accept Opportunistic containers even if the 
maximum number of opportunistic containers allowed on it was set to 0. That is 
why these tests were passing before. I will provide a patch fixing these tests.
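
A sketch of the kind of fix described, using the property named above; the value 10 is an arbitrary example.

{code:java}
import org.apache.hadoop.conf.Configuration;

public final class TestClusterSetup {
  // Give the test NodeManagers a non-zero opportunistic queue length so they
  // accept Opportunistic containers; 10 is an arbitrary example value.
  static Configuration withOpportunisticQueue(Configuration conf) {
    conf.setInt(
        "yarn.nodemanager.opportunistic-containers-max-queue-length", 10);
    return conf;
  }
}
{code}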

 

> DistributedShell test failure on X86 and ARM
> 
>
> Key: YARN-10040
> URL: https://issues.apache.org/jira/browse/YARN-10040
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell
> Environment: X86/ARM
> OS: ubuntu1804
> Java 8
>Reporter: zhao bo
>Assignee: Abhishek Modi
>Priority: Major
>
> * 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers
>  * 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithEnforceExecutionType
> Please see the Apache Jenkins Test result:
> [https://builds.apache.org/job/hadoop-multibranch/job/PR-1767/1/testReport/]
>  
> These 2 tests fail on both the X86 and ARM platforms.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10040) DistributedShell test failure on X86 and ARM

2019-12-31 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi reassigned YARN-10040:


Assignee: Abhishek Modi

> DistributedShell test failure on X86 and ARM
> 
>
> Key: YARN-10040
> URL: https://issues.apache.org/jira/browse/YARN-10040
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell
> Environment: X86/ARM
> OS: ubuntu1804
> Java 8
>Reporter: zhao bo
>Assignee: Abhishek Modi
>Priority: Major
>
> * 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers
>  * 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithEnforceExecutionType
> Please see the Apache Jenkins Test result:
> [https://builds.apache.org/job/hadoop-multibranch/job/PR-1767/1/testReport/]
>  
> These 2 tests fail on both the X86 and ARM platforms.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10040) DistributedShell test failure on X86 and ARM

2019-12-30 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005953#comment-17005953
 ] 

Abhishek Modi commented on YARN-10040:
--

Thanks [~ayushtkn] [~bzhaoopenstack]. I will take a look.

> DistributedShell test failure on X86 and ARM
> 
>
> Key: YARN-10040
> URL: https://issues.apache.org/jira/browse/YARN-10040
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell
> Environment: X86/ARM
> OS: ubuntu1804
> Java 8
>Reporter: zhao bo
>Priority: Major
>
> * 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers
>  * 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithEnforceExecutionType
> Please see the Apache Jenkins Test result:
> [https://builds.apache.org/job/hadoop-multibranch/job/PR-1767/1/testReport/]
>  
> These 2 tests fail on both the X86 and ARM platforms.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9990) Testcase fails with "Insufficient configured threads: required=16 < max=10"

2019-11-28 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984753#comment-16984753
 ] 

Abhishek Modi commented on YARN-9990:
-

Committed to trunk. Thanks [~prabhujoseph] for the patch.
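
For context, the quoted failure below comes from Jetty's ThreadPoolBudget check, which fires when the server thread pool's maximum is below what its connectors need. A minimal sketch of sizing the pool explicitly, under the assumption that the test builds its own embedded server; 16 matches the "required=16" in the error.

{code:java}
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.util.thread.QueuedThreadPool;

// Sketch: give the embedded Jetty server enough threads to satisfy its
// ThreadPoolBudget; 16 matches the "required=16" in the quoted error.
public final class ProxyTestServer {
  public static Server create() {
    QueuedThreadPool pool = new QueuedThreadPool(16);
    return new Server(pool);
  }
}
{code}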

> Testcase fails with "Insufficient configured threads: required=16 < max=10"
> ---
>
> Key: YARN-9990
> URL: https://issues.apache.org/jira/browse/YARN-9990
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9990-001.patch
>
>
> Testcases fail with "Insufficient configured threads: required=16 < max=10". 
> The following testcases are failing:
> 1. TestWebAppProxyServlet
> 2. TestAmFilter
> 3. TestApiServiceClient
> 4. TestSecureApiServiceClient
> {code}
> [ERROR] org.apache.hadoop.yarn.server.webproxy.TestWebAppProxyServlet  Time 
> elapsed: 0.396 s  <<< ERROR!
> java.lang.IllegalStateException: Insufficient configured threads: required=16 
> < max=10 for 
> QueuedThreadPool[qtp1597249648]@5f341870{STARTED,8<=8<=10,i=8,r=1,q=0}[ReservedThreadExecutor@4c762604{s=0/1,p=0}]
>   at 
> org.eclipse.jetty.util.thread.ThreadPoolBudget.check(ThreadPoolBudget.java:156)
>   at 
> org.eclipse.jetty.util.thread.ThreadPoolBudget.leaseTo(ThreadPoolBudget.java:130)
>   at 
> org.eclipse.jetty.util.thread.ThreadPoolBudget.leaseFrom(ThreadPoolBudget.java:182)
>   at 
> org.eclipse.jetty.io.SelectorManager.doStart(SelectorManager.java:255)
>   at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
>   at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169)
>   at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:110)
>   at 
> org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:283)
>   at 
> org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:81)
>   at 
> org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:231)
>   at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
>   at org.eclipse.jetty.server.Server.doStart(Server.java:385)
>   at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
>   at 
> org.apache.hadoop.yarn.server.webproxy.TestWebAppProxyServlet.start(TestWebAppProxyServlet.java:102)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> [INFO] Running org.apache.hadoop.yarn.server.webproxy.amfilter.TestAmFilter
> [ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.326 
> s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.webproxy.amfilter.TestAmFilter
> [ERROR] 
> testFindRedirectUrl(org.apache.hadoop.yarn.server.webproxy.amfilter.TestAmFilter)
>   Time elapsed: 0.306 s  <<< ERROR!
> java.lang.IllegalStateException: Insufficient configured threads: required=16 
> < max=10 for 
> QueuedThreadPool[qtp485041780]@1ce92674{STARTED,8<=8<=10,i=8,r=1,q=0}[ReservedThreadExecutor@31f924f5{s=0/1,p=0}]
>   at 
> 

[jira] [Commented] (YARN-9990) Testcase fails with "Insufficient configured threads: required=16 < max=10"

2019-11-27 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16983526#comment-16983526
 ] 

Abhishek Modi commented on YARN-9990:
-

Thanks [~prabhujoseph] for the patch. LGTM. Will commit it shortly.

> Testcase fails with "Insufficient configured threads: required=16 < max=10"
> ---
>
> Key: YARN-9990
> URL: https://issues.apache.org/jira/browse/YARN-9990
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9990-001.patch
>
>
> Testcases fail with "Insufficient configured threads: required=16 < max=10". 
> The following testcases are failing:
> 1. TestWebAppProxyServlet
> 2. TestAmFilter
> 3. TestApiServiceClient
> 4. TestSecureApiServiceClient
> {code}
> [ERROR] org.apache.hadoop.yarn.server.webproxy.TestWebAppProxyServlet  Time 
> elapsed: 0.396 s  <<< ERROR!
> java.lang.IllegalStateException: Insufficient configured threads: required=16 
> < max=10 for 
> QueuedThreadPool[qtp1597249648]@5f341870{STARTED,8<=8<=10,i=8,r=1,q=0}[ReservedThreadExecutor@4c762604{s=0/1,p=0}]
>   at 
> org.eclipse.jetty.util.thread.ThreadPoolBudget.check(ThreadPoolBudget.java:156)
>   at 
> org.eclipse.jetty.util.thread.ThreadPoolBudget.leaseTo(ThreadPoolBudget.java:130)
>   at 
> org.eclipse.jetty.util.thread.ThreadPoolBudget.leaseFrom(ThreadPoolBudget.java:182)
>   at 
> org.eclipse.jetty.io.SelectorManager.doStart(SelectorManager.java:255)
>   at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
>   at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169)
>   at 
> org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:110)
>   at 
> org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:283)
>   at 
> org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:81)
>   at 
> org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:231)
>   at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
>   at org.eclipse.jetty.server.Server.doStart(Server.java:385)
>   at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
>   at 
> org.apache.hadoop.yarn.server.webproxy.TestWebAppProxyServlet.start(TestWebAppProxyServlet.java:102)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> [INFO] Running org.apache.hadoop.yarn.server.webproxy.amfilter.TestAmFilter
> [ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.326 
> s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.webproxy.amfilter.TestAmFilter
> [ERROR] 
> testFindRedirectUrl(org.apache.hadoop.yarn.server.webproxy.amfilter.TestAmFilter)
>   Time elapsed: 0.306 s  <<< ERROR!
> java.lang.IllegalStateException: Insufficient configured threads: required=16 
> < max=10 for 
> QueuedThreadPool[qtp485041780]@1ce92674{STARTED,8<=8<=10,i=8,r=1,q=0}[ReservedThreadExecutor@31f924f5{s=0/1,p=0}]
>   at 
> 

[jira] [Commented] (YARN-9965) Fix NodeManager failing to start on subsequent times when Hdfs Auxillary Jar is set

2019-11-18 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16977073#comment-16977073
 ] 

Abhishek Modi commented on YARN-9965:
-

Thanks [~prabhujoseph]. The latest addendum patch looks good to me. Committed 
it to trunk.

> Fix NodeManager failing to start on subsequent times when Hdfs Auxillary Jar 
> is set
> ---
>
> Key: YARN-9965
> URL: https://issues.apache.org/jira/browse/YARN-9965
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: auxservices, nodemanager
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9965-001.patch, YARN-9965-addendum-01.patch
>
>
> Loading an auxiliary jar from an HDFS location on a NodeManager works as 
> expected the first time. A subsequent restart fails with a 
> ClassNotFoundException:
> {code:java}
> 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: 
> classpath: []
> 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: 
> system classes: [java., javax.accessibility., javax.activation., 
> javax.activity., javax.annotation., javax.annotation.processing., 
> javax.crypto., javax.imageio., javax.jws., javax.lang.model., 
> -javax.management.j2ee., javax.management., javax.naming., javax.net., 
> javax.print., javax.rmi., javax.script., -javax.security.auth.message., 
> javax.security.auth., javax.security.cert., javax.security.sasl., 
> javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., 
> -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., 
> org.xml.sax., org.apache.commons.logging., org.apache.log4j., 
> -org.apache.hadoop.hbase., org.apache.hadoop., core-default.xml, 
> hdfs-default.xml, mapred-default.xml, yarn-default.xml]
> 2019-11-08 03:59:49,257 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed 
> in state INITED
> java.lang.ClassNotFoundException: org.apache.auxtest.AuxServiceFromHDFS
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at 
> org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189)
>   at 
> org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:270)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:321)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:478)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:936)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1016)
> {code}
>  
> The issue happens when reusing the previously localized auxiliary service jar: 
> the localized jar path is appended with /* on reuse, which causes the 
> failure.
>  
>  
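
For reference, a sketch of the configuration that exercises this path; the keys follow the documented aux-service pattern, the service name "auxtest" is taken from the quoted stack trace, and the HDFS jar path is an arbitrary example.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch of the setup that exercises this path. The service name "auxtest"
// comes from the quoted stack trace; the HDFS jar path is an example.
public final class AuxServiceFromHdfsSetup {
  static Configuration configure(Configuration conf) {
    conf.set("yarn.nodemanager.aux-services", "auxtest");
    conf.set("yarn.nodemanager.aux-services.auxtest.class",
        "org.apache.auxtest.AuxServiceFromHDFS");
    // remote-classpath points at the jar in HDFS that the NM localizes.
    conf.set("yarn.nodemanager.aux-services.auxtest.remote-classpath",
        "hdfs:///tmp/aux-service-hdfs.jar");
    return conf;
  }
}
{code}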



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9965) Fix NodeManager failing to start on subsequent times when Hdfs Auxillary Jar is set

2019-11-14 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974033#comment-16974033
 ] 

Abhishek Modi commented on YARN-9965:
-

Thanks [~prabhujoseph] - I will review it today.

> Fix NodeManager failing to start on subsequent times when Hdfs Auxillary Jar 
> is set
> ---
>
> Key: YARN-9965
> URL: https://issues.apache.org/jira/browse/YARN-9965
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: auxservices, nodemanager
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9965-001.patch, YARN-9965-addendum-01.patch
>
>
> Loading an auxiliary jar from an HDFS location on a NodeManager works as 
> expected the first time. A subsequent restart fails with a 
> ClassNotFoundException:
> {code:java}
> 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: 
> classpath: []
> 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: 
> system classes: [java., javax.accessibility., javax.activation., 
> javax.activity., javax.annotation., javax.annotation.processing., 
> javax.crypto., javax.imageio., javax.jws., javax.lang.model., 
> -javax.management.j2ee., javax.management., javax.naming., javax.net., 
> javax.print., javax.rmi., javax.script., -javax.security.auth.message., 
> javax.security.auth., javax.security.cert., javax.security.sasl., 
> javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., 
> -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., 
> org.xml.sax., org.apache.commons.logging., org.apache.log4j., 
> -org.apache.hadoop.hbase., org.apache.hadoop., core-default.xml, 
> hdfs-default.xml, mapred-default.xml, yarn-default.xml]
> 2019-11-08 03:59:49,257 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed 
> in state INITED
> java.lang.ClassNotFoundException: org.apache.auxtest.AuxServiceFromHDFS
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at 
> org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189)
>   at 
> org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:270)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:321)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:478)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:936)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1016)
> {code}
>  
> The issue happens when reusing the previously localized auxiliary service jar: 
> the localized jar path is appended with /* on reuse, which causes the 
> failure.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-11-12 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972296#comment-16972296
 ] 

Abhishek Modi commented on YARN-9697:
-

Thanks [~bibinchundatt] and [~elgoiri] for review. Committed to trunk.

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9697.001.patch, YARN-9697.002.patch, 
> YARN-9697.003.patch, YARN-9697.004.patch, YARN-9697.005.patch, 
> YARN-9697.006.patch, YARN-9697.007.patch, YARN-9697.008.patch, 
> YARN-9697.009.patch, YARN-9697.ut.patch, YARN-9697.ut2.patch, 
> YARN-9697.wip1.patch, YARN-9697.wip2.patch
>
>
> In the current implementation, opportunistic containers are allocated based 
> on the queued opportunistic container information received in the node 
> heartbeat. This information becomes stale as soon as more opportunistic 
> containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which 
> the AM asks for the containers. When multiple applications request 
> Opportunistic containers, containers might get allocated on the same set of 
> nodes, because containers already allocated on a node are not considered while 
> serving requests from different applications. This can lead to uneven 
> allocation of Opportunistic containers across the cluster, leading to 
> increased queuing time.
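
A sketch of the core idea: update per-node load at allocation time instead of waiting for the next heartbeat, and always pick the least-loaded node. The class and method names here are illustrative assumptions, not the patch.

{code:java}
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: names and structure are assumptions, not the patch.
class NodeLoadTracker {
  private final Map<String, Integer> queuedPerNode = new ConcurrentHashMap<>();

  // Heartbeat path: reset to the node's authoritative queue length.
  void onHeartbeat(String nodeId, int queuedContainers) {
    queuedPerNode.put(nodeId, queuedContainers);
  }

  // Allocation path: pick the least-loaded node and bump its count right
  // away, so a request from another application sees the updated load.
  String allocateOnLeastLoaded() {
    String best = queuedPerNode.entrySet().stream()
        .min(Comparator.comparingInt(e -> e.getValue()))
        .map(Map.Entry::getKey)
        .orElse(null);
    if (best != null) {
      queuedPerNode.merge(best, 1, Integer::sum);
    }
    return best;
  }
}
{code}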



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9965) Fix NodeManager failing to start when Hdfs Auxillary Jar is set

2019-11-11 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972086#comment-16972086
 ] 

Abhishek Modi commented on YARN-9965:
-

[~vinodkv] - I think that's a mistake on my end - I should have required a UT 
before committing it. Since this was a very minor fix and I had also tested it, 
I went ahead with the commit. Should I create a separate JIRA for writing a UT 
for this, or should we add an addendum patch with the UT here?

> Fix NodeManager failing to start when Hdfs Auxillary Jar is set
> ---
>
> Key: YARN-9965
> URL: https://issues.apache.org/jira/browse/YARN-9965
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: auxservices, nodemanager
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9965-001.patch
>
>
> Loading an auxiliary jar from an HDFS location on a NodeManager works as 
> expected the first time. A subsequent restart fails with a 
> ClassNotFoundException:
> {code:java}
> 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: 
> classpath: []
> 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: 
> system classes: [java., javax.accessibility., javax.activation., 
> javax.activity., javax.annotation., javax.annotation.processing., 
> javax.crypto., javax.imageio., javax.jws., javax.lang.model., 
> -javax.management.j2ee., javax.management., javax.naming., javax.net., 
> javax.print., javax.rmi., javax.script., -javax.security.auth.message., 
> javax.security.auth., javax.security.cert., javax.security.sasl., 
> javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., 
> -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., 
> org.xml.sax., org.apache.commons.logging., org.apache.log4j., 
> -org.apache.hadoop.hbase., org.apache.hadoop., core-default.xml, 
> hdfs-default.xml, mapred-default.xml, yarn-default.xml]
> 2019-11-08 03:59:49,257 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed 
> in state INITED
> java.lang.ClassNotFoundException: org.apache.auxtest.AuxServiceFromHDFS
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at 
> org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189)
>   at 
> org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:270)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:321)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:478)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:936)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1016)
> {code}
>  
> The issue happens when reusing the previously localized auxiliary service jar: 
> the localized jar path is appended with /* on reuse, which causes the 
> failure.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-11-11 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9697:

Attachment: YARN-9697.009.patch

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9697.001.patch, YARN-9697.002.patch, 
> YARN-9697.003.patch, YARN-9697.004.patch, YARN-9697.005.patch, 
> YARN-9697.006.patch, YARN-9697.007.patch, YARN-9697.008.patch, 
> YARN-9697.009.patch, YARN-9697.ut.patch, YARN-9697.ut2.patch, 
> YARN-9697.wip1.patch, YARN-9697.wip2.patch
>
>
> In the current implementation, opportunistic containers are allocated based 
> on the queued opportunistic container information received in the node 
> heartbeat. This information becomes stale as soon as more opportunistic 
> containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which 
> the AM asks for the containers. When multiple applications request 
> Opportunistic containers, containers might get allocated on the same set of 
> nodes, because containers already allocated on a node are not considered while 
> serving requests from different applications. This can lead to uneven 
> allocation of Opportunistic containers across the cluster, leading to 
> increased queuing time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-11-11 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971480#comment-16971480
 ] 

Abhishek Modi commented on YARN-9697:
-

Thanks [~bibinchundatt] for the review.
{code:java}
  private int numNodesForAnyAllocation =
  DEFAULT_OPP_CONTAINER_ALLOCATION_NODES_NUMBER_USED;
{code}
This is used in another constructor, which the test cases use.

Apart from that, I have addressed all the other comments in YARN-9697.009.patch.

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9697.001.patch, YARN-9697.002.patch, 
> YARN-9697.003.patch, YARN-9697.004.patch, YARN-9697.005.patch, 
> YARN-9697.006.patch, YARN-9697.007.patch, YARN-9697.008.patch, 
> YARN-9697.009.patch, YARN-9697.ut.patch, YARN-9697.ut2.patch, 
> YARN-9697.wip1.patch, YARN-9697.wip2.patch
>
>
> In the current implementation, opportunistic containers are allocated based 
> on the queued opportunistic container information received in the node 
> heartbeat. This information becomes stale as soon as more opportunistic 
> containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which 
> the AM asks for the containers. When multiple applications request 
> Opportunistic containers, containers might get allocated on the same set of 
> nodes, because containers already allocated on a node are not considered while 
> serving requests from different applications. This can lead to uneven 
> allocation of Opportunistic containers across the cluster, leading to 
> increased queuing time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9965) Fix NodeManager failing to start when Hdfs Auxillary Jar is set

2019-11-10 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971356#comment-16971356
 ] 

Abhishek Modi commented on YARN-9965:
-

Thanks [~prabhujoseph] for working on this. LGTM. I will commit it shortly.

> Fix NodeManager failing to start when Hdfs Auxillary Jar is set
> ---
>
> Key: YARN-9965
> URL: https://issues.apache.org/jira/browse/YARN-9965
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: auxservices, nodemanager
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9965-001.patch
>
>
> Loading an auxiliary jar from an HDFS location on a NodeManager works as 
> expected the first time. A subsequent restart fails with a 
> ClassNotFoundException:
> {code:java}
> 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: 
> classpath: []
> 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: 
> system classes: [java., javax.accessibility., javax.activation., 
> javax.activity., javax.annotation., javax.annotation.processing., 
> javax.crypto., javax.imageio., javax.jws., javax.lang.model., 
> -javax.management.j2ee., javax.management., javax.naming., javax.net., 
> javax.print., javax.rmi., javax.script., -javax.security.auth.message., 
> javax.security.auth., javax.security.cert., javax.security.sasl., 
> javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., 
> -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., 
> org.xml.sax., org.apache.commons.logging., org.apache.log4j., 
> -org.apache.hadoop.hbase., org.apache.hadoop., core-default.xml, 
> hdfs-default.xml, mapred-default.xml, yarn-default.xml]
> 2019-11-08 03:59:49,257 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed 
> in state INITED
> java.lang.ClassNotFoundException: org.apache.auxtest.AuxServiceFromHDFS
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at 
> org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189)
>   at 
> org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:270)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:321)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:478)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:936)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1016)
> {code}
>  
> The issue happens when reusing the previously localized auxiliary service jar: 
> the localized jar path is appended with /* on reuse, which causes the 
> failure.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-10-30 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962868#comment-16962868
 ] 

Abhishek Modi commented on YARN-9697:
-

Filed YARN-9941 to fix the Opportunistic scheduler metrics during fail-over.

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9697.001.patch, YARN-9697.002.patch, 
> YARN-9697.003.patch, YARN-9697.004.patch, YARN-9697.005.patch, 
> YARN-9697.006.patch, YARN-9697.007.patch, YARN-9697.008.patch, 
> YARN-9697.ut.patch, YARN-9697.ut2.patch, YARN-9697.wip1.patch, 
> YARN-9697.wip2.patch
>
>
> In the current implementation, opportunistic containers are allocated based 
> on the queued opportunistic container information received in the node 
> heartbeat. This information becomes stale as soon as more opportunistic 
> containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which 
> the AM asks for the containers. When multiple applications request 
> Opportunistic containers, containers might get allocated on the same set of 
> nodes, because containers already allocated on a node are not considered while 
> serving requests from different applications. This can lead to uneven 
> allocation of Opportunistic containers across the cluster, leading to 
> increased queuing time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9941) Opportunistic scheduler metrics should be reset during fail-over.

2019-10-30 Thread Abhishek Modi (Jira)
Abhishek Modi created YARN-9941:
---

 Summary: Opportunistic scheduler metrics should be reset during 
fail-over.
 Key: YARN-9941
 URL: https://issues.apache.org/jira/browse/YARN-9941
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Abhishek Modi
Assignee: Abhishek Modi






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2442) ResourceManager JMX UI does not give HA State

2019-10-29 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961868#comment-16961868
 ] 

Abhishek Modi commented on YARN-2442:
-

Thanks [~cyrusjackson25] for the patch and [~bibinchundatt] for additional 
review. Committed it to trunk.

> ResourceManager JMX UI does not give HA State
> -
>
> Key: YARN-2442
> URL: https://issues.apache.org/jira/browse/YARN-2442
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0, 2.6.0, 2.7.0
>Reporter: Nishan Shetty
>Assignee: Rohith Sharma K S
>Priority: Major
>  Labels: oct16-easy
> Attachments: 0001-YARN-2442.patch, YARN-2442.003.patch, 
> YARN-2442.004.patch, YARN-2442.02.patch
>
>
> ResourceManager JMX UI can show the haState (INITIALIZING, ACTIVE, STANDBY, 
> STOPPED).
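
A minimal sketch of exposing such a state via a standard platform MXBean; the bean name, attribute, and ObjectName below are illustrative assumptions, not the committed change.

{code:java}
import java.lang.management.ManagementFactory;
import javax.management.ObjectName;

// Illustrative only: all names below are assumptions, not the committed patch.
interface RMInfoMXBean {
  String getHAState(); // INITIALIZING, ACTIVE, STANDBY, or STOPPED
}

class RMInfo implements RMInfoMXBean {
  private volatile String haState = "INITIALIZING";

  @Override
  public String getHAState() {
    return haState;
  }

  void setHAState(String state) {
    haState = state;
  }

  void register() throws Exception {
    ManagementFactory.getPlatformMBeanServer().registerMBean(
        this, new ObjectName("Hadoop:service=ResourceManager,name=RMInfo"));
  }
}
{code}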



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-10-22 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957250#comment-16957250
 ] 

Abhishek Modi commented on YARN-9697:
-

Thanks [~bibinchundatt] for the review. I have addressed most of the review 
comments in the v8 patch.

Regarding:
{quote}OpportunisticSchedulerMetrics shouldn't we be having a destroy() method 
to reset the counters. During switch over i think we should reset the counters 
{quote}
I will file a separate JIRA.

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9697.001.patch, YARN-9697.002.patch, 
> YARN-9697.003.patch, YARN-9697.004.patch, YARN-9697.005.patch, 
> YARN-9697.006.patch, YARN-9697.007.patch, YARN-9697.008.patch, 
> YARN-9697.ut.patch, YARN-9697.ut2.patch, YARN-9697.wip1.patch, 
> YARN-9697.wip2.patch
>
>
> In the current implementation, opportunistic containers are allocated based
> on the queued opportunistic container information received in node
> heartbeats. This information becomes stale as soon as more opportunistic
> containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which
> the AM asks for the containers. When multiple applications request
> Opportunistic containers, containers might get allocated on the same set of
> nodes, because already allocated containers on a node are not considered while
> serving requests from different applications. This can lead to uneven
> allocation of Opportunistic containers across the cluster, leading to
> increased queuing time.
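
To make the staleness problem concrete, here is a hedged sketch of the kind of
bookkeeping a centralized allocator can keep: each allocation is charged
against the node's queue length immediately, instead of waiting for the next
heartbeat. All names are illustrative, not the actual YARN classes:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only. The key idea: merge heartbeat-reported queue
// lengths with allocations made since the last heartbeat, so requests from
// different applications in the same interval see up-to-date load.
public class NodeQueueLoadSketch {
  private final Map<String, Integer> queueLength = new ConcurrentHashMap<>();

  // A node heartbeat refreshes the baseline reported by the NM.
  public void onHeartbeat(String nodeId, int reportedQueueLength) {
    queueLength.put(nodeId, reportedQueueLength);
  }

  // Pick the least-loaded node and charge the allocation right away.
  public synchronized String allocateOnLeastLoadedNode() {
    String best = null;
    int bestLoad = Integer.MAX_VALUE;
    for (Map.Entry<String, Integer> e : queueLength.entrySet()) {
      if (e.getValue() < bestLoad) {
        best = e.getKey();
        bestLoad = e.getValue();
      }
    }
    if (best != null) {
      queueLength.merge(best, 1, Integer::sum);
    }
    return best; // null if no nodes have registered yet
  }
}
{code}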






[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-10-22 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9697:

Attachment: YARN-9697.008.patch

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9697.001.patch, YARN-9697.002.patch, 
> YARN-9697.003.patch, YARN-9697.004.patch, YARN-9697.005.patch, 
> YARN-9697.006.patch, YARN-9697.007.patch, YARN-9697.008.patch, 
> YARN-9697.ut.patch, YARN-9697.ut2.patch, YARN-9697.wip1.patch, 
> YARN-9697.wip2.patch
>
>
> In the current implementation, opportunistic containers are allocated based 
> on the number of queued opportunistic container information received in node 
> heartbeat. This information becomes stale as soon as more opportunistic 
> containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which 
> AM asks for the containers. When multiple applications request for 
> Opportunistic containers, containers might get allocated on the same set of 
> nodes as already allocated containers on the node are not considered while 
> serving requests from different applications. This can lead to uneven 
> allocation of Opportunistic containers across the cluster leading to 
> increased queuing time 






[jira] [Created] (YARN-9910) Make private localizer download resources in parallel

2019-10-17 Thread Abhishek Modi (Jira)
Abhishek Modi created YARN-9910:
---

 Summary: Make private localizer download resources in parallel
 Key: YARN-9910
 URL: https://issues.apache.org/jira/browse/YARN-9910
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Abhishek Modi
Assignee: Abhishek Modi


Currently the private localizer uses a single-threaded pool to do the
localization. As part of this jira, the private localizer will create a fixed
thread pool with a configurable number of threads for localization.
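
A hedged sketch of the idea; downloadResource(...) is a placeholder, not the
real NodeManager download call:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch: localize resources with a fixed-size pool instead
// of a single thread.
public class ParallelLocalizerSketch {
  public void localizeAll(List<String> resources, int nThreads)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(nThreads);
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (String resource : resources) {
        pending.add(pool.submit(() -> downloadResource(resource)));
      }
      for (Future<?> f : pending) {
        f.get(); // surface any download failure to the caller
      }
    } finally {
      pool.shutdown();
    }
  }

  private void downloadResource(String resource) {
    // placeholder for the actual HDFS-to-local copy
  }
}
{code}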






[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-10-16 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9697:

Attachment: YARN-9697.007.patch

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9697.001.patch, YARN-9697.002.patch, 
> YARN-9697.003.patch, YARN-9697.004.patch, YARN-9697.005.patch, 
> YARN-9697.006.patch, YARN-9697.007.patch, YARN-9697.ut.patch, 
> YARN-9697.ut2.patch, YARN-9697.wip1.patch, YARN-9697.wip2.patch
>
>
> In the current implementation, opportunistic containers are allocated based 
> on the number of queued opportunistic container information received in node 
> heartbeat. This information becomes stale as soon as more opportunistic 
> containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which 
> AM asks for the containers. When multiple applications request for 
> Opportunistic containers, containers might get allocated on the same set of 
> nodes as already allocated containers on the node are not considered while 
> serving requests from different applications. This can lead to uneven 
> allocation of Opportunistic containers across the cluster leading to 
> increased queuing time 






[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-10-16 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9697:

Attachment: YARN-9697.006.patch

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9697.001.patch, YARN-9697.002.patch, 
> YARN-9697.003.patch, YARN-9697.004.patch, YARN-9697.005.patch, 
> YARN-9697.006.patch, YARN-9697.ut.patch, YARN-9697.ut2.patch, 
> YARN-9697.wip1.patch, YARN-9697.wip2.patch
>
>
> In the current implementation, opportunistic containers are allocated based 
> on the number of queued opportunistic container information received in node 
> heartbeat. This information becomes stale as soon as more opportunistic 
> containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which 
> AM asks for the containers. When multiple applications request for 
> Opportunistic containers, containers might get allocated on the same set of 
> nodes as already allocated containers on the node are not considered while 
> serving requests from different applications. This can lead to uneven 
> allocation of Opportunistic containers across the cluster leading to 
> increased queuing time 






[jira] [Created] (YARN-9908) Make ZK calls in parallel to load application states.

2019-10-16 Thread Abhishek Modi (Jira)
Abhishek Modi created YARN-9908:
---

 Summary: Make ZK calls in parallel to load application states.
 Key: YARN-9908
 URL: https://issues.apache.org/jira/browse/YARN-9908
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Abhishek Modi
Assignee: Abhishek Modi


At present, all the application states are fetched sequentially from ZooKeeper.
This can be optimized by using a thread pool to fetch application states from
ZooKeeper in parallel.
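
A hedged sketch of the approach; readZnode(...) stands in for the actual
ZooKeeper read, and the real state-store wiring differs:

{code:java}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: fetch application state znodes with a thread pool
// instead of one at a time.
public class ParallelZKLoadSketch {
  public Map<String, byte[]> loadAppStates(List<String> appIds, int nThreads)
      throws InterruptedException {
    Map<String, byte[]> states = new ConcurrentHashMap<>();
    ExecutorService pool = Executors.newFixedThreadPool(nThreads);
    for (String appId : appIds) {
      pool.execute(() -> states.put(appId, readZnode(appId)));
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.MINUTES);
    return states;
  }

  private byte[] readZnode(String appId) {
    return new byte[0]; // placeholder for the actual getData call
  }
}
{code}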






[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-10-16 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9697:

Attachment: YARN-9697.005.patch

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9697.001.patch, YARN-9697.002.patch, 
> YARN-9697.003.patch, YARN-9697.004.patch, YARN-9697.005.patch, 
> YARN-9697.ut.patch, YARN-9697.ut2.patch, YARN-9697.wip1.patch, 
> YARN-9697.wip2.patch
>
>
> In the current implementation, opportunistic containers are allocated based 
> on the number of queued opportunistic container information received in node 
> heartbeat. This information becomes stale as soon as more opportunistic 
> containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which 
> AM asks for the containers. When multiple applications request for 
> Opportunistic containers, containers might get allocated on the same set of 
> nodes as already allocated containers on the node are not considered while 
> serving requests from different applications. This can lead to uneven 
> allocation of Opportunistic containers across the cluster leading to 
> increased queuing time 






[jira] [Commented] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-10-04 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944825#comment-16944825
 ] 

Abhishek Modi commented on YARN-9697:
-

Thanks [~elgoiri] for the review. I have addressed all the comments except the
cleanup of the loop in CentralizedOpportunisticContainerAllocator; I will look
into that further. Could you please review YARN-9697.004.patch? Thanks.

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9697.001.patch, YARN-9697.002.patch, 
> YARN-9697.003.patch, YARN-9697.004.patch, YARN-9697.ut.patch, 
> YARN-9697.ut2.patch, YARN-9697.wip1.patch, YARN-9697.wip2.patch
>
>
> In the current implementation, opportunistic containers are allocated based
> on the queued opportunistic container information received in node
> heartbeats. This information becomes stale as soon as more opportunistic
> containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which
> the AM asks for the containers. When multiple applications request
> Opportunistic containers, containers might get allocated on the same set of
> nodes, because already allocated containers on a node are not considered while
> serving requests from different applications. This can lead to uneven
> allocation of Opportunistic containers across the cluster, leading to
> increased queuing time.






[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-10-04 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9697:

Attachment: YARN-9697.004.patch

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9697.001.patch, YARN-9697.002.patch, 
> YARN-9697.003.patch, YARN-9697.004.patch, YARN-9697.ut.patch, 
> YARN-9697.ut2.patch, YARN-9697.wip1.patch, YARN-9697.wip2.patch
>
>
> In the current implementation, opportunistic containers are allocated based
> on the queued opportunistic container information received in node
> heartbeats. This information becomes stale as soon as more opportunistic
> containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which
> the AM asks for the containers. When multiple applications request
> Opportunistic containers, containers might get allocated on the same set of
> nodes, because already allocated containers on a node are not considered while
> serving requests from different applications. This can lead to uneven
> allocation of Opportunistic containers across the cluster, leading to
> increased queuing time.






[jira] [Commented] (YARN-9782) Avoid DNS resolution while running SLS.

2019-10-04 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944342#comment-16944342
 ] 

Abhishek Modi commented on YARN-9782:
-

Thanks [~elgoiri] for the review. Committed to trunk.

> Avoid DNS resolution while running SLS.
> ---
>
> Key: YARN-9782
> URL: https://issues.apache.org/jira/browse/YARN-9782
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9782.001.patch, YARN-9782.002.patch, 
> YARN-9782.003.patch, YARN-9782.004.patch, YARN-9782.005.patch
>
>
> In SLS, we add nodes with random names and racks. DNS resolution of these
> nodes takes around 2 seconds because it times out after that. This makes
> the results of SLS unreliable and adds spikes.






[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-10-04 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9697:

Attachment: YARN-9697.003.patch

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9697.001.patch, YARN-9697.002.patch, 
> YARN-9697.003.patch, YARN-9697.ut.patch, YARN-9697.ut2.patch, 
> YARN-9697.wip1.patch, YARN-9697.wip2.patch
>
>
> In the current implementation, opportunistic containers are allocated based
> on the queued opportunistic container information received in node
> heartbeats. This information becomes stale as soon as more opportunistic
> containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which
> the AM asks for the containers. When multiple applications request
> Opportunistic containers, containers might get allocated on the same set of
> nodes, because already allocated containers on a node are not considered while
> serving requests from different applications. This can lead to uneven
> allocation of Opportunistic containers across the cluster, leading to
> increased queuing time.






[jira] [Commented] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-10-03 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943809#comment-16943809
 ] 

Abhishek Modi commented on YARN-9697:
-

[~elgoiri] could you please review YARN-9697.002.patch whenever you get time?
Thanks.

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9697.001.patch, YARN-9697.002.patch, 
> YARN-9697.ut.patch, YARN-9697.ut2.patch, YARN-9697.wip1.patch, 
> YARN-9697.wip2.patch
>
>
> In the current implementation, opportunistic containers are allocated based
> on the queued opportunistic container information received in node
> heartbeats. This information becomes stale as soon as more opportunistic
> containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which
> the AM asks for the containers. When multiple applications request
> Opportunistic containers, containers might get allocated on the same set of
> nodes, because already allocated containers on a node are not considered while
> serving requests from different applications. This can lead to uneven
> allocation of Opportunistic containers across the cluster, leading to
> increased queuing time.






[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-10-03 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9697:

Attachment: YARN-9697.002.patch

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9697.001.patch, YARN-9697.002.patch, 
> YARN-9697.ut.patch, YARN-9697.ut2.patch, YARN-9697.wip1.patch, 
> YARN-9697.wip2.patch
>
>
> In the current implementation, opportunistic containers are allocated based
> on the queued opportunistic container information received in node
> heartbeats. This information becomes stale as soon as more opportunistic
> containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which
> the AM asks for the containers. When multiple applications request
> Opportunistic containers, containers might get allocated on the same set of
> nodes, because already allocated containers on a node are not considered while
> serving requests from different applications. This can lead to uneven
> allocation of Opportunistic containers across the cluster, leading to
> increased queuing time.






[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-10-03 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9697:

Attachment: YARN-9697.001.patch

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9697.001.patch, YARN-9697.ut.patch, 
> YARN-9697.ut2.patch, YARN-9697.wip1.patch, YARN-9697.wip2.patch
>
>
> In the current implementation, opportunistic containers are allocated based
> on the queued opportunistic container information received in node
> heartbeats. This information becomes stale as soon as more opportunistic
> containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which
> the AM asks for the containers. When multiple applications request
> Opportunistic containers, containers might get allocated on the same set of
> nodes, because already allocated containers on a node are not considered while
> serving requests from different applications. This can lead to uneven
> allocation of Opportunistic containers across the cluster, leading to
> increased queuing time.






[jira] [Commented] (YARN-9870) Remove unused function from OpportunisticContainerAllocatorAMService

2019-10-02 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942849#comment-16942849
 ] 

Abhishek Modi commented on YARN-9870:
-

Thanks [~elgoiri] for the review. Committed to trunk.

> Remove unused function from OpportunisticContainerAllocatorAMService
> 
>
> Key: YARN-9870
> URL: https://issues.apache.org/jira/browse/YARN-9870
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Minor
> Attachments: YARN-9870.001.patch, YARN-9870.002.patch
>
>
> Code cleanup of OpportunisticContainerAllocatorAMService and removal of
> unused functions.






[jira] [Updated] (YARN-9870) Remove unused function from OpportunisticContainerAllocatorAMService

2019-10-01 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9870:

Attachment: YARN-9870.002.patch

> Remove unused function from OpportunisticContainerAllocatorAMService
> 
>
> Key: YARN-9870
> URL: https://issues.apache.org/jira/browse/YARN-9870
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Minor
> Attachments: YARN-9870.001.patch, YARN-9870.002.patch
>
>
> Code cleanup of OpportunisticContainerAllocatorAMService and removal of
> unused functions.






[jira] [Updated] (YARN-9782) Avoid DNS resolution while running SLS.

2019-10-01 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9782:

Attachment: YARN-9782.005.patch

> Avoid DNS resolution while running SLS.
> ---
>
> Key: YARN-9782
> URL: https://issues.apache.org/jira/browse/YARN-9782
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9782.001.patch, YARN-9782.002.patch, 
> YARN-9782.003.patch, YARN-9782.004.patch, YARN-9782.005.patch
>
>
> In SLS, we add nodes with random names and racks. DNS resolution of these
> nodes takes around 2 seconds because it times out after that. This makes
> the results of SLS unreliable and adds spikes.






[jira] [Commented] (YARN-9870) Remove unused function from OpportunisticContainerAllocatorAMService

2019-10-01 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941776#comment-16941776
 ] 

Abhishek Modi commented on YARN-9870:
-

[~elgoiri] could you please review it? Thanks.

> Remove unused function from OpportunisticContainerAllocatorAMService
> 
>
> Key: YARN-9870
> URL: https://issues.apache.org/jira/browse/YARN-9870
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Minor
> Attachments: YARN-9870.001.patch
>
>
> Code cleanup of OpportunisticContainerAllocatorAMService and removal of
> unused functions.






[jira] [Created] (YARN-9870) Remove unused function from OpportunisticContainerAllocatorAMService

2019-09-30 Thread Abhishek Modi (Jira)
Abhishek Modi created YARN-9870:
---

 Summary: Remove unused function from 
OpportunisticContainerAllocatorAMService
 Key: YARN-9870
 URL: https://issues.apache.org/jira/browse/YARN-9870
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Abhishek Modi
Assignee: Abhishek Modi


Code cleanup of OpportunisticContainerAllocatorAMService and removal of unused
functions.






[jira] [Commented] (YARN-9859) Refactor OpportunisticContainerAllocator

2019-09-30 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941221#comment-16941221
 ] 

Abhishek Modi commented on YARN-9859:
-

Thanks [~elgoiri] for the review. Committed to trunk.

> Refactor OpportunisticContainerAllocator
> 
>
> Key: YARN-9859
> URL: https://issues.apache.org/jira/browse/YARN-9859
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9859.001.patch, YARN-9859.002.patch, 
> YARN-9859.003.patch
>
>
> Right now OpportunisticContainerAllocator is written mainly for Distributed
> Scheduling and schedules Opportunistic containers on a limited set of nodes.
> As part of this jira, we are going to make OpportunisticContainerAllocator an
> abstract class and DistributedOpportunisticContainerAllocator the actual
> implementation. This would be a prerequisite for YARN-9697.
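
For illustration, the shape of such a refactor might look like the following.
The class and method names are simplified placeholders, not the exact YARN
signatures:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: the base class owns the shared entry point and the
// subclass supplies the node-selection policy.
public abstract class OpportunisticAllocatorSketch {
  public final List<String> allocate(int numContainers) {
    if (numContainers <= 0) {
      throw new IllegalArgumentException("numContainers must be positive");
    }
    return selectNodes(numContainers);
  }

  // Policy hook implemented by concrete allocators.
  protected abstract List<String> selectNodes(int numContainers);
}

class DistributedAllocatorSketch extends OpportunisticAllocatorSketch {
  private final List<String> candidateNodes;

  DistributedAllocatorSketch(List<String> candidateNodes) {
    this.candidateNodes = candidateNodes;
  }

  @Override
  protected List<String> selectNodes(int numContainers) {
    // The distributed flavor only sees the limited node list it was handed,
    // so it simply cycles through it round-robin.
    List<String> picked = new ArrayList<>();
    for (int i = 0; i < numContainers; i++) {
      picked.add(candidateNodes.get(i % candidateNodes.size()));
    }
    return picked;
  }
}
{code}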






[jira] [Commented] (YARN-9782) Avoid DNS resolution while running SLS.

2019-09-30 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940982#comment-16940982
 ] 

Abhishek Modi commented on YARN-9782:
-

Thanks [~elgoiri] for the review. Attached the 004 patch with fixes.

I couldn't find "networkaddress.cache.ttl" defined as a constant string in any
of the libs, so I defined it as a constant in SLSRunner. Please review it
whenever you get some time. Thanks.
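
For context, "networkaddress.cache.ttl" is a standard java.security property.
A minimal sketch of the knob being discussed follows; whether the actual patch
sets exactly these two properties is an assumption:

{code:java}
import java.security.Security;

// Sketch of the JVM-level DNS cache knob: with a TTL of -1, successful
// lookups (and, with the negative key, failed ones) are cached
// indefinitely, so a synthetic SLS host name pays the resolution timeout
// at most once.
public class SlsDnsCacheSketch {
  public static void disableDnsExpiry() {
    // Must run before the first InetAddress lookup to take effect.
    Security.setProperty("networkaddress.cache.ttl", "-1");
    Security.setProperty("networkaddress.cache.negative.ttl", "-1");
  }
}
{code}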

> Avoid DNS resolution while running SLS.
> ---
>
> Key: YARN-9782
> URL: https://issues.apache.org/jira/browse/YARN-9782
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9782.001.patch, YARN-9782.002.patch, 
> YARN-9782.003.patch, YARN-9782.004.patch
>
>
> In SLS, we add nodes with random names and racks. DNS resolution of these
> nodes takes around 2 seconds because it times out after that. This makes
> the results of SLS unreliable and adds spikes.






[jira] [Updated] (YARN-9782) Avoid DNS resolution while running SLS.

2019-09-30 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9782:

Attachment: YARN-9782.004.patch

> Avoid DNS resolution while running SLS.
> ---
>
> Key: YARN-9782
> URL: https://issues.apache.org/jira/browse/YARN-9782
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9782.001.patch, YARN-9782.002.patch, 
> YARN-9782.003.patch, YARN-9782.004.patch
>
>
> In SLS, we add nodes with random names and racks. DNS resolution of these
> nodes takes around 2 seconds because it times out after that. This makes
> the results of SLS unreliable and adds spikes.






[jira] [Commented] (YARN-9859) Refactor OpportunisticContainerAllocator

2019-09-30 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940955#comment-16940955
 ] 

Abhishek Modi commented on YARN-9859:
-

Thanks [~elgoiri]. I have made all the suggested changes. Could you please
review it? Thanks.

> Refactor OpportunisticContainerAllocator
> 
>
> Key: YARN-9859
> URL: https://issues.apache.org/jira/browse/YARN-9859
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9859.001.patch, YARN-9859.002.patch, 
> YARN-9859.003.patch
>
>
> Right now OpportunisticContainerAllocator is written mainly for Distributed
> Scheduling and schedules Opportunistic containers on a limited set of nodes.
> As part of this jira, we are going to make OpportunisticContainerAllocator an
> abstract class and DistributedOpportunisticContainerAllocator the actual
> implementation. This would be a prerequisite for YARN-9697.






[jira] [Updated] (YARN-9859) Refactor OpportunisticContainerAllocator

2019-09-30 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9859:

Attachment: YARN-9859.003.patch

> Refactor OpportunisticContainerAllocator
> 
>
> Key: YARN-9859
> URL: https://issues.apache.org/jira/browse/YARN-9859
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9859.001.patch, YARN-9859.002.patch, 
> YARN-9859.003.patch
>
>
> Right now OpportunisticContainerAllocator is written mainly for Distributed
> Scheduling and schedules Opportunistic containers on a limited set of nodes.
> As part of this jira, we are going to make OpportunisticContainerAllocator an
> abstract class and DistributedOpportunisticContainerAllocator the actual
> implementation. This would be a prerequisite for YARN-9697.






[jira] [Updated] (YARN-9859) Refactor OpportunisticContainerAllocator

2019-09-26 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9859:

Attachment: YARN-9859.002.patch

> Refactor OpportunisticContainerAllocator
> 
>
> Key: YARN-9859
> URL: https://issues.apache.org/jira/browse/YARN-9859
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9859.001.patch, YARN-9859.002.patch
>
>
> Right now OpportunisticContainerAllocator is written mainly for Distributed
> Scheduling and schedules Opportunistic containers on a limited set of nodes.
> As part of this jira, we are going to make OpportunisticContainerAllocator an
> abstract class and DistributedOpportunisticContainerAllocator the actual
> implementation. This would be a prerequisite for YARN-9697.






[jira] [Commented] (YARN-9859) Refactor OpportunisticContainerAllocator

2019-09-26 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939113#comment-16939113
 ] 

Abhishek Modi commented on YARN-9859:
-

Thanks [~elgoiri] for the review.

Changed the title of the jira.
{quote}we should tune the indentation for 237 and adding extra indents to the
following lines of the constructor.
{quote}
I checked the indentation and it seems correct to me. Could you please
check it again and let me know if I am missing something.

Attached the v2 patch with the rest of the fixes.

> Refactor OpportunisticContainerAllocator
> 
>
> Key: YARN-9859
> URL: https://issues.apache.org/jira/browse/YARN-9859
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9859.001.patch, YARN-9859.002.patch
>
>
> Right now OpportunisticContainerAllocator is written mainly for Distributed
> Scheduling and schedules Opportunistic containers on a limited set of nodes.
> As part of this jira, we are going to make OpportunisticContainerAllocator an
> abstract class and DistributedOpportunisticContainerAllocator the actual
> implementation. This would be a prerequisite for YARN-9697.






[jira] [Updated] (YARN-9859) Refactor OpportunisticContainerAllocator

2019-09-26 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9859:

Summary: Refactor OpportunisticContainerAllocator  (was: Code cleanup of 
OpportunisticContainerAllocator)

> Refactor OpportunisticContainerAllocator
> 
>
> Key: YARN-9859
> URL: https://issues.apache.org/jira/browse/YARN-9859
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9859.001.patch
>
>
> Right now OpportunisticContainerAllocator is written mainly for Distributed
> Scheduling and schedules Opportunistic containers on a limited set of nodes.
> As part of this jira, we are going to make OpportunisticContainerAllocator an
> abstract class and DistributedOpportunisticContainerAllocator the actual
> implementation. This would be a prerequisite for YARN-9697.






[jira] [Commented] (YARN-9859) Code cleanup of OpportunisticContainerAllocator

2019-09-26 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938613#comment-16938613
 ] 

Abhishek Modi commented on YARN-9859:
-

[~elgoiri] could you please review this whenever you get some time? Thanks.

> Code cleanup of OpportunisticContainerAllocator
> ---
>
> Key: YARN-9859
> URL: https://issues.apache.org/jira/browse/YARN-9859
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9859.001.patch
>
>
> Right now OpportunisticContainerAllocator is written mainly for Distributed
> Scheduling and schedules Opportunistic containers on a limited set of nodes.
> As part of this jira, we are going to make OpportunisticContainerAllocator an
> abstract class and DistributedOpportunisticContainerAllocator the actual
> implementation. This would be a prerequisite for YARN-9697.






[jira] [Updated] (YARN-9859) Code cleanup of OpportunisticContainerAllocator

2019-09-26 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9859:

Attachment: YARN-9859.001.patch

> Code cleanup of OpportunisticContainerAllocator
> ---
>
> Key: YARN-9859
> URL: https://issues.apache.org/jira/browse/YARN-9859
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9859.001.patch
>
>
> Right now OpportunisticContainerAllocator is written mainly for Distributed
> Scheduling and schedules Opportunistic containers on a limited set of nodes.
> As part of this jira, we are going to make OpportunisticContainerAllocator an
> abstract class and DistributedOpportunisticContainerAllocator the actual
> implementation. This would be a prerequisite for YARN-9697.






[jira] [Created] (YARN-9859) Code cleanup of OpportunisticContainerAllocator

2019-09-26 Thread Abhishek Modi (Jira)
Abhishek Modi created YARN-9859:
---

 Summary: Code cleanup of OpportunisticContainerAllocator
 Key: YARN-9859
 URL: https://issues.apache.org/jira/browse/YARN-9859
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Abhishek Modi
Assignee: Abhishek Modi


Right now OpportunisticContainerAllocator is written mainly for Distributed
Scheduling and schedules Opportunistic containers on a limited set of nodes. As
part of this jira, we are going to make OpportunisticContainerAllocator an
abstract class and DistributedOpportunisticContainerAllocator the actual
implementation. This would be a prerequisite for YARN-9697.






[jira] [Created] (YARN-9853) Add number of paused containers in NodeInfo page.

2019-09-24 Thread Abhishek Modi (Jira)
Abhishek Modi created YARN-9853:
---

 Summary: Add number of paused containers in NodeInfo page.
 Key: YARN-9853
 URL: https://issues.apache.org/jira/browse/YARN-9853
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Abhishek Modi
Assignee: Abhishek Modi









[jira] [Commented] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-09-19 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933658#comment-16933658
 ] 

Abhishek Modi commented on YARN-9697:
-

[~elgoiri] could you please review the approach of the wip2 patch
(https://issues.apache.org/jira/secure/attachment/12980716/YARN-9697.wip2.patch)?
Thanks.

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9697.ut.patch, YARN-9697.ut2.patch, 
> YARN-9697.wip1.patch, YARN-9697.wip2.patch
>
>
> In the current implementation, opportunistic containers are allocated based
> on the queued opportunistic container information received in node
> heartbeats. This information becomes stale as soon as more opportunistic
> containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which
> the AM asks for the containers. When multiple applications request
> Opportunistic containers, containers might get allocated on the same set of
> nodes, because already allocated containers on a node are not considered while
> serving requests from different applications. This can lead to uneven
> allocation of Opportunistic containers across the cluster, leading to
> increased queuing time.






[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-09-19 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9697:

Attachment: YARN-9697.wip2.patch

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9697.ut.patch, YARN-9697.ut2.patch, 
> YARN-9697.wip1.patch, YARN-9697.wip2.patch
>
>
> In the current implementation, opportunistic containers are allocated based
> on the queued opportunistic container information received in node
> heartbeats. This information becomes stale as soon as more opportunistic
> containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which
> the AM asks for the containers. When multiple applications request
> Opportunistic containers, containers might get allocated on the same set of
> nodes, because already allocated containers on a node are not considered while
> serving requests from different applications. This can lead to uneven
> allocation of Opportunistic containers across the cluster, leading to
> increased queuing time.






[jira] [Commented] (YARN-9782) Avoid DNS resolution while running SLS.

2019-09-19 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933270#comment-16933270
 ] 

Abhishek Modi commented on YARN-9782:
-

Thanks [~elgoiri] for the review. Filed YARN-9843 to make
TestAMSimulator.testAMSimulator more resilient.

> Avoid DNS resolution while running SLS.
> ---
>
> Key: YARN-9782
> URL: https://issues.apache.org/jira/browse/YARN-9782
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9782.001.patch, YARN-9782.002.patch, 
> YARN-9782.003.patch
>
>
> In SLS, we add nodes with random names and racks. DNS resolution of these
> nodes takes around 2 seconds because it times out after that. This makes
> the results of SLS unreliable and adds spikes.






[jira] [Created] (YARN-9843) Test TestAMSimulator.testAMSimulator fails intermittently.

2019-09-19 Thread Abhishek Modi (Jira)
Abhishek Modi created YARN-9843:
---

 Summary: Test TestAMSimulator.testAMSimulator fails intermittently.
 Key: YARN-9843
 URL: https://issues.apache.org/jira/browse/YARN-9843
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Abhishek Modi
Assignee: Abhishek Modi


Stack trace for failure:

java.lang.AssertionError: java.io.IOException: Unable to delete directory 
/testptch/hadoop/hadoop-tools/hadoop-sls/target/test-dir/output4038286622450859971/metrics.
 at org.junit.Assert.fail(Assert.java:88)
 at 
org.apache.hadoop.yarn.sls.appmaster.TestAMSimulator.deleteMetricOutputDir(TestAMSimulator.java:141)
 at 
org.apache.hadoop.yarn.sls.appmaster.TestAMSimulator.tearDown(TestAMSimulator.java:298)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
 at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
 at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
 at org.junit.runners.Suite.runChild(Suite.java:128)
 at org.junit.runners.Suite.runChild(Suite.java:27)
 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
 at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
 at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
 at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
 at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
 at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
 at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
 at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
 at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
 at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
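
One way to make the cleanup resilient is to retry the delete with a short
back-off, since the metrics logger may still be flushing when tearDown runs.
A hedged sketch using commons-io follows; the retry count and sleep interval
are arbitrary assumptions:

{code:java}
import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;

// Illustrative sketch: retry the directory delete a few times before
// failing the test, instead of asserting on the first IOException.
public final class RetryingDeleteSketch {
  public static void deleteWithRetries(File dir, int attempts)
      throws IOException, InterruptedException {
    for (int i = 1; ; i++) {
      try {
        FileUtils.deleteDirectory(dir);
        return;
      } catch (IOException e) {
        if (i >= attempts) {
          throw e; // give up after the final attempt
        }
        Thread.sleep(200L); // likely still being written to; back off
      }
    }
  }
}
{code}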






[jira] [Created] (YARN-9842) Port YARN-9608 DecommissioningNodesWatcher should get lists of running applications on node from RMNode to branch-3.0/branch-2

2019-09-19 Thread Abhishek Modi (Jira)
Abhishek Modi created YARN-9842:
---

 Summary: Port YARN-9608 DecommissioningNodesWatcher should get 
lists of running applications on node from RMNode to branch-3.0/branch-2
 Key: YARN-9842
 URL: https://issues.apache.org/jira/browse/YARN-9842
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Abhishek Modi
Assignee: Abhishek Modi









[jira] [Commented] (YARN-9794) RM crashes due to runtime errors in TimelineServiceV2Publisher

2019-09-15 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929935#comment-16929935
 ] 

Abhishek Modi commented on YARN-9794:
-

Thanks [~tarunparimi]. The latest patch looks good to me. Thanks [~Prabhu Joseph]
for the additional review. Committed to trunk.

> RM crashes due to runtime errors in TimelineServiceV2Publisher
> --
>
> Key: YARN-9794
> URL: https://issues.apache.org/jira/browse/YARN-9794
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-9794.001.patch, YARN-9794.002.patch
>
>
> Saw that the RM crashes during startup due to errors while putting an entity
> in TimelineServiceV2Publisher.
> {code:java}
> 2019-08-28 09:35:45,273 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.RuntimeException: java.lang.IllegalArgumentException: 
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  CodedInputStream encountered an embedded string or message which claimed to 
> have negative size
> .
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:200)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:269)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:437)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:312)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:597)
> at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:834)
> at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:732)
> at 
> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:281)
> at 
> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:236)
> at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:321)
> at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:285)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.flush(TypedBufferedMutator.java:66)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.flush(HBaseTimelineWriterImpl.java:566)
> at 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector.flushBufferedTimelineEntities(TimelineCollector.java:173)
> at 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector.putEntities(TimelineCollector.java:150)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:459)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.access$100(TimelineServiceV2Publisher.java:73)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:494)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:483)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalArgumentException: 
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  CodedInputStream encountered an embedded string or message which claimed to 
> have negative size.
> at 
> org.apache.hbase.thirdparty.com.google.protobuf.CodedInputStream.newInstance(CodedInputStream.java:117)
> {code}
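
A common hardening for this class of failure is to catch runtime errors inside
the publisher's event handler so one bad entity cannot kill the dispatcher.
A hedged sketch of the pattern follows; putEntity(...) is a placeholder, and
whether the actual patch does exactly this is an assumption:

{code:java}
// Illustrative sketch only: swallow and log runtime failures from the
// timeline writer instead of letting them reach AsyncDispatcher, where an
// uncaught error is fatal to the RM.
public class SafeTimelinePublisherSketch {
  public void handle(Object event) {
    try {
      putEntity(event);
    } catch (RuntimeException e) {
      // Drop this entity and keep the dispatcher alive.
      System.err.println("Skipping timeline entity for " + event + ": " + e);
    }
  }

  private void putEntity(Object event) {
    // placeholder for TimelineCollector.putEntities(...)
  }
}
{code}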






[jira] [Updated] (YARN-9816) EntityGroupFSTimelineStore#scanActiveLogs fails when undesired files are present under /ats/active.

2019-09-12 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9816:

Summary: EntityGroupFSTimelineStore#scanActiveLogs fails when undesired 
files are present under /ats/active.  (was: 
EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError)

> EntityGroupFSTimelineStore#scanActiveLogs fails when undesired files are 
> present under /ats/active.
> ---
>
> Key: YARN-9816
> URL: https://issues.apache.org/jira/browse/YARN-9816
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 3.1.0, 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9816-001.patch
>
>
> EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError.  
> This happens when a file is present under /ats/active.
> {code}
> [hdfs@node2 yarn]$ hadoop fs -ls /ats/active
> Found 1 items
> -rw-r--r--   3 hdfs hadoop  0 2019-09-06 16:34 
> /ats/active/.distcp.tmp.attempt_155759136_39768_m_01_0
> {code}
> Error Message:
> {code:java}
> java.lang.StackOverflowError
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:632)
> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
> at com.sun.proxy.$Proxy15.getListing(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2143)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1076)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1088)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1059)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1038)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1034)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1046)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.list(EntityGroupFSTimelineStore.java:398)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:368)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> 

[jira] [Commented] (YARN-9816) EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError

2019-09-12 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928231#comment-16928231
 ] 

Abhishek Modi commented on YARN-9816:
-

Sure, committing it shortly. Thanks.
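
For context: the recursion only terminates when everything under the scanned 
path is a directory, so a plain file such as the distcp temp file re-enters 
scanActiveLogs forever. A minimal sketch of the kind of guard the fix needs 
(method names are taken from the stack trace; the wiring around them is an 
assumption):

{code:java}
// Sketch only: recurse into directories, never into plain files.
// RemoteIterator and FileStatus come from org.apache.hadoop.fs.
int scanActiveLogs(Path dirpath) throws IOException {
  int logsToScanCount = 0;
  RemoteIterator<FileStatus> iter = list(dirpath);
  while (iter.hasNext()) {
    FileStatus stat = iter.next();
    // Guard: a plain file (e.g. a distcp temp file) must not trigger
    // recursion, otherwise scanActiveLogs calls itself until the stack
    // overflows.
    if (stat.isDirectory()) {
      ApplicationId appId = parseApplicationId(stat.getPath().getName());
      if (appId != null) {
        // Looks like an application directory: queue it for log scanning.
        logsToScanCount++;
      } else {
        // Not an application directory: keep descending.
        logsToScanCount += scanActiveLogs(stat.getPath());
      }
    }
  }
  return logsToScanCount;
}
{code}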

> EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError
> ---
>
> Key: YARN-9816
> URL: https://issues.apache.org/jira/browse/YARN-9816
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 3.1.0, 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9816-001.patch
>
>
> EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError.  
> This happens when a file is present under /ats/active.
> {code}
> [hdfs@node2 yarn]$ hadoop fs -ls /ats/active
> Found 1 items
> -rw-r--r--   3 hdfs hadoop  0 2019-09-06 16:34 
> /ats/active/.distcp.tmp.attempt_155759136_39768_m_01_0
> {code}
> Error Message:
> {code:java}
> java.lang.StackOverflowError
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:632)
> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
> at com.sun.proxy.$Proxy15.getListing(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2143)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1076)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1088)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1059)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1038)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1034)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1046)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.list(EntityGroupFSTimelineStore.java:398)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:368)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
>  {code}
> One of our users has tried to distcp the hdfs://ats/active dir. The distcp 
> job has created the 
> temp file .distcp.tmp.attempt_155759136_39768_m_01_0 and 

[jira] [Created] (YARN-9828) Add log line for app submission in RouterWebServices.

2019-09-12 Thread Abhishek Modi (Jira)
Abhishek Modi created YARN-9828:
---

 Summary: Add log line for app submission in RouterWebServices.
 Key: YARN-9828
 URL: https://issues.apache.org/jira/browse/YARN-9828
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Abhishek Modi
Assignee: Abhishek Modi






--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.

2019-09-11 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928147#comment-16928147
 ] 

Abhishek Modi commented on YARN-9819:
-

Thanks [~elgoiri] for review. Committed to trunk.

> Make TestOpportunisticContainerAllocatorAMService more resilient.
> -
>
> Key: YARN-9819
> URL: https://issues.apache.org/jira/browse/YARN-9819
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9819.001.patch, YARN-9819.002.patch, 
> YARN-9819.003.patch
>
>
> Currently, TestOpportunisticContainerAllocatorAMService tries to set the 
> Opportunistic container status directly in RMNode, but that can be overwritten 
> by an NM heartbeat. The correct way is to send it through the NM heartbeat.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8972) [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size

2019-09-11 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927813#comment-16927813
 ] 

Abhishek Modi commented on YARN-8972:
-

[~giovanni.fumarola] are you still working on it? Thanks.

> [Router] Add support to prevent DoS attack over ApplicationSubmissionContext 
> size
> -
>
> Key: YARN-8972
> URL: https://issues.apache.org/jira/browse/YARN-8972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8972.v1.patch, YARN-8972.v2.patch, 
> YARN-8972.v3.patch, YARN-8972.v4.patch, YARN-8972.v5.patch
>
>
> This jira tracks the effort to add a new interceptor in the Router to prevent 
> users from submitting applications with an oversized ASC.
> This prevents the YARN cluster from failing over.
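
A rough sketch of what such a size check could look like in a Router 
interceptor (the method name and the way the limit is passed in are 
illustrative assumptions, not the patch itself):

{code:java}
// Illustrative only: reject an oversized ApplicationSubmissionContext (ASC)
// at the Router, before it can reach the RM state store. The PB-backed
// record exposes its serialized size directly.
private void checkAppSubmissionContextSize(ApplicationSubmissionContext asc,
    long maxSizeBytes) throws YarnException {
  long actualSize =
      ((ApplicationSubmissionContextPBImpl) asc).getProto().getSerializedSize();
  if (actualSize > maxSizeBytes) {
    throw new YarnException("ApplicationSubmissionContext is " + actualSize
        + " bytes, which exceeds the configured limit of " + maxSizeBytes
        + " bytes");
  }
}
{code}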



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9827) Fix Http Response code in GenericExceptionHandler.

2019-09-11 Thread Abhishek Modi (Jira)
Abhishek Modi created YARN-9827:
---

 Summary: Fix Http Response code in GenericExceptionHandler.
 Key: YARN-9827
 URL: https://issues.apache.org/jira/browse/YARN-9827
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Abhishek Modi
Assignee: Abhishek Modi


GenericExceptionHandler should respond with SERVICE_UNAVAILABLE in the case of 
connection and service-unavailable exceptions, instead of INTERNAL_SERVER_ERROR.
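
A minimal sketch of the intended mapping inside the JAX-RS exception mapper 
(which exception types should count as "unavailable" is an assumption here):

{code:java}
// Sketch only: map availability failures to 503 instead of the blanket 500.
if (e instanceof java.net.ConnectException
    || e instanceof javax.ws.rs.ServiceUnavailableException) {
  return javax.ws.rs.core.Response
      .status(javax.ws.rs.core.Response.Status.SERVICE_UNAVAILABLE)
      .entity(e.getMessage())
      .build();
}
// Everything else keeps the existing INTERNAL_SERVER_ERROR path.
return javax.ws.rs.core.Response
    .status(javax.ws.rs.core.Response.Status.INTERNAL_SERVER_ERROR)
    .entity(e.getMessage())
    .build();
{code}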



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9782) Avoid DNS resolution while running SLS.

2019-09-10 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926479#comment-16926479
 ] 

Abhishek Modi commented on YARN-9782:
-

The test failure is not related to this patch; it happens because we are not 
able to delete a directory at the end. [~elgoiri] could you please review the 
latest patch? Thanks.

> Avoid DNS resolution while running SLS.
> ---
>
> Key: YARN-9782
> URL: https://issues.apache.org/jira/browse/YARN-9782
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9782.001.patch, YARN-9782.002.patch, 
> YARN-9782.003.patch
>
>
> In SLS, we add nodes with random names and racks. DNS resolution of these 
> nodes takes around 2 seconds because it times out after that. This makes 
> the results of SLS unreliable and adds spikes.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9782) Avoid DNS resolution while running SLS.

2019-09-10 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9782:

Attachment: YARN-9782.003.patch

> Avoid DNS resolution while running SLS.
> ---
>
> Key: YARN-9782
> URL: https://issues.apache.org/jira/browse/YARN-9782
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9782.001.patch, YARN-9782.002.patch, 
> YARN-9782.003.patch
>
>
> In SLS, we add nodes with random names and racks. DNS resolution of these 
> nodes takes around 2 seconds because it times out after that. This makes 
> the results of SLS unreliable and adds spikes.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.

2019-09-09 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925922#comment-16925922
 ] 

Abhishek Modi commented on YARN-9819:
-

Thanks [~elgoiri] for review. 

Attached v3 patch with javadocs for all public functions. 

Private functions introduced in TestOpportunisticContainerAllocatorAMService 
are one-liners and quite self-explanatory. Please let me know if you think we 
need documentation there too.

> Make TestOpportunisticContainerAllocatorAMService more resilient.
> -
>
> Key: YARN-9819
> URL: https://issues.apache.org/jira/browse/YARN-9819
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9819.001.patch, YARN-9819.002.patch, 
> YARN-9819.003.patch
>
>
> Currently, TestOpportunisticContainerAllocatorAMService tries to set the 
> Opportunistic container status directly in RMNode, but that can be overwritten 
> by an NM heartbeat. The correct way is to send it through the NM heartbeat.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.

2019-09-09 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9819:

Attachment: YARN-9819.003.patch

> Make TestOpportunisticContainerAllocatorAMService more resilient.
> -
>
> Key: YARN-9819
> URL: https://issues.apache.org/jira/browse/YARN-9819
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9819.001.patch, YARN-9819.002.patch, 
> YARN-9819.003.patch
>
>
> Currently, TestOpportunisticContainerAllocatorAMService tries to set the 
> Opportunistic container status directly in RMNode, but that can be overwritten 
> by an NM heartbeat. The correct way is to send it through the NM heartbeat.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down

2019-09-09 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925490#comment-16925490
 ] 

Abhishek Modi commented on YARN-9821:
-

Sure [~rohithsharma]. I am leaving this Jira unresolved; you can mark it as 
resolved after you backport it to the 3.2 branches. Thanks.

> NM hangs at serviceStop when ATSV2 Backend Hbase is Down 
> -
>
> Key: YARN-9821
> URL: https://issues.apache.org/jira/browse/YARN-9821
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9821-001.patch, YARN-9821-002.patch
>
>
> NM hangs at serviceStop when ATSV2 Backend Hbase is Down.
> {code}
> "Thread-197" #302 prio=5 os_prio=0 tid=0x7f5f389ba000 nid=0x631d waiting 
> for monitor entry [0x7f5f1f29b000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:249)
>   - waiting to lock <0x0006c834d148> (a 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.close(TypedBufferedMutator.java:62)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.serviceStop(HBaseTimelineWriterImpl.java:636)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05808> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.AbstractService.close(AbstractService.java:247)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.serviceStop(TimelineCollectorManager.java:244)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.serviceStop(NodeTimelineCollectorManager.java:164)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05890> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.serviceStop(PerNodeTimelineCollectorsAuxService.java:113)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c058f8> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStop(AuxServices.java:330)
>   - locked <0x0006c7c23400> (a java.util.Collections$SynchronizedMap)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c059a8> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStop(ContainerManagerImpl.java:720)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05a98> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:526)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05c88> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager$1.run(NodeManager.java:552)
>   
>   
> "qtp183259297-76" #76 daemon prio=5 os_prio=0 tid=0x7f5f567ed000 
> nid=0x5fb7 in Object.wait() [0x7f5f23ad7000]
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:460)
>   at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
>   at 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService.pollForSpecificCompletedTask(ResultBoundedCompletionService.java:258)
>   - locked <0x000784ee8220> (a 
> [Lorg.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture;)
>   at 
> 

[jira] [Commented] (YARN-9816) EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError

2019-09-09 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925484#comment-16925484
 ] 

Abhishek Modi commented on YARN-9816:
-

Thanks [~Prabhu Joseph]. Changes look good to me. Will commit shortly.

> EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError
> ---
>
> Key: YARN-9816
> URL: https://issues.apache.org/jira/browse/YARN-9816
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 3.1.0, 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9816-001.patch
>
>
> EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError.  
> This happens when a file is present under /ats/active.
> {code}
> [hdfs@node2 yarn]$ hadoop fs -ls /ats/active
> Found 1 items
> -rw-r--r--   3 hdfs hadoop  0 2019-09-06 16:34 
> /ats/active/.distcp.tmp.attempt_155759136_39768_m_01_0
> {code}
> Error Message:
> {code:java}
> java.lang.StackOverflowError
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:632)
> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
> at com.sun.proxy.$Proxy15.getListing(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2143)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1076)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1088)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1059)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1038)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1034)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1046)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.list(EntityGroupFSTimelineStore.java:398)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:368)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
>  {code}
> One of our users has tried to distcp the hdfs://ats/active dir. The distcp 
> job has created the 
> temp file 

[jira] [Commented] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down

2019-09-09 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925481#comment-16925481
 ] 

Abhishek Modi commented on YARN-9821:
-

Thanks [~Prabhu Joseph] for the patch and [~rohithsharma] for additional 
review. I have committed it to trunk.

[~rohithsharma] should we commit it to the 3.2 and 3.1 branches also?

> NM hangs at serviceStop when ATSV2 Backend Hbase is Down 
> -
>
> Key: YARN-9821
> URL: https://issues.apache.org/jira/browse/YARN-9821
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9821-001.patch, YARN-9821-002.patch
>
>
> NM hangs at serviceStop when ATSV2 Backend Hbase is Down.
> {code}
> "Thread-197" #302 prio=5 os_prio=0 tid=0x7f5f389ba000 nid=0x631d waiting 
> for monitor entry [0x7f5f1f29b000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:249)
>   - waiting to lock <0x0006c834d148> (a 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.close(TypedBufferedMutator.java:62)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.serviceStop(HBaseTimelineWriterImpl.java:636)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05808> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.AbstractService.close(AbstractService.java:247)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.serviceStop(TimelineCollectorManager.java:244)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.serviceStop(NodeTimelineCollectorManager.java:164)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05890> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.serviceStop(PerNodeTimelineCollectorsAuxService.java:113)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c058f8> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStop(AuxServices.java:330)
>   - locked <0x0006c7c23400> (a java.util.Collections$SynchronizedMap)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c059a8> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStop(ContainerManagerImpl.java:720)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05a98> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:526)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05c88> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager$1.run(NodeManager.java:552)
>   
>   
> "qtp183259297-76" #76 daemon prio=5 os_prio=0 tid=0x7f5f567ed000 
> nid=0x5fb7 in Object.wait() [0x7f5f23ad7000]
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:460)
>   at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
>   at 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService.pollForSpecificCompletedTask(ResultBoundedCompletionService.java:258)
>   - locked <0x000784ee8220> (a 
> 

[jira] [Commented] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down

2019-09-08 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925331#comment-16925331
 ] 

Abhishek Modi commented on YARN-9821:
-

Thanks [~Prabhu Joseph] for the patch. Some minor comments:
 # Can we rename isHbaseUp => isStorageUp to make it more generic?
 # Can we log the exception too?

Apart from these minor comments, it looks good to me.
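
To make the two comments concrete, roughly (a sketch, not the patch itself; 
the table field and logger are assumed):

{code:java}
// Sketch: only attempt the blocking close when the backend is reachable,
// and log the failure instead of swallowing it.
if (isStorageUp()) {  // renamed from isHbaseUp() to stay backend-agnostic
  try {
    entityTable.close();  // close() blocks while the backend is down
  } catch (IOException e) {
    LOG.error("Failed to close the entity table during serviceStop", e);
  }
}
{code}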

> NM hangs at serviceStop when ATSV2 Backend Hbase is Down 
> -
>
> Key: YARN-9821
> URL: https://issues.apache.org/jira/browse/YARN-9821
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9821-001.patch
>
>
> NM hangs at serviceStop when ATSV2 Backend Hbase is Down.
> {code}
> "Thread-197" #302 prio=5 os_prio=0 tid=0x7f5f389ba000 nid=0x631d waiting 
> for monitor entry [0x7f5f1f29b000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:249)
>   - waiting to lock <0x0006c834d148> (a 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.close(TypedBufferedMutator.java:62)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.serviceStop(HBaseTimelineWriterImpl.java:636)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05808> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.AbstractService.close(AbstractService.java:247)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.serviceStop(TimelineCollectorManager.java:244)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.serviceStop(NodeTimelineCollectorManager.java:164)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05890> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.serviceStop(PerNodeTimelineCollectorsAuxService.java:113)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c058f8> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStop(AuxServices.java:330)
>   - locked <0x0006c7c23400> (a java.util.Collections$SynchronizedMap)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c059a8> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStop(ContainerManagerImpl.java:720)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05a98> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:526)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05c88> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager$1.run(NodeManager.java:552)
>   
>   
> "qtp183259297-76" #76 daemon prio=5 os_prio=0 tid=0x7f5f567ed000 
> nid=0x5fb7 in Object.wait() [0x7f5f23ad7000]
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:460)
>   at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
>   at 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService.pollForSpecificCompletedTask(ResultBoundedCompletionService.java:258)
>   - locked <0x000784ee8220> (a 
> 

[jira] [Commented] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.

2019-09-07 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925081#comment-16925081
 ] 

Abhishek Modi commented on YARN-9819:
-

[~elgoiri] could you please review it? The unit test failure is not related to 
the patch.

> Make TestOpportunisticContainerAllocatorAMService more resilient.
> -
>
> Key: YARN-9819
> URL: https://issues.apache.org/jira/browse/YARN-9819
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9819.001.patch, YARN-9819.002.patch
>
>
> Currently, TestOpportunisticContainerAllocatorAMService tries to set the 
> Opportunistic container status directly in RMNode, but that can be overwritten 
> by an NM heartbeat. The correct way is to send it through the NM heartbeat.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.

2019-09-07 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9819:

Attachment: YARN-9819.002.patch

> Make TestOpportunisticContainerAllocatorAMService more resilient.
> -
>
> Key: YARN-9819
> URL: https://issues.apache.org/jira/browse/YARN-9819
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9819.001.patch, YARN-9819.002.patch
>
>
> Currently, TestOpportunisticContainerAllocatorAMService tries to set the 
> Opportunistic container status directly in RMNode, but that can be overwritten 
> by an NM heartbeat. The correct way is to send it through the NM heartbeat.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9784) org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue is flaky

2019-09-07 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924852#comment-16924852
 ] 

Abhishek Modi commented on YARN-9784:
-

Thanks [~kmarton] for the patch. LGTM.

Thanks [~sunilg] and [~adam.antal] for additional reviews. Committed to trunk.

> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue
>  is flaky
> ---
>
> Key: YARN-9784
> URL: https://issues.apache.org/jira/browse/YARN-9784
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.3.0
>Reporter: Julia Kinga Marton
>Assignee: Julia Kinga Marton
>Priority: Major
> Attachments: YARN-9784.001.patch
>
>
> There are some test cases in TestLeafQueue which are failing intermittently.
> From 100 runs, there were 16 failures. 
> Some examples of the failures:
> {code:java}
> 2019-08-26 13:18:13 [ERROR] Errors: 
> 2019-08-26 13:18:13 [ERROR]   TestLeafQueue.setUp:144->setUpInternal:221 
> WrongTypeOfReturnValue 
> 2019-08-26 13:18:13 YarnConfigu...
> 2019-08-26 13:18:13 [ERROR]   TestLeafQueue.setUp:144->setUpInternal:221 
> WrongTypeOfReturnValue 
> 2019-08-26 13:18:13 YarnConfigu...
> 2019-08-26 13:18:13 [INFO] 
> 2019-08-26 13:18:13 [ERROR] Tests run: 36, Failures: 0, Errors: 2, Skipped: 0
> {code}
> {code:java}
> 2019-08-26 13:18:09 [ERROR] Failures: 
> 2019-08-26 13:18:09 [ERROR]   TestLeafQueue.testHeadroomWithMaxCap:1373 
> expected:<2048> but was:<0>
> 2019-08-26 13:18:09 [INFO] 
> 2019-08-26 13:18:09 [ERROR] Tests run: 36, Failures: 1, Errors: 0, Skipped: 0
> {code}
> {code:java}
> 2019-08-26 13:18:18 [ERROR] Errors: 
> 2019-08-26 13:18:18 [ERROR]   TestLeafQueue.setUp:144->setUpInternal:221 
> WrongTypeOfReturnValue 
> 2019-08-26 13:18:18 YarnConfigu...
> 2019-08-26 13:18:18 [ERROR]   TestLeafQueue.testHeadroomWithMaxCap:1307 ? 
> ClassCast org.apache.hadoop.yarn.c...
> 2019-08-26 13:18:18 [INFO] 
> 2019-08-26 13:18:18 [ERROR] Tests run: 36, Failures: 0, Errors: 2, Skipped: 0
> {code}
> {code:java}
> 2019-08-26 13:18:10 [ERROR] Failures: 
> 2019-08-26 13:18:10 [ERROR]   TestLeafQueue.testDRFUserLimits:847 Verify 
> user_0 got resources 
> 2019-08-26 13:18:10 [INFO] 
> 2019-08-26 13:18:10 [ERROR] Tests run: 36, Failures: 1, Errors: 0, Skipped: 0
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.

2019-09-07 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9819:

Attachment: YARN-9819.001.patch

> Make TestOpportunisticContainerAllocatorAMService more resilient.
> -
>
> Key: YARN-9819
> URL: https://issues.apache.org/jira/browse/YARN-9819
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9819.001.patch
>
>
> Currently, TestOpportunisticContainerAllocatorAMService tries to set the 
> Opportunistic container status directly in RMNode, but that can be overwritten 
> by an NM heartbeat. The correct way is to send it through the NM heartbeat.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.

2019-09-07 Thread Abhishek Modi (Jira)
Abhishek Modi created YARN-9819:
---

 Summary: Make TestOpportunisticContainerAllocatorAMService more 
resilient.
 Key: YARN-9819
 URL: https://issues.apache.org/jira/browse/YARN-9819
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Abhishek Modi
Assignee: Abhishek Modi


Currently, TestOpportunisticContainerAllocatorAMService tries to set the 
Opportunistic container status directly in RMNode, but that can be overwritten 
by an NM heartbeat. The correct way is to send it through the NM heartbeat.
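
A rough sketch of the direction (the helper below is hypothetical; it relies 
on the MockNM heartbeat variant that carries container statuses):

{code:java}
// Hypothetical test helper: report an opportunistic container's status via a
// regular NM heartbeat rather than mutating RMNode state directly, so the
// test can no longer race with real heartbeats.
private void heartbeatContainerStatus(MockNM nm, ContainerStatus status)
    throws Exception {
  nm.nodeHeartbeat(
      Collections.singletonMap(
          status.getContainerId().getApplicationAttemptId(),
          Collections.singletonList(status)),
      true /* healthy */);
}
{code}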



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9812) mvn javadoc:javadoc fails in hadoop-sls

2019-09-07 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924757#comment-16924757
 ] 

Abhishek Modi commented on YARN-9812:
-

Thanks [~elgoiri] for review. Committed to trunk.
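
For the record, the usual fix for this class of javadoc error is to wrap the 
arrow in {{@literal}} (or escape the '>'), e.g. a sketch of the corrected 
comment:

{code:java}
/**
 * Request lifecycle (javadoc-safe arrows):
 * <ul>
 *   <li>pending {@literal ->} requests which are NOT yet sent to RM.</li>
 *   <li>scheduled {@literal ->} requests which are sent to RM but not yet
 *   assigned.</li>
 * </ul>
 */
{code}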

> mvn javadoc:javadoc fails in hadoop-sls
> ---
>
> Key: YARN-9812
> URL: https://issues.apache.org/jira/browse/YARN-9812
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Akira Ajisaka
>Assignee: Abhishek Modi
>Priority: Major
>  Labels: newbie
> Fix For: 3.3.0
>
> Attachments: YARN-9812.001.patch, YARN-9812.002.patch
>
>
> {noformat}
> [ERROR] 
> hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:57:
>  error: bad use of '>'
> [ERROR]  * pending -> requests which are NOT yet sent to RM.
> [ERROR] ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:58:
>  error: bad use of '>'
> [ERROR]  * scheduled -> requests which are sent to RM but not yet assigned.
> [ERROR]   ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:59:
>  error: bad use of '>'
> [ERROR]  * assigned -> requests which are assigned to a container.
> [ERROR]  ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:60:
>  error: bad use of '>'
> [ERROR]  * completed -> request corresponding to which container has 
> completed.
> [ERROR]   ^
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7604) Fix some minor typos in the opportunistic container logging

2019-09-07 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924747#comment-16924747
 ] 

Abhishek Modi commented on YARN-7604:
-

Thanks [~cheersyang] for the patch. Could you please move these log lines to 
use the new log4j format? Thanks.

> Fix some minor typos in the opportunistic container logging
> ---
>
> Key: YARN-7604
> URL: https://issues.apache.org/jira/browse/YARN-7604
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.9.0
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Trivial
> Attachments: YARN-7604.01.patch
>
>
> Fix some minor text issues. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9812) mvn javadoc:javadoc fails in hadoop-sls

2019-09-06 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9812:

Attachment: YARN-9812.002.patch

> mvn javadoc:javadoc fails in hadoop-sls
> ---
>
> Key: YARN-9812
> URL: https://issues.apache.org/jira/browse/YARN-9812
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Akira Ajisaka
>Assignee: Abhishek Modi
>Priority: Major
>  Labels: newbie
> Attachments: YARN-9812.001.patch, YARN-9812.002.patch
>
>
> {noformat}
> [ERROR] 
> hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:57:
>  error: bad use of '>'
> [ERROR]  * pending -> requests which are NOT yet sent to RM.
> [ERROR] ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:58:
>  error: bad use of '>'
> [ERROR]  * scheduled -> requests which are sent to RM but not yet assigned.
> [ERROR]   ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:59:
>  error: bad use of '>'
> [ERROR]  * assigned -> requests which are assigned to a container.
> [ERROR]  ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:60:
>  error: bad use of '>'
> [ERROR]  * completed -> request corresponding to which container has 
> completed.
> [ERROR]   ^
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9782) Avoid DNS resolution while running SLS.

2019-09-06 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924455#comment-16924455
 ] 

Abhishek Modi commented on YARN-9782:
-

[~elgoiri] I found a potential issue with this unit test. Since we are setting 
Java security properties, they would stay set for all the following unit tests, 
as all of them run within the same Java process.

One way to avoid that is to run the unit tests for the SLS project in a 
separate Java process, but that will increase the runtime of the tests.

The second option is to skip the unit test for this change. Since it's a very 
small change behind a config, would it be possible to skip the unit test?

[~elgoiri] [~subru] could you please provide some suggestions here? Thanks.
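
For context, the JVM-wide knob involved is the pair of security properties 
that control DNS caching; once set, they apply to everything that runs later 
in the same process, e.g. (a sketch; which values SLS should actually use is 
the open question):

{code:java}
// JVM-wide: these stick for every test that later runs in the same process.
java.security.Security.setProperty("networkaddress.cache.ttl", "-1");
java.security.Security.setProperty("networkaddress.cache.negative.ttl", "-1");
{code}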

> Avoid DNS resolution while running SLS.
> ---
>
> Key: YARN-9782
> URL: https://issues.apache.org/jira/browse/YARN-9782
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9782.001.patch, YARN-9782.002.patch
>
>
> In SLS, we add nodes with random names and rack. DNS resolution of these 
> nodes takes around 2 seconds because it will timeout after that. This makes 
> the result of SLS unreliable and adds spikes. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9812) mvn javadoc:javadoc fails in hadoop-sls

2019-09-06 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924361#comment-16924361
 ] 

Abhishek Modi commented on YARN-9812:
-

[~aajisaka] [~elgoiri] could you please review it? Thanks.

> mvn javadoc:javadoc fails in hadoop-sls
> ---
>
> Key: YARN-9812
> URL: https://issues.apache.org/jira/browse/YARN-9812
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Akira Ajisaka
>Assignee: Abhishek Modi
>Priority: Major
>  Labels: newbie
> Attachments: YARN-9812.001.patch
>
>
> {noformat}
> [ERROR] 
> hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:57:
>  error: bad use of '>'
> [ERROR]  * pending -> requests which are NOT yet sent to RM.
> [ERROR] ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:58:
>  error: bad use of '>'
> [ERROR]  * scheduled -> requests which are sent to RM but not yet assigned.
> [ERROR]   ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:59:
>  error: bad use of '>'
> [ERROR]  * assigned -> requests which are assigned to a container.
> [ERROR]  ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:60:
>  error: bad use of '>'
> [ERROR]  * completed -> request corresponding to which container has 
> completed.
> [ERROR]   ^
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9782) Avoid DNS resolution while running SLS.

2019-09-06 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924327#comment-16924327
 ] 

Abhishek Modi commented on YARN-9782:
-

Thanks [~elgoiri] for review. Updated patch.

> Avoid DNS resolution while running SLS.
> ---
>
> Key: YARN-9782
> URL: https://issues.apache.org/jira/browse/YARN-9782
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9782.001.patch, YARN-9782.002.patch
>
>
> In SLS, we add nodes with random names and racks. DNS resolution of these 
> nodes takes around 2 seconds because it times out after that. This makes 
> the results of SLS unreliable and adds spikes.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-09-06 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924326#comment-16924326
 ] 

Abhishek Modi commented on YARN-9697:
-

[~elgoiri] could you please review the approach taken in the POC patch? If it 
looks good to you, I can clean it up and add some more UTs. Thanks.

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9697.ut.patch, YARN-9697.ut2.patch, 
> YARN-9697.wip1.patch
>
>
> In the current implementation, opportunistic containers are allocated based 
> on the queued opportunistic container count received in the node heartbeat. 
> This information becomes stale as soon as more opportunistic containers are 
> allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which 
> the AM asks for the containers. When multiple applications request 
> Opportunistic containers, containers might get allocated on the same set of 
> nodes, because already-allocated containers on a node are not considered 
> while serving requests from different applications. This can lead to uneven 
> allocation of Opportunistic containers across the cluster, leading to 
> increased queuing time.
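
A sketch of the bookkeeping this hints at (all names are hypothetical): keep 
an allocation-time estimate of each node's opportunistic queue length and bump 
it immediately on allocation, instead of trusting the count from the last 
heartbeat:

{code:java}
// Hypothetical sketch: an allocation-time estimate per node keeps stale
// heartbeat counts from piling containers onto the same few nodes.
// (java.util / java.util.concurrent types; RemoteNode and NodeId are the
// YARN server records.)
private final Map<NodeId, AtomicInteger> estimatedQueueLength =
    new ConcurrentHashMap<>();

RemoteNode pickNode(List<RemoteNode> candidates) {
  // Prefer the node with the smallest estimated queue...
  RemoteNode best = Collections.min(candidates,
      Comparator.comparingInt((RemoteNode n) -> estimatedQueueLength
          .computeIfAbsent(n.getNodeId(), k -> new AtomicInteger()).get()));
  // ...then bump its count right away, so the next request from any
  // application already sees this allocation.
  estimatedQueueLength.get(best.getNodeId()).incrementAndGet();
  return best;
}
{code}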



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9782) Avoid DNS resolution while running SLS.

2019-09-06 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9782:

Attachment: YARN-9782.002.patch

> Avoid DNS resolution while running SLS.
> ---
>
> Key: YARN-9782
> URL: https://issues.apache.org/jira/browse/YARN-9782
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9782.001.patch, YARN-9782.002.patch
>
>
> In SLS, we add nodes with random names and racks. DNS resolution of these 
> nodes takes around 2 seconds because it times out after that. This makes 
> the results of SLS unreliable and adds spikes.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-09-06 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9697:

Attachment: YARN-9697.wip1.patch

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9697.ut.patch, YARN-9697.ut2.patch, 
> YARN-9697.wip1.patch
>
>
> In the current implementation, opportunistic containers are allocated based 
> on the queued opportunistic container count received in the node heartbeat. 
> This information becomes stale as soon as more opportunistic containers are 
> allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which 
> the AM asks for the containers. When multiple applications request 
> Opportunistic containers, containers might get allocated on the same set of 
> nodes, because already-allocated containers on a node are not considered 
> while serving requests from different applications. This can lead to uneven 
> allocation of Opportunistic containers across the cluster, leading to 
> increased queuing time.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9812) mvn javadoc:javadoc fails in hadoop-sls

2019-09-04 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi reassigned YARN-9812:
---

Assignee: Abhishek Modi

> mvn javadoc:javadoc fails in hadoop-sls
> ---
>
> Key: YARN-9812
> URL: https://issues.apache.org/jira/browse/YARN-9812
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Akira Ajisaka
>Assignee: Abhishek Modi
>Priority: Major
>  Labels: newbie
>
> {noformat}
> [ERROR] 
> hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:57:
>  error: bad use of '>'
> [ERROR]  * pending -> requests which are NOT yet sent to RM.
> [ERROR] ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:58:
>  error: bad use of '>'
> [ERROR]  * scheduled -> requests which are sent to RM but not yet assigned.
> [ERROR]   ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:59:
>  error: bad use of '>'
> [ERROR]  * assigned -> requests which are assigned to a container.
> [ERROR]  ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:60:
>  error: bad use of '>'
> [ERROR]  * completed -> request corresponding to which container has 
> completed.
> [ERROR]   ^
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9804) Update ATSv2 document for latest feature supports

2019-09-02 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921170#comment-16921170
 ] 

Abhishek Modi commented on YARN-9804:
-

Thanks [~rohithsharma]. New patch looks good to me. +1 from my end.

> Update ATSv2 document for latest feature supports
> -
>
> Key: YARN-9804
> URL: https://issues.apache.org/jira/browse/YARN-9804
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Blocker
> Attachments: YARN-9804.01.patch, YARN-9804.02.patch
>
>
> Revisit the ATSv2 documents and update them for the GA features and the road map.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-8139) Skip node hostname resolution when running SLS.

2019-09-02 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi resolved YARN-8139.
-
Resolution: Duplicate

> Skip node hostname resolution when running SLS.
> ---
>
> Key: YARN-8139
> URL: https://issues.apache.org/jira/browse/YARN-8139
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
>
> Currently, depending on the time taken to resolve hostnames, the metrics of 
> SLS get skewed. To avoid this, this fix introduces a flag which 
> can be used to disable hostname resolution.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9804) Update ATSv2 document for latest feature supports

2019-09-01 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920634#comment-16920634
 ] 

Abhishek Modi commented on YARN-9804:
-

Thanks [~rohithsharma] for working on it. Some minor comments:



Road map include -> Road map includes

Simple authorization in terms of a configurable whitelist of users and groups 
who can read timeline data -> Support for simple authorization has been added 
in terms of a configurable whitelist of users and groups who can read timeline 
data.

YARN Client integrates with ATSv2. -> YARN Client has been integrated with 
ATSv2.

This enables fetching application/attempt/container
report from TimelineReader if details not present in ResouceManager. -> This 
enables fetching application/attempt/container
report from TimelineReader if details are not present in ResouceManager.

It set true -> If set true

 

Since YARN CLI support has been added, should we remove this line: "Currently 
there is no support for command line access."?

> Update ATSv2 document for latest feature supports
> -
>
> Key: YARN-9804
> URL: https://issues.apache.org/jira/browse/YARN-9804
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Blocker
> Attachments: YARN-9804.01.patch
>
>
> Revisit the ATSv2 documents and update them for the GA feature set and the road map.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9400) Remove unnecessary if at EntityGroupFSTimelineStore#parseApplicationId

2019-09-01 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920624#comment-16920624
 ] 

Abhishek Modi commented on YARN-9400:
-

Thanks [~Prabhu Joseph]. LGTM. Will commit to trunk.

> Remove unnecessary if at EntityGroupFSTimelineStore#parseApplicationId
> --
>
> Key: YARN-9400
> URL: https://issues.apache.org/jira/browse/YARN-9400
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: YARN-9400-001.patch
>
>
> The if clause validating that appIdStr starts with "application" is not 
> required in EntityGroupFSTimelineStore#parseApplicationId, since 
> ApplicationId.fromString already throws IllegalArgumentException for 
> malformed strings and the existing catch block handles that case:
> {code}
>  // converts the String to an ApplicationId or null if conversion failed
>   private static ApplicationId parseApplicationId(String appIdStr) {
> ApplicationId appId = null;
> if (appIdStr.startsWith(ApplicationId.appIdStrPrefix)) {
>   try {
> appId = ApplicationId.fromString(appIdStr);
>   } catch (IllegalArgumentException e) {
> appId = null;
>   }
> }
> return appId;
>   }
> {code}
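
With the redundant check dropped, a minimal sketch of the simplified method 
could look like this; behavior is unchanged because fromString rejects strings 
without the application prefix by throwing IllegalArgumentException:

{code}
// Sketch of the method with the startsWith check removed. The
// IllegalArgumentException thrown by fromString for malformed input is
// mapped to a null return, exactly as before.
private static ApplicationId parseApplicationId(String appIdStr) {
  try {
    return ApplicationId.fromString(appIdStr);
  } catch (IllegalArgumentException e) {
    return null;
  }
}
{code}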



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8174) Add containerId to ResourceLocalizationService fetch failure log statement

2019-09-01 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920619#comment-16920619
 ] 

Abhishek Modi commented on YARN-8174:
-

The v3 patch LGTM. Committed to trunk.

> Add containerId to ResourceLocalizationService fetch failure log statement
> --
>
> Key: YARN-8174
> URL: https://issues.apache.org/jira/browse/YARN-8174
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: YARN-8174.1.patch, YARN-8174.2.patch, YARN-8174.3.patch
>
>
> When localization of a resource fails due to a timestamp change, no 
> containerId is logged, making it hard to correlate the failure with the 
> affected container.
> {code}
> 2018-04-18 07:31:46,033 WARN  localizer.ResourceLocalizationService 
> (ResourceLocalizationService.java:processHeartbeat(1017)) - { 
> hdfs://tarunhdp-1.openstacklocal:8020/user/ambari-qa/.staging/job_1523550428406_0016/job.splitmetainfo,
>  1524036694502, FILE, null } failed: Resource 
> hdfs://tarunhdp-1.openstacklocal:8020/user/ambari-qa/.staging/job_1523550428406_0016/job.splitmetainfo
>  changed on src filesystem (expected 1524036694502, was 1524036694502
> java.io.IOException: Resource 
> hdfs://tarunhdp-1.openstacklocal:8020/user/ambari-qa/.staging/job_1523550428406_0016/job.splitmetainfo
>  changed on src filesystem (expected 1524036694502, was 1524036694502
> at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:258)
> at 
> org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:362)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:360)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:360)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
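
A rough sketch of how the warning could carry the container id is below. It 
assumes the failure path has access to the LocalizerResourceRequestEvent of 
the failed resource; the variable names (assoc, req, e) are illustrative, not 
the actual patch:

{code}
// Illustrative only: include the requesting container's id when logging a
// fetch failure. LocalizerResourceRequestEvent#getContext returns the
// LocalizerContext, which knows which container asked for the resource.
ContainerId containerId = assoc.getContext().getContainerId();
LOG.warn("{ " + req + " } failed for container " + containerId, e);
{code}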



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


