[jira] [Commented] (YARN-9941) Opportunistic scheduler metrics should be reset during fail-over.
[ https://issues.apache.org/jira/browse/YARN-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161710#comment-17161710 ] Abhishek Modi commented on YARN-9941: - Sure [~BilwaST]. Feel free to take over. Thanks > Opportunistic scheduler metrics should be reset during fail-over. > - > > Key: YARN-9941 > URL: https://issues.apache.org/jira/browse/YARN-9941 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8859) Add audit logs for router service
[ https://issues.apache.org/jira/browse/YARN-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi reassigned YARN-8859: --- Assignee: Minni Mittal (was: Abhishek Modi) > Add audit logs for router service > - > > Key: YARN-8859 > URL: https://issues.apache.org/jira/browse/YARN-8859 > Project: Hadoop YARN > Issue Type: Sub-task > Components: router >Reporter: Bibin Chundatt >Assignee: Minni Mittal >Priority: Major > > As with all other YARN services, > RouterClientRMService and RouterWebServices API/REST calls should have audit > logging.
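A sketch of what a Router audit line could look like, assuming the tab-separated KEY=VALUE layout used by YARN's RM audit log. The class `RouterAuditLine` and its field set are hypothetical, not the API added for this JIRA.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical audit-line builder for Router calls. The tab-separated KEY=VALUE
// layout is modeled loosely on YARN's RM audit log; field names are assumptions.
class RouterAuditLine {
  static String build(String user, String operation, String target, boolean success) {
    // LinkedHashMap keeps the fields in a stable, readable order.
    Map<String, String> fields = new LinkedHashMap<>();
    fields.put("USER", user);
    fields.put("OPERATION", operation);
    fields.put("TARGET", target);
    fields.put("RESULT", success ? "SUCCESS" : "FAILURE");

    StringJoiner line = new StringJoiner("\t");
    for (Map.Entry<String, String> e : fields.entrySet()) {
      line.add(e.getKey() + "=" + e.getValue());
    }
    return line.toString();
  }
}
```

Both RouterClientRMService and RouterWebServices could call the same builder, which keeps the RPC and REST audit entries greppable with one pattern.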
[jira] [Assigned] (YARN-8529) Add timeout to RouterWebServiceUtil#invokeRMWebService
[ https://issues.apache.org/jira/browse/YARN-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi reassigned YARN-8529: --- Assignee: Minni Mittal (was: Giovanni Matteo Fumarola) > Add timeout to RouterWebServiceUtil#invokeRMWebService > -- > > Key: YARN-8529 > URL: https://issues.apache.org/jira/browse/YARN-8529 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-8529.v1.patch, YARN-8529.v2.patch, > YARN-8529.v3.patch > > > {{RouterWebServiceUtil#invokeRMWebService}} currently has a fixed timeout. > This should be configurable.
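One possible shape for making the timeout configurable, sketched with `java.util.Properties` and a JDK `URLConnection` as stand-ins for Hadoop's `Configuration` and the Router's HTTP client. The key `router.rm-webservice.timeout-ms` and the 30-second default are hypothetical, not values from the attached patches.

```java
import java.net.URLConnection;
import java.util.Properties;

// Hypothetical sketch of replacing a fixed timeout with a configurable one.
class RouterRmCallTimeout {
  // Hypothetical key and default; the actual patch may use different names and values.
  static final String TIMEOUT_KEY = "router.rm-webservice.timeout-ms";
  static final int DEFAULT_TIMEOUT_MS = 30_000;

  // Reads the timeout, falling back to the default when the key is unset or blank.
  static int resolveTimeoutMs(Properties conf) {
    String raw = conf.getProperty(TIMEOUT_KEY);
    if (raw == null || raw.trim().isEmpty()) {
      return DEFAULT_TIMEOUT_MS;
    }
    int value = Integer.parseInt(raw.trim());
    if (value < 0) {
      throw new IllegalArgumentException(TIMEOUT_KEY + " must be >= 0, got " + value);
    }
    return value;
  }

  // Applies the timeout to a plain JDK connection (0 means "no timeout" for URLConnection).
  static void apply(URLConnection conn, Properties conf) {
    int timeout = resolveTimeoutMs(conf);
    conn.setConnectTimeout(timeout);
    conn.setReadTimeout(timeout);
  }
}
```

Validating the value once at lookup time means a bad setting fails fast instead of surfacing later as a confusing connection error.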
[jira] [Assigned] (YARN-7898) [FederationStateStore] Create a proxy chain for FederationStateStore API in the Router
[ https://issues.apache.org/jira/browse/YARN-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi reassigned YARN-7898: --- Assignee: Minni Mittal (was: Giovanni Matteo Fumarola) > [FederationStateStore] Create a proxy chain for FederationStateStore API in > the Router > -- > > Key: YARN-7898 > URL: https://issues.apache.org/jira/browse/YARN-7898 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Minni Mittal >Priority: Major > Attachments: StateStoreProxy StressTest.jpg, > YARN-7898-YARN-7402.proto.patch, YARN-7898-YARN-7402.v1.patch, > YARN-7898-YARN-7402.v2.patch, YARN-7898-YARN-7402.v3.patch, > YARN-7898-YARN-7402.v4.patch, YARN-7898-YARN-7402.v5.patch, > YARN-7898-YARN-7402.v6.patch > > > As detailed in the proposal in the umbrella JIRA, we are introducing a new > component that routes client requests to the appropriate FederationStateStore. > This JIRA tracks the creation of a proxy for FederationStateStore in the > Router.
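A minimal chain-of-responsibility sketch of the proxy-chain idea, assuming a cut-down store API with a single lookup. `HomeSubClusterStore`, `StoreInterceptor`, and the two example interceptors are hypothetical names, not the FederationStateStore interfaces; they only show how links could be stacked in front of the real store.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, cut-down state-store API: one lookup is enough to show the chain.
interface HomeSubClusterStore {
  String getHomeSubCluster(String appId);
}

// Each interceptor handles a call and (optionally) delegates to the next link.
abstract class StoreInterceptor implements HomeSubClusterStore {
  protected HomeSubClusterStore next;

  void setNext(HomeSubClusterStore next) {
    this.next = next;
  }
}

// Example link: answers repeated lookups from memory.
class CachingInterceptor extends StoreInterceptor {
  private final Map<String, String> cache = new HashMap<>();

  @Override
  public String getHomeSubCluster(String appId) {
    return cache.computeIfAbsent(appId, id -> next.getHomeSubCluster(id));
  }
}

// Example link: counts calls that reach it, e.g. for metrics.
class CountingInterceptor extends StoreInterceptor {
  int calls;

  @Override
  public String getHomeSubCluster(String appId) {
    calls++;
    return next.getHomeSubCluster(appId);
  }
}

class StoreChain {
  // Wires interceptors in list order; the last one delegates to the real store.
  static HomeSubClusterStore build(List<StoreInterceptor> interceptors, HomeSubClusterStore store) {
    HomeSubClusterStore head = store;
    for (int i = interceptors.size() - 1; i >= 0; i--) {
      interceptors.get(i).setNext(head);
      head = interceptors.get(i);
    }
    return head;
  }
}
```

Because each link only knows its successor, new concerns (caching, metrics, routing) can be added by configuring the list rather than editing the store client.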
[jira] [Commented] (YARN-10080) Support show app id on localizer thread pool
[ https://issues.apache.org/jira/browse/YARN-10080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011518#comment-17011518 ] Abhishek Modi commented on YARN-10080: -- Thanks [~cane] for working on this. The changes look good to me. I will wait for the Jenkins results. > Support show app id on localizer thread pool > > > Key: YARN-10080 > URL: https://issues.apache.org/jira/browse/YARN-10080 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Attachments: YARN-10080-001.patch > > > Currently, when troubleshooting a container localizer issue, if we want > to analyze the jstack with thread details, we cannot figure out which thread > is processing a given container. So I want to add the app ID to the thread name.
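A sketch of the idea, assuming the localizer's download pool is built from a `ThreadFactory`. `AppIdThreadFactory`, the pool name, and the `<pool>-<appId>-#<n>` layout are hypothetical; the attached patch may name threads differently.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical ThreadFactory that puts the application ID into each localizer
// thread's name, so a jstack dump shows which application a thread is working for.
class AppIdThreadFactory implements ThreadFactory {
  private final String prefix;
  private final AtomicInteger counter = new AtomicInteger();

  AppIdThreadFactory(String threadPoolName, String appId) {
    // Name layout "<pool>-<appId>-#<n>" is an assumption, not the patch's format.
    this.prefix = threadPoolName + "-" + appId + "-#";
  }

  @Override
  public Thread newThread(Runnable r) {
    Thread t = new Thread(r, prefix + counter.getAndIncrement());
    t.setDaemon(true);
    return t;
  }
}
```

Such a factory can be passed to `Executors.newFixedThreadPool(int, ThreadFactory)`, so the naming change stays local to where the pool is created.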
[jira] [Commented] (YARN-5542) Scheduling of opportunistic containers
[ https://issues.apache.org/jira/browse/YARN-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011357#comment-17011357 ] Abhishek Modi commented on YARN-5542: - [~brahmareddy] [~kkaranasos]: I have moved the remaining open JIRAs to YARN-10079. Should we close this JIRA now, as all the sub-tasks under it are completed? > Scheduling of opportunistic containers > -- > > Key: YARN-5542 > URL: https://issues.apache.org/jira/browse/YARN-5542 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Konstantinos Karanasos >Priority: Major > > This JIRA groups all efforts related to the scheduling of opportunistic > containers. > It includes the scheduling of opportunistic containers through the central RM > (YARN-5220), through distributed scheduling (YARN-2877), as well as the > scheduling of containers based on actual node utilization (YARN-1011) and the > container promotion/demotion (YARN-5085).
[jira] [Updated] (YARN-2886) Estimating waiting time in NM container queues
[ https://issues.apache.org/jira/browse/YARN-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-2886: Parent Issue: YARN-10079 (was: YARN-5542) > Estimating waiting time in NM container queues > -- > > Key: YARN-2886 > URL: https://issues.apache.org/jira/browse/YARN-2886 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos >Priority: Major > > This JIRA is about estimating the waiting time of each NM queue. > Having these estimates is crucial for the distributed scheduling of container > requests, as it allows the LocalRM to decide in which NMs to queue the > queuable container requests.
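To make the idea concrete, here is a crude estimator, assuming queued containers drain FIFO across a fixed number of run slots per NM and each runs for roughly its expected duration. `QueueWaitEstimator` and its inputs are hypothetical; the JIRA does not specify the estimation method, only that the LocalRM would use such estimates to pick NMs.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical waiting-time estimator; the estimator actually used by YARN may differ.
class QueueWaitEstimator {
  // Estimated wait for a newly queued container on one NM: total expected work
  // already queued, spread evenly over the node's parallel run slots.
  static long estimateWaitMs(List<Long> queuedDurationsMs, int parallelSlots) {
    if (parallelSlots <= 0) {
      return Long.MAX_VALUE; // node cannot run anything: treat as "never"
    }
    long total = 0;
    for (long d : queuedDurationsMs) {
      total += d;
    }
    return total / parallelSlots;
  }

  // Picks the NM with the smallest estimated wait, as a LocalRM-style placement step.
  static String pickNode(Map<String, List<Long>> queues, Map<String, Integer> slots) {
    return queues.keySet().stream()
        .min(Comparator.comparingLong(
            node -> estimateWaitMs(queues.get(node), slots.getOrDefault(node, 0))))
        .orElseThrow(() -> new IllegalStateException("no nodes"));
  }
}
```

Queue length alone would treat one long container and one short container as equal; weighting by expected duration is what makes the estimate useful for placement.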
[jira] [Updated] (YARN-7604) Fix some minor typos in the opportunistic container logging
[ https://issues.apache.org/jira/browse/YARN-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-7604: Parent Issue: YARN-10079 (was: YARN-5542) > Fix some minor typos in the opportunistic container logging > --- > > Key: YARN-7604 > URL: https://issues.apache.org/jira/browse/YARN-7604 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.9.0 >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Trivial > Attachments: YARN-7604.01.patch > > > Fix some minor text issues.
[jira] [Updated] (YARN-5414) Integrate NodeQueueLoadMonitor with ClusterNodeTracker
[ https://issues.apache.org/jira/browse/YARN-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-5414: Parent Issue: YARN-10079 (was: YARN-5542) > Integrate NodeQueueLoadMonitor with ClusterNodeTracker > -- > > Key: YARN-5414 > URL: https://issues.apache.org/jira/browse/YARN-5414 > Project: Hadoop YARN > Issue Type: Sub-task > Components: container-queuing, distributed-scheduling, scheduler >Reporter: Arun Suresh >Assignee: Abhishek Modi >Priority: Major > > The {{ClusterNodeTracker}} tracks the states of clusterNodes and provides > convenience methods like sort and filter. > The {{NodeQueueLoadMonitor}} should use the {{ClusterNodeTracker}} instead of > maintaining its own data-structure of node information.
[jira] [Updated] (YARN-5688) Make allocation of opportunistic containers asynchronous
[ https://issues.apache.org/jira/browse/YARN-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-5688: Parent Issue: YARN-10079 (was: YARN-5542) > Make allocation of opportunistic containers asynchronous > > > Key: YARN-5688 > URL: https://issues.apache.org/jira/browse/YARN-5688 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Abhishek Modi >Priority: Major > > In the current implementation of the > {{OpportunisticContainerAllocatorAMService}}, we synchronously perform the > allocation of opportunistic containers. This results in "blocking" the > service at the RM when scheduling the opportunistic containers. > The {{OpportunisticContainerAllocator}} should instead asynchronously run as > a separate thread.
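A sketch of the asynchronous variant, assuming a single background worker thread and a placeholder round-robin placement policy. `AsyncOpportunisticAllocator` is hypothetical and is not the `OpportunisticContainerAllocator` API; it only shows the caller getting a future back instead of blocking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: opportunistic allocation runs on its own thread so the
// AM-facing service does not block while containers are being placed.
class AsyncOpportunisticAllocator implements AutoCloseable {
  private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
    Thread t = new Thread(r, "opportunistic-allocator");
    t.setDaemon(true);
    return t;
  });

  // Placeholder allocation policy (round-robin over candidate nodes), not YARN's.
  static List<String> allocate(int numContainers, List<String> nodes) {
    List<String> placements = new ArrayList<>();
    if (nodes.isEmpty()) {
      return placements;
    }
    for (int i = 0; i < numContainers; i++) {
      placements.add(nodes.get(i % nodes.size()));
    }
    return placements;
  }

  // Returns immediately; the caller picks up the placements when they are ready.
  CompletableFuture<List<String>> allocateAsync(int numContainers, List<String> nodes) {
    List<String> snapshot = new ArrayList<>(nodes); // avoid racing with caller mutations
    return CompletableFuture.supplyAsync(() -> allocate(numContainers, snapshot), worker);
  }

  @Override
  public void close() {
    worker.shutdown();
  }
}
```

A single worker keeps allocations serialized (no locking around allocator state), at the cost of throughput; a real design would need to decide how results reach the AM, for example on its next heartbeat.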
[jira] [Updated] (YARN-9941) Opportunistic scheduler metrics should be reset during fail-over.
[ https://issues.apache.org/jira/browse/YARN-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9941: Parent Issue: YARN-10079 (was: YARN-5542) > Opportunistic scheduler metrics should be reset during fail-over. > - > > Key: YARN-9941 > URL: https://issues.apache.org/jira/browse/YARN-9941 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major >
[jira] [Updated] (YARN-9390) Add support for configurable Resource Calculator in Opportunistic Scheduler.
[ https://issues.apache.org/jira/browse/YARN-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9390: Parent Issue: YARN-10079 (was: YARN-5542) > Add support for configurable Resource Calculator in Opportunistic Scheduler. > > > Key: YARN-9390 > URL: https://issues.apache.org/jira/browse/YARN-9390 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9390.001.patch > > > Right now, the Opportunistic scheduler uses a hard-coded DominantResourceCalculator > and there is no option to change it to another resource calculator. This JIRA > is to make the resource calculator configurable for the Opportunistic scheduler.
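Hadoop code typically resolves a pluggable class with `Configuration#getClass` plus `ReflectionUtils.newInstance`. The self-contained sketch below mimics that pattern with `java.util.Properties` and plain reflection; the configuration key and the `*Sketch` calculator classes are hypothetical stand-ins, and the default mirrors the current hard-coded dominant calculator.

```java
import java.util.Properties;

// Hypothetical, minimal stand-ins for YARN's resource calculators.
interface CalculatorSketch {
  String name();
}

class DominantCalculatorSketch implements CalculatorSketch {
  public String name() { return "dominant"; }
}

class DefaultCalculatorSketch implements CalculatorSketch {
  public String name() { return "default"; }
}

class OpportunisticCalculatorLoader {
  // Hypothetical configuration key, not the one introduced by the patch.
  static final String CALCULATOR_KEY = "opportunistic-scheduler.resource-calculator.class";

  // Falls back to the dominant calculator, matching today's hard-coded behavior.
  static CalculatorSketch load(Properties conf) {
    String className = conf.getProperty(CALCULATOR_KEY, DominantCalculatorSketch.class.getName());
    try {
      // asSubclass fails fast if the configured class is not a calculator.
      return Class.forName(className)
          .asSubclass(CalculatorSketch.class)
          .getDeclaredConstructor()
          .newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot instantiate " + className, e);
    }
  }
}
```

Keeping the old calculator as the default means existing clusters see no behavior change unless they opt in.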
[jira] [Created] (YARN-10079) Scheduling of opportunistic containers - Phase 2
Abhishek Modi created YARN-10079: Summary: Scheduling of opportunistic containers - Phase 2 Key: YARN-10079 URL: https://issues.apache.org/jira/browse/YARN-10079 Project: Hadoop YARN Issue Type: New Feature Reporter: Abhishek Modi This JIRA groups all efforts related to improving the scheduling of opportunistic containers. Phase 1 of this work was done as part of YARN-5542.
[jira] [Commented] (YARN-5542) Scheduling of opportunistic containers
[ https://issues.apache.org/jira/browse/YARN-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011350#comment-17011350 ] Abhishek Modi commented on YARN-5542: - [~kkaranasos] [~brahmareddy] I will move the remaining open JIRAs to a new JIRA, and then we should be good to close this. We have completed the current set of improvements. > Scheduling of opportunistic containers > -- > > Key: YARN-5542 > URL: https://issues.apache.org/jira/browse/YARN-5542 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Konstantinos Karanasos >Priority: Major > > This JIRA groups all efforts related to the scheduling of opportunistic > containers. > It includes the scheduling of opportunistic containers through the central RM > (YARN-5220), through distributed scheduling (YARN-2877), as well as the > scheduling of containers based on actual node utilization (YARN-1011) and the > container promotion/demotion (YARN-5085).
[jira] [Commented] (YARN-10040) DistributedShell test failure on X86 and ARM
[ https://issues.apache.org/jira/browse/YARN-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006070#comment-17006070 ] Abhishek Modi commented on YARN-10040: -- I looked into it, and opportunistic scheduling is working fine. I ran distributed shell jobs locally with opportunistic containers, and they complete as well. The issue is that "yarn.nodemanager.opportunistic-containers-max-queue-length" is not being set when bringing up the NodeManager in the test, so no NodeManager accepts opportunistic containers. Prior to YARN-9697, an NM would accept opportunistic containers even if the maximum number of opportunistic containers allowed on it was set to 0. That's why these tests were passing before. I will provide a patch to fix these tests. > DistributedShell test failure on X86 and ARM > > > Key: YARN-10040 > URL: https://issues.apache.org/jira/browse/YARN-10040 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell > Environment: X86/ARM > OS: ubuntu1804 > Java 8 >Reporter: zhao bo >Assignee: Abhishek Modi >Priority: Major > > * > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers > * > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithEnforceExecutionType > Please see the Apache Jenkins test results: > [https://builds.apache.org/job/hadoop-multibranch/job/PR-1767/1/testReport/] > > These 2 tests fail on both the X86 and ARM platforms.
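A small model of the behavior described in the comment, assuming (as the comment implies) that an unset max queue length behaves like 0 and that, after YARN-9697, an NM only queues an opportunistic container while its queue is below that limit. Only the key name comes from the comment; `OpportunisticQueueCheck` is hypothetical. In the real tests the fix would presumably amount to something like `conf.setInt("yarn.nodemanager.opportunistic-containers-max-queue-length", 10)` on the test's `Configuration` before the NodeManagers start.

```java
import java.util.Properties;

// Hypothetical model of the NM-side admission check described above; not the
// actual NodeManager scheduling code.
class OpportunisticQueueCheck {
  // Key name quoted from the comment; everything else here is illustrative.
  static final String MAX_QUEUE_LENGTH_KEY =
      "yarn.nodemanager.opportunistic-containers-max-queue-length";

  static int maxQueueLength(Properties conf) {
    // An unset key falls back to 0, matching the failing-test scenario.
    return Integer.parseInt(conf.getProperty(MAX_QUEUE_LENGTH_KEY, "0").trim());
  }

  // Post-YARN-9697 behavior as described: queue only while there is room.
  static boolean canQueue(Properties conf, int currentlyQueued) {
    return currentlyQueued < maxQueueLength(conf);
  }
}
```

With the limit left at 0, `canQueue` is false for every node, which matches the symptom of no NodeManager accepting the containers.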
[jira] [Assigned] (YARN-10040) DistributedShell test failure on X86 and ARM
[ https://issues.apache.org/jira/browse/YARN-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi reassigned YARN-10040: Assignee: Abhishek Modi > DistributedShell test failure on X86 and ARM > > > Key: YARN-10040 > URL: https://issues.apache.org/jira/browse/YARN-10040 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell > Environment: X86/ARM > OS: ubuntu1804 > Java 8 >Reporter: zhao bo >Assignee: Abhishek Modi >Priority: Major > > * > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers > * > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithEnforceExecutionType > Please see the Apache Jenkins test results: > [https://builds.apache.org/job/hadoop-multibranch/job/PR-1767/1/testReport/] > > These 2 tests fail on both the X86 and ARM platforms.
[jira] [Commented] (YARN-10040) DistributedShell test failure on X86 and ARM
[ https://issues.apache.org/jira/browse/YARN-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005953#comment-17005953 ] Abhishek Modi commented on YARN-10040: -- Thanks [~ayushtkn] [~bzhaoopenstack]. I will take a look. > DistributedShell test failure on X86 and ARM > > > Key: YARN-10040 > URL: https://issues.apache.org/jira/browse/YARN-10040 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell > Environment: X86/ARM > OS: ubuntu1804 > Java 8 >Reporter: zhao bo >Priority: Major > > * > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers > * > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithEnforceExecutionType > Please see the Apache Jenkins test results: > [https://builds.apache.org/job/hadoop-multibranch/job/PR-1767/1/testReport/] > > These 2 tests fail on both the X86 and ARM platforms.
[jira] [Commented] (YARN-9990) Testcase fails with "Insufficient configured threads: required=16 < max=10"
[ https://issues.apache.org/jira/browse/YARN-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984753#comment-16984753 ] Abhishek Modi commented on YARN-9990: - Committed to trunk. Thanks [~prabhujoseph] for the patch. > Testcase fails with "Insufficient configured threads: required=16 < max=10" > --- > > Key: YARN-9990 > URL: https://issues.apache.org/jira/browse/YARN-9990 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9990-001.patch > > > Testcase fails with "Insufficient configured threads: required=16 < max=10". > The following testcases fail: > 1. TestWebAppProxyServlet > 2. TestAmFilter > 3. TestApiServiceClient > 4. TestSecureApiServiceClient > {code} > [ERROR] org.apache.hadoop.yarn.server.webproxy.TestWebAppProxyServlet Time > elapsed: 0.396 s <<< ERROR! > java.lang.IllegalStateException: Insufficient configured threads: required=16 > < max=10 for > QueuedThreadPool[qtp1597249648]@5f341870{STARTED,8<=8<=10,i=8,r=1,q=0}[ReservedThreadExecutor@4c762604{s=0/1,p=0}] > at > org.eclipse.jetty.util.thread.ThreadPoolBudget.check(ThreadPoolBudget.java:156) > at > org.eclipse.jetty.util.thread.ThreadPoolBudget.leaseTo(ThreadPoolBudget.java:130) > at > org.eclipse.jetty.util.thread.ThreadPoolBudget.leaseFrom(ThreadPoolBudget.java:182) > at > org.eclipse.jetty.io.SelectorManager.doStart(SelectorManager.java:255) > at > org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72) > at > org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169) > at > org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:110) > at > org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:283) > at > org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:81) > at >
org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:231) > at > org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72) > at org.eclipse.jetty.server.Server.doStart(Server.java:385) > at > org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72) > at > org.apache.hadoop.yarn.server.webproxy.TestWebAppProxyServlet.start(TestWebAppProxyServlet.java:102) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > [INFO] Running 
org.apache.hadoop.yarn.server.webproxy.amfilter.TestAmFilter > [ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.326 > s <<< FAILURE! - in > org.apache.hadoop.yarn.server.webproxy.amfilter.TestAmFilter > [ERROR] > testFindRedirectUrl(org.apache.hadoop.yarn.server.webproxy.amfilter.TestAmFilter) > Time elapsed: 0.306 s <<< ERROR! > java.lang.IllegalStateException: Insufficient configured threads: required=16 > < max=10 for > QueuedThreadPool[qtp485041780]@1ce92674{STARTED,8<=8<=10,i=8,r=1,q=0}[ReservedThreadExecutor@31f924f5{s=0/1,p=0}] > at > org.
[jira] [Commented] (YARN-9990) Testcase fails with "Insufficient configured threads: required=16 < max=10"
[ https://issues.apache.org/jira/browse/YARN-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16983526#comment-16983526 ] Abhishek Modi commented on YARN-9990: - Thanks [~prabhujoseph] for the patch. LGTM. Will commit it shortly. > Testcase fails with "Insufficient configured threads: required=16 < max=10" > --- > > Key: YARN-9990 > URL: https://issues.apache.org/jira/browse/YARN-9990 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9990-001.patch > > > Testcase fails with "Insufficient configured threads: required=16 < max=10". > The following testcases fail: > 1. TestWebAppProxyServlet > 2. TestAmFilter > 3. TestApiServiceClient > 4. TestSecureApiServiceClient > {code} > [ERROR] org.apache.hadoop.yarn.server.webproxy.TestWebAppProxyServlet Time > elapsed: 0.396 s <<< ERROR! > java.lang.IllegalStateException: Insufficient configured threads: required=16 > < max=10 for > QueuedThreadPool[qtp1597249648]@5f341870{STARTED,8<=8<=10,i=8,r=1,q=0}[ReservedThreadExecutor@4c762604{s=0/1,p=0}] > at > org.eclipse.jetty.util.thread.ThreadPoolBudget.check(ThreadPoolBudget.java:156) > at > org.eclipse.jetty.util.thread.ThreadPoolBudget.leaseTo(ThreadPoolBudget.java:130) > at > org.eclipse.jetty.util.thread.ThreadPoolBudget.leaseFrom(ThreadPoolBudget.java:182) > at > org.eclipse.jetty.io.SelectorManager.doStart(SelectorManager.java:255) > at > org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72) > at > org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169) > at > org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:110) > at > org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:283) > at > org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:81) > at >
org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:231) > at > org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72) > at org.eclipse.jetty.server.Server.doStart(Server.java:385) > at > org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72) > at > org.apache.hadoop.yarn.server.webproxy.TestWebAppProxyServlet.start(TestWebAppProxyServlet.java:102) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > [INFO] Running 
org.apache.hadoop.yarn.server.webproxy.amfilter.TestAmFilter > [ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.326 > s <<< FAILURE! - in > org.apache.hadoop.yarn.server.webproxy.amfilter.TestAmFilter > [ERROR] > testFindRedirectUrl(org.apache.hadoop.yarn.server.webproxy.amfilter.TestAmFilter) > Time elapsed: 0.306 s <<< ERROR! > java.lang.IllegalStateException: Insufficient configured threads: required=16 > < max=10 for > QueuedThreadPool[qtp485041780]@1ce92674{STARTED,8<=8<=10,i=8,r=1,q=0}[ReservedThreadExecutor@31f924f5{s=0/1,p=0}] > at > org.eclipse.jetty.util.th
[jira] [Commented] (YARN-9965) Fix NodeManager failing to start on subsequent times when Hdfs Auxillary Jar is set
[ https://issues.apache.org/jira/browse/YARN-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16977073#comment-16977073 ] Abhishek Modi commented on YARN-9965: - Thanks [~prabhujoseph]. The latest addendum patch looks good to me. Committed it to trunk. > Fix NodeManager failing to start on subsequent times when Hdfs Auxillary Jar > is set > --- > > Key: YARN-9965 > URL: https://issues.apache.org/jira/browse/YARN-9965 > Project: Hadoop YARN > Issue Type: Bug > Components: auxservices, nodemanager >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9965-001.patch, YARN-9965-addendum-01.patch > > > Loading an auxiliary jar from an HDFS location on a NodeManager works as > expected the first time. A subsequent restart fails with > ClassNotFoundException > {code:java} > 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: > classpath: [] > 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: > system classes: [java., javax.accessibility., javax.activation., > javax.activity., javax.annotation., javax.annotation.processing., > javax.crypto., javax.imageio., javax.jws., javax.lang.model., > -javax.management.j2ee., javax.management., javax.naming., javax.net., > javax.print., javax.rmi., javax.script., -javax.security.auth.message., > javax.security.auth., javax.security.cert., javax.security.sasl., > javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., > -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., > org.xml.sax., org.apache.commons.logging., org.apache.log4j., > -org.apache.hadoop.hbase., org.apache.hadoop., core-default.xml, > hdfs-default.xml, mapred-default.xml, yarn-default.xml] > 2019-11-08 03:59:49,257 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed > in state INITED >
java.lang.ClassNotFoundException: org.apache.auxtest.AuxServiceFromHDFS > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at > org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189) > at > org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:270) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:321) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:478) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:936) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1016) > {code} > > The issue happens when reusing the previously localized auxiliary service jar. > When it is reused, /* is appended to the localized jar file path, which causes the > issue.
> >
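A sketch of the distinction at the heart of this bug, assuming the localized path is turned into a classpath entry string: a trailing `/*` means "every jar in this directory", so appending it to a single jar file points at nothing, which would be consistent with the empty `classpath: []` in the log. `AuxServiceClasspath` is hypothetical and is not the code changed by the patch or addendum.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper illustrating the bug described above; not the actual patch.
class AuxServiceClasspath {
  // A trailing "/*" is a directory wildcard ("every jar in this directory");
  // applied to a single jar file it yields an entry that matches nothing.
  static String toClasspathEntry(String localizedPath, boolean isDirectory) {
    if (isDirectory) {
      String base = localizedPath.endsWith("/") ? localizedPath : localizedPath + "/";
      return base + "*";
    }
    return localizedPath; // single jar: use it verbatim
  }

  // Convenience overload that checks the filesystem for the directory case.
  static String toClasspathEntry(Path localized) {
    return toClasspathEntry(localized.toString(), Files.isDirectory(localized));
  }
}
```

The key point is that the first-start and reuse paths should go through the same decision, so a previously localized jar is never treated as a directory.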
[jira] [Commented] (YARN-9965) Fix NodeManager failing to start on subsequent times when Hdfs Auxillary Jar is set
[ https://issues.apache.org/jira/browse/YARN-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974033#comment-16974033 ]

Abhishek Modi commented on YARN-9965:

Thanks [~prabhujoseph] - I will review it today.

> Fix NodeManager failing to start on subsequent times when Hdfs Auxillary Jar is set
>
> Key: YARN-9965
> URL: https://issues.apache.org/jira/browse/YARN-9965
> Project: Hadoop YARN
> Issue Type: Bug
> Components: auxservices, nodemanager
> Affects Versions: 3.2.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9965-001.patch, YARN-9965-addendum-01.patch
>
>
> Loading an auxiliary jar from an HDFS location on a NodeManager works as
> expected the first time. A subsequent restart fails with
> ClassNotFoundException:
> {code:java}
> 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: classpath: []
> 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: system classes: [java., javax.accessibility., javax.activation., javax.activity., javax.annotation., javax.annotation.processing., javax.crypto., javax.imageio., javax.jws., javax.lang.model., -javax.management.j2ee., javax.management., javax.naming., javax.net., javax.print., javax.rmi., javax.script., -javax.security.auth.message., javax.security.auth., javax.security.cert., javax.security.sasl., javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., org.xml.sax., org.apache.commons.logging., org.apache.log4j., -org.apache.hadoop.hbase., org.apache.hadoop., core-default.xml, hdfs-default.xml, mapred-default.xml, yarn-default.xml]
> 2019-11-08 03:59:49,257 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed in state INITED
> java.lang.ClassNotFoundException: org.apache.auxtest.AuxServiceFromHDFS
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189)
>     at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:348)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:270)
>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>     at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:321)
>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>     at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:478)
>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:936)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1016)
> {code}
>
> The issue happens when the previously localized auxiliary service jar is
> reused. On reuse, /* is appended to the localized jar file path, which
> causes the issue.
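The description pins the failure on a "/*" suffix being appended to an already-localized jar. A wildcard entry only makes sense for a directory of jars; a single jar has to be used as-is. The minimal Java sketch below illustrates that distinction. `AuxJarClasspath` and `toClasspathEntry` are hypothetical names, not the NodeManager code or the committed patch.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not the NodeManager's actual code): builds the
// classpath entry for a localized aux-service resource.
class AuxJarClasspath {
  static String toClasspathEntry(Path localized) {
    if (Files.isDirectory(localized)) {
      // A directory of jars: the "*" wildcard stands for every jar inside it.
      return localized.toString() + File.separator + "*";
    }
    // A single jar file: use the path unchanged. With "/*" appended, a
    // wildcard expansion would look for jars inside "<file>.jar/", which is
    // not a directory, so nothing would end up on the classpath.
    return localized.toString();
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("aux");
    Path jar = Files.createFile(dir.resolve("aux-service.jar"));
    System.out.println(toClasspathEntry(dir)); // directory path + separator + "*"
    System.out.println(toClasspathEntry(jar)); // the jar path, unchanged
  }
}
```

Checking the path type at the point where the entry is built keeps the first-start and restart code paths identical, which is exactly where the reported bug diverged.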
[jira] [Commented] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972296#comment-16972296 ]

Abhishek Modi commented on YARN-9697:

Thanks [~bibinchundatt] and [~elgoiri] for the review. Committed to trunk.

> Efficient allocation of Opportunistic containers.
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Abhishek Modi
> Assignee: Abhishek Modi
> Priority: Major
> Attachments: YARN-9697.001.patch, YARN-9697.002.patch,
> YARN-9697.003.patch, YARN-9697.004.patch, YARN-9697.005.patch,
> YARN-9697.006.patch, YARN-9697.007.patch, YARN-9697.008.patch,
> YARN-9697.009.patch, YARN-9697.ut.patch, YARN-9697.ut2.patch,
> YARN-9697.wip1.patch, YARN-9697.wip2.patch
>
>
> In the current implementation, opportunistic containers are allocated based
> on the number of queued opportunistic containers reported in each node
> heartbeat. This information becomes stale as soon as more opportunistic
> containers are allocated on that node.
> Opportunistic containers are allocated in the same heartbeat in which the
> AM asks for them. When multiple applications request Opportunistic
> containers, the containers may be allocated on the same set of nodes,
> because containers already allocated on a node are not considered while
> serving requests from other applications. This can lead to uneven
> allocation of Opportunistic containers across the cluster and increased
> queuing time.
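The description suggests that the scheduler should account for containers it has already handed out, rather than relying only on the queue length from the last heartbeat. One way to picture that, as a sketch under that assumption (class and method names are hypothetical, and this is not the algorithm of the committed patch), is a least-queued node selector that bumps its own per-node estimate after every allocation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: per-node estimate of queued opportunistic containers,
// seeded by heartbeats and updated locally on every allocation.
class OppAllocatorSketch {
  // node -> estimated queue length; TreeMap makes ties resolve by node name.
  private final Map<String, Integer> queued = new TreeMap<>();

  // A node heartbeat resets the estimate to the node's reported queue length.
  synchronized void onHeartbeat(String node, int queuedOnNode) {
    queued.put(node, queuedOnNode);
  }

  // Places each container on the node with the smallest estimated queue and
  // increments that estimate immediately, so later requests (from this or
  // any other application) see containers allocated since the heartbeat.
  synchronized List<String> allocate(int numContainers) {
    List<String> placements = new ArrayList<>();
    for (int i = 0; i < numContainers && !queued.isEmpty(); i++) {
      String best = null;
      int bestLen = Integer.MAX_VALUE;
      for (Map.Entry<String, Integer> e : queued.entrySet()) {
        if (e.getValue() < bestLen) {
          best = e.getKey();
          bestLen = e.getValue();
        }
      }
      queued.put(best, bestLen + 1);
      placements.add(best);
    }
    return placements;
  }

  public static void main(String[] args) {
    OppAllocatorSketch alloc = new OppAllocatorSketch();
    alloc.onHeartbeat("n1", 0);
    alloc.onHeartbeat("n2", 0);
    alloc.onHeartbeat("n3", 1);
    System.out.println(alloc.allocate(2)); // app A: [n1, n2]
    System.out.println(alloc.allocate(3)); // app B: [n1, n2, n3]
  }
}
```

Without the local increment, app B would see the same heartbeat counts as app A and would again start with n1 and n2, which is the clustering the description warns about.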
[jira] [Commented] (YARN-9965) Fix NodeManager failing to start when Hdfs Auxillary Jar is set
[ https://issues.apache.org/jira/browse/YARN-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972086#comment-16972086 ]

Abhishek Modi commented on YARN-9965:

[~vinodkv] - I think that's a mistake on my end: I should have insisted on a UT before committing it. Since this was a very minor fix and I had also tested it, I went ahead with the commit. Should I create a separate Jira for writing a UT for this, or should we add an addendum patch with the UT here?
[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Modi updated YARN-9697:
Attachment: YARN-9697.009.patch
[jira] [Commented] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971480#comment-16971480 ]

Abhishek Modi commented on YARN-9697:

Thanks [~bibinchundatt] for the review.
{code:java}
private int numNodesForAnyAllocation = DEFAULT_OPP_CONTAINER_ALLOCATION_NODES_NUMBER_USED;
{code}
This default is relied on by another constructor, which is used in the test cases. Apart from that, I have addressed all other comments in YARN-9697.009.patch.
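The reply argues that the field initializer is still needed because a second constructor, used by the tests, never assigns the field. A minimal sketch of that Java pattern follows; the class name, the constant's value and the `nodes-used` configuration key are hypothetical stand-ins.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch of a field initializer backing a test-only constructor.
class AllocatorFieldDefaultSketch {
  // Arbitrary stand-in value; the thread does not state the real default.
  static final int DEFAULT_NODES_USED = 10;

  // The initializer runs for every constructor before its body.
  private int numNodesForAnyAllocation = DEFAULT_NODES_USED;

  // Config-driven constructor: overrides the default when the key is set.
  AllocatorFieldDefaultSketch(Map<String, Integer> conf) {
    this.numNodesForAnyAllocation =
        conf.getOrDefault("nodes-used", DEFAULT_NODES_USED);
  }

  // Test-style constructor: never assigns the field, so it relies entirely on
  // the initializer. Without it, the field would silently stay 0.
  AllocatorFieldDefaultSketch() {
  }

  int getNumNodesForAnyAllocation() {
    return numNodesForAnyAllocation;
  }

  public static void main(String[] args) {
    System.out.println(new AllocatorFieldDefaultSketch()
        .getNumNodesForAnyAllocation()); // 10
    System.out.println(new AllocatorFieldDefaultSketch(
        Collections.singletonMap("nodes-used", 3))
        .getNumNodesForAnyAllocation()); // 3
  }
}
```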
[jira] [Commented] (YARN-9965) Fix NodeManager failing to start when Hdfs Auxillary Jar is set
[ https://issues.apache.org/jira/browse/YARN-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971356#comment-16971356 ]

Abhishek Modi commented on YARN-9965:

Thanks [~prabhujoseph] for working on this. LGTM. I will commit it shortly.
[jira] [Commented] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962868#comment-16962868 ]

Abhishek Modi commented on YARN-9697:

Filed YARN-9941 for fixing Opportunistic scheduler metrics during fail-over.
[jira] [Created] (YARN-9941) Opportunistic scheduler metrics should be reset during fail-over.
Abhishek Modi created YARN-9941:

Summary: Opportunistic scheduler metrics should be reset during fail-over.
Key: YARN-9941
URL: https://issues.apache.org/jira/browse/YARN-9941
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Abhishek Modi
Assignee: Abhishek Modi
[jira] [Commented] (YARN-2442) ResourceManager JMX UI does not give HA State
[ https://issues.apache.org/jira/browse/YARN-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961868#comment-16961868 ]

Abhishek Modi commented on YARN-2442:

Thanks [~cyrusjackson25] for the patch and [~bibinchundatt] for the additional review. Committed it to trunk.

> ResourceManager JMX UI does not give HA State
>
> Key: YARN-2442
> URL: https://issues.apache.org/jira/browse/YARN-2442
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Affects Versions: 2.5.0, 2.6.0, 2.7.0
> Reporter: Nishan Shetty
> Assignee: Rohith Sharma K S
> Priority: Major
> Labels: oct16-easy
> Attachments: 0001-YARN-2442.patch, YARN-2442.003.patch,
> YARN-2442.004.patch, YARN-2442.02.patch
>
>
> ResourceManager JMX UI can show the haState (INITIALIZING, ACTIVE, STANDBY,
> STOPPED).
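One way a service can expose its HA state over JMX is as a read-only MXBean attribute registered on the platform MBean server. The sketch below uses only standard `javax.management` APIs; the interface, class and `ObjectName` are hypothetical and are not taken from the YARN-2442 patch.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class HAStateJmx {
  // Hypothetical MXBean interface; JMX requires MXBean interfaces to be public.
  public interface HAStateInfoMXBean {
    String getHAState(); // exposed as the read-only attribute "HAState"
  }

  // Holds the current HA state; a real service would update it on each
  // transition (INITIALIZING, ACTIVE, STANDBY, STOPPED).
  static final class HAStateInfo implements HAStateInfoMXBean {
    private volatile String haState = "INITIALIZING";

    void transitionTo(String newState) {
      haState = newState;
    }

    @Override
    public String getHAState() {
      return haState;
    }
  }

  // Registers a new bean under the given name on the platform MBean server.
  static HAStateInfo register(String objectName) throws Exception {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    HAStateInfo info = new HAStateInfo();
    server.registerMBean(info, new ObjectName(objectName));
    return info;
  }

  // Reads the attribute back through JMX, as a JMX client or UI would.
  static Object readHAState(String objectName) throws Exception {
    return ManagementFactory.getPlatformMBeanServer()
        .getAttribute(new ObjectName(objectName), "HAState");
  }

  public static void main(String[] args) throws Exception {
    String name = "Demo:service=ResourceManager,name=HAStateInfo";
    HAStateInfo info = register(name);
    System.out.println(readHAState(name)); // INITIALIZING
    info.transitionTo("ACTIVE");
    System.out.println(readHAState(name)); // ACTIVE
  }
}
```

Because the attribute is read through the MBean server, any JMX consumer (for example a JMX-to-JSON servlet or JConsole) sees the state without needing a custom endpoint.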
[jira] [Commented] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957250#comment-16957250 ]

Abhishek Modi commented on YARN-9697:

Thanks [~bibinchundatt] for the review. I have addressed most of the review comments in the v8 patch. Regarding
{quote}For OpportunisticSchedulerMetrics, shouldn't we have a destroy() method to reset the counters? During switch-over, I think we should reset the counters.
{quote}
I will file a separate Jira.
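The reset requested in the quoted review comment can be pictured as a metrics singleton whose `destroy()` drops the current instance, so that the next `getMetrics()` call after a switch-over starts from zero. This is a minimal sketch with hypothetical names and a single counter; a real implementation backed by Hadoop metrics2 would presumably also need to unregister its metrics source, which is not shown.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical process-wide metrics singleton with a reset hook.
final class OppSchedulerMetricsSketch {
  private static OppSchedulerMetricsSketch instance;

  private final AtomicLong allocatedContainers = new AtomicLong();

  private OppSchedulerMetricsSketch() {
  }

  static synchronized OppSchedulerMetricsSketch getMetrics() {
    if (instance == null) {
      instance = new OppSchedulerMetricsSketch();
    }
    return instance;
  }

  // Intended to be called when the RM leaves the active state (fail-over),
  // so counters from the previous active period are not carried over.
  static synchronized void destroy() {
    instance = null;
  }

  void incrAllocatedContainers() {
    allocatedContainers.incrementAndGet();
  }

  long getAllocatedContainers() {
    return allocatedContainers.get();
  }

  public static void main(String[] args) {
    getMetrics().incrAllocatedContainers();
    getMetrics().incrAllocatedContainers();
    System.out.println(getMetrics().getAllocatedContainers()); // 2
    destroy(); // simulate a switch-over
    System.out.println(getMetrics().getAllocatedContainers()); // 0
  }
}
```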
[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Modi updated YARN-9697:
Attachment: YARN-9697.008.patch
[jira] [Created] (YARN-9910) Make private localizer download resources in parallel
Abhishek Modi created YARN-9910:

Summary: Make private localizer download resources in parallel
Key: YARN-9910
URL: https://issues.apache.org/jira/browse/YARN-9910
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Abhishek Modi
Assignee: Abhishek Modi

Currently, the private localizer uses a single-threaded pool for localization. As part of this Jira, the private localizer will create a fixed thread pool with a configurable number of threads for localization.
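A minimal sketch of the proposed change, assuming each resource download can be expressed as a function from a resource name to a local path: submit every download to `Executors.newFixedThreadPool(numThreads)` and collect the results in request order. `localizeAll` and its parameters are hypothetical; the configuration property that would supply `numThreads` is not named in the ticket and is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelLocalizerSketch {
  // Downloads all resources on a fixed pool of numThreads threads and returns
  // the local paths in request order. numThreads = 1 reproduces the current
  // single-threaded behaviour.
  static List<String> localizeAll(List<String> resources, int numThreads,
      Function<String, String> download)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      List<Future<String>> pending = new ArrayList<>();
      for (String resource : resources) {
        pending.add(pool.submit(() -> download.apply(resource)));
      }
      List<String> localPaths = new ArrayList<>();
      for (Future<String> f : pending) {
        // Blocks until this download finishes; a failed download surfaces
        // here as an ExecutionException.
        localPaths.add(f.get());
      }
      return localPaths;
    } finally {
      // Also interrupts any downloads still running if one of them failed.
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    List<String> paths = localizeAll(
        Arrays.asList("a.jar", "b.tgz", "c.txt"), 3,
        r -> "/local/filecache/" + r); // stand-in for a real download
    System.out.println(paths);
    // [/local/filecache/a.jar, /local/filecache/b.tgz, /local/filecache/c.txt]
  }
}
```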
[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Modi updated YARN-9697:
Attachment: YARN-9697.007.patch
[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Modi updated YARN-9697:
Attachment: YARN-9697.006.patch
[jira] [Created] (YARN-9908) Make ZK calls in parallel to load application states.
Abhishek Modi created YARN-9908:

Summary: Make ZK calls in parallel to load application states.
Key: YARN-9908
URL: https://issues.apache.org/jira/browse/YARN-9908
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Abhishek Modi
Assignee: Abhishek Modi

At present, all application states are fetched sequentially from ZooKeeper. This can be optimized by using a thread pool to fetch application states from ZooKeeper in parallel.
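As a sketch of the proposed optimization, assuming each application's state can be read by an independent per-application call, the reads can be submitted together with `ExecutorService.invokeAll`, which waits for all of them and returns the futures in task order. `fetchFromZk` is a hypothetical stand-in for the actual ZooKeeper read, not a ZooKeeper or Curator API, and the other names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelStateLoaderSketch {
  // Reads every application's state concurrently on a fixed pool and returns
  // appId -> raw state, preserving the input order.
  static Map<String, byte[]> loadAppStates(List<String> appIds, int threads,
      Function<String, byte[]> fetchFromZk)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Callable<byte[]>> reads = new ArrayList<>();
      for (String appId : appIds) {
        reads.add(() -> fetchFromZk.apply(appId));
      }
      // invokeAll blocks until every read completes; its futures are in the
      // same order as the submitted tasks.
      List<Future<byte[]>> results = pool.invokeAll(reads);
      Map<String, byte[]> states = new LinkedHashMap<>();
      for (int i = 0; i < appIds.size(); i++) {
        states.put(appIds.get(i), results.get(i).get());
      }
      return states;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    Map<String, byte[]> states = loadAppStates(
        Arrays.asList("application_1_0001", "application_1_0002"), 4,
        appId -> ("state of " + appId).getBytes(StandardCharsets.UTF_8));
    for (Map.Entry<String, byte[]> e : states.entrySet()) {
      System.out.println(e.getKey() + " -> "
          + new String(e.getValue(), StandardCharsets.UTF_8));
    }
  }
}
```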
[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Modi updated YARN-9697:
Attachment: YARN-9697.005.patch
[jira] [Commented] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944825#comment-16944825 ]

Abhishek Modi commented on YARN-9697:

Thanks [~elgoiri] for the review. I have addressed all the comments except the cleanup of the loop in CentralizedOpportunisticContainerAllocator, which I will look into further. Could you please review YARN-9697.004.patch? Thanks.
[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Modi updated YARN-9697:
Attachment: YARN-9697.004.patch
[jira] [Commented] (YARN-9782) Avoid DNS resolution while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944342#comment-16944342 ] Abhishek Modi commented on YARN-9782: - Thanks [~elgoiri] for the review. Committed to trunk. > Avoid DNS resolution while running SLS. > --- > > Key: YARN-9782 > URL: https://issues.apache.org/jira/browse/YARN-9782 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9782.001.patch, YARN-9782.002.patch, > YARN-9782.003.patch, YARN-9782.004.patch, YARN-9782.005.patch > > > In SLS, we add nodes with random names and rack. DNS resolution of these > nodes takes around 2 seconds because it will timeout after that. This makes > the result of SLS unreliable and adds spikes.
[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9697: Attachment: YARN-9697.003.patch > Efficient allocation of Opportunistic containers. > - > > Key: YARN-9697 > URL: https://issues.apache.org/jira/browse/YARN-9697 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9697.001.patch, YARN-9697.002.patch, > YARN-9697.003.patch, YARN-9697.ut.patch, YARN-9697.ut2.patch, > YARN-9697.wip1.patch, YARN-9697.wip2.patch > > > In the current implementation, opportunistic containers are allocated based > on the number of queued opportunistic container information received in node > heartbeat. This information becomes stale as soon as more opportunistic > containers are allocated on that node. > Allocation of opportunistic containers happens on the same heartbeat in which > AM asks for the containers. When multiple applications request for > Opportunistic containers, containers might get allocated on the same set of > nodes as already allocated containers on the node are not considered while > serving requests from different applications. This can lead to uneven > allocation of Opportunistic containers across the cluster leading to > increased queuing time
[jira] [Commented] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943809#comment-16943809 ] Abhishek Modi commented on YARN-9697: - [~elgoiri] could you please review YARN-9697.002.patch whenever you have time? Thanks. > Efficient allocation of Opportunistic containers. > - > > Key: YARN-9697 > URL: https://issues.apache.org/jira/browse/YARN-9697 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9697.001.patch, YARN-9697.002.patch, > YARN-9697.ut.patch, YARN-9697.ut2.patch, YARN-9697.wip1.patch, > YARN-9697.wip2.patch > > > In the current implementation, opportunistic containers are allocated based > on the number of queued opportunistic container information received in node > heartbeat. This information becomes stale as soon as more opportunistic > containers are allocated on that node. > Allocation of opportunistic containers happens on the same heartbeat in which > AM asks for the containers. When multiple applications request for > Opportunistic containers, containers might get allocated on the same set of > nodes as already allocated containers on the node are not considered while > serving requests from different applications. This can lead to uneven > allocation of Opportunistic containers across the cluster leading to > increased queuing time
[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9697: Attachment: YARN-9697.002.patch > Efficient allocation of Opportunistic containers. > - > > Key: YARN-9697 > URL: https://issues.apache.org/jira/browse/YARN-9697 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9697.001.patch, YARN-9697.002.patch, > YARN-9697.ut.patch, YARN-9697.ut2.patch, YARN-9697.wip1.patch, > YARN-9697.wip2.patch > > > In the current implementation, opportunistic containers are allocated based > on the number of queued opportunistic container information received in node > heartbeat. This information becomes stale as soon as more opportunistic > containers are allocated on that node. > Allocation of opportunistic containers happens on the same heartbeat in which > AM asks for the containers. When multiple applications request for > Opportunistic containers, containers might get allocated on the same set of > nodes as already allocated containers on the node are not considered while > serving requests from different applications. This can lead to uneven > allocation of Opportunistic containers across the cluster leading to > increased queuing time
[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9697: Attachment: YARN-9697.001.patch > Efficient allocation of Opportunistic containers. > - > > Key: YARN-9697 > URL: https://issues.apache.org/jira/browse/YARN-9697 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9697.001.patch, YARN-9697.ut.patch, > YARN-9697.ut2.patch, YARN-9697.wip1.patch, YARN-9697.wip2.patch > > > In the current implementation, opportunistic containers are allocated based > on the number of queued opportunistic container information received in node > heartbeat. This information becomes stale as soon as more opportunistic > containers are allocated on that node. > Allocation of opportunistic containers happens on the same heartbeat in which > AM asks for the containers. When multiple applications request for > Opportunistic containers, containers might get allocated on the same set of > nodes as already allocated containers on the node are not considered while > serving requests from different applications. This can lead to uneven > allocation of Opportunistic containers across the cluster leading to > increased queuing time
[jira] [Commented] (YARN-9870) Remove unused function from OpportunisticContainerAllocatorAMService
[ https://issues.apache.org/jira/browse/YARN-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942849#comment-16942849 ] Abhishek Modi commented on YARN-9870: - Thanks [~elgoiri] for the review. Committed to trunk. > Remove unused function from OpportunisticContainerAllocatorAMService > > > Key: YARN-9870 > URL: https://issues.apache.org/jira/browse/YARN-9870 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Minor > Attachments: YARN-9870.001.patch, YARN-9870.002.patch > > > Code clean up of OpportunisticContainerAllocatorAMService and removal of > unused functions.
[jira] [Updated] (YARN-9870) Remove unused function from OpportunisticContainerAllocatorAMService
[ https://issues.apache.org/jira/browse/YARN-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9870: Attachment: YARN-9870.002.patch > Remove unused function from OpportunisticContainerAllocatorAMService > > > Key: YARN-9870 > URL: https://issues.apache.org/jira/browse/YARN-9870 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Minor > Attachments: YARN-9870.001.patch, YARN-9870.002.patch > > > Code clean up of OpportunisticContainerAllocatorAMService and removal of > unused functions.
[jira] [Updated] (YARN-9782) Avoid DNS resolution while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9782: Attachment: YARN-9782.005.patch > Avoid DNS resolution while running SLS. > --- > > Key: YARN-9782 > URL: https://issues.apache.org/jira/browse/YARN-9782 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9782.001.patch, YARN-9782.002.patch, > YARN-9782.003.patch, YARN-9782.004.patch, YARN-9782.005.patch > > > In SLS, we add nodes with random names and rack. DNS resolution of these > nodes takes around 2 seconds because it will timeout after that. This makes > the result of SLS unreliable and adds spikes.
[jira] [Commented] (YARN-9870) Remove unused function from OpportunisticContainerAllocatorAMService
[ https://issues.apache.org/jira/browse/YARN-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941776#comment-16941776 ] Abhishek Modi commented on YARN-9870: - [~elgoiri] could you please review it? Thanks. > Remove unused function from OpportunisticContainerAllocatorAMService > > > Key: YARN-9870 > URL: https://issues.apache.org/jira/browse/YARN-9870 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Minor > Attachments: YARN-9870.001.patch > > > Code clean up of OpportunisticContainerAllocatorAMService and removal of > unused functions.
[jira] [Created] (YARN-9870) Remove unused function from OpportunisticContainerAllocatorAMService
Abhishek Modi created YARN-9870: --- Summary: Remove unused function from OpportunisticContainerAllocatorAMService Key: YARN-9870 URL: https://issues.apache.org/jira/browse/YARN-9870 Project: Hadoop YARN Issue Type: Sub-task Reporter: Abhishek Modi Assignee: Abhishek Modi Code clean up of OpportunisticContainerAllocatorAMService and removal of unused functions.
[jira] [Commented] (YARN-9859) Refactor OpportunisticContainerAllocator
[ https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941221#comment-16941221 ] Abhishek Modi commented on YARN-9859: - Thanks [~elgoiri] for the review. Committed to trunk. > Refactor OpportunisticContainerAllocator > > > Key: YARN-9859 > URL: https://issues.apache.org/jira/browse/YARN-9859 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9859.001.patch, YARN-9859.002.patch, > YARN-9859.003.patch > > > Right now OpportunisticContainerAllocator is written mainly for Distributed > Scheduling and schedules Opportunistic containers on limited set of nodes. As > part of this jira, we are going to make OpportunisticContainerAllocator as an > abstract class and DistributedOpportunisticContainerAllocator as actual > implementation. This would be prerequisite for YARN-9697.
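The refactoring described in this issue can be read as a template-method split. A minimal sketch, under these assumptions:
- An abstract base class owns the shared allocation loop.
- Each concrete allocator supplies only the node-placement policy.
- The distributed-style subclass cycles over a small, fixed node set, as a simplified stand-in for "a limited set of nodes".

All class names, method names and the round-robin policy are hypothetical. They are not the real OpportunisticContainerAllocator API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical base class: owns the allocation loop shared by all allocators.
abstract class OpportunisticAllocatorSketch {
  final List<String> allocate(int numContainers) {
    List<String> placements = new ArrayList<>();
    for (int i = 0; i < numContainers; i++) {
      String node = chooseNode(i);
      if (node == null) {
        break; // no candidate node available
      }
      placements.add(node);
    }
    return placements;
  }

  // Placement policy supplied by each concrete allocator.
  protected abstract String chooseNode(int requestIndex);
}

// Hypothetical distributed-style allocator: only considers a limited,
// fixed set of nodes and cycles through them.
class DistributedAllocatorSketch extends OpportunisticAllocatorSketch {
  private final List<String> nodes;

  DistributedAllocatorSketch(List<String> nodes) {
    this.nodes = new ArrayList<>(nodes);
  }

  @Override
  protected String chooseNode(int requestIndex) {
    return nodes.isEmpty() ? null : nodes.get(requestIndex % nodes.size());
  }

  public static void main(String[] args) {
    OpportunisticAllocatorSketch allocator =
        new DistributedAllocatorSketch(Arrays.asList("nm1", "nm2"));
    System.out.println(allocator.allocate(3)); // prints [nm1, nm2, nm1]
  }
}
```

With this split, a centralized allocator (as discussed for YARN-9697) would be a second subclass with its own chooseNode, while the loop stays in the base class.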
[jira] [Commented] (YARN-9782) Avoid DNS resolution while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940982#comment-16940982 ] Abhishek Modi commented on YARN-9782: - Thanks [~elgoiri] for the review. Attached the 004 patch with fixes. I couldn't find "networkaddress.cache.ttl" defined as a constant string in any of the libraries, so I defined the property names as constants in SLSRunner. Please review it whenever you get some time. Thanks. > Avoid DNS resolution while running SLS. > --- > > Key: YARN-9782 > URL: https://issues.apache.org/jira/browse/YARN-9782 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9782.001.patch, YARN-9782.002.patch, > YARN-9782.003.patch, YARN-9782.004.patch > > > In SLS, we add nodes with random names and rack. DNS resolution of these > nodes takes around 2 seconds because it will timeout after that. This makes > the result of SLS unreliable and adds spikes.
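One way such constants might be used, sketched under assumptions. The JVM caches InetAddress lookups according to the networkaddress.cache.ttl and networkaddress.cache.negative.ttl security properties, and a value of "-1" means "cache forever". Setting both through java.security.Security before the first lookup means each made-up SLS host name is resolved at most once per JVM, so repeated lookups of that name skip the timeout. The first lookup of each name still pays it. DnsCachingSketch and enableDnsCaching are hypothetical names; this is not the code in the YARN-9782 patch.

```java
import java.security.Security;

// Hypothetical sketch, not the YARN-9782 patch.
class DnsCachingSketch {
  // The JDK does not expose these security property names as public constants.
  static final String NETWORK_CACHE_TTL = "networkaddress.cache.ttl";
  static final String NETWORK_NEGATIVE_CACHE_TTL = "networkaddress.cache.negative.ttl";

  // "-1" caches successful and failed lookups for the JVM's lifetime. The JDK
  // typically reads these properties once, when its address cache policy is
  // initialized, so call this before the first host-name lookup.
  static void enableDnsCaching() {
    Security.setProperty(NETWORK_CACHE_TTL, "-1");
    Security.setProperty(NETWORK_NEGATIVE_CACHE_TTL, "-1");
  }

  public static void main(String[] args) {
    enableDnsCaching();
    System.out.println(Security.getProperty(NETWORK_CACHE_TTL));          // prints -1
    System.out.println(Security.getProperty(NETWORK_NEGATIVE_CACHE_TTL)); // prints -1
  }
}
```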
[jira] [Updated] (YARN-9782) Avoid DNS resolution while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9782: Attachment: YARN-9782.004.patch > Avoid DNS resolution while running SLS. > --- > > Key: YARN-9782 > URL: https://issues.apache.org/jira/browse/YARN-9782 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9782.001.patch, YARN-9782.002.patch, > YARN-9782.003.patch, YARN-9782.004.patch > > > In SLS, we add nodes with random names and rack. DNS resolution of these > nodes takes around 2 seconds because it will timeout after that. This makes > the result of SLS unreliable and adds spikes.
[jira] [Commented] (YARN-9859) Refactor OpportunisticContainerAllocator
[ https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940955#comment-16940955 ] Abhishek Modi commented on YARN-9859: - Thanks [~elgoiri]. I have made all the suggested changes. Could you please review it? Thanks. > Refactor OpportunisticContainerAllocator > > > Key: YARN-9859 > URL: https://issues.apache.org/jira/browse/YARN-9859 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9859.001.patch, YARN-9859.002.patch, > YARN-9859.003.patch > > > Right now OpportunisticContainerAllocator is written mainly for Distributed > Scheduling and schedules Opportunistic containers on limited set of nodes. As > part of this jira, we are going to make OpportunisticContainerAllocator as an > abstract class and DistributedOpportunisticContainerAllocator as actual > implementation. This would be prerequisite for YARN-9697.
[jira] [Updated] (YARN-9859) Refactor OpportunisticContainerAllocator
[ https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9859: Attachment: YARN-9859.003.patch > Refactor OpportunisticContainerAllocator > > > Key: YARN-9859 > URL: https://issues.apache.org/jira/browse/YARN-9859 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9859.001.patch, YARN-9859.002.patch, > YARN-9859.003.patch > > > Right now OpportunisticContainerAllocator is written mainly for Distributed > Scheduling and schedules Opportunistic containers on limited set of nodes. As > part of this jira, we are going to make OpportunisticContainerAllocator as an > abstract class and DistributedOpportunisticContainerAllocator as actual > implementation. This would be prerequisite for YARN-9697.
[jira] [Updated] (YARN-9859) Refactor OpportunisticContainerAllocator
[ https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9859: Attachment: YARN-9859.002.patch > Refactor OpportunisticContainerAllocator > > > Key: YARN-9859 > URL: https://issues.apache.org/jira/browse/YARN-9859 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9859.001.patch, YARN-9859.002.patch > > > Right now OpportunisticContainerAllocator is written mainly for Distributed > Scheduling and schedules Opportunistic containers on limited set of nodes. As > part of this jira, we are going to make OpportunisticContainerAllocator as an > abstract class and DistributedOpportunisticContainerAllocator as actual > implementation. This would be prerequisite for YARN-9697.
[jira] [Commented] (YARN-9859) Refactor OpportunisticContainerAllocator
[ https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939113#comment-16939113 ] Abhishek Modi commented on YARN-9859: - Thanks [~elgoiri] for the review. Changed the title of the JIRA. {quote}we should tune the indentation for 237 and adding extra indents to the following lines of the constructor. {quote} I checked the indentation and it looks correct to me. Could you please check it again and let me know if I am missing something? Attached the v2 patch with the rest of the fixes. > Refactor OpportunisticContainerAllocator > > > Key: YARN-9859 > URL: https://issues.apache.org/jira/browse/YARN-9859 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9859.001.patch, YARN-9859.002.patch > > > Right now OpportunisticContainerAllocator is written mainly for Distributed > Scheduling and schedules Opportunistic containers on limited set of nodes. As > part of this jira, we are going to make OpportunisticContainerAllocator as an > abstract class and DistributedOpportunisticContainerAllocator as actual > implementation. This would be prerequisite for YARN-9697.
[jira] [Updated] (YARN-9859) Refactor OpportunisticContainerAllocator
[ https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9859: Summary: Refactor OpportunisticContainerAllocator (was: Code cleanup of OpportunisticContainerAllocator) > Refactor OpportunisticContainerAllocator > > > Key: YARN-9859 > URL: https://issues.apache.org/jira/browse/YARN-9859 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9859.001.patch > > > Right now OpportunisticContainerAllocator is written mainly for Distributed > Scheduling and schedules Opportunistic containers on limited set of nodes. As > part of this jira, we are going to make OpportunisticContainerAllocator as an > abstract class and DistributedOpportunisticContainerAllocator as actual > implementation. This would be prerequisite for YARN-9697.
[jira] [Commented] (YARN-9859) Code cleanup of OpportunisticContainerAllocator
[ https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938613#comment-16938613 ] Abhishek Modi commented on YARN-9859: - [~elgoiri] could you please review this whenever you get some time. Thanks. > Code cleanup of OpportunisticContainerAllocator > --- > > Key: YARN-9859 > URL: https://issues.apache.org/jira/browse/YARN-9859 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9859.001.patch > > > Right now OpportunisticContainerAllocator is written mainly for Distributed > Scheduling and schedules Opportunistic containers on limited set of nodes. As > part of this jira, we are going to make OpportunisticContainerAllocator as an > abstract class and DistributedOpportunisticContainerAllocator as actual > implementation. This would be prerequisite for YARN-9697.
[jira] [Updated] (YARN-9859) Code cleanup of OpportunisticContainerAllocator
[ https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9859: Attachment: YARN-9859.001.patch > Code cleanup of OpportunisticContainerAllocator > --- > > Key: YARN-9859 > URL: https://issues.apache.org/jira/browse/YARN-9859 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9859.001.patch > > > Right now OpportunisticContainerAllocator is written mainly for Distributed > Scheduling and schedules Opportunistic containers on limited set of nodes. As > part of this jira, we are going to make OpportunisticContainerAllocator as an > abstract class and DistributedOpportunisticContainerAllocator as actual > implementation. This would be prerequisite for YARN-9697.
[jira] [Created] (YARN-9859) Code cleanup of OpportunisticContainerAllocator
Abhishek Modi created YARN-9859: --- Summary: Code cleanup of OpportunisticContainerAllocator Key: YARN-9859 URL: https://issues.apache.org/jira/browse/YARN-9859 Project: Hadoop YARN Issue Type: Sub-task Reporter: Abhishek Modi Assignee: Abhishek Modi Right now OpportunisticContainerAllocator is written mainly for Distributed Scheduling and schedules Opportunistic containers on limited set of nodes. As part of this jira, we are going to make OpportunisticContainerAllocator as an abstract class and DistributedOpportunisticContainerAllocator as actual implementation. This would be prerequisite for YARN-9697.
[jira] [Created] (YARN-9853) Add number of paused containers in NodeInfo page.
Abhishek Modi created YARN-9853: --- Summary: Add number of paused containers in NodeInfo page. Key: YARN-9853 URL: https://issues.apache.org/jira/browse/YARN-9853 Project: Hadoop YARN Issue Type: Task Reporter: Abhishek Modi Assignee: Abhishek Modi
[jira] [Commented] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933658#comment-16933658 ] Abhishek Modi commented on YARN-9697: - [~elgoiri] could you please review the approach in the wip2 patch (https://issues.apache.org/jira/secure/attachment/12980716/YARN-9697.wip2.patch)? Thanks. > Efficient allocation of Opportunistic containers. > - > > Key: YARN-9697 > URL: https://issues.apache.org/jira/browse/YARN-9697 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9697.ut.patch, YARN-9697.ut2.patch, > YARN-9697.wip1.patch, YARN-9697.wip2.patch > > > In the current implementation, opportunistic containers are allocated based > on the number of queued opportunistic container information received in node > heartbeat. This information becomes stale as soon as more opportunistic > containers are allocated on that node. > Allocation of opportunistic containers happens on the same heartbeat in which > AM asks for the containers. When multiple applications request for > Opportunistic containers, containers might get allocated on the same set of > nodes as already allocated containers on the node are not considered while > serving requests from different applications. This can lead to uneven > allocation of Opportunistic containers across the cluster leading to > increased queuing time
[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9697: Attachment: YARN-9697.wip2.patch > Efficient allocation of Opportunistic containers. > - > > Key: YARN-9697 > URL: https://issues.apache.org/jira/browse/YARN-9697 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9697.ut.patch, YARN-9697.ut2.patch, > YARN-9697.wip1.patch, YARN-9697.wip2.patch > > > In the current implementation, opportunistic containers are allocated based > on the number of queued opportunistic container information received in node > heartbeat. This information becomes stale as soon as more opportunistic > containers are allocated on that node. > Allocation of opportunistic containers happens on the same heartbeat in which > AM asks for the containers. When multiple applications request for > Opportunistic containers, containers might get allocated on the same set of > nodes as already allocated containers on the node are not considered while > serving requests from different applications. This can lead to uneven > allocation of Opportunistic containers across the cluster leading to > increased queuing time
[jira] [Commented] (YARN-9782) Avoid DNS resolution while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933270#comment-16933270 ] Abhishek Modi commented on YARN-9782: - Thanks [~elgoiri] for the review. Filed YARN-9843 to make TestAMSimulator.testAMSimulator more resilient. > Avoid DNS resolution while running SLS. > --- > > Key: YARN-9782 > URL: https://issues.apache.org/jira/browse/YARN-9782 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9782.001.patch, YARN-9782.002.patch, > YARN-9782.003.patch > > > In SLS, we add nodes with random names and rack. DNS resolution of these > nodes takes around 2 seconds because it will timeout after that. This makes > the result of SLS unreliable and adds spikes.
[jira] [Created] (YARN-9843) Test TestAMSimulator.testAMSimulator fails intermittently.
Abhishek Modi created YARN-9843: --- Summary: Test TestAMSimulator.testAMSimulator fails intermittently. Key: YARN-9843 URL: https://issues.apache.org/jira/browse/YARN-9843 Project: Hadoop YARN Issue Type: Test Reporter: Abhishek Modi Assignee: Abhishek Modi Stack trace for failure:
java.lang.AssertionError: java.io.IOException: Unable to delete directory /testptch/hadoop/hadoop-tools/hadoop-sls/target/test-dir/output4038286622450859971/metrics.
    at org.junit.Assert.fail(Assert.java:88)
    at org.apache.hadoop.yarn.sls.appmaster.TestAMSimulator.deleteMetricOutputDir(TestAMSimulator.java:141)
    at org.apache.hadoop.yarn.sls.appmaster.TestAMSimulator.tearDown(TestAMSimulator.java:298)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.junit.runners.Suite.runChild(Suite.java:128)
    at org.junit.runners.Suite.runChild(Suite.java:27)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
    at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
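A minimal sketch of a more tolerant cleanup for a tearDown method like the one in the trace above. It assumes the failure is transient, for example files still being written into the metrics directory while it is being deleted. The report itself does not confirm the cause. The helper retries a recursive delete using java.nio.file. RetryingDelete and deleteWithRetries are hypothetical names, not the change made for YARN-9843.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not the YARN-9843 fix.
class RetryingDelete {
  // Recursively deletes dir, retrying up to maxAttempts times.
  // Returns true once dir no longer exists.
  static boolean deleteWithRetries(Path dir, int maxAttempts, long sleepMillis)
      throws InterruptedException {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        if (Files.exists(dir)) {
          List<Path> paths;
          try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order visits children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
          }
          for (Path p : paths) {
            Files.deleteIfExists(p);
          }
        }
        if (!Files.exists(dir)) {
          return true;
        }
      } catch (IOException | UncheckedIOException e) {
        // For example, a file appeared mid-delete (DirectoryNotEmptyException); retry.
      }
      if (attempt < maxAttempts) {
        Thread.sleep(sleepMillis);
      }
    }
    return !Files.exists(dir);
  }

  public static void main(String[] args) throws Exception {
    Path root = Files.createTempDirectory("sls-test");
    Path metrics = Files.createDirectories(root.resolve("output").resolve("metrics"));
    Files.write(metrics.resolve("metrics.csv"), "x".getBytes());
    System.out.println(deleteWithRetries(root, 3, 100)); // prints true
  }
}
```

Retrying only hides a genuine race if the writer stops before the attempts run out. Shutting down the metrics writer before tearDown deletes its output directory would be the more direct fix, if that turns out to be the cause.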
[jira] [Created] (YARN-9842) Port YARN-9608 DecommissioningNodesWatcher should get lists of running applications on node from RMNode to branch-3.0/branch-2
Abhishek Modi created YARN-9842: --- Summary: Port YARN-9608 DecommissioningNodesWatcher should get lists of running applications on node from RMNode to branch-3.0/branch-2 Key: YARN-9842 URL: https://issues.apache.org/jira/browse/YARN-9842 Project: Hadoop YARN Issue Type: Task Reporter: Abhishek Modi Assignee: Abhishek Modi
[jira] [Commented] (YARN-9794) RM crashes due to runtime errors in TimelineServiceV2Publisher
[ https://issues.apache.org/jira/browse/YARN-9794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929935#comment-16929935 ] Abhishek Modi commented on YARN-9794: - Thanks [~tarunparimi]. The latest patch looks good to me. Thanks [~Prabhu Joseph] for the additional review. Committed to trunk. > RM crashes due to runtime errors in TimelineServiceV2Publisher > -- > > Key: YARN-9794 > URL: https://issues.apache.org/jira/browse/YARN-9794 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Tarun Parimi >Assignee: Tarun Parimi >Priority: Major > Attachments: YARN-9794.001.patch, YARN-9794.002.patch > > > The RM crashes during startup due to errors while putting an entity in > TimelineServiceV2Publisher. > {code:java} > 2019-08-28 09:35:45,273 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: > Error in dispatcher thread > java.lang.RuntimeException: java.lang.IllegalArgumentException: > org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: > CodedInputStream encountered an embedded string or message which claimed to > have negative size > . 
> at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:200) > at > org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:269) > at > org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:437) > at > org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:312) > at > org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:597) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:834) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:732) > at > org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:281) > at > org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:236) > at > org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:321) > at > org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:285) > at > org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.flush(TypedBufferedMutator.java:66) > at > org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.flush(HBaseTimelineWriterImpl.java:566) > at > org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector.flushBufferedTimelineEntities(TimelineCollector.java:173) > at > org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector.putEntities(TimelineCollector.java:150) > at > org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:459) > at > org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.access$100(TimelineServiceV2Publisher.java:73) > at > org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:494) > at > 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:483) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.IllegalArgumentException: > org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: > CodedInputStream encountered an embedded string or message which claimed to > have negative size. > at > org.apache.hbase.thirdparty.com.google.protobuf.CodedInputStream.newInstance(CodedInputStream.java:117) > {code}
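The trace shows a `RuntimeException` from the HBase writer escaping `TimelineV2EventHandler.handle` into the `AsyncDispatcher` thread, which reports it as a FATAL error in the dispatcher thread. One way to contain such failures, sketched below, is to catch runtime errors around the backend write inside the handler so that a single bad entity is logged and dropped instead of propagating into the dispatcher. This is an illustration only, not necessarily what the committed patch does; `GuardedPublisher` and its `Consumer` backend are hypothetical stand-ins for the publisher and its `putEntity` call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.logging.Logger;

/**
 * Hypothetical sketch: wraps a backend write so that a RuntimeException thrown
 * while storing one event is logged and counted instead of escaping into the
 * (single) dispatcher thread that calls handle().
 */
class GuardedPublisher<E> {
  private static final Logger LOG = Logger.getLogger(GuardedPublisher.class.getName());

  private final Consumer<E> backendWrite; // stands in for putEntity(...)
  private long failedEvents = 0;          // only touched by the dispatcher thread

  GuardedPublisher(Consumer<E> backendWrite) {
    this.backendWrite = backendWrite;
  }

  void handle(E event) {
    try {
      backendWrite.accept(event);
    } catch (RuntimeException e) {
      failedEvents++;
      LOG.warning("Dropping event " + event + " after write failure: " + e);
    }
  }

  long getFailedEvents() {
    return failedEvents;
  }

  public static void main(String[] args) {
    List<String> stored = new ArrayList<>();
    GuardedPublisher<String> publisher = new GuardedPublisher<>(event -> {
      if (event.startsWith("bad")) {
        throw new IllegalArgumentException("message claimed to have negative size");
      }
      stored.add(event);
    });
    publisher.handle("app-1 started");
    publisher.handle("bad entity");
    publisher.handle("app-1 finished");
    System.out.println(stored + ", failed=" + publisher.getFailedEvents());
  }
}
```

The trade-off is that a failed entity is lost rather than retried; the failure counter makes that loss visible instead of silent.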
[jira] [Updated] (YARN-9816) EntityGroupFSTimelineStore#scanActiveLogs fails when undesired files are present under /ats/active.
[ https://issues.apache.org/jira/browse/YARN-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9816: Summary: EntityGroupFSTimelineStore#scanActiveLogs fails when undesired files are present under /ats/active. (was: EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError) > EntityGroupFSTimelineStore#scanActiveLogs fails when undesired files are > present under /ats/active. > --- > > Key: YARN-9816 > URL: https://issues.apache.org/jira/browse/YARN-9816 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 3.1.0, 3.2.0, 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9816-001.patch > > > EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError. > This happens when a file is present under /ats/active. > {code} > [hdfs@node2 yarn]$ hadoop fs -ls /ats/active > Found 1 items > -rw-r--r-- 3 hdfs hadoop 0 2019-09-06 16:34 > /ats/active/.distcp.tmp.attempt_155759136_39768_m_01_0 > {code} > Error Message: > {code:java} > java.lang.StackOverflowError > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:632) > at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185) > at com.sun.proxy.$Proxy15.getListing(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2143) > at > org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1076) > at > 
org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1088) > at > org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1059) > at > org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1038) > at > org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1034) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1046) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.list(EntityGroupFSTimelineStore.java:398) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:368) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > 
org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383
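The repeated `scanActiveLogs` frames at line 383 point to unbounded recursion. A plausible mechanism, consistent with the report but not confirmed by it, is that listing a plain file returns that file itself, so a scanner that recurses into every entry that is not an application directory keeps descending into the same file. The sketch below models a scan that checks the entry type first and only recurses into directories, with a depth bound as an extra guard. It uses local `java.nio.file` paths instead of the Hadoop `FileSystem` API, and `ActiveDirScanner`, the `application_` name check, and `maxDepth` are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: find application directories under an "active" root. */
class ActiveDirScanner {

  /** Returns application directories under root, descending at most maxDepth levels. */
  static List<Path> scan(Path root, int maxDepth) throws IOException {
    List<Path> found = new ArrayList<>();
    scan(root, 0, maxDepth, found);
    return found;
  }

  private static void scan(Path dir, int depth, int maxDepth, List<Path> found)
      throws IOException {
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
      for (Path entry : entries) {
        if (!Files.isDirectory(entry)) {
          // Stray files such as .distcp.tmp.* are skipped, never recursed into.
          continue;
        }
        if (entry.getFileName().toString().startsWith("application_")) {
          found.add(entry);
        } else if (depth < maxDepth) {
          // e.g. a per-user directory that may contain application directories
          scan(entry, depth + 1, maxDepth, found);
        }
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("ats-active");
    Files.createDirectories(root.resolve("application_1_0001"));
    Files.createDirectories(root.resolve("user1").resolve("application_1_0002"));
    Files.createFile(root.resolve(".distcp.tmp.attempt_0"));
    System.out.println(scan(root, 2));
  }
}
```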
[jira] [Commented] (YARN-9816) EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError
[ https://issues.apache.org/jira/browse/YARN-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928231#comment-16928231 ] Abhishek Modi commented on YARN-9816: - Sure, committing it shortly. Thanks. > EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError > --- > > Key: YARN-9816 > URL: https://issues.apache.org/jira/browse/YARN-9816 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 3.1.0, 3.2.0, 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9816-001.patch > > > EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError. > This happens when a file is present under /ats/active. > One of our users tried to distcp the hdfs://ats/active dir. The DistCp job created the > temp file .distcp.tmp.attempt_155759136_39768_m_
[jira] [Created] (YARN-9828) Add log line for app submission in RouterWebServices.
Abhishek Modi created YARN-9828: --- Summary: Add log line for app submission in RouterWebServices. Key: YARN-9828 URL: https://issues.apache.org/jira/browse/YARN-9828 Project: Hadoop YARN Issue Type: Task Reporter: Abhishek Modi Assignee: Abhishek Modi
[jira] [Commented] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.
[ https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928147#comment-16928147 ] Abhishek Modi commented on YARN-9819: - Thanks [~elgoiri] for the review. Committed to trunk. > Make TestOpportunisticContainerAllocatorAMService more resilient. > - > > Key: YARN-9819 > URL: https://issues.apache.org/jira/browse/YARN-9819 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9819.001.patch, YARN-9819.002.patch, > YARN-9819.003.patch > > > Currently, TestOpportunisticContainerAllocatorAMService sets the > opportunistic container status directly in RMNode, but that status can be changed by a > subsequent NM heartbeat. The correct way is to send it through the NM heartbeat.
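The reasoning in the description can be illustrated with a toy model (hypothetical, not YARN code) in which, as an assumption for illustration, each NM heartbeat replaces the RM-side view of container states with what the NM reports. A value written directly into that view by a test is lost on the next heartbeat, while a value delivered through a heartbeat persists.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model, not YARN code: assumes each NM heartbeat replaces the RM-side
 * view of container states with the states the NM reports.
 */
class ToyNodeView {
  private Map<String, String> containerStates = new HashMap<>();

  /** What a test does when it writes status straight into the RM-side view. */
  void setDirectly(String containerId, String state) {
    containerStates.put(containerId, state);
  }

  /** Heartbeat path: the NM's report becomes the RM-side view. */
  void onHeartbeat(Map<String, String> reportedByNm) {
    containerStates = new HashMap<>(reportedByNm);
  }

  String stateOf(String containerId) {
    return containerStates.get(containerId);
  }

  public static void main(String[] args) {
    ToyNodeView view = new ToyNodeView();
    view.setDirectly("container_01", "RUNNING");
    view.onHeartbeat(new HashMap<>()); // the NM never reported container_01
    System.out.println("after direct write + heartbeat: " + view.stateOf("container_01"));

    Map<String, String> report = new HashMap<>();
    report.put("container_01", "RUNNING");
    view.onHeartbeat(report);
    System.out.println("after heartbeat with status: " + view.stateOf("container_01"));
  }
}
```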
[jira] [Commented] (YARN-8972) [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size
[ https://issues.apache.org/jira/browse/YARN-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927813#comment-16927813 ] Abhishek Modi commented on YARN-8972: - [~giovanni.fumarola], are you still working on it? Thanks. > [Router] Add support to prevent DoS attack over ApplicationSubmissionContext > size > - > > Key: YARN-8972 > URL: https://issues.apache.org/jira/browse/YARN-8972 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8972.v1.patch, YARN-8972.v2.patch, > YARN-8972.v3.patch, YARN-8972.v4.patch, YARN-8972.v5.patch > > > This jira tracks the effort to add a new interceptor in the Router to prevent > users from submitting applications with an oversized ASC. > This keeps the YARN cluster from failing over.
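The interceptor described above boils down to a size check before a submission is forwarded. A minimal sketch, assuming the submission context is available as serialized bytes and that the limit comes from configuration; `SubmissionSizeGuard` and `maxBytes` are hypothetical and do not correspond to the attached patches or to an existing Router interceptor.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a Router-side check on submission-context size. */
class SubmissionSizeGuard {
  private final int maxBytes;

  SubmissionSizeGuard(int maxBytes) {
    if (maxBytes <= 0) {
      throw new IllegalArgumentException("maxBytes must be positive");
    }
    this.maxBytes = maxBytes;
  }

  /** Rejects the submission before it is forwarded if its context is too large. */
  void check(String appId, byte[] serializedContext) {
    if (serializedContext.length > maxBytes) {
      throw new IllegalArgumentException(String.format(
          "Rejecting %s: submission context is %d bytes, limit is %d bytes",
          appId, serializedContext.length, maxBytes));
    }
  }

  public static void main(String[] args) {
    SubmissionSizeGuard guard = new SubmissionSizeGuard(1024);
    guard.check("app-small", "small context".getBytes(StandardCharsets.UTF_8));
    try {
      guard.check("app-huge", new byte[4096]);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Doing the check in the Router means an oversized request is refused before it ever reaches an RM.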
[jira] [Created] (YARN-9827) Fix Http Response code in GenericExceptionHandler.
Abhishek Modi created YARN-9827: --- Summary: Fix Http Response code in GenericExceptionHandler. Key: YARN-9827 URL: https://issues.apache.org/jira/browse/YARN-9827 Project: Hadoop YARN Issue Type: Bug Reporter: Abhishek Modi Assignee: Abhishek Modi GenericExceptionHandler should respond with SERVICE_UNAVAILABLE instead of INTERNAL_SERVER_ERROR when it handles a connection exception or a service-unavailable exception.
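The intended behavior is an exception-to-status mapping. JAX-RS `Response.Status` provides `SERVICE_UNAVAILABLE` (503) and `INTERNAL_SERVER_ERROR` (500), but the sketch below uses plain integers to stay dependency-free. Walking the cause chain is an assumption of this sketch (connection failures often arrive wrapped), and `ExceptionStatusMapper` and its nested `ServiceUnavailableException` are hypothetical, not the actual `GenericExceptionHandler` code.

```java
import java.io.IOException;
import java.net.ConnectException;

/** Hypothetical sketch: choose an HTTP status for an exception thrown by a REST call. */
class ExceptionStatusMapper {
  static final int INTERNAL_SERVER_ERROR = 500;
  static final int SERVICE_UNAVAILABLE = 503;
  private static final int MAX_CAUSE_DEPTH = 16; // guards against cause cycles

  /** Hypothetical exception type signalling that a backend is temporarily unavailable. */
  static class ServiceUnavailableException extends IOException {
    ServiceUnavailableException(String message) {
      super(message);
    }
  }

  /** Returns 503 if the exception or any of its causes indicates unavailability, else 500. */
  static int toStatus(Throwable t) {
    Throwable cur = t;
    for (int depth = 0; cur != null && depth < MAX_CAUSE_DEPTH; depth++) {
      if (cur instanceof ConnectException || cur instanceof ServiceUnavailableException) {
        return SERVICE_UNAVAILABLE;
      }
      cur = cur.getCause();
    }
    return INTERNAL_SERVER_ERROR;
  }

  public static void main(String[] args) {
    System.out.println(toStatus(new RuntimeException(new ConnectException("Connection refused"))));
    System.out.println(toStatus(new IllegalStateException("unexpected")));
  }
}
```

Returning 503 tells clients the failure is likely temporary and worth retrying, whereas 500 suggests a server-side bug.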
[jira] [Commented] (YARN-9782) Avoid DNS resolution while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926479#comment-16926479 ] Abhishek Modi commented on YARN-9782: - The test failure is not related to this patch; it happens because a directory cannot be deleted at the end of the test. [~elgoiri], could you please review the latest patch? Thanks. > Avoid DNS resolution while running SLS. > --- > > Key: YARN-9782 > URL: https://issues.apache.org/jira/browse/YARN-9782 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9782.001.patch, YARN-9782.002.patch, > YARN-9782.003.patch > > > In SLS, we add nodes with random names and racks. DNS resolution of each of these > nodes takes around 2 seconds, because the lookup only gives up when it times out. This makes > SLS results unreliable and adds latency spikes.
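The description attributes the spikes to DNS lookups of made-up hostnames that only fail after a timeout. Since the simulator itself chooses each node's name and rack, one way to avoid the lookup entirely is to answer rack queries from an in-memory table. The sketch below is hypothetical: `StaticRackResolver` does not implement Hadoop's `DNSToSwitchMapping` interface, though its `resolve(List<String>)` shape loosely mirrors that interface's resolve method, and `/default-rack` is an assumed fallback name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: resolve simulated node names to racks from a table the
 * simulator already knows, so no DNS lookup is ever attempted.
 */
class StaticRackResolver {
  static final String DEFAULT_RACK = "/default-rack";
  private final Map<String, String> hostToRack = new HashMap<>();

  /** Records the rack the simulator assigned to a generated node name. */
  void register(String host, String rack) {
    hostToRack.put(host, rack);
  }

  /** Pure map lookups: unknown names fall back to DEFAULT_RACK instead of hitting DNS. */
  List<String> resolve(List<String> names) {
    List<String> racks = new ArrayList<>(names.size());
    for (String name : names) {
      racks.add(hostToRack.getOrDefault(name, DEFAULT_RACK));
    }
    return racks;
  }

  public static void main(String[] args) {
    StaticRackResolver resolver = new StaticRackResolver();
    resolver.register("node-17", "/rack3");
    List<String> names = new ArrayList<>();
    names.add("node-17");
    names.add("unknown-node");
    System.out.println(resolver.resolve(names));
  }
}
```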
[jira] [Updated] (YARN-9782) Avoid DNS resolution while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9782: Attachment: YARN-9782.003.patch
[jira] [Commented] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.
[ https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925922#comment-16925922 ] Abhishek Modi commented on YARN-9819: - Thanks [~elgoiri] for the review. Attached the v3 patch with javadocs for all public functions. The private functions introduced in TestOpportunisticContainerAllocatorAMService are one-liners and fairly self-explanatory. Please let me know if you think they need documentation too.
[jira] [Updated] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.
[ https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9819: Attachment: YARN-9819.003.patch
[jira] [Commented] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down
[ https://issues.apache.org/jira/browse/YARN-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925490#comment-16925490 ] Abhishek Modi commented on YARN-9821: - Sure [~rohithsharma]. I am leaving this Jira as unresolved and you can mark it as resolved after you backport it to 3.2 branches. Thanks. > NM hangs at serviceStop when ATSV2 Backend Hbase is Down > - > > Key: YARN-9821 > URL: https://issues.apache.org/jira/browse/YARN-9821 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.2.0, 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9821-001.patch, YARN-9821-002.patch > > > NM hangs at serviceStop when ATSV2 Backend Hbase is Down. > {code} > "Thread-197" #302 prio=5 os_prio=0 tid=0x7f5f389ba000 nid=0x631d waiting > for monitor entry [0x7f5f1f29b000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:249) > - waiting to lock <0x0006c834d148> (a > org.apache.hadoop.hbase.client.BufferedMutatorImpl) > at > org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.close(TypedBufferedMutator.java:62) > at > org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.serviceStop(HBaseTimelineWriterImpl.java:636) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05808> (a java.lang.Object) > at > org.apache.hadoop.service.AbstractService.close(AbstractService.java:247) > at > org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.serviceStop(TimelineCollectorManager.java:244) > at > org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.serviceStop(NodeTimelineCollectorManager.java:164) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05890> (a java.lang.Object) 
> at > org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.serviceStop(PerNodeTimelineCollectorsAuxService.java:113) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c058f8> (a java.lang.Object) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStop(AuxServices.java:330) > - locked <0x0006c7c23400> (a java.util.Collections$SynchronizedMap) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c059a8> (a java.lang.Object) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStop(ContainerManagerImpl.java:720) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05a98> (a java.lang.Object) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:526) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05c88> (a java.lang.Object) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager$1.run(NodeManager.java:552) > > > "qtp183259297-76" #76 daemon prio=5 os_prio=0 tid=0x7f5f567ed000 > nid=0x5fb7 in Object.wait() [0x7f5f23ad7000] 
>java.lang.Thread.State: TIMED_WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:460) > at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348) > at > org.apache.hadoop.hbase.client.ResultBoundedCompletionService.pollForSpecificCompletedTask(ResultBoundedCompletionService.java:258) > - locked <0x000784ee8220> (a > [Lorg.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture;) > at
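The dump shows the stopping thread BLOCKED on the `BufferedMutatorImpl` monitor inside `close()` while another thread is still waiting inside an HBase client call, so `serviceStop` never returns. One generic way to keep shutdown from hanging, assuming it is acceptable to abandon a stuck close, is to run the close on a separate daemon thread and stop waiting after a timeout. This is not necessarily what the YARN-9821 patch does; `BoundedStop` and `closeWithTimeout` are hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch: run a possibly-blocking close() with an upper bound on waiting. */
class BoundedStop {

  /**
   * Runs closeAction on a daemon thread and waits up to timeoutMillis.
   * Returns false if the wait timed out, true if the action completed
   * (normally or by throwing).
   */
  static boolean closeWithTimeout(Runnable closeAction, long timeoutMillis)
      throws InterruptedException {
    ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "bounded-close");
      t.setDaemon(true); // a stuck close must not keep the JVM from exiting
      return t;
    });
    try {
      Future<?> result = exec.submit(closeAction);
      try {
        result.get(timeoutMillis, TimeUnit.MILLISECONDS);
      } catch (ExecutionException e) {
        System.err.println("close failed: " + e.getCause());
      } catch (TimeoutException e) {
        // Interrupt is best effort: a thread BLOCKED on a monitor will not react to it.
        result.cancel(true);
        return false;
      }
      return true;
    } finally {
      exec.shutdownNow();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Object mutatorLock = new Object();
    boolean completed;
    synchronized (mutatorLock) {
      // Simulates close() waiting for a lock held by a stuck flush.
      completed = closeWithTimeout(() -> {
        synchronized (mutatorLock) {
          // resources would be released here
        }
      }, 200);
    }
    System.out.println("completed in time: " + completed);
  }
}
```

This bounds how long shutdown waits, not what the abandoned thread holds: if close never finishes, its resources stay allocated until the process exits.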
[jira] [Commented] (YARN-9816) EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError
[ https://issues.apache.org/jira/browse/YARN-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925484#comment-16925484 ] Abhishek Modi commented on YARN-9816: - Thanks [~Prabhu Joseph]. The changes look good to me; I will commit shortly.
[jira] [Commented] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down
[ https://issues.apache.org/jira/browse/YARN-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925481#comment-16925481 ] Abhishek Modi commented on YARN-9821: - Thanks [~Prabhu Joseph] for the patch and [~rohithsharma] for the additional review. I have committed it to trunk. [~rohithsharma], should we commit it to the 3.2 and 3.1 branches as well?
[jira] [Commented] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down
[ https://issues.apache.org/jira/browse/YARN-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925331#comment-16925331 ] Abhishek Modi commented on YARN-9821: - Thanks [~Prabhu Joseph] for the patch. Some minor comments: # Can we rename isHbaseUp => isStorageUp to make it more generic. # Can we log the exception too. Apart from these minor comments, it looks good to me. > NM hangs at serviceStop when ATSV2 Backend Hbase is Down > - > > Key: YARN-9821 > URL: https://issues.apache.org/jira/browse/YARN-9821 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.2.0, 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9821-001.patch > > > NM hangs at serviceStop when ATSV2 Backend Hbase is Down. > {code} > "Thread-197" #302 prio=5 os_prio=0 tid=0x7f5f389ba000 nid=0x631d waiting > for monitor entry [0x7f5f1f29b000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:249) > - waiting to lock <0x0006c834d148> (a > org.apache.hadoop.hbase.client.BufferedMutatorImpl) > at > org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.close(TypedBufferedMutator.java:62) > at > org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.serviceStop(HBaseTimelineWriterImpl.java:636) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05808> (a java.lang.Object) > at > org.apache.hadoop.service.AbstractService.close(AbstractService.java:247) > at > org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.serviceStop(TimelineCollectorManager.java:244) > at > org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.serviceStop(NodeTimelineCollectorManager.java:164) > at > 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05890> (a java.lang.Object) > at > org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.serviceStop(PerNodeTimelineCollectorsAuxService.java:113) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c058f8> (a java.lang.Object) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStop(AuxServices.java:330) > - locked <0x0006c7c23400> (a java.util.Collections$SynchronizedMap) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c059a8> (a java.lang.Object) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStop(ContainerManagerImpl.java:720) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05a98> (a java.lang.Object) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:526) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05c88> (a java.lang.Object) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager$1.run(NodeManager.java:552) > > > 
"qtp183259297-76" #76 daemon prio=5 os_prio=0 tid=0x7f5f567ed000 > nid=0x5fb7 in Object.wait() [0x7f5f23ad7000] >java.lang.Thread.State: TIMED_WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:460) > at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348) > at > org.apache.hadoop.hbase.client.ResultBoundedCompletionService.pollForSpecificCompletedTask(ResultBoundedCompletionService.java:258) > - locked <0x000784ee8220> (a > [Lorg.apache.hadoop.hbase.client.Resul
[jira] [Commented] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.
[ https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925081#comment-16925081 ] Abhishek Modi commented on YARN-9819: - [~elgoiri], could you please review it? The unit test failure is not related to the patch. > Make TestOpportunisticContainerAllocatorAMService more resilient. > - > > Key: YARN-9819 > URL: https://issues.apache.org/jira/browse/YARN-9819 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9819.001.patch, YARN-9819.002.patch > > > Currently, TestOpportunisticContainerAllocatorAMService tries to set the > Opportunistic container status directly in RMNode, but that can be updated by > the NM heartbeat. The correct way would be to send it through the NM heartbeat. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.
[ https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9819: Attachment: YARN-9819.002.patch > Make TestOpportunisticContainerAllocatorAMService more resilient. > - > > Key: YARN-9819 > URL: https://issues.apache.org/jira/browse/YARN-9819 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9819.001.patch, YARN-9819.002.patch > > > Currently, TestOpportunisticContainerAllocatorAMService tries to set the > Opportunistic container status directly in RMNode, but that can be updated by > the NM heartbeat. The correct way would be to send it through the NM heartbeat.
[jira] [Commented] (YARN-9784) org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue is flaky
[ https://issues.apache.org/jira/browse/YARN-9784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924852#comment-16924852 ] Abhishek Modi commented on YARN-9784: - Thanks [~kmarton] for the patch. LGTM. Thanks [~sunilg] and [~adam.antal] for additional reviews. Committed to trunk. > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue > is flaky > --- > > Key: YARN-9784 > URL: https://issues.apache.org/jira/browse/YARN-9784 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 3.3.0 >Reporter: Julia Kinga Marton >Assignee: Julia Kinga Marton >Priority: Major > Attachments: YARN-9784.001.patch > > > There are some test cases in TestLeafQueue which are failing intermittently. > From 100 runs, there were 16 failures. > Some failure examples are the following ones: > {code:java} > 2019-08-26 13:18:13 [ERROR] Errors: > 2019-08-26 13:18:13 [ERROR] TestLeafQueue.setUp:144->setUpInternal:221 > WrongTypeOfReturnValue > 2019-08-26 13:18:13 YarnConfigu... > 2019-08-26 13:18:13 [ERROR] TestLeafQueue.setUp:144->setUpInternal:221 > WrongTypeOfReturnValue > 2019-08-26 13:18:13 YarnConfigu... > 2019-08-26 13:18:13 [INFO] > 2019-08-26 13:18:13 [ERROR] Tests run: 36, Failures: 0, Errors: 2, Skipped: 0 > {code} > {code:java} > 2019-08-26 13:18:09 [ERROR] Failures: > 2019-08-26 13:18:09 [ERROR] TestLeafQueue.testHeadroomWithMaxCap:1373 > expected:<2048> but was:<0> > 2019-08-26 13:18:09 [INFO] > 2019-08-26 13:18:09 [ERROR] Tests run: 36, Failures: 1, Errors: 0, Skipped: 0 > {code} > {code:java} > 2019-08-26 13:18:18 [ERROR] Errors: > 2019-08-26 13:18:18 [ERROR] TestLeafQueue.setUp:144->setUpInternal:221 > WrongTypeOfReturnValue > 2019-08-26 13:18:18 YarnConfigu... > 2019-08-26 13:18:18 [ERROR] TestLeafQueue.testHeadroomWithMaxCap:1307 ? > ClassCast org.apache.hadoop.yarn.c... 
> 2019-08-26 13:18:18 [INFO] > 2019-08-26 13:18:18 [ERROR] Tests run: 36, Failures: 0, Errors: 2, Skipped: 0 > {code} > {code:java} > 2019-08-26 13:18:10 [ERROR] Failures: > 2019-08-26 13:18:10 [ERROR] TestLeafQueue.testDRFUserLimits:847 Verify > user_0 got resources > 2019-08-26 13:18:10 [INFO] > 2019-08-26 13:18:10 [ERROR] Tests run: 36, Failures: 1, Errors: 0, Skipped: 0 > {code}
[jira] [Updated] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.
[ https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9819: Attachment: YARN-9819.001.patch > Make TestOpportunisticContainerAllocatorAMService more resilient. > - > > Key: YARN-9819 > URL: https://issues.apache.org/jira/browse/YARN-9819 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9819.001.patch > > > Currently, TestOpportunisticContainerAllocatorAMService tries to set the > Opportunistic container status directly in RMNode, but that can be updated by > the NM heartbeat. The correct way would be to send it through the NM heartbeat.
[jira] [Created] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.
Abhishek Modi created YARN-9819: --- Summary: Make TestOpportunisticContainerAllocatorAMService more resilient. Key: YARN-9819 URL: https://issues.apache.org/jira/browse/YARN-9819 Project: Hadoop YARN Issue Type: Sub-task Reporter: Abhishek Modi Assignee: Abhishek Modi Currently, TestOpportunisticContainerAllocatorAMService tries to set the Opportunistic container status directly in RMNode, but that can be updated by the NM heartbeat. The correct way would be to send it through the NM heartbeat.
[jira] [Commented] (YARN-9812) mvn javadoc:javadoc fails in hadoop-sls
[ https://issues.apache.org/jira/browse/YARN-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924757#comment-16924757 ] Abhishek Modi commented on YARN-9812: - Thanks [~elgoiri] for the review. Committed to trunk. > mvn javadoc:javadoc fails in hadoop-sls > --- > > Key: YARN-9812 > URL: https://issues.apache.org/jira/browse/YARN-9812 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Reporter: Akira Ajisaka >Assignee: Abhishek Modi >Priority: Major > Labels: newbie > Fix For: 3.3.0 > > Attachments: YARN-9812.001.patch, YARN-9812.002.patch > > > {noformat} > [ERROR] > hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:57: > error: bad use of '>' > [ERROR] * pending -> requests which are NOT yet sent to RM. > [ERROR] ^ > [ERROR] > hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:58: > error: bad use of '>' > [ERROR] * scheduled -> requests which are sent to RM but not yet assigned. > [ERROR] ^ > [ERROR] > hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:59: > error: bad use of '>' > [ERROR] * assigned -> requests which are assigned to a container. > [ERROR] ^ > [ERROR] > hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:60: > error: bad use of '>' > [ERROR] * completed -> request corresponding to which container has > completed. > [ERROR] ^ > {noformat}
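The doclint errors above are triggered by a bare '>' inside Javadoc HTML. Escaping the arrow, either as -&gt; or with {@literal ->}, is a common way to keep the wording. The sketch below shows both forms on a hypothetical RequestState enum that mirrors the four states in the error output. It is an illustration only, not the committed YARN-9812 change.

```java
/**
 * States of a simulated container request (hypothetical enum for illustration).
 * <ul>
 *   <li>pending {@literal ->} requests which are NOT yet sent to RM.</li>
 *   <li>scheduled {@literal ->} requests which are sent to RM but not yet assigned.</li>
 *   <li>assigned {@literal ->} requests which are assigned to a container.</li>
 *   <li>completed -&gt; requests whose container has completed.</li>
 * </ul>
 */
enum RequestState {
  PENDING, SCHEDULED, ASSIGNED, COMPLETED;

  /** Returns the next state in the lifecycle; COMPLETED is terminal. */
  RequestState next() {
    return this == COMPLETED ? COMPLETED : values()[ordinal() + 1];
  }
}
```

Re-running mvn javadoc:javadoc on hadoop-sls is the way to confirm that a fix of this kind clears the errors.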
[jira] [Commented] (YARN-7604) Fix some minor typos in the opportunistic container logging
[ https://issues.apache.org/jira/browse/YARN-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924747#comment-16924747 ] Abhishek Modi commented on YARN-7604: - Thanks [~cheersyang] for the patch. Could you please move these log lines to the new log4j format? Thanks. > Fix some minor typos in the opportunistic container logging > --- > > Key: YARN-7604 > URL: https://issues.apache.org/jira/browse/YARN-7604 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.9.0 >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Trivial > Attachments: YARN-7604.01.patch > > > Fix some minor text issues.
[jira] [Updated] (YARN-9812) mvn javadoc:javadoc fails in hadoop-sls
[ https://issues.apache.org/jira/browse/YARN-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9812: Attachment: YARN-9812.002.patch > mvn javadoc:javadoc fails in hadoop-sls > --- > > Key: YARN-9812 > URL: https://issues.apache.org/jira/browse/YARN-9812 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Reporter: Akira Ajisaka >Assignee: Abhishek Modi >Priority: Major > Labels: newbie > Attachments: YARN-9812.001.patch, YARN-9812.002.patch > > > {noformat} > [ERROR] > hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:57: > error: bad use of '>' > [ERROR] * pending -> requests which are NOT yet sent to RM. > [ERROR] ^ > [ERROR] > hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:58: > error: bad use of '>' > [ERROR] * scheduled -> requests which are sent to RM but not yet assigned. > [ERROR] ^ > [ERROR] > hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:59: > error: bad use of '>' > [ERROR] * assigned -> requests which are assigned to a container. > [ERROR] ^ > [ERROR] > hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:60: > error: bad use of '>' > [ERROR] * completed -> request corresponding to which container has > completed. > [ERROR] ^ > {noformat}
[jira] [Commented] (YARN-9782) Avoid DNS resolution while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924455#comment-16924455 ] Abhishek Modi commented on YARN-9782: - [~elgoiri], I found a potential issue with this unit test. Because it sets Java security properties, those settings would persist for all subsequent unit tests, since they all run within the same Java process. One way to avoid that is to run the SLS unit tests in a separate Java process, but that will increase the test runtime. The second option is to skip the unit test for this change. Since it's a very small change behind a config flag, would it be possible to skip the unit test? [~elgoiri] [~subru], could you please provide some suggestions here? Thanks. > Avoid DNS resolution while running SLS. > --- > > Key: YARN-9782 > URL: https://issues.apache.org/jira/browse/YARN-9782 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9782.001.patch, YARN-9782.002.patch > > > In SLS, we add nodes with random names and racks. DNS resolution of these > nodes takes around 2 seconds, because the lookup only times out after that. This makes > the SLS results unreliable and adds spikes.
[jira] [Commented] (YARN-9812) mvn javadoc:javadoc fails in hadoop-sls
[ https://issues.apache.org/jira/browse/YARN-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924361#comment-16924361 ] Abhishek Modi commented on YARN-9812: - [~aajisaka] [~elgoiri], could you please review it? Thanks. > mvn javadoc:javadoc fails in hadoop-sls > --- > > Key: YARN-9812 > URL: https://issues.apache.org/jira/browse/YARN-9812 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Reporter: Akira Ajisaka >Assignee: Abhishek Modi >Priority: Major > Labels: newbie > Attachments: YARN-9812.001.patch > > > {noformat} > [ERROR] > hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:57: > error: bad use of '>' > [ERROR] * pending -> requests which are NOT yet sent to RM. > [ERROR] ^ > [ERROR] > hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:58: > error: bad use of '>' > [ERROR] * scheduled -> requests which are sent to RM but not yet assigned. > [ERROR] ^ > [ERROR] > hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:59: > error: bad use of '>' > [ERROR] * assigned -> requests which are assigned to a container. > [ERROR] ^ > [ERROR] > hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:60: > error: bad use of '>' > [ERROR] * completed -> request corresponding to which container has > completed. > [ERROR] ^ > {noformat}
[jira] [Commented] (YARN-9782) Avoid DNS resolution while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924327#comment-16924327 ] Abhishek Modi commented on YARN-9782: - Thanks [~elgoiri] for the review. Updated the patch. > Avoid DNS resolution while running SLS. > --- > > Key: YARN-9782 > URL: https://issues.apache.org/jira/browse/YARN-9782 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9782.001.patch, YARN-9782.002.patch > > > In SLS, we add nodes with random names and racks. DNS resolution of these > nodes takes around 2 seconds, because the lookup only times out after that. This makes > the SLS results unreliable and adds spikes.
[jira] [Commented] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924326#comment-16924326 ] Abhishek Modi commented on YARN-9697: - [~elgoiri], could you please review the approach taken in the POC patch? If it looks good to you, I can clean it up and add more UTs. Thanks. > Efficient allocation of Opportunistic containers. > - > > Key: YARN-9697 > URL: https://issues.apache.org/jira/browse/YARN-9697 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9697.ut.patch, YARN-9697.ut2.patch, > YARN-9697.wip1.patch > > > In the current implementation, opportunistic containers are allocated based > on the queued opportunistic container counts reported in the node > heartbeat. This information becomes stale as soon as more opportunistic > containers are allocated on that node. > Allocation of opportunistic containers happens on the same heartbeat in which > the AM asks for the containers. When multiple applications request > Opportunistic containers, containers might get allocated on the same set of > nodes, as containers already allocated on a node are not considered while > serving requests from different applications. This can lead to uneven > allocation of Opportunistic containers across the cluster, leading to > increased queuing time.
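The staleness problem described in YARN-9697 can be made concrete with a small example. The following is a minimal sketch of one way to account for allocations made since the last heartbeat. It is not the approach in the wip1 patch, whose details are not shown here. The sketch keeps a per-node count of containers handed out locally, adds that count to the heartbeat-reported queue length, and always picks the node with the smallest estimate. LeastQueuedAllocator, allocate and onHeartbeat are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: least-queued placement that counts allocations made
// since each node's last heartbeat, so later requests see up-to-date load.
class LeastQueuedAllocator {
  // Queue length last reported by each node (sorted so ties break by name).
  private final Map<String, Integer> reportedQueue;
  // Containers allocated to each node since its last heartbeat.
  private final Map<String, Integer> allocatedSinceHeartbeat = new HashMap<>();

  LeastQueuedAllocator(Map<String, Integer> reportedQueue) {
    this.reportedQueue = new TreeMap<>(reportedQueue);
  }

  private int estimatedQueue(String node) {
    return reportedQueue.get(node) + allocatedSinceHeartbeat.getOrDefault(node, 0);
  }

  /** Places n containers, returning the chosen node for each one. */
  List<String> allocate(int n) {
    List<String> placements = new ArrayList<>();
    for (int i = 0; i < n && !reportedQueue.isEmpty(); i++) {
      String best = null;
      for (String node : reportedQueue.keySet()) {
        if (best == null || estimatedQueue(node) < estimatedQueue(best)) {
          best = node;
        }
      }
      // Count this container immediately so the next request, from this or
      // another application, sees the updated estimate.
      allocatedSinceHeartbeat.merge(best, 1, Integer::sum);
      placements.add(best);
    }
    return placements;
  }

  /** Assumes a new heartbeat already reflects earlier allocations. */
  void onHeartbeat(String node, int queueLength) {
    reportedQueue.put(node, queueLength);
    allocatedSinceHeartbeat.remove(node);
  }
}
```

With nodes a and b both reporting an empty queue, consecutive requests alternate between them instead of piling onto whichever node looked best at the last heartbeat. The linear scan keeps the sketch short; a real scheduler would more likely keep nodes in a priority queue. Resetting the local count on heartbeat also assumes that the NM already reports containers that were allocated but not yet started.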
[jira] [Updated] (YARN-9782) Avoid DNS resolution while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9782: Attachment: YARN-9782.002.patch > Avoid DNS resolution while running SLS. > --- > > Key: YARN-9782 > URL: https://issues.apache.org/jira/browse/YARN-9782 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9782.001.patch, YARN-9782.002.patch > > > In SLS, we add nodes with random names and racks. DNS resolution of these > nodes takes around 2 seconds, because the lookup only times out after that. This makes > the SLS results unreliable and adds spikes.
[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9697: Attachment: YARN-9697.wip1.patch > Efficient allocation of Opportunistic containers. > - > > Key: YARN-9697 > URL: https://issues.apache.org/jira/browse/YARN-9697 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9697.ut.patch, YARN-9697.ut2.patch, > YARN-9697.wip1.patch > > > In the current implementation, opportunistic containers are allocated based > on the queued opportunistic container counts reported in the node > heartbeat. This information becomes stale as soon as more opportunistic > containers are allocated on that node. > Allocation of opportunistic containers happens on the same heartbeat in which > the AM asks for the containers. When multiple applications request > Opportunistic containers, containers might get allocated on the same set of > nodes, as containers already allocated on a node are not considered while > serving requests from different applications. This can lead to uneven > allocation of Opportunistic containers across the cluster, leading to > increased queuing time.
[jira] [Assigned] (YARN-9812) mvn javadoc:javadoc fails in hadoop-sls
[ https://issues.apache.org/jira/browse/YARN-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi reassigned YARN-9812: --- Assignee: Abhishek Modi > mvn javadoc:javadoc fails in hadoop-sls > --- > > Key: YARN-9812 > URL: https://issues.apache.org/jira/browse/YARN-9812 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Reporter: Akira Ajisaka >Assignee: Abhishek Modi >Priority: Major > Labels: newbie > > {noformat} > [ERROR] > hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:57: > error: bad use of '>' > [ERROR] * pending -> requests which are NOT yet sent to RM. > [ERROR] ^ > [ERROR] > hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:58: > error: bad use of '>' > [ERROR] * scheduled -> requests which are sent to RM but not yet assigned. > [ERROR] ^ > [ERROR] > hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:59: > error: bad use of '>' > [ERROR] * assigned -> requests which are assigned to a container. > [ERROR] ^ > [ERROR] > hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:60: > error: bad use of '>' > [ERROR] * completed -> request corresponding to which container has > completed. > [ERROR] ^ > {noformat}
[jira] [Commented] (YARN-9804) Update ATSv2 document for latest feature supports
[ https://issues.apache.org/jira/browse/YARN-9804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921170#comment-16921170 ] Abhishek Modi commented on YARN-9804: - Thanks [~rohithsharma]. The new patch looks good to me. +1 from my end. > Update ATSv2 document for latest feature supports > - > > Key: YARN-9804 > URL: https://issues.apache.org/jira/browse/YARN-9804 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Blocker > Attachments: YARN-9804.01.patch, YARN-9804.02.patch > > > Revisit the ATSv2 documents and update them for GA features and for the road map.
[jira] [Resolved] (YARN-8139) Skip node hostname resolution when running SLS.
[ https://issues.apache.org/jira/browse/YARN-8139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi resolved YARN-8139. - Resolution: Duplicate > Skip node hostname resolution when running SLS. > --- > > Key: YARN-8139 > URL: https://issues.apache.org/jira/browse/YARN-8139 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > > Currently, depending on the time taken to resolve hostnames, the SLS metrics > get skewed. To avoid this, this fix introduces a flag which > can be used to disable hostname resolution.
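Below is a minimal sketch of such a flag. It assumes the following: SimulatedNodeResolver and its resolveHostnames switch are hypothetical names, and the DNS lookup is injected as a function so that the example never touches the network. With the flag off, the random SLS node names are used as-is, so no lookup can sit waiting for the roughly 2 second timeout described in YARN-9782.

```java
import java.util.function.UnaryOperator;

// Hypothetical sketch: a flag that lets a simulator bypass hostname resolution.
class SimulatedNodeResolver {
  private final boolean resolveHostnames;         // hypothetical config flag
  private final UnaryOperator<String> dnsLookup;  // e.g. a wrapper around InetAddress

  SimulatedNodeResolver(boolean resolveHostnames, UnaryOperator<String> dnsLookup) {
    this.resolveHostnames = resolveHostnames;
    this.dnsLookup = dnsLookup;
  }

  /** Returns the name to register for a simulated node. */
  String resolve(String hostname) {
    if (!resolveHostnames) {
      // Simulated node names are random and unresolvable; skip the lookup
      // so it cannot block until the resolver times out.
      return hostname;
    }
    return dnsLookup.apply(hostname);
  }
}
```

Injecting the lookup keeps the flag logic testable without real DNS. In a real setup, the injected function would wrap a call such as InetAddress.getByName and handle its UnknownHostException.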
[jira] [Commented] (YARN-9804) Update ATSv2 document for latest feature supports
[ https://issues.apache.org/jira/browse/YARN-9804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920634#comment-16920634 ] Abhishek Modi commented on YARN-9804: - Thanks [~rohithsharma] for working on it. Some minor comments:
- Road map include -> Road map includes
- Simple authorization in terms of a configurable whitelist of users and groups who can read timeline data -> Support for simple authorization has been added in terms of a configurable whitelist of users and groups who can read timeline data.
- YARN Client integrates with ATSv2. -> YARN Client has been integrated with ATSv2.
- This enables fetching application/attempt/container report from TimelineReader if details not present in ResouceManager. -> This enables fetching application/attempt/container reports from TimelineReader if details are not present in ResourceManager.
- It set true -> If set true
- Since YARN CLI support has been added, should we remove this line: "Currently there is no support for command line access."?
> Update ATSv2 document for latest feature supports > - > > Key: YARN-9804 > URL: https://issues.apache.org/jira/browse/YARN-9804 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Blocker > Attachments: YARN-9804.01.patch > > > Revisit the ATSv2 documents and update them for GA features and for the road map.
[jira] [Commented] (YARN-9400) Remove unnecessary if at EntityGroupFSTimelineStore#parseApplicationId
[ https://issues.apache.org/jira/browse/YARN-9400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920624#comment-16920624 ] Abhishek Modi commented on YARN-9400: - Thanks [~Prabhu Joseph]. LGTM. Will commit to trunk. > Remove unnecessary if at EntityGroupFSTimelineStore#parseApplicationId > -- > > Key: YARN-9400 > URL: https://issues.apache.org/jira/browse/YARN-9400 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Attachments: YARN-9400-001.patch > > > The if clause that validates whether appIdStr starts with "application" is not > required in EntityGroupFSTimelineStore#parseApplicationId > {code} > // converts the String to an ApplicationId or null if conversion failed > private static ApplicationId parseApplicationId(String appIdStr) { > ApplicationId appId = null; > if (appIdStr.startsWith(ApplicationId.appIdStrPrefix)) { > try { > appId = ApplicationId.fromString(appIdStr); > } catch (IllegalArgumentException e) { > appId = null; > } > } > return appId; > } > {code}
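As the issue points out, the startsWith guard is redundant. As far as I can tell, in current Hadoop releases ApplicationId.fromString already rejects strings without the application_ prefix by throwing IllegalArgumentException, and the existing catch turns that exception into null. The sketch below shows the simplified method. To keep the example self-contained, StubApplicationId is a hypothetical stand-in that only mimics the prefix check and number parsing. It is not Hadoop's real class.

```java
// Hypothetical stand-in for Hadoop's ApplicationId, mimicking only how
// fromString rejects malformed ids with IllegalArgumentException.
final class StubApplicationId {
  static final String PREFIX = "application_";
  final long clusterTimestamp;
  final int id;

  private StubApplicationId(long clusterTimestamp, int id) {
    this.clusterTimestamp = clusterTimestamp;
    this.id = id;
  }

  static StubApplicationId fromString(String appIdStr) {
    if (!appIdStr.startsWith(PREFIX)) {
      throw new IllegalArgumentException("Invalid ApplicationId prefix: " + appIdStr);
    }
    String[] parts = appIdStr.substring(PREFIX.length()).split("_", -1);
    if (parts.length != 2) {
      throw new IllegalArgumentException("Invalid ApplicationId: " + appIdStr);
    }
    // NumberFormatException is itself an IllegalArgumentException.
    return new StubApplicationId(Long.parseLong(parts[0]), Integer.parseInt(parts[1]));
  }
}

class ParseApplicationIdSketch {
  // converts the String to an ApplicationId or null if conversion failed
  static StubApplicationId parseApplicationId(String appIdStr) {
    try {
      return StubApplicationId.fromString(appIdStr);
    } catch (IllegalArgumentException e) {
      return null; // covers both a wrong prefix and malformed numbers
    }
  }
}
```

Because the prefix check now lives in one place, the caller only has to handle one failure mode.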
[jira] [Commented] (YARN-8174) Add containerId to ResourceLocalizationService fetch failure log statement
[ https://issues.apache.org/jira/browse/YARN-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920619#comment-16920619 ] Abhishek Modi commented on YARN-8174: - v3 patch lgtm. Committed to trunk. > Add containerId to ResourceLocalizationService fetch failure log statement > -- > > Key: YARN-8174 > URL: https://issues.apache.org/jira/browse/YARN-8174 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Attachments: YARN-8174.1.patch, YARN-8174.2.patch, YARN-8174.3.patch > > > When localization of a resource fails because its timestamp has changed, no containerId is logged, so the failure cannot be correlated with a container.
> {code}
> 2018-04-18 07:31:46,033 WARN localizer.ResourceLocalizationService (ResourceLocalizationService.java:processHeartbeat(1017)) - { hdfs://tarunhdp-1.openstacklocal:8020/user/ambari-qa/.staging/job_1523550428406_0016/job.splitmetainfo, 1524036694502, FILE, null } failed: Resource hdfs://tarunhdp-1.openstacklocal:8020/user/ambari-qa/.staging/job_1523550428406_0016/job.splitmetainfo changed on src filesystem (expected 1524036694502, was 1524036694502
> java.io.IOException: Resource hdfs://tarunhdp-1.openstacklocal:8020/user/ambari-qa/.staging/job_1523550428406_0016/job.splitmetainfo changed on src filesystem (expected 1524036694502, was 1524036694502
>     at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:258)
>     at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
>     at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:362)
>     at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:360)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
>     at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:360)
>     at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
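The change requested in YARN-8174 is to include the container ID in the fetch-failure warning so that a failed localization can be tied back to a specific container. The exact message format of the committed v3 patch is not shown in this thread, so the following Java sketch is only an illustration: the class `FetchFailureLogSketch`, its method, the message layout, and the container ID `container_1523550428406_0016_01_000001` are all hypothetical.

```java
// Hypothetical sketch; not the message format of the committed YARN-8174 patch.
final class FetchFailureLogSketch {

  // Builds a fetch-failure warning with the container ID first, so the line
  // can be grepped and correlated with the container whose localization failed.
  static String fetchFailureMessage(String containerId, String resource,
      String cause) {
    return String.format("Container %s: %s failed: %s",
        containerId, resource, cause);
  }

  public static void main(String[] args) {
    // Hypothetical container ID derived from the application ID in the log.
    System.out.println(fetchFailureMessage(
        "container_1523550428406_0016_01_000001",
        "{ hdfs://tarunhdp-1.openstacklocal:8020/user/ambari-qa/.staging/"
            + "job_1523550428406_0016/job.splitmetainfo, 1524036694502, FILE, null }",
        "Resource changed on src filesystem"));
  }
}
```

A real service would hand this string to its logger rather than printing it; the sketch only shows how the container ID travels alongside the resource description.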