[jira] [Assigned] (YARN-9035) Allow better troubleshooting of FS container assignments and lack of container assignments
[ https://issues.apache.org/jira/browse/YARN-9035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-9035: Assignee: (was: Szilard Nemeth) > Allow better troubleshooting of FS container assignments and lack of container assignments > -- > > Key: YARN-9035 > URL: https://issues.apache.org/jira/browse/YARN-9035 > Project: Hadoop YARN > Issue Type: Improvement > Reporter: Szilard Nemeth > Priority: Major > Attachments: YARN-9035.001.patch > > > The call chain starting from {{FairScheduler.attemptScheduling}}, through {{FSQueue}} (parent / leaf) {{assignContainer}} and down to {{FSAppAttempt#assignContainer}} involves many calls and many potential conditions where {{Resources.none()}} can be returned, meaning the container is not allocated. > A number of these empty assignments do not come with a debug log statement, so it is very hard to tell which condition led the {{FairScheduler}} to a decision where containers are not allocated. > On top of that, in many places it is also difficult to tell why a container was allocated to an app attempt. > The goal is to have a common place (i.e. a class) that does all the logging, so users can conveniently control all the logs if they are curious why (or why not) container assignments happened. > Also, it would be handy if readers of the log could easily tell which {{AppAttempt}} a log record was created for; in other words, every log record should include the ID of the application / app attempt, if possible. > > Details of implementation: > As most of the already in-place debug messages were guarded by a condition that checks whether the debug level is enabled on the loggers, I followed a similar pattern. All the relevant log messages are created with the class {{ResourceAssignment}}. > This class is a wrapper for the assigned {{Resource}} object and has a single logger, so clients should use its helper methods to create log records. 
There is a helper method called {{shouldLogReservationActivity}} that checks whether DEBUG or TRACE level is activated on the logger. > See the javadoc on this class for further information. > > {{ResourceAssignment}} is also responsible for adding the app / app attempt ID to every log record (with some exceptions). > A couple of check classes are introduced: they are responsible for running and storing the results of checks that are prerequisites of a successful container allocation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
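The wrapper described above can be sketched roughly as follows. This is an illustration, not the code from YARN-9035.001.patch: only the class name and {{shouldLogReservationActivity}} come from the issue text; the field names, log format, and helper methods are assumptions, and java.util.logging stands in for the SLF4J logger Hadoop actually uses.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of the ResourceAssignment wrapper described in the issue.
// All names except the class and shouldLogReservationActivity() are
// illustrative assumptions.
class ResourceAssignment {
  private static final Logger LOG = Logger.getLogger("ResourceAssignment");

  private final String appAttemptId; // every record carries the attempt ID

  ResourceAssignment(String appAttemptId) {
    this.appAttemptId = appAttemptId;
  }

  // Single gate for all assignment logging, mirroring the existing
  // isDebugEnabled()-style guards: log only at DEBUG (FINE) or TRACE (FINER).
  static boolean shouldLogReservationActivity() {
    return LOG.isLoggable(Level.FINE) || LOG.isLoggable(Level.FINER);
  }

  // Hypothetical helper that turns on debug-level assignment logging.
  static void enableDebugLogging() {
    LOG.setLevel(Level.FINE);
  }

  // Formats a "no container assigned" record tagged with the app attempt
  // ID, so readers can filter the log per application / app attempt.
  String formatSkipRecord(String reason) {
    return "[" + appAttemptId + "] no container assigned: " + reason;
  }
}
```

The key design point is the single gate: every empty-assignment site calls one helper instead of scattering its own level checks, so enabling DEBUG on one logger surfaces all assignment decisions at once.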
[jira] [Assigned] (YARN-9856) Remove log-aggregation related duplicate function
[ https://issues.apache.org/jira/browse/YARN-9856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-9856: Assignee: (was: Szilard Nemeth) > Remove log-aggregation related duplicate function > - > > Key: YARN-9856 > URL: https://issues.apache.org/jira/browse/YARN-9856 > Project: Hadoop YARN > Issue Type: Task > Components: log-aggregation, yarn >Affects Versions: 3.3.0 >Reporter: Adam Antal >Priority: Trivial > Attachments: YARN-9856.001.patch, YARN-9856.002.patch > > > [~snemeth] has noticed a duplication in two of the log-aggregation related > functions. > {quote}I noticed duplicated code in > org.apache.hadoop.yarn.logaggregation.LogToolUtils#outputContainerLog, > duplicated in > org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat.LogReader#readContainerLogs. > [...] > {quote} > We should remove the duplication. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-10843) [Umbrella] Tools to help migration from Fair Scheduler to Capacity Scheduler - part II
[ https://issues.apache.org/jira/browse/YARN-10843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-10843: - Assignee: (was: Szilard Nemeth) > [Umbrella] Tools to help migration from Fair Scheduler to Capacity Scheduler > - part II > -- > > Key: YARN-10843 > URL: https://issues.apache.org/jira/browse/YARN-10843 > Project: Hadoop YARN > Issue Type: Task > Components: capacity scheduler, capacityscheduler >Reporter: Peter Bacsko >Priority: Major > Labels: fs2cs > > Remaining tasks for fs2cs converter. > Phase I was completed under YARN-9698. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-10249) Various ResourceManager tests are failing on branch-3.2
[ https://issues.apache.org/jira/browse/YARN-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-10249: - Assignee: (was: Szilard Nemeth) > Various ResourceManager tests are failing on branch-3.2 > --- > > Key: YARN-10249 > URL: https://issues.apache.org/jira/browse/YARN-10249 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager > Affects Versions: 3.2.0 > Reporter: Benjamin Teke > Priority: Major > Attachments: YARN-10249.branch-3.2.POC001.patch, YARN-10249.branch-3.2.POC002.patch, YARN-10249.branch-3.2.POC003.patch > > > Various tests are failing on branch-3.2. Some examples can be found in YARN-10003, YARN-10002, and YARN-10237. The seemingly common thread is that all of the failing tests are RM/Capacity Scheduler related, and that the failures are flaky. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7505) RM REST endpoints generate malformed JSON
[ https://issues.apache.org/jira/browse/YARN-7505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-7505: - Description: For all endpoints that return DAOs that contain maps, the generated JSON is malformed. For example: {code:java} % curl 'http://localhost:8088/ws/v1/cluster/apps' {"apps":{"app":[{"id":"application_1510777276702_0001","user":"daniel","name":"QuasiMonteCarlo","queue":"root.daniel","state":"RUNNING","finalStatus":"UNDEFINED","progress":5.0,"trackingUI":"ApplicationMaster","trackingUrl":"http://dhcp-10-16-0-181.pa.cloudera.com:8088/proxy/application_1510777276702_0001/","diagnostics":"","clusterId":1510777276702,"applicationType":"MAPREDUCE","applicationTags":"","priority":0,"startedTime":1510777317853,"finishedTime":0,"elapsedTime":21623,"amContainerLogs":"http://dhcp-10-16-0-181.pa.cloudera.com:8042/node/containerlogs/container_1510777276702_0001_01_01/daniel","amHostHttpAddress":"dhcp-10-16-0-181.pa.cloudera.com:8042","amRPCAddress":"dhcp-10-16-0-181.pa.cloudera.com:63371","allocatedMB":5120,"allocatedVCores":4,"reservedMB":0,"reservedVCores":0,"runningContainers":4,"memorySeconds":49820,"vcoreSeconds":26,"queueUsagePercentage":62.5,"clusterUsagePercentage":62.5,"resourceSecondsMap":{"entry":{"key":"test2","value":"0"},"entry":{"key":"test","value":"0"},"entry":{"key":"memory-mb","value":"49820"},"entry":{"key":"vcores","value":"26"}},"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0,"preemptedMemorySeconds":0,"preemptedVcoreSeconds":0,"preemptedResourceSecondsMap":{},"resourceRequests":[{"priority":20,"resourceName":"dhcp-10-16-0-181.pa.cloudera.com","capability":{"memory":1024,"vCores":1},"numContainers":8,"relaxLocality":true,"nodeLabelExpression":"","executionTypeRequest":{"executionType":"GUARANTEED","enforceExecutionType":true},"enforceExecutionType":false},{"priority":20,"resourceName":"/default-rack","capability":{"memory":1024,"vCores
":1},"numContainers":8,"relaxLocality":true,"nodeLabelExpression":"","executionTypeRequest":{"executionType":"GUARANTEED","enforceExecutionType":true},"enforceExecutionType":false},{"priority":20,"resourceName":"*","capability":{"memory":1024,"vCores":1},"numContainers":8,"relaxLocality":true,"nodeLabelExpression":"","executionTypeRequest":{"executionType":"GUARANTEED","enforceExecutionType":true},"enforceExecutionType":false}],"logAggregationStatus":"DISABLED","unmanagedApplication":false,"amNodeLabelExpression":"","timeouts":{"timeout":[{"type":"LIFETIME","expiryTime":"UNLIMITED","remainingTimeInSeconds":-1}]}}]}} {code} was: For all endpoints that return DAOs that contain maps, the generated JSON is malformed. For example: % curl 'http://localhost:8088/ws/v1/cluster/apps' {"apps":{"app":[{"id":"application_1510777276702_0001","user":"daniel","name":"QuasiMonteCarlo","queue":"root.daniel","state":"RUNNING","finalStatus":"UNDEFINED","progress":5.0,"trackingUI":"ApplicationMaster","trackingUrl":"http://dhcp-10-16-0-181.pa.cloudera.com:8088/proxy/application_1510777276702_0001/","diagnostics":"","clusterId":1510777276702,"applicationType":"MAPREDUCE","applicationTags":"","priority":0,"startedTime":1510777317853,"finishedTime":0,"elapsedTime":21623,"amContainerLogs":"http://dhcp-10-16-0-181.pa.cloudera.com:8042/node/containerlogs/container_1510777276702_0001_01_01/daniel","amHostHttpAddress":"dhcp-10-16-0-181.pa.cloudera.com:8042","amRPCAddress":"dhcp-10-16-0-181.pa.cloudera.com:63371","allocatedMB":5120,"allocatedVCores":4,"reservedMB":0,"reservedVCores":0,"runningContainers":4,"memorySeconds":49820,"vcoreSeconds":26,"queueUsagePercentage":62.5,"clusterUsagePercentage":62.5,"resourceSecondsMap":{"entry":{"key":"test2","value":"0"},"entry":{"key":"test","value":"0"},"entry":{"key":"memory-mb","value":"49820"},"entry":{"key":"vcores","value":"26"}},"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0,"preemptedM
emorySeconds":0,"preemptedVcoreSeconds":0,"preemptedResourceSecondsMap":{},"resourceRequests":[{"priority":20,"resourceName":"dhcp-10-16-0-181.pa.cloudera.com","capability":{"memory":1024,"vCores":1},"numContainers":8,"relaxLocality":true,"nodeLabelExpression":"","executionTypeRequest":{"executionType":"GUARANTEED","enforceExecutionType":true},"enforceExecutionType":false},{"priority":20,"resourceName":"/default-rack","capability":{"memory":1024,"vCores":1},"numContainers":8,"relaxLocality":true,"nodeLabelExpression":"","executionTypeRequest":{"executionType":"GUARANTEED","enforceExecutionType":true},"enforceExecutionType":false},{"priority":20,"resourceName":"*","capability":{"memory":1024,"vCores":1},"numContainers":8,"relaxLocality":true,"nodeLabelExpression":"","executionTypeRequest":{"executionType":"GUARANTEED","enforceExecutionType":true},"enforceExecutionType":false}],"logAggregationStatus":"DISABLED","unmanaged
[jira] [Assigned] (YARN-7505) RM REST endpoints generate malformed JSON
[ https://issues.apache.org/jira/browse/YARN-7505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-7505: Assignee: (was: Szilard Nemeth) > RM REST endpoints generate malformed JSON > - > > Key: YARN-7505 > URL: https://issues.apache.org/jira/browse/YARN-7505 > Project: Hadoop YARN > Issue Type: Bug > Components: restapi >Affects Versions: 3.0.0 >Reporter: Daniel Templeton >Priority: Critical > Attachments: YARN-7505.001.patch, YARN-7505.002.patch > > > For all endpoints that return DAOs that contain maps, the generated JSON is > malformed. For example: > % curl 'http://localhost:8088/ws/v1/cluster/apps' > {"apps":{"app":[{"id":"application_1510777276702_0001","user":"daniel","name":"QuasiMonteCarlo","queue":"root.daniel","state":"RUNNING","finalStatus":"UNDEFINED","progress":5.0,"trackingUI":"ApplicationMaster","trackingUrl":"http://dhcp-10-16-0-181.pa.cloudera.com:8088/proxy/application_1510777276702_0001/","diagnostics":"","clusterId":1510777276702,"applicationType":"MAPREDUCE","applicationTags":"","priority":0,"startedTime":1510777317853,"finishedTime":0,"elapsedTime":21623,"amContainerLogs":"http://dhcp-10-16-0-181.pa.cloudera.com:8042/node/containerlogs/container_1510777276702_0001_01_01/daniel","amHostHttpAddress":"dhcp-10-16-0-181.pa.cloudera.com:8042","amRPCAddress":"dhcp-10-16-0-181.pa.cloudera.com:63371","allocatedMB":5120,"allocatedVCores":4,"reservedMB":0,"reservedVCores":0,"runningContainers":4,"memorySeconds":49820,"vcoreSeconds":26,"queueUsagePercentage":62.5,"clusterUsagePercentage":62.5,"resourceSecondsMap":{"entry":{"key":"test2","value":"0"},"entry":{"key":"test","value":"0"},"entry":{"key":"memory-mb","value":"49820"},"entry":{"key":"vcores","value":"26"}},"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0,"preemptedMemorySeconds":0,"preemptedVcoreSeconds":0,"preemptedResourceSecondsMap":{},"resourceRequests":[{"priority":20,"resourceName
":"dhcp-10-16-0-181.pa.cloudera.com","capability":{"memory":1024,"vCores":1},"numContainers":8,"relaxLocality":true,"nodeLabelExpression":"","executionTypeRequest":{"executionType":"GUARANTEED","enforceExecutionType":true},"enforceExecutionType":false},{"priority":20,"resourceName":"/default-rack","capability":{"memory":1024,"vCores":1},"numContainers":8,"relaxLocality":true,"nodeLabelExpression":"","executionTypeRequest":{"executionType":"GUARANTEED","enforceExecutionType":true},"enforceExecutionType":false},{"priority":20,"resourceName":"*","capability":{"memory":1024,"vCores":1},"numContainers":8,"relaxLocality":true,"nodeLabelExpression":"","executionTypeRequest":{"executionType":"GUARANTEED","enforceExecutionType":true},"enforceExecutionType":false}],"logAggregationStatus":"DISABLED","unmanagedApplication":false,"amNodeLabelExpression":"","timeouts":{"timeout":[{"type":"LIFETIME","expiryTime":"UNLIMITED","remainingTimeInSeconds":-1}]}}]}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
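The malformation is visible in the {{resourceSecondsMap}} output above: the serializer emits the literal key "entry" once per map element, producing an object with duplicate member names. That is technically parseable, but per the JSON spec most parsers silently keep only the last "entry", so all but one map element is lost. The fix belongs in the JAXB/Jersey serialization configuration; the sketch below (a hand-rolled helper with assumed names, not the actual patch) only illustrates the correct target shape, where each map key becomes a distinct JSON member.

```java
import java.util.Map;

// Illustrative sketch: emit each map key as its own JSON member, e.g.
//   {"memory-mb":49820,"vcores":26}
// instead of the broken repeated-"entry" form
//   {"entry":{"key":"memory-mb","value":"49820"},"entry":{...}}
class JsonMapWriter {
  static String toJson(Map<String, Long> map) {
    StringBuilder sb = new StringBuilder("{");
    boolean first = true;
    for (Map.Entry<String, Long> e : map.entrySet()) {
      if (!first) {
        sb.append(',');
      }
      first = false;
      // Each distinct map key becomes a distinct, unique JSON member name.
      sb.append('"').append(e.getKey()).append("\":").append(e.getValue());
    }
    return sb.append('}').toString();
  }
}
```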
[jira] [Assigned] (YARN-9450) TestCapacityOverTimePolicy#testAllocation fails sporadically
[ https://issues.apache.org/jira/browse/YARN-9450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-9450: Assignee: (was: Szilard Nemeth) > TestCapacityOverTimePolicy#testAllocation fails sporadically > > > Key: YARN-9450 > URL: https://issues.apache.org/jira/browse/YARN-9450 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, test >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Priority: Major > > TestCapacityOverTimePolicy#testAllocation fails sporadically. Observed in > multiple builds ran for - YARN-9447, YARN-8193, YARN-8051. > {code} > Failed > org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy.testAllocation[Duration > 90,000,000, height 0.25, numSubmission 1, periodic 8640)] > Failing for the past 1 build (Since Failed#23900 ) > Took 34 ms. > Stacktrace > junit.framework.AssertionFailedError > at junit.framework.Assert.fail(Assert.java:55) > at junit.framework.Assert.fail(Assert.java:64) > at junit.framework.TestCase.fail(TestCase.java:235) > at > org.apache.hadoop.yarn.server.resourcemanager.reservation.BaseSharingPolicyTest.runTest(BaseSharingPolicyTest.java:146) > at > org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy.testAllocation(TestCapacityOverTimePolicy.java:136) > at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at org.junit.runners.Suite.runChild(Suite.java:128) > at org.junit.runners.Suite.runChild(Suite.java:27) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > Standard Output > 
2019-04-05 23:46:19,022 INFO [main] recovery.RMStateStore > (RMStateStore.java:transition(591)) - Storing reservation > allocation.reservation_-4277767163553399219_8391370105871519867 > 2019-04-05 23:46:19,022 INFO [main] recovery.RMStateStore > (MemoryRMStateStore.java:storeReservationState(258)) - Storing > reservationallocation for > reservation_-4277767163553399219_8391370105871519867 for plan dedicated > 2019-04-05 23:46:19,023 INFO [main] reservation.InMemoryPlan > (InMemoryPlan.java:addReservation(373)) - Successfully added reservation: > reservation_-4277767163553399219_8391370105871519867
[jira] [Assigned] (YARN-10877) SLSSchedulerCommons: Consider using application map from AbstractYarnScheduler and make event handling more consistent
[ https://issues.apache.org/jira/browse/YARN-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-10877: - Assignee: (was: Szilard Nemeth) > SLSSchedulerCommons: Consider using application map from > AbstractYarnScheduler and make event handling more consistent > -- > > Key: YARN-10877 > URL: https://issues.apache.org/jira/browse/YARN-10877 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Priority: Minor > > This is a follow-up of YARN-10552. > The improvements and things to check are coming from [this > comment|https://issues.apache.org/jira/browse/YARN-10552?focusedCommentId=17277991&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17277991]. > {quote} > appQueueMap was not present in SLSFairScheduler before (it was in > SLSCapacityScheduler) however from > https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L163, > it seems that the super class of the schedulers - > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java#L159 > has this already. As such, do we really need to define a new map as a common > map at all in SLSSchedulerCommons or can we somehow reuse the super class's > map? It might need some code updates though. > In regards to the above point, considering SLSFairScheduler did not > previously have any of the following code in handle() method: > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-10799) Eliminate queue name replacement in ApplicationSubmissionContext based on placement context
[ https://issues.apache.org/jira/browse/YARN-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-10799: - Assignee: (was: Szilard Nemeth) > Eliminate queue name replacement in ApplicationSubmissionContext based on > placement context > --- > > Key: YARN-10799 > URL: https://issues.apache.org/jira/browse/YARN-10799 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Priority: Major > > This is the long-term fix for YARN-10787: The task is to investigate if it's > possible to eliminate RMAppManager#copyPlacementQueueToSubmissionContext. > This could introduce nasty backward incompatible issues with recovery, so it > should be thought through really carefully. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-9511) TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: The remote jarfile should not be writable by group or others. The current Permission is 436
[ https://issues.apache.org/jira/browse/YARN-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-9511: Assignee: (was: Szilard Nemeth) > TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: The > remote jarfile should not be writable by group or others. The current > Permission is 436 > --- > > Key: YARN-9511 > URL: https://issues.apache.org/jira/browse/YARN-9511 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Siyao Meng >Priority: Major > > Found in maven JDK 11 unit test run. Compiled on JDK 8. > {code} > [ERROR] > testRemoteAuxServiceClassPath(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices) > Time elapsed: 0.551 s <<< > ERROR!org.apache.hadoop.yarn.exceptions.YarnRuntimeException: The remote > jarfile should not be writable by group or others. The current Permission is > 436 > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:202) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices.testRemoteAuxServiceClassPath(TestAuxServices.java:268) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
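The confusing "Permission is 436" in the exception above is a decimal rendering of an octal mode: 436 decimal is 0664 octal, i.e. rw-rw-r--, so the jar really is group-writable and the AuxServices check rightly rejects it. A sketch of that kind of permission test, assuming the mode arrives as a plain short/int as in Hadoop's FsPermission (the names below are illustrative, not the actual AuxServices code):

```java
// Sketch of a group/other-writability check over a numeric POSIX mode.
class JarPermissionCheck {
  private static final int GROUP_WRITE = 0020; // octal bit masks
  private static final int OTHER_WRITE = 0002;

  // True when the mode has the group-write or other-write bit set,
  // which is what the remote-jarfile check rejects.
  static boolean isWritableByGroupOrOthers(int perm) {
    return (perm & (GROUP_WRITE | OTHER_WRITE)) != 0;
  }

  // Render the mode in octal too, so the error message is less confusing
  // than the bare decimal "436" in the stack trace above.
  static String describe(int perm) {
    return "decimal " + perm + " = octal 0" + Integer.toOctalString(perm);
  }
}
```

Fixing the test likely means creating the jar with 0644 (or tighter) permissions, since 0664 trips the check by design.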
[jira] [Assigned] (YARN-10264) Add container launch related env / classpath debug info to container logs when a container fails
[ https://issues.apache.org/jira/browse/YARN-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-10264: - Assignee: (was: Szilard Nemeth) > Add container launch related env / classpath debug info to container logs > when a container fails > > > Key: YARN-10264 > URL: https://issues.apache.org/jira/browse/YARN-10264 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Szilard Nemeth >Priority: Major > > Sometimes when a container fails to launch, it can be pretty hard to figure > out why it has failed. > Similar to YARN-4309, we can add a switch to control if the printing of > environment variables and Java classpath should be done. > As a bonus, > [jdeps|https://docs.oracle.com/javase/8/docs/technotes/tools/unix/jdeps.html] > could also be utilized to print some verbose info about the classpath. > When log aggregation occurs, all this information will automatically get > collected and make debugging such container launch failures much easier. > Below is an example output when the user faces a classpath configuration > issue while launching an application: > {code:java} > End of LogType:prelaunch.err > ** > 2020-04-19 05:49:12,145 DEBUG:app_info:Diagnostics of the failed app > 2020-04-19 05:49:12,145 DEBUG:app_info:Application > application_1587300264561_0001 failed 2 times due to AM Container for > appattempt_1587300264561_0001_02 exited with exitCode: 1 > Failing this attempt.Diagnostics: [2020-04-19 12:45:01.955]Exception from > container-launch. > Container id: container_e60_1587300264561_0001_02_01 > Exit code: 1 > Exception message: Launch container failed > Shell output: main : command provided 1 > main : run as user is systest > main : requested yarn user is systest > Getting exit code file... > Creating script paths... > Writing pid file... 
> Writing to tmp file > /dataroot/ycloud/yarn/nm/nmPrivate/application_1587300264561_0001/container_e60_1587300264561_0001_02_01/container_e60_1587300264561_0001_02_01.pid.tmp > Writing to cgroup task files... > Creating local dirs... > Launching container... > Getting exit code file... > Creating script paths... > [2020-04-19 12:45:01.984]Container exited with a non-zero exit code 1. Error > file: prelaunch.err. > Last 4096 bytes of prelaunch.err : > Last 4096 bytes of stderr : > Error: Could not find or load main class > org.apache.hadoop.mapreduce.v2.app.MRAppMaster > Please check whether your etc/hadoop/mapred-site.xml contains the below > configuration: > > yarn.app.mapreduce.am.env > HADOOP_MAPRED_HOME=${full path of your hadoop distribution > directory} > > > mapreduce.map.env > HADOOP_MAPRED_HOME=${full path of your hadoop distribution > directory} > > > mapreduce.reduce.env > HADOOP_MAPRED_HOME=${full path of your hadoop distribution > directory} > > [2020-04-19 12:45:01.985]Container exited with a non-zero exit code 1. Error > file: prelaunch.err. > Last 4096 bytes of prelaunch.err : > Last 4096 bytes of stderr : > Error: Could not find or load main class > org.apache.hadoop.mapreduce.v2.app.MRAppMaster > Please check whether your etc/hadoop/mapred-site.xml contains the below > configuration: > > yarn.app.mapreduce.am.env > HADOOP_MAPRED_HOME=${full path of your hadoop distribution > directory} > > > mapreduce.map.env > HADOOP_MAPRED_HOME=${full path of your hadoop distribution > directory} > > > mapreduce.reduce.env > HADOOP_MAPRED_HOME=${full path of your hadoop distribution > directory} > > For more detailed output, check the application tracking page: > http://quasar-plnefj-2.quasar-plnefj.root.hwx.site:8088/cluster/app/application_1587300264561_0001 > Then click on links to logs of each attempt. > ... 
> 2020-04-19 05:49:12,148 INFO:util:* End test_app_API > (yarn.suite.YarnAPITests) * > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
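The proposed debug dump could look roughly like the sketch below: when the (proposed, switch-guarded) option is on and a container fails to launch, the launch environment and classpath are appended to the container log, so log aggregation collects them automatically. Everything here, class and method names included, is an illustrative assumption rather than YARN code.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the proposed env / classpath debug output for failed launches.
class LaunchDebugInfo {
  // Dump the container's launch environment, sorted for stable,
  // diff-friendly output (handy when comparing two failed attempts).
  static String dumpEnvironment(Map<String, String> env) {
    StringBuilder sb = new StringBuilder("Container launch environment:\n");
    for (Map.Entry<String, String> e : new TreeMap<>(env).entrySet()) {
      sb.append("  ").append(e.getKey()).append('=')
        .append(e.getValue()).append('\n');
    }
    return sb.toString();
  }

  // One classpath entry per line makes a missing jar (like the absent
  // MRAppMaster above) easy to spot.
  static String dumpClasspath(String classpath) {
    return "Container classpath:\n  "
        + String.join("\n  ", classpath.split(":"));
  }
}
```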
[jira] [Assigned] (YARN-10798) Enhancements in RMAppManager: createAndPopulateNewRMApp and copyPlacementQueueToSubmissionContext
[ https://issues.apache.org/jira/browse/YARN-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-10798: - Assignee: (was: Szilard Nemeth) > Enhancements in RMAppManager: createAndPopulateNewRMApp and copyPlacementQueueToSubmissionContext > - > > Key: YARN-10798 > URL: https://issues.apache.org/jira/browse/YARN-10798 > Project: Hadoop YARN > Issue Type: Improvement > Reporter: Szilard Nemeth > Priority: Major > > As a follow-up of YARN-10787, we need to do the following: > 1. Rename RMAppManager#copyPlacementQueueToSubmissionContext: this method does not really copy anything; it simply overrides the queue value. > 2. Add a debug log to print the csqueue object before the authorization code: [Code block|https://github.com/apache/hadoop/blob/2541efa496ba0e7e096ee5ec3c08d64b62036402/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java#L459-L475] > 3. Fix log messages: as 'copyPlacementQueueToSubmissionContext' overrides (rather than copies) the original queue name with the queue name from the PlacementContext, all calls to submissionContext.getQueue() will return the short queue name. This results in very misleading log messages, including the exception message itself: > {code} > org.apache.hadoop.yarn.exceptions.YarnException: > org.apache.hadoop.security.AccessControlException: User someuser1 does not > have permission to submit application_1621540945412_0001 to queue somequeue > {code} > All log messages should print the original submission queue, if possible. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-9047) FairScheduler: default resource calculator is not resource type aware
[ https://issues.apache.org/jira/browse/YARN-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-9047: Assignee: (was: Szilard Nemeth) > FairScheduler: default resource calculator is not resource type aware > - > > Key: YARN-9047 > URL: https://issues.apache.org/jira/browse/YARN-9047 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler > Reporter: Wilfred Spiegelenburg > Priority: Major > Attachments: YARN-9047.001.patch, YARN-9047.002.patch, YARN-9047.003.patch > > > The FairScheduler#getResourceCalculator always returns the default resource calculator. The default calculator is not resource type aware and should only be used if there are no resource types configured. > We need to make sure that the direct hard-coded reference to {{RESOURCE_CALCULATOR}} is either safe to use in all cases or is not used in the scheduler. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
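The selection rule the issue implies can be sketched as follows: the default (memory-only) calculator is only safe while memory and vcores are the sole configured resource types; anything beyond them calls for a dominant-resource style calculator. The method below is an illustration of that rule with assumed names, not Hadoop's actual API.

```java
import java.util.Set;

// Sketch: pick a calculator name based on the configured resource types.
class CalculatorChoice {
  static String chooseCalculator(Set<String> configuredResourceTypes) {
    // memory-mb and vcores are always present; any additional type
    // (e.g. "gpu") means the memory-only default calculator would
    // silently ignore resources during comparisons.
    for (String type : configuredResourceTypes) {
      if (!type.equals("memory-mb") && !type.equals("vcores")) {
        return "DominantResourceCalculator";
      }
    }
    return "DefaultResourceCalculator";
  }
}
```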
[jira] [Assigned] (YARN-8078) TestDistributedShell#testDSShellWithoutDomainV2 fails on trunk
[ https://issues.apache.org/jira/browse/YARN-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-8078: Assignee: (was: Szilard Nemeth) > TestDistributedShell#testDSShellWithoutDomainV2 fails on trunk > -- > > Key: YARN-8078 > URL: https://issues.apache.org/jira/browse/YARN-8078 > Project: Hadoop YARN > Issue Type: Test >Reporter: Weiwei Yang >Priority: Major > Labels: UT > > java.lang.AssertionError: Unexpected number of YARN_CONTAINER_FINISHED event > published. > Expected :1 > Actual :0 > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.verifyEntityForTimelineV2(TestDistributedShell.java:692) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:584) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:450) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:309) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2(TestDistributedShell.java:305) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-9421) Implement SafeMode for ResourceManager by defining a resource threshold
[ https://issues.apache.org/jira/browse/YARN-9421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-9421: Assignee: (was: Szilard Nemeth) > Implement SafeMode for ResourceManager by defining a resource threshold > --- > > Key: YARN-9421 > URL: https://issues.apache.org/jira/browse/YARN-9421 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Szilard Nemeth >Priority: Major > Attachments: client-log.log, nodemanager.log, resourcemanager.log > > > We have a hypothetical testcase in our test suite that tests Resource Types. > The test does the following: > 1. Sets up a resource named "gpu" > 2. Out of 9 NodeManager nodes, 1 node has 100 of "gpu". > 3. It executes a sleep job with resource requests: > "-Dmapreduce.reduce.resource.gpu=7" and > "-Dyarn.app.mapreduce.am.resource.gpu=11" > Sometimes, we encounter situations when the app submission fails with: > {code:java} > 2019-02-25 06:09:56,795 WARN > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: RM app submission > failed in validating AM resource request for application > application_1551103768202_0001 > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request! Cannot allocate containers as requested resource is greater > than maximum allowed allocation. Requested resource type=[gpu], Requested > resource=, maximum allowed > allocation=, please note that maximum allowed > allocation is calculated by scheduler based on maximum resource of registered > NodeManagers, which might be less than configured maximum > allocation={code} > It's clearly visible that the maximum allowed allocation does not have any > "gpu" resources. > > Looking into the logs further, I realized that sometimes the node having the > "gpu" resources is registered after the app is submitted. > In a real world situation and even with this very special test execution, we > can't be sure in which order NMs register with the RM. 
> With the advent of resource types, this issue became more likely to surface. > If we have a cluster with some "rare" resources like GPUs only on some nodes > out of 100, we can quickly run into a situation when the NMs with GPUs are > registering later than the normal nodes. While the critical NMs are still > registering, we will most likely experience the same > InvalidResourceRequestException if we submit jobs requesting GPUs. > There is a naive solution to this: > 1. Give some time for RM to wait for NMs to be able to register themselves > and put submitted applications on hold. This could work in some situations > but it's not the most flexible solution as different clusters can have > different requirements. Of course, we can make this more flexible by making > the timeout value configurable. > *A more flexible alternative would be:* > 2. We define a threshold of Resource capability: While we haven't reached > this threshold, we put submitted jobs on hold. Once we have reached the threshold, > we let jobs pass through. > This is very similar to an already existing concept, the SafeMode in HDFS > ([https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Safemode]). > Back to my GPU example above, the threshold could be: 8 vcores, 16GB, 3 > GPUs. > Defining a threshold like this, we can ensure most of the submitted jobs > won't be lost, just "parked" until NMs are registered. > The final solution could be the Resource threshold, or the combination of the > threshold and timeout value. I'm open to any other suggestions as well. > *Last but not least, a very easy way to reproduce the issue on a 3 node > cluster:* > 1. Configure a resource type, named 'testres'. > 2. Node1 runs RM, Node 2/3 runs NMs > 3. Node2 has 1 testres > 4. Node3 has 0 testres > 5. Stop all nodes > 6. Start RM on Node1 > 7. Start NM on Node3 (the one without the resource) > 8. 
Start a pi job, request 1 testres for the AM > Here's the command to start the job: > {code:java} > MY_HADOOP_VERSION=3.3.0-SNAPSHOT;pushd /opt/hadoop;bin/yarn jar > "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" > pi -Dyarn.app.mapreduce.am.resource.testres=1 1 1000;popd{code} > > *Configurations*: > node1: yarn-site.xml of ResourceManager: > {code:java} > <property> > <name>yarn.resource-types</name> > <value>testres</value> > </property> > {code} > node2: yarn-site.xml of NodeManager: > {code:java} > <property> > <name>yarn.resource-types</name> > <value>testres</value> > </property> > <property> > <name>yarn.nodemanager.resource-type.testres</name> > <value>1</value> > </property> > {code} > node3: yarn-site.xml of NodeManager: > {code:java} > <property> > <name>yarn.resource-types</name> > <value>testres</value> > </property> > {code} > Please see full process logs from RM, NM, YARN-client attached. -- This message was sent by Atlassian Jira (v8.20.10#820010) ---
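The Resource-threshold idea described above can be sketched as a simple capability check (hypothetical class and resource-name keys; not actual ResourceManager code): submitted apps stay "parked" while the aggregate capability of registered NMs is below the configured threshold.

```java
// Illustrative sketch of the proposed "safe mode" threshold for YARN-9421.
// SafeModeThreshold and the resource keys are assumptions, not Hadoop APIs.
import java.util.HashMap;
import java.util.Map;

public class SafeModeThreshold {
    private final Map<String, Long> threshold = new HashMap<>();
    private final Map<String, Long> registered = new HashMap<>();

    public SafeModeThreshold(long vcores, long memoryMb, long gpus) {
        threshold.put("vcores", vcores);
        threshold.put("memory-mb", memoryMb);
        threshold.put("gpu", gpus);
    }

    // Called whenever a NodeManager registers: add its capability to the total.
    public void nodeRegistered(Map<String, Long> nodeResources) {
        nodeResources.forEach((name, value) -> registered.merge(name, value, Long::sum));
    }

    // Submitted apps are held back while any resource is below its threshold.
    public boolean inSafeMode() {
        return threshold.entrySet().stream()
            .anyMatch(e -> registered.getOrDefault(e.getKey(), 0L) < e.getValue());
    }

    public static void main(String[] args) {
        // The GPU example from the description: 8 vcores, 16 GB, 3 GPUs.
        SafeModeThreshold sm = new SafeModeThreshold(8, 16384, 3);
        System.out.println("before any NM registers: inSafeMode=" + sm.inSafeMode());
        Map<String, Long> gpuNode = new HashMap<>();
        gpuNode.put("vcores", 8L);
        gpuNode.put("memory-mb", 16384L);
        gpuNode.put("gpu", 3L);
        sm.nodeRegistered(gpuNode);
        System.out.println("after the GPU node registers: inSafeMode=" + sm.inSafeMode());
    }
}
```

Combining this check with the timeout idea from point 1 would simply mean forcing `inSafeMode()` to false once the configured wait period expires.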
[jira] [Assigned] (YARN-5684) testDecreaseAfterIncreaseWithAllocationExpiration fails intermittently
[ https://issues.apache.org/jira/browse/YARN-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-5684: Assignee: (was: Szilard Nemeth) > testDecreaseAfterIncreaseWithAllocationExpiration fails intermittently > --- > > Key: YARN-5684 > URL: https://issues.apache.org/jira/browse/YARN-5684 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Rohith Sharma K S >Priority: Major > > Saw the following in a precommit: > {code} > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer > testDecreaseAfterIncreaseWithAllocationExpiration(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer) > Time elapsed: 10.726 sec <<< FAILURE! > java.lang.AssertionError: expected:<3> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer.testDecreaseAfterIncreaseWithAllocationExpiration(TestIncreaseAllocationExpirer.java:367) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6286) TestCapacityScheduler.testAMLimitUsage throws UndeclaredThrowableException
[ https://issues.apache.org/jira/browse/YARN-6286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-6286: Assignee: (was: Szilard Nemeth) > TestCapacityScheduler.testAMLimitUsage throws UndeclaredThrowableException > -- > > Key: YARN-6286 > URL: https://issues.apache.org/jira/browse/YARN-6286 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Sunil G >Priority: Major > Labels: capacityscheduler > > {code} > testAMLimitUsage(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler) > Time elapsed: 0.124 sec <<< ERROR! > java.lang.reflect.UndeclaredThrowableException: null > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:253) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:218) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:189) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:497) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:384) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:295) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:664) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM$2.run(MockRM.java:752) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM$2.run(MockRM.java:746) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.submitApp(MockRM.java:765) > at > 
org.apache.hadoop.yarn.server.resourcemanager.MockRM.submitApp(MockRM.java:665) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.submitApp(MockRM.java:572) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler.verifyAMLimitForLeafQueue(TestCapacityScheduler.java:3370) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler.testAMLimitUsage(TestCapacityScheduler.java:3232) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5897) Use drainEvent to replace sleep-wait in MockRM#waitForState
[ https://issues.apache.org/jira/browse/YARN-5897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-5897: - Summary: Use drainEvent to replace sleep-wait in MockRM#waitForState (was: using drainEvent to replace sleep-wait in MockRM#waitForState) > Use drainEvent to replace sleep-wait in MockRM#waitForState > --- > > Key: YARN-5897 > URL: https://issues.apache.org/jira/browse/YARN-5897 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: sandflee >Assignee: Szilard Nemeth >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8818) Yarn log aggregation of spark streaming job
[ https://issues.apache.org/jira/browse/YARN-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-8818: Assignee: (was: Szilard Nemeth) > Yarn log aggregation of spark streaming job > --- > > Key: YARN-8818 > URL: https://issues.apache.org/jira/browse/YARN-8818 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ayush Chauhan >Priority: Major > > By default, YARN aggregates logs after an application completes. But I am > trying to aggregate logs for a spark streaming job which in theory will run > forever. I have set the following properties for log aggregation and > restarted yarn by restarting {{hadoop-yarn-nodemanager}} for core & task > nodes and {{hadoop-yarn-resourcemanager}} for master node on my emr cluster. > I can view my changes in [http://node-ip:8088/conf]. > {noformat} > yarn.log-aggregation-enable => true{noformat} > {noformat} > yarn.log-aggregation.retain-seconds => 172800{noformat} > {noformat} > yarn.log-aggregation.retain-check-interval-seconds => -1 {noformat} > {noformat} > yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds => > 3600{noformat} > All the articles and resources only mention including the > {{yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds}} > property, after which yarn will start aggregating logs for running jobs. But it is not > working in my case. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7631) ResourceRequest with different Capacity (Resource) overrides each other in RM and thus lost
[ https://issues.apache.org/jira/browse/YARN-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-7631: Assignee: (was: Szilard Nemeth) > ResourceRequest with different Capacity (Resource) overrides each other in RM > and thus lost > --- > > Key: YARN-7631 > URL: https://issues.apache.org/jira/browse/YARN-7631 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Botong Huang >Priority: Major > Attachments: resourcebug.patch > > > Today in AMRMClientImpl, the ResourceRequests (RR) are kept as: RequestId -> > Priority -> ResourceName -> ExecutionType -> Resource (Capacity) -> > ResourceRequestInfo (the actual RR). This means that only RRs with the same > (requestId, priority, resourcename, executionType, resource) will be grouped > and aggregated together. > While in RM side, the mapping is SchedulerRequestKey (RequestId, priority) -> > LocalityAppPlacementAllocator (ResourceName -> RR). > The issue is that in RM side Resource is not in the key to the RR at all. > (Note that executionType is also not in the RM side, but it is fine because > RM handles it separately as container update requests.) This means that under > the same value of (requestId, priority, resourcename), RRs with different > Resource values will be grouped together and override each other in RM. As a > result, some of the container requests are lost and will never be allocated. > Furthermore, since the two RRs are kept under different keys in AMRMClient > side, allocation of RR1 will only trigger cancel for RR1; the pending RR2 > will not get resent either. > I've attached a unit test (resourcebug.patch), which is failing in trunk, to > illustrate this issue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
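The key mismatch described above can be illustrated with plain maps (hypothetical key types mirroring the two mappings quoted in the description; this is not actual YARN code): the client keys a request by (requestId, priority, resourceName, executionType, resource), while the RM side drops the Resource from the key, so two requests differing only in Resource collide and one overrides the other.

```java
// Illustrative sketch of the YARN-7631 key mismatch. ClientKey/RmKey are
// assumptions modeling the mappings described in the issue, not Hadoop types.
import java.util.HashMap;
import java.util.Map;

public class KeyMismatchDemo {
    record ClientKey(long requestId, int priority, String resourceName,
                     String executionType, String resource) {}
    record RmKey(long requestId, int priority, String resourceName) {}

    static int clientSideEntries() {
        Map<ClientKey, String> requests = new HashMap<>();
        // Two requests differing only in the requested Resource value.
        requests.put(new ClientKey(1, 0, "*", "GUARANTEED", "memory=2048,vcores=1"), "RR1");
        requests.put(new ClientKey(1, 0, "*", "GUARANTEED", "memory=4096,vcores=2"), "RR2");
        return requests.size(); // both requests survive as distinct entries
    }

    static int rmSideEntries() {
        Map<RmKey, String> requests = new HashMap<>();
        // Same two requests under the RM's key, which omits the Resource:
        // the second put overrides the first, so one request is silently lost.
        requests.put(new RmKey(1, 0, "*"), "RR1");
        requests.put(new RmKey(1, 0, "*"), "RR2");
        return requests.size();
    }

    public static void main(String[] args) {
        System.out.println("client-side entries: " + clientSideEntries()); // 2
        System.out.println("RM-side entries: " + rmSideEntries());         // 1
    }
}
```

This also shows why the client never resends RR2: from the client's perspective RR1 and RR2 are independent entries, so cancelling RR1 does nothing for the RR2 that the RM already overwrote.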
[jira] [Assigned] (YARN-7903) Method getStarvedResourceRequests() only consider the first encountered resource
[ https://issues.apache.org/jira/browse/YARN-7903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-7903: Assignee: (was: Szilard Nemeth) > Method getStarvedResourceRequests() only consider the first encountered > resource > > > Key: YARN-7903 > URL: https://issues.apache.org/jira/browse/YARN-7903 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: Yufei Gu >Priority: Major > > We need to specify rack and ANY while submitting a node local resource > request, as YARN-7561 discussed. For example: > {code} > ResourceRequest nodeRequest = > createResourceRequest(GB, node1.getHostName(), 1, 1, false); > ResourceRequest rackRequest = > createResourceRequest(GB, node1.getRackName(), 1, 1, false); > ResourceRequest anyRequest = > createResourceRequest(GB, ResourceRequest.ANY, 1, 1, false); > List<ResourceRequest> resourceRequests = > Arrays.asList(nodeRequest, rackRequest, anyRequest); > {code} > However, method getStarvedResourceRequests() only considers the first > encountered resource, which most likely is ResourceRequest.ANY. That's a > mismatch for locality requests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-10837) Break down effectiveMinRatio calculation in ResourceConfigMode
[ https://issues.apache.org/jira/browse/YARN-10837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-10837: - Assignee: (was: Szilard Nemeth) > Break down effectiveMinRatio calculation in ResourceConfigMode > -- > > Key: YARN-10837 > URL: https://issues.apache.org/jira/browse/YARN-10837 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Andras Gyori >Priority: Major > > In ResourceConfigMode, the effectiveMinRatio resource calculation is hard to > understand, not documented and involves long methods. It must be refactored > and cleaned up in order to eliminate the future code debt. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-4929) Explore a better way than sleeping for a while in some test cases
[ https://issues.apache.org/jira/browse/YARN-4929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-4929: Assignee: (was: Szilard Nemeth) > Explore a better way than sleeping for a while in some test cases > - > > Key: YARN-4929 > URL: https://issues.apache.org/jira/browse/YARN-4929 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yufei Gu >Priority: Major > > The following unit test cases failed because we removed the minimum wait time > for attempt in YARN-4807. I manually added sleeps so the tests pass and added > a TODO in the code. We can explore a better way to do it. > - TestAMRestart.testRMAppAttemptFailuresValidityInterval > - TestApplicationMasterService.testResourceTypes > - TestContainerResourceUsage.testUsageAfterAMRestartWithMultipleContainers > - TestRMApplicationHistoryWriter.testRMWritingMassiveHistoryForFairSche > - TestRMApplicationHistoryWriter.testRMWritingMassiveHistoryForCapacitySche -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7908) TestSystemMetricsPublisher#testPublishContainerMetrics can fail with an NPE
[ https://issues.apache.org/jira/browse/YARN-7908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-7908: Assignee: (was: Szilard Nemeth) > TestSystemMetricsPublisher#testPublishContainerMetrics can fail with an NPE > --- > > Key: YARN-7908 > URL: https://issues.apache.org/jira/browse/YARN-7908 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.3 >Reporter: Jason Darrell Lowe >Priority: Major > > testPublishContainerMetrics can fail with a NullPointerException: > {noformat} > Running > org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher > Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.42 sec <<< > FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher > testPublishContainerMetrics(org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher) > Time elapsed: 0.031 sec <<< ERROR! > java.lang.NullPointerException: null > at > org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher.testPublishContainerMetrics(TestSystemMetricsPublisher.java:454) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-10836) Clean up queue config mode methods
[ https://issues.apache.org/jira/browse/YARN-10836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-10836: - Assignee: (was: Szilard Nemeth) > Clean up queue config mode methods > -- > > Key: YARN-10836 > URL: https://issues.apache.org/jira/browse/YARN-10836 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Andras Gyori >Priority: Major > > After YARN-10759 is merged, it would be advisable to refactor long methods > inside the different classes. Also the error messages could be improved. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7548) TestCapacityOverTimePolicy.testAllocation is flaky
[ https://issues.apache.org/jira/browse/YARN-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820070#comment-17820070 ] Szilard Nemeth commented on YARN-7548: -- [~susheel_7] Sure, feel free to assign it to yourself and work on it. > TestCapacityOverTimePolicy.testAllocation is flaky > -- > > Key: YARN-7548 > URL: https://issues.apache.org/jira/browse/YARN-7548 > Project: Hadoop YARN > Issue Type: Bug > Components: reservation system >Affects Versions: 3.0.0-beta1 >Reporter: Haibo Chen >Assignee: Susheel Gupta >Priority: Major > > *Reported at: 15/Nov/18 20:32* > It failed in both YARN-7337 and YARN-6921 jenkins jobs. > org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy.testAllocation[Duration > 90,000,000, height 0.25, numSubmission 1, periodic 8640)] > *Stacktrace* > {code:java} > junit.framework.AssertionFailedError: null > at junit.framework.Assert.fail(Assert.java:55) > at junit.framework.Assert.fail(Assert.java:64) > at junit.framework.TestCase.fail(TestCase.java:235) > at > org.apache.hadoop.yarn.server.resourcemanager.reservation.BaseSharingPolicyTest.runTest(BaseSharingPolicyTest.java:146) > at > org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy.testAllocation(TestCapacityOverTimePolicy.java:136){code} > *Standard Output* > {code:java} > 2017-11-20 23:57:03,759 INFO [main] recovery.RMStateStore > (RMStateStore.java:transition(538)) - Storing reservation > allocation.reservation_-9026698577416205920_6337917439559340517 > 2017-11-20 23:57:03,759 INFO [main] recovery.RMStateStore > (MemoryRMStateStore.java:storeReservationState(247)) - Storing > reservationallocation for > reservation_-9026698577416205920_6337917439559340517 for plan dedicated > 2017-11-20 23:57:03,760 INFO [main] reservation.InMemoryPlan > (InMemoryPlan.java:addReservation(373)) - Successfully added reservation: > reservation_-9026698577416205920_6337917439559340517 to plan. 
> In-memory Plan: Parent Queue: dedicatedTotal Capacity: vCores:1000>Step: 1000reservation_-9026698577416205920_6337917439559340517 > user:u1 startTime: 0 endTime: 8640 Periodiciy: 8640 alloc: > [Period: 8640 > 0: > 3423748: > 86223748: > 8640: > 9223372036854775807: null > ] > {code} > *Reported at: 21/Feb/24* > Ran TestCapacityOverTimePolicy testcase locally 100 times in a row and found > it failed 5 times with the below error: > [INFO] Running > org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy > [ERROR] Tests run: 30, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 0.503 s <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy > [ERROR] testAllocation[Duration 60,000, height 0.25, numSubmission 3, > periodic > 720)](org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy) > Time elapsed: 0.009 s <<< ERROR! > org.apache.hadoop.yarn.server.resourcemanager.reservation.exceptions.PlanningQuotaException: > Integral (avg over time) quota capacity 0.25 over a window of 86400 seconds, > would be exceeded by accepting reservation: > reservation_-7619846766601560789_3793931544284185119 > at > org.apache.hadoop.yarn.server.resourcemanager.reservation.CapacityOverTimePolicy.validate(CapacityOverTimePolicy.java:206) > at > org.apache.hadoop.yarn.server.resourcemanager.reservation.InMemoryPlan.addReservation(InMemoryPlan.java:348) > at > org.apache.hadoop.yarn.server.resourcemanager.reservation.BaseSharingPolicyTest.runTest(BaseSharingPolicyTest.java:141) > at > org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy.testAllocation(TestCapacityOverTimePolicy.java:136) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(B
[jira] [Assigned] (YARN-10853) Add more tests to TestUsersManager
[ https://issues.apache.org/jira/browse/YARN-10853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-10853: - Assignee: (was: Szilard Nemeth) > Add more tests to TestUsersManager > -- > > Key: YARN-10853 > URL: https://issues.apache.org/jira/browse/YARN-10853 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Priority: Minor > Attachments: UsersManager.html > > > Running TestUsersManager with code coverage measurements only gives 18% line > coverage for class "UsersManager". This value is pretty low. > See the attached coverage report for that class. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-11590) RM process stuck after calling confStore.format() when ZK SSL/TLS is enabled, as netty thread waits indefinitely
[ https://issues.apache.org/jira/browse/YARN-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth resolved YARN-11590. --- Hadoop Flags: Reviewed Resolution: Fixed > RM process stuck after calling confStore.format() when ZK SSL/TLS is enabled, > as netty thread waits indefinitely > - > > Key: YARN-11590 > URL: https://issues.apache.org/jira/browse/YARN-11590 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Ferenc Erdelyi >Assignee: Ferenc Erdelyi >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > YARN-11468 enabled Zookeeper SSL/TLS support for YARN. > Curator uses ClientCnxnSocketNetty for secured connection and the thread > needs to be closed after calling confStore.format() to avoid the netty thread > waiting indefinitely, which renders the RM unresponsive after deleting the > confstore when started with the "-format-conf-store" arg. > The unclosed thread, which keeps RM running: > {code:java} > 2023-10-10 12:13:01,000 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: The > Thread[main-SendThread(ferdelyi-1.ferdelyi.root.hwx.site:2182),5,main]TIMED_WAITING > is stands at [sun.misc.Unsafe.park(Native Method), > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215), > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078), > > java.util.concurrent.LinkedBlockingDeque.pollFirst(LinkedBlockingDeque.java:522), > java.util.concurrent.LinkedBlockingDeque.poll(LinkedBlockingDeque.java:684), > org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:275), > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1289)] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11590) RM process stuck after calling confStore.format() when ZK SSL/TLS is enabled, as netty thread waits indefinitely
[ https://issues.apache.org/jira/browse/YARN-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-11590: -- Fix Version/s: 3.4.0 > RM process stuck after calling confStore.format() when ZK SSL/TLS is enabled, > as netty thread waits indefinitely > - > > Key: YARN-11590 > URL: https://issues.apache.org/jira/browse/YARN-11590 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Ferenc Erdelyi >Assignee: Ferenc Erdelyi >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > YARN-11468 enabled Zookeeper SSL/TLS support for YARN. > Curator uses ClientCnxnSocketNetty for secured connection and the thread > needs to be closed after calling confStore.format() to avoid the netty thread > waiting indefinitely, which renders the RM unresponsive after deleting the > confstore when started with the "-format-conf-store" arg. > The unclosed thread, which keeps RM running: > {code:java} > 2023-10-10 12:13:01,000 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: The > Thread[main-SendThread(ferdelyi-1.ferdelyi.root.hwx.site:2182),5,main]TIMED_WAITING > is stands at [sun.misc.Unsafe.park(Native Method), > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215), > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078), > > java.util.concurrent.LinkedBlockingDeque.pollFirst(LinkedBlockingDeque.java:522), > java.util.concurrent.LinkedBlockingDeque.poll(LinkedBlockingDeque.java:684), > org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:275), > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1289)] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-11468) Zookeeper SSL/TLS support
[ https://issues.apache.org/jira/browse/YARN-11468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth resolved YARN-11468. --- Fix Version/s: 3.4.0 Resolution: Fixed > Zookeeper SSL/TLS support > - > > Key: YARN-11468 > URL: https://issues.apache.org/jira/browse/YARN-11468 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Ferenc Erdelyi >Assignee: Ferenc Erdelyi >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0 > > > Zookeeper 3.5.5 server can operate with SSL/TLS secure connection with its > clients. > [https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide] > The SSL communication should be possible in the different parts of YARN, > where it communicates with Zookeeper servers. The Zookeeper clients are used > in the following places: > * ResourceManager > * ZKConfigurationStore > * ZKRMStateStore > The yarn.resourcemanager.zk-client-ssl.enabled flag to enable SSL > communication should be provided in the yarn-default.xml and the required > parameters for the keystore and truststore should be picked up from the > core-default.xml (HADOOP-18709) > yarn.resourcemanager.ha.curator-leader-elector.enabled has to be set to true via > yarn-site.xml to make sure Curator is used, otherwise we can't enable SSL. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11573) Add config option to make container allocation prefer nodes without reserved containers
[ https://issues.apache.org/jira/browse/YARN-11573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-11573: -- Fix Version/s: 3.4.0 > Add config option to make container allocation prefer nodes without reserved > containers > --- > > Key: YARN-11573 > URL: https://issues.apache.org/jira/browse/YARN-11573 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > > Applications could be stuck when the container allocation logic does not > consider more nodes, but only nodes that are having reserved containers. > This behavior can even block new AMs to be allocated on nodes so they don't > reach the running status. > A jira that mentions the same thing is YARN-9598: > {quote}Nodes which have been reserved should be skipped when iterating > candidates in RegularContainerAllocator#allocate, otherwise scheduler may > generate allocation or reservation proposal on these node which will always > be rejected in FiCaScheduler#commonCheckContainerAllocation. > {quote} > Since this jira implements 2 other points, I decided to create this one and > implement the 3rd point separately. > h2. Notes: > 1. FiCaSchedulerApp#commonCheckContainerAllocation will log this: > {code:java} > Trying to allocate from reserved container in async scheduling mode > {code} > in case RegularContainerAllocator creates a reservation proposal for nodes > having reserved container. > 2. A better way is to prevent generating an AM container (or even normal > container) allocation proposal on a node if it already has a reservation on > it and we still have more nodes to check in the preferred node set. > Completely disabling task containers from being allocated to worker nodes > could limit the downscaling ability that we have currently. > h2. 3. CALL HIERARCHY > 1. 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#nodeUpdate > 2. > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#allocateContainersToNode(org.apache.hadoop.yarn.api.records.NodeId, > boolean) > 3. > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#allocateContainersToNode(org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.CandidateNodeSet, > boolean) > 3.1. This is the place where it is decided whether to call > allocateContainerOnSingleNode or allocateContainersOnMultiNodes > 4. > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#allocateContainersOnMultiNodes > 5. > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#allocateOrReserveNewContainers > 6. > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue#assignContainers > 7. > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractParentQueue#assignContainersToChildQueues > 8. > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractLeafQueue#assignContainers > 9. > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp#assignContainers > 10. > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#assignContainers > 11. > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#allocate > 12. > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#tryAllocateOnNode > 13. > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#assignContainersOnNode > 14. > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#assignNodeLocalContainers > 15. 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#assignContainer > Logs these lines as an example: > {code:java} > 2023-08-23 17:44:08,129 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator: > assignContainers: node= application=application_1692304118418_3151 > priority=0 pendingAsk= vCores:1>,repeat=1> type=OFF_SWITCH > {code} > h2. 4. DETAILS OF RegularContainerAllocator#allocate > [Method > definition|https://github.com/apache/hadoop/blob/9342ecf6ccd5c7ef443a0eb722852d2addc1d5db/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java#L826-L896] > 4.1. Defining ordered l
[jira] [Updated] (YARN-11573) Add config option to make container allocation prefer nodes without reserved containers
[ https://issues.apache.org/jira/browse/YARN-11573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-11573: -- Description: Applications could be stuck when the container allocation logic does not consider more nodes, but only nodes that are having reserved containers. This behavior can even block new AMs to be allocated on nodes so they don't reach the running status. A jira that mentions the same thing is YARN-9598: {quote}Nodes which have been reserved should be skipped when iterating candidates in RegularContainerAllocator#allocate, otherwise scheduler may generate allocation or reservation proposal on these node which will always be rejected in FiCaScheduler#commonCheckContainerAllocation. {quote} Since this jira implements 2 other points, I decided to create this one and implement the 3rd point separately. h2. Notes: 1. FiCaSchedulerApp#commonCheckContainerAllocation will log this: {code:java} Trying to allocate from reserved container in async scheduling mode {code} in case RegularContainerAllocator creates a reservation proposal for nodes having reserved container. 2. A better way is to prevent generating an AM container (or even normal container) allocation proposal on a node if it already has a reservation on it and we still have more nodes to check in the preferred node set. Completely disabling task containers from being allocated to worker nodes could limit the downscaling ability that we have currently. h2. 3. CALL HIERARCHY 1. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#nodeUpdate 2. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#allocateContainersToNode(org.apache.hadoop.yarn.api.records.NodeId, boolean) 3. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#allocateContainersToNode(org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.CandidateNodeSet, boolean) 3.1. 
This is the place where it is decided whether to call allocateContainerOnSingleNode or allocateContainersOnMultiNodes 4. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#allocateContainersOnMultiNodes 5. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#allocateOrReserveNewContainers 6. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue#assignContainers 7. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractParentQueue#assignContainersToChildQueues 8. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractLeafQueue#assignContainers 9. org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp#assignContainers 10. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#assignContainers 11. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#allocate 12. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#tryAllocateOnNode 13. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#assignContainersOnNode 14. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#assignNodeLocalContainers 15. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#assignContainer Logs these lines as an example: {code:java} 2023-08-23 17:44:08,129 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator: assignContainers: node= application=application_1692304118418_3151 priority=0 pendingAsk=,repeat=1> type=OFF_SWITCH {code} h2. 4. 
DETAILS OF RegularContainerAllocator#allocate [Method definition|https://github.com/apache/hadoop/blob/9342ecf6ccd5c7ef443a0eb722852d2addc1d5db/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java#L826-L896] 4.1. Defining ordered list of nodes to allocate containers on: [LINK|https://github.com/apache/hadoop/blob/9342ecf6ccd5c7ef443a0eb722852d2addc1d5db/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java#L851-L852] {code:java} Iterator iter = schedulingPS.getPreferredNodeIterator( candidates); {code} 4.2. org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.AppPlacementAllocator#getPreferredNodeIterator 4.3. org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.MultiNodeSortingManager#getMultiNodeSortIterator ([LINK|https://github.com/apache/hadoop/blob/9
[jira] [Updated] (YARN-11573) Add config option to make container allocation prefer nodes without reserved containers
[ https://issues.apache.org/jira/browse/YARN-11573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-11573: -- Description: Applications could be stuck when the container allocation logic does not consider more nodes, but only nodes that are having reserved containers. This behavior can even block new AMs to be allocated on nodes so they don't reach the running status. A jira that mentions the same thing is YARN-9598: {quote}Nodes which have been reserved should be skipped when iterating candidates in RegularContainerAllocator#allocate, otherwise scheduler may generate allocation or reservation proposal on these node which will always be rejected in FiCaScheduler#commonCheckContainerAllocation. {quote} Since this jira implements 2 other points, I decided to create this one and implement the 3rd point separately. Notes: 1. FiCaSchedulerApp#commonCheckContainerAllocation will log this: {code:java} Trying to allocate from reserved container in async scheduling mode {code} in case RegularContainerAllocator creates a reservation proposal for nodes having reserved container. 2. A better way is to prevent generating an AM container (or even normal container) allocation proposal on a node if it already has a reservation on it and we still have more nodes to check in the preferred node set. Completely disabling task containers from being allocated to worker nodes could limit the downscaling ability that we have currently. 3. CALL HIERARCHY 1. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#nodeUpdate 2. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#allocateContainersToNode(org.apache.hadoop.yarn.api.records.NodeId, boolean) 3. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#allocateContainersToNode(org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.CandidateNodeSet, boolean) 3.1. 
This is the place where it is decided whether to call allocateContainerOnSingleNode or allocateContainersOnMultiNodes 4. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#allocateContainersOnMultiNodes 5. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#allocateOrReserveNewContainers 6. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue#assignContainers 7. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractParentQueue#assignContainersToChildQueues 8. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractLeafQueue#assignContainers 9. org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp#assignContainers 10. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#assignContainers 11. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#allocate 12. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#tryAllocateOnNode 13. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#assignContainersOnNode 14. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#assignNodeLocalContainers 15. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#assignContainer Logs these lines as an example: {code:java} 2023-08-23 17:44:08,129 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator: assignContainers: node= application=application_1692304118418_3151 priority=0 pendingAsk=,repeat=1> type=OFF_SWITCH {code} 4. 
DETAILS OF RegularContainerAllocator#allocate [Method definition|https://github.com/apache/hadoop/blob/9342ecf6ccd5c7ef443a0eb722852d2addc1d5db/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java#L826-L896] 4.1. Defining ordered list of nodes to allocate containers on: [LINK|https://github.com/apache/hadoop/blob/9342ecf6ccd5c7ef443a0eb722852d2addc1d5db/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java#L851-L852] {code:java} Iterator iter = schedulingPS.getPreferredNodeIterator( candidates); {code} 4.2. org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.AppPlacementAllocator#getPreferredNodeIterator 4.3. org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.MultiNodeSortingManager#getMultiNodeSortIterator ([LINK|https://github.com/apache/hadoop/blob/9342ecf6ccd5c7
[jira] [Created] (YARN-11573) Add config option to make container allocation prefer nodes without reserved containers
Szilard Nemeth created YARN-11573: - Summary: Add config option to make container allocation prefer nodes without reserved containers Key: YARN-11573 URL: https://issues.apache.org/jira/browse/YARN-11573 Project: Hadoop YARN Issue Type: Bug Components: capacity scheduler Reporter: Szilard Nemeth Assignee: Szilard Nemeth -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-11523) CapacityScheduler.md is incorrectly formatted
Szilard Nemeth created YARN-11523: - Summary: CapacityScheduler.md is incorrectly formatted Key: YARN-11523 URL: https://issues.apache.org/jira/browse/YARN-11523 Project: Hadoop YARN Issue Type: Improvement Reporter: Szilard Nemeth I noticed that the headers are not formatted corretly, I can see many "###"s instead of proper markdown headings, I think the space is missing between the hash and the name of the headings. See: https://github.com/apache/hadoop/blob/4bd873b816dbd889f410428d6e618586d4ff1780/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-11490) JMX QueueMetrics breaks after mutable config validation in CS
[ https://issues.apache.org/jira/browse/YARN-11490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721946#comment-17721946 ] Szilard Nemeth commented on YARN-11490: --- Hi [~tdomok], Nice finding. I do agree with your statements. 1. The memory leak {quote} Revert YARN-11211, it's a nasty bug and the "leak" only causes problems if the validation API is abused with unique queue names. Note that YARN-11211 did not solve the leak problem either, details above. {quote} Good that you characterized the nature of the leak, I think it's okay to revert YARN-11211 in this case. Please file a separate bug ticket for the leak. 3. Validate separately: {quote} Spawn a separate process for configuration validation with the proper config / state. Not sure if this is feasible or not, but it would be the cleanest. {quote} I agree that this would be the cleanest approach but given the current state of the codebase I really doubt it's easy to implement. > JMX QueueMetrics breaks after mutable config validation in CS > - > > Key: YARN-11490 > URL: https://issues.apache.org/jira/browse/YARN-11490 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.4.0 >Reporter: Tamas Domok >Assignee: Tamas Domok >Priority: Major > Labels: pull-request-available > Attachments: addqueue.xml, defaultqueue.json, > hadoop-tdomok-resourcemanager-tdomok-MBP16.log, removequeue.xml, > stopqueue.json > > > Reproduction steps: > 1. Submit a long running job > {code} > hadoop-3.4.0-SNAPSHOT/bin/yarn jar > hadoop-3.4.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.4.0-SNAPSHOT-tests.jar > sleep -m 1 -r 1 -rt 120 -mt 20 > {code} > 2. Verify that there is one running app > {code} > $ curl http://localhost:8088/ws/v1/cluster/metrics | jq > {code} > 3. Verify that the JMX endpoint reports 1 running app as well > {code} > $ curl http://localhost:8088/jmx | jq > {code} > 4. 
Validate the configuration (x2) > {code} > $ curl -X POST -H 'Content-Type: application/json' -d @defaultqueue.json > localhost:8088/ws/v1/cluster/scheduler-conf/validate > $ cat defaultqueue.json > {"update-queue":{"queue-name":"root.default","params":{"entry":{"key":"maximum-applications","value":"100"}}},"subClusterId":"","global":null,"global-updates":null} > {code} > 5. Check 2. and 3. again. The cluster metrics should still work but the JMX > endpoint will show 0 running apps, that's the bug. > It is caused by YARN-11211, reverting that patch (or only removing the > _QueueMetrics.clearQueueMetrics();_ line) fixes the issue. But I think that > would re-introduce the memory leak. > It looks like the QUEUE_METRICS hash map is "add-only", the > clearQueueMetrics() was only called from ResourceManager.reinitialize() > method (transitionToActive/transitionToStandby) prior to YARN-11211. > Constantly adding and removing queues with unique names would cause a leak as > well, because there is no remove from QUEUE_METRICS, so it is not just the > validation API that has this problem. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11482) Fix bug of DRF comparison DominantResourceFairnessComparator2 in fair scheduler
[ https://issues.apache.org/jira/browse/YARN-11482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-11482: -- Summary: Fix bug of DRF comparison DominantResourceFairnessComparator2 in fair scheduler (was: Fix bug of DRF comparision DominantResourceFairnessComparator2 in fair scheduler) > Fix bug of DRF comparison DominantResourceFairnessComparator2 in fair > scheduler > --- > > Key: YARN-11482 > URL: https://issues.apache.org/jira/browse/YARN-11482 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.6 > > > DominantResourceFairnessComparator2 was using wrong resource info to get if > one queue is needy or not now. We should fix it. > {code:java} > boolean s1Needy = resourceInfo1[dominant1].getValue() < > minShareInfo1[dominant1].getValue(); > boolean s2Needy = resourceInfo1[dominant2].getValue() < > minShareInfo2[dominant2].getValue(); > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-11464) queue element is added to any other leaf queue, it's queueType becomes QueueType.PARENT_QUEUE
[ https://issues.apache.org/jira/browse/YARN-11464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719603#comment-17719603 ] Szilard Nemeth commented on YARN-11464: --- Hi [~susheel_7] , Is this a test only issue? >From the title it's not clear for me. > queue element is added to any other leaf queue, it's queueType > becomes QueueType.PARENT_QUEUE > > > Key: YARN-11464 > URL: https://issues.apache.org/jira/browse/YARN-11464 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.3.4 >Reporter: Susheel Gupta >Priority: Major > > This testcase clearly reproduces the issue. There is a missing dot before > "auto-queue-creation-v2.enabled" for method call assertNoValueForQueues. > {code:java} > @Test > public void testAutoCreateV2FlagsInWeightMode() { > converter = builder.withPercentages(false).build(); > converter.convertQueueHierarchy(rootQueue); > assertTrue("root autocreate v2 flag", > csConfig.getBoolean( > PREFIX + "root.auto-queue-creation-v2.enabled", false)); > assertTrue("root.admins autocreate v2 flag", > csConfig.getBoolean( > PREFIX + "root.admins.auto-queue-creation-v2.enabled", false)); > assertTrue("root.users autocreate v2 flag", > csConfig.getBoolean( > PREFIX + "root.users.auto-queue-creation-v2.enabled", false)); > assertTrue("root.misc autocreate v2 flag", > csConfig.getBoolean( > PREFIX + "root.misc.auto-queue-creation-v2.enabled", false)); > Set leafs = Sets.difference(ALL_QUEUES, > Sets.newHashSet("root", > "root.default", > "root.admins", > "root.users", > "root.misc")); > assertNoValueForQueues(leafs, "auto-queue-creation-v2.enabled", > csConfig); > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-11079) Make an AbstractParentQueue to store common ParentQueue and ManagedParentQueue functionality
[ https://issues.apache.org/jira/browse/YARN-11079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth resolved YARN-11079. --- Hadoop Flags: Reviewed Resolution: Fixed > Make an AbstractParentQueue to store common ParentQueue and > ManagedParentQueue functionality > > > Key: YARN-11079 > URL: https://issues.apache.org/jira/browse/YARN-11079 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Benjamin Teke >Assignee: Susheel Gupta >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > ParentQueue is an instantiable class which stores the necessary functionality > of parent queues, however it is also extended by the > AbstractManagedParentQueue, which is an abstract class for storing managed > parent queue functionality. Since legacy AQC doesn't allow dynamic queues > next to static ones, managed parent queues technically behave like leaf > queues by not having any static child queues when created. This structure and > behaviour is really error prone, as for example if someone is not completely > aware of this and simply changes the checking order by first checking if the > queue in question is a ParentQueue in a method like > MappingRuleValidationContextImpl.isDynamicParent can result a completely > wrong return value (as a ManagedParent is a dynamic parent, but currently > it's also a ParentQueue, and ManagedParent cannot have the > isEligibleForAutoQueueCreation as true, so the method will return false). > {code:java} > private boolean isDynamicParent(CSQueue queue) { > if (queue == null) { > return false; > } > if (queue instanceof ManagedParentQueue) { > return true; > } > if (queue instanceof ParentQueue) { > return ((ParentQueue)queue).isEligibleForAutoQueueCreation(); > } > return false; > } > {code} > Similarly to YARN-11024 an AbstractParentQueue class should be created to > completely separate the managed parents from the instantiable ParentQueue > class. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11079) Make an AbstractParentQueue to store common ParentQueue and ManagedParentQueue functionality
[ https://issues.apache.org/jira/browse/YARN-11079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-11079: -- Fix Version/s: 3.4.0 > Make an AbstractParentQueue to store common ParentQueue and > ManagedParentQueue functionality > > > Key: YARN-11079 > URL: https://issues.apache.org/jira/browse/YARN-11079 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Benjamin Teke >Assignee: Susheel Gupta >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > ParentQueue is an instantiable class which stores the necessary functionality > of parent queues, however it is also extended by the > AbstractManagedParentQueue, which is an abstract class for storing managed > parent queue functionality. Since legacy AQC doesn't allow dynamic queues > next to static ones, managed parent queues technically behave like leaf > queues by not having any static child queues when created. This structure and > behaviour is really error prone, as for example if someone is not completely > aware of this and simply changes the checking order by first checking if the > queue in question is a ParentQueue in a method like > MappingRuleValidationContextImpl.isDynamicParent can result a completely > wrong return value (as a ManagedParent is a dynamic parent, but currently > it's also a ParentQueue, and ManagedParent cannot have the > isEligibleForAutoQueueCreation as true, so the method will return false). > {code:java} > private boolean isDynamicParent(CSQueue queue) { > if (queue == null) { > return false; > } > if (queue instanceof ManagedParentQueue) { > return true; > } > if (queue instanceof ParentQueue) { > return ((ParentQueue)queue).isEligibleForAutoQueueCreation(); > } > return false; > } > {code} > Similarly to YARN-11024 an AbstractParentQueue class should be created to > completely separate the managed parents from the instantiable ParentQueue > class. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10178) Global Scheduler async thread crash caused by 'Comparison method violates its general contract
[ https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-10178: -- Description: Stack trace: {code:java} ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, Thread-6066574, that exited unexpectedly: java.lang.IllegalArgumentException: Comparison method violates its general contract! at java.util.TimSort.mergeHi(TimSort.java:899) at java.util.TimSort.mergeAt(TimSort.java:516) at java.util.TimSort.mergeForceCollapse(TimSort.java:457) at java.util.TimSort.sort(TimSort.java:254) at java.util.Arrays.sort(Arrays.java:1512) at java.util.ArrayList.sort(ArrayList.java:1462) at java.util.Collections.sort(Collections.java:177) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:221) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:777) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:791) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1635) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1629) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1732) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1481) at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:569) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:616) {code} In JDK 8, Arrays.sort by default is using the timsort algorithm, and timsort has a few requirements: {code:java} 1.x.compareTo(y) != y.compareTo(x) 2.x>y,y>z --> x > z 3.x=y, x.compareTo(z) == y.compareTo(z) {code} If the Array / List does not satisfy any of these requirements, TimSort will throw a java.lang.IllegalArgumentException. 1. If we take a look into PriorityUtilizationQueueOrderingPolicy.compare method, we can see that Capacity Scheduler these queue fields in order to compare resource usage: {code:java} AbsoluteUsedCapacity UsedCapacity ConfiguredMinResource AbsoluteCapacity {code} 2. In CS, during the execution of AsyncScheduleThread while the queues are being sorted in PriorityUtilizationQueueOrderingPolicy, for choosing the queue to assign the container to this IllegalArgumentException is thrown. 3. If we take a look into the ResourceCommitterService method, it tries to commit a CSAssignment coming from the ResourceCommitRequest, look tryCommit function, the queue resource usage is being updated. {code:java} public boolean tryCommit(Resource cluster, ResourceCommitRequest r, boolean updatePending) { long commitStart = System.nanoTime(); ResourceCommitRequest request = (ResourceCommitRequest) r; ... boolean isSuccess = false; if (attemptId != null) { FiCaSchedulerApp app = getApplicationAttempt(attemptId); // Required sanity check for attemptId - when async-scheduling enabled, // proposal might be outdated if AM failover just finished // and proposal queue was not be consumed in time if (app != null && attemptId.equals(app.getApplicationAttemptId())) { if (app.accept(cluster, request, updatePending) && app.apply(cluster, request, updatePending)) { // apply this resource ... 
} } } return isSuccess; } } {code} {code:java} public boolean apply(Resource cluster, ResourceCommitRequest request, boolean updatePending) { ... if (!reReservation) { getCSLeafQueue().apply(cluster, request); } ... } {code} 4. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue#apply invokes org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue#allocateResource: {code:java} void allocateResource(Resource clusterResource, Resource resource, String nodePartition) { try { writeLock.lock(); // only lock leaf queue lock queueUsage.incUsed(nodePartition, resource); ++numContainers; CSQueueUtils.updateQue
[jira] [Updated] (YARN-10178) Global Scheduler async thread crash caused by 'Comparison method violates its general contract'
[ https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-10178: -- Description: Stack trace: {code:java} ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, Thread-6066574, that exited unexpectedly: java.lang.IllegalArgumentException: Comparison method violates its general contract! at java.util.TimSort.mergeHi(TimSort.java:899) at java.util.TimSort.mergeAt(TimSort.java:516) at java.util.TimSort.mergeForceCollapse(TimSort.java:457) at java.util.TimSort.sort(TimSort.java:254) at java.util.Arrays.sort(Arrays.java:1512) at java.util.ArrayList.sort(ArrayList.java:1462) at java.util.Collections.sort(Collections.java:177) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:221) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:777) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:791) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1635) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1629) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1732) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1481) at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:569) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:616) {code} In JDK 8, Arrays.sort uses the TimSort algorithm by default, and TimSort places a few requirements on the comparison method: {code:java} 1. sgn(x.compareTo(y)) == -sgn(y.compareTo(x)) 2. x > y && y > z --> x > z 3. x.compareTo(y) == 0 --> sgn(x.compareTo(z)) == sgn(y.compareTo(z)) {code} If the array / list being sorted violates any of these requirements, TimSort will throw a java.lang.IllegalArgumentException. 1. If we take a look into the PriorityUtilizationQueueOrderingPolicy.compare method, we can see that the Capacity Scheduler compares these queue fields in order to compare resource usage: {code:java} AbsoluteUsedCapacity UsedCapacity ConfiguredMinResource AbsoluteCapacity {code} 2. In CS, this IllegalArgumentException is thrown during the execution of the AsyncScheduleThread, while the queues are being sorted in PriorityUtilizationQueueOrderingPolicy to choose the queue to assign the container to. 3. If we take a look into the ResourceCommitterService, it tries to commit a CSAssignment coming from the ResourceCommitRequest; in the tryCommit function, the queue resource usage is updated. {code:java} public boolean tryCommit(Resource cluster, ResourceCommitRequest r, boolean updatePending) { long commitStart = System.nanoTime(); ResourceCommitRequest request = (ResourceCommitRequest) r; ... boolean isSuccess = false; if (attemptId != null) { FiCaSchedulerApp app = getApplicationAttempt(attemptId); // Required sanity check for attemptId - when async-scheduling enabled, // proposal might be outdated if AM failover just finished // and proposal queue was not be consumed in time if (app != null && attemptId.equals(app.getApplicationAttemptId())) { if (app.accept(cluster, request, updatePending) && app.apply(cluster, request, updatePending)) { // apply this resource ... 
} } } return isSuccess; } } {code} {code:java} public boolean apply(Resource cluster, ResourceCommitRequest request, boolean updatePending) { ... if (!reReservation) { getCSLeafQueue().apply(cluster, request); } ... } {code} 4. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue#apply invokes org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue#allocateResource: {code:java} void allocateResource(Resource clusterResource, Resource resource, String nodePartition) { try { writeLock.lock(); // only lock leaf queue lock queueUsage.incUsed(nodePartition, resource); ++numContainers; CSQueueUtils.updateQue
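The race described above — the committer thread updating queue usage while the async scheduling thread sorts on it — can be reproduced in miniature. The sketch below uses a hypothetical stand-in class (MockQueue is not a Hadoop class) to show how a mid-sort field update breaks the antisymmetry requirement that TimSort enforces:

```java
import java.util.Comparator;

// Hypothetical stand-in for a CSQueue whose usage field is updated by another
// thread mid-sort; purely an illustration of the broken comparator contract.
class MockQueue {
    volatile double usedCapacity;
    MockQueue(double usedCapacity) { this.usedCapacity = usedCapacity; }
}

public class ComparatorContractDemo {
    public static void main(String[] args) {
        Comparator<MockQueue> byUsage =
            Comparator.comparingDouble(q -> q.usedCapacity);

        MockQueue a = new MockQueue(0.3);
        MockQueue b = new MockQueue(0.5);

        int first = byUsage.compare(a, b);  // negative: a's usage < b's usage

        // Simulates ResourceCommitterService applying an allocation mid-sort.
        b.usedCapacity = 0.1;

        int second = byUsage.compare(b, a); // negative: b's usage < a's usage now

        // Antisymmetry demands sgn(compare(a, b)) == -sgn(compare(b, a));
        // the concurrent update broke it, which is what TimSort detects and
        // reports as "Comparison method violates its general contract!".
        boolean violated = Integer.signum(first) != -Integer.signum(second);
        System.out.println("contract violated: " + violated); // true
    }
}
```

In the real scheduler the same effect occurs because allocateResource() mutates queueUsage while sortAndGetChildrenAllocationIterator() is sorting on values derived from it.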
[jira] [Resolved] (YARN-11415) Refactor TestConfigurationFieldsBase and the connected test classes
[ https://issues.apache.org/jira/browse/YARN-11415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth resolved YARN-11415. --- Resolution: Not A Problem > Refactor TestConfigurationFieldsBase and the connected test classes > --- > > Key: YARN-11415 > URL: https://issues.apache.org/jira/browse/YARN-11415 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Benjamin Teke >Assignee: Szilard Nemeth >Priority: Major > Labels: pull-request-available > > YARN-11413 pointed out a strange way of how the configuration tests are > executed. The first problem is that there is a > [Pattern|https://github.com/apache/hadoop/blob/570b503e3e7e7adf5b0a8fabca76003298216543/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestConfigurationFieldsBase.java#L197], > that matches only numbers, letters, dots, hyphens and underscores, but not > %, which is used in string replacements (e.g. > {{yarn.nodemanager.aux-services.%s.classpath}} ), so essentially every > property that's present in any configuration object and doesn't match this > pattern is silently skipped, and documenting it will result in invalid test > failures, ergo the test encourages introducing props and not documenting > them. The pattern should be fixed in YARN-11413 for %s, but its necessity > could be checked. > Another issue with this is that it works in a semi-opposite way of what it's > supposed to do. To ensure all of the configuration entries are documented it > should iterate through all of the configuration fields and check if those > have matching xyz-default.xml entries, but currently it just reports the > entries that are present in the xyz-default.xml and missing in the matching > configuration file. Since this test checks all the configuration objects this > might need some other follow-ups to document the missing properties from > other components if there are any. 
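The effect of the missing % in the pattern can be sketched like this (the character classes below are illustrative reconstructions, not the exact pattern from TestConfigurationFieldsBase):

```java
import java.util.regex.Pattern;

public class ConfigKeyPatternDemo {
    public static void main(String[] args) {
        // Illustrative: only letters, digits, dots, hyphens, underscores...
        Pattern withoutPercent = Pattern.compile("^[A-Za-z][A-Za-z0-9_.\\-]+$");
        // ...versus the same class extended with '%' (the YARN-11413 fix).
        Pattern withPercent = Pattern.compile("^[A-Za-z][A-Za-z0-9_.\\-%]+$");

        String key = "yarn.nodemanager.aux-services.%s.classpath";
        System.out.println(withoutPercent.matcher(key).matches()); // false: silently skipped
        System.out.println(withPercent.matcher(key).matches());    // true: recognized
    }
}
```

A key rejected by the first pattern never reaches the "known fields" set, so documenting it in yarn-default.xml produces exactly the invalid test failure described above.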
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-11415) Refactor TestConfigurationFieldsBase and the connected test classes
[ https://issues.apache.org/jira/browse/YARN-11415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696896#comment-17696896 ] Szilard Nemeth commented on YARN-11415: --- Based on our offline discussion with [~bteke] , I'm closing this. > Refactor TestConfigurationFieldsBase and the connected test classes > --- > > Key: YARN-11415 > URL: https://issues.apache.org/jira/browse/YARN-11415 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Benjamin Teke >Assignee: Szilard Nemeth >Priority: Major > Labels: pull-request-available > > YARN-11413 pointed out a strange way of how the configuration tests are > executed. The first problem is that there is a > [Pattern|https://github.com/apache/hadoop/blob/570b503e3e7e7adf5b0a8fabca76003298216543/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestConfigurationFieldsBase.java#L197], > that matches only numbers, letters, dots, hyphens and underscores, but not > %, which is used in string replacements (e.g. > {{yarn.nodemanager.aux-services.%s.classpath}} ), so essentially every > property that's present in any configuration object and doesn't match this > pattern is silently skipped, and documenting it will result in invalid test > failures, ergo the test encourages introducing props and not documenting > them. The pattern should be fixed in YARN-11413 for %s, but its necessity > could be checked. > Another issue with this is that it works in a semi-opposite way of what it's > supposed to do. To ensure all of the configuration entries are documented it > should iterate through all of the configuration fields and check if those > have matching xyz-default.xml entries, but currently it just reports the > entries that are present in the xyz-default.xml and missing in the matching > configuration file. 
Since this test checks all the configuration objects this > might need some other follow-ups to document the missing properties from > other components if there are any. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-11450) Improvements for TestYarnConfigurationFields and TestConfigurationFieldsBase
Szilard Nemeth created YARN-11450: - Summary: Improvements for TestYarnConfigurationFields and TestConfigurationFieldsBase Key: YARN-11450 URL: https://issues.apache.org/jira/browse/YARN-11450 Project: Hadoop YARN Issue Type: Improvement Reporter: Szilard Nemeth Assignee: Szilard Nemeth -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-11415) Refactor TestConfigurationFieldsBase and the connected test classes
[ https://issues.apache.org/jira/browse/YARN-11415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696584#comment-17696584 ] Szilard Nemeth edited comment on YARN-11415 at 3/5/23 4:21 PM: --- Hi [~bteke], I just briefly checked this. {quote}Another issue with this is that it works in a semi-opposite way of what it's supposed to do. To ensure all of the configuration entries are documented it should iterate through all of the configuration fields and check if those have matching xyz-default.xml entries, but currently it just reports the entries that are present in the xyz-default.xml and missing in the matching configuration file. Since this test checks all the configuration objects this might need some other follow-ups to document the missing properties from other components if there are any. {quote} I think what you stated here is true for this method: TestCommonConfigurationFields#testCompareXmlAgainstConfigurationClass. This method compares the properties that are in yarn-default.xml, but not in the Configuration class. *With my first commit* I added a dummy property to yarn-default.xml without adding it to the YarnConfiguration class. The property is called "yarn.nodemanager.missingpropinclass", but the name doesn't really matter. Test result: Failure in TestCommonConfigurationFields#testCompareXmlAgainstConfigurationClass: {code:java} java.lang.AssertionError: yarn-default.xml has 1 properties missing in class org.apache.hadoop.yarn.conf.YarnConfiguration Entries: yarn.nodemanager.missingpropinclass Expected :0 Actual :1 {code} However, we also have TestCommonConfigurationFields#testCompareConfigurationClassAgainstXml which compares the properties that are in the YarnConfiguration class, but not defined in yarn-default.xml. 
{*}So with my second commit{*}, I added this to YarnConfiguration: {code:java} public static final String MISSING_PROP_IN_YARN_DEF = "yarn.missingprop.in.yarndefault"; {code} without touching the yarn-default.xml so the new config was not documented. As I expected, the test case TestConfigurationFieldsBase#testCompareConfigurationClassAgainstXml failed with: {code:java} java.lang.AssertionError: class org.apache.hadoop.yarn.conf.YarnConfiguration has 1 variables missing in yarn-default.xml Entries: yarn.missingprop.in.yarndefault Expected :0 Actual :1 {code} I think so far so good, this is the expected behavior. The main issue before YARN-11413 was: 1. org.apache.hadoop.conf.TestConfigurationFieldsBase#setupTestConfigurationFields is called as a "@Before" method. 2. org.apache.hadoop.conf.TestConfigurationFieldsBase#extractMemberVariablesFromConfigurationFields is called. 3. All of the fields of the class are checked here with certain restrictions. Among these are that it should be a public static final String, and it should match a Pattern. If the pattern is not matched, the field won't be added to the "known fields" for sure. {*}4. So with my third commit{*}, I just removed the percent sign (basically a revert of YARN-11413) to see what happens. TestConfigurationFieldsBase#testCompareXmlAgainstConfigurationClass fails, which is a false positive now. Here's the assertion message: {code:java} java.lang.AssertionError: yarn-default.xml has 2 properties missing in class org.apache.hadoop.yarn.conf.YarnConfiguration Entries: yarn.nodemanager.aux-services.%s.classpath yarn.nodemanager.aux-services.%s.system-classes Expected :0 Actual :2 {code} This is indeed wrong, as per your description. However, I don't see how this could be fixed in a clean way. Here, the configuration fields of YarnConfiguration were not recognized: yarn.nodemanager.aux-services.%s.classpath yarn.nodemanager.aux-services.%s.system-classes. What can we do then? 
*CONCLUSION* The whole point of matching the values of the String fields is to differentiate config keys from other strings like: {code:java} public static final String NVIDIA_DOCKER_V1 = "nvidia-docker-v1"; {code} I cannot see a better way than what we are currently doing with the regex. I think the regex pattern should be continuously maintained and occasions like unmatched real config keys like "yarn.nodemanager.aux-services.%s.classpath" (because the regex didn't contain the percent sign) could be quite rare. [~bteke] Do you have anything in mind about a clean fix for this? Anyway, I will report another jira where I will list my suggested improvements of the test and its base class: TestConfigurationFieldsBase. was (Author: snemeth): Hi [~bteke], I just briefly checked this. {quote} Another issue with this is that it works in a semi-opposite way of what it's supposed to do. To ensure all of the configuration entries are documented it should iterate through all of the configuration fields and check if those have matching xyz-default.xml entries, but cur
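The extraction step discussed in the comment — collecting public static final String fields and filtering them by a regex — can be sketched as follows. The pattern and the FakeConf class are simplified stand-ins, not the actual code of TestConfigurationFieldsBase#extractMemberVariablesFromConfigurationFields:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

public class FieldExtractionSketch {
    // Simplified pattern requiring dotted segments, so plain constants such as
    // "nvidia-docker-v1" are not mistaken for config keys; '%' is allowed so
    // format-string keys like yarn.nodemanager.aux-services.%s.classpath match.
    static final Pattern CONFIG_KEY =
        Pattern.compile("^[A-Za-z][A-Za-z0-9_%\\-]*(\\.[A-Za-z0-9_%\\-]+)+$");

    // Stand-in for YarnConfiguration: one real-looking key, one plain constant.
    public static class FakeConf {
        public static final String AUX_CLASSPATH =
            "yarn.nodemanager.aux-services.%s.classpath";
        public static final String NVIDIA_DOCKER_V1 = "nvidia-docker-v1";
    }

    static Set<String> extractConfigKeys(Class<?> clazz) throws IllegalAccessException {
        Set<String> keys = new TreeSet<>();
        for (Field f : clazz.getFields()) {
            int mods = f.getModifiers();
            if (Modifier.isStatic(mods) && Modifier.isFinal(mods)
                    && f.getType() == String.class) {
                String value = (String) f.get(null);
                if (CONFIG_KEY.matcher(value).matches()) {
                    keys.add(value); // keep only values that look like config keys
                }
            }
        }
        return keys;
    }

    public static void main(String[] args) throws IllegalAccessException {
        // Only the dotted key survives the filter; NVIDIA_DOCKER_V1 is dropped.
        System.out.println(extractConfigKeys(FakeConf.class));
    }
}
```

This also illustrates why the filter regex is load-bearing: any real key whose characters fall outside the pattern silently disappears from the "known fields" set.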
[jira] [Commented] (YARN-11415) Refactor TestConfigurationFieldsBase and the connected test classes
[ https://issues.apache.org/jira/browse/YARN-11415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696584#comment-17696584 ] Szilard Nemeth commented on YARN-11415: --- Hi [~bteke], I just briefly checked this. {quote} Another issue with this is that it works in a semi-opposite way of what it's supposed to do. To ensure all of the configuration entries are documented it should iterate through all of the configuration fields and check if those have matching xyz-default.xml entries, but currently it just reports the entries that are present in the xyz-default.xml and missing in the matching configuration file. Since this test checks all the configuration objects this might need some other follow-ups to document the missing properties from other components if there are any. {quote} I think what you stated here is true for this method: TestCommonConfigurationFields#testCompareXmlAgainstConfigurationClass. This method compares the properties that are in yarn-default.xml, but not in the Configuration class. With my first commit I added a dummy property to yarn-default.xml without adding it to the YarnConfiguration class. The property is called "yarn.nodemanager.missingpropinclass", but the name doesn't really matter. Test result: Failure in TestCommonConfigurationFields#testCompareXmlAgainstConfigurationClass: {code} java.lang.AssertionError: yarn-default.xml has 1 properties missing in class org.apache.hadoop.yarn.conf.YarnConfiguration Entries: yarn.nodemanager.missingpropinclass Expected :0 Actual :1 {code} However, we also have TestCommonConfigurationFields#testCompareConfigurationClassAgainstXml which compares the properties that are in the YarnConfiguration class, but not defined in yarn-default.xml. 
So with my second commit, I added this to YarnConfiguration: {code} public static final String MISSING_PROP_IN_YARN_DEF = "yarn.missingprop.in.yarndefault"; {code} without touching the yarn-default.xml so the new config was not documented. As I expected, the test case TestConfigurationFieldsBase#testCompareConfigurationClassAgainstXml failed with: {code} java.lang.AssertionError: class org.apache.hadoop.yarn.conf.YarnConfiguration has 1 variables missing in yarn-default.xml Entries: yarn.missingprop.in.yarndefault Expected :0 Actual :1 {code} I think so far so good, this is the expected behavior. The main issue before YARN-11413 was: 1. org.apache.hadoop.conf.TestConfigurationFieldsBase#setupTestConfigurationFields is called as a "\@Before" method. 2. org.apache.hadoop.conf.TestConfigurationFieldsBase#extractMemberVariablesFromConfigurationFields is called. 3. All of the fields of the class are checked here with certain restrictions. Among these are that it should be a public static final String, and it should match a Pattern. If the pattern is not matched, the field won't be added to the "known fields" for sure. 4. So with my third commit, I just removed the percent sign (basically a revert of YARN-11413) to see what happens. TestConfigurationFieldsBase#testCompareXmlAgainstConfigurationClass fails, which is a false positive now. Here's the assertion message: {code} java.lang.AssertionError: yarn-default.xml has 2 properties missing in class org.apache.hadoop.yarn.conf.YarnConfiguration Entries: yarn.nodemanager.aux-services.%s.classpath yarn.nodemanager.aux-services.%s.system-classes Expected :0 Actual :2 {code} This is indeed wrong, as per your description. However, I don't see how this could be fixed in a clean way. Here, the configuration fields of YarnConfiguration were not recognized: yarn.nodemanager.aux-services.%s.classpath yarn.nodemanager.aux-services.%s.system-classes. What can we do then? 
The whole point of matching the values of the String fields is to differentiate config keys from other strings like: {code} public static final String NVIDIA_DOCKER_V1 = "nvidia-docker-v1"; {code} I cannot see a better way than what we are currently doing with the regex. I think the regex pattern should be continuously maintained and occasions like unmatched real config keys like "yarn.nodemanager.aux-services.%s.classpath" (because the regex didn't contain the percent sign) could be quite rare. [~bteke] Do you have anything in mind about a clean fix for this? Anyway, I will report another jira where I will list my suggested improvements of the test and its base class: TestConfigurationFieldsBase. > Refactor TestConfigurationFieldsBase and the connected test classes > --- > > Key: YARN-11415 > URL: https://issues.apache.org/jira/browse/YARN-11415 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Benjamin Teke >Assignee: Szilard Nemeth >Priority: Major > Labels: pu
[jira] [Updated] (YARN-11427) Pull up the versioned imports in pom of hadoop-mapreduce-client-app to hadoop-project pom
[ https://issues.apache.org/jira/browse/YARN-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-11427: -- Description: The versioned imports in pom.xml of hadoop-mapreduce-client-app can be pulled up to hadoop-project pom as it is better for version maintenance and ease of using an IDE to find where things are used {code:java} <dependency> <groupId>org.mockito</groupId> <artifactId>mockito-junit-jupiter</artifactId> <version>4.11.0</version> <scope>test</scope> </dependency> <dependency> <groupId>uk.org.webcompere</groupId> <artifactId>system-stubs-core</artifactId> <version>1.1.0</version> <scope>test</scope> </dependency> <dependency> <groupId>uk.org.webcompere</groupId> <artifactId>system-stubs-jupiter</artifactId> <version>1.1.0</version> <scope>test</scope> </dependency> {code} was: The versioned imports in pom.xml of hadoop-mapreduce-client-app can be pullup to hadoop-project pom as it is better for version maintenance and ease of using an IDE to find where things are used {code:java} <dependency> <groupId>org.mockito</groupId> <artifactId>mockito-junit-jupiter</artifactId> <version>4.11.0</version> <scope>test</scope> </dependency> <dependency> <groupId>uk.org.webcompere</groupId> <artifactId>system-stubs-core</artifactId> <version>1.1.0</version> <scope>test</scope> </dependency> <dependency> <groupId>uk.org.webcompere</groupId> <artifactId>system-stubs-jupiter</artifactId> <version>1.1.0</version> <scope>test</scope> </dependency> {code} > Pull up the versioned imports in pom of hadoop-mapreduce-client-app to > hadoop-project pom > - > > Key: YARN-11427 > URL: https://issues.apache.org/jira/browse/YARN-11427 > Project: Hadoop YARN > Issue Type: Task > Components: yarn >Reporter: Susheel Gupta >Assignee: Susheel Gupta >Priority: Minor > > The versioned imports in pom.xml of hadoop-mapreduce-client-app can be pulled > up to hadoop-project pom as it is better for version maintenance and ease of > using an IDE to find where things are used > > {code:java} > <dependency> > <groupId>org.mockito</groupId> > <artifactId>mockito-junit-jupiter</artifactId> > <version>4.11.0</version> > <scope>test</scope> > </dependency> > <dependency> > <groupId>uk.org.webcompere</groupId> > <artifactId>system-stubs-core</artifactId> > <version>1.1.0</version> > <scope>test</scope> > </dependency> > <dependency> > <groupId>uk.org.webcompere</groupId> > <artifactId>system-stubs-jupiter</artifactId> > <version>1.1.0</version> > <scope>test</scope> > </dependency> > {code} > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11427) Pull up the versioned imports in pom of hadoop-mapreduce-client-app to hadoop-project pom
[ https://issues.apache.org/jira/browse/YARN-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-11427: -- Summary: Pull up the versioned imports in pom of hadoop-mapreduce-client-app to hadoop-project pom (was: Pullup the versioned imports in pom of hadoop-mapreduce-client-app to hadoop-project pom) > Pull up the versioned imports in pom of hadoop-mapreduce-client-app to > hadoop-project pom > - > > Key: YARN-11427 > URL: https://issues.apache.org/jira/browse/YARN-11427 > Project: Hadoop YARN > Issue Type: Task > Components: yarn >Reporter: Susheel Gupta >Assignee: Susheel Gupta >Priority: Minor > > The versioned imports in pom.xml of hadoop-mapreduce-client-app can be pullup > to hadoop-project pom as it is better for version maintenance and ease of > using an IDE to find where things are used > > {code:java} > > org.mockito > mockito-junit-jupiter > 4.11.0 > test > > > uk.org.webcompere > system-stubs-core > 1.1.0 > test > > > uk.org.webcompere > system-stubs-jupiter > 1.1.0 > test > {code} > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-11372) Migrate legacy AQC to flexible AQC
[ https://issues.apache.org/jira/browse/YARN-11372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-11372: - Assignee: Peter Szucs > Migrate legacy AQC to flexible AQC > -- > > Key: YARN-11372 > URL: https://issues.apache.org/jira/browse/YARN-11372 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Peter Szucs >Priority: Major > > Currently the Legacy AQC classes > (ManagedParentQueue/ManagedLeafQueue) live next to the basic queue > classes that are used by the flexible AQC. The scope of this task is to > eliminate the former while migrating the functionality of legacy AQC. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11372) Migrate legacy AQC to flexible AQC
[ https://issues.apache.org/jira/browse/YARN-11372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-11372: -- Parent Issue: YARN-10889 (was: YARN-10888) > Migrate legacy AQC to flexible AQC > -- > > Key: YARN-11372 > URL: https://issues.apache.org/jira/browse/YARN-11372 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Priority: Major > > Currently the Legacy AQC classes > (ManagedParentQueue/ManagedLeafQueue) live next to the basic queue > classes that are used by the flexible AQC. The scope of this task is to > eliminate the former while migrating the functionality of legacy AQC. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-10965) Centralize queue resource calculation based on CapacityVectors
[ https://issues.apache.org/jira/browse/YARN-10965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth resolved YARN-10965. --- Hadoop Flags: Reviewed Resolution: Fixed > Centralize queue resource calculation based on CapacityVectors > -- > > Key: YARN-10965 > URL: https://issues.apache.org/jira/browse/YARN-10965 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Andras Gyori >Assignee: Andras Gyori >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 6.5h > Remaining Estimate: 0h > > With the introduction of YARN-10930 it is possible to unify queue resource > calculation. In order to narrow down the scope of this patch, the base system > is implemented here, without refactoring the existing resource calculation in > updateClusterResource (which will be done in YARN-11000). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10965) Centralize queue resource calculation based on CapacityVectors
[ https://issues.apache.org/jira/browse/YARN-10965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-10965: -- Fix Version/s: 3.4.0 > Centralize queue resource calculation based on CapacityVectors > -- > > Key: YARN-10965 > URL: https://issues.apache.org/jira/browse/YARN-10965 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Andras Gyori >Assignee: Andras Gyori >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 6.5h > Remaining Estimate: 0h > > With the introduction of YARN-10930 it is possible to unify queue resource > calculation. In order to narrow down the scope of this patch, the base system > is implemented here, without refactoring the existing resource calculation in > updateClusterResource (which will be done in YARN-11000). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6971) Clean up different ways to create resources
[ https://issues.apache.org/jira/browse/YARN-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-6971: - Fix Version/s: 3.4.0 > Clean up different ways to create resources > --- > > Key: YARN-6971 > URL: https://issues.apache.org/jira/browse/YARN-6971 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Reporter: Yufei Gu >Assignee: Riya Khandelwal >Priority: Minor > Labels: newbie, pull-request-available > Fix For: 3.4.0 > > > There are several ways to create a {{resource}} object, e.g., > BuilderUtils.newResource() and Resources.createResource(). These methods not > only cause confusion but also performance issues; for example, > BuilderUtils.newResource() is significantly slower than > Resources.createResource(). > We could merge them somehow, and replace most BuilderUtils.newResource() > with Resources.createResource(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-6971) Clean up different ways to create resources
[ https://issues.apache.org/jira/browse/YARN-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth resolved YARN-6971. -- Hadoop Flags: Reviewed Resolution: Fixed > Clean up different ways to create resources > --- > > Key: YARN-6971 > URL: https://issues.apache.org/jira/browse/YARN-6971 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Reporter: Yufei Gu >Assignee: Riya Khandelwal >Priority: Minor > Labels: newbie, pull-request-available > Fix For: 3.4.0 > > > There are several ways to create a {{resource}} object, e.g., > BuilderUtils.newResource() and Resources.createResource(). These methods not > only cause confusion but also performance issues; for example, > BuilderUtils.newResource() is significantly slower than > Resources.createResource(). > We could merge them somehow, and replace most BuilderUtils.newResource() > with Resources.createResource(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11416) FS2CS should use CapacitySchedulerConfiguration in FSQueueConverterBuilder
[ https://issues.apache.org/jira/browse/YARN-11416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-11416: -- Description: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.converter.FSQueueConverter and its builder stores the variable capacitySchedulerConfig as a simple Configuration object instead of CapacitySchedulerConfiguration. This is misleading, as capacitySchedulerConfig suggests that it is indeed a CapacitySchedulerConfiguration and it loses access to the convenience methods to check for various properties. Because of this every time a property getter is changed FS2CS should be checked if it reimplemented the same, otherwise there might be behaviour differences or even bugs. (was: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.converter.FSQueueConverter and it's builder stores the variable capacitySchedulerConfig as a simple Configuration object instead of CapacitySchedulerConfiguration. This is misleading, as capacitySchedulerConfig suggests that it is indeed a CapacitySchedulerConfiguration and it loses access to the convenience methods to check for various properties. Because of this every time a property getter is changed FS2CS should be checked if it reimplemented the same, otherwise there might be behaviour differences or even bugs.) > FS2CS should use CapacitySchedulerConfiguration in FSQueueConverterBuilder > --- > > Key: YARN-11416 > URL: https://issues.apache.org/jira/browse/YARN-11416 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Benjamin Teke >Assignee: Susheel Gupta >Priority: Major > Labels: pull-request-available > > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.converter.FSQueueConverter > and its builder stores the variable capacitySchedulerConfig as a simple > Configuration object instead of CapacitySchedulerConfiguration. 
This is > misleading, as capacitySchedulerConfig suggests that it is indeed a > CapacitySchedulerConfiguration and it loses access to the convenience methods > to check for various properties. Because of this every time a property getter > is changed FS2CS should be checked if it reimplemented the same, otherwise > there might be behaviour differences or even bugs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
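Why the declared type matters can be shown with stand-in classes (these are not the real Hadoop classes, and the convenience getter name below is hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

// Stand-ins, not Hadoop's classes: a generic key/value Configuration and a
// subclass adding a convenience getter, mirroring how
// CapacitySchedulerConfiguration extends Configuration.
class PlainConfiguration {
    private final Map<String, String> props = new HashMap<>();
    public String get(String key) { return props.get(key); }
    public void set(String key, String value) { props.put(key, value); }
}

class CsConfiguration extends PlainConfiguration {
    // Hypothetical convenience method of the kind FS2CS loses access to
    // when its field is declared as the plain Configuration type.
    public boolean isPreemptionDisabled(String queue) {
        return Boolean.parseBoolean(
            get("yarn.scheduler.capacity." + queue + ".disable_preemption"));
    }
}

public class TypedConfigDemo {
    public static void main(String[] args) {
        CsConfiguration csConf = new CsConfiguration();
        csConf.set("yarn.scheduler.capacity.root.a.disable_preemption", "true");

        PlainConfiguration tooBroad = csConf; // FS2CS-style field typing
        // tooBroad.isPreemptionDisabled("root.a"); // would not compile: the
        // convenience getter is invisible, so callers re-read raw keys instead,
        // duplicating logic that can then drift from the real getter.

        System.out.println(csConf.isPreemptionDisabled("root.a")); // true
    }
}
```

Storing the field as the subclass type removes the duplicated key-reading logic, which is exactly the drift risk the issue describes.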
[jira] [Updated] (YARN-5607) Document TestContainerResourceUsage#waitForContainerCompletion
[ https://issues.apache.org/jira/browse/YARN-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-5607: - Fix Version/s: 3.4.0 > Document TestContainerResourceUsage#waitForContainerCompletion > -- > > Key: YARN-5607 > URL: https://issues.apache.org/jira/browse/YARN-5607 > Project: Hadoop YARN > Issue Type: Test > Components: resourcemanager, test >Affects Versions: 2.9.0 >Reporter: Karthik Kambatla >Assignee: Susheel Gupta >Priority: Major > Labels: newbie, pull-request-available > Fix For: 3.4.0 > > > The logic in TestContainerResourceUsage#waitForContainerCompletion > (introduced in YARN-5024) is not immediately obvious. It could use some > documentation. Also, this seems like a useful helper method. Should this be > moved to one of the mock classes or to a util class? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11404) Add junit5 dependency to hadoop-mapreduce-client-app to fix few unit test failure
[ https://issues.apache.org/jira/browse/YARN-11404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-11404: -- Fix Version/s: 3.4.0 > Add junit5 dependency to hadoop-mapreduce-client-app to fix few unit test > failure > - > > Key: YARN-11404 > URL: https://issues.apache.org/jira/browse/YARN-11404 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Susheel Gupta >Assignee: Susheel Gupta >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: > patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app.txt > > > We need to add Junit 5 dependency in > {code:java} > /hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/pom.xml{code} > as the testcase TestAMWebServicesJobConf, TestAMWebServicesJobs, > TestAMWebServices, TestAMWebServicesAttempts, TestAMWebServicesTasks were > passing locally but failed at jenkins build in this > [link|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5119/7/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app.txt] > for YARN-5607 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
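The kind of pom.xml addition described above might look like the following sketch. The coordinates are the standard JUnit 5 (Jupiter) artifacts; in practice the versions and exact set of artifacts would be managed by the Hadoop parent pom, so this fragment is an assumption-laden illustration rather than the actual patch.

```xml
<!-- Test-scoped JUnit 5 dependencies (versions assumed to come from the
     parent pom's dependencyManagement). -->
<dependency>
  <groupId>org.junit.jupiter</groupId>
  <artifactId>junit-jupiter-api</artifactId>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.junit.jupiter</groupId>
  <artifactId>junit-jupiter-engine</artifactId>
  <scope>test</scope>
</dependency>
```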
[jira] [Resolved] (YARN-11404) Add junit5 dependency to hadoop-mapreduce-client-app to fix few unit test failure
[ https://issues.apache.org/jira/browse/YARN-11404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth resolved YARN-11404. --- Hadoop Flags: Reviewed Resolution: Fixed > Add junit5 dependency to hadoop-mapreduce-client-app to fix few unit test > failure > - > > Key: YARN-11404 > URL: https://issues.apache.org/jira/browse/YARN-11404 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Susheel Gupta >Assignee: Susheel Gupta >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: > patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app.txt > > > We need to add Junit 5 dependency in > {code:java} > /hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/pom.xml{code} > as the testcase TestAMWebServicesJobConf, TestAMWebServicesJobs, > TestAMWebServices, TestAMWebServicesAttempts, TestAMWebServicesTasks were > passing locally but failed at jenkins build in this > [link|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5119/7/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app.txt] > for YARN-5607 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11409) Fix Typo of ResourceManager#webapp module
[ https://issues.apache.org/jira/browse/YARN-11409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-11409: -- Summary: Fix Typo of ResourceManager#webapp module (was: Fix Typo of ResourceManager#webapp moudle) > Fix Typo of ResourceManager#webapp module > - > > Key: YARN-11409 > URL: https://issues.apache.org/jira/browse/YARN-11409 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.4.0 >Reporter: Shilun Fan >Assignee: Shilun Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > When finishing YARN-11218, I found some typo problems in RM's RMWebServices. > I checked the java class of the webapp moudle and fixed the typo problems. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11404) Add junit5 dependency to hadoop-mapreduce-client-app to fix few unit test failure
[ https://issues.apache.org/jira/browse/YARN-11404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-11404: -- Attachment: patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app.txt > Add junit5 dependency to hadoop-mapreduce-client-app to fix few unit test > failure > - > > Key: YARN-11404 > URL: https://issues.apache.org/jira/browse/YARN-11404 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Susheel Gupta >Assignee: Susheel Gupta >Priority: Major > Attachments: > patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app.txt > > > We need to add Junit 5 dependency in > {code:java} > /hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/pom.xml{code} > as the testcase TestAMWebServicesJobConf, TestAMWebServicesJobs, > TestAMWebServices, TestAMWebServicesAttempts, TestAMWebServicesTasks were > passing locally but failed at jenkins build in this > [link|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5119/7/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app.txt] > for YARN-5607 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-11415) Refactor TestConfigurationFieldsBase and the connected test classes
[ https://issues.apache.org/jira/browse/YARN-11415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-11415: - Assignee: Szilard Nemeth > Refactor TestConfigurationFieldsBase and the connected test classes > --- > > Key: YARN-11415 > URL: https://issues.apache.org/jira/browse/YARN-11415 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Benjamin Teke >Assignee: Szilard Nemeth >Priority: Major > > YARN-11413 pointed out a strange aspect of how the configuration tests are > executed. The first problem is that there is a > [Pattern|https://github.com/apache/hadoop/blob/570b503e3e7e7adf5b0a8fabca76003298216543/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestConfigurationFieldsBase.java#L197] > that matches only numbers, letters, dots, hyphens and underscores, but not > %, which is used in string replacements (e.g. > {{yarn.nodemanager.aux-services.%s.classpath}}), so essentially every > property that's present in any configuration object and doesn't match this > pattern is silently skipped, and documenting it will result in invalid test > failures; so the test encourages introducing props and not documenting > them. The pattern should be fixed in YARN-11413 for %s, but its necessity > could also be reviewed. > Another issue is that the test works almost opposite to the way it's > supposed to. To ensure all of the configuration entries are documented it > should iterate through all of the configuration fields and check if those > have matching xyz-default.xml entries, but currently it just reports the > entries that are present in the xyz-default.xml and missing in the matching > configuration file. Since this test checks all the configuration objects this > might need some other follow-ups to document the missing properties from > other components if there are any. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
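The skipping problem described above can be reproduced with a small regex sketch. The exact pattern in TestConfigurationFieldsBase may differ; the two patterns below are illustrative reconstructions showing why a character class without `%` never matches a templated property name.

```java
import java.util.regex.Pattern;

public class PropertyPatternSketch {
    // A whitelist of letters, digits, underscores, dots and hyphens: any
    // property name containing '%' fails to match and is silently skipped.
    static final Pattern NARROW = Pattern.compile("[A-Za-z0-9_.\\-]+");

    // The same class with '%' added, as the YARN-11413-style fix would do.
    static final Pattern WITH_PERCENT = Pattern.compile("[A-Za-z0-9_.%\\-]+");

    public static void main(String[] args) {
        String plain = "yarn.nodemanager.resource.memory-mb";
        String templated = "yarn.nodemanager.aux-services.%s.classpath";

        System.out.println(NARROW.matcher(plain).matches());           // true
        System.out.println(NARROW.matcher(templated).matches());       // false: skipped
        System.out.println(WITH_PERCENT.matcher(templated).matches()); // true
    }
}
```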
[jira] [Commented] (YARN-11355) YARN Client Failovers immediately to rm2 but takes ~30000ms to rm3
[ https://issues.apache.org/jira/browse/YARN-11355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17656635#comment-17656635 ] Szilard Nemeth commented on YARN-11355: --- Hi [~vineethNaroju], You have to move this jira to Patch available to trigger Jenkins. Nowadays Github PRs are more welcome. Thanks > YARN Client Failovers immediately to rm2 but takes ~30000ms to rm3 > -- > > Key: YARN-11355 > URL: https://issues.apache.org/jira/browse/YARN-11355 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 3.4.0 >Reporter: Prabhu Joseph >Assignee: Vineeth Naroju >Priority: Major > Attachments: YARN-11355.diff > > > YARN Client Failovers immediately to rm2 but takes ~30000ms to rm3 during > initial retry. > *Repro:* > {code:java} > 1. YARN Cluster with three master nodes rm1,rm2 and rm3 > 2. rm3 is active > 3. yarn node -list or any other yarn client calls takes more than 30 seconds. > {code} > The initial failover to rm2 is immediate but then the failover to rm3 is > after ~30000 ms. The current RetryPolicy does not honor the number of master > nodes. It has to perform at least one immediate failover to every rm. > {code:java} > 2022-10-20 06:37:44,123 INFO client.ConfiguredRMFailoverProxyProvider: > Failing over to rm2 > 2022-10-20 06:37:44,129 INFO retry.RetryInvocationHandler: > java.net.ConnectException: Call From local to remote:8032 failed on > connection exception: java.net.ConnectException: Connection refused; For more > details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking > ApplicationClientProtocolPBClientImpl.getClusterNodes over rm2 after 1 > failover attempts. Trying to failover after sleeping for 21139ms. > {code} > > *Workaround:* > Reduce yarn.resourcemanager.connect.retry-interval.ms from 30000 to something like 100. > This will do immediate failover to rm3, but there will be too many retries > when there is no active resourcemanager. 
> > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
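The workaround quoted in the report can be sketched as a yarn-site.xml fragment. This reflects the trade-off stated above: a short interval makes the failover chain reach rm3 quickly, at the cost of many retries while no ResourceManager is active.

```xml
<!-- Workaround sketch: shorten the connect retry interval (default 30000 ms)
     so client failover cycles through all RMs quickly. -->
<property>
  <name>yarn.resourcemanager.connect.retry-interval.ms</name>
  <value>100</value>
</property>
```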
[jira] [Updated] (YARN-11410) Add default methods for StateMachine
[ https://issues.apache.org/jira/browse/YARN-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-11410: -- Description: YARN-11395 added a new method to the StateMachine interface, which can break compatibility with dependent software; the method should be converted to a default method to prevent this breakage (was: The YARN-11395 created a new method in the StateMachine interface, what can break the compatibility with connected softwares, so the method should be converted to default method, what can prevent this break) > Add default methods for StateMachine > > > Key: YARN-11410 > URL: https://issues.apache.org/jira/browse/YARN-11410 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bence Kosztolnik >Assignee: Bence Kosztolnik >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > YARN-11395 added a new method to the StateMachine interface, which can break > compatibility with dependent software; the method should be converted > to a default method to prevent this breakage -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
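The compatibility argument above can be shown with a minimal sketch. The names below are illustrative, not the real org.apache.hadoop.yarn.state types: adding an abstract method to an interface breaks every existing implementor (with an AbstractMethodError for separately compiled code), while a default method lets old implementations keep working unchanged.

```java
interface StateMachine {
    String getCurrentStateName();

    // New method added as a default: classes compiled against the old
    // interface inherit this body instead of failing to implement it.
    default boolean isStateStackEmpty() {
        return true;
    }
}

// Simulates a third-party implementation written before the new method
// existed; it only implements the original method.
class LegacyStateMachine implements StateMachine {
    public String getCurrentStateName() {
        return "RUNNING";
    }
}

public class DefaultMethodSketch {
    public static void main(String[] args) {
        StateMachine sm = new LegacyStateMachine();
        // The legacy class works with the extended interface unmodified.
        System.out.println(sm.getCurrentStateName() + " " + sm.isStateStackEmpty());
    }
}
```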
[jira] [Updated] (YARN-11395) Resource Manager UI, cluster/appattempt/*, can not present FINAL_SAVING state
[ https://issues.apache.org/jira/browse/YARN-11395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-11395: -- Description: If an attempt is in *FINAL_SAVING* state, the *RMAppAttemptBlock#createAttemptHeadRoomTable* method fails with a conversion error, which results in a {code:java} RFC6265 Cookie values may not contain character: [ ]{code} error in the UI and in the logs as well. RM log: {code:java} ... at java.lang.Thread.run(Thread.java:750) Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.FINAL_SAVING at java.lang.Enum.valueOf(Enum.java:238) at org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.valueOf(YarnApplicationAttemptState.java:27) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createAttemptHeadRoomTable(RMAppAttemptBlock.java:424) at org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:151) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) at org.apache.hadoop.yarn.webapp.View.render(View.java:243) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:216) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.appattempt(RmController.java:62) ... 
63 more 2022-12-05 04:15:33,029 WARN org.eclipse.jetty.server.HttpChannel: /cluster/appattempt/appattempt_1667297151262_0247_01 java.lang.IllegalArgumentException: RFC6265 Cookie values may not contain character: [ ] at org.eclipse.jetty.http.Syntax.requireValidRFC6265CookieValue(Syntax.java:136) ...{code} This bug was introduced by YARN-1345, which also caused a similar error, YARN-4411. In YARN-4411 the enum mapping logic from RMAppAttemptState to YarnApplicationAttemptState was modified like this: if the state is FINAL_SAVING, represent the previous state instead. This error can also occur in the ALLOCATED_SAVING and LAUNCHED_UNMANAGED_SAVING states. So the *createAttemptHeadRoomTable* method should be modified to handle the three states above, just as in YARN-4411. was: If an attempt is in *FINAL_SAVING* state, the *RMAppAttemptBlock#createAttemptHeadRoomTable* method fails with a convert error, what will results a {code:java} RFC6265 Cookie values may not contain character: [ ]{code} error in the UI an in the logs as well. RM log: {code:java} ... 
at java.lang.Thread.run(Thread.java:750) Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.FINAL_SAVING at java.lang.Enum.valueOf(Enum.java:238) at org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.valueOf(YarnApplicationAttemptState.java:27) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createAttemptHeadRoomTable(RMAppAttemptBlock.java:424) at org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:151) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) at org.apache.hadoop.yarn.webapp.View.render(View.java:243) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:216) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.appattempt(RmController.java:62) ... 63 more 2022-12-05 04:15:33,029 WARN org.eclipse.jetty.server.HttpChannel: /cluster/appattempt/appattempt_1667297151262_0247_01 java.lang.IllegalArgumentException: RFC6265 Cookie values may not contain character: [ ] at org.eclipse.jetty.http.Syntax.requireValidRFC6265CookieValue(Syntax.java:136) ...{code} This bug was introduced with the YARN-1345 ticket what also caused a similar error called YARN-4411. In case of the YARN-4411 the enum mapping logic from RMAppAttemptStates to YarnApplicationAttemptStat
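The failure and the YARN-4411-style fix can be reconstructed in a minimal form. The enums below are trimmed stand-ins for RMAppAttemptState and YarnApplicationAttemptState, which do not share the transient *_SAVING values; a blind name-based `Enum.valueOf` conversion therefore throws the IllegalArgumentException seen in the stack trace.

```java
enum RMAppAttemptState {
    RUNNING, ALLOCATED_SAVING, LAUNCHED_UNMANAGED_SAVING, FINAL_SAVING, FINISHED
}

enum YarnApplicationAttemptState { RUNNING, FINISHED }

public class AttemptStateSketch {
    // Map the transient saving states to the state held before saving,
    // instead of calling valueOf directly on a name that has no counterpart.
    static YarnApplicationAttemptState convert(RMAppAttemptState current,
                                               RMAppAttemptState beforeSaving) {
        switch (current) {
            case FINAL_SAVING:
            case ALLOCATED_SAVING:
            case LAUNCHED_UNMANAGED_SAVING:
                return YarnApplicationAttemptState.valueOf(beforeSaving.name());
            default:
                return YarnApplicationAttemptState.valueOf(current.name());
        }
    }

    public static void main(String[] args) {
        // valueOf("FINAL_SAVING") would throw IllegalArgumentException here;
        // the mapping reports the pre-saving state instead.
        System.out.println(convert(RMAppAttemptState.FINAL_SAVING,
                                   RMAppAttemptState.RUNNING));
    }
}
```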
[jira] [Commented] (YARN-10905) Investigate if AbstractCSQueue#configuredNodeLabels vs. QueueCapacities#getExistingNodeLabels holds the same data
[ https://issues.apache.org/jira/browse/YARN-10905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654073#comment-17654073 ] Szilard Nemeth commented on YARN-10905: --- Hi [~pszucs], Thanks for your investigation. I checked the code and your assessment is correct. Feel free to close this ticket. > Investigate if AbstractCSQueue#configuredNodeLabels vs. > QueueCapacities#getExistingNodeLabels holds the same data > - > > Key: YARN-10905 > URL: https://issues.apache.org/jira/browse/YARN-10905 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Szilard Nemeth >Assignee: Peter Szucs >Priority: Minor > > The task is to investigate whether the field > AbstractCSQueue#configuredNodeLabels holds the same data as > QueueCapacities#getExistingNodeLabels. > Obviously, we don't want double-entry bookkeeping so if the data is the same, > we can remove this or that. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-10946) AbstractCSQueue: Create separate class for constructing Queue API objects
[ https://issues.apache.org/jira/browse/YARN-10946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth resolved YARN-10946. --- Hadoop Flags: Reviewed Resolution: Fixed > AbstractCSQueue: Create separate class for constructing Queue API objects > - > > Key: YARN-10946 > URL: https://issues.apache.org/jira/browse/YARN-10946 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Szilard Nemeth >Assignee: Peter Szucs >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > > Relevant methods are: > - > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue#getQueueConfigurations > - > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue#getQueueInfo > - > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue#getQueueStatistics -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10946) AbstractCSQueue: Create separate class for constructing Queue API objects
[ https://issues.apache.org/jira/browse/YARN-10946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-10946: -- Fix Version/s: 3.4.0 > AbstractCSQueue: Create separate class for constructing Queue API objects > - > > Key: YARN-10946 > URL: https://issues.apache.org/jira/browse/YARN-10946 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Szilard Nemeth >Assignee: Peter Szucs >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > > Relevant methods are: > - > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue#getQueueConfigurations > - > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue#getQueueInfo > - > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue#getQueueStatistics -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Reopened] (YARN-10959) Extract common method of two that check if preemption disabled in CSQueuePreemption
[ https://issues.apache.org/jira/browse/YARN-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reopened YARN-10959: --- > Extract common method of two that check if preemption disabled in > CSQueuePreemption > --- > > Key: YARN-10959 > URL: https://issues.apache.org/jira/browse/YARN-10959 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Szilard Nemeth >Assignee: Peter Szucs >Priority: Minor > > This is a follow-up of YARN-10913. > After YARN-10913, we have a class called CSQueuePreemption that has 2 methods > that are very similar to each other: > - isQueueHierarchyPreemptionDisabled > - isIntraQueueHierarchyPreemptionDisabled > The goal is to create one method and use it from those 2, merging the common > logic as much as we can. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-10959) Extract common method of two that check if preemption disabled in CSQueuePreemption
[ https://issues.apache.org/jira/browse/YARN-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth resolved YARN-10959. --- Resolution: Won't Fix > Extract common method of two that check if preemption disabled in > CSQueuePreemption > --- > > Key: YARN-10959 > URL: https://issues.apache.org/jira/browse/YARN-10959 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Szilard Nemeth >Assignee: Peter Szucs >Priority: Minor > > This is a follow-up of YARN-10913. > After YARN-10913, we have a class called CSQueuePreemption that has 2 methods > that are very similar to each other: > - isQueueHierarchyPreemptionDisabled > - isIntraQueueHierarchyPreemptionDisabled > The goal is to create one method and use it from those 2, merging the common > logic as much as we can. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-8262) get_executable in container-executor should provide meaningful error codes
[ https://issues.apache.org/jira/browse/YARN-8262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth resolved YARN-8262. -- Hadoop Flags: Reviewed Resolution: Fixed > get_executable in container-executor should provide meaningful error codes > -- > > Key: YARN-8262 > URL: https://issues.apache.org/jira/browse/YARN-8262 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Susheel Gupta >Priority: Minor > Labels: newbie, pull-request-available, trivial > Fix For: 3.4.0 > > > Currently it calls exit(-1) that makes it difficult to debug without stderr. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8262) get_executable in container-executor should provide meaningful error codes
[ https://issues.apache.org/jira/browse/YARN-8262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-8262: - Fix Version/s: 3.4.0 > get_executable in container-executor should provide meaningful error codes > -- > > Key: YARN-8262 > URL: https://issues.apache.org/jira/browse/YARN-8262 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Susheel Gupta >Priority: Minor > Labels: newbie, pull-request-available, trivial > Fix For: 3.4.0 > > > Currently it calls exit(-1) that makes it difficult to debug without stderr. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11369) Commons.compress throws an IllegalArgumentException with large uids after 1.21
[ https://issues.apache.org/jira/browse/YARN-11369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-11369: -- Fix Version/s: 3.4.0 > Commons.compress throws an IllegalArgumentException with large uids after 1.21 > -- > > Key: YARN-11369 > URL: https://issues.apache.org/jira/browse/YARN-11369 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Benjamin Teke >Assignee: Benjamin Teke >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > Encountering COMPRESS-587 with large uids/gids in > {{hadoop-mapreduce-client-uploader/src/main/java/org/apache/hadoop/mapred/uploader/FrameworkUploader.java}}: > {code:java} > 22/09/13 06:39:05 INFO uploader.FrameworkUploader: Adding > /cs/cloudera/opt/cloudera/cm/lib/plugins/event-publish-7.7.1-shaded.jar > Exception in thread "main" java.lang.IllegalArgumentException: group id > '5049047' is too big ( > 2097151 ). Use STAR or POSIX extensions to overcome > this limit > {code} > A workaround is to specifically set bignumber mode to BIGNUMBER_POSIX or > BIGNUMBER_STAR on the instance. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
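The workaround mentioned above can be sketched with the commons-compress API (this requires org.apache.commons:commons-compress on the classpath; the uid value is taken from the report, the file name is illustrative). `setBigNumberMode` with `BIGNUMBER_POSIX` writes ids above 2097151 (octal 7777777) into PAX extended headers instead of throwing.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;

public class BigUidTarSketch {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (TarArchiveOutputStream tar = new TarArchiveOutputStream(bytes)) {
            // Without this line, an entry with uid/gid > 2097151 fails with
            // IllegalArgumentException: "group id '...' is too big".
            tar.setBigNumberMode(TarArchiveOutputStream.BIGNUMBER_POSIX);

            TarArchiveEntry entry = new TarArchiveEntry("event-publish.jar");
            entry.setUserId(5049047L);
            entry.setGroupId(5049047L);
            tar.putArchiveEntry(entry);
            tar.closeArchiveEntry();
        }
    }
}
```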
[jira] [Assigned] (YARN-10886) Cluster based and parent based max capacity in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-10886: - Assignee: (was: Szilard Nemeth) > Cluster based and parent based max capacity in Capacity Scheduler > - > > Key: YARN-10886 > URL: https://issues.apache.org/jira/browse/YARN-10886 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Szilard Nemeth >Priority: Major > > We want to introduce the percentage modes relative to the cluster, not the > parent, i.e > The property root.users.maximum-capacity will mean one of the following > things: > *Either Parent Percentage:* maximum capacity relative to its parent. If it’s > set to 50, then it means that the capacity is capped with respect to the > parent. This can be covered by the current format, no change there. > *Or Cluster Percentage:* maximum capacity expressed as a percentage of the > overall cluster capacity. This case is the new scenario, for example: > yarn.scheduler.capacity.root.users.max-capacity = c:50% > yarn.scheduler.capacity.root.users.max-capacity = c:50%, c:30% -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-10959) Extract common method of two that check if preemption disabled in CSQueuePreemption
[ https://issues.apache.org/jira/browse/YARN-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-10959: - Assignee: (was: Szilard Nemeth) > Extract common method of two that check if preemption disabled in > CSQueuePreemption > --- > > Key: YARN-10959 > URL: https://issues.apache.org/jira/browse/YARN-10959 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Szilard Nemeth >Priority: Minor > > This is a follow-up of YARN-10913. > After YARN-10913, we have a class called CSQueuePreemption that has 2 methods > that are very similar to each other: > - isQueueHierarchyPreemptionDisabled > - isIntraQueueHierarchyPreemptionDisabled > The goal is to create one method and use it from those 2, merging the common > logic as much as we can. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-10946) AbstractCSQueue: Create separate class for constructing Queue API objects
[ https://issues.apache.org/jira/browse/YARN-10946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-10946: - Assignee: (was: Szilard Nemeth) > AbstractCSQueue: Create separate class for constructing Queue API objects > - > > Key: YARN-10946 > URL: https://issues.apache.org/jira/browse/YARN-10946 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Szilard Nemeth >Priority: Minor > > Relevant methods are: > - > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue#getQueueConfigurations > - > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue#getQueueInfo > - > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue#getQueueStatistics -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-10921) AbstractCSQueue: Node Labels logic is scattered and iteration logic is repeated all over the place
[ https://issues.apache.org/jira/browse/YARN-10921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-10921: - Assignee: (was: Szilard Nemeth) > AbstractCSQueue: Node Labels logic is scattered and iteration logic is > repeated all over the place > -- > > Key: YARN-10921 > URL: https://issues.apache.org/jira/browse/YARN-10921 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Szilard Nemeth >Priority: Minor > > TODO items: > - Check original Node labels epic / jiras? > - Think about ways to improve repetitive iteration on configuredNodeLabels > - Search for: "String label" in code > Code blocks to handle Node labels: > - AbstractCSQueue#setupQueueConfigs > - AbstractCSQueue#getQueueConfigurations > - AbstractCSQueue#accessibleToPartition > - AbstractCSQueue#getNodeLabelsForQueue > - AbstractCSQueue#updateAbsoluteCapacities > - AbstractCSQueue#updateConfigurableResourceRequirement > - CSQueueUtils#loadCapacitiesByLabelsFromConf > - AutoCreatedLeafQueue -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-10926) Test validation after YARN-10504 and YARN-10506: Check if modified test expectations are correct or not
[ https://issues.apache.org/jira/browse/YARN-10926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-10926: - Assignee: (was: Szilard Nemeth) > Test validation after YARN-10504 and YARN-10506: Check if modified test > expectations are correct or not > --- > > Key: YARN-10926 > URL: https://issues.apache.org/jira/browse/YARN-10926 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Szilard Nemeth >Priority: Minor > > YARN-10504 and YARN-10506 modified some test expectations. > The task is to verify if those expectations are correct. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-10920) Create a dedicated class for Node Labels
[ https://issues.apache.org/jira/browse/YARN-10920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth resolved YARN-10920. --- Resolution: Won't Fix Since this is a huge effort and pretty hard, I think there would be very little gain compared to the size of the effort, hence closing this ticket with "Won't fix". > Create a dedicated class for Node Labels > > > Key: YARN-10920 > URL: https://issues.apache.org/jira/browse/YARN-10920 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Szilard Nemeth >Priority: Minor > > In the current codebase, Node labels are simple strings. It's very > error-prone to use Strings as they can contain basically anything. Moreover, > it's easier to keep track of all usages if we have a dedicated class for Node > labels. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-10920) Create a dedicated class for Node Labels
[ https://issues.apache.org/jira/browse/YARN-10920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-10920: - Assignee: (was: Szilard Nemeth) > Create a dedicated class for Node Labels > > > Key: YARN-10920 > URL: https://issues.apache.org/jira/browse/YARN-10920 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Szilard Nemeth >Priority: Minor > > In the current codebase, Node labels are simple strings. It's very > error-prone to use Strings as they can contain basically anything. Moreover, > it's easier to keep track of all usages if we have a dedicated class for Node > labels. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
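The dedicated class described in YARN-10920 could look like the following minimal sketch. The class name `NodeLabel`, the factory method, and the validation rule are all assumptions for illustration; the real design would need to match the label syntax YARN actually accepts.

```java
// Hypothetical sketch of a dedicated Node Label value type (YARN-10920).
// The name, factory, and validation pattern are assumptions, not YARN API.
public final class NodeLabel {
  private final String name;

  private NodeLabel(String name) {
    this.name = name;
  }

  // Factory that rejects the "can contain basically anything" strings
  // the issue complains about. The allowed pattern here is an assumption.
  public static NodeLabel of(String name) {
    if (name == null || name.isEmpty() || !name.matches("[A-Za-z0-9_-]+")) {
      throw new IllegalArgumentException("Invalid node label: " + name);
    }
    return new NodeLabel(name);
  }

  public String getName() {
    return name;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof NodeLabel && ((NodeLabel) o).name.equals(name);
  }

  @Override
  public int hashCode() {
    return name.hashCode();
  }

  @Override
  public String toString() {
    return name;
  }
}
```

With such a type, invalid labels fail fast at construction time instead of propagating as raw strings through the scheduler.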
[jira] [Assigned] (YARN-10905) Investigate if AbstractCSQueue#configuredNodeLabels vs. QueueCapacities#getExistingNodeLabels holds the same data
[ https://issues.apache.org/jira/browse/YARN-10905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-10905: - Assignee: (was: Szilard Nemeth) > Investigate if AbstractCSQueue#configuredNodeLabels vs. > QueueCapacities#getExistingNodeLabels holds the same data > - > > Key: YARN-10905 > URL: https://issues.apache.org/jira/browse/YARN-10905 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Szilard Nemeth >Priority: Minor > > The task is to investigate whether the field > AbstractCSQueue#configuredNodeLabels holds the same data as > QueueCapacities#getExistingNodeLabels. > Obviously, we don't want double-entry bookkeeping, so if the data is the same, > we can remove one or the other. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-10005) Code improvements in MutableCSConfigurationProvider
[ https://issues.apache.org/jira/browse/YARN-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth resolved YARN-10005. --- Hadoop Flags: Reviewed Resolution: Fixed > Code improvements in MutableCSConfigurationProvider > --- > > Key: YARN-10005 > URL: https://issues.apache.org/jira/browse/YARN-10005 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Peter Szucs >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > > * Important: constructKeyValueConfUpdate and all related methods seem like a > separate responsibility: how to convert incoming SchedConfUpdateInfo to > Configuration changes (Configuration object) > * Duplicated code block (9 lines) in init / formatConfigurationInStore methods > * Method "getConfStore" could be package-private -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10005) Code improvements in MutableCSConfigurationProvider
[ https://issues.apache.org/jira/browse/YARN-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-10005: -- Fix Version/s: 3.4.0 > Code improvements in MutableCSConfigurationProvider > --- > > Key: YARN-10005 > URL: https://issues.apache.org/jira/browse/YARN-10005 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Peter Szucs >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > > * Important: constructKeyValueConfUpdate and all related methods seem like a > separate responsibility: how to convert incoming SchedConfUpdateInfo to > Configuration changes (Configuration object) > * Duplicated code block (9 lines) in init / formatConfigurationInStore methods > * Method "getConfStore" could be package-private -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11362) Fix several typos in YARN codebase of misspelled resource
[ https://issues.apache.org/jira/browse/YARN-11362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-11362: -- Labels: newbie newbie++ (was: ) > Fix several typos in YARN codebase of misspelled resource > - > > Key: YARN-11362 > URL: https://issues.apache.org/jira/browse/YARN-11362 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Labels: newbie, newbie++ > > I noticed that in YARN's codebase, there are several occurrences of > resource misspelled as "Resoure". -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11362) Fix several typos in YARN codebase of misspelled resource
[ https://issues.apache.org/jira/browse/YARN-11362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-11362: -- Description: I noticed that in YARN's codebase, there are several occurrences of resource misspelled as "Resoure". > Fix several typos in YARN codebase of misspelled resource > - > > Key: YARN-11362 > URL: https://issues.apache.org/jira/browse/YARN-11362 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > > I noticed that in YARN's codebase, there are several occurrences of > resource misspelled as "Resoure". -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-11362) Fix several typos in YARN codebase of misspelled resource
Szilard Nemeth created YARN-11362: - Summary: Fix several typos in YARN codebase of misspelled resource Key: YARN-11362 URL: https://issues.apache.org/jira/browse/YARN-11362 Project: Hadoop YARN Issue Type: Improvement Reporter: Szilard Nemeth Assignee: Szilard Nemeth -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9361) Write testcase for FSLeafQueue that explicitly checks if non-zero AM-share values are not overwritten for custom resources
[ https://issues.apache.org/jira/browse/YARN-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-9361: - Component/s: fairscheduler > Write testcase for FSLeafQueue that explicitly checks if non-zero AM-share > values are not overwritten for custom resources > -- > > Key: YARN-9361 > URL: https://issues.apache.org/jira/browse/YARN-9361 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Szilard Nemeth >Priority: Major > > This is a follow-up for YARN-9323, covering changes regarding explicit zero > value check that has been discussed with [~templedf] earlier. > YARN-9323 fixed a bug in FSLeafQueue#computeMaxAMResource, so that custom > resource values are also set to the AM share. > We need a new test in TestFSLeafQueue that explicitly checks if the custom > resource value is only being set if the fairshare for that resource is zero. > This way, we can make sure we don't overwrite any meaningful resource value. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9457) Integrate custom resource metrics better for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-9457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-9457: - Component/s: fairscheduler > Integrate custom resource metrics better for FairScheduler > -- > > Key: YARN-9457 > URL: https://issues.apache.org/jira/browse/YARN-9457 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Szilard Nemeth >Priority: Major > > YARN-8842 added > org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetricsForCustomResources. > This class stores all metrics data for custom resource types. > QueueMetrics has a field to hold an object of this class. > Similarly, YARN-9322 added FSQueueMetricsForCustomResources and added an > object of this class to FSQueueMetrics. > This jira is to investigate how to integrate > QueueMetricsForCustomResources into QueueMetrics and > FSQueueMetricsForCustomResources into FSQueueMetrics. > The trick is that the Metrics annotation > (org.apache.hadoop.metrics2.annotation.Metric) is used to expose values on > JMX. > We need to implement a mechanism where the QueueMetrics / FSQueueMetrics classes > contain a field for the custom resource values, which is a map with resource > names as keys and longs as values. > This way, we don't need the new classes (QueueMetricsForCustomResources and > FSQueueMetricsForCustomResources), and the code could be much cleaner and more > consistent. > The hardest part is possibly finding a way to expose metrics values from a > map. We obviously can't use the Metrics annotation, so a mechanism is required > to expose the values on JMX. > From a quick search, I haven't found any mechanism like this in the code. > [~wilfreds]: Are you aware of any way to expose values like this? > Most probably, we need to check how the Metrics annotation is processed, > understand the whole flow and check what is the underlying mechanism of the > metrics propagation to the JMX interface. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9352) Multiple versions of createSchedulingRequest in FairSchedulerTestBase could be cleaned up
[ https://issues.apache.org/jira/browse/YARN-9352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-9352: - Component/s: fairscheduler > Multiple versions of createSchedulingRequest in FairSchedulerTestBase could > be cleaned up > - > > Key: YARN-9352 > URL: https://issues.apache.org/jira/browse/YARN-9352 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Szilard Nemeth >Assignee: Siddharth Ahuja >Priority: Minor > Labels: newbie, newbie++, trivial > > createSchedulingRequest in FairSchedulerTestBase is overloaded many times. > This could be made cleaner if we introduced a builder instead of calling the > various forms of this method. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
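The builder proposed in YARN-9352 could be sketched as below. The field names and defaults are assumptions based on the typical createSchedulingRequest parameters (memory, vcores, queue, user, priority); here build() just renders the parameters rather than submitting a real request.

```java
// Hypothetical sketch of a builder replacing the createSchedulingRequest
// overloads in FairSchedulerTestBase. Names and defaults are assumptions.
public final class SchedulingRequestBuilder {
  private int memory = 1024;
  private int vCores = 1;
  private String queue = "default";
  private String user = "user";
  private int priority = 1;

  public SchedulingRequestBuilder memory(int memory) {
    this.memory = memory;
    return this;
  }

  public SchedulingRequestBuilder vCores(int vCores) {
    this.vCores = vCores;
    return this;
  }

  public SchedulingRequestBuilder queue(String queue) {
    this.queue = queue;
    return this;
  }

  public SchedulingRequestBuilder user(String user) {
    this.user = user;
    return this;
  }

  public SchedulingRequestBuilder priority(int priority) {
    this.priority = priority;
    return this;
  }

  // In the real test base this would create and submit the request;
  // this sketch just renders the collected parameters.
  public String build() {
    return String.format("queue=%s user=%s mem=%d vcores=%d prio=%d",
        queue, user, memory, vCores, priority);
  }
}
```

Callers would then write something like `new SchedulingRequestBuilder().memory(2048).queue("root.q1").build()`, setting only the parameters a given test cares about.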
[jira] [Updated] (YARN-7239) Possible launch/cleanup race condition in ContainersLauncher
[ https://issues.apache.org/jira/browse/YARN-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-7239: - Component/s: nodemanager > Possible launch/cleanup race condition in ContainersLauncher > > > Key: YARN-7239 > URL: https://issues.apache.org/jira/browse/YARN-7239 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Miklos Szegedi >Priority: Major > Labels: newbie > > ContainersLauncher.handle() submits the launch job and then adds the job into > the collection risking that the cleanup will miss it and return. This should > be in reversed order in all 3 instances: > {code} > containerLauncher.submit(launch); > running.put(containerId, launch); > {code} > The cleanup code that the above code is racing with: > {code} > ContainerLaunch runningContainer = running.get(containerId); > if (runningContainer == null) { > // Container not launched. So nothing needs to be done. > LOG.info("Container " + containerId + " not running, nothing to > signal."); > return; > } > ... > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
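The reordering suggested in YARN-7239 can be sketched as follows. The types are simplified stand-ins for the real ContainersLauncher/ContainerLaunch classes (names and structure are assumptions): the launch is registered in the running map before submission, so a concurrent cleanup can always find it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Minimal sketch of the ordering fix: put the launch into the running map
// BEFORE submitting it to the executor. Simplified stand-in types only.
public class LaunchOrderSketch {
  private final Map<String, Runnable> running = new ConcurrentHashMap<>();
  private final ExecutorService containerLauncher =
      Executors.newSingleThreadExecutor();

  public void handleLaunch(String containerId, Runnable launch) {
    // Reversed order relative to the buggy code: register first, then submit.
    running.put(containerId, launch);
    containerLauncher.submit(launch);
  }

  public boolean signalCleanup(String containerId) {
    Runnable runningContainer = running.get(containerId);
    if (runningContainer == null) {
      // Container not launched, so nothing needs to be done.
      return false;
    }
    running.remove(containerId);
    return true;
  }

  public void shutdown() {
    containerLauncher.shutdownNow();
  }
}
```

With the registration done first, the cleanup path can no longer observe a submitted-but-unregistered launch, which is exactly the window the issue describes.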
[jira] [Updated] (YARN-6474) CGroupsHandlerImpl.java has a few checkstyle issues left to be fixed after YARN-5301
[ https://issues.apache.org/jira/browse/YARN-6474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-6474: - Component/s: nodemanager > CGroupsHandlerImpl.java has a few checkstyle issues left to be fixed after > YARN-5301 > > > Key: YARN-6474 > URL: https://issues.apache.org/jira/browse/YARN-6474 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Miklos Szegedi >Priority: Minor > Labels: newbie, trivial > > The main issue is a throw inside a finally block -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6525) Linux container executor should not propagate application errors
[ https://issues.apache.org/jira/browse/YARN-6525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-6525: - Component/s: LCE > Linux container executor should not propagate application errors > > > Key: YARN-6525 > URL: https://issues.apache.org/jira/browse/YARN-6525 > Project: Hadoop YARN > Issue Type: Bug > Components: LCE >Affects Versions: 3.0.0-alpha2 >Reporter: Miklos Szegedi >Priority: Major > Labels: newbie > > wait_and_get_exit_code currently returns the application error code as the LCE > error code. This may overlap with LCE errors. Instead, LCE should return a > fixed "application failed" error code. It should print the application error > into the logs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10680) Revisit try blocks without catch blocks but having finally blocks
[ https://issues.apache.org/jira/browse/YARN-10680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-10680: -- Fix Version/s: 3.4.0 > Revisit try blocks without catch blocks but having finally blocks > - > > Key: YARN-10680 > URL: https://issues.apache.org/jira/browse/YARN-10680 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: Susheel Gupta >Priority: Minor > Labels: newbie, pull-request-available, trivial > Fix For: 3.4.0 > > Attachments: YARN-10860.001.patch > > > This jira is to revisit all try blocks without catch blocks but having > finally blocks in SLS. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-10680) Revisit try blocks without catch blocks but having finally blocks
[ https://issues.apache.org/jira/browse/YARN-10680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth resolved YARN-10680. --- Hadoop Flags: Reviewed Resolution: Fixed > Revisit try blocks without catch blocks but having finally blocks > - > > Key: YARN-10680 > URL: https://issues.apache.org/jira/browse/YARN-10680 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: Susheel Gupta >Priority: Minor > Labels: newbie, pull-request-available, trivial > Fix For: 3.4.0 > > Attachments: YARN-10860.001.patch > > > This jira is to revisit all try blocks without catch blocks but having > finally blocks in SLS. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
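The pattern YARN-10680 asks to revisit can be illustrated with a minimal sketch: a try block with no catch whose finally only releases a resource can usually be rewritten as try-with-resources. The `Tracker` type below is a hypothetical stand-in so the example stays self-contained.

```java
// Sketch of the try/finally-without-catch pattern and its cleaner form.
// Tracker is a hypothetical AutoCloseable used only for illustration.
public class TryFinallySketch {
  public static final class Tracker implements AutoCloseable {
    public boolean closed = false;

    public int value() {
      return 42;
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  // Before: try with no catch, only a finally doing cleanup.
  public static int readVerbose(Tracker t) {
    try {
      return t.value();
    } finally {
      t.close();
    }
  }

  // After: try-with-resources expresses the same cleanup declaratively
  // and stays correct even if value() were to throw.
  public static int read(Tracker t) {
    try (Tracker resource = t) {
      return resource.value();
    }
  }
}
```

Not every finally block can be converted (some do work other than closing a resource), which is why the issue calls for revisiting them case by case.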
[jira] [Assigned] (YARN-4944) Handle lack of ResourceCalculatorPlugin gracefully
[ https://issues.apache.org/jira/browse/YARN-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-4944: Assignee: Susheel Gupta > Handle lack of ResourceCalculatorPlugin gracefully > -- > > Key: YARN-4944 > URL: https://issues.apache.org/jira/browse/YARN-4944 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Susheel Gupta >Priority: Major > Labels: newbie++, trivial > > On some systems (e.g. mac), the NM might not be able to instantiate a > ResourceCalculatorPlugin, which leads to logging a bunch of error messages. We > could improve the way we handle this. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-5607) Document TestContainerResourceUsage#waitForContainerCompletion
[ https://issues.apache.org/jira/browse/YARN-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-5607: Assignee: Susheel Gupta (was: Gergely Pollák) > Document TestContainerResourceUsage#waitForContainerCompletion > -- > > Key: YARN-5607 > URL: https://issues.apache.org/jira/browse/YARN-5607 > Project: Hadoop YARN > Issue Type: Test > Components: resourcemanager, test >Affects Versions: 2.9.0 >Reporter: Karthik Kambatla >Assignee: Susheel Gupta >Priority: Major > Labels: newbie > > The logic in TestContainerResourceUsage#waitForContainerCompletion > (introduced in YARN-5024) is not immediately obvious. It could use some > documentation. Also, this seems like a useful helper method. Should this be > moved to one of the mock classes or to a util class? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6412) aux-services classpath not documented
[ https://issues.apache.org/jira/browse/YARN-6412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-6412: Assignee: Riya Khandelwal (was: Siddharth Ahuja) > aux-services classpath not documented > - > > Key: YARN-6412 > URL: https://issues.apache.org/jira/browse/YARN-6412 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Riya Khandelwal >Priority: Minor > Labels: docuentation, newbie > > YARN-4577 introduced two new configuration entries > yarn.nodemanager.aux-services.%s.classpath and > yarn.nodemanager.aux-services.%s.system-classes. These are not documented in > hadoop-yarn-common/.../yarn-default.xml -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6971) Clean up different ways to create resources
[ https://issues.apache.org/jira/browse/YARN-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-6971: Assignee: Riya Khandelwal > Clean up different ways to create resources > --- > > Key: YARN-6971 > URL: https://issues.apache.org/jira/browse/YARN-6971 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Reporter: Yufei Gu >Assignee: Riya Khandelwal >Priority: Minor > Labels: newbie > > There are several ways to create a {{resource}} object, e.g., > BuilderUtils.newResource() and Resources.createResource(). These methods not > only cause confusion but also performance issues; for example, > BuilderUtils.newResource() is significantly slower than > Resources.createResource(). > We could merge them somehow, and replace most BuilderUtils.newResource() calls > with Resources.createResource(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
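One way the consolidation in YARN-6971 could look is a single canonical factory with the legacy entry point delegating to it. The `Resource` type below is a simplified stand-in, not the real YARN class, and the delegation shape is an assumption about how the merge might be done.

```java
// Hypothetical sketch of merging the two resource factories: keep one
// canonical, lightweight factory and make the legacy one delegate to it.
public class ResourceFactorySketch {
  public static final class Resource {
    public final long memory;
    public final int vCores;

    Resource(long memory, int vCores) {
      this.memory = memory;
      this.vCores = vCores;
    }
  }

  // Canonical factory (analogous to Resources.createResource()).
  public static Resource createResource(long memory, int vCores) {
    return new Resource(memory, vCores);
  }

  // Legacy entry point (analogous to BuilderUtils.newResource()), kept
  // for source compatibility but now just delegating.
  @Deprecated
  public static Resource newResource(long memory, int vCores) {
    return createResource(memory, vCores);
  }
}
```

Delegating rather than deleting the old method lets existing callers migrate gradually while removing the duplicated (and slower) construction path.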
[jira] [Resolved] (YARN-6766) Add helper method in FairSchedulerAppsBlock to print app info
[ https://issues.apache.org/jira/browse/YARN-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth resolved YARN-6766. -- Hadoop Flags: Reviewed Resolution: Fixed > Add helper method in FairSchedulerAppsBlock to print app info > - > > Key: YARN-6766 > URL: https://issues.apache.org/jira/browse/YARN-6766 > Project: Hadoop YARN > Issue Type: Improvement > Components: webapp >Affects Versions: 2.8.1, 3.0.0-alpha3 >Reporter: Daniel Templeton >Assignee: Riya Khandelwal >Priority: Minor > Labels: newbie, pull-request-available, trivial > Fix For: 3.4.0 > > > The various {{*AppsBlock}} classes are riddled with statements like: > {code}.append(appInfo.getReservedVCores() == -1 ? "N/A" : > String.valueOf(appInfo.getReservedVCores())){code} > The code would be much cleaner if there were a utility method for that > operation, e.g.: > {code}.append(printData(appInfo.getReservedCores())){code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6766) Add helper method in FairSchedulerAppsBlock to print app info
[ https://issues.apache.org/jira/browse/YARN-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-6766: - Fix Version/s: 3.4.0 > Add helper method in FairSchedulerAppsBlock to print app info > - > > Key: YARN-6766 > URL: https://issues.apache.org/jira/browse/YARN-6766 > Project: Hadoop YARN > Issue Type: Improvement > Components: webapp >Affects Versions: 2.8.1, 3.0.0-alpha3 >Reporter: Daniel Templeton >Assignee: Riya Khandelwal >Priority: Minor > Labels: newbie, pull-request-available, trivial > Fix For: 3.4.0 > > > The various {{*AppsBlock}} classes are riddled with statements like: > {code}.append(appInfo.getReservedVCores() == -1 ? "N/A" : > String.valueOf(appInfo.getReservedVCores())){code} > The code would be much cleaner if there were a utility method for that > operation, e.g.: > {code}.append(printData(appInfo.getReservedCores())){code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
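The helper suggested in YARN-6766 is small enough to sketch directly. The method name `printData` comes from the issue text; the containing class name is a hypothetical placeholder.

```java
// Sketch of the YARN-6766 helper: one place to map the -1 sentinel to
// "N/A". The class name AppsBlockUtil is a placeholder, not YARN code.
public class AppsBlockUtil {
  public static String printData(long value) {
    return value == -1 ? "N/A" : String.valueOf(value);
  }
}
```

The repeated ternary in the `*AppsBlock` classes then collapses to a call like `.append(AppsBlockUtil.printData(appInfo.getReservedVCores()))`.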