[jira] [Commented] (YARN-9554) TimelineEntity DAO has java.util.Set interface which JAXB can't handle

2021-12-13 Thread Jira


[ 
https://issues.apache.org/jira/browse/YARN-9554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458788#comment-17458788
 ] 

László Bodor commented on YARN-9554:


While I'm trying to upgrade Tez to Hadoop 3.3.1, a [unit 
test|https://github.com/apache/tez/blob/master/tez-plugins/tez-yarn-timeline-history-with-acls/src/test/java/org/apache/tez/dag/history/ats/acls/TestATSHistoryWithACLs.java]
 throws an exception which was introduced by this patch:
{code}
SEVERE: Failed to generate the schema for the JAX-B elements
javax.xml.bind.JAXBException: TimelineEntity and TimelineEntities has 
IllegalAnnotation
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.ContextFactory.createContext(ContextFactory.java)
{code}
I'm a bit confused: what's the expected way of making the unit test work again? 
The exception is thrown when the test tries to fetch a timeline entity from AHS 
(1.0):
{code}
  private <K> K getTimelineData(String url, Class<K> clazz) {
    Client client = new Client();
    WebResource resource = client.resource(url);

    ClientResponse response = resource.accept(MediaType.APPLICATION_JSON)
        .get(ClientResponse.class);
    assertEquals(200, response.getStatus());

    assertTrue(MediaType.APPLICATION_JSON_TYPE.isCompatible(response.getType()));

    // fails at this point; clazz is TimelineEntity.class
    K entity = response.getEntity(clazz);
    assertNotNull(entity);
    return entity;
  }

{code}
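
For reference, here is a minimal, self-contained sketch (my own illustration, 
not the actual YARN patch) of the JAXB limitation behind the error: JAXBContext 
creation fails when a DAO property is typed as the java.util.Set interface, and 
the usual workaround is to expose a concrete collection type instead:
{code:java}
import java.util.HashSet;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
public class ExampleEntity {
  private HashSet<String> tags = new HashSet<>();

  // If this property were declared as java.util.Set, JAXBContext creation
  // would fail with "java.util.Set is an interface, and JAXB can't handle
  // interfaces". Exposing a concrete type such as HashSet avoids that.
  @XmlElement
  public HashSet<String> getTags() {
    return tags;
  }

  public void setTags(HashSet<String> tags) {
    this.tags = tags;
  }

  public static void main(String[] args) throws Exception {
    // Succeeds because only concrete collection types are visible to JAXB.
    JAXBContext.newInstance(ExampleEntity.class);
  }
}
{code}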

> TimelineEntity DAO has java.util.Set interface which JAXB can't handle
> --
>
> Key: YARN-9554
> URL: https://issues.apache.org/jira/browse/YARN-9554
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineservice
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9554-001.patch, YARN-9554-002.patch
>
>
> TimelineEntity DAO has java.util.Set interface which JAXB can't handle. This 
> breaks the fix of YARN-7266.
> {code}
> Caused by: com.sun.xml.internal.bind.v2.runtime.IllegalAnnotationsException: 
> 1 counts of IllegalAnnotationExceptions
> java.util.Set is an interface, and JAXB can't handle interfaces.
>   this problem is related to the following location:
>   at java.util.Set
>   at public java.util.HashMap 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getPrimaryFiltersJAXB()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities
>   at 
> com.sun.xml.internal.bind.v2.runtime.IllegalAnnotationsException$Builder.check(IllegalAnnotationsException.java:91)
>   at 
> com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.getTypeInfoSet(JAXBContextImpl.java:445)
>   at 
> com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.(JAXBContextImpl.java:277)
>   at 
> com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.(JAXBContextImpl.java:124)
>   at 
> com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1123)
>   at 
> com.sun.xml.internal.bind.v2.ContextFactory.createContext(ContextFactory.java:147)
> {code}






[jira] [Updated] (YARN-10907) Minimize usages of AbstractCSQueue#csContext

2021-12-13 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10907:
--
Fix Version/s: 3.4.0

> Minimize usages of AbstractCSQueue#csContext
> 
>
> Key: YARN-10907
> URL: https://issues.apache.org/jira/browse/YARN-10907
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Szilard Nemeth
>Assignee: Benjamin Teke
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> Context objects can be a sign of a code smell, as they can contain many 
> possibly loosely related references to other objects.
> CapacitySchedulerContext seems to be such a class.
> This task is to investigate how the field AbstractCSQueue#csContext is being 
> used from this class, and to keep the usage of this context class to the 
> bare minimum. 
> Related article: https://wiki.c2.com/?ContextObjectsAreEvil






[jira] [Assigned] (YARN-9450) TestCapacityOverTimePolicy#testAllocation fails sporadically

2021-12-13 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth reassigned YARN-9450:


Assignee: Szilard Nemeth  (was: Prabhu Joseph)

> TestCapacityOverTimePolicy#testAllocation fails sporadically
> 
>
> Key: YARN-9450
> URL: https://issues.apache.org/jira/browse/YARN-9450
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, test
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Szilard Nemeth
>Priority: Major
>
> TestCapacityOverTimePolicy#testAllocation fails sporadically. Observed in 
> multiple builds run for YARN-9447, YARN-8193, and YARN-8051.
> {code}
> Failed
> org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy.testAllocation[Duration
>  90,000,000, height 0.25, numSubmission 1, periodic 8640)]
> Failing for the past 1 build (Since Failed#23900 )
> Took 34 ms.
> Stacktrace
> junit.framework.AssertionFailedError
>   at junit.framework.Assert.fail(Assert.java:55)
>   at junit.framework.Assert.fail(Assert.java:64)
>   at junit.framework.TestCase.fail(TestCase.java:235)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.BaseSharingPolicyTest.runTest(BaseSharingPolicyTest.java:146)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy.testAllocation(TestCapacityOverTimePolicy.java:136)
>   at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runners.Suite.runChild(Suite.java:128)
>   at org.junit.runners.Suite.runChild(Suite.java:27)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> Standard Output
> 2019-04-05 23:46:19,022 INFO  [main] recovery.RMStateStore 
> (RMStateStore.java:transition(591)) - Storing reservation 
> allocation.reservation_-4277767163553399219_8391370105871519867
> 2019-04-05 23:46:19,022 INFO  [main] recovery.RMStateStore 
> (MemoryRMStateStore.java:storeReservationState(258)) - Storing 
> reservationallocation for 
> reservation_-4277767163553399219_8391370105871519867 for plan dedicated
> 2019-04-05 23:46:19,023 INFO  [main] reservation.InMemoryPlan 
> (InMemoryPlan.java:addReservation(373)) - Successfully added reservation: 
> 

[jira] [Assigned] (YARN-7548) TestCapacityOverTimePolicy.testAllocation is flaky

2021-12-13 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth reassigned YARN-7548:


Assignee: Szilard Nemeth

> TestCapacityOverTimePolicy.testAllocation is flaky
> --
>
> Key: YARN-7548
> URL: https://issues.apache.org/jira/browse/YARN-7548
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: reservation system
>Affects Versions: 3.0.0-beta1
>Reporter: Haibo Chen
>Assignee: Szilard Nemeth
>Priority: Major
>
> It failed in the Jenkins jobs of both YARN-7337 and YARN-6921.
> org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy.testAllocation[Duration
>  90,000,000, height 0.25, numSubmission 1, periodic 8640)]
> *Stacktrace*
> {code:java}
> junit.framework.AssertionFailedError: null
>  at junit.framework.Assert.fail(Assert.java:55)
>  at junit.framework.Assert.fail(Assert.java:64)
>  at junit.framework.TestCase.fail(TestCase.java:235)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.BaseSharingPolicyTest.runTest(BaseSharingPolicyTest.java:146)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy.testAllocation(TestCapacityOverTimePolicy.java:136){code}
> *Standard Output*
> {code:java}
> 2017-11-20 23:57:03,759 INFO [main] recovery.RMStateStore 
> (RMStateStore.java:transition(538)) - Storing reservation 
> allocation.reservation_-9026698577416205920_6337917439559340517
>  2017-11-20 23:57:03,759 INFO [main] recovery.RMStateStore 
> (MemoryRMStateStore.java:storeReservationState(247)) - Storing 
> reservationallocation for 
> reservation_-9026698577416205920_6337917439559340517 for plan dedicated
>  2017-11-20 23:57:03,760 INFO [main] reservation.InMemoryPlan 
> (InMemoryPlan.java:addReservation(373)) - Successfully added reservation: 
> reservation_-9026698577416205920_6337917439559340517 to plan.
>  In-memory Plan: Parent Queue: dedicated Total Capacity: <... vCores:1000> 
> Step: 1000 reservation_-9026698577416205920_6337917439559340517 
> user:u1 startTime: 0 endTime: 8640 Periodicity: 8640 alloc:
>  [Period: 8640
>  0: 
>  3423748: 
>  86223748: 
>  8640: 
>  9223372036854775807: null
>  ]{code}






[jira] [Updated] (YARN-11045) ATSv2 storage monitor fails to read from hbase cluster

2021-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-11045:
--
Labels: pull-request-available  (was: )

> ATSv2 storage monitor fails to read from hbase cluster
> --
>
> Key: YARN-11045
> URL: https://issues.apache.org/jira/browse/YARN-11045
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The HBase-compatible Guava dependency is a bit messed up, i.e. the 
> timelineservice-hbase modules are still being built with Hadoop's Guava 
> version (defined in hadoop-project), and this creates issues with 
> HBaseStorageMonitor reading records from the HBase cluster:
> {code:java}
> java.lang.RuntimeException: org.apache.hadoop.hbase.DoNotRetryIOException: 
> java.lang.NoSuchMethodError: 
> com.google.common.net.HostAndPort.getHostText()Ljava/lang/String;
>         at 
> org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:95)
>         at 
> org.apache.hadoop.yarn.server.timelineservice.storage.reader.TimelineEntityReader.readEntities(TimelineEntityReader.java:283)
>         at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseStorageMonitor.healthCheck(HBaseStorageMonitor.java:77)
>         at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineStorageMonitor$MonitorThread.run(TimelineStorageMonitor.java:89)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: 
> java.lang.NoSuchMethodError: 
> com.google.common.net.HostAndPort.getHostText()Ljava/lang/String;
>         at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.translateException(RpcRetryingCaller.java:260)
>         at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:233)
>         at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:394)
>         at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:368)
>         at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:143)
>         at 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
>         ... 3 more
> Caused by: java.lang.NoSuchMethodError: 
> com.google.common.net.HostAndPort.getHostText()Ljava/lang/String;
>         at org.apache.hadoop.hbase.net.Address.getHostName(Address.java:72)
>         at 
> org.apache.hadoop.hbase.net.Address.toSocketAddress(Address.java:57)
>         at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:576)
>         at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:37250)
>         at 
> org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:405)
>         at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:274)
>         at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
>         at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:231)
>  {code}
>  
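
For background on the NoSuchMethodError above, a small sketch (my summary, not 
from the jira) of the Guava API change involved: HostAndPort#getHostText() was 
deprecated and later removed in newer Guava releases in favour of getHost(), so 
HBase code compiled against the old method fails at runtime once Hadoop's newer 
Guava wins on the classpath:
{code:java}
import com.google.common.net.HostAndPort;

public class GuavaHostAndPortCheck {
  public static void main(String[] args) {
    HostAndPort hp = HostAndPort.fromString("hadoop12:36831");

    // Newer Guava (the version pinned in hadoop-project) only has getHost():
    System.out.println(hp.getHost());

    // Older Guava exposed getHostText() instead; bytecode compiled against it
    // throws NoSuchMethodError when a newer Guava is on the classpath:
    // System.out.println(hp.getHostText());
  }
}
{code}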






[jira] [Updated] (YARN-11045) ATSv2 storage monitor fails to read from hbase cluster

2021-12-13 Thread Viraj Jasani (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Jasani updated YARN-11045:

Summary: ATSv2 storage monitor fails to read from hbase cluster  (was: 
ATSv2 storage monitor fails to read from HBase)

> ATSv2 storage monitor fails to read from hbase cluster
> --
>
> Key: YARN-11045
> URL: https://issues.apache.org/jira/browse/YARN-11045
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>
> The HBase-compatible Guava dependency is a bit messed up, i.e. the 
> timelineservice-hbase modules are still being built with Hadoop's Guava 
> version (defined in hadoop-project), and this creates issues with 
> HBaseStorageMonitor reading records from the HBase cluster:
> {code:java}
> java.lang.RuntimeException: org.apache.hadoop.hbase.DoNotRetryIOException: 
> java.lang.NoSuchMethodError: 
> com.google.common.net.HostAndPort.getHostText()Ljava/lang/String;
>         at 
> org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:95)
>         at 
> org.apache.hadoop.yarn.server.timelineservice.storage.reader.TimelineEntityReader.readEntities(TimelineEntityReader.java:283)
>         at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseStorageMonitor.healthCheck(HBaseStorageMonitor.java:77)
>         at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineStorageMonitor$MonitorThread.run(TimelineStorageMonitor.java:89)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: 
> java.lang.NoSuchMethodError: 
> com.google.common.net.HostAndPort.getHostText()Ljava/lang/String;
>         at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.translateException(RpcRetryingCaller.java:260)
>         at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:233)
>         at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:394)
>         at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:368)
>         at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:143)
>         at 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
>         ... 3 more
> Caused by: java.lang.NoSuchMethodError: 
> com.google.common.net.HostAndPort.getHostText()Ljava/lang/String;
>         at org.apache.hadoop.hbase.net.Address.getHostName(Address.java:72)
>         at 
> org.apache.hadoop.hbase.net.Address.toSocketAddress(Address.java:57)
>         at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:576)
>         at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:37250)
>         at 
> org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:405)
>         at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:274)
>         at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
>         at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:231)
>  {code}
>  






[jira] [Created] (YARN-11045) ATSv2 storage monitor fails to read from HBase

2021-12-13 Thread Viraj Jasani (Jira)
Viraj Jasani created YARN-11045:
---

 Summary: ATSv2 storage monitor fails to read from HBase
 Key: YARN-11045
 URL: https://issues.apache.org/jira/browse/YARN-11045
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.4.0
Reporter: Viraj Jasani
Assignee: Viraj Jasani


The HBase-compatible Guava dependency is a bit messed up, i.e. the 
timelineservice-hbase modules are still being built with Hadoop's Guava 
version (defined in hadoop-project), and this creates issues with 
HBaseStorageMonitor reading records from the HBase cluster:
{code:java}
java.lang.RuntimeException: org.apache.hadoop.hbase.DoNotRetryIOException: 
java.lang.NoSuchMethodError: 
com.google.common.net.HostAndPort.getHostText()Ljava/lang/String;
        at 
org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:95)
        at 
org.apache.hadoop.yarn.server.timelineservice.storage.reader.TimelineEntityReader.readEntities(TimelineEntityReader.java:283)
        at 
org.apache.hadoop.yarn.server.timelineservice.storage.HBaseStorageMonitor.healthCheck(HBaseStorageMonitor.java:77)
        at 
org.apache.hadoop.yarn.server.timelineservice.storage.TimelineStorageMonitor$MonitorThread.run(TimelineStorageMonitor.java:89)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: 
java.lang.NoSuchMethodError: 
com.google.common.net.HostAndPort.getHostText()Ljava/lang/String;
        at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.translateException(RpcRetryingCaller.java:260)
        at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:233)
        at 
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:394)
        at 
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:368)
        at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:143)
        at 
org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
        ... 3 more
Caused by: java.lang.NoSuchMethodError: 
com.google.common.net.HostAndPort.getHostText()Ljava/lang/String;
        at org.apache.hadoop.hbase.net.Address.getHostName(Address.java:72)
        at org.apache.hadoop.hbase.net.Address.toSocketAddress(Address.java:57)
        at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:576)
        at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:37250)
        at 
org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:405)
        at 
org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:274)
        at 
org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
        at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:231)
 {code}
 






[jira] [Updated] (YARN-11044) TestApplicationLimits.testLimitsComputation() has some uneffective asserts

2021-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-11044:
--
Labels: newbie pull-request-available  (was: newbie)

> TestApplicationLimits.testLimitsComputation() has some uneffective asserts
> --
>
> Key: YARN-11044
> URL: https://issues.apache.org/jira/browse/YARN-11044
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
>  Labels: newbie, pull-request-available
>
> TestApplicationLimits.testLimitsComputation() has the following two asserts:
> {code:java}
> // should default to global setting if per queue setting not set
> assertEquals((long)CapacitySchedulerConfiguration.DEFAULT_MAXIMUM_APPLICATIONMASTERS_RESOURCE_PERCENT,
> (long)csConf.getMaximumApplicationMasterResourcePerQueuePercent(
> queue.getQueuePath()));
> {code}
> and
> {code:java}
> assertEquals((long) 0.5,
> (long) csConf.getMaximumApplicationMasterResourcePerQueuePercent(
>   queue.getQueuePath()));
> {code}
> In their current form, neither of them makes much sense: 
> getMaximumApplicationMasterResourcePerQueuePercent returns a float (between 0 
> and 1.0), and casting it to long truncates every value below 1.0 to 0, so the 
> only way these asserts will fail is when the configuration is below 0 or above 
> 1, but we're not testing invalid configurations here. This should be corrected.
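
As an illustration, a hypothetical corrected form of the two asserts (my 
sketch, not the actual fix): compare the float values with a delta instead of 
casting them to long:
{code:java}
// Hypothetical fix: assert on the float values with a tolerance, instead of
// casting to long (which truncates every value below 1.0 to 0).
assertEquals(
    CapacitySchedulerConfiguration.DEFAULT_MAXIMUM_APPLICATIONMASTERS_RESOURCE_PERCENT,
    csConf.getMaximumApplicationMasterResourcePerQueuePercent(queue.getQueuePath()),
    1e-6f);

assertEquals(0.5f,
    csConf.getMaximumApplicationMasterResourcePerQueuePercent(queue.getQueuePath()),
    1e-6f);
{code}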






[jira] [Updated] (YARN-11043) Clean up checkstyle warnings from YARN-11024/10907/10929

2021-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-11043:
--
Labels: pull-request-available  (was: )

> Clean up checkstyle warnings from YARN-11024/10907/10929
> 
>
> Key: YARN-11043
> URL: https://issues.apache.org/jira/browse/YARN-11043
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
>  Labels: pull-request-available
> Attachments: checkstyle_warnings.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> YARN-11024, YARN-10907, YARN-10929 are consecutive changes built on top of 
> each other. This jira is a followup to clean up the checkstyle warnings 
> present in the modified files.






[jira] [Assigned] (YARN-11044) TestApplicationLimits.testLimitsComputation() has some uneffective asserts

2021-12-13 Thread Benjamin Teke (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Teke reassigned YARN-11044:


Assignee: Benjamin Teke

> TestApplicationLimits.testLimitsComputation() has some uneffective asserts
> --
>
> Key: YARN-11044
> URL: https://issues.apache.org/jira/browse/YARN-11044
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
>  Labels: newbie
>
> TestApplicationLimits.testLimitsComputation() has the following two asserts:
> {code:java}
> // should default to global setting if per queue setting not set
> assertEquals((long)CapacitySchedulerConfiguration.DEFAULT_MAXIMUM_APPLICATIONMASTERS_RESOURCE_PERCENT,
> (long)csConf.getMaximumApplicationMasterResourcePerQueuePercent(
> queue.getQueuePath()));
> {code}
> and
> {code:java}
> assertEquals((long) 0.5,
> (long) csConf.getMaximumApplicationMasterResourcePerQueuePercent(
>   queue.getQueuePath()));
> {code}
> In their current form, neither of them makes much sense: 
> getMaximumApplicationMasterResourcePerQueuePercent returns a float (between 0 
> and 1.0), and casting it to long truncates every value below 1.0 to 0, so the 
> only way these asserts will fail is when the configuration is below 0 or above 
> 1, but we're not testing invalid configurations here. This should be corrected.






[jira] [Updated] (YARN-11044) TestApplicationLimits.testLimitsComputation() has some uneffective asserts

2021-12-13 Thread Benjamin Teke (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Teke updated YARN-11044:
-
Labels: newbie  (was: )

> TestApplicationLimits.testLimitsComputation() has some uneffective asserts
> --
>
> Key: YARN-11044
> URL: https://issues.apache.org/jira/browse/YARN-11044
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Benjamin Teke
>Priority: Major
>  Labels: newbie
>
> TestApplicationLimits.testLimitsComputation() has the following two asserts:
> {code:java}
> // should default to global setting if per queue setting not set
> assertEquals((long)CapacitySchedulerConfiguration.DEFAULT_MAXIMUM_APPLICATIONMASTERS_RESOURCE_PERCENT,
> (long)csConf.getMaximumApplicationMasterResourcePerQueuePercent(
> queue.getQueuePath()));
> {code}
> and
> {code:java}
> assertEquals((long) 0.5,
> (long) csConf.getMaximumApplicationMasterResourcePerQueuePercent(
>   queue.getQueuePath()));
> {code}
> In their current form, neither of them makes much sense: 
> getMaximumApplicationMasterResourcePerQueuePercent returns a float (between 0 
> and 1.0), and casting it to long truncates every value below 1.0 to 0, so the 
> only way these asserts will fail is when the configuration is below 0 or above 
> 1, but we're not testing invalid configurations here. This should be corrected.






[jira] [Updated] (YARN-11044) TestApplicationLimits.testLimitsComputation() has some uneffective asserts

2021-12-13 Thread Benjamin Teke (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Teke updated YARN-11044:
-
Description: 
TestApplicationLimits.testLimitsComputation() has the following two asserts:


{code:java}
// should default to global setting if per queue setting not set
assertEquals((long)CapacitySchedulerConfiguration.DEFAULT_MAXIMUM_APPLICATIONMASTERS_RESOURCE_PERCENT,
(long)csConf.getMaximumApplicationMasterResourcePerQueuePercent(
queue.getQueuePath()));
{code}

and


{code:java}
assertEquals((long) 0.5,
(long) csConf.getMaximumApplicationMasterResourcePerQueuePercent(
  queue.getQueuePath()));
{code}

In their current form, neither of them makes much sense: 
getMaximumApplicationMasterResourcePerQueuePercent returns a float (between 0 
and 1.0), and casting it to long truncates every value below 1.0 to 0, so the 
only way these asserts will fail is when the configuration is below 0 or above 
1, but we're not testing invalid configurations here. This should be corrected.


  was:
TestApplicationLimits.testLimitsComputation() has the following two asserts:


{code:java}
// should default to global setting if per queue setting not set
assertEquals((long)CapacitySchedulerConfiguration.DEFAULT_MAXIMUM_APPLICATIONMASTERS_RESOURCE_PERCENT,
(long)csConf.getMaximumApplicationMasterResourcePerQueuePercent(
queue.getQueuePath()));
{code}

and


{code:java}
assertEquals((long) 0.5,
(long) csConf.getMaximumApplicationMasterResourcePerQueuePercent(
  queue.getQueuePath()));
{code}

In the current form neither of them make too much sense because 
getMaximumApplicationMasterResourcePerQueuePercent returns a float (between 0 
and 1.0), so the only way this will fail, if the configuration is below 0 or 
above 1, but we're not testing that here. This should be corrected.



> TestApplicationLimits.testLimitsComputation() has some uneffective asserts
> --
>
> Key: YARN-11044
> URL: https://issues.apache.org/jira/browse/YARN-11044
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Benjamin Teke
>Priority: Major
>
> TestApplicationLimits.testLimitsComputation() has the following two asserts:
> {code:java}
> // should default to global setting if per queue setting not set
> assertEquals((long)CapacitySchedulerConfiguration.DEFAULT_MAXIMUM_APPLICATIONMASTERS_RESOURCE_PERCENT,
> (long)csConf.getMaximumApplicationMasterResourcePerQueuePercent(
> queue.getQueuePath()));
> {code}
> and
> {code:java}
> assertEquals((long) 0.5,
> (long) csConf.getMaximumApplicationMasterResourcePerQueuePercent(
>   queue.getQueuePath()));
> {code}
> In their current form, neither of them makes much sense: 
> getMaximumApplicationMasterResourcePerQueuePercent returns a float (between 0 
> and 1.0), and casting it to long truncates every value below 1.0 to 0, so the 
> only way these asserts will fail is when the configuration is below 0 or above 
> 1, but we're not testing invalid configurations here. This should be corrected.






[jira] [Created] (YARN-11044) TestApplicationLimits.testLimitsComputation() has some uneffective asserts

2021-12-13 Thread Benjamin Teke (Jira)
Benjamin Teke created YARN-11044:


 Summary: TestApplicationLimits.testLimitsComputation() has some 
uneffective asserts
 Key: YARN-11044
 URL: https://issues.apache.org/jira/browse/YARN-11044
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Benjamin Teke


TestApplicationLimits.testLimitsComputation() has the following two asserts:


{code:java}
// should default to global setting if per queue setting not set
assertEquals((long)CapacitySchedulerConfiguration.DEFAULT_MAXIMUM_APPLICATIONMASTERS_RESOURCE_PERCENT,
(long)csConf.getMaximumApplicationMasterResourcePerQueuePercent(
queue.getQueuePath()));
{code}

and


{code:java}
assertEquals((long) 0.5,
(long) csConf.getMaximumApplicationMasterResourcePerQueuePercent(
  queue.getQueuePath()));
{code}

In the current form neither of them make too much sense because 
getMaximumApplicationMasterResourcePerQueuePercent returns a float (between 0 
and 1.0), so the only way this will fail, if the configuration is below 0 or 
above 1, but we're not testing that here. This should be corrected.







[jira] [Updated] (YARN-11024) Create an AbstractLeafQueue to store the common LeafQueue + AutoCreatedLeafQueue functionality

2021-12-13 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-11024:
--
Fix Version/s: 3.4.0

> Create an AbstractLeafQueue to store the common LeafQueue + 
> AutoCreatedLeafQueue functionality
> --
>
> Key: YARN-11024
> URL: https://issues.apache.org/jira/browse/YARN-11024
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> AbstractAutoCreatedLeafQueue extends the LeafQueue class, which is an 
> instantiable class, so every time an AutoCreatedLeafQueue is created a normal 
> LeafQueue is configured as well. This setup results in some strange behaviour, 
> like having to pass the template configs of an auto-created queue to a leaf 
> queue. To make the whole structure more flexible, an AbstractLeafQueue should 
> be created which stores the common methods.
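
A rough sketch of the proposed hierarchy change, as I read the description (the 
actual class layout in the patch may differ):
{code:java}
// Placeholder for the existing common queue parent (simplified here).
abstract class AbstractCSQueue { }

// Proposed: the shared LeafQueue + AutoCreatedLeafQueue functionality moves
// into an abstract parent, so creating an auto-created leaf queue no longer
// configures a concrete, instantiable LeafQueue under the hood.
abstract class AbstractLeafQueue extends AbstractCSQueue {
  // common leaf-queue methods live here
}

class LeafQueue extends AbstractLeafQueue {
  // behaviour specific to statically configured leaf queues
}

abstract class AbstractAutoCreatedLeafQueue extends AbstractLeafQueue {
  // behaviour specific to auto-created leaf queues (template configs, etc.)
}
{code}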






[jira] [Updated] (YARN-11043) Clean up checkstyle warnings from YARN-11024/10907/10929

2021-12-13 Thread Benjamin Teke (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Teke updated YARN-11043:
-
Attachment: checkstyle_warnings.txt

> Clean up checkstyle warnings from YARN-11024/10907/10929
> 
>
> Key: YARN-11043
> URL: https://issues.apache.org/jira/browse/YARN-11043
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: checkstyle_warnings.txt
>
>
> YARN-11024, YARN-10907, YARN-10929 are consecutive changes built on top of 
> each other. This jira is a followup to clean up the checkstyle warnings 
> present in the modified files.






[jira] [Created] (YARN-11043) Clean up checkstyle warnings from YARN-11024/10907/10929

2021-12-13 Thread Benjamin Teke (Jira)
Benjamin Teke created YARN-11043:


 Summary: Clean up checkstyle warnings from YARN-11024/10907/10929
 Key: YARN-11043
 URL: https://issues.apache.org/jira/browse/YARN-11043
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Benjamin Teke
Assignee: Benjamin Teke
 Attachments: checkstyle_warnings.txt

YARN-11024, YARN-10907, YARN-10929 are consecutive changes built on top of each 
other. This jira is a followup to clean up the checkstyle warnings present in 
the modified files.






[jira] [Assigned] (YARN-10870) Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM Scheduler page

2021-12-13 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth reassigned YARN-10870:
-

Assignee: Gergely Pollák  (was: Siddharth Ahuja)

> Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM 
> Scheduler page
> 
>
> Key: YARN-10870
> URL: https://issues.apache.org/jira/browse/YARN-10870
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Siddharth Ahuja
>Assignee: Gergely Pollák
>Priority: Major
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: YARN-10870.001.patch, YARN-10870.002.patch, 
> YARN-10870.branch-3.1.002.patch, YARN-10870.branch-3.2.002.patch, 
> YARN-10870.branch-3.3.002.patch
>
>
> Non-permissible users are (incorrectly) able to view applications submitted 
> by another user on the RM's Scheduler UI (not the Applications UI), where 
> _non-permissible users_ are non-application-owners who are neither present in 
> the application ACL -> mapreduce.job.acl-view-job, nor present in the Queue 
> ACL as a Queue admin of the queue to which the job was submitted (see [1], 
> where both the filter setting introduced by YARN-8319 and the ACL checks are 
> performed).
> The issue can be reproduced easily by having the setting 
> {{yarn.webapp.filter-entity-list-by-user}} set to true in yarn-site.xml.
> The above disallows non-permissible users from viewing another user's 
> applications on the Applications page, but not on the Scheduler page.
> The filter setting seems to be checked only in the getApps() call, but 
> not while rendering the apps information on the Scheduler page. This seems to 
> be a "missed" feature from YARN-8319.
> Following pre-requisites are needed to reproduce the issue:
> * Kerberized cluster,
> * SPNEGO enabled for HDFS & YARN,
> * Add test users - systest and user1 on all nodes.
> * Add kerberos princs for the above users.
> * Create HDFS user dirs for above users and chown them appropriately.
> * Run a sample MR Sleep job and test.
> Steps to reproduce the issue:
> * kinit as "systest" user and run a sample MR sleep job from one of the nodes 
> in the cluster:
> {code}
> yarn jar  sleep -m 1 -mt 
> 360
> {code}
> * kinit as "user1" from Mac as an example (this assumes you've copied the 
> /etc/krb5.conf from the cluster to your Mac's /private/etc folder already for 
> Spengo auth).
> * Open the Applications page. user1 cannot view the job being run by systest. 
> This is correct.
> * Open the Scheduler page. user1 *CAN* view the job being run by systest. 
> This is *INCORRECT*.
> [1] 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L676
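
For illustration, a rough sketch of the per-application check the Scheduler 
page appears to be missing, mirroring the getApps() behaviour cited at [1] 
(names and the surrounding code are my assumptions, not the actual patch):
{code:java}
// Hypothetical guard while rendering apps on the Scheduler page: skip
// applications the remote user may not view when the filter is enabled.
boolean filterAppsByUser = conf.getBoolean(
    "yarn.webapp.filter-entity-list-by-user", false);
if (filterAppsByUser && !hasAccess(app, hsr)) {
  continue; // hide this application from the rendered scheduler view
}
{code}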






[jira] [Updated] (YARN-8234) Improve RM system metrics publisher's performance by pushing events to timeline server in batch

2021-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-8234:
-
Labels: pull-request-available  (was: )

> Improve RM system metrics publisher's performance by pushing events to 
> timeline server in batch
> ---
>
> Key: YARN-8234
> URL: https://issues.apache.org/jira/browse/YARN-8234
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.8.3
>Reporter: Hu Ziqian
>Assignee: Ashutosh Gupta
>Priority: Critical
>  Labels: pull-request-available
> Attachments: YARN-8234-branch-2.8.3.001.patch, 
> YARN-8234-branch-2.8.3.002.patch, YARN-8234-branch-2.8.3.003.patch, 
> YARN-8234-branch-2.8.3.004.patch, YARN-8234.001.patch, YARN-8234.002.patch, 
> YARN-8234.003.patch, YARN-8234.004.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the system metrics publisher is enabled, the RM pushes events to the 
> timeline server via its RESTful API. If the cluster load is heavy, many 
> events are sent to the timeline server and the timeline server's event 
> handler thread gets locked. YARN-7266 discussed the details of this problem. 
> Because of the lock, the timeline server can't receive events as fast as they 
> are generated in the RM, and lots of timeline events stay in the RM's memory. 
> Finally, those events consume all of the RM's memory and the RM starts a full 
> GC (which causes a JVM stop-the-world pause and a timeout from the RM to 
> ZooKeeper) or even hits an OOM. 
> The main problem here is that the timeline server can't receive events as 
> fast as they are generated. Currently, the RM system metrics publisher puts 
> only one event in a request, and most of the time is spent on handling HTTP 
> headers or the network connection on the timeline side. Only a little time is 
> spent on dealing with the timeline event itself, which is the truly valuable 
> work.
> In this issue, we add a buffer to the system metrics publisher and let the 
> publisher send events to the timeline server in batches, one batch per 
> request. With the batch size set to 1000, in our experiment the speed at 
> which the timeline server receives events improved 100x. We have implemented 
> this function in our production environment, which accepts 2 app's in one 
> hour, and it works fine.
> We add the following configuration (a sketch of the overall pattern follows 
> this list):
>  * yarn.resourcemanager.system-metrics-publisher.batch-size: the number of 
> events the system metrics publisher sends in one request. The default value 
> is 1000.
>  * yarn.resourcemanager.system-metrics-publisher.buffer-size: the size of the 
> event buffer in the system metrics publisher.
>  * yarn.resourcemanager.system-metrics-publisher.interval-seconds: when batch 
> publishing is enabled, we must avoid the publisher waiting for a batch to 
> fill up and holding events in the buffer for a long time, so we add another 
> thread which sends the buffered events periodically. This config sets the 
> interval of that periodic sending thread. The default value is 60s.
>  
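
A minimal, self-contained sketch of the buffer-plus-batch pattern described 
above (illustrative only; class and method names are my own, not the actual 
patch):
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical batching publisher: events accumulate in a bounded buffer
// (buffer-size) and are flushed either when a full batch is ready
// (batch-size) or on a timer (interval-seconds).
class BatchingEventPublisher<E> {
  private final LinkedBlockingQueue<E> buffer;
  private final int batchSize;
  private final ScheduledExecutorService flusher =
      Executors.newSingleThreadScheduledExecutor();

  BatchingEventPublisher(int bufferSize, int batchSize, long intervalSeconds) {
    this.buffer = new LinkedBlockingQueue<>(bufferSize);
    this.batchSize = batchSize;
    // Periodic flush so events never wait indefinitely for a full batch.
    flusher.scheduleAtFixedRate(this::flush, intervalSeconds, intervalSeconds,
        TimeUnit.SECONDS);
  }

  void publish(E event) throws InterruptedException {
    buffer.put(event);                 // blocks when the buffer is full
    if (buffer.size() >= batchSize) {
      flush();                         // eager flush when a batch is ready
    }
  }

  private synchronized void flush() {
    List<E> batch = new ArrayList<>(batchSize);
    buffer.drainTo(batch, batchSize);
    if (!batch.isEmpty()) {
      sendInOneRequest(batch);         // one REST request for the whole batch
    }
  }

  private void sendInOneRequest(List<E> batch) {
    // Placeholder for the single REST call carrying all events in the batch.
  }
}
{code}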






[jira] [Assigned] (YARN-10823) Expose all node labels for root without explicit configurations

2021-12-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/YARN-10823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tibor Kovács reassigned YARN-10823:
---

Assignee: Andras Gyori  (was: Tibor Kovács)

> Expose all node labels for root without explicit configurations
> ---
>
> Key: YARN-10823
> URL: https://issues.apache.org/jira/browse/YARN-10823
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> By definition, root capacity should be set for all node labels that are 
> configured for its descendants. The current proposition is to set a default 
> 100 capacity for every node label that is configured for any of its 
> descendants but not for root.






[jira] [Assigned] (YARN-10555) Missing access check before getAppAttempts

2021-12-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/YARN-10555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tibor Kovács reassigned YARN-10555:
---

Assignee: lujie  (was: Tibor Kovács)

>  Missing access check before getAppAttempts
> ---
>
> Key: YARN-10555
> URL: https://issues.apache.org/jira/browse/YARN-10555
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: lujie
>Assignee: lujie
>Priority: Critical
>  Labels: pull-request-available, security
> Fix For: 3.4.0, 3.3.1, 2.10.2, 3.2.3
>
> Attachments: YARN-10555_1.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> It seems that we miss a security check before getAppAttempts, see 
> [https://github.com/apache/hadoop/blob/513f1995adc9b73f9c7f4c7beb89725b51b313ac/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L1127]
> thus we can get some sensitive information, like the logs link.  
> {code:java}
> application_1609318368700_0002 belong to user2
> user1@hadoop11$ curl --negotiate -u  : 
> http://hadoop11:8088/ws/v1/cluster/apps/application_1609318368700_0002/appattempts/|jq
> {
>   "appAttempts": {
> "appAttempt": [
>   {
> "id": 1,
> "startTime": 1609318411566,
> "containerId": "container_1609318368700_0002_01_01",
> "nodeHttpAddress": "hadoop12:8044",
> "nodeId": "hadoop12:36831",
> "logsLink": 
> "http://hadoop12:8044/node/containerlogs/container_1609318368700_0002_01_01/user2;,
> "blacklistedNodes": "",
> "nodesBlacklistedBySystem": ""
>   }
> ]
>   }
> }
> {code}
> Other APIs, like getApps and getApp, have an access check like "hasAccess(app, 
> hsr)"; they hide the logs link if the appid does not belong to the query 
> user, see 
> [https://github.com/apache/hadoop/blob/513f1995adc9b73f9c7f4c7beb89725b51b313ac/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L1098]
>  We need to add hasAccess(app, hsr) for getAppAttempts.
>  
> Besides, at 
> [https://github.com/apache/hadoop/blob/580a6a75a3e3d3b7918edeffd6e93fc211166884/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java#L145]
> it seems that we have an access check in its caller, so for now I pass "true" 
> to AppAttemptInfo in the patch.  
>  
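
A rough sketch of the requested change (my illustration; the real 
AppAttemptInfo constructor takes more arguments, so treat the names and 
signatures here as assumptions):
{code:java}
// Hypothetical shape of the fix in the getAppAttempts() handler: reuse the
// hasAccess(app, hsr) check that getApps()/getApp() already perform, and
// pass the result down so the DAO can omit the logs link when the caller
// is not permitted to view the application.
RMApp app = rm.getRMContext().getRMApps().get(appId);
boolean allowedAccess = hasAccess(app, hsr);
for (RMAppAttempt attempt : app.getAppAttempts().values()) {
  appAttemptsInfo.add(new AppAttemptInfo(attempt, allowedAccess));
}
{code}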






[jira] [Assigned] (YARN-10870) Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM Scheduler page

2021-12-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/YARN-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tibor Kovács reassigned YARN-10870:
---

Assignee: Siddharth Ahuja  (was: Tibor Kovács)

> Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM 
> Scheduler page
> 
>
> Key: YARN-10870
> URL: https://issues.apache.org/jira/browse/YARN-10870
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Siddharth Ahuja
>Assignee: Siddharth Ahuja
>Priority: Major
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: YARN-10870.001.patch, YARN-10870.002.patch, 
> YARN-10870.branch-3.1.002.patch, YARN-10870.branch-3.2.002.patch, 
> YARN-10870.branch-3.3.002.patch
>
>
> Non-permissible users are (incorrectly) able to view applications submitted 
> by another user on the RM's Scheduler UI (not the Applications UI), where 
> _non-permissible users_ are non-application-owners who are neither present in 
> the application ACL -> mapreduce.job.acl-view-job, nor present in the Queue 
> ACL as a Queue admin of the queue to which the job was submitted (see [1], 
> where both the filter setting introduced by YARN-8319 and the ACL checks are 
> performed).
> The issue can be reproduced easily by having the setting 
> {{yarn.webapp.filter-entity-list-by-user}} set to true in yarn-site.xml.
> The above disallows non-permissible users from viewing another user's 
> applications on the Applications page, but not on the Scheduler page.
> The filter setting seems to be checked only in the getApps() call, but 
> not while rendering the apps information on the Scheduler page. This seems to 
> be a "missed" feature from YARN-8319.
> Following pre-requisites are needed to reproduce the issue:
> * Kerberized cluster,
> * SPNEGO enabled for HDFS & YARN,
> * Add test users - systest and user1 on all nodes.
> * Add kerberos princs for the above users.
> * Create HDFS user dirs for above users and chown them appropriately.
> * Run a sample MR Sleep job and test.
> Steps to reproduce the issue:
> * kinit as "systest" user and run a sample MR sleep job from one of the nodes 
> in the cluster:
> {code}
> yarn jar  sleep -m 1 -mt 
> 360
> {code}
> * kinit as "user1" from Mac as an example (this assumes you've copied the 
> /etc/krb5.conf from the cluster to your Mac's /private/etc folder already for 
> Spengo auth).
> * Open the Applications page. user1 cannot view the job being run by systest. 
> This is correct.
> * Open the Scheduler page. user1 *CAN* view the job being run by systest. 
> This is *INCORRECT*.
> [1] 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L676






[jira] [Assigned] (YARN-10720) YARN WebAppProxyServlet should support connection timeout to prevent proxy server from hanging

2021-12-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/YARN-10720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tibor Kovács reassigned YARN-10720:
---

Assignee: Qi Zhu  (was: Tibor Kovács)

> YARN WebAppProxyServlet should support connection timeout to prevent proxy 
> server from hanging
> --
>
> Key: YARN-10720
> URL: https://issues.apache.org/jira/browse/YARN-10720
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Critical
> Fix For: 3.4.0
>
> Attachments: YARN-10720.001.patch, YARN-10720.002.patch, 
> YARN-10720.003.patch, YARN-10720.004.patch, YARN-10720.005.patch, 
> YARN-10720.006.patch, image-2021-03-29-14-04-33-776.png, 
> image-2021-03-29-14-05-32-708.png
>
>
> The following shows the proxy server with {color:#de350b}too many connections 
> from one client{color}; this caused the proxy server to hang, and the YARN 
> web UI couldn't jump to the web proxy.
> !image-2021-03-29-14-04-33-776.png|width=632,height=57!
> The following shows the abnormal AM. The proxy server doesn't know it is 
> abnormal, so the connections can't be closed; we should add timeout support 
> to the proxy server to prevent this. One abnormal AM may cause hundreds or 
> even thousands of connections, which is very heavy.
> !image-2021-03-29-14-05-32-708.png|width=669,height=101!
>  
> After I killed the abnormal AM, the proxy server became healthy again. This 
> case has happened many times in our production clusters; our clusters are 
> huge, and an abnormal AM shows up regularly.
>  
> I will add timeout support to the web proxy server in this jira.
>  
> cc  [~pbacsko] [~ebadger] [~Jim_Brennan]  [~ztang]  [~epayne] [~gandras]  
> [~bteke]
>  
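
A minimal sketch of the kind of timeout configuration this implies 
(illustrative only; it assumes the Apache HttpClient 4.x API, and the 60s 
values are placeholders, not the values from the patches):
{code:java}
import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class ProxyClientFactory {
  // Bound every phase of a proxied request so a single unresponsive AM
  // cannot pin proxy-server connections indefinitely.
  public static CloseableHttpClient createClient() {
    RequestConfig config = RequestConfig.custom()
        .setConnectTimeout(60_000)            // TCP connect to the AM
        .setSocketTimeout(60_000)             // waiting for response data
        .setConnectionRequestTimeout(60_000)  // leasing from the pool
        .build();
    return HttpClients.custom()
        .setDefaultRequestConfig(config)
        .build();
  }
}
{code}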






[jira] [Assigned] (YARN-10701) The yarn.resource-types should support multi types without trimmed.

2021-12-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/YARN-10701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tibor Kovács reassigned YARN-10701:
---

Assignee: Qi Zhu  (was: Tibor Kovács)

> The yarn.resource-types should support multi types without trimmed.
> ---
>
> Key: YARN-10701
> URL: https://issues.apache.org/jira/browse/YARN-10701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-10701-branch-3.3.001.patch, YARN-10701.001.patch, 
> YARN-10701.002.patch
>
>
> {code:xml}
> <configuration>
>   <property>
>     <name>yarn.resource-types</name>
>     <value>yarn.io/gpu, yarn.io/fpga</value>
>   </property>
> </configuration>
> {code}
>  When I configured the resource types above with GPU and FPGA, this error 
> happened:
>  
> {code:java}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: ' yarn.io/fpga' is 
> not a valid resource name. A valid resource name must begin with a letter and 
> contain only letters, numbers, and any of: '.', '_', or '-'. A valid resource 
> name may also be optionally preceded by a name space followed by a slash. A 
> valid name space consists of period-separated groups of letters, numbers, and 
> dashes.{code}
>   
>  The resource-type values should be trimmed.
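
A minimal sketch of the trimming being asked for (my illustration; Hadoop's 
Configuration already provides getTrimmedStrings for comma-separated values):
{code:java}
import org.apache.hadoop.conf.Configuration;

public class ResourceTypesTrimCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("yarn.resource-types", "yarn.io/gpu, yarn.io/fpga");

    // getTrimmedStrings() splits on commas and strips surrounding whitespace,
    // so " yarn.io/fpga" is read back as the valid name "yarn.io/fpga".
    for (String type : conf.getTrimmedStrings("yarn.resource-types")) {
      System.out.println("[" + type + "]");
    }
  }
}
{code}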






[jira] [Assigned] (YARN-10555) Missing access check before getAppAttempts

2021-12-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/YARN-10555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tibor Kovács reassigned YARN-10555:
---

Assignee: Tibor Kovács  (was: lujie)

>  Missing access check before getAppAttempts
> ---
>
> Key: YARN-10555
> URL: https://issues.apache.org/jira/browse/YARN-10555
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: lujie
>Assignee: Tibor Kovács
>Priority: Critical
>  Labels: pull-request-available, security
> Fix For: 3.4.0, 3.3.1, 2.10.2, 3.2.3
>
> Attachments: YARN-10555_1.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> It seems that we miss a security check before getAppAttempts, see 
> [https://github.com/apache/hadoop/blob/513f1995adc9b73f9c7f4c7beb89725b51b313ac/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L1127]
> Thus anyone can get sensitive information, like the logs link.
> {code:java}
> application_1609318368700_0002 belong to user2
> user1@hadoop11$ curl --negotiate -u  : 
> http://hadoop11:8088/ws/v1/cluster/apps/application_1609318368700_0002/appattempts/|jq
> {
>   "appAttempts": {
> "appAttempt": [
>   {
> "id": 1,
> "startTime": 1609318411566,
> "containerId": "container_1609318368700_0002_01_01",
> "nodeHttpAddress": "hadoop12:8044",
> "nodeId": "hadoop12:36831",
> "logsLink": 
> "http://hadoop12:8044/node/containerlogs/container_1609318368700_0002_01_01/user2;,
> "blacklistedNodes": "",
> "nodesBlacklistedBySystem": ""
>   }
> ]
>   }
> }
> {code}
> Other APIs, like getApps and getApp, have an access check like "hasAccess(app, 
> hsr)"; they hide the logs link if the appid does not belong to the querying 
> user, see 
> [https://github.com/apache/hadoop/blob/513f1995adc9b73f9c7f4c7beb89725b51b313ac/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L1098]
>  We need to add hasAccess(app, hsr) to getAppAttempts, as sketched below.
>  
> Besides, at 
> [https://github.com/apache/hadoop/blob/580a6a75a3e3d3b7918edeffd6e93fc211166884/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java#L145]
> it seems that we have an access check in its caller, so for now I pass "true" 
> to AppAttemptInfo in the patch.
>  
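> As a rough illustration of the guard being asked for, here is a minimal, 
> self-contained sketch; the class and method names are assumptions, not the 
> actual RMWebServices API:
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
> 
> public class AttemptAccessSketch {
>   // Stand-in for the real hasAccess(app, hsr) check, which also consults
>   // mapreduce.job.acl-view-job and the queue ACLs.
>   static boolean hasAccess(String caller, String appOwner) {
>     return caller.equals(appOwner);
>   }
> 
>   // Hide the sensitive logsLink unless the caller may view the app,
>   // mirroring what getApps/getApp already do.
>   static List<String> getAppAttempts(String caller, String appOwner) {
>     boolean allowed = hasAccess(caller, appOwner);
>     List<String> attempts = new ArrayList<>();
>     String logsLink = allowed ? "http://node:8044/node/containerlogs/..." : "";
>     attempts.add("attempt 1, logsLink=" + logsLink);
>     return attempts;
>   }
> 
>   public static void main(String[] args) {
>     System.out.println(getAppAttempts("user1", "user2")); // logsLink hidden
>     System.out.println(getAppAttempts("user2", "user2")); // logsLink shown
>   }
> }
> {code}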



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10823) Expose all node labels for root without explicit configurations

2021-12-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/YARN-10823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tibor Kovács reassigned YARN-10823:
---

Assignee: Tibor Kovács  (was: Andras Gyori)

> Expose all node labels for root without explicit configurations
> ---
>
> Key: YARN-10823
> URL: https://issues.apache.org/jira/browse/YARN-10823
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Andras Gyori
>Assignee: Tibor Kovács
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> By definition, root capacity should be set for all node labels that are 
> configured for its descendants. The current proposition is to set a default 
> capacity of 100 for every node label that is configured for any of root's 
> descendants but not for root itself, as sketched below.
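> As a rough sketch of that proposition (names are illustrative, not the 
> actual CapacityScheduler code), labels collected from descendants could fall 
> back to a default capacity of 100 on root:
> {code:java}
> import java.util.HashMap;
> import java.util.HashSet;
> import java.util.Map;
> import java.util.Set;
> 
> public class RootLabelDefaultsSketch {
>   // For every label configured on any descendant but missing on root,
>   // fall back to a default capacity of 100.
>   static Map<String, Float> resolveRootCapacities(
>       Set<String> descendantLabels, Map<String, Float> rootConfigured) {
>     Map<String, Float> resolved = new HashMap<>(rootConfigured);
>     for (String label : descendantLabels) {
>       resolved.putIfAbsent(label, 100f);
>     }
>     return resolved;
>   }
> 
>   public static void main(String[] args) {
>     Set<String> labels = new HashSet<>();
>     labels.add("gpu"); // explicitly configured on root below
>     labels.add("ssd"); // only configured on a child queue
>     Map<String, Float> rootConf = new HashMap<>();
>     rootConf.put("gpu", 80f);
>     // "ssd" picks up the default 100; "gpu" keeps its configured 80
>     System.out.println(resolveRootCapacities(labels, rootConf));
>   }
> }
> {code}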



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10870) Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM Scheduler page

2021-12-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/YARN-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tibor Kovács reassigned YARN-10870:
---

Assignee: Tibor Kovács  (was: Gergely Pollák)

> Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM 
> Scheduler page
> 
>
> Key: YARN-10870
> URL: https://issues.apache.org/jira/browse/YARN-10870
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Siddharth Ahuja
>Assignee: Tibor Kovács
>Priority: Major
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: YARN-10870.001.patch, YARN-10870.002.patch, 
> YARN-10870.branch-3.1.002.patch, YARN-10870.branch-3.2.002.patch, 
> YARN-10870.branch-3.3.002.patch
>
>
> Non-permissible users are (incorrectly) able to view applications submitted 
> by another user on the RM's Scheduler UI (not the Applications UI). Here, 
> _non-permissible users_ are non-application-owners who are neither present in 
> the application ACL (mapreduce.job.acl-view-job) nor present in the Queue ACL 
> as a Queue admin of the queue to which the job was submitted (see [1], where 
> both the filter setting introduced by YARN-8319 and the ACL checks are 
> performed).
> The issue can be reproduced easily by setting 
> {{yarn.webapp.filter-entity-list-by-user}} to true in yarn-site.xml.
> This disallows non-permissible users from viewing another user's applications 
> on the Applications page, but not on the Scheduler page.
> The filter setting seems to be checked only in the getApps() call, but not 
> while rendering the apps information on the Scheduler page. This appears to 
> be a feature missed by YARN-8319, as illustrated by the sketch below.
> The following prerequisites are needed to reproduce the issue:
> * Kerberized cluster,
> * SPNEGO enabled for HDFS & YARN,
> * Add test users - systest and user1 on all nodes.
> * Add Kerberos principals for the above users.
> * Create HDFS user directories for the above users and chown them 
> appropriately.
> * Run a sample MR Sleep job and test.
> Steps to reproduce the issue:
> * kinit as "systest" user and run a sample MR sleep job from one of the nodes 
> in the cluster:
> {code}
> yarn jar  sleep -m 1 -mt 360
> {code}
> * kinit as "user1" from Mac as an example (this assumes you've copied the 
> /etc/krb5.conf from the cluster to your Mac's /private/etc folder already for 
> Spengo auth).
> * Open the Applications page. user1 cannot view the job being run by systest. 
> This is correct.
> * Open the Scheduler page. user1 *CAN* view the job being run by systest. 
> This is *INCORRECT*.
> [1] 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L676



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10720) YARN WebAppProxyServlet should support connection timeout to prevent proxy server from hanging

2021-12-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/YARN-10720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tibor Kovács reassigned YARN-10720:
---

Assignee: Tibor Kovács  (was: Qi Zhu)

> YARN WebAppProxyServlet should support connection timeout to prevent proxy 
> server from hanging
> --
>
> Key: YARN-10720
> URL: https://issues.apache.org/jira/browse/YARN-10720
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Qi Zhu
>Assignee: Tibor Kovács
>Priority: Critical
> Fix For: 3.4.0
>
> Attachments: YARN-10720.001.patch, YARN-10720.002.patch, 
> YARN-10720.003.patch, YARN-10720.004.patch, YARN-10720.005.patch, 
> YARN-10720.006.patch, image-2021-03-29-14-04-33-776.png, 
> image-2021-03-29-14-05-32-708.png
>
>
> The following shows the proxy server with {color:#de350b}too many connections 
> from one client{color}; this caused the proxy server to hang, and the YARN 
> web UI couldn't redirect to the web proxy.
> !image-2021-03-29-14-04-33-776.png|width=632,height=57!
> Following is the AM which is abnormal, but the proxy server doesn't know it 
> is abnormal yet, so the connections can't be closed; we should add timeout 
> support in the proxy server to prevent this. A single abnormal AM may cause 
> hundreds or even thousands of connections, which is a very heavy load.
> !image-2021-03-29-14-05-32-708.png|width=669,height=101!
>  
> After I kill the abnormal AM, the proxy server becomes healthy again. This 
> case has happened many times in our production clusters; our clusters are 
> huge, and abnormal AMs show up regularly.
>  
> I will add timeout support to the web proxy server in this JIRA; a rough 
> sketch of the idea follows below.
>  
> cc  [~pbacsko] [~ebadger] [~Jim_Brennan]  [~ztang]  [~epayne] [~gandras]  
> [~bteke]
>  
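> As a rough sketch of the idea (not the actual WebAppProxyServlet patch), 
> bounding both connect and read time with plain JDK HttpURLConnection makes a 
> hung AM fail fast instead of pinning a proxy connection; the method name is 
> illustrative:
> {code:java}
> import java.net.HttpURLConnection;
> import java.net.URL;
> 
> public class ProxyTimeoutSketch {
>   static int fetchStatus(String amTrackingUrl, int timeoutMs)
>       throws Exception {
>     HttpURLConnection conn =
>         (HttpURLConnection) new URL(amTrackingUrl).openConnection();
>     conn.setConnectTimeout(timeoutMs); // fail fast if the AM never accepts
>     conn.setReadTimeout(timeoutMs);    // fail fast if the AM stops responding
>     try {
>       return conn.getResponseCode();   // SocketTimeoutException on a hang
>     } finally {
>       conn.disconnect();               // release the connection either way
>     }
>   }
> 
>   public static void main(String[] args) throws Exception {
>     System.out.println(fetchStatus("http://example.org/", 5000));
>   }
> }
> {code}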



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10701) The yarn.resource-types should support multi types without trimmed.

2021-12-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/YARN-10701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tibor Kovács reassigned YARN-10701:
---

Assignee: Tibor Kovács  (was: Qi Zhu)

> The yarn.resource-types should support multi types without trimmed.
> ---
>
> Key: YARN-10701
> URL: https://issues.apache.org/jira/browse/YARN-10701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Qi Zhu
>Assignee: Tibor Kovács
>Priority: Major
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-10701-branch-3.3.001.patch, YARN-10701.001.patch, 
> YARN-10701.002.patch
>
>
> {code:java}
> <configuration>
>   <property>
>     <name>yarn.resource-types</name>
>     <value>yarn.io/gpu, yarn.io/fpga</value>
>   </property>
> </configuration>
> {code}
>  When I configured the resource types above with gpu and fpga, the following 
> error happened:
>  
> {code:java}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: ' yarn.io/fpga' is 
> not a valid resource name. A valid resource name must begin with a letter and 
> contain only letters, numbers, and any of: '.', '_', or '-'. A valid resource 
> name may also be optionally preceded by a name space followed by a slash. A 
> valid name space consists of period-separated groups of letters, numbers, and 
> dashes.{code}
>   
>  The resource type values should be trimmed when parsed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org