[jira] [Commented] (YARN-9749) TestAppLogAggregatorImpl#testDFSQuotaExceeded fails on trunk
[ https://issues.apache.org/jira/browse/YARN-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908792#comment-16908792 ] Szilard Nemeth commented on YARN-9749: -- Thanks [~adam.antal] for this fix, committed to trunk and to branch-3.2. Thanks [~pbacsko] for the review! > TestAppLogAggregatorImpl#testDFSQuotaExceeded fails on trunk > > > Key: YARN-9749 > URL: https://issues.apache.org/jira/browse/YARN-9749 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, test >Reporter: Peter Bacsko >Assignee: Adam Antal >Priority: Major > Attachments: YARN-9749.001.patch > > > TestAppLogAggregatorImpl#testDFSQuotaExceeded currently fails on trunk. It > was most likely introduced by YARN-9676 (resetting HEAD to the previous > commit and then re-running the test passes). > {noformat} > [INFO] Running > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl > [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.781 > s <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl > [ERROR] > testDFSQuotaExceeded(org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl) > Time elapsed: 2.361 s <<< FAILURE!
> java.lang.AssertionError: The set of paths for deletion are not the same as > expected: actual size: 0 vs expected size: 1 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl.verifyFilesToDelete(TestAppLogAggregatorImpl.java:344) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl.access$000(TestAppLogAggregatorImpl.java:82) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl$1.answer(TestAppLogAggregatorImpl.java:330) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl$1.answer(TestAppLogAggregatorImpl.java:319) > at > org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:39) > at > org.mockito.internal.handler.MockHandlerImpl.handle(MockHandlerImpl.java:96) > at > org.mockito.internal.handler.NullResultGuardian.handle(NullResultGuardian.java:29) > at > org.mockito.internal.handler.InvocationNotifierHandler.handle(InvocationNotifierHandler.java:35) > at > org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:61) > at > org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:49) > at > org.mockito.internal.creation.bytebuddy.MockMethodInterceptor$DispatcherDefaultingToRealMethod.interceptSuperCallable(MockMethodInterceptor.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.DeletionService$MockitoMock$1879282050.delete(Unknown > Source) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregationPostCleanUp(AppLogAggregatorImpl.java:556) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:476) > at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl.testDFSQuotaExceeded(TestAppLogAggregatorImpl.java:469) > ... > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
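For context, the assertion that fails above compares the set of paths the log aggregator scheduled for deletion against the expected set. A minimal stdlib-only sketch of that comparison (names and paths are hypothetical, not the actual test code):

```java
import java.util.HashSet;
import java.util.Set;

public class VerifyDeletionPaths {
    // Hypothetical re-creation of the verifyFilesToDelete check: returns the
    // failure message when the two sets differ, or null when they match.
    public static String mismatchMessage(Set<String> expected, Set<String> actual) {
        if (actual.equals(expected)) {
            return null;
        }
        return "The set of paths for deletion are not the same as expected: "
            + "actual size: " + actual.size()
            + " vs expected size: " + expected.size();
    }

    public static void main(String[] args) {
        Set<String> expected = new HashSet<>();
        expected.add("/tmp/logs/application_1/node_1"); // placeholder path
        Set<String> actual = new HashSet<>();           // aggregator scheduled nothing
        System.out.println(mismatchMessage(expected, actual));
    }
}
```

Per the report, the post-cleanup path stopped scheduling the expected deletion after YARN-9676, which is how `actual` ends up empty.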
[jira] [Commented] (YARN-5857) TestLogAggregationService.testFixedSizeThreadPool fails intermittently on trunk
[ https://issues.apache.org/jira/browse/YARN-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908791#comment-16908791 ] Bibin A Chundatt commented on YARN-5857: Thank you [~BilwaST] for updating the patch. +1 LGTM. Will wait for [~adam.antal] comments too. > TestLogAggregationService.testFixedSizeThreadPool fails intermittently on > trunk > --- > > Key: YARN-5857 > URL: https://issues.apache.org/jira/browse/YARN-5857 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Bilwa S T >Priority: Minor > Attachments: YARN-5857-001.patch, YARN-5857-002.patch, > testFixedSizeThreadPool failure reproduction > > > {noformat} > testFixedSizeThreadPool(org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService) > Time elapsed: 0.11 sec <<< FAILURE! > java.lang.AssertionError: expected:<3> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testFixedSizeThreadPool(TestLogAggregationService.java:1139) > {noformat} > Refer to https://builds.apache.org/job/PreCommit-YARN-Build/13829/testReport/ -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5857) TestLogAggregationService.testFixedSizeThreadPool fails intermittently on trunk
[ https://issues.apache.org/jira/browse/YARN-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bilwa S T updated YARN-5857: Attachment: YARN-5857-002.patch > TestLogAggregationService.testFixedSizeThreadPool fails intermittently on > trunk > --- > > Key: YARN-5857 > URL: https://issues.apache.org/jira/browse/YARN-5857 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Bilwa S T >Priority: Minor > Attachments: YARN-5857-001.patch, YARN-5857-002.patch, > testFixedSizeThreadPool failure reproduction > > > {noformat} > testFixedSizeThreadPool(org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService) > Time elapsed: 0.11 sec <<< FAILURE! > java.lang.AssertionError: expected:<3> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testFixedSizeThreadPool(TestLogAggregationService.java:1139) > {noformat} > Refer to https://builds.apache.org/job/PreCommit-YARN-Build/13829/testReport/ -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2599) Standby RM should also expose some jmx and metrics
[ https://issues.apache.org/jira/browse/YARN-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Govindan updated YARN-2599: - Release Note: YARN /jmx URL end points will be accessible per resource manager process. Hence there will not be any redirection to active resource manager while accessing /jmx endpoints. > Standby RM should also expose some jmx and metrics > -- > > Key: YARN-2599 > URL: https://issues.apache.org/jira/browse/YARN-2599 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.1, 2.7.3, 3.0.0-alpha1 >Reporter: Karthik Kambatla >Assignee: Rohith Sharma K S >Priority: Major > Attachments: YARN-2599.002.patch, YARN-2599.patch > > > YARN-1898 redirects jmx and metrics to the Active. As discussed there, we > need to separate out metrics displayed so the Standby RM can also be > monitored. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5857) TestLogAggregationService.testFixedSizeThreadPool fails intermittently on trunk
[ https://issues.apache.org/jira/browse/YARN-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bilwa S T updated YARN-5857: Attachment: (was: YARN-5857-002.patch) > TestLogAggregationService.testFixedSizeThreadPool fails intermittently on > trunk > --- > > Key: YARN-5857 > URL: https://issues.apache.org/jira/browse/YARN-5857 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Bilwa S T >Priority: Minor > Attachments: YARN-5857-001.patch, testFixedSizeThreadPool failure > reproduction > > > {noformat} > testFixedSizeThreadPool(org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService) > Time elapsed: 0.11 sec <<< FAILURE! > java.lang.AssertionError: expected:<3> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testFixedSizeThreadPool(TestLogAggregationService.java:1139) > {noformat} > Refer to https://builds.apache.org/job/PreCommit-YARN-Build/13829/testReport/ -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2599) Standby RM should also expose some jmx and metrics
[ https://issues.apache.org/jira/browse/YARN-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908779#comment-16908779 ] Sunil Govindan commented on YARN-2599: -- cc [~cheersyang] New patch is added. > Standby RM should also expose some jmx and metrics > -- > > Key: YARN-2599 > URL: https://issues.apache.org/jira/browse/YARN-2599 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.1, 2.7.3, 3.0.0-alpha1 >Reporter: Karthik Kambatla >Assignee: Rohith Sharma K S >Priority: Major > Attachments: YARN-2599.002.patch, YARN-2599.patch > > > YARN-1898 redirects jmx and metrics to the Active. As discussed there, we > need to separate out metrics displayed so the Standby RM can also be > monitored. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2599) Standby RM should also expose some jmx and metrics
[ https://issues.apache.org/jira/browse/YARN-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Govindan updated YARN-2599: - Attachment: YARN-2599.002.patch > Standby RM should also expose some jmx and metrics > -- > > Key: YARN-2599 > URL: https://issues.apache.org/jira/browse/YARN-2599 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.1, 2.7.3, 3.0.0-alpha1 >Reporter: Karthik Kambatla >Assignee: Rohith Sharma K S >Priority: Major > Attachments: YARN-2599.002.patch, YARN-2599.patch > > > YARN-1898 redirects jmx and metrics to the Active. As discussed there, we > need to separate out metrics displayed so the Standby RM can also be > monitored. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5857) TestLogAggregationService.testFixedSizeThreadPool fails intermittently on trunk
[ https://issues.apache.org/jira/browse/YARN-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908771#comment-16908771 ] Bilwa S T commented on YARN-5857: - Hi [~adam.antal], latch.countDown() should be put before tryLock, as tryLock retries for 35 seconds. > TestLogAggregationService.testFixedSizeThreadPool fails intermittently on > trunk > --- > > Key: YARN-5857 > URL: https://issues.apache.org/jira/browse/YARN-5857 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Bilwa S T >Priority: Minor > Attachments: YARN-5857-001.patch, YARN-5857-002.patch, > testFixedSizeThreadPool failure reproduction > > > {noformat} > testFixedSizeThreadPool(org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService) > Time elapsed: 0.11 sec <<< FAILURE! > java.lang.AssertionError: expected:<3> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testFixedSizeThreadPool(TestLogAggregationService.java:1139) > {noformat} > Refer to https://builds.apache.org/job/PreCommit-YARN-Build/13829/testReport/
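The ordering argued for in the comment above can be sketched with plain JDK primitives (all names are hypothetical, not the actual test code): the worker signals "started" before contending for the lock, so the test's latch.await() returns promptly instead of waiting out a long tryLock timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LatchBeforeTryLock {
    // Worker signals first, then blocks on the lock. If the order were
    // reversed, the latch would only count down after tryLock gave up.
    static boolean worker(ReentrantLock lock, CountDownLatch started)
            throws InterruptedException {
        started.countDown(); // signal before blocking
        boolean acquired = lock.tryLock(100, TimeUnit.MILLISECONDS);
        if (acquired) {
            lock.unlock();
        }
        return acquired;
    }

    public static boolean demo() throws Exception {
        ReentrantLock lock = new ReentrantLock();
        lock.lock(); // the "test" thread holds the lock while workers start
        CountDownLatch started = new CountDownLatch(1);
        Thread t = new Thread(() -> {
            try {
                worker(lock, started);
            } catch (InterruptedException ignored) {
            }
        });
        t.start();
        // Returns quickly because countDown happens before tryLock:
        boolean sawStart = started.await(5, TimeUnit.SECONDS);
        lock.unlock();
        t.join();
        return sawStart;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("worker signalled start: " + demo());
    }
}
```

In the real test the tryLock timeout is reportedly 35 seconds; signalling before locking keeps the latch wait independent of that timeout.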
[jira] [Updated] (YARN-5857) TestLogAggregationService.testFixedSizeThreadPool fails intermittently on trunk
[ https://issues.apache.org/jira/browse/YARN-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bilwa S T updated YARN-5857: Attachment: YARN-5857-002.patch > TestLogAggregationService.testFixedSizeThreadPool fails intermittently on > trunk > --- > > Key: YARN-5857 > URL: https://issues.apache.org/jira/browse/YARN-5857 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Bilwa S T >Priority: Minor > Attachments: YARN-5857-001.patch, YARN-5857-002.patch, > testFixedSizeThreadPool failure reproduction > > > {noformat} > testFixedSizeThreadPool(org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService) > Time elapsed: 0.11 sec <<< FAILURE! > java.lang.AssertionError: expected:<3> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testFixedSizeThreadPool(TestLogAggregationService.java:1139) > {noformat} > Refer to https://builds.apache.org/job/PreCommit-YARN-Build/13829/testReport/ -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-9753) Cache Pre-Priming
[ https://issues.apache.org/jira/browse/YARN-9753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akash R Nilugal resolved YARN-9753. --- Resolution: Invalid Created issue in wrong project. > Cache Pre-Priming > - > > Key: YARN-9753 > URL: https://issues.apache.org/jira/browse/YARN-9753 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Akash R Nilugal >Priority: Major > > Currently, we have an index server which helps with distributed > caching of the datamaps in a separate spark application. > The caching of the datamaps in the index server starts once a query is > fired on the table for the first time: all the datamaps are loaded > if count(*) is fired, and only the required ones are loaded for any filter > query. > The bottleneck is that until a query is fired on the > table, no caching is done for the table's datamaps. > Consider a scenario where we load data into a table for a whole > day and then query it the next day: > all the segments start loading into the cache, so the first query > is slow. > What if we load the datamaps into the cache, i.e. pre-prime the cache, without > waiting for any query on the table? > We could load the cache after every data load, for all the segments at once, > so that the first query need not do all this work, which makes it faster. > I have attached the design document for the pre-priming of the cache in the > index server. Please have a look at it.
[jira] [Created] (YARN-9753) Cache Pre-Priming
Akash R Nilugal created YARN-9753: - Summary: Cache Pre-Priming Key: YARN-9753 URL: https://issues.apache.org/jira/browse/YARN-9753 Project: Hadoop YARN Issue Type: Bug Reporter: Akash R Nilugal Currently, we have an index server which helps with distributed caching of the datamaps in a separate spark application. The caching of the datamaps in the index server starts once a query is fired on the table for the first time: all the datamaps are loaded if count(*) is fired, and only the required ones are loaded for any filter query. The bottleneck is that until a query is fired on the table, no caching is done for the table's datamaps. Consider a scenario where we load data into a table for a whole day and then query it the next day: all the segments start loading into the cache, so the first query is slow. What if we load the datamaps into the cache, i.e. pre-prime the cache, without waiting for any query on the table? We could load the cache after every data load, for all the segments at once, so that the first query need not do all this work, which makes it faster. I have attached the design document for the pre-priming of the cache in the index server. Please have a look at it.
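The proposal above boils down to warming the cache eagerly after each data load instead of lazily on first query. A minimal stdlib-only sketch of the two paths, using an in-memory map as a stand-in for the index server's datamap cache (all names hypothetical):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PrePrimingSketch {
    // Hypothetical stand-in for the index server's datamap cache.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Lazy path: the first query on a segment pays the loading cost.
    public String query(String segment) {
        return cache.computeIfAbsent(segment, PrePrimingSketch::loadDatamap);
    }

    // Pre-priming path: warm all segments right after a data load finishes,
    // so the first query finds everything already cached.
    public void prePrime(List<String> segments) {
        for (String s : segments) {
            cache.computeIfAbsent(s, PrePrimingSketch::loadDatamap);
        }
    }

    public boolean isCached(String segment) {
        return cache.containsKey(segment);
    }

    private static String loadDatamap(String segment) {
        return "datamap:" + segment; // placeholder for the expensive load
    }

    public static void main(String[] args) {
        PrePrimingSketch s = new PrePrimingSketch();
        s.prePrime(Arrays.asList("seg0", "seg1"));
        System.out.println("seg0 cached before any query: " + s.isCached("seg0"));
    }
}
```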
[jira] [Created] (YARN-9752) Add support for allocation id in SLS.
Abhishek Modi created YARN-9752: --- Summary: Add support for allocation id in SLS. Key: YARN-9752 URL: https://issues.apache.org/jira/browse/YARN-9752 Project: Hadoop YARN Issue Type: Sub-task Reporter: Abhishek Modi Assignee: Abhishek Modi
[jira] [Commented] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources
[ https://issues.apache.org/jira/browse/YARN-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908644#comment-16908644 ] wangchengwei commented on YARN-9616: [~wzzdreamer] it seems I can't upload my patch... > Shared Cache Manager Failed To Upload Unpacked Resources > > > Key: YARN-9616 > URL: https://issues.apache.org/jira/browse/YARN-9616 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.3, 2.9.2, 2.8.5 >Reporter: zhenzhao wang >Assignee: zhenzhao wang >Priority: Major > Attachments: YARN-9616.001-2.9.patch > > > Yarn will unpack archives files and some other files based on the file type > and configuration. E.g. > If I started an MR job with -archive one.zip, then the one.zip will be > unpacked while download. Let's say there're file1 && file2 inside one.zip. > Then the files kept on local disk will be like > /disk3/yarn/local/filecache/352/one.zip/file1 > and/disk3/yarn/local/filecache/352/one.zip/file2. So the shared cache > uploader couldn't upload one.zip to shared cache as it was removed during > localization. The following errors will be thrown. 
> {code:java} > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader: > Exception while uploading the file dict.zip > java.io.FileNotFoundException: File > /disk3/yarn/local/filecache/352/one.zip/one.zip does not exist > at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:631) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:857) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:621) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442) > at > org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:146) > at > org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:926) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.computeChecksum(SharedCacheUploader.java:257) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:128) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:55) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
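The root cause described above is that the localized archive is replaced by its unpacked directory, so the uploader's attempt to open the original file for checksumming fails. A stdlib-only sketch of a guard the uploader could apply (method name hypothetical, not the actual YARN patch):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class UploadEligibilityCheck {
    // If the localized path is a directory, the archive was unpacked during
    // localization and the original file no longer exists, so skip the
    // shared-cache upload rather than fail with FileNotFoundException.
    public static boolean canUpload(Path localized) {
        return Files.isRegularFile(localized);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("one.zip");   // unpacked archive
        Path file = Files.createTempFile("resource", ".jar"); // regular resource
        System.out.println("unpacked dir uploadable: " + canUpload(dir));
        System.out.println("regular file uploadable: " + canUpload(file));
    }
}
```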
[jira] [Created] (YARN-9751) Separate queue and app ordering policy capacity scheduler configs
Jonathan Hung created YARN-9751: --- Summary: Separate queue and app ordering policy capacity scheduler configs Key: YARN-9751 URL: https://issues.apache.org/jira/browse/YARN-9751 Project: Hadoop YARN Issue Type: Task Reporter: Jonathan Hung Right now it's not possible to specify distinct app and queue ordering policies since they share the same {{ordering-policy}} suffix. There's already a TODO in CapacitySchedulerConfiguration for this. This Jira intends to fix it. {noformat} // TODO (wangda): We need to better distinguish app ordering policy and queue // ordering policy's classname / configuration options, etc. And dedup code // if possible.{noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
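To illustrate the clash: in the current CapacityScheduler configuration the same `.ordering-policy` suffix means app ordering on a leaf queue but queue ordering on a parent queue. The first two property names below follow the existing `yarn.scheduler.capacity.<queue-path>` convention; the disambiguated names are purely hypothetical, not what this Jira will necessarily choose.

```properties
# Today: one suffix, two meanings depending on queue type
yarn.scheduler.capacity.root.ordering-policy=utilization       # queue ordering (parent queue)
yarn.scheduler.capacity.root.default.ordering-policy=fifo      # app ordering (leaf queue)

# One possible disambiguation (hypothetical property names):
yarn.scheduler.capacity.root.queue-ordering-policy=utilization
yarn.scheduler.capacity.root.default.app-ordering-policy=fifo
```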
[jira] [Commented] (YARN-9749) TestAppLogAggregatorImpl#testDFSQuotaExceeded fails on trunk
[ https://issues.apache.org/jira/browse/YARN-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908384#comment-16908384 ] Peter Bacsko commented on YARN-9749: +1 non-binding > TestAppLogAggregatorImpl#testDFSQuotaExceeded fails on trunk > > > Key: YARN-9749 > URL: https://issues.apache.org/jira/browse/YARN-9749 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, test >Reporter: Peter Bacsko >Assignee: Adam Antal >Priority: Major > Attachments: YARN-9749.001.patch > > > TestAppLogAggregatorImpl#testDFSQuotaExceeded currently fails on trunk. It > was most likely introduced by YARN-9676 (resetting HEAD to the previous > commit and then re-running the test passes). > {noformat} > [INFO] Running > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl > [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.781 > s <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl > [ERROR] > testDFSQuotaExceeded(org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl) > Time elapsed: 2.361 s <<< FAILURE! 
> java.lang.AssertionError: The set of paths for deletion are not the same as > expected: actual size: 0 vs expected size: 1 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl.verifyFilesToDelete(TestAppLogAggregatorImpl.java:344) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl.access$000(TestAppLogAggregatorImpl.java:82) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl$1.answer(TestAppLogAggregatorImpl.java:330) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl$1.answer(TestAppLogAggregatorImpl.java:319) > at > org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:39) > at > org.mockito.internal.handler.MockHandlerImpl.handle(MockHandlerImpl.java:96) > at > org.mockito.internal.handler.NullResultGuardian.handle(NullResultGuardian.java:29) > at > org.mockito.internal.handler.InvocationNotifierHandler.handle(InvocationNotifierHandler.java:35) > at > org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:61) > at > org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:49) > at > org.mockito.internal.creation.bytebuddy.MockMethodInterceptor$DispatcherDefaultingToRealMethod.interceptSuperCallable(MockMethodInterceptor.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.DeletionService$MockitoMock$1879282050.delete(Unknown > Source) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregationPostCleanUp(AppLogAggregatorImpl.java:556) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:476) > at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl.testDFSQuotaExceeded(TestAppLogAggregatorImpl.java:469) > ... > {noformat}
[jira] [Commented] (YARN-9100) Add tests for GpuResourceAllocator and do minor code cleanup
[ https://issues.apache.org/jira/browse/YARN-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908372#comment-16908372 ] Hadoop QA commented on YARN-9100: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 27s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 29s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 10 unchanged - 6 fixed = 10 total (was 16) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 25s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 50s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 34s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 75m 51s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9100 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12977719/YARN-9100-009.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux c8c7f02d8e60 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / c801f7a | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_212 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/24575/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24575/testReport/ | | Max. process+thread count | 465 (vs. ulim
[jira] [Commented] (YARN-9461) TestRMWebServicesDelegationTokenAuthentication.testCancelledDelegationToken fails with HTTP 400
[ https://issues.apache.org/jira/browse/YARN-9461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908337#comment-16908337 ] Hadoop QA commented on YARN-9461: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 37s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 6s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 26s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 86m 58s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}136m 41s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9461 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12965202/YARN-9461-001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 08a5de39895a 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 3f4f097 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24573/testReport/ | | Max. process+thread count | 873 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24573/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > TestRMWebServicesDelegationTokenAuthenti
[jira] [Commented] (YARN-9749) TestAppLogAggregatorImpl#testDFSQuotaExceeded fails on trunk
[ https://issues.apache.org/jira/browse/YARN-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908310#comment-16908310 ] Hadoop QA commented on YARN-9749: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 45s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 12s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 56s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 70m 46s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9749 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12977717/YARN-9749.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux fde964138990 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / c801f7a | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_212 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24574/testReport/ | | Max. process+thread count | 412 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24574/console | | Powered by | Apache Yetus 0.8.0
[jira] [Updated] (YARN-9100) Add tests for GpuResourceAllocator and do minor code cleanup
[ https://issues.apache.org/jira/browse/YARN-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9100: --- Attachment: YARN-9100-009.patch > Add tests for GpuResourceAllocator and do minor code cleanup > > > Key: YARN-9100 > URL: https://issues.apache.org/jira/browse/YARN-9100 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9100-004.patch, YARN-9100-005.patch, > YARN-9100-006.patch, YARN-9100-007.patch, YARN-9100-008.patch, > YARN-9100-009.patch, YARN-9100.001.patch, YARN-9100.002.patch, > YARN-9100.003.patch > > > Add tests for GpuResourceAllocator and do minor code cleanup > - Improved log and exception messages > - Added some new debug logs > - Some methods are named like *Copy; these return copies of internal > data structures. The word "copy" is just noise in their names, so they have > been renamed. Additionally, the copied data structures were made immutable. > - The waiting loop in the method assignGpus was decoupled into a new class, > RetryCommand. > A few more words about the new class RetryCommand: > There are similar waiting loops in the code in AMRMClient, > AMRMClientAsync and even in GenericTestUtils (see the waitFor method). > RetryCommand could be a future replacement for this duplicated code, as it > solves the waiting-loop problem in a generic way. > The only downside of using RetryCommand in GpuResourceAllocator > (startGpuAssignmentLoop) is the ugly exception handling part, but that's > solely because of how Java deals with checked exceptions vs. lambdas. If there's > a cleaner way to handle the exceptions, I'm open to any suggestions. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
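The waiting-loop extraction described in the YARN-9100 description can be illustrated with a small generic retry helper. This is only a hedged sketch, not the actual RetryCommand from the patch; the class and method names (RetrySketch, retryUntil) and the parameter set are my own assumptions.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

public class RetrySketch {
    // Hypothetical helper: re-run "action" until "done" accepts its result
    // or the attempt budget is exhausted (the shape of a generic waiting loop).
    public static <T> T retryUntil(Callable<T> action, Predicate<T> done,
                                   int maxAttempts, long sleepMillis)
            throws Exception {
        T result = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            result = action.call();      // the guarded operation
            if (done.test(result)) {
                return result;           // condition met, stop waiting
            }
            Thread.sleep(sleepMillis);   // back off before retrying
        }
        return result;                   // caller handles the timeout case
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Toy usage: the "assignment" succeeds on the third attempt.
        Integer attempts = retryUntil(() -> ++calls[0], n -> n >= 3, 10, 1L);
        System.out.println("attempts=" + attempts);  // prints attempts=3
    }
}
```

A generic helper like this is what lets similar loops in AMRMClient, AMRMClientAsync, and GenericTestUtils#waitFor share one implementation; the checked-exception awkwardness the comment mentions comes from Callable#call declaring `throws Exception`, which lambdas cannot hide.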
[jira] [Commented] (YARN-9679) Regular code cleanup in TestResourcePluginManager
[ https://issues.apache.org/jira/browse/YARN-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908261#comment-16908261 ] Szilard Nemeth commented on YARN-9679: -- Thanks [~adam.antal], resolved it! > Regular code cleanup in TestResourcePluginManager > - > > Key: YARN-9679 > URL: https://issues.apache.org/jira/browse/YARN-9679 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Adam Antal >Priority: Major > Labels: newbie > Fix For: 3.3.0 > > > There are several things that could be cleaned up in this class: > 1. stubResourcePluginmanager should be private. > 2. In tearDown, the result of dest.delete() should be checked. > 3. In the class CustomizedResourceHandler, several methods have unnecessary > exception declarations. > 4. The class MyMockNM should be renamed to something more meaningful. > 5. There are some dangling javadoc comments, for example: > {code:java} > /* >* Make sure ResourcePluginManager is initialized during NM start up. >*/ > {code} > 6. Some exceptions are declared on test methods but never > thrown; an example: > testLinuxContainerExecutorWithResourcePluginsEnabled > 7. Assert.assertTrue(false); expressions should be replaced with Assert.fail() > 8. A handful of usages of Mockito's spy method. This method is not preferred, > so we should think about replacing it with mocks somehow. > The rest can be figured out by whoever takes this jira :)
[jira] [Commented] (YARN-9679) Regular code cleanup in TestResourcePluginManager
[ https://issues.apache.org/jira/browse/YARN-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908259#comment-16908259 ] Adam Antal commented on YARN-9679: -- Thank you for the commit [~snemeth]! Half of the class looks different / missing on branch-3.2. I see no gain in backporting it. The issue can be resolved.
[jira] [Updated] (YARN-9749) TestAppLogAggregatorImpl#testDFSQuotaExceeded fails on trunk
[ https://issues.apache.org/jira/browse/YARN-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Antal updated YARN-9749: - Attachment: YARN-9749.001.patch > TestAppLogAggregatorImpl#testDFSQuotaExceeded fails on trunk > > > Key: YARN-9749 > URL: https://issues.apache.org/jira/browse/YARN-9749 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, test >Reporter: Peter Bacsko >Assignee: Adam Antal >Priority: Major > Attachments: YARN-9749.001.patch > > > TestAppLogAggregatorImpl#testDFSQuotaExceeded currently fails on trunk. It > was most likely introduced by YARN-9676 (resetting HEAD to the previous > commit and then re-running the test passes). > {noformat} > [INFO] Running > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl > [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.781 > s <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl > [ERROR] > testDFSQuotaExceeded(org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl) > Time elapsed: 2.361 s <<< FAILURE! 
> java.lang.AssertionError: The set of paths for deletion are not the same as > expected: actual size: 0 vs expected size: 1 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl.verifyFilesToDelete(TestAppLogAggregatorImpl.java:344) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl.access$000(TestAppLogAggregatorImpl.java:82) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl$1.answer(TestAppLogAggregatorImpl.java:330) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl$1.answer(TestAppLogAggregatorImpl.java:319) > at > org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:39) > at > org.mockito.internal.handler.MockHandlerImpl.handle(MockHandlerImpl.java:96) > at > org.mockito.internal.handler.NullResultGuardian.handle(NullResultGuardian.java:29) > at > org.mockito.internal.handler.InvocationNotifierHandler.handle(InvocationNotifierHandler.java:35) > at > org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:61) > at > org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:49) > at > org.mockito.internal.creation.bytebuddy.MockMethodInterceptor$DispatcherDefaultingToRealMethod.interceptSuperCallable(MockMethodInterceptor.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.DeletionService$MockitoMock$1879282050.delete(Unknown > Source) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregationPostCleanUp(AppLogAggregatorImpl.java:556) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:476) > at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl.testDFSQuotaExceeded(TestAppLogAggregatorImpl.java:469) > ... > {noformat}
[jira] [Commented] (YARN-9749) TestAppLogAggregatorImpl#testDFSQuotaExceeded fails on trunk
[ https://issues.apache.org/jira/browse/YARN-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908227#comment-16908227 ] Adam Antal commented on YARN-9749: -- Uploaded patch v1 with the straightforward fix. I revisited the patch looking for other nasty NPEs of this kind, but the other parts are fine.
[jira] [Commented] (YARN-9749) TestAppLogAggregatorImpl#testDFSQuotaExceeded fails on trunk
[ https://issues.apache.org/jira/browse/YARN-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908222#comment-16908222 ] Adam Antal commented on YARN-9749: -- Sooo my patch actually introduced a bug in the code. When diagnosticMessage.isEmpty() is called (introduced by YARN-9676), it turns out that the message is null. Previously it had been set to "" (empty String) by default, so I didn't realise it could be an issue. The following line causes the NPE: {noformat} } catch (Exception e) { diagnosticMessage = e.getMessage(); ... } {noformat} We cannot assume that an exception always has a proper message text. I'll upload a fix soon.
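The failure mode described in the comment above can be reproduced in a few lines: Throwable.getMessage() returns null for an exception constructed without a message, so a later isEmpty() call throws a NullPointerException. This is a minimal hedged illustration — the variable name diagnosticMessage mirrors the one in the comment, and the null guard shown is one possible fix, not necessarily the exact patch.

```java
public class NullMessageDemo {
    public static void main(String[] args) {
        String diagnosticMessage = "";
        try {
            throw new RuntimeException();   // no message supplied
        } catch (Exception e) {
            // getMessage() is null here; null-check instead of assigning directly,
            // otherwise a later diagnosticMessage.isEmpty() call would NPE.
            diagnosticMessage = e.getMessage() == null ? "" : e.getMessage();
        }
        System.out.println("empty=" + diagnosticMessage.isEmpty());  // prints empty=true
    }
}
```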
[jira] [Commented] (YARN-9488) Skip YARNFeatureNotEnabledException from ClientRMService
[ https://issues.apache.org/jira/browse/YARN-9488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908208#comment-16908208 ] Hudson commented on YARN-9488: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17130 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17130/]) YARN-9488. Skip YARNFeatureNotEnabledException from ClientRMService. (snemeth: rev 1845a83cec6563482523d8c34b38c4e36c0aa9df) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java > Skip YARNFeatureNotEnabledException from ClientRMService > > > Key: YARN-9488 > URL: https://issues.apache.org/jira/browse/YARN-9488 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Fix For: 3.3.0, 3.2.1, 3.1.3 > > Attachments: YARN-9488-001.patch, YARN-9488-002.patch, > YARN-9488-002.patch > > > RM logs are accumulated with YARNFeatureNotEnabledException when running > Distributed Shell jobs while {{ClientRMService#getResourceProfiles}} > {code} > 2019-04-16 07:10:47,699 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 0 on 8050, call Call#5 Retry#0 > org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getResourceProfiles > from 172.26.81.91:41198 > org.apache.hadoop.yarn.exceptions.YARNFeatureNotEnabledException: Resource > profile is not enabled, please enable resource profile feature before using > its functions. 
(by setting yarn.resourcemanager.resource-profiles.enabled to > true) > at > org.apache.hadoop.yarn.server.resourcemanager.resource.ResourceProfilesManagerImpl.checkAndThrowExceptionWhenFeatureDisabled(ResourceProfilesManagerImpl.java:191) > at > org.apache.hadoop.yarn.server.resourcemanager.resource.ResourceProfilesManagerImpl.getResourceProfiles(ResourceProfilesManagerImpl.java:214) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getResourceProfiles(ClientRMService.java:1833) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getResourceProfiles(ApplicationClientProtocolPBServiceImpl.java:670) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:665) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > {code}
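Skipping an expected exception type from server logs, as YARN-9488 does, generally amounts to checking the exception class before emitting a full stack trace. The sketch below is an assumption-laden stand-in: the class TerseExceptionDemo, the method logFor, and the use of IllegalStateException in place of YARNFeatureNotEnabledException are all hypothetical illustrations, not the actual YARN-9488 change.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Set;

public class TerseExceptionDemo {
    // Exception types that are expected in normal operation and should not
    // spam the log. IllegalStateException stands in for the real
    // YARNFeatureNotEnabledException here.
    private static final Set<Class<? extends Exception>> TERSE =
        Set.of(IllegalStateException.class);

    // Expected exceptions get a one-line note; everything else keeps its stack trace.
    static String logFor(Exception e) {
        if (TERSE.contains(e.getClass())) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
        StringWriter sw = new StringWriter();
        e.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }

    public static void main(String[] args) {
        System.out.println(logFor(new IllegalStateException("resource profile disabled")));
        // prints: IllegalStateException: resource profile disabled
    }
}
```

The effect is the same as in the RM case: clients that probe a disabled feature no longer fill the log with multi-line traces, while genuinely unexpected failures remain fully visible.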
[jira] [Commented] (YARN-9679) Regular code cleanup in TestResourcePluginManager
[ https://issues.apache.org/jira/browse/YARN-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908209#comment-16908209 ] Hudson commented on YARN-9679: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17130 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17130/]) YARN-9679. Regular code cleanup in TestResourcePluginManager (#1122) (954799+szilard-nemeth: rev 22c4f38c4b005a70c9b95d8aaa350763aaec5c5e) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/TestResourcePluginManager.java
[jira] [Commented] (YARN-9679) Regular code cleanup in TestResourcePluginManager
[ https://issues.apache.org/jira/browse/YARN-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908198#comment-16908198 ] Szilard Nemeth commented on YARN-9679: -- Hi [~adam.antal]! As mentioned on GitHub, I merged your patch to trunk. Could you please provide branch-3.2 / branch-3.1 patches as well? I tried to cherry-pick the trunk patch to 3.2 but it had many conflicts. Thanks!
[jira] [Updated] (YARN-9679) Regular code cleanup in TestResourcePluginManager
[ https://issues.apache.org/jira/browse/YARN-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-9679: - Fix Version/s: 3.3.0 > Regular code cleanup in TestResourcePluginManager > - > > Key: YARN-9679 > URL: https://issues.apache.org/jira/browse/YARN-9679 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Adam Antal >Priority: Major > Labels: newbie > Fix For: 3.3.0 > > > There are several things that could be cleaned up in this class: > 1. stubResourcePluginmanager should be private. > 2. In tearDown, the result of dest.delete() should be checked. > 3. In class CustomizedResourceHandler, there are several methods where > exception declarations are unnecessary. > 4. Class MyMockNM should be renamed to a more meaningful name. > 5. There are some dangling javadoc comments, for example: > {code:java} > /* >* Make sure ResourcePluginManager is initialized during NM start up. >*/ > {code} > 6. There are some exceptions unnecessarily declared on test methods but they > are never thrown, for example: > testLinuxContainerExecutorWithResourcePluginsEnabled > 7. Assert.assertTrue(false); expressions should be replaced with Assert.fail(). > 8. A handful of usages of Mockito's spy method. This method is not preferred, > so we should think about replacing it with mocks somehow. > The rest can be figured out by whoever takes this jira :)
[jira] [Created] (YARN-9750) TestAppLogAggregatorImpl.verifyFilesToDelete fails
Prabhu Joseph created YARN-9750: --- Summary: TestAppLogAggregatorImpl.verifyFilesToDelete fails Key: YARN-9750 URL: https://issues.apache.org/jira/browse/YARN-9750 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation, test Affects Versions: 3.3.0 Reporter: Prabhu Joseph Assignee: Prabhu Joseph *TestAppLogAggregatorImpl.verifyFilesToDelete fails* {code} [ERROR] testDFSQuotaExceeded(org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl) Time elapsed: 2.252 s <<< FAILURE! java.lang.AssertionError: The set of paths for deletion are not the same as expected: actual size: 0 vs expected size: 1 at org.junit.Assert.fail(Assert.java:88) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl.verifyFilesToDelete(TestAppLogAggregatorImpl.java:344) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl.access$000(TestAppLogAggregatorImpl.java:82) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl$1.answer(TestAppLogAggregatorImpl.java:330) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl$1.answer(TestAppLogAggregatorImpl.java:319) at org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:39) at org.mockito.internal.handler.MockHandlerImpl.handle(MockHandlerImpl.java:96) at org.mockito.internal.handler.NullResultGuardian.handle(NullResultGuardian.java:29) at org.mockito.internal.handler.InvocationNotifierHandler.handle(InvocationNotifierHandler.java:35) at org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:61) at org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:49) at 
org.mockito.internal.creation.bytebuddy.MockMethodInterceptor$DispatcherDefaultingToRealMethod.interceptSuperCallable(MockMethodInterceptor.java:108) at org.apache.hadoop.yarn.server.nodemanager.DeletionService$MockitoMock$1136724178.delete(Unknown Source) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregationPostCleanUp(AppLogAggregatorImpl.java:556) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:476) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl.testDFSQuotaExceeded(TestAppLogAggregatorImpl.java:469) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) 
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.boo
[jira] [Assigned] (YARN-9748) Allow capacity-scheduler configuration on HDFS
[ https://issues.apache.org/jira/browse/YARN-9748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph reassigned YARN-9748: --- Assignee: Prabhu Joseph > Allow capacity-scheduler configuration on HDFS > -- > > Key: YARN-9748 > URL: https://issues.apache.org/jira/browse/YARN-9748 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler, capacityscheduler >Affects Versions: 3.1.2 >Reporter: zhoukang >Assignee: Prabhu Joseph >Priority: Major > > Improvement: > Support auto reload from hdfs -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9748) Allow capacity-scheduler configuration on HDFS
[ https://issues.apache.org/jira/browse/YARN-9748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908188#comment-16908188 ] Prabhu Joseph commented on YARN-9748: - Thanks [~cane]. Please feel free to reassign if you want to share the same code here. > Allow capacity-scheduler configuration on HDFS > -- > > Key: YARN-9748 > URL: https://issues.apache.org/jira/browse/YARN-9748 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler, capacityscheduler >Affects Versions: 3.1.2 >Reporter: zhoukang >Priority: Major > > Improvement: > Support auto reload from hdfs -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
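The "auto reload from hdfs" idea can be sketched as a modification-time check. The sketch below is entirely hypothetical (this thread contains no implementation details): it uses java.nio to stay self-contained, whereas a real YARN patch would read capacity-scheduler.xml from an HDFS path through Hadoop's FileSystem API.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical sketch: reload a scheduler configuration file only when its
// modification time advances. Class and method names are illustrative only.
public class ConfigReloader {
    private FileTime lastSeen = FileTime.fromMillis(0);

    /** Returns true if the file changed since the last check and was "reloaded". */
    public boolean maybeReload(Path confFile) throws Exception {
        FileTime mtime = Files.getLastModifiedTime(confFile);
        if (mtime.compareTo(lastSeen) > 0) {
            lastSeen = mtime;
            // parse and apply the new capacity-scheduler settings here
            return true;
        }
        return false;
    }
}
```

In practice such a check would run on a timer thread inside the ResourceManager, and the interesting design questions (atomic config swaps, validation before applying) are exactly what the jira would need to settle.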
[jira] [Commented] (YARN-9749) TestAppLogAggregatorImpl#testDFSQuotaExceeded fails on trunk
[ https://issues.apache.org/jira/browse/YARN-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908186#comment-16908186 ] Adam Antal commented on YARN-9749: -- Thanks for catching this [~pbacsko]! Will look into this shortly. > TestAppLogAggregatorImpl#testDFSQuotaExceeded fails on trunk > > > Key: YARN-9749 > URL: https://issues.apache.org/jira/browse/YARN-9749 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, test >Reporter: Peter Bacsko >Assignee: Adam Antal >Priority: Major > > TestAppLogAggregatorImpl#testDFSQuotaExceeded currently fails on trunk. It > was most likely introduced by YARN-9676 (resetting HEAD to the previous > commit and then re-running the test passes). > {noformat} > [INFO] Running > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl > [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.781 > s <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl > [ERROR] > testDFSQuotaExceeded(org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl) > Time elapsed: 2.361 s <<< FAILURE! 
> java.lang.AssertionError: The set of paths for deletion are not the same as > expected: actual size: 0 vs expected size: 1 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl.verifyFilesToDelete(TestAppLogAggregatorImpl.java:344) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl.access$000(TestAppLogAggregatorImpl.java:82) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl$1.answer(TestAppLogAggregatorImpl.java:330) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl$1.answer(TestAppLogAggregatorImpl.java:319) > at > org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:39) > at > org.mockito.internal.handler.MockHandlerImpl.handle(MockHandlerImpl.java:96) > at > org.mockito.internal.handler.NullResultGuardian.handle(NullResultGuardian.java:29) > at > org.mockito.internal.handler.InvocationNotifierHandler.handle(InvocationNotifierHandler.java:35) > at > org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:61) > at > org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:49) > at > org.mockito.internal.creation.bytebuddy.MockMethodInterceptor$DispatcherDefaultingToRealMethod.interceptSuperCallable(MockMethodInterceptor.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.DeletionService$MockitoMock$1879282050.delete(Unknown > Source) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregationPostCleanUp(AppLogAggregatorImpl.java:556) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:476) > at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl.testDFSQuotaExceeded(TestAppLogAggregatorImpl.java:469) > ... > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9488) Skip YARNFeatureNotEnabledException from ClientRMService
[ https://issues.apache.org/jira/browse/YARN-9488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908185#comment-16908185 ] Prabhu Joseph commented on YARN-9488: - Thanks [~snemeth]. > Skip YARNFeatureNotEnabledException from ClientRMService > > > Key: YARN-9488 > URL: https://issues.apache.org/jira/browse/YARN-9488 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Fix For: 3.3.0, 3.2.1, 3.1.3 > > Attachments: YARN-9488-001.patch, YARN-9488-002.patch, > YARN-9488-002.patch > > > RM logs are accumulated with YARNFeatureNotEnabledException when running > Distributed Shell jobs while {{ClientRMService#getResourceProfiles}} > {code} > 2019-04-16 07:10:47,699 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 0 on 8050, call Call#5 Retry#0 > org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getResourceProfiles > from 172.26.81.91:41198 > org.apache.hadoop.yarn.exceptions.YARNFeatureNotEnabledException: Resource > profile is not enabled, please enable resource profile feature before using > its functions. 
(by setting yarn.resourcemanager.resource-profiles.enabled to > true) > at > org.apache.hadoop.yarn.server.resourcemanager.resource.ResourceProfilesManagerImpl.checkAndThrowExceptionWhenFeatureDisabled(ResourceProfilesManagerImpl.java:191) > at > org.apache.hadoop.yarn.server.resourcemanager.resource.ResourceProfilesManagerImpl.getResourceProfiles(ResourceProfilesManagerImpl.java:214) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getResourceProfiles(ClientRMService.java:1833) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getResourceProfiles(ApplicationClientProtocolPBServiceImpl.java:670) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:665) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
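Hadoop's RPC layer supports marking expected exception types as "terse" (org.apache.hadoop.ipc.Server#addTerseExceptions), which logs only the exception message instead of a full stack trace; whether this particular patch takes that route is not shown in the thread. The general idea can be sketched in self-contained Java, with a locally defined stand-in for YARN's exception class:

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Set;

// Illustrative sketch (not YARN's actual API): render "expected" exceptions as a
// single line and everything else with a full stack trace.
public class TerseLogging {
    // Local stand-in so the sketch compiles on its own; the real class lives in
    // org.apache.hadoop.yarn.exceptions.
    public static class YARNFeatureNotEnabledException extends Exception {
        public YARNFeatureNotEnabledException(String msg) { super(msg); }
    }

    static final Set<String> TERSE = Set.of("YARNFeatureNotEnabledException");

    public static String format(Throwable t) {
        String name = t.getClass().getSimpleName();
        if (TERSE.contains(name)) {
            return name + ": " + t.getMessage();   // one line, no stack trace
        }
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw));    // full trace for unexpected errors
        return sw.toString();
    }
}
```

Applied to the log above, each Distributed Shell submission would then produce a single INFO/DEBUG line rather than a 13-frame stack trace.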
[jira] [Commented] (YARN-9743) [JDK11] TestTimelineWebServices.testContextFactory fails
[ https://issues.apache.org/jira/browse/YARN-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908183#comment-16908183 ] Prabhu Joseph commented on YARN-9743: - [~adam.antal] It looks like JDK 11 removed the JAXB reference implementation classes, including "com.sun.xml.internal.bind.v2.ContextFactory", from the JDK. I think we need to add the jaxb-impl library to the classpath to fix this. {code:java} https://issues.openbravo.com/view.php?id=38900 http://openjdk.java.net/jeps/320 {code} > [JDK11] TestTimelineWebServices.testContextFactory fails > > > Key: YARN-9743 > URL: https://issues.apache.org/jira/browse/YARN-9743 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineservice >Affects Versions: 3.2.0 >Reporter: Adam Antal >Priority: Major > > Tested on OpenJDK 11.0.2 on a Mac. > Stack trace: > {noformat} > [ERROR] Tests run: 29, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: > 36.016 s <<< FAILURE! - in > org.apache.hadoop.yarn.server.timeline.webapp.TestTimelineWebServices > [ERROR] > testContextFactory(org.apache.hadoop.yarn.server.timeline.webapp.TestTimelineWebServices) > Time elapsed: 1.031 s <<< ERROR! 
> java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:583) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521) > at java.base/java.lang.Class.forName0(Native Method) > at java.base/java.lang.Class.forName(Class.java:315) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.ContextFactory.newContext(ContextFactory.java:85) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.ContextFactory.createContext(ContextFactory.java:112) > at > org.apache.hadoop.yarn.server.timeline.webapp.TestTimelineWebServices.testContextFactory(TestTimelineWebServices.java:1039) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at 
org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
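The failure mechanism above can be reproduced independently of the test: JEP 320 removed the JDK-bundled JAXB reference implementation in Java 11, so a reflective lookup of com.sun.xml.internal.bind.v2.ContextFactory throws ClassNotFoundException unless a standalone jaxb-impl jar is on the classpath. A minimal, self-contained probe (class name is ours, not YARN's):

```java
public class JaxbProbe {
    /** Returns true if the named class is loadable from the current classpath. */
    public static boolean isPresent(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // On JDK 8 the JDK-internal JAXB factory is present; on JDK 11+ this
        // typically prints false unless a jaxb-impl jar was added (JEP 320).
        System.out.println(isPresent("com.sun.xml.internal.bind.v2.ContextFactory"));
    }
}
```

This matches the stack trace: YARN's ContextFactory.newContext does a Class.forName on exactly that internal class name.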
[jira] [Commented] (YARN-9488) Skip YARNFeatureNotEnabledException from ClientRMService
[ https://issues.apache.org/jira/browse/YARN-9488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908181#comment-16908181 ] Szilard Nemeth commented on YARN-9488: -- Thanks [~Prabhu Joseph] for this patch! Committed it to trunk, branch-3.2 and branch-3.1. > Skip YARNFeatureNotEnabledException from ClientRMService > > > Key: YARN-9488 > URL: https://issues.apache.org/jira/browse/YARN-9488 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Attachments: YARN-9488-001.patch, YARN-9488-002.patch, > YARN-9488-002.patch > > > RM logs are accumulated with YARNFeatureNotEnabledException when running > Distributed Shell jobs while {{ClientRMService#getResourceProfiles}} > {code} > 2019-04-16 07:10:47,699 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 0 on 8050, call Call#5 Retry#0 > org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getResourceProfiles > from 172.26.81.91:41198 > org.apache.hadoop.yarn.exceptions.YARNFeatureNotEnabledException: Resource > profile is not enabled, please enable resource profile feature before using > its functions. 
(by setting yarn.resourcemanager.resource-profiles.enabled to > true) > at > org.apache.hadoop.yarn.server.resourcemanager.resource.ResourceProfilesManagerImpl.checkAndThrowExceptionWhenFeatureDisabled(ResourceProfilesManagerImpl.java:191) > at > org.apache.hadoop.yarn.server.resourcemanager.resource.ResourceProfilesManagerImpl.getResourceProfiles(ResourceProfilesManagerImpl.java:214) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getResourceProfiles(ClientRMService.java:1833) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getResourceProfiles(ApplicationClientProtocolPBServiceImpl.java:670) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:665) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9488) Skip YARNFeatureNotEnabledException from ClientRMService
[ https://issues.apache.org/jira/browse/YARN-9488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908167#comment-16908167 ] Szilard Nemeth commented on YARN-9488: -- Hi [~Prabhu Joseph]! +1, committing shortly > Skip YARNFeatureNotEnabledException from ClientRMService > > > Key: YARN-9488 > URL: https://issues.apache.org/jira/browse/YARN-9488 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Attachments: YARN-9488-001.patch, YARN-9488-002.patch, > YARN-9488-002.patch > > > RM logs are accumulated with YARNFeatureNotEnabledException when running > Distributed Shell jobs while {{ClientRMService#getResourceProfiles}} > {code} > 2019-04-16 07:10:47,699 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 0 on 8050, call Call#5 Retry#0 > org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getResourceProfiles > from 172.26.81.91:41198 > org.apache.hadoop.yarn.exceptions.YARNFeatureNotEnabledException: Resource > profile is not enabled, please enable resource profile feature before using > its functions. 
(by setting yarn.resourcemanager.resource-profiles.enabled to > true) > at > org.apache.hadoop.yarn.server.resourcemanager.resource.ResourceProfilesManagerImpl.checkAndThrowExceptionWhenFeatureDisabled(ResourceProfilesManagerImpl.java:191) > at > org.apache.hadoop.yarn.server.resourcemanager.resource.ResourceProfilesManagerImpl.getResourceProfiles(ResourceProfilesManagerImpl.java:214) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getResourceProfiles(ClientRMService.java:1833) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getResourceProfiles(ApplicationClientProtocolPBServiceImpl.java:670) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:665) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8148) Update decimal values for queue capacities shown on queue status cli
[ https://issues.apache.org/jira/browse/YARN-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908163#comment-16908163 ] Szilard Nemeth commented on YARN-8148: -- Hi [~sunilg], [~wangda]! I wanted to commit this, but then a question came to mind: should we worry about backward compatibility? The output of the Admin CLI will change; can we still commit this to trunk / 3.2 and 3.1? Thanks! > Update decimal values for queue capacities shown on queue status cli > > > Key: YARN-8148 > URL: https://issues.apache.org/jira/browse/YARN-8148 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 3.0.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-8148-002.patch, YARN-8148.1.patch > > > Capacities are shown with two decimal values in RM UI as part of YARN-6182. > The queue status CLI is still showing one decimal value. > {code} > [root@bigdata3 yarn]# yarn queue -status default > Queue Information : > Queue Name : default > State : RUNNING > Capacity : 69.9% > Current Capacity : .0% > Maximum Capacity : 70.0% > Default Node Label expression : > Accessible Node Labels : * > Preemption : enabled > {code}
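The change under discussion is purely a formatting one: the RM UI prints capacities with two decimal places while the CLI still prints one. A minimal sketch of the difference (format strings are our assumption; the patch's actual code is not shown in this thread). Locale.ROOT is used so the decimal separator is stable:

```java
import java.util.Locale;

public class CapacityFormat {
    // Current CLI style: one decimal place.
    public static String onePlace(float capacity) {
        return String.format(Locale.ROOT, "%.1f%%", capacity);
    }

    // RM UI style since YARN-6182: two decimal places.
    public static String twoPlaces(float capacity) {
        return String.format(Locale.ROOT, "%.2f%%", capacity);
    }

    public static void main(String[] args) {
        float capacity = 69.9f;
        System.out.println(onePlace(capacity));
        System.out.println(twoPlaces(capacity));
    }
}
```

This also frames the backward-compatibility question above: any script parsing "Capacity : 69.9%" would see "69.90%" after the change.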
[jira] [Updated] (YARN-9488) Skip YARNFeatureNotEnabledException from ClientRMService
[ https://issues.apache.org/jira/browse/YARN-9488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-9488: - Description: RM logs are accumulated with YARNFeatureNotEnabledException when running Distributed Shell jobs while {{ClientRMService#getResourceProfiles}} {code} 2019-04-16 07:10:47,699 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8050, call Call#5 Retry#0 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getResourceProfiles from 172.26.81.91:41198 org.apache.hadoop.yarn.exceptions.YARNFeatureNotEnabledException: Resource profile is not enabled, please enable resource profile feature before using its functions. (by setting yarn.resourcemanager.resource-profiles.enabled to true) at org.apache.hadoop.yarn.server.resourcemanager.resource.ResourceProfilesManagerImpl.checkAndThrowExceptionWhenFeatureDisabled(ResourceProfilesManagerImpl.java:191) at org.apache.hadoop.yarn.server.resourcemanager.resource.ResourceProfilesManagerImpl.getResourceProfiles(ResourceProfilesManagerImpl.java:214) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getResourceProfiles(ClientRMService.java:1833) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getResourceProfiles(ApplicationClientProtocolPBServiceImpl.java:670) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:665) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) at 
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) {code} was: RM logs are accumulated with YARNFeatureNotEnabledException when running DIstributed Shell jobs while {{ClientRMService#getResourceProfiles}} {code} 2019-04-16 07:10:47,699 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8050, call Call#5 Retry#0 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getResourceProfiles from 172.26.81.91:41198 org.apache.hadoop.yarn.exceptions.YARNFeatureNotEnabledException: Resource profile is not enabled, please enable resource profile feature before using its functions. (by setting yarn.resourcemanager.resource-profiles.enabled to true) at org.apache.hadoop.yarn.server.resourcemanager.resource.ResourceProfilesManagerImpl.checkAndThrowExceptionWhenFeatureDisabled(ResourceProfilesManagerImpl.java:191) at org.apache.hadoop.yarn.server.resourcemanager.resource.ResourceProfilesManagerImpl.getResourceProfiles(ResourceProfilesManagerImpl.java:214) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getResourceProfiles(ClientRMService.java:1833) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getResourceProfiles(ApplicationClientProtocolPBServiceImpl.java:670) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:665) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) {code} > Skip YARNFeatureNotEnabledException from ClientRMService > 
> > Key: YARN-9488 > URL: https://issues.apache.org/jira/browse/YARN-9488 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Attachments: YARN-9488-001.patch, YARN-9488-002.patch, > YARN-9488-002.patch > > > RM logs are accumulated with YARNFeatureNotEnabledException when running > Distributed Shell jobs while {{ClientRMService#getResourceProfiles}} > {code} > 2019-04-16 07:10:47,699 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 0 on 80
[jira] [Commented] (YARN-9461) TestRMWebServicesDelegationTokenAuthentication.testCancelledDelegationToken fails with HTTP 400
[ https://issues.apache.org/jira/browse/YARN-9461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908154#comment-16908154 ] Szilard Nemeth commented on YARN-9461: -- Kicked off build again, in order to have a fresh Jenkins result. > TestRMWebServicesDelegationTokenAuthentication.testCancelledDelegationToken > fails with HTTP 400 > --- > > Key: YARN-9461 > URL: https://issues.apache.org/jira/browse/YARN-9461 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, test >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Minor > Attachments: YARN-9461-001.patch > > > The test > {{TestRMWebServicesDelegationTokenAuthentication.testCancelledDelegationToken}} > sometimes fails with the following error: > {noformat} > java.io.IOException: Server returned HTTP response code: 400 for URL: > http://localhost:8088/ws/v1/cluster/delegation-token > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication.cancelDelegationToken(TestRMWebServicesDelegationTokenAuthentication.java:462) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication.testCancelledDelegationToken(TestRMWebServicesDelegationTokenAuthentication.java:283) > {noformat} > The problem is that for whatever reason, Jetty seems to execute the token > cancellation REST call twice. First we get HTTP 200 OK, but the second > request fails with HTTP 400 Bad Request. > The {{MockRM}} instance is static. Something could be a problem in this class > and it turned out that using separate {{MockRM}} instances solves the > flakiness. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9749) TestAppLogAggregatorImpl#testDFSQuotaExceeded fails on trunk
Peter Bacsko created YARN-9749: -- Summary: TestAppLogAggregatorImpl#testDFSQuotaExceeded fails on trunk Key: YARN-9749 URL: https://issues.apache.org/jira/browse/YARN-9749 Project: Hadoop YARN Issue Type: Bug Components: yarn Reporter: Peter Bacsko Assignee: Adam Antal TestAppLogAggregatorImpl#testDFSQuotaExceeded currently fails on trunk. It was most likely introduced by YARN-9676 (resetting HEAD to the previous commit and then re-running the test passes). {noformat} [INFO] Running org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.781 s <<< FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl [ERROR] testDFSQuotaExceeded(org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl) Time elapsed: 2.361 s <<< FAILURE! java.lang.AssertionError: The set of paths for deletion are not the same as expected: actual size: 0 vs expected size: 1 at org.junit.Assert.fail(Assert.java:88) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl.verifyFilesToDelete(TestAppLogAggregatorImpl.java:344) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl.access$000(TestAppLogAggregatorImpl.java:82) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl$1.answer(TestAppLogAggregatorImpl.java:330) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl$1.answer(TestAppLogAggregatorImpl.java:319) at org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:39) at org.mockito.internal.handler.MockHandlerImpl.handle(MockHandlerImpl.java:96) at org.mockito.internal.handler.NullResultGuardian.handle(NullResultGuardian.java:29) at 
org.mockito.internal.handler.InvocationNotifierHandler.handle(InvocationNotifierHandler.java:35) at org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:61) at org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:49) at org.mockito.internal.creation.bytebuddy.MockMethodInterceptor$DispatcherDefaultingToRealMethod.interceptSuperCallable(MockMethodInterceptor.java:108) at org.apache.hadoop.yarn.server.nodemanager.DeletionService$MockitoMock$1879282050.delete(Unknown Source) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregationPostCleanUp(AppLogAggregatorImpl.java:556) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:476) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl.testDFSQuotaExceeded(TestAppLogAggregatorImpl.java:469) ... {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9217) Nodemanager will fail to start if GPU is misconfigured on the node or GPU drivers missing
[ https://issues.apache.org/jira/browse/YARN-9217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908150#comment-16908150 ] Hadoop QA commented on YARN-9217: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 38s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 5m 19s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 24s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 52s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 10s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 224 unchanged - 2 fixed = 224 total (was 226) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 18s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 47s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 49s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 39s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 21m 2s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 41s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}115m 48s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 | | JIRA Issue | YARN-9217 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12977703/YARN-9217.011.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedcli
[jira] [Updated] (YARN-9749) TestAppLogAggregatorImpl#testDFSQuotaExceeded fails on trunk
[ https://issues.apache.org/jira/browse/YARN-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9749: --- Component/s: (was: yarn) test log-aggregation > TestAppLogAggregatorImpl#testDFSQuotaExceeded fails on trunk > > > Key: YARN-9749 > URL: https://issues.apache.org/jira/browse/YARN-9749 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, test >Reporter: Peter Bacsko >Assignee: Adam Antal >Priority: Major > > TestAppLogAggregatorImpl#testDFSQuotaExceeded currently fails on trunk. It > was most likely introduced by YARN-9676 (resetting HEAD to the previous > commit and then re-running the test passes). > {noformat} > [INFO] Running > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl > [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.781 > s <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl > [ERROR] > testDFSQuotaExceeded(org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl) > Time elapsed: 2.361 s <<< FAILURE! 
> java.lang.AssertionError: The set of paths for deletion are not the same as > expected: actual size: 0 vs expected size: 1 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl.verifyFilesToDelete(TestAppLogAggregatorImpl.java:344) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl.access$000(TestAppLogAggregatorImpl.java:82) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl$1.answer(TestAppLogAggregatorImpl.java:330) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl$1.answer(TestAppLogAggregatorImpl.java:319) > at > org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:39) > at > org.mockito.internal.handler.MockHandlerImpl.handle(MockHandlerImpl.java:96) > at > org.mockito.internal.handler.NullResultGuardian.handle(NullResultGuardian.java:29) > at > org.mockito.internal.handler.InvocationNotifierHandler.handle(InvocationNotifierHandler.java:35) > at > org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:61) > at > org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:49) > at > org.mockito.internal.creation.bytebuddy.MockMethodInterceptor$DispatcherDefaultingToRealMethod.interceptSuperCallable(MockMethodInterceptor.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.DeletionService$MockitoMock$1879282050.delete(Unknown > Source) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregationPostCleanUp(AppLogAggregatorImpl.java:556) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:476) > at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl.testDFSQuotaExceeded(TestAppLogAggregatorImpl.java:469) > ... > {noformat}
[jira] [Commented] (YARN-9743) [JDK11] TestTimelineWebServices.testContextFactory fails
[ https://issues.apache.org/jira/browse/YARN-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908100#comment-16908100 ] Adam Antal commented on YARN-9743: -- Built with OpenJDK 11.0.4, targeting JDK 11, and run on JDK 11 using the following command: {noformat} mvn clean test -Dit.test=None -Djavac.version=11 -Dtest=org.apache.hadoop.yarn.server.timeline.webapp.TestTimelineWebServices#testContextFactory {noformat} Still produces this error with the same stack trace. [~Prabhu Joseph] can you take a look at this issue? Looks like you have touched that piece of code lately - maybe you know what the issue is. I searched around on the internet, but found no solution. Other projects targeting Java 11 do seem to be facing this problem as well. > [JDK11] TestTimelineWebServices.testContextFactory fails > > > Key: YARN-9743 > URL: https://issues.apache.org/jira/browse/YARN-9743 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineservice >Affects Versions: 3.2.0 >Reporter: Adam Antal >Priority: Major > > Tested on OpenJDK 11.0.2 on a Mac. > Stack trace: > {noformat} > [ERROR] Tests run: 29, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: > 36.016 s <<< FAILURE! - in > org.apache.hadoop.yarn.server.timeline.webapp.TestTimelineWebServices > [ERROR] > testContextFactory(org.apache.hadoop.yarn.server.timeline.webapp.TestTimelineWebServices) > Time elapsed: 1.031 s <<< ERROR! 
> java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:583) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521) > at java.base/java.lang.Class.forName0(Native Method) > at java.base/java.lang.Class.forName(Class.java:315) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.ContextFactory.newContext(ContextFactory.java:85) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.ContextFactory.createContext(ContextFactory.java:112) > at > org.apache.hadoop.yarn.server.timeline.webapp.TestTimelineWebServices.testContextFactory(TestTimelineWebServices.java:1039) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at 
org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {noformat}
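The {{ClassNotFoundException}} above comes from the fact that Java 11 removed the bundled JAXB implementation from the JDK (JEP 320), so the internal class {{com.sun.xml.internal.bind.v2.ContextFactory}} no longer exists at runtime. A minimal probe, independent of the Hadoop code, shows whether the running JDK still ships that class:

```java
// Minimal probe: reports whether the JDK still bundles the internal JAXB
// ContextFactory. On Java 8 this prints true; on Java 11+ it prints false,
// which is exactly the ClassNotFoundException path in the stack trace above.
public class JaxbProbe {
    static final String INTERNAL_FACTORY =
        "com.sun.xml.internal.bind.v2.ContextFactory";

    static boolean internalJaxbPresent() {
        try {
            Class.forName(INTERNAL_FACTORY);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(internalJaxbPresent());
    }
}
```

On Java 11 the usual remedy is to stop reflecting on JDK-internal classes and depend on an external JAXB runtime instead (for example the {{com.sun.xml.bind:jaxb-impl}} Maven artifact); whether that is the right fix for this particular code path is for the assignee to judge.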
[jira] [Commented] (YARN-8586) Extract log aggregation related fields and methods from RMAppImpl
[ https://issues.apache.org/jira/browse/YARN-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908096#comment-16908096 ] Hadoop QA commented on YARN-8586: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}127m 25s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} branch-3.1 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 36s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 9s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 11s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} branch-3.1 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 0 new + 107 unchanged - 10 fixed = 107 total (was 117) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 52s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 69m 45s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}257m 59s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=18.09.7 Server=18.09.7 Image:yetus/hadoop:080e9d0f9b3 | | JIRA Issue | YARN-8586 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12977693/YARN-8586-branch-3.1.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 9b33067ec310 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-3.1 / 96ea7f0 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24569/testReport/ | | Max. process+thread count | 833 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U:
[jira] [Commented] (YARN-9480) createAppDir() in LogAggregationService shouldn't block dispatcher thread of ContainerManagerImpl
[ https://issues.apache.org/jira/browse/YARN-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908094#comment-16908094 ] Hadoop QA commented on YARN-9480: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 13s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 46s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 25s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 1 new + 188 unchanged - 1 fixed = 189 total (was 189) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 23s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 21m 11s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 76m 6s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 | | JIRA Issue | YARN-9480 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12977701/YARN-9480.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 60c9ecbc2c78 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 3468164 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_212 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/24571/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/24571/artifact/out/patch-unit-hadoop-yarn-project_ha
[jira] [Updated] (YARN-9217) Nodemanager will fail to start if GPU is misconfigured on the node or GPU drivers missing
[ https://issues.apache.org/jira/browse/YARN-9217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9217: --- Attachment: YARN-9217.011.patch > Nodemanager will fail to start if GPU is misconfigured on the node or GPU > drivers missing > - > > Key: YARN-9217 > URL: https://issues.apache.org/jira/browse/YARN-9217 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0, 3.1.0 >Reporter: Antal Bálint Steinbach >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9217.001.patch, YARN-9217.002.patch, > YARN-9217.003.patch, YARN-9217.004.patch, YARN-9217.005.patch, > YARN-9217.006.patch, YARN-9217.007.patch, YARN-9217.008.patch, > YARN-9217.009.patch, YARN-9217.010.patch, YARN-9217.011.patch > > > Nodemanager will not start > 1. If Autodiscovery is enabled: > * If nvidia-smi path is misconfigured or the file does not exist. > * There is 0 GPU found > * If the file exists but it is not pointing to an nvidia-smi > * if the binary is ok but there is an IOException > 2. If the manually configured GPU devices are misconfigured > * Any index:minor number format failure will cause a problem > * 0 configured device will cause a problem > * NumberFormatException is not handled > It would be a better option to add warnings about the configuration, set 0 > available GPUs and let the node work and run non-gpu jobs. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9217) Nodemanager will fail to start if GPU is misconfigured on the node or GPU drivers missing
[ https://issues.apache.org/jira/browse/YARN-9217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9217: --- Attachment: (was: YARN-9217.011.patch) > Nodemanager will fail to start if GPU is misconfigured on the node or GPU > drivers missing > - > > Key: YARN-9217 > URL: https://issues.apache.org/jira/browse/YARN-9217 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0, 3.1.0 >Reporter: Antal Bálint Steinbach >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9217.001.patch, YARN-9217.002.patch, > YARN-9217.003.patch, YARN-9217.004.patch, YARN-9217.005.patch, > YARN-9217.006.patch, YARN-9217.007.patch, YARN-9217.008.patch, > YARN-9217.009.patch, YARN-9217.010.patch, YARN-9217.011.patch > > > Nodemanager will not start > 1. If Autodiscovery is enabled: > * If nvidia-smi path is misconfigured or the file does not exist. > * There is 0 GPU found > * If the file exists but it is not pointing to an nvidia-smi > * if the binary is ok but there is an IOException > 2. If the manually configured GPU devices are misconfigured > * Any index:minor number format failure will cause a problem > * 0 configured device will cause a problem > * NumberFormatException is not handled > It would be a better option to add warnings about the configuration, set 0 > available GPUs and let the node work and run non-gpu jobs. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9217) Nodemanager will fail to start if GPU is misconfigured on the node or GPU drivers missing
[ https://issues.apache.org/jira/browse/YARN-9217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9217: --- Attachment: YARN-9217.011.patch > Nodemanager will fail to start if GPU is misconfigured on the node or GPU > drivers missing > - > > Key: YARN-9217 > URL: https://issues.apache.org/jira/browse/YARN-9217 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0, 3.1.0 >Reporter: Antal Bálint Steinbach >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9217.001.patch, YARN-9217.002.patch, > YARN-9217.003.patch, YARN-9217.004.patch, YARN-9217.005.patch, > YARN-9217.006.patch, YARN-9217.007.patch, YARN-9217.008.patch, > YARN-9217.009.patch, YARN-9217.010.patch, YARN-9217.011.patch > > > Nodemanager will not start > 1. If Autodiscovery is enabled: > * If nvidia-smi path is misconfigured or the file does not exist. > * There is 0 GPU found > * If the file exists but it is not pointing to an nvidia-smi > * if the binary is ok but there is an IOException > 2. If the manually configured GPU devices are misconfigured > * Any index:minor number format failure will cause a problem > * 0 configured device will cause a problem > * NumberFormatException is not handled > It would be a better option to add warnings about the configuration, set 0 > available GPUs and let the node work and run non-gpu jobs. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
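The lenient behaviour proposed in the description — warn on a bad GPU configuration and fall back to 0 usable GPUs instead of failing NodeManager startup — can be sketched as follows. The parser below is illustrative only, not the actual NodeManager code; class and method names are made up for the example:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of lenient "index:minor" GPU device parsing:
// any malformed entry logs a warning and yields an empty device list,
// so the node can still start and run non-GPU jobs.
public class LenientGpuConfigSketch {

    static final class GpuDevice {
        final int index, minorNumber;
        GpuDevice(int index, int minorNumber) {
            this.index = index;
            this.minorNumber = minorNumber;
        }
    }

    static List<GpuDevice> parseAllowedDevices(String config) {
        List<GpuDevice> devices = new ArrayList<>();
        if (config == null || config.trim().isEmpty()) {
            System.err.println("WARN: no GPU devices configured, assuming 0 GPUs");
            return devices;
        }
        for (String entry : config.split(",")) {
            String[] parts = entry.trim().split(":");
            try {
                if (parts.length != 2) {
                    throw new NumberFormatException(
                        "expected index:minor, got " + entry);
                }
                devices.add(new GpuDevice(Integer.parseInt(parts[0]),
                                          Integer.parseInt(parts[1])));
            } catch (NumberFormatException e) {
                // Instead of aborting NM startup, warn and fall back to 0 GPUs.
                System.err.println("WARN: bad GPU config '" + entry
                    + "', assuming 0 GPUs: " + e.getMessage());
                return new ArrayList<>();
            }
        }
        return devices;
    }
}
```

The same idea applies to the auto-discovery path: an unusable {{nvidia-smi}} binary or an {{IOException}} while probing it would be caught, logged as a warning, and treated as "0 GPUs available" rather than a fatal startup error.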
[jira] [Commented] (YARN-9676) Add DEBUG and TRACE level messages to AppLogAggregatorImpl and connected classes
[ https://issues.apache.org/jira/browse/YARN-9676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908049#comment-16908049 ] Szilard Nemeth commented on YARN-9676: -- Thanks [~adam.antal]! > Add DEBUG and TRACE level messages to AppLogAggregatorImpl and connected > classes > > > Key: YARN-9676 > URL: https://issues.apache.org/jira/browse/YARN-9676 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Fix For: 3.3.0, 3.2.1 > > > During the development of the last items of YARN-6875, it was typically > difficult to extract information about the internal state of some log > aggregation related classes (e.g. {{AppLogAggregatorImpl}} and > {{LogAggregationFileController}}). > On my fork I added a few more messages to those classes, such as: > - displaying the number of log aggregation cycles > - displaying the names of the files currently considered for log aggregation > by containers > - immediately displaying any exception caught (and sent to the RM in the > diagnostic messages) during the log aggregation process. > Those messages were quite useful for debugging when an issue occurred, but > otherwise they flooded the NM log file with messages that are usually not > needed. I suggest adding (some of) these messages at DEBUG or TRACE level.
[jira] [Commented] (YARN-9676) Add DEBUG and TRACE level messages to AppLogAggregatorImpl and connected classes
[ https://issues.apache.org/jira/browse/YARN-9676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908046#comment-16908046 ] Adam Antal commented on YARN-9676: -- Cherry-picking failed because YARN-8273 only got in branch-3.2 and not branch-3.1. I'd rather omit the patch to branch-3.1. It is good as it is, thanks. I resolve this issue. > Add DEBUG and TRACE level messages to AppLogAggregatorImpl and connected > classes > > > Key: YARN-9676 > URL: https://issues.apache.org/jira/browse/YARN-9676 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Fix For: 3.3.0, 3.2.1 > > > During the development of the last items of YARN-6875, it was typically > difficult to extract information about the internal state of some log > aggregation related classes (e.g. {{AppLogAggregatiorImpl}} and > {{LogAggregationFileController}}). > On my fork I added a few more messages to those classes like: > - displaying the number of log aggregation cycles > - displaying the names of the files currently considered for log aggregation > by containers > - immediately displaying any exception caught (and sent to the RM in the > diagnostic messages) during the log aggregation process. > Those messages were quite useful for debugging if any issue occurs, but > otherwise it flooded the NM log file with these messages that are usually not > needed. I suggest to add (some of) these messages in DEBUG or TRACE level. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5857) TestLogAggregationService.testFixedSizeThreadPool fails intermittently on trunk
[ https://issues.apache.org/jira/browse/YARN-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908038#comment-16908038 ] Adam Antal commented on YARN-5857: -- The test timed out because it was waiting for the latch, which was only counted down when the lock was released. That happened after 35 seconds, while the test has a 30-second timeout. We should make sure all the threads are active when we count them, so I'd rather put the latch countDown call inside the try block, right after the rLock.tryLock call. > TestLogAggregationService.testFixedSizeThreadPool fails intermittently on > trunk > --- > > Key: YARN-5857 > URL: https://issues.apache.org/jira/browse/YARN-5857 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Bilwa S T >Priority: Minor > Attachments: YARN-5857-001.patch, testFixedSizeThreadPool failure > reproduction > > > {noformat} > testFixedSizeThreadPool(org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService) > Time elapsed: 0.11 sec <<< FAILURE! > java.lang.AssertionError: expected:<3> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testFixedSizeThreadPool(TestLogAggregationService.java:1139) > {noformat} > Refer to https://builds.apache.org/job/PreCommit-YARN-Build/13829/testReport/
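The fix proposed in the comment above can be sketched as follows. This is a minimal illustration with hypothetical names, not the actual TestLogAggregationService code: the latch is counted down right after the tryLock call, inside the try block, so every worker registers as active before the test's timeout can fire.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;

public class LatchCountDownSketch {
    // Worker that counts the latch down immediately after attempting the
    // lock, instead of after the lock is released, so the test's await on
    // the latch does not have to outlive the lock-holding period.
    static Runnable worker(Lock rLock, CountDownLatch latch) {
        return () -> {
            boolean acquired = false;
            try {
                acquired = rLock.tryLock(1, TimeUnit.SECONDS);
                latch.countDown(); // moved here: right after tryLock, inside try
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                if (acquired) {
                    rLock.unlock();
                }
            }
        };
    }
}
```

With the countDown inside the try block, a test thread that blocks on the lock still signals that it is active, which is the property the flaky test was missing.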
[jira] [Commented] (YARN-9217) Nodemanager will fail to start if GPU is misconfigured on the node or GPU drivers missing
[ https://issues.apache.org/jira/browse/YARN-9217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908030#comment-16908030 ] Hadoop QA commented on YARN-9217: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 44s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 50s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 25s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 10s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 224 unchanged - 2 fixed = 224 total (was 226) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 23s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 49s{color} | {color:red} hadoop-yarn-api in the patch failed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 21m 39s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 57s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 99m 12s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.conf.TestYarnConfigurationFields | | | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 | | JIRA Issue | YARN-9217 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12977695/YARN-9217.010.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux e766ba2afa77 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git
[jira] [Commented] (YARN-9488) Skip YARNFeatureNotEnabledException from ClientRMService
[ https://issues.apache.org/jira/browse/YARN-9488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908029#comment-16908029 ] Adam Antal commented on YARN-9488: -- +1 (non-binding) on patch v2. > Skip YARNFeatureNotEnabledException from ClientRMService > > > Key: YARN-9488 > URL: https://issues.apache.org/jira/browse/YARN-9488 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Attachments: YARN-9488-001.patch, YARN-9488-002.patch, > YARN-9488-002.patch > > > RM logs are accumulated with YARNFeatureNotEnabledException when running > Distributed Shell jobs when {{ClientRMService#getResourceProfiles}} is called > {code} > 2019-04-16 07:10:47,699 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 0 on 8050, call Call#5 Retry#0 > org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getResourceProfiles > from 172.26.81.91:41198 > org.apache.hadoop.yarn.exceptions.YARNFeatureNotEnabledException: Resource > profile is not enabled, please enable resource profile feature before using > its functions. 
(by setting yarn.resourcemanager.resource-profiles.enabled to > true) > at > org.apache.hadoop.yarn.server.resourcemanager.resource.ResourceProfilesManagerImpl.checkAndThrowExceptionWhenFeatureDisabled(ResourceProfilesManagerImpl.java:191) > at > org.apache.hadoop.yarn.server.resourcemanager.resource.ResourceProfilesManagerImpl.getResourceProfiles(ResourceProfilesManagerImpl.java:214) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getResourceProfiles(ClientRMService.java:1833) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getResourceProfiles(ApplicationClientProtocolPBServiceImpl.java:670) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:665) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
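For operators who actually want the feature rather than quieter logs, the exception message quoted above names the relevant switch. A yarn-site.xml fragment would look like this (the property name is taken from the message itself, not from the patch under review):

```xml
<!-- yarn-site.xml: enable resource profiles so getResourceProfiles
     no longer throws YARNFeatureNotEnabledException -->
<property>
  <name>yarn.resourcemanager.resource-profiles.enabled</name>
  <value>true</value>
</property>
```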
[jira] [Updated] (YARN-9746) RM should merge local config for token renewal
[ https://issues.apache.org/jira/browse/YARN-9746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated YARN-9746: --- Summary: RM should merge local config for token renewal (was: Rm should only rewrite partial jobConf passed by app when supporting multi-cluster token renew) > RM should merge local config for token renewal > -- > > Key: YARN-9746 > URL: https://issues.apache.org/jira/browse/YARN-9746 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Junfan Zhang >Priority: Major > Attachments: YARN-9746-01.patch > > > This issue links to YARN-5910. > When to support multi-cluster delegation token renew, the path of YARN-5910 > works in most scenarios. > But when intergrating with Oozie, we encounter some problems. In Oozie having > multi delegation tokens including HDFS_DELEGATION_TOKEN(another cluster HA > token) and MR_DELEGATION_TOKEN(Oozie mr launcher token), to support renew > another cluster's token, YARN-5910 was patched and related config was set. > The config is as follows > {code:xml} > > mapreduce.job.send-token-conf > > dfs.namenode.kerberos.principal|dfs.nameservices|^dfs.namenode.rpc-address.*$|^dfs.ha.namenodes.*$|^dfs.client.failover.proxy.provider.*$ > > > dfs.nameservices > > hadoop-clusterA-ns01,hadoop-clusterA-ns02,hadoop-clusterA-ns03,hadoop-clusterA-ns04,hadoop-clusterB-ns01,hadoop-clusterB-ns02,hadoop-clusterB-ns03,hadoop-clusterB-ns04 > > > dfs.ha.namenodes.hadoop-clusterB-ns01 > nn1,nn2 > > > > dfs.namenode.rpc-address.hadoop-clusterB-ns01.nn1 > namenode01-clusterB.hadoop:8020 > > > > dfs.namenode.rpc-address.hadoop-clusterB-ns01.nn2 > namenode02-clusterB.hadoop:8020 > > > > dfs.client.failover.proxy.provider.hadoop-clusterB-ns01 > > org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider > > {code} > However, the MR_DELEGATION_TOKEN could‘t be renewed, because of lacking some > config. Although we can set the required configurations through the app, this > is not a good idea. 
So I think the RM should only rewrite the jobConf passed by the > app to solve the above situation.
[jira] [Commented] (YARN-9217) Nodemanager will fail to start if GPU is misconfigured on the node or GPU drivers missing
[ https://issues.apache.org/jira/browse/YARN-9217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907985#comment-16907985 ] Peter Bacsko commented on YARN-9217: Rebased patch (again) + introduced new fail-fast property. > Nodemanager will fail to start if GPU is misconfigured on the node or GPU > drivers missing > - > > Key: YARN-9217 > URL: https://issues.apache.org/jira/browse/YARN-9217 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0, 3.1.0 >Reporter: Antal Bálint Steinbach >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9217.001.patch, YARN-9217.002.patch, > YARN-9217.003.patch, YARN-9217.004.patch, YARN-9217.005.patch, > YARN-9217.006.patch, YARN-9217.007.patch, YARN-9217.008.patch, > YARN-9217.009.patch, YARN-9217.010.patch > > > Nodemanager will not start > 1. If Autodiscovery is enabled: > * If nvidia-smi path is misconfigured or the file does not exist. > * There is 0 GPU found > * If the file exists but it is not pointing to an nvidia-smi > * if the binary is ok but there is an IOException > 2. If the manually configured GPU devices are misconfigured > * Any index:minor number format failure will cause a problem > * 0 configured device will cause a problem > * NumberFormatException is not handled > It would be a better option to add warnings about the configuration, set 0 > available GPUs and let the node work and run non-gpu jobs. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9217) Nodemanager will fail to start if GPU is misconfigured on the node or GPU drivers missing
[ https://issues.apache.org/jira/browse/YARN-9217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9217: --- Attachment: YARN-9217.010.patch > Nodemanager will fail to start if GPU is misconfigured on the node or GPU > drivers missing > - > > Key: YARN-9217 > URL: https://issues.apache.org/jira/browse/YARN-9217 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0, 3.1.0 >Reporter: Antal Bálint Steinbach >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9217.001.patch, YARN-9217.002.patch, > YARN-9217.003.patch, YARN-9217.004.patch, YARN-9217.005.patch, > YARN-9217.006.patch, YARN-9217.007.patch, YARN-9217.008.patch, > YARN-9217.009.patch, YARN-9217.010.patch > > > Nodemanager will not start > 1. If Autodiscovery is enabled: > * If nvidia-smi path is misconfigured or the file does not exist. > * There is 0 GPU found > * If the file exists but it is not pointing to an nvidia-smi > * if the binary is ok but there is an IOException > 2. If the manually configured GPU devices are misconfigured > * Any index:minor number format failure will cause a problem > * 0 configured device will cause a problem > * NumberFormatException is not handled > It would be a better option to add warnings about the configuration, set 0 > available GPUs and let the node work and run non-gpu jobs. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9100) Add tests for GpuResourceAllocator and do minor code cleanup
[ https://issues.apache.org/jira/browse/YARN-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907983#comment-16907983 ] Hadoop QA commented on YARN-9100: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 26s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 54s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 10 unchanged - 6 fixed = 10 total (was 16) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 59s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 21m 2s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 68m 23s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9100 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12977691/YARN-9100-008.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 6025450bf327 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 3468164 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/24568/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24568/testReport/ | | Max. process+thread count | 448 (vs. ulim
[jira] [Updated] (YARN-9746) Rm should only rewrite partial jobConf passed by app when supporting multi-cluster token renew
[ https://issues.apache.org/jira/browse/YARN-9746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated YARN-9746: --- Description: This issue links to YARN-5910. When to support multi-cluster delegation token renew, the path of YARN-5910 works in most scenarios. But when intergrating with Oozie, we encounter some problems. In Oozie having multi delegation tokens including HDFS_DELEGATION_TOKEN(another cluster HA token) and MR_DELEGATION_TOKEN(Oozie mr launcher token), to support renew another cluster's token, YARN-5910 was patched and related config was set. The config is as follows {code:xml} mapreduce.job.send-token-conf dfs.namenode.kerberos.principal|dfs.nameservices|^dfs.namenode.rpc-address.*$|^dfs.ha.namenodes.*$|^dfs.client.failover.proxy.provider.*$ dfs.nameservices hadoop-clusterA-ns01,hadoop-clusterA-ns02,hadoop-clusterA-ns03,hadoop-clusterA-ns04,hadoop-clusterB-ns01,hadoop-clusterB-ns02,hadoop-clusterB-ns03,hadoop-clusterB-ns04 dfs.ha.namenodes.hadoop-clusterB-ns01 nn1,nn2 dfs.namenode.rpc-address.hadoop-clusterB-ns01.nn1 namenode01-clusterB.hadoop:8020 dfs.namenode.rpc-address.hadoop-clusterB-ns01.nn2 namenode02-clusterB.hadoop:8020 dfs.client.failover.proxy.provider.hadoop-clusterB-ns01 org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider {code} However, the MR_DELEGATION_TOKEN could‘t be renewed, because of lacking some config. Although we can set the required configurations through the app, this is not a good idea. So i think rm should only rewrite the jobConf passed by app to solve the above situation. was: This issue links to YARN-5910. When to support multi-cluster delegation token renew, the path of YARN-5910 works in most scenarios. But when intergrating with Oozie, we encounter some problems. 
In Oozie having multi delegation tokens including HDFS_DELEGATION_TOKEN(another cluster HA token) and MR_DELEGATION_TOKEN(Oozie mr launcher token), to support renew another cluster's token, YARN-5910 was patched and related config was set. The config is as follows {code:xml} mapreduce.job.send-token-conf dfs.namenode.kerberos.principal|dfs.nameservices|^dfs.namenode.rpc-address.*$|^dfs.ha.namenodes.*$|^dfs.client.failover.proxy.provider.*$ dfs.nameservices hadoop-clusterA-ns01,hadoop-clusterA-ns02,hadoop-clusterA-ns03,hadoop-clusterA-ns04,hadoop-clusterB-ns01,hadoop-clusterB-ns02,hadoop-clusterB-ns03,hadoop-clusterB-ns04 dfs.ha.namenodes.hadoop-clusterB-ns01 nn1,nn2 dfs.namenode.rpc-address.hadoop-clusterB-ns01.nn1 namenode01-clusterB.qiyi.hadoop:8020 dfs.namenode.rpc-address.hadoop-clusterB-ns01.nn2 namenode02-clusterB.qiyi.hadoop:8020 dfs.client.failover.proxy.provider.hadoop-clusterB-ns01 org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider {code} However, the MR_DELEGATION_TOKEN could‘t be renewed, because of lacking some config. Although we can set the required configurations through the app, this is not a good idea. So i think rm should only rewrite the jobConf passed by app to solve the above situation. > Rm should only rewrite partial jobConf passed by app when supporting > multi-cluster token renew > -- > > Key: YARN-9746 > URL: https://issues.apache.org/jira/browse/YARN-9746 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Junfan Zhang >Priority: Major > Attachments: YARN-9746-01.patch > > > This issue links to YARN-5910. > When to support multi-cluster delegation token renew, the path of YARN-5910 > works in most scenarios. > But when intergrating with Oozie, we encounter some problems. In Oozie having > multi delegation tokens including HDFS_DELEGATION_TOKEN(another cluster HA > token) and MR_DELEGATION_TOKEN(Oozie mr launcher token), to support renew > another cluster's token, YARN-5910 was patched and related config was set. 
> The config is as follows > {code:xml} > > mapreduce.job
[jira] [Updated] (YARN-8586) Extract log aggregation related fields and methods from RMAppImpl
[ https://issues.apache.org/jira/browse/YARN-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-8586: --- Attachment: YARN-8586-branch-3.1.001.patch > Extract log aggregation related fields and methods from RMAppImpl > - > > Key: YARN-8586 > URL: https://issues.apache.org/jira/browse/YARN-8586 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-8586-branch-3.1.001.patch, YARN-8586.001.patch, > YARN-8586.002.patch, YARN-8586.002.patch, YARN-8586.003.patch, > YARN-8586.004.patch, YARN-8586.branch-3.2.001.patch > > > Given that RMAppImpl is already above 2000 lines and it is very complex, as a > very simple > and straightforward step, all Log aggregation related fields and methods > could be extracted to a new class. > The clients of RMAppImpl may access the same methods and RMAppImpl would > delegate all those calls to the newly introduced class. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9100) Add tests for GpuResourceAllocator and do minor code cleanup
[ https://issues.apache.org/jira/browse/YARN-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907951#comment-16907951 ] Szilard Nemeth commented on YARN-9100: -- Patch looks good overall; I only found one minor bit of weirdness in the test code (TestGpuResourceAllocator). In TestGpuResourceAllocator#assertAllocatedGpu we have: {code:java} assertEquals(1, allocation.getAllowedGPUs().size()); assertEquals(0, allocation.getDeniedGPUs().size()); Set<GpuDevice> allowedGPUs = allocation.getAllowedGPUs(); assertEquals(1, allowedGPUs.size()); GpuDevice allocatedGpu = (GpuDevice) allowedGPUs.toArray()[0]; assertEquals(expectedGpu, allocatedGpu); assertAssignmentInStateStore(expectedGpu, container); {code} I think the code block of {code:java} Set<GpuDevice> allowedGPUs = allocation.getAllowedGPUs(); assertEquals(1, allowedGPUs.size()); {code} is superfluous, as the size of the allowed GPUs list is already checked above for the allocation. Please fix this minor bit and I think we are good to go! Thanks! > Add tests for GpuResourceAllocator and do minor code cleanup > > > Key: YARN-9100 > URL: https://issues.apache.org/jira/browse/YARN-9100 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9100-004.patch, YARN-9100-005.patch, > YARN-9100-006.patch, YARN-9100-007.patch, YARN-9100-008.patch, > YARN-9100.001.patch, YARN-9100.002.patch, YARN-9100.003.patch > > > Add tests for GpuResourceAllocator and do minor code cleanup > - Improved log and exception messages > - Added some new debug logs > - Some methods are named like *Copy, these are returning copies of internal > data structures. The word "copy" is just a noise in their name, so they have > been renamed. Additionally, the copied data structures modified to be > immutable. > - The waiting loop in method assignGpus was decoupled into a new class, > RetryCommand. 
> Some more words about the new class RetryCommand: > There are some similar waiting loops in the code in: AMRMClient, > AMRMClientAsync and even in GenericTestUtils (see waitFor method). > RetryCommand could be a future replacement of these duplicated code, as it > gives a solution to this waiting loop problem in a generic way. > The only downside of the usage of RetryCommand in GpuResourceAllocator > (startGpuAssignmentLoop) is the ugly exception handling part, but that's > solely because how Java deals with checked exceptions vs. lambdas. If there's > a cleaner way to solve the exception handling, I'm open for any suggestions. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
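The duplicated size check flagged in the review above could also be removed by extracting a small helper that asserts and fetches the single element in one step. A hypothetical sketch (names invented for illustration, not the actual test code):

```java
import java.util.Set;

public final class SingleElement {
    // Asserts the set holds exactly one element and returns it, replacing
    // the duplicated assertEquals(1, ...size()) checks plus the
    // toArray()[0] cast in assertAllocatedGpu.
    static <T> T single(Set<T> set) {
        if (set.size() != 1) {
            throw new AssertionError(
                "expected exactly one element, got " + set.size());
        }
        return set.iterator().next();
    }
}
```

In the test this would collapse the repeated checks into a single `GpuDevice allocatedGpu = single(allocation.getAllowedGPUs());` style call.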
[jira] [Updated] (YARN-9100) Add tests for GpuResourceAllocator and do minor code cleanup
[ https://issues.apache.org/jira/browse/YARN-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-9100: - Attachment: YARN-9100-008.patch > Add tests for GpuResourceAllocator and do minor code cleanup > > > Key: YARN-9100 > URL: https://issues.apache.org/jira/browse/YARN-9100 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9100-004.patch, YARN-9100-005.patch, > YARN-9100-006.patch, YARN-9100-007.patch, YARN-9100-008.patch, > YARN-9100.001.patch, YARN-9100.002.patch, YARN-9100.003.patch > > > Add tests for GpuResourceAllocator and do minor code cleanup > - Improved log and exception messages > - Added some new debug logs > - Some methods are named like *Copy, these are returning copies of internal > data structures. The word "copy" is just a noise in their name, so they have > been renamed. Additionally, the copied data structures modified to be > immutable. > - The waiting loop in method assignGpus were decoupled into a new class, > RetryCommand. > Some more words about the new class RetryCommand: > There are some similar waiting loops in the code in: AMRMClient, > AMRMClientAsync and even in GenericTestUtils (see waitFor method). > RetryCommand could be a future replacement of these duplicated code, as it > gives a solution to this waiting loop problem in a generic way. > The only downside of the usage of RetryCommand in GpuResourceAllocator > (startGpuAssignmentLoop) is the ugly exception handling part, but that's > solely because how Java deals with checked exceptions vs. lambdas. If there's > a cleaner way to solve the exception handling, I'm open for any suggestions. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
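The issue description above mentions RetryCommand as a generic replacement for the duplicated waiting loops in AMRMClient, AMRMClientAsync, and GenericTestUtils#waitFor. The real class lives in the patch; the general shape of such a polling loop can be sketched like this (a simplified, hypothetical version with no relation to the patch's actual API):

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public final class RetrySketch {
    // Polls the supplier until it yields a non-null result or the
    // deadline passes, sleeping between attempts; returns null on timeout
    // so the caller decides how to report the failure.
    static <T> T retry(Supplier<T> attempt, long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            T result = attempt.get();
            if (result != null) {
                return result;
            }
            if (System.currentTimeMillis() >= deadline) {
                return null;
            }
            TimeUnit.MILLISECONDS.sleep(intervalMs);
        }
    }
}
```

The checked InterruptedException here is the same friction the description notes about lambdas and checked exceptions: any supplier that itself throws a checked exception has to wrap it.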
[jira] [Commented] (YARN-9100) Add tests for GpuResourceAllocator and do minor code cleanup
[ https://issues.apache.org/jira/browse/YARN-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907946#comment-16907946 ]

Szilard Nemeth commented on YARN-9100:
--------------------------------------
Hi [~pbacsko]!
Patch006 had some trivial import conflicts, so I resolved them and uploaded patch007. Pending Jenkins.
[jira] [Updated] (YARN-9100) Add tests for GpuResourceAllocator and do minor code cleanup
[ https://issues.apache.org/jira/browse/YARN-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth updated YARN-9100:
---------------------------------
    Attachment: YARN-9100-007.patch
[jira] [Commented] (YARN-9100) Add tests for GpuResourceAllocator and do minor code cleanup
[ https://issues.apache.org/jira/browse/YARN-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907944#comment-16907944 ]

Szilard Nemeth commented on YARN-9100:
--------------------------------------
Hi [~pbacsko]!
I'm about to review the latest patch, and I agree with removing RetryCommand from this patch. Quoting from the description:
{quote}
Some more words about the new class RetryCommand:
There are similar waiting loops in the code in AMRMClient, AMRMClientAsync, and even in GenericTestUtils (see the waitFor method). RetryCommand could be a future replacement for this duplicated code, as it solves the waiting-loop problem in a generic way.
The only downside of using RetryCommand in GpuResourceAllocator (startGpuAssignmentLoop) is the ugly exception handling, but that is solely because of how Java deals with checked exceptions vs. lambdas. If there's a cleaner way to handle the exceptions, I'm open to suggestions.
{quote}
What about filing a follow-up jira to consolidate the retry logic of GpuResourceAllocator and several other classes like AMRMClient, AMRMClientAsync, GenericTestUtils, etc.? I think what we had in patch003 for RetryCommand can be a good starting point for extracting the retry behaviour from the mentioned classes into one common class. What do you think?
[jira] [Commented] (YARN-9217) Nodemanager will fail to start if GPU is misconfigured on the node or GPU drivers missing
[ https://issues.apache.org/jira/browse/YARN-9217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907942#comment-16907942 ]

Szilard Nemeth commented on YARN-9217:
--------------------------------------
Hi [~pbacsko]!
{quote}
Fundamental question: is this the way we want to use this plugin? Just asking because we might accidentally mask erratic behavior. E.g. a Hadoop user might think he has a cluster with 10 GPUs. In reality, the plugin failed to detect some cards, and only 5 NMs support GPU scheduling. If it's not explicitly displayed, the user might be under the impression that 10 GPUs are ready to run YARN workloads. This can be very misleading. At the very least, a fail-fast method should be considered.
{quote}
I agree with your approach on the fail-fast config flag, so please fix the TODO and upload a new patch, then I can start reviewing it! Thanks!

> Nodemanager will fail to start if GPU is misconfigured on the node or GPU drivers missing
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-9217
>                 URL: https://issues.apache.org/jira/browse/YARN-9217
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 3.0.0, 3.1.0
>            Reporter: Antal Bálint Steinbach
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: YARN-9217.001.patch, YARN-9217.002.patch, YARN-9217.003.patch, YARN-9217.004.patch, YARN-9217.005.patch, YARN-9217.006.patch, YARN-9217.007.patch, YARN-9217.008.patch, YARN-9217.009.patch
>
> Nodemanager will not start:
> 1. If autodiscovery is enabled:
> * if the nvidia-smi path is misconfigured or the file does not exist
> * if 0 GPUs are found
> * if the file exists but is not pointing to an nvidia-smi binary
> * if the binary is OK but an IOException occurs
> 2. If the manually configured GPU devices are misconfigured:
> * any index:minor-number format failure will cause a problem
> * 0 configured devices will cause a problem
> * NumberFormatException is not handled
> It would be a better option to add warnings about the configuration, set 0 available GPUs, and let the node keep working and run non-GPU jobs.
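[The fail-fast flag discussed above could be sketched roughly as follows. This is NOT the YARN-9217 patch code: the configuration key, class, and method names here are all hypothetical, chosen only to illustrate the two behaviors under discussion (fail fast on misconfiguration vs. warn and advertise 0 GPUs so the node can still run non-GPU jobs).]

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of fail-fast vs. degrade-gracefully GPU discovery.
// The property name and discovery API are illustrative, not real YARN keys.
public class GpuDiscoverySketch {

    // Illustrative property name only; not an actual YARN configuration key.
    static final String GPU_FAIL_FAST = "yarn.nodemanager.resource.gpu.fail-fast";

    static List<Integer> discoverGpus(boolean failFast) {
        try {
            return runNvidiaSmi();  // may fail: missing binary, bad path, IO error
        } catch (Exception e) {
            if (failFast) {
                // Fail-fast: stop NM startup so the misconfiguration is visible,
                // instead of silently running a cluster with fewer GPUs than expected.
                throw new IllegalStateException("GPU discovery failed", e);
            }
            // Degrade gracefully: warn, advertise 0 GPUs, keep serving non-GPU jobs.
            System.err.println("WARN: GPU discovery failed, reporting 0 GPUs: " + e);
            return Collections.emptyList();
        }
    }

    // Stand-in for invoking nvidia-smi; simulates a missing binary here.
    private static List<Integer> runNvidiaSmi() throws Exception {
        throw new java.io.IOException("nvidia-smi not found");
    }

    public static void main(String[] args) {
        // With fail-fast off, a broken setup yields an empty GPU list.
        System.out.println(discoverGpus(false).size());  // prints 0
    }
}
```

[The trade-off in the comment thread maps directly onto this branch: the quoted concern about a user believing they have 10 usable GPUs is the degrade-gracefully path; the fail-fast path surfaces the problem at NM startup.]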
[jira] [Updated] (YARN-9748) Allow capacity-scheduler configuration on HDFS
[ https://issues.apache.org/jira/browse/YARN-9748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhoukang updated YARN-9748:
---------------------------
    Description: Improvement: support auto reload from HDFS

> Allow capacity-scheduler configuration on HDFS
> ----------------------------------------------
>
>                 Key: YARN-9748
>                 URL: https://issues.apache.org/jira/browse/YARN-9748
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler, capacityscheduler
>    Affects Versions: 3.1.2
>            Reporter: zhoukang
>            Priority: Major
>
> Improvement: support auto reload of the capacity-scheduler configuration from HDFS.