[ 
https://issues.apache.org/jira/browse/YARN-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631131#comment-16631131
 ] 

Haibo Chen commented on YARN-8775:
----------------------------------

Thanks [~bsteinbach] for the patch. I think we can reduce how much of the
LocalDirsHandlerService implementation leaks into TestDiskFailures by disabling
the periodic health check in LocalDirsHandlerService and calling
LocalDirsHandlerService.checkDirs() each time before we verify disk health.

checkDirs() is currently private, so we'll need to make it public (make sure to
add '@VisibleForTesting').
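A minimal sketch of the proposed test pattern, assuming a heavily simplified model of the service: DirsHealthChecker, DiskFailureTestSketch and the local VisibleForTesting annotation are hypothetical stand-ins (the real service delegates to DirectoryCollection and checks much more than "is this still a directory", and the real annotation is Guava's). The point is only the shape: the periodic timer can be switched off, and checkDirs() is public so the test can trigger a synchronous check right before verifying.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Timer;
import java.util.TimerTask;

/** Hypothetical stand-in for Guava's com.google.common.annotations.VisibleForTesting. */
@interface VisibleForTesting {}

/** Hypothetical, simplified model of a directory health checker. */
class DirsHealthChecker {
  private final List<Path> allDirs;
  private volatile List<Path> goodDirs = Collections.emptyList();
  private Timer timer; // stays null when the periodic check is disabled

  DirsHealthChecker(List<Path> dirs, boolean periodicCheck, long intervalMs) {
    this.allDirs = new ArrayList<>(dirs);
    checkDirs(); // initial check at startup
    if (periodicCheck) {
      timer = new Timer("DiskHealthMonitor-Timer", true);
      timer.scheduleAtFixedRate(new TimerTask() {
        @Override public void run() { checkDirs(); }
      }, intervalMs, intervalMs);
    }
  }

  /** Widened from private so a test can run a check deterministically. */
  @VisibleForTesting
  public synchronized void checkDirs() {
    List<Path> good = new ArrayList<>();
    for (Path dir : allDirs) {
      if (Files.isDirectory(dir)) {
        good.add(dir);
      }
    }
    goodDirs = Collections.unmodifiableList(good);
  }

  List<Path> getGoodDirs() { return goodDirs; }

  void stop() { if (timer != null) { timer.cancel(); } }
}

/** Hypothetical test flow: no timer thread, explicit check before verifying. */
class DiskFailureTestSketch {
  static List<Path> runScenario() {
    try {
      Path root = Files.createTempDirectory("nm-dirs");
      Path d1 = Files.createDirectory(root.resolve("logDir-1"));
      Path d2 = Files.createDirectory(root.resolve("logDir-2"));
      // Periodic check disabled, so nothing can race with the file changes below.
      DirsHealthChecker checker =
          new DirsHealthChecker(Arrays.asList(d1, d2), false, 0L);
      // "Fail" d1 by replacing the directory with a regular file.
      Files.delete(d1);
      Files.createFile(d1);
      // Synchronous check right before verifying disk health.
      checker.checkDirs();
      List<Path> good = checker.getGoodDirs();
      checker.stop();
      return good;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With the timer off, the only state the assertion can see is the one produced by the explicit checkDirs() call, so the result no longer depends on when the background thread happens to fire.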

One question: why do we need to retry inside prepareDirToFail()?

> TestDiskFailures.testLocalDirsFailures sometimes can fail on concurrent File 
> modifications
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-8775
>                 URL: https://issues.apache.org/jira/browse/YARN-8775
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: test, yarn
>    Affects Versions: 3.0.0
>            Reporter: Antal Bálint Steinbach
>            Assignee: Antal Bálint Steinbach
>            Priority: Major
>         Attachments: YARN-8775.001.patch, YARN-8775.002.patch
>
>
> The test sometimes fails when file operations are performed while the
> monitoring thread in _LocalDirsHandlerService_ is checking the directories.
> {code:java}
> java.lang.AssertionError: NodeManager could not identify disk failure.
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.assertTrue(Assert.java:41)
>       at org.apache.hadoop.yarn.server.TestDiskFailures.verifyDisksHealth(TestDiskFailures.java:239)
>       at org.apache.hadoop.yarn.server.TestDiskFailures.testDirsFailures(TestDiskFailures.java:202)
>       at org.apache.hadoop.yarn.server.TestDiskFailures.testLocalDirsFailures(TestDiskFailures.java:99)
> Stderr
> 2018-09-13 08:21:49,822 INFO [main] server.TestDiskFailures (TestDiskFailures.java:prepareDirToFail(277)) - Prepared /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1 to fail.
> 2018-09-13 08:21:49,823 INFO [main] server.TestDiskFailures (TestDiskFailures.java:prepareDirToFail(277)) - Prepared /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3 to fail.
> 2018-09-13 08:21:49,823 WARN [DiskHealthMonitor-Timer] nodemanager.DirectoryCollection (DirectoryCollection.java:checkDirs(283)) - Directory /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1 error, Not a directory: /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1, removing from list of valid directories
> 2018-09-13 08:21:49,824 WARN [DiskHealthMonitor-Timer] localizer.ResourceLocalizationService (ResourceLocalizationService.java:initializeLogDir(1329)) - Could not initialize log dir /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3
> java.io.FileNotFoundException: Destination exists and is not a directory: /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3
> at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:515)
> at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:496)
> at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1081)
> at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:178)
> at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:205)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:747)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:743)
> at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
> at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:743)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.initializeLogDir(ResourceLocalizationService.java:1324)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.initializeLogDirs(ResourceLocalizationService.java:1318)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.access$000(ResourceLocalizationService.java:141)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$2.onDirsChanged(ResourceLocalizationService.java:269)
> at org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.checkDirs(DirectoryCollection.java:317)
> at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.checkDirs(LocalDirsHandlerService.java:452)
> at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.access$500(LocalDirsHandlerService.java:52)
> at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService$MonitoringTimerTask.run(LocalDirsHandlerService.java:166)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> 2018-09-13 08:21:59,824 INFO [main] server.TestDiskFailures (TestDiskFailures.java:verifyDisksHealth(237)) - ExpectedDirs=
> 2018-09-13 08:21:59,825 INFO [main] server.TestDiskFailures (TestDiskFailures.java:verifyDisksHealth(238)) - SeenDirs=/tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3
> {code}
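The log above is consistent with the timer check running while prepareDirToFail() was still failing directories one at a time (0_1 reported as "Not a directory", while 0_3 only turns up later in the initializeLogDir error). A minimal, deterministic replay of that kind of interleaving, assuming a simplified "good dir = still a directory" rule; InterleavingSketch, goodDirs() and failDir() are hypothetical helpers, not code from the patch:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical replay: a background check fires between two failDir() calls. */
class InterleavingSketch {
  /** Directories a simplified checker would report as good. */
  static List<Path> goodDirs(List<Path> dirs) {
    List<Path> good = new ArrayList<>();
    for (Path d : dirs) {
      if (Files.isDirectory(d)) {
        good.add(d);
      }
    }
    return good;
  }

  /** Hypothetical helper: "fail" a dir by replacing it with a regular file. */
  static void failDir(Path dir) throws IOException {
    Files.delete(dir);
    Files.createFile(dir);
  }

  static List<Path> staleSnapshot() {
    try {
      Path root = Files.createTempDirectory("nm-race");
      Path d1 = Files.createDirectory(root.resolve("logDir-1"));
      Path d3 = Files.createDirectory(root.resolve("logDir-3"));
      List<Path> dirs = Arrays.asList(d1, d3);
      failDir(d1);
      List<Path> seenByTimer = goodDirs(dirs); // the timer check lands here
      failDir(d3);
      // If nothing re-checks before verification, this stale view is what is seen.
      return seenByTimer;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The snapshot still lists logDir-3 as good even though it has been failed, which is the kind of stale state an explicit check before verification avoids.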



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
