[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622338#comment-13622338 ]

Hudson commented on YARN-467:

Integrated in Hadoop-Mapreduce-trunk #1390 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1390/])
YARN-467. Modify public distributed cache to localize files such that no local directory hits unix file count limits and thus prevent job failures. Contributed by Omkar Vinit Joshi. (Revision 1463823)
Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463823
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalCacheDirectoryManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalCacheDirectoryManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceRetention.java

> Jobs fail during resource localization when public distributed-cache hits
> unix directory limits
> ---
>
> Key: YARN-467
> URL: https://issues.apache.org/jira/browse/YARN-467
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 3.0.0, 2.0.0-alpha
> Reporter: Omkar Vinit Joshi
> Assignee: Omkar Vinit Joshi
> Fix For: 2.0.5-beta
>
> Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch,
> yarn-467-20130322.3.patch, yarn-467-20130322.patch,
> yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch,
> yarn-467-20130401.patch, yarn-467-20130402.1.patch,
> yarn-467-20130402.2.patch, yarn-467-20130402.patch, yarn-467-testCode.tar
>
>
> If we have multiple jobs that use the distributed cache with small files,
> the directory limit is reached before the cache size limit is, and creating
> any further directories in the file cache (PUBLIC) fails. The jobs start
> failing with the exception below.
> java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed
> at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
> at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
> at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
> at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
> at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.Thread
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622144#comment-13622144 ]

Hudson commented on YARN-467:

Integrated in Hadoop-Hdfs-trunk #1363 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1363/])
YARN-467. Modify public distributed cache to localize files such that no local directory hits unix file count limits and thus prevent job failures. Contributed by Omkar Vinit Joshi. (Revision 1463823)
Result = FAILURE
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463823
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalCacheDirectoryManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalCacheDirectoryManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceRetention.java
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622018#comment-13622018 ]

Hudson commented on YARN-467:

Integrated in Hadoop-Yarn-trunk #174 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/174/])
YARN-467. Modify public distributed cache to localize files such that no local directory hits unix file count limits and thus prevent job failures. Contributed by Omkar Vinit Joshi. (Revision 1463823)
Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463823
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalCacheDirectoryManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalCacheDirectoryManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceRetention.java
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620617#comment-13620617 ]

Hudson commented on YARN-467:

Integrated in Hadoop-trunk-Commit #3552 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3552/])
YARN-467. Modify public distributed cache to localize files such that no local directory hits unix file count limits and thus prevent job failures. Contributed by Omkar Vinit Joshi. (Revision 1463823)
Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463823
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalCacheDirectoryManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalCacheDirectoryManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceRetention.java
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620608#comment-13620608 ]

Vinod Kumar Vavilapalli commented on YARN-467:

Perfect, the latest patch looks good. Checking it in.

> Jobs fail during resource localization when public distributed-cache hits
> unix directory limits
> ---
>
> Key: YARN-467
> URL: https://issues.apache.org/jira/browse/YARN-467
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 3.0.0, 2.0.0-alpha
> Reporter: Omkar Vinit Joshi
> Assignee: Omkar Vinit Joshi
> Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch,
> yarn-467-20130322.3.patch, yarn-467-20130322.patch,
> yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch,
> yarn-467-20130401.patch, yarn-467-20130402.1.patch,
> yarn-467-20130402.2.patch, yarn-467-20130402.patch
>
>
> If we have multiple jobs that use the distributed cache with small files,
> the directory limit is reached before the cache size limit is, and creating
> any further directories in the file cache (PUBLIC) fails. The jobs start
> failing with the exception below.
> java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed
> at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
> at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
> at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
> at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
> at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> We need a mechanism wherein we can create a directory hierarchy and
> limit the number of files per directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
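One way to picture the proposed directory hierarchy is a minimal Java sketch, assuming each cache directory gets a sequential number that is written in base 36, with one path level per digit, so that no directory ever has more than 36 subdirectories. The class and method names (`CacheDirPathSketch`, `relativePath`, `childCounts`) and the fanout of 36 are hypothetical; this is an illustration of the idea, not the code committed in revision 1463823.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not the committed YARN-467 code): give every cache
 * directory a sequential number and turn that number into a nested relative
 * path, so that no directory ever has more than FANOUT subdirectories.
 */
class CacheDirPathSketch {
    // Assumption: 36 subdirectories per level, named 0-9 and a-z (base 36).
    static final int FANOUT = 36;

    /** Directory 0 is the cache root itself; directory n >= 1 nests one level per base-36 digit. */
    static String relativePath(int dirNumber) {
        if (dirNumber < 0) {
            throw new IllegalArgumentException("directory number must be >= 0");
        }
        if (dirNumber == 0) {
            return "";
        }
        // Integer.toString(71, 36) is "1z"; each digit becomes one path level.
        String digits = Integer.toString(dirNumber, FANOUT);
        StringBuilder path = new StringBuilder();
        for (char digit : digits.toCharArray()) {
            if (path.length() > 0) {
                path.append('/');
            }
            path.append(digit);
        }
        return path.toString();
    }

    /** Counts how many subdirectories each parent receives when directories 1..n are created. */
    static Map<String, Integer> childCounts(int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 1; i <= n; i++) {
            String p = relativePath(i);
            int slash = p.lastIndexOf('/');
            String parent = slash < 0 ? "" : p.substring(0, slash);
            counts.merge(parent, 1, Integer::sum);
        }
        return counts;
    }
}
```

For example, `relativePath(71)` returns `"1/z"`. Because the parent of directory n (for n >= 36) is directory n / 36, which has a smaller number, allocating numbers sequentially always creates a parent before any of its children.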
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620546#comment-13620546 ]

Hadoop QA commented on YARN-467:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12576705/yarn-467-20130402.2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/657//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/657//console

This message is automatically generated.
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620535#comment-13620535 ]

Omkar Vinit Joshi commented on YARN-467:

I have tested this code for the following scenarios:
* I used 4 local-dirs to check that localization is distributed across them and that LocalCacheDirectoryManager manages each of them separately.
* I tested various values of "yarn.nodemanager.local-cache.max-files-per-directory": <=36, 37, 40, and much larger values.
* I modified the cache cleanup interval and the cache target size in MB to check that older files are removed from the cache and that LocalCacheDirectoryManager's subdirectories are reused.
* I verified that we never run into a situation where any local directory contains more files or subdirectories than the configured limit.
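The invariant tested above (no local directory ever holding more entries than the configured limit, with directories reused after cache cleanup) can be sketched as a small allocator. Everything here is a hypothetical illustration, not the committed LocalCacheDirectoryManager: the class `CacheDirAllocatorSketch` and its methods are made-up names, and the sketch assumes each directory reserves 36 of its entries for subdirectories, so the file budget per directory is the limit minus 36 and limits of 36 or less are rejected. That reservation is only one plausible reading of why <=36 and 37 were tested as boundary values.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sketch (not the committed LocalCacheDirectoryManager):
 * hands out cache directories so that no directory holds more than its
 * file budget, and reuses directories whose files were cleaned up.
 */
class CacheDirAllocatorSketch {
    // Assumption: every directory reserves 36 entries for child directories.
    static final int RESERVED_FOR_SUBDIRS = 36;

    private final int fileBudget;                                // files allowed per directory
    private final List<Integer> fileCounts = new ArrayList<>();  // index = directory number
    private final Deque<Integer> nonFull = new ArrayDeque<>();   // directories with spare capacity

    CacheDirAllocatorSketch(int maxFilesPerDirectory) {
        if (maxFilesPerDirectory <= RESERVED_FOR_SUBDIRS) {
            throw new IllegalArgumentException(
                "max files per directory must exceed " + RESERVED_FOR_SUBDIRS);
        }
        fileBudget = maxFilesPerDirectory - RESERVED_FOR_SUBDIRS;
    }

    /** Returns the directory number that should receive the next localized file. */
    int allocate() {
        Integer dir = nonFull.peekFirst();
        if (dir == null) {
            dir = fileCounts.size();   // open a brand-new directory
            fileCounts.add(0);
            nonFull.addFirst(dir);
        }
        int count = fileCounts.get(dir) + 1;
        fileCounts.set(dir, count);
        if (count == fileBudget) {
            nonFull.removeFirst();     // now full: stop handing it out
        }
        return dir;
    }

    /** Called when cache cleanup deletes one file from the given directory. */
    void release(int dir) {
        int count = fileCounts.get(dir);
        if (count == 0) {
            throw new IllegalStateException("directory " + dir + " is already empty");
        }
        fileCounts.set(dir, count - 1);
        if (count == fileBudget) {
            nonFull.addLast(dir);      // was full, has room again: make it reusable
        }
    }

    int directoriesCreated() {
        return fileCounts.size();
    }

    int maxFilesInAnyDirectory() {
        int max = 0;
        for (int c : fileCounts) {
            max = Math.max(max, c);
        }
        return max;
    }
}
```

With a limit of 40, the sketch places 4 files in each directory; after `release(0)` frees a slot in the full directory 0, the next `allocate()` returns 0 again instead of opening a third directory. To mirror the multi-local-dir test, one instance would be kept per local directory.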
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620412#comment-13620412 ]

Hadoop QA commented on YARN-467:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12576688/yarn-467-20130402.1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/654//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/654//console

This message is automatically generated.
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620384#comment-13620384 ] Hadoop QA commented on YARN-467:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576681/yarn-467-20130402.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalResourcesTrackerImpl
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/652//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/652//console
This message is automatically generated.
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619376#comment-13619376 ] Hadoop QA commented on YARN-467:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576466/yarn-467-20130401.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/641//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/641//console
This message is automatically generated.
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619304#comment-13619304 ] Omkar Vinit Joshi commented on YARN-467:
I ran the test on a Mac and got the results below. I think a default of 8192 files per directory would be good.

||Total number of files||Total time taken (ms)||
|32|4|
|64|7|
|128|15|
|256|27|
|512|60|
|1024|120|
|2048|219|
|4096|524|
|8192|1845|
|16384|7332|

I have incorporated all the comments in the latest patch.
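The comment does not say exactly what the test timed. Assuming it measured creating N empty files inside a single directory, a minimal Java sketch of such a micro-benchmark could look like the following. The class and method names (`DirFileCountBenchmark`, `createFiles`, `timeCreateFiles`, and so on) are hypothetical and not taken from the patch, and absolute timings will depend heavily on the OS and file system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical micro-benchmark (not the test from the patch): time the creation
 * of N empty files inside a single, freshly created directory.
 */
public class DirFileCountBenchmark {

    /** Creates a fresh temporary directory for one run. */
    static Path newTempDir() {
        try {
            return Files.createTempDirectory("filecount-bench");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates n empty files named 0..n-1 directly inside dir. */
    static void createFiles(Path dir, int n) {
        try {
            for (int i = 0; i < n; i++) {
                Files.createFile(dir.resolve(Integer.toString(i)));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the number of direct entries in dir. */
    static long countEntries(Path dir) {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deletes the flat files inside dir, then dir itself. */
    static void deleteFlatDir(Path dir) {
        try {
            List<Path> files;
            try (Stream<Path> entries = Files.list(dir)) {
                files = entries.collect(Collectors.toList());
            }
            for (Path p : files) {
                Files.delete(p);
            }
            Files.delete(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Wall-clock milliseconds to create n files in a fresh directory. */
    static long timeCreateFiles(int n) {
        Path dir = newTempDir();
        try {
            long start = System.nanoTime();
            createFiles(dir, n);
            return (System.nanoTime() - start) / 1_000_000L;
        } finally {
            deleteFlatDir(dir);
        }
    }

    public static void main(String[] args) {
        // Same doubling sequence of file counts as in the table above.
        for (int n = 32; n <= 16384; n *= 2) {
            System.out.println(n + "\t" + timeCreateFiles(n));
        }
    }
}
```

A single cold run per size is noisy; repeating each size a few times and discarding the first run gives more comparable numbers.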
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618997#comment-13618997 ] Siddharth Seth commented on YARN-467:
bq. Another thing I've been looking hard is to see if LocalResourceTracker.localizationCompleted() can be done away with completely in favour of the handle() method. But to do that we need to handle both successful and failing localizations via handle(). I can already see a couple of bugs related to localization failures, so let's do this separately.
That could be the route to reach the LocalizedResources, instead of sending events to them directly. In any case, this can be figured out in the follow-up JIRAs.

I had looked at this patch earlier as well; functionally, it mostly looks good. It was a little tough to read; hopefully some of the changes suggested by Vinod will make that easier.
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616995#comment-13616995 ] Hadoop QA commented on YARN-467:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576003/yarn-467-20130328.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/627//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/627//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/627//console
This message is automatically generated.
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616558#comment-13616558 ] Vinod Kumar Vavilapalli commented on YARN-467:
Another thing I've been looking at hard is whether LocalResourceTracker.localizationCompleted() can be done away with completely in favour of the handle() method. To do that, though, we need to handle both successful and failed localizations via handle(). I can already see a couple of bugs related to localization failures, so let's do this separately.
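To illustrate what routing both outcomes through handle() could look like, here is a minimal Java sketch under stated assumptions: the event kinds (`REQUEST`, `LOCALIZED`, `FAILED`), the `Event` class and the `EventDrivenTracker` name are all hypothetical, and a single counter stands in for the per-directory file counts. On failure the tracker releases the reservation it made on request, which is the job the proposed design gives to localizationCompleted(false).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: a resource tracker in which successful and failed
 * localizations both arrive through handle(), so no separate
 * localizationCompleted(success) call is needed. Names are illustrative only.
 */
public class EventDrivenTracker {

    public enum Kind { REQUEST, LOCALIZED, FAILED }

    /** Hypothetical event: what happened, and to which resource. */
    public static final class Event {
        final Kind kind;
        final String resourceKey;

        public Event(Kind kind, String resourceKey) {
            this.kind = kind;
            this.resourceKey = resourceKey;
        }
    }

    // Resource key -> path reserved while localization is in progress.
    private final Map<String, String> inProgress = new HashMap<>();
    // Resource key -> path of a successfully localized resource.
    private final Map<String, String> localized = new HashMap<>();
    // Stand-in for the per-directory file counts of a hierarchical directory.
    private int reservedFiles = 0;
    private int nextFileId = 0;

    /** Single entry point for every state change of a resource. */
    public synchronized void handle(Event e) {
        switch (e.kind) {
            case REQUEST: {
                // A repeated request for a known resource keeps its existing path.
                if (!inProgress.containsKey(e.resourceKey)
                        && !localized.containsKey(e.resourceKey)) {
                    reservedFiles++;
                    inProgress.put(e.resourceKey, Integer.toString(nextFileId++));
                }
                break;
            }
            case LOCALIZED: {
                String path = inProgress.remove(e.resourceKey);
                if (path != null) {
                    localized.put(e.resourceKey, path);
                }
                break;
            }
            case FAILED: {
                // Release the reservation so the slot can be handed out again.
                if (inProgress.remove(e.resourceKey) != null) {
                    reservedFiles--;
                }
                break;
            }
        }
    }

    public synchronized int reservedFiles() {
        return reservedFiles;
    }

    public synchronized boolean isLocalized(String resourceKey) {
        return localized.containsKey(resourceKey);
    }
}
```

Keeping every transition in one synchronized method means the reservation count and both maps always change together, which is one way to avoid the failure-path bugs mentioned above.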
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615908#comment-13615908 ] Omkar Vinit Joshi commented on YARN-467:
Adding tests to validate the expected behavior :-
* TestHierarchicalDirectory
** testHierarchicalSubDirectoryCreation :- tests the following scenarios:
*** Limiting the number of files per directory to YarnConfiguration.NM_LOCAL_CACHE_NUM_FILES_PER_DIRECTORY (a count that includes the 36 sub-directories).
*** If a file is removed (decFileCountForPath call) from any sub-directory, those sub-directories are reused in the order in which their state changes to DirectoryState.VACANT.
*** Checks path generation up to the 2nd level.
** testMinimumPerDirectoryFileLimit :- tests the case where the configuration parameter is set to a value <= 36.
* TestLocalResourcesTrackerImpl
** testMinimumPerDirectoryFileLimit :- tests Public resources with the HierarchicalDirectory structure.
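The behavior these tests check (a per-directory limit that includes 36 sub-directory slots, reuse of sub-directories in the order they become vacant, and generated paths such as "0" and "0/a") can be sketched in Java as below. This hypothetical `HierarchicalDirectorySketch` is not the patch's class: the level-by-level numbering in `relativePath` (all one-character names first, then "0/0", "0/1", ...) and rejecting limits of 36 or less in the constructor are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/**
 * Hypothetical sketch of a hierarchical cache directory: each (sub-)directory
 * holds at most perDirFileLimit entries, 36 of which are kept for the
 * sub-directories named 0-9 and a-z. Not the implementation from the patch.
 */
public class HierarchicalDirectorySketch {

    static final int DIRECTORIES_PER_LEVEL = 36;

    // Files allowed per sub-directory once the 36 sub-directory slots are set aside.
    private final int maxFilesPerSubDir;
    // Relative paths that can still accept files, in the order they became vacant.
    private final Queue<String> vacantSubDirectories = new ArrayDeque<>();
    // Relative path ("" is the root) -> number of files reserved in it.
    private final Map<String, Integer> fileCounts = new HashMap<>();
    // Number of non-root sub-directories created so far.
    private long totalSubDirectories = 0;

    public HierarchicalDirectorySketch(int perDirFileLimit) {
        // Assumption: limits that leave no room for files are rejected outright.
        if (perDirFileLimit <= DIRECTORIES_PER_LEVEL) {
            throw new IllegalArgumentException(
                "per-directory file limit must be greater than " + DIRECTORIES_PER_LEVEL);
        }
        maxFilesPerSubDir = perDirFileLimit - DIRECTORIES_PER_LEVEL;
        fileCounts.put("", 0);
        vacantSubDirectories.add("");
    }

    /**
     * Maps sub-directory number n >= 1 to a relative path, level by level:
     * 1..36 -> "0".."z", 37 -> "0/0", 38 -> "0/1", ..., then three levels, and so on.
     */
    static String relativePath(long n) {
        long rem = n - 1;
        int levels = 1;
        long levelSize = DIRECTORIES_PER_LEVEL;
        while (rem >= levelSize) {                // skip every complete shallower level
            rem -= levelSize;
            levels++;
            levelSize *= DIRECTORIES_PER_LEVEL;
        }
        char[] digits = new char[levels];
        for (int i = levels - 1; i >= 0; i--) {   // zero-padded base-36 digits
            digits[i] = Character.forDigit((int) (rem % DIRECTORIES_PER_LEVEL), DIRECTORIES_PER_LEVEL);
            rem /= DIRECTORIES_PER_LEVEL;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < levels; i++) {
            if (i > 0) {
                sb.append('/');
            }
            sb.append(digits[i]);
        }
        return sb.toString();
    }

    /** Reserves one file slot and returns the relative path of the sub-directory to use. */
    public synchronized String getHierarchicalPath() {
        if (vacantSubDirectories.isEmpty()) {
            // Only tracking state is created here; nothing is created on disk.
            String created = relativePath(++totalSubDirectories);
            fileCounts.put(created, 0);
            vacantSubDirectories.add(created);
        }
        String path = vacantSubDirectories.peek();
        int count = fileCounts.get(path) + 1;
        fileCounts.put(path, count);
        if (count == maxFilesPerSubDir) {
            vacantSubDirectories.remove();        // full: stop handing this one out
        }
        return path;
    }

    /** Releases one file slot in the given sub-directory. */
    public synchronized void decFileCountForPath(String path) {
        Integer count = fileCounts.get(path);
        if (count == null || count == 0) {
            return;                               // unknown path or nothing reserved
        }
        if (count == maxFilesPerSubDir) {
            vacantSubDirectories.add(path);       // full -> vacant again, queued at the back
        }
        fileCounts.put(path, count - 1);
    }
}
```

With a limit of 37 (one file per directory) the first calls return "", "0" and "1"; releasing a file in "0" makes "0" the next path handed out, before a new "2" is created. The FIFO queue is what gives the "reused in the order they became vacant" behavior, and `relativePath(47)` yields "0/a".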
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615894#comment-13615894 ] Omkar Vinit Joshi commented on YARN-467:
The underlying problem here is that resource localization tries to put more files into a single directory than the underlying local file system allows per directory.

Proposed solution :- (for Public resources, localized under /filecache/)
We are going to maintain a hierarchical directory structure for filecache inside each local directory, so the directory structure will look like this:
.../filecache/
.../filecache/<36 directories (0-9 & a-z)>/
.../filecache/<36 directories (0-9 & a-z)>/<36 directories (0-9 & a-z)>
So every directory will hold at most (8192-36) localized files plus 36 sub-directories named 0-9 and a-z. These sub-directories are created only when they are required, not in advance. Every sub-directory has the same structure in turn.
To manage files and limit the number of files per directory to HierarchicalDirectory#PER_DIR_FILE_LIMIT (8192 in this case), I am introducing the classes / implementation below.
* LocalResourcesTrackerImpl :-
** maintainHierarchicalDir :- a boolean flag. Set it when this resource tracker should track resources using the hierarchical directory structure.
** directoryMap :- map from local-dir path to HierarchicalDirectory. It makes sure there is one HierarchicalDirectory for every local path. (For example, if two local-dirs are configured, it will have 2 entries.)
** inProgressRsrcMap :- map from LocalResourceRequest to its reserved path, used while a resource is being localized. This map helps in two ways:
*** If localization fails for that resource, we can retrieve the path and release the file reservation (file count).
*** If the same LocalResourceRequest comes again (which is highly unlikely in today's implementation), it can return the same path.
** getPathForLocalResource :- Call this method to retrieve the hierarchical directory path for the local-dir identified by localDirPath. Internally it adds the request and the returned path to inProgressRsrcMap and makes a reservation in the HierarchicalDirectory tracking this local-dir path.
** decFileCountForHierarchicalPath :- Retrieves the localized path from either inProgressRsrcMap or the LocalizedResource, and then reduces the file count of the HierarchicalDirectory tracking it.
** localizationCompleted :- (parameter: success) If true, it only updates inProgressRsrcMap; otherwise it updates inProgressRsrcMap and also calls decFileCountForHierarchicalPath.
* HierarchicalDirectory :- Helps manage the hierarchical directories.
** PER_DIR_FILE_LIMIT :- Controls the number of files per directory / sub-directory. It is configurable (YarnConfiguration.NM_LOCAL_CACHE_NUM_FILES_PER_DIRECTORY) but should not be set too low.
** DIRECTORIES_PER_LEVEL (constant 36) :- Every directory / sub-directory will have at most 36 sub-directories (0-9 and a-z), created only when required. Single-character names are used because of the file path length limit on Windows.
** vacantSubDirectories :- queue of sub-directories that can still accept files. At the beginning it holds the root of the HierarchicalDirectory as its only sub-directory. If the queue becomes empty, a new sub-directory is created, starting with 0. Note :- this only creates internal tracking; it does not create an actual directory on the file system.
** knownSubDirectories :- map from relative path to sub-directory. The root directory is identified by an empty string "" and the other sub-directories by their relative paths, e.g. "0" for directory 0 and "0/a" for 0/a.
** getHierarchicalPath :- (synchronized) Returns the relative path of a vacant sub-directory (one that has not reached its file limit). If no vacant sub-directory is present, it creates one using totalSubDirectories.
** decFileCountForPath :- (synchronized) Reduces the file count of the HierarchicalSubDirectory representing the passed-in relative path.
> Jobs fail during resource localization when public distributed-cache hits > unix directory limits > --- > > Key: YARN-467 > URL: https://issues.apache.org/jira/browse/YARN-467 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0, 2.0.0-alpha >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, > yarn-467-20130322.3.patch, yarn-467-20130322.patch, > yarn-467-20130325.1.patch, yarn-467-20130325.path > > > If we have multiple jobs which uses distributed cache with small size of > files, the directo