[jira] [Updated] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits

2013-04-08 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-99:
--

Attachment: yarn-99-20130408.1.patch

> Jobs fail during resource localization when private distributed-cache hits 
> unix directory limits
> 
>
> Key: YARN-99
> URL: https://issues.apache.org/jira/browse/YARN-99
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.0.0-alpha
>Reporter: Devaraj K
>Assignee: Omkar Vinit Joshi
> Attachments: yarn-99-20130324.patch, yarn-99-20130403.1.patch, 
> yarn-99-20130403.patch, yarn-99-20130408.1.patch, yarn-99-20130408.patch
>
>
> If multiple jobs use the distributed cache with many small files, the 
> per-directory entry limit is hit before the cache-size limit is reached, and 
> no further directories can be created in the file cache. Jobs then start 
> failing with the exception below.
> {code:java}
> java.io.IOException: mkdir of 
> /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
>   at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> {code}
> We should have a mechanism to clean up cached files once the number of 
> directories crosses a specified limit, analogous to the existing cache-size limit.
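One possible way to avoid the per-directory limit entirely is to spread localized resources across a nested directory tree with bounded fan-out. The sketch below is purely illustrative: the class name and scheme are hypothetical and not necessarily what the attached patches implement.

```java
// Hypothetical sketch (not taken from the attached patches): map the n-th
// localized cache entry to a nested subdirectory so that no single
// directory accumulates enough children to hit the unix per-directory
// entry limit that causes the mkdir failure above.
public class CacheDirLayout {
    private final int perDirLimit;

    public CacheDirLayout(int perDirLimit) {
        this.perDirLimit = perDirLimit;
    }

    /**
     * Relative subdirectory for the n-th localized resource (0-based).
     * The leaf-directory index n / perDirLimit is encoded as nested
     * base-perDirLimit path components. Each directory then holds at most
     * perDirLimit files plus at most perDirLimit subdirectories, so a real
     * implementation would pick perDirLimit as roughly half the
     * filesystem's per-directory entry limit.
     */
    public String relativePathFor(long n) {
        long dirIndex = n / perDirLimit;
        StringBuilder path = new StringBuilder();
        while (dirIndex > 0) {
            path.insert(0, (dirIndex % perDirLimit) + "/");
            dirIndex /= perDirLimit;
        }
        return path.toString();
    }
}
```

For example, with a limit of 10, resources 0-9 land in the cache root, resources 10-19 in subdirectory "1/", and resource 235 in "2/3/". An LRU cleanup pass over this tree would then bound the directory count the same way the cache-size limit bounds bytes.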

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits

2013-04-08 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-99:
--

Attachment: yarn-99-20130408.patch




[jira] [Updated] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits

2013-04-04 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-99:


Issue Type: Sub-task  (was: Bug)
Parent: YARN-543




[jira] [Updated] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits

2013-04-03 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-99:
--

Attachment: yarn-99-20130403.1.patch




[jira] [Updated] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits

2013-04-03 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-99:
--

Attachment: yarn-99-20130403.patch




[jira] [Updated] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits

2013-03-27 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-99:


Summary: Jobs fail during resource localization when private 
distributed-cache hits unix directory limits  (was: Jobs fail during resource 
localization when directories in file cache reaches to unix directory limit)

