[
https://issues.apache.org/jira/browse/YARN-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899837#comment-16899837
]
wangchengwei commented on YARN-9616:
------------------------------------
Hi [~wzzdreamer], I have figured out a solution to this issue.
The cause is that the NM automatically unpacks packed archive files after
localization, so the _SharedCacheUploader_ cannot find the original archive
files and throws _FileNotFoundException_. As a result, packed archive files
are never uploaded to the shared cache, and they are uploaded and localized
again and again.
All original resource files are uploaded to an HDFS path (the staging
directory or a path specified by the user) before the job is submitted, so
every resource file can be found on HDFS. Since the original files of packed
archives cannot be found on the NM, we can get them from their HDFS path
rather than from the NM local path. So the solution to this issue is:
# *check whether the resource is a packed archive before uploading*
# *if not, upload it from the NM local path*
# *if yes, copy the original file in HDFS to the shared cache path*
This solution resolved the issue in my tests. I have submitted the patch
here; please review it if possible.
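The three steps above can be sketched roughly as follows. This is only an illustration of the decision logic, not the code in the patch: the class {{UploadSourceSketch}}, the {{Resource}} holder, and both method names are hypothetical, and plain {{java.nio.file}} paths stand in for the NM local directory, the HDFS staging path, and the shared cache directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Illustrative sketch only; names are hypothetical, not taken from the patch. */
public class UploadSourceSketch {

    /** Hypothetical holder for what the uploader knows about one resource. */
    public static final class Resource {
        public final Path localPath;      // where the NM localized the resource
        public final Path originPath;     // where the client staged it (HDFS in the real system)
        public final boolean packedArchive; // true if the NM unpacks it after localization

        public Resource(Path localPath, Path originPath, boolean packedArchive) {
            this.localPath = localPath;
            this.originPath = originPath;
            this.packedArchive = packedArchive;
        }
    }

    /** Steps 1-3: packed archives are read from their origin, everything else locally. */
    public static Path chooseUploadSource(Resource r) {
        return r.packedArchive ? r.originPath : r.localPath;
    }

    /** Copies the chosen source into the (stand-in) shared cache directory. */
    public static Path upload(Resource r, Path sharedCacheDir) throws IOException {
        Path source = chooseUploadSource(r);
        Files.createDirectories(sharedCacheDir);
        Path target = sharedCacheDir.resolve(source.getFileName().toString());
        return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

With a resource whose local path is the unpacked directory {{.../filecache/352/one.zip}}, {{upload}} never opens that directory; it copies the staged {{one.zip}} instead, which is the case that currently fails with _FileNotFoundException_.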
> Shared Cache Manager Failed To Upload Unpacked Resources
> --------------------------------------------------------
>
> Key: YARN-9616
> URL: https://issues.apache.org/jira/browse/YARN-9616
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.8.3, 2.9.2, 2.8.5
> Reporter: zhenzhao wang
> Assignee: zhenzhao wang
> Priority: Major
>
> YARN will unpack archive files and some other files based on the file type
> and configuration. E.g.
> If I start an MR job with -archive one.zip, then one.zip will be
> unpacked during download. Let's say there are file1 and file2 inside one.zip.
> Then the files kept on local disk will look like
> /disk3/yarn/local/filecache/352/one.zip/file1
> and /disk3/yarn/local/filecache/352/one.zip/file2. So the shared cache
> uploader can't upload one.zip to the shared cache, as it was removed during
> localization. The following error will be thrown.
> {code:java}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader:
> Exception while uploading the file dict.zip
> java.io.FileNotFoundException: File
> /disk3/yarn/local/filecache/352/one.zip/one.zip does not exist
> at
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:631)
> at
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:857)
> at
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:621)
> at
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
> at
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
> at
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:926)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.computeChecksum(SharedCacheUploader.java:257)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:128)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:55)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}