[ https://issues.apache.org/jira/browse/YARN-8667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855539#comment-16855539 ]
zhoukang commented on YARN-8667:
--------------------------------
This causes the following error:
{code:java}
echo "Setting up job resources"
ln -sf -- "/home/work/hdd5/yarn/c4prc-preview/nodemanager/usercache/hdfs_prc/filecache/23/__spark_conf__.zip" "__spark_conf__"
ln -sf -- "/home/work/hdd4/yarn/c4prc-preview/nodemanager/usercache/hdfs_prc/filecache/22/__spark_libs__1672741658354675955.zip" "__spark_libs__"
ln -sf -- "/home/work/hdd3/yarn/c4prc-preview/nodemanager/filecache/22/oom_script.sh" "oom_script.sh"
ln -sf -- "/home/work/hdd6/yarn/c4prc-preview/nodemanager/usercache/hdfs_prc/filecache/24/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar" "__app__.jar"
ln -sf -- "/home/work/hdd9/yarn/c4prc-preview/nodemanager/filecache/21/pmap_watcher.sh" "watcher.sh"
echo "Copying debugging information"

Log Type: prelaunch.err
Log Upload Time: Tue Jun 04 17:05:22 +0800 2019
Log Length: 297
find: File system loop detected; ‘./__spark_libs__/__spark_libs__1672741658354675955.zip’ is part of the same file system loop as ‘./__spark_libs__’.
find: File system loop detected; ‘./__spark_conf__/__spark_conf__.zip’ is part of the same file system loop as ‘./__spark_conf__’.
{code}
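The loop can be reproduced outside YARN. The sketch below assumes GNU coreutils `ln` and GNU findutils `find`; `base`, `cache` and `work` are hypothetical stand-ins for the NodeManager filecache entry of an unpacked archive and for the container work directory. Because an archive is localized as a directory, the first `ln -sf` creates `__spark_conf__` as a symlink to that directory. When the same launch commands run again in the same work directory, `__spark_conf__` already resolves to a directory, so `ln` (without `-n`) places the new link inside it, and that link points back at its own parent directory.

```shell
#!/usr/bin/env bash
# Minimal reproduction sketch (hypothetical paths; assumes GNU ln and GNU find).
set -u

base=$(mktemp -d)
# Hypothetical stand-in for the localized, unpacked archive in the NM filecache.
cache="$base/filecache/23/__spark_conf__.zip"
mkdir -p "$cache"
# Hypothetical stand-in for the container work directory.
work="$base/container_work"
mkdir -p "$work"
cd "$work" || exit 1

# First launch: creates the symlink __spark_conf__ -> $cache.
ln -sf -- "$cache" "__spark_conf__"

# Relaunch in the same work directory: __spark_conf__ now resolves to a
# directory, so ln creates __spark_conf__/__spark_conf__.zip -> $cache,
# i.e. a link inside $cache that points back at $cache itself.
ln -sf -- "$cache" "__spark_conf__"

# Shows the stray __spark_conf__.zip link inside the cache directory.
ls -l "$cache"

# Following symlinks now reports a file system loop on stderr.
find -L . -maxdepth 5 > /dev/null || echo "find exited with status $?"
```

In this sketch the stray link lands inside the stand-in cache directory, matching the `./__spark_conf__/__spark_conf__.zip` path reported in prelaunch.err.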
> Cleanup symlinks when container restarted by NM to solve issue "find: File system loop detected;" for tar ball artifacts.
> -------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-8667
> URL: https://issues.apache.org/jira/browse/YARN-8667
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Rohith Sharma K S
> Assignee: Chandni Singh
> Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8667.001.patch, YARN-8667.002.patch
>
>
> A service is launched with tarball artifacts. If a container exits for any
> reason, the container relaunch policy tries to relaunch it on the same node
> with the same container work directory. As a result, the container relaunch
> keeps failing.
> If the container relaunch max-retry policy is disabled, the container is never
> launched on any other node; instead, it keeps retrying on the same
> NodeManager, which never succeeds.
> {code}
> Relaunching Container container_e05_1533635581781_0001_01_000002. Remaining retry attempts(after relaunch) : -4816.
> {code}
> There are two issues:
> # The container relaunch keeps failing.
> # The log message is misleading: it reports a negative number of remaining retry attempts.
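The issue title proposes cleaning up the symlinks when the NodeManager restarts a container. That cleanup happens inside the NodeManager, so the shell sketch below only illustrates the idea and is not the YARN-8667 patch. `link_resource` is a hypothetical helper: it removes a stale symlink before re-creating it, so `ln` never dereferences an existing link to a directory. The paths are the same kind of hypothetical stand-ins as in a reproduction of the loop.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: the real fix lives in the NodeManager.
# link_resource is a hypothetical helper, not part of YARN.
set -u

# Create (or re-create) the symlink LINK -> TARGET in the current directory,
# removing any stale symlink first so that ln never descends into it.
link_resource() {
  local target=$1 link=$2
  if [ -L "$link" ]; then
    # Removes the symlink itself, not the directory it points to.
    rm -f -- "$link"
  fi
  ln -s -- "$target" "$link"
}

base=$(mktemp -d)
# Hypothetical stand-ins for the filecache entry and the container work directory.
cache="$base/filecache/23/__spark_conf__.zip"
mkdir -p "$cache" "$base/container_work"
cd "$base/container_work" || exit 1

link_resource "$cache" "__spark_conf__"   # first launch
link_resource "$cache" "__spark_conf__"   # relaunch in the same work directory

ls -A "$cache"                 # prints nothing: no link was created inside the cache dir
find -L . -maxdepth 5 -ls      # walks the tree without a file system loop
```

With GNU `ln`, `ln -sfn` (`--no-dereference`) has a similar effect for a single link, since it replaces an existing symlink to a directory instead of creating the new link inside that directory.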
--
This message was sent by Atlassian JIRA (v7.6.3#76005)