[ https://issues.apache.org/jira/browse/YARN-8667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855539#comment-16855539 ]

zhoukang commented on YARN-8667:
--------------------------------

This causes the following error:

{code:bash}
echo "Setting up job resources"
ln -sf -- "/home/work/hdd5/yarn/c4prc-preview/nodemanager/usercache/hdfs_prc/filecache/23/__spark_conf__.zip" "__spark_conf__"
ln -sf -- "/home/work/hdd4/yarn/c4prc-preview/nodemanager/usercache/hdfs_prc/filecache/22/__spark_libs__1672741658354675955.zip" "__spark_libs__"
ln -sf -- "/home/work/hdd3/yarn/c4prc-preview/nodemanager/filecache/22/oom_script.sh" "oom_script.sh"
ln -sf -- "/home/work/hdd6/yarn/c4prc-preview/nodemanager/usercache/hdfs_prc/filecache/24/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar" "__app__.jar"
ln -sf -- "/home/work/hdd9/yarn/c4prc-preview/nodemanager/filecache/21/pmap_watcher.sh" "watcher.sh"
echo "Copying debugging information"

Log Type: prelaunch.err

Log Upload Time: Tue Jun 04 17:05:22 +0800 2019

Log Length: 297

find: File system loop detected; ‘./__spark_libs__/__spark_libs__1672741658354675955.zip’ is part of the same file system loop as ‘./__spark_libs__’.
find: File system loop detected; ‘./__spark_conf__/__spark_conf__.zip’ is part of the same file system loop as ‘./__spark_conf__’.
{code}
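A minimal reproduction sketch of how a relaunch in the same work dir could produce the loop above, assuming GNU coreutils {{ln}} and GNU findutils {{find}}. The scratch directories are hypothetical stand-ins for the NodeManager filecache and the container work dir. The point it illustrates: when {{NAME}} already exists as a symlink to a directory (an unpacked archive), {{ln -sf -- TARGET NAME}} dereferences it and creates the new link inside that directory instead of replacing {{NAME}}, leaving a link that points at its own parent.

```shell
#!/usr/bin/env bash
# Reproduce the "File system loop detected" error in a scratch directory.
# Hypothetical layout; assumes GNU coreutils ln and GNU findutils find.
set -eu

scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
# A localized archive such as __spark_conf__.zip is unpacked into a directory.
cache="$scratch/filecache/23/__spark_conf__.zip"
work="$scratch/container_work_dir"
mkdir -p "$cache" "$work"

# First launch: creates $work/__spark_conf__ -> $cache
ln -sf -- "$cache" "$work/__spark_conf__"

# Relaunch in the same work dir: the link name already exists and resolves to
# a directory, so ln (without -n) creates the new link INSIDE that directory:
#   $cache/__spark_conf__.zip -> $cache   (a link to its own parent)
ln -sf -- "$cache" "$work/__spark_conf__"
ls -l "$cache"

# find -L follows symlinks and reports the cycle on stderr.
if LC_ALL=C find -L "$work" 2>&1 >/dev/null | grep -q 'loop'; then
  echo "find reported a file system loop"
fi
```

The nested {{__spark_conf__.zip}} link inside the cached directory matches the path named in {{prelaunch.err}}.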


> Cleanup symlinks when container restarted by NM to solve issue "find: File system loop detected;" for tar ball artifacts.
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8667
>                 URL: https://issues.apache.org/jira/browse/YARN-8667
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Rohith Sharma K S
>            Assignee: Chandni Singh
>            Priority: Critical
>             Fix For: 3.2.0, 3.1.2
>
>         Attachments: YARN-8667.001.patch, YARN-8667.002.patch
>
>
> A service is launched with TARBALL artifacts. If a container exits for any
> reason, the container relaunch policy tries to relaunch it on the same node
> with the same container workspace. As a result, the container relaunch keeps
> failing.
> If the container relaunch max-retry policy is disabled, the container is
> never launched on any other node; instead it keeps retrying on the same
> NodeManager, which never succeeds.
> {code}
> Relaunching Container container_e05_1533635581781_0001_01_000002. Remaining retry attempts(after relaunch) : -4816.
> {code}
> There are two issues:
> # Container relaunch keeps failing.
> # The log message is misleading: the remaining retry count it reports is negative.
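For the first issue, two hypothetical ways to make re-linking safe on relaunch are sketched below; this is an illustration in the spirit of the issue title, not the committed YARN-8667 patch. Option A deletes only the symlinks that the previous attempt left at the top level of the work dir before re-creating them. Option B uses {{ln -sfn}}, where {{-n}} ({{--no-dereference}}) tells GNU {{ln}} to replace an existing symlink to a directory rather than create a link inside it. The scratch directories are assumed stand-ins for the real NodeManager paths; GNU coreutils and findutils are assumed.

```shell
#!/usr/bin/env bash
# Two hypothetical ways to make re-linking safe when a container is relaunched
# in the same work dir. Not the YARN-8667 patch; assumes GNU coreutils/findutils.
set -eu

scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
cache="$scratch/filecache/23/__spark_conf__.zip"   # unpacked archive (a directory)
work="$scratch/container_work_dir"
mkdir -p "$cache" "$work"

# Option A: delete symlinks left by the previous attempt, then re-create them.
# -type l matches only entries that are themselves symlinks; -maxdepth 1 limits
# the cleanup to the top level of the work dir.
link_resources() {
  find "$work" -maxdepth 1 -type l -exec rm -f -- {} +
  ln -sf -- "$cache" "$work/__spark_conf__"
}
link_resources   # first launch
link_resources   # relaunch in the same work dir

# Option B: -n (--no-dereference) makes ln replace an existing symlink to a
# directory instead of creating a new link inside that directory.
ln -sfn -- "$cache" "$work/__spark_conf__"
ln -sfn -- "$cache" "$work/__spark_conf__"

if [ ! -L "$cache/__spark_conf__.zip" ]; then
  echo "no nested link, no loop"
fi
```

Option B only helps when the existing name is a symlink; if a previous attempt left a real directory under that name, an explicit cleanup such as option A is still needed.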



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
