[ https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zac Zhou updated YARN-8698:
---------------------------
Description:
When a standalone Submarine TensorFlow job is submitted, the following error occurs:
INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
(unable to get root cause for java.lang.NoClassDefFoundError)
(unable to get stack trace for java.lang.NoClassDefFoundError)
hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
(unable to get root cause for java.lang.NoClassDefFoundError)
(unable to get stack trace for java.lang.NoClassDefFoundError)
This error may be related to the Hadoop classpath.
The Hadoop environment variables in launch_container.sh are as follows:
export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}
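For reference, the `${VAR:-default}` form used in these exports falls back to the default when the variable is unset or empty. The sketch below illustrates that behavior; `DEMO_HOME` and `resolve_home` are hypothetical names used only for illustration and do not appear in launch_container.sh.

```shell
# DEMO_HOME and resolve_home are hypothetical, for illustration only.
resolve_home() {
  # ${DEMO_HOME:-...} substitutes the default when DEMO_HOME is unset OR empty.
  echo "${DEMO_HOME:-/home/hadoop/yarn-submarine}"
}

unset DEMO_HOME
resolve_home            # unset: prints /home/hadoop/yarn-submarine

DEMO_HOME=""
resolve_home            # empty: prints /home/hadoop/yarn-submarine as well

DEMO_HOME="/hadoop-3.1.0"
resolve_home            # set and non-empty: prints /hadoop-3.1.0
```

An empty export such as `export HADOOP_YARN_HOME=` followed by this kind of expansion would therefore yield the default path, while a non-empty value exported beforehand would be kept.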
run-PRIMARY_WORKER.sh looks like this:
export HADOOP_YARN_HOME=
export HADOOP_HDFS_HOME=/hadoop-3.1.0
export HADOOP_CONF_DIR=$WORK_DIR
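One way to test the classpath hypothesis: libhdfs starts a JVM and locates the Hadoop classes through the CLASSPATH environment variable, and the script excerpts above do not show a CLASSPATH export. The following is a minimal sketch, not the project's fix. It assumes the Hadoop directory inside the container follows the standard layout with a `bin/hadoop` launcher. `setup_hdfs_classpath` is a hypothetical helper name.

```shell
# Hypothetical helper: populate CLASSPATH for libhdfs from a Hadoop install.
# Assumption: the given directory contains an executable bin/hadoop launcher.
setup_hdfs_classpath() {
  hadoop_home="$1"
  if [ -z "$hadoop_home" ]; then
    echo "Hadoop home is empty or unset" >&2
    return 1
  fi
  if [ ! -x "$hadoop_home/bin/hadoop" ]; then
    echo "no executable hadoop launcher under $hadoop_home/bin" >&2
    return 1
  fi
  # 'hadoop classpath --glob' prints the classpath with wildcard entries
  # expanded into individual jar paths.
  CLASSPATH="$("$hadoop_home/bin/hadoop" classpath --glob)" || return 1
  export CLASSPATH
}
```

In a worker script this could be called as `setup_hdfs_classpath "$HADOOP_HDFS_HOME"` before the training command is started. If the helper fails, the Hadoop directory is probably missing or incomplete in the container image.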
was:
When a standalone Submarine TensorFlow job was submitted, the following error occurred:
INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
(unable to get root cause for java.lang.NoClassDefFoundError)
(unable to get stack trace for java.lang.NoClassDefFoundError)
hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
(unable to get root cause for java.lang.NoClassDefFoundError)
(unable to get stack trace for java.lang.NoClassDefFoundError)
This error may be related to the Hadoop classpath.
The Hadoop environment variables in launch_container.sh are as follows:
export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}
run-PRIMARY_WORKER.sh looks like this:
export HADOOP_YARN_HOME=
export HADOOP_HDFS_HOME=/hadoop-3.1.0
export HADOOP_CONF_DIR=$WORK_DIR
> Failed to add hadoop dependencies in docker container when submitting a
> submarine job
> -------------------------------------------------------------------------------------
>
> Key: YARN-8698
> URL: https://issues.apache.org/jira/browse/YARN-8698
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Zac Zhou
> Priority: Major