Hi all,

I am using Hadoop 3.2.0 and trying a few examples that use Submarine to run
TensorFlow jobs in a Docker container.
I would like to understand a few details about reading and writing HDFS data
during and after application launch and execution. I have highlighted the
questions below.

When launching an application that reads its input from HDFS, we set
*--input_path* to an HDFS path, as in the standard example:

yarn jar hadoop-yarn-applications-submarine-<version>.jar job run \
 --name tf-job-001 --docker_image <your docker image> \
 --input_path hdfs://default/dataset/cifar-10-data \
 --checkpoint_path hdfs://default/tmp/cifar-10-jobdir \
 --env DOCKER_JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/ \
 --env DOCKER_HADOOP_HDFS_HOME=/hadoop-3.1.0 \
 --num_workers 2 \
 --worker_resources memory=8G,vcores=2,gpu=1 --worker_launch_cmd "cmd for worker ..." \
 --num_ps 2 \
 --ps_resources memory=4G,vcores=2,gpu=0 --ps_launch_cmd "cmd for ps"

*Question 1: What if I have more than one dataset, each in a separate HDFS
path? Can --input_path take multiple paths in some form, or am I expected to
keep all the datasets under one path?*

"DOCKER_JAVA_HOME points to JAVA_HOME inside Docker image"

*Question 2: What exactly is expected here? That is, is there any
relation or connection to the Hadoop installation running outside the Docker
container? I guess HDFS data is read into the Docker container during
container localization, but how is the output data written back to the HDFS
running outside the Docker container?*

Assume a scenario where Application 1 creates a model and Application 2
performs scoring. The two applications run in separate Docker
containers. I would like to understand how data is read and written
across applications in this case.
It would be a great help if anyone could guide me in understanding this or
direct me to a blog or write-up that explains the above.

*Thanks and regards*
*Vinay Kashyap*
