[ https://issues.apache.org/jira/browse/SUBMARINE-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823335#comment-16823335 ]
Szilard Nemeth commented on SUBMARINE-54:
-----------------------------------------

Hi [~tangzhankun]!

Thanks for your review comments! Here are my answers:

1. Good catch, the comments for these classes were indeed wrong.
2. Yes, it is replaceable, as the strings are the same. Fixed that!
3. componentToLocalLaunchScriptPath is a field that comes from YarnServiceJobSubmitter. In the original code, the field was only updated / cleared because the test uses this mapping. I wanted to emphasize this more with the current code so that we can refactor it later. If you're fine with it, I would rather file a follow-up jira.
4. I'm trying to make the single worker TF job work. So far I only have a successfully submitted service conf, but I get some exceptions from the Service AM because the script that sets up Hadoop is not prepared for deploying ZK and its configuration on all cluster machines.

FYI, the command I'm using to start up the single node training job is:
{code:bash}
/opt/hadoop/bin/yarn jar /home/systest/hadoop-yarn-submarine-3.3.0-SNAPSHOT.jar job run \
  --name tf-job-001 --verbose --docker_image hadoopsubmarine/tf-1.8.0-gpu:0.0.1 \
  --input_path hdfs://default/dataset/cifar-10-data \
  --env DOCKER_JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre \
  --env DOCKER_HADOOP_HDFS_HOME=/hadoop-3.1.0 \
  --num_workers 1 --worker_resources memory=5G,vcores=2 \
  --worker_launch_cmd "cd /test/models/tutorials/image/cifar10_estimator && python cifar10_main.py --data-dir=%input_path% --job-dir=%checkpoint_path% --train-steps=10000 --eval-batch-size=16 --train-batch-size=16 --num-gpus=2 --sync" \
  --tensorboard --tensorboard_docker_image wtan/tf-1.8.0-cpu:0.0.3
{code}
This is almost identical to the one in the Submarine Examples documentation.
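The %input_path% and %checkpoint_path% tokens in the --worker_launch_cmd are placeholders that Submarine fills in with the job's input path and checkpoint path before the worker is launched. The snippet below is a minimal sketch of that kind of textual substitution, assuming a plain string replacement of %name% tokens. It is not Submarine's actual code: the class LaunchCmdPlaceholders, the method expand and the checkpoint path value are all hypothetical, chosen for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Submarine's implementation): expands %name%
 * placeholders in a worker launch command.
 */
final class LaunchCmdPlaceholders {

  /** Replaces every %key% with its value; unknown placeholders are left untouched. */
  public static String expand(String launchCmd, Map<String, String> values) {
    String result = launchCmd;
    for (Map.Entry<String, String> e : values.entrySet()) {
      // String.replace is a literal replacement (no regex), so values containing
      // characters such as '$' or '\' are inserted verbatim.
      result = result.replace("%" + e.getKey() + "%", e.getValue());
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> values = new LinkedHashMap<>();
    values.put("input_path", "hdfs://default/dataset/cifar-10-data");
    // Hypothetical checkpoint location; the real value comes from Submarine.
    values.put("checkpoint_path", "hdfs://default/tmp/tf-job-001/checkpoints");
    String cmd = "python cifar10_main.py --data-dir=%input_path% --job-dir=%checkpoint_path%";
    System.out.println(expand(cmd, values));
  }
}
```

Using String.replace rather than String.replaceAll avoids having to escape regex metacharacters in the placeholder names and the special meaning of '$' in replacement strings.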
> Add test coverage for YarnServiceJobSubmitter and make it ready for extension
> for PyTorch
> -----------------------------------------------------------------------------------------
>
>                 Key: SUBMARINE-54
>                 URL: https://issues.apache.org/jira/browse/SUBMARINE-54
>             Project: Hadoop Submarine
>          Issue Type: Sub-task
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Major
>         Attachments: SUBMARINE-54.001.patch, SUBMARINE-54.002.patch,
> SUBMARINE-54.003.patch, SUBMARINE-54.004.patch, SUBMARINE-54.005.patch,
> SUBMARINE-54.006.patch, SUBMARINE-54.007.patch
>
>
> This crucial class has no associated test yet. We need to improve this.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)