[
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508408#comment-16508408
]
Eric Yang commented on YARN-8220:
---------------------------------
[~leftnoteasy] Based on RunTensorflowJobUsingNativeServiceSpec.md, the spec can
be changed to:
{code}
{
  "name": "single-node-tensorflow",
  "version": "1.0.0",
  "components": [
    {
      "artifact": {
        "id": "<docker-image-name>",
        "type": "DOCKER"
      },
      "name": "worker",
      "dependencies": [],
      "resource": {
        "cpus": 1,
        "memory": "4096",
        "additional": {
          "yarn.io/gpu": {
            "value": 2
          }
        }
      },
      "launch_command": "--data-dir=hdfs://default/tmp/cifar-10-data,--job-dir=hdfs://default/tmp/cifar-10-jobdir,--num-gpus=1,--train-batch-size=16,--train-steps=40000",
      "number_of_containers": 1,
      "run_privileged_container": false,
      "configuration": {
        "env": {
          "HADOOP_HOME": "/hadoop-3.1.0",
          "HADOOP_HDFS_HOME": "",
          "HADOOP_YARN_HOME": "",
          "HADOOP_CONF_DIR": "/etc/hadoop/conf",
          "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE": "true"
        }
      }
    }
  ],
  "kerberos_principal": {
    "principal_name": "[email protected]",
    "keytab": "file:///etc/security/keytabs/test-user.headless.keytab"
  }
}
{code}
JAVA_HOME, LD_LIBRARY_PATH, and CLASSPATH can be defined in /etc/profile.d or in
the Dockerfile, so they do not have to be specified externally. Likewise,
{{cd /test/cifar10_estimator}} can be replaced with a WORKDIR directive in the
Dockerfile. For example, the Dockerfile could define:
{code}
WORKDIR /test/models/tutorials/image/cifar10_estimator
ENTRYPOINT ["/usr/bin/python", "cifar10_main.py"]
{code}
This would keep the service spec shorter and easier to read.
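As a rough sketch of the environment part, the same Dockerfile could also set
the variables with ENV. The paths below are placeholders assumed for
illustration, not values taken from the patch; adjust them to the JDK that is
actually installed in the image:
{code}
# Assumed JDK location (hypothetical; not taken from the patch).
ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
# Directory containing libjvm.so; this layout is specific to a JDK 8 install.
ENV LD_LIBRARY_PATH=${JAVA_HOME}/jre/lib/amd64/server
{code}
CLASSPATH is left out of the sketch because its value depends on the Hadoop
installation inside the image.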
> Running Tensorflow on YARN with GPU and Docker - Examples
> ---------------------------------------------------------
>
> Key: YARN-8220
> URL: https://issues.apache.org/jira/browse/YARN-8220
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn-native-services
> Reporter: Sunil Govindan
> Assignee: Sunil Govindan
> Priority: Critical
> Attachments: YARN-8220.001.patch, YARN-8220.002.patch
>
>
> Tensorflow could be run on YARN and could leverage YARN's distributed
> features.
> This spec file will help to run Tensorflow on YARN with GPU/Docker