Re: spark-submit can't find python?
> ...tionMaster.scala:768)
> at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
> at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
> at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:766)
> at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
> Caused by: java.io.IOException: Cannot run program "/d0/hadoop/yarn/local/usercache/mansop/appcache/application_1512016123441_0045/container_1512016123441_0045_02_01/tmp/1516059780057-0/bin/python": error=2, No such file or directory
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
> at org.apache.spark.api.python.VirtualEnvFactory.execCommand(VirtualEnvFactory.scala:103)
> at org.apache.spark.api.python.VirtualEnvFactory.setupVirtualEnv(VirtualEnvFactory.scala:91)
> at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:52)
> at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:646)
> Caused by: java.io.IOException: error=2, No such file or directory
> at java.lang.UNIXProcess.forkAndExec(Native Method)
> at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
> at java.lang.ProcessImpl.start(ProcessImpl.java:134)
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
> ... 9 more
> 18/01/16 10:43:00 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: java.io.IOException: Cannot run program "/d0/hadoop/yarn/local/usercache/mansop/appcache/application_1512016123441_0045/container_1512016123441_0045_02_01/tmp/1516059780057-0/bin/python": error=2, No such file or directory)
> 18/01/16 10:43:00 INFO ApplicationMaster: Deleting staging directory hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0045
> 18/01/16 10:43:00 INFO ShutdownHookManager: Shutdown hook called
>
> Failing this attempt. Failing the application.
> ApplicationMaster host: N/A
> ApplicationMaster RPC port: -1
> queue: default
> start time: 1516059772092
> final status: FAILED
> tracking URL: http://wp-hdp-ctrl03-mlx.mlx:8088/cluster/app/application_1512016123441_0045
> user: mansop
> Exception in thread "main" org.apache.spark.SparkException: Application application_1512016123441_0045 finished with failed status
> at org.apache.spark.deploy.yarn.Client.run(Client.scala:1187)
> at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1233)
> at org.apache.spark.deploy.yarn.Client.main(Client.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:782)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 18/01/16 10:43:01 INFO ShutdownHookManager: Shutdown hook called
> 18/01/16 10:43:01 INFO ShutdownHookManager: Deleting directory /tmp/spark-592e7e0f-6faa-4c3c-ab0f-7dd1cff21d17
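The VirtualEnvFactory frames in the trace above suggest the failure happens while the driver container builds and then launches a node-local virtualenv. A rough sketch of that sequence, where only the paths come from the log; the exact virtualenv/pip invocations are assumptions, not Spark's verbatim commands:

# paths copied from the stack trace above
CONTAINER_DIR=/d0/hadoop/yarn/local/usercache/mansop/appcache/application_1512016123441_0045/container_1512016123441_0045_02_01
ENV_DIR=$CONTAINER_DIR/tmp/1516059780057-0

# 1. VirtualEnvFactory.setupVirtualEnv creates a fresh env on the node
#    (assumed invocation; needs the virtualenv tool installed on that node)
virtualenv -p "$PYSPARK_PYTHON" "$ENV_DIR"

# 2. the shipped requirements are installed into it (assumed invocation)
"$ENV_DIR/bin/pip" install -r requirements.txt

# 3. PythonRunner then execs the interpreter from that env; error=2 means
#    this file never came into existence, i.e. step 1 failed on this node
"$ENV_DIR/bin/python" test.py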
RE: spark-submit can't find python?
QUESTION: Why can't spark/yarn find this file, /d0/hadoop/yarn/local/usercache/mansop/appcache/application_1512016123441_0045/container_1512016123441_0045_02_01/tmp/1516059780057-0/bin/python? Who copies it, and from where? And what do I need to do to make my spark-submit job run?

Thank you

Manuel

From: Manuel Sopena Ballesteros
Sent: Tuesday, January 16, 2018 10:53 AM
To: user@spark.apache.org
Subject: spark-submit can't find python?
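On the "who copies it and from where" part: files under the container's appcache directory are localized by the YARN NodeManager from the application's .sparkStaging directory in HDFS (see the upload log in the original message below), whereas the tmp/<timestamp>-0 virtualenv is built on the worker node itself at container launch and is never copied. A few standard HDFS/YARN commands for checking both sides, with hosts and IDs taken from the logs; note that YARN cleans up both the staging and appcache directories once the application ends, so the first two only help while it is running or shortly after:

# what spark-submit actually staged in HDFS for this application
hdfs dfs -ls hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0045

# on the node that ran the failed container: was the env dir ever created?
ls -l /d0/hadoop/yarn/local/usercache/mansop/appcache/application_1512016123441_0045/container_1512016123441_0045_02_01/tmp/

# aggregated container logs usually show why the virtualenv step failed
yarn logs -applicationId application_1512016123441_0045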
spark-submit can't find python?
Hi all,

I am quite new to Spark and need some help troubleshooting the execution of an application running on a Spark cluster. My Spark environment is deployed using Ambari (HDP), YARN is the resource scheduler, and Hadoop (HDFS) is the file system. The application I am trying to run is a python script (test.py). The worker nodes only have python 2.6, so I am asking Spark to spin up a virtual environment based on python 2.7.

I can successfully run this test app on a single node (see below):

-bash-4.1$ spark-submit \
> --conf spark.pyspark.virtualenv.type=native \
> --conf spark.pyspark.virtualenv.requirements=/home/mansop/requirements.txt \
> --conf spark.pyspark.virtualenv.bin.path=/home/mansop/hail-test/python-2.7.2/bin/activate \
> --conf spark.pyspark.python=/home/mansop/hail-test/python-2.7.2/bin/python \
> --jars $HAIL_HOME/build/libs/hail-all-spark.jar \
> --py-files $HAIL_HOME/build/distributions/hail-python.zip \
> test.py
hail: info: SparkUI: http://192.168.10.201:4040
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.1-0320a61
[Stage 2:==>   (91 + 4) / 100]
Summary(samples=3, variants=308, call_rate=1.00, contigs=['1'], multiallelics=0, snps=308, mnps=0, insertions=0, deletions=0, complex=0, star=0, max_alleles=2)

However, Spark crashes while trying to run my test script in cluster mode (error below), throwing an error about this file: /d0/hadoop/yarn/local/usercache/mansop/appcache/application_1512016123441_0032/container_1512016123441_0032_02_01/tmp/1515989862748-0/bin/python

-bash-4.1$ spark-submit --master yarn \
> --deploy-mode cluster \
> --driver-memory 4g \
> --executor-memory 2g \
> --executor-cores 4 \
> --queue default \
> --conf spark.pyspark.virtualenv.type=native \
> --conf spark.pyspark.virtualenv.requirements=/home/mansop/requirements.txt \
> --conf spark.pyspark.virtualenv.bin.path=/home/mansop/hail-test/python-2.7.2/bin/activate \
> --jars $HAIL_HOME/build/libs/hail-all-spark.jar \
> --py-files $HAIL_HOME/build/distributions/hail-python.zip \
> test.py
18/01/16 09:55:17 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/01/16 09:55:18 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
18/01/16 09:55:18 INFO RMProxy: Connecting to ResourceManager at wp-hdp-ctrl03-mlx.mlx/10.0.1.206:8050
18/01/16 09:55:18 INFO Client: Requesting a new application from cluster with 4 NodeManagers
18/01/16 09:55:18 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (450560 MB per container)
18/01/16 09:55:18 INFO Client: Will allocate AM container, with 4505 MB memory including 409 MB overhead
18/01/16 09:55:18 INFO Client: Setting up container launch context for our AM
18/01/16 09:55:18 INFO Client: Setting up the launch environment for our AM container
18/01/16 09:55:18 INFO Client: Preparing resources for our AM container
18/01/16 09:55:19 INFO Client: Use hdfs cache file as spark.yarn.archive for HDP, hdfsCacheFile:hdfs://wp-hdp-ctrl01-mlx.mlx:8020/hdp/apps/2.6.3.0-235/spark2/spark2-hdp-yarn-archive.tar.gz
18/01/16 09:55:19 INFO Client: Source and destination file systems are the same. Not copying hdfs://wp-hdp-ctrl01-mlx.mlx:8020/hdp/apps/2.6.3.0-235/spark2/spark2-hdp-yarn-archive.tar.gz
18/01/16 09:55:19 INFO Client: Uploading resource file:/home/mansop/hail-test2/hail/build/libs/hail-all-spark.jar -> hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0043/hail-all-spark.jar
18/01/16 09:55:20 INFO Client: Uploading resource file:/home/mansop/requirements.txt -> hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0043/requirements.txt
18/01/16 09:55:20 INFO Client: Uploading resource file:/home/mansop/test.py -> hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0043/test.py
18/01/16 09:55:20 INFO Client: Uploading resource file:/usr/hdp/2.6.3.0-235/spark2/python/lib/pyspark.zip -> hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0043/pyspark.zip
18/01/16 09:55:20 INFO Client: Uploading resource file:/usr/hdp/2.6.3.0-235/spark2/python/lib/py4j-0.10.4-src.zip -> hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0043/py4j-0.10.4-src.zip
18/01/16 09:55:20 INFO Client: Uploading resource file:/home/mansop/hail-test2/hail/build/distributions/hail-python.zip -> hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0043/hail-python.zip
18/01/16 09:55:20 INFO Client: Uploading resource
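For later readers, a minimal sketch of the cluster-mode submission under one plausible reading of the failure: in cluster mode the virtualenv is built on the remote NodeManager host, so every spark.pyspark.virtualenv.* path must resolve on every worker node, not just on the submit host. The HDP examples of this feature point spark.pyspark.virtualenv.bin.path at the virtualenv executable rather than an activate script, but treat that, the spark.pyspark.virtualenv.enabled flag, and the /usr/bin/virtualenv location below as assumptions to verify against your HDP docs:

# Sketch, not a verified fix. Assumes python 2.7 and the virtualenv tool
# are installed at the same path on every NodeManager host.
-bash-4.1$ spark-submit --master yarn \
> --deploy-mode cluster \
> --conf spark.pyspark.virtualenv.enabled=true \
> --conf spark.pyspark.virtualenv.type=native \
> --conf spark.pyspark.virtualenv.requirements=/home/mansop/requirements.txt \
> --conf spark.pyspark.virtualenv.bin.path=/usr/bin/virtualenv \
> --jars $HAIL_HOME/build/libs/hail-all-spark.jar \
> --py-files $HAIL_HOME/build/distributions/hail-python.zip \
> test.py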