Re: spark-submit can find python?

2018-01-15 Thread Jeff Zhang
tionMaster.scala:768)
> at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
> at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
> at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:766)
> at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
> Caused by: java.io.IOException: Cannot run program "/d0/hadoop/yarn/local/usercache/mansop/appcache/application_1512016123441_0045/container_1512016123441_0045_02_01/tmp/1516059780057-0/bin/python": error=2, No such file or directory
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
> at org.apache.spark.api.python.VirtualEnvFactory.execCommand(VirtualEnvFactory.scala:103)
> at org.apache.spark.api.python.VirtualEnvFactory.setupVirtualEnv(VirtualEnvFactory.scala:91)
> at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:52)
> at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:646)
> Caused by: java.io.IOException: error=2, No such file or directory
> at java.lang.UNIXProcess.forkAndExec(Native Method)
> at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
> at java.lang.ProcessImpl.start(ProcessImpl.java:134)
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
> ... 9 more
> 18/01/16 10:43:00 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: java.io.IOException: Cannot run program "/d0/hadoop/yarn/local/usercache/mansop/appcache/application_1512016123441_0045/container_1512016123441_0045_02_01/tmp/1516059780057-0/bin/python": error=2, No such file or directory)
> 18/01/16 10:43:00 INFO ApplicationMaster: Deleting staging directory hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0045
> 18/01/16 10:43:00 INFO ShutdownHookManager: Shutdown hook called
>
> Failing this attempt. Failing the application.
>  ApplicationMaster host: N/A
>  ApplicationMaster RPC port: -1
>  queue: default
>  start time: 1516059772092
>  final status: FAILED
>  tracking URL: http://wp-hdp-ctrl03-mlx.mlx:8088/cluster/app/application_1512016123441_0045
>  user: mansop
> Exception in thread "main" org.apache.spark.SparkException: Application application_1512016123441_0045 finished with failed status
> at org.apache.spark.deploy.yarn.Client.run(Client.scala:1187)
> at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1233)
> at org.apache.spark.deploy.yarn.Client.main(Client.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:782)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 18/01/16 10:43:01 INFO ShutdownHookManager: Shutdown hook called
> 18/01/16 10:43:01 INFO ShutdownHookManager: Deleting directory /tmp/spark-592e7e0f-6faa-4c3c-ab0f-7dd1cff21d17
>
> QUESTION:

RE: spark-submit can find python?

2018-01-15 Thread Manuel Sopena Ballesteros
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 9 more
18/01/16 10:43:00 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: java.io.IOException: Cannot run program "/d0/hadoop/yarn/local/usercache/mansop/appcache/application_1512016123441_0045/container_1512016123441_0045_02_01/tmp/1516059780057-0/bin/python": error=2, No such file or directory)
18/01/16 10:43:00 INFO ApplicationMaster: Deleting staging directory hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0045
18/01/16 10:43:00 INFO ShutdownHookManager: Shutdown hook called

Failing this attempt. Failing the application.
 ApplicationMaster host: N/A
 ApplicationMaster RPC port: -1
 queue: default
 start time: 1516059772092
 final status: FAILED
 tracking URL: http://wp-hdp-ctrl03-mlx.mlx:8088/cluster/app/application_1512016123441_0045
 user: mansop
Exception in thread "main" org.apache.spark.SparkException: Application application_1512016123441_0045 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1187)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1233)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:782)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/01/16 10:43:01 INFO ShutdownHookManager: Shutdown hook called
18/01/16 10:43:01 INFO ShutdownHookManager: Deleting directory /tmp/spark-592e7e0f-6faa-4c3c-ab0f-7dd1cff21d17

QUESTION:
Why can't Spark/YARN find the file 
/d0/hadoop/yarn/local/usercache/mansop/appcache/application_1512016123441_0045/container_1512016123441_0045_02_01/tmp/1516059780057-0/bin/python?
Who copies it there, and from where? And what do I need to do to make my 
spark-submit job run?
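
In case it helps with the diagnosis, below is a minimal PySpark snippet (my own 
sketch, not part of the Hail test script) that I can submit the same way as 
test.py, to print which interpreter the driver and the executors actually 
launch:

import sys
from pyspark import SparkContext

sc = SparkContext(appName="which-python")
# Each task reports the interpreter it was started with; distinct() collapses
# duplicates so the result has one entry per interpreter path in use.
executor_pythons = sc.parallelize(range(8), 8) \
                     .map(lambda _: sys.executable) \
                     .distinct() \
                     .collect()
print("driver python:    %s" % sys.executable)
print("executor pythons: %s" % executor_pythons)
sc.stop()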

Thank you

Manuel

From: Manuel Sopena Ballesteros
Sent: Tuesday, January 16, 2018 10:53 AM
To: user@spark.apache.org
Subject: spark-submit can find python?


spark-submit can find python?

2018-01-15 Thread Manuel Sopena Ballesteros
Hi all,

I am quite new to Spark and need some help troubleshooting the execution of an 
application running on a Spark cluster...

My Spark environment is deployed using Ambari (HDP); YARN is the resource 
scheduler and HDFS is the file system.

The application I am trying to run is a python script (test.py).

The worker nodes only have Python 2.6, so I am asking Spark to spin up a 
virtual environment based on Python 2.7.

I can successfully run this test app on a single node (see below):

-bash-4.1$ spark-submit \
> --conf spark.pyspark.virtualenv.type=native \
> --conf spark.pyspark.virtualenv.requirements=/home/mansop/requirements.txt \
> --conf spark.pyspark.virtualenv.bin.path=/home/mansop/hail-test/python-2.7.2/bin/activate \
> --conf spark.pyspark.python=/home/mansop/hail-test/python-2.7.2/bin/python \
> --jars $HAIL_HOME/build/libs/hail-all-spark.jar \
> --py-files $HAIL_HOME/build/distributions/hail-python.zip \
> test.py
hail: info: SparkUI: http://192.168.10.201:4040
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.1-0320a61
[Stage 2:==> (91 + 4) / 100]
Summary(samples=3, variants=308, call_rate=1.00, contigs=['1'], multiallelics=0, snps=308, mnps=0, insertions=0, deletions=0, complex=0, star=0, max_alleles=2)
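
That single-node run works because the /home/mansop/hail-test paths exist on 
that machine. Since (as far as I understand the virtualenv feature) the 
environment has to be created locally on every node that runs a container, I 
also use the small check script below on each worker before submitting to 
YARN. It is only a sketch: the two paths are my guesses for this cluster, and 
it stays Python 2.6 compatible so it runs with the stock interpreter on the 
workers.

import subprocess

# Binaries I assume the virtualenv setup needs on every worker node;
# adjust the paths for the cluster being checked.
candidates = [
    "/home/mansop/hail-test/python-2.7.2/bin/python",
    "/home/mansop/hail-test/python-2.7.2/bin/virtualenv",
]

for path in candidates:
    try:
        # --version works for both binaries; python 2 prints its version on
        # stderr, so merge stderr into stdout before reading it.
        proc = subprocess.Popen([path, "--version"],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
        out, _ = proc.communicate()
        print("%s -> %s" % (path, out.strip()))
    except OSError as e:
        print("%s -> MISSING (%s)" % (path, e))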


However, Spark crashes while trying to run my test script on the cluster (error 
below), complaining that it cannot run this program: 
/d0/hadoop/yarn/local/usercache/mansop/appcache/application_1512016123441_0032/container_1512016123441_0032_02_01/tmp/1515989862748-0/bin/python

-bash-4.1$ spark-submit --master yarn \
> --deploy-mode cluster \
> --driver-memory 4g \
> --executor-memory 2g \
> --executor-cores 4 \
> --queue default \
> --conf spark.pyspark.virtualenv.type=native \
> --conf spark.pyspark.virtualenv.requirements=/home/mansop/requirements.txt \
> --conf spark.pyspark.virtualenv.bin.path=/home/mansop/hail-test/python-2.7.2/bin/activate \
> --jars $HAIL_HOME/build/libs/hail-all-spark.jar \
> --py-files $HAIL_HOME/build/distributions/hail-python.zip \
> test.py
18/01/16 09:55:17 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/01/16 09:55:18 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
18/01/16 09:55:18 INFO RMProxy: Connecting to ResourceManager at wp-hdp-ctrl03-mlx.mlx/10.0.1.206:8050
18/01/16 09:55:18 INFO Client: Requesting a new application from cluster with 4 NodeManagers
18/01/16 09:55:18 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (450560 MB per container)
18/01/16 09:55:18 INFO Client: Will allocate AM container, with 4505 MB memory including 409 MB overhead
18/01/16 09:55:18 INFO Client: Setting up container launch context for our AM
18/01/16 09:55:18 INFO Client: Setting up the launch environment for our AM container
18/01/16 09:55:18 INFO Client: Preparing resources for our AM container
18/01/16 09:55:19 INFO Client: Use hdfs cache file as spark.yarn.archive for HDP, hdfsCacheFile:hdfs://wp-hdp-ctrl01-mlx.mlx:8020/hdp/apps/2.6.3.0-235/spark2/spark2-hdp-yarn-archive.tar.gz
18/01/16 09:55:19 INFO Client: Source and destination file systems are the same. Not copying hdfs://wp-hdp-ctrl01-mlx.mlx:8020/hdp/apps/2.6.3.0-235/spark2/spark2-hdp-yarn-archive.tar.gz
18/01/16 09:55:19 INFO Client: Uploading resource file:/home/mansop/hail-test2/hail/build/libs/hail-all-spark.jar -> hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0043/hail-all-spark.jar
18/01/16 09:55:20 INFO Client: Uploading resource file:/home/mansop/requirements.txt -> hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0043/requirements.txt
18/01/16 09:55:20 INFO Client: Uploading resource file:/home/mansop/test.py -> hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0043/test.py
18/01/16 09:55:20 INFO Client: Uploading resource file:/usr/hdp/2.6.3.0-235/spark2/python/lib/pyspark.zip -> hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0043/pyspark.zip
18/01/16 09:55:20 INFO Client: Uploading resource file:/usr/hdp/2.6.3.0-235/spark2/python/lib/py4j-0.10.4-src.zip -> hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0043/py4j-0.10.4-src.zip
18/01/16 09:55:20 INFO Client: Uploading resource file:/home/mansop/hail-test2/hail/build/distributions/hail-python.zip -> hdfs://wp-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1512016123441_0043/hail-python.zip
18/01/16 09:55:20 INFO Client: Uploading resource