[jira] [Commented] (SPARK-10795) FileNotFoundException while deploying pyspark job on cluster

2020-02-26 Thread Somnath (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045607#comment-17045607
 ] 

Somnath commented on SPARK-10795:
-

 

Trying to submit the test.py Spark app below on a YARN cluster with the following command:
{noformat}
PYSPARK_PYTHON=./venv/venv/bin/python spark-submit \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./venv/venv/bin/python \
  --master yarn --deploy-mode cluster \
  --archives venv#venv test.py{noformat}
Note: I am not using local mode; I am trying to use the Python 3.7 site-packages 
under the virtualenv used for building the code in PyCharm.
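For reference, the archive-based virtualenv pattern from the Spark documentation looks roughly like the sketch below. It ships the packed archive (here assumed to be the venv.tar.gz visible in the listing, with a top-level venv/ directory inside) rather than the unpacked venv directory; the path after the `#` is the alias YARN unpacks the archive under on the executors:

```shell
# Pack the virtualenv (venv-pack, or a plain tar of a relocatable venv).
tar -czf venv.tar.gz venv

# Ship the archive; YARN extracts it under the '#' alias on each node,
# so ./venv/venv/bin/python resolves inside the container working dir.
PYSPARK_PYTHON=./venv/venv/bin/python \
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./venv/venv/bin/python \
  --archives venv.tar.gz#venv \
  test.py
```

This is a sketch of the documented pattern, not a verified fix for this ticket; the key difference from the command above is passing the archive (venv.tar.gz) to --archives instead of the unarchived directory.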

This is how the Python project structure looks:
{noformat}
-rw-r--r-- 1 schakrabarti nobody 225908565 Feb 26 13:07 venv.tar.gz
-rw-r--r-- 1 schakrabarti nobody      1313 Feb 26 13:07 test.py
drwxr-xr-x 6 schakrabarti nobody      4096 Feb 26 13:07 venv
drwxr-xr-x 3 schakrabarti nobody      4096 Feb 26 13:07 venv/bin
drwxr-xr-x 3 schakrabarti nobody      4096 Feb 26 13:07 venv/share
-rw-r--r-- 1 schakrabarti nobody        75 Feb 26 13:07 venv/pyvenv.cfg
drwxr-xr-x 2 schakrabarti nobody      4096 Feb 26 13:07 venv/include
drwxr-xr-x 3 schakrabarti nobody      4096 Feb 26 13:07 venv/lib
{noformat}

I get the same "File does not exist" error for pyspark.zip, shown below:
{noformat}
java.io.FileNotFoundException: File does not exist: hdfs://hostname-nn1.cluster.domain.com:8020/user/schakrabarti/.sparkStaging/application_1571868585150_999337/pyspark.zip{noformat}
 
{code}
# test.py
import json
from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark = SparkSession.builder \
        .appName("Test_App") \
        .master("spark://gwrd352n36.red.ygrid.yahoo.com:41767") \
        .config("spark.ui.port", "4057") \
        .config("spark.executor.memory", "4g") \
        .getOrCreate()

    print(json.dumps(spark.sparkContext.getConf().getAll(), indent=4))

    spark.stop()
{code}

> FileNotFoundException while deploying pyspark job on cluster
> 
>
> Key: SPARK-10795
> URL: https://issues.apache.org/jira/browse/SPARK-10795
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
> Environment: EMR 
>Reporter: Harshit
>Priority: Major
>  Labels: bulk-closed
>
> I am trying to run a simple Spark job using PySpark; it works standalone, 
> but fails when I deploy it over the cluster.
> Events:
> 2015-09-24 10:38:49,602 INFO  [main] yarn.Client (Logging.scala:logInfo(59)) 
> - Uploading resource file:/usr/lib/spark/python/lib/pyspark.zip -> 
> hdfs://ip-.ap-southeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1439967440341_0461/pyspark.zip
> The resource upload above is successful; I manually checked that the file is 
> present at the specified path, but after a while I see the following error:
> Diagnostics: File does not exist: 
> hdfs://ip-xxx.ap-southeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1439967440341_0461/pyspark.zip
> java.io.FileNotFoundException: File does not exist: 
> hdfs://ip-1xxx.ap-southeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1439967440341_0461/pyspark.zip



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10795) FileNotFoundException while deploying pyspark job on cluster

2018-08-24 Thread Furcy Pin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591868#comment-16591868
 ] 

Furcy Pin commented on SPARK-10795:
---

Hi, I came across this ticket with the same issue: my YARN job was failing with 
an error {code:java}java.io.FileNotFoundException: File does not exist{code} 
for a file called *__spark_conf__.zip* or *pyspark.zip* on HDFS, in the 
staging directory.

For me too, the files were uploaded correctly to HDFS, and the error happened 
at shutdown, because something was trying to read them after the staging 
directory had been wiped.

Thanks to Carlos Bribiescas's comment, I found that I had left a 
{code:java}
SparkSession.builder.master("local[4]"){code}
in my code. After removing it, everything worked like a charm.

I suggest creating a new ticket to add a check with a clear error message when 
users make this kind of mistake, and closing this ticket once that's done.
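The check suggested here could be sketched roughly as follows. `check_master_conflict` is a hypothetical helper for illustration only, not an existing Spark API; it flags the situation described in this thread, where a master hardcoded in application code silently overrides the one passed to spark-submit:

```python
def check_master_conflict(code_master, submit_master):
    """Return a warning string if the master hardcoded in user code
    conflicts with the master passed to spark-submit, else None.

    Illustrative sketch only; Spark provides no such helper.
    """
    if code_master is None:
        return None  # nothing hardcoded; the spark-submit value applies
    if submit_master and code_master != submit_master:
        return (
            "Master '%s' is hardcoded in the application, but spark-submit "
            "was given --master %s; the hardcoded value takes precedence and "
            "can make a YARN cluster job fail at shutdown with a "
            "FileNotFoundException on the staging directory."
            % (code_master, submit_master)
        )
    return None


# Example: a script hardcoding local[4] while submitted with --master yarn.
warning = check_master_conflict("local[4]", "yarn")
```

In the cases reported in this thread, such a check would have surfaced the hardcoded local or standalone master before the confusing staging-directory failure.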





[jira] [Commented] (SPARK-10795) FileNotFoundException while deploying pyspark job on cluster

2017-06-08 Thread Nico Pappagianis (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043550#comment-16043550
 ] 

Nico Pappagianis commented on SPARK-10795:
--

[~HackerWilson] Were you able to resolve this? I'm hitting the same thing 
running Spark 2.0.1 and Hadoop 2.7.2.

My Python code is just creating a SparkContext and then calling sc.stop().

In the YARN logs I see:

INFO: 2017-06-08 22:16:24,462 INFO  [main] yarn.Client - Uploading resource 
file:/home/.../python/lib/py4j-0.10.1-src.zip -> 
hdfs://.../.sparkStaging/application_1494012577752_1403/py4j-0.10.1-src.zip

When I do an fs -ls on that HDFS directory it shows the py4j file, but the 
job fails with a FileNotFoundException for it:

File does not exist: 
hdfs://.../.sparkStaging/application_1494012577752_1403/py4j-0.10.1-src.zip
(stack trace here: 
https://gist.github.com/anonymous/5506654b88e19e6f51ffbd85cd3f25ee)

One thing to note is that I am launching a map-only job that in turn launches 
the Spark application on the cluster. The launcher job uses SparkLauncher 
(Java), with master and deploy mode set to "yarn" and "cluster", respectively.

When I submit the Python job via spark-submit directly, it runs successfully (I 
set HADOOP_CONF_DIR and HADOOP_JAVA_HOME to the same values I set in the 
launcher job).









[jira] [Commented] (SPARK-10795) FileNotFoundException while deploying pyspark job on cluster

2016-08-29 Thread HackerWilson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445095#comment-15445095
 ] 

HackerWilson commented on SPARK-10795:
--

Hi all, I am facing the same problem too. I have tested 
`spark-2.0.0-bin-hadoop2.6` and `spark-1.6.2-bin-hadoop2.6` with `hadoop2.6`; 
the two exceptions are different.

For `spark-2.0.0-bin-hadoop2.6`:
Diagnostics: File 
file:/tmp/spark-d7b81767-66bb-431f-9817-c623787fe2ac/__spark_libs__1242224443873929949.zip
 does not exist

For `spark-1.6.2-bin-hadoop2.6`:
Diagnostics: File 
file:/home/platform/services/spark/spark-1.6.2-bin-hadoop2.6/python/lib/pyspark.zip
 does not exist

The YARN client didn't copy them because of `yarn.Client: Source and 
destination file systems are the same. Not copying file...`,
but the two files can be found in `nm-local-dir/usercache/$user/filecache`.

The task I run on the YARN cluster is simply `examples/src/main/python/pi.py`; 
sometimes it completes successfully and sometimes it does not. 
`spark-2.0.0-bin-hadoop2.7` with `hadoop2.7` has the same problem.




[jira] [Commented] (SPARK-10795) FileNotFoundException while deploying pyspark job on cluster

2016-03-04 Thread Daniel Jouany (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15180011#comment-15180011
 ] 

Daniel Jouany commented on SPARK-10795:
---

Hi there,
If I follow your suggestions, it works.

Our code was like this:

{code}
import numpy as np
from pyspark import SparkContext

foo = np.genfromtxt(x)
sc = SparkContext(...)
# compute
{code}

*===> It fails*

We have just moved the global variable initialization *after* the context init:

{code}
import numpy as np
from pyspark import SparkContext

global foo
sc = SparkContext(...)
foo = np.genfromtxt(x)
# compute
{code}

*===> It works perfectly*

Note that you can reproduce this behaviour with something other than a numpy 
call, even though not every statement triggers the crash.
The question is: why does this *non-Spark* variable initialization interfere 
with the SparkContext?




[jira] [Commented] (SPARK-10795) FileNotFoundException while deploying pyspark job on cluster

2016-03-01 Thread Carlos Bribiescas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173723#comment-15173723
 ] 

Carlos Bribiescas commented on SPARK-10795:
---

Have you tried specifying just the SparkContext and nothing else? For example, 
if you specify a master via the SparkContext but also on the command line, I 
don't know what the expected behavior is. I suggest checking that before 
cutting up your code too much.

I do realize there may be many other causes of this issue, so I don't mean to 
suggest that not initializing your SparkContext properly is the only one. Just 
trying to rule this one cause out.




[jira] [Commented] (SPARK-10795) FileNotFoundException while deploying pyspark job on cluster

2016-03-01 Thread Daniel Jouany (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173650#comment-15173650
 ] 

Daniel Jouany commented on SPARK-10795:
---

I am using Spark 1.4.1 on HDP 2.3.2.

My code is a bit complex; I'll try to reduce it to the minimum failing code and 
then post it!




[jira] [Commented] (SPARK-10795) FileNotFoundException while deploying pyspark job on cluster

2016-03-01 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173503#comment-15173503
 ] 

Jeff Zhang commented on SPARK-10795:


What's your Spark version? And could you attach your code, if it is simple and 
not sensitive?




[jira] [Commented] (SPARK-10795) FileNotFoundException while deploying pyspark job on cluster

2016-03-01 Thread Daniel Jouany (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173489#comment-15173489
 ] 

Daniel Jouany commented on SPARK-10795:
---

Hi - I am facing the exact same problem.

However:
* I do initialize my ??SparkContext?? correctly, as the first statement in my 
main method.
* I have spark-submitted the job with your exact command line: {{spark-submit 
--master yarn-cluster --num-executors 1 --driver-memory 1g --executor-memory 1g 
--executor-cores 1}}

Could it be a configuration problem? The user that launches the spark-submit 
does have sufficient rights on the given HDFS directory 
(??/user/$USERNAME/.sparkStaging/??...).

Thanks in advance!




[jira] [Commented] (SPARK-10795) FileNotFoundException while deploying pyspark job on cluster

2016-02-22 Thread Carlos Bribiescas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157683#comment-15157683
 ] 

Carlos Bribiescas commented on SPARK-10795:
---

Using this command: spark-submit --master yarn-cluster --num-executors 1 
--driver-memory 1g --executor-memory 1g --executor-cores 1 MyPythonFile.py

If MyPythonFile.py looks like this:
{code}
from pyspark import SparkContext

jobName = "My Name"
sc = SparkContext(appName=jobName)
{code}
then everything is fine.

If MyPythonFile.py does not specify a Spark context (as one would in the 
interactive shell), it gives the error you describe. The same happens with 
this file:

{code}
from pyspark import SparkContext

jobName = "My Name"
# sc = SparkContext(appName=jobName)
{code}

So I suspect you just didn't define a Spark context properly for the cluster. 
Hope this helps.





[jira] [Commented] (SPARK-10795) FileNotFoundException while deploying pyspark job on cluster

2016-02-22 Thread Carlos Bribiescas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157612#comment-15157612
 ] 

Carlos Bribiescas commented on SPARK-10795:
---

What is the command you use when this happens? I had this issue previously, 
but only when using --py-files in my spark-submit, not otherwise.




[jira] [Commented] (SPARK-10795) FileNotFoundException while deploying pyspark job on cluster

2016-01-06 Thread NISHAN SATHARASINGHE (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085395#comment-15085395
 ] 

NISHAN SATHARASINGHE commented on SPARK-10795:
--

Having the same issue. Could someone help here?
