Hyukjin Kwon created SPARK-24384:
------------------------------------

             Summary: spark-submit --py-files with .py files doesn't work in 
client mode before context initialization
                 Key: SPARK-24384
                 URL: https://issues.apache.org/jira/browse/SPARK-24384
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.3.0, 2.4.0
            Reporter: Hyukjin Kwon


When the file given to --py-files is a plain .py file (a zip file seems fine), 
the Python path appears to be added dynamically only after the context has 
been initialized, so the module cannot be imported before that point.

With this Python file:

{code}
$ cat /home/spark/tmp.py
def testtest():
    return 1
{code}

This works:

{code}
$ cat app.py
import pyspark
pyspark.sql.SparkSession.builder.getOrCreate()
import tmp
print("************************%s" % tmp.testtest())

$ ./bin/spark-submit --master yarn --deploy-mode client --py-files /home/spark/tmp.py app.py
...
************************1
{code}

But this doesn't:

{code}
$ cat app.py
import pyspark
import tmp
pyspark.sql.SparkSession.builder.getOrCreate()
print("************************%s" % tmp.testtest())

$ ./bin/spark-submit --master yarn --deploy-mode client --py-files /home/spark/tmp.py app.py
Traceback (most recent call last):
  File "/home/spark/spark/app.py", line 2, in <module>
    import tmp
ImportError: No module named tmp
{code}
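
The ordering dependency in the two cases above can be reproduced without Spark. The following is only a sketch of the suspected mechanism, not Spark's actual code: the module name pyfiles_demo_mod and the temporary directory are hypothetical stand-ins for tmp.py and for wherever the --py-files dependency is placed, and the sys.path.insert call stands in for whatever context initialization does. It assumes Python 3.

```python
import importlib
import os
import sys
import tempfile


def try_import(name):
    """Return the imported module, or None if it cannot be found."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Hypothetical module name, chosen to avoid clashing with real modules.
MODULE_NAME = "pyfiles_demo_mod"

# Stand-in for the directory where a --py-files dependency ends up.
dep_dir = tempfile.mkdtemp()
with open(os.path.join(dep_dir, MODULE_NAME + ".py"), "w") as f:
    f.write("def testtest():\n    return 1\n")

# The directory is not on sys.path yet, so the import fails
# (corresponds to "import tmp" before getOrCreate()).
before = try_import(MODULE_NAME)

# Simulates the path being added late, as assumed to happen
# during context initialization.
sys.path.insert(0, dep_dir)
importlib.invalidate_caches()

# The same import now succeeds
# (corresponds to "import tmp" after getOrCreate()).
after = try_import(MODULE_NAME)
result = after.testtest()
print(before is None, result)
```

If this is indeed the mechanism, any import placed before the path is added will fail regardless of whether the file itself was shipped correctly.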

See also this comment on SPARK-21945:
https://issues.apache.org/jira/browse/SPARK-21945?focusedCommentId=16488486&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16488486


