GitHub user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21267#discussion_r188144573
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,22 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
         for path in self._conf.get("spark.submit.pyFiles", "").split(","):
             if path != "":
                 (dirname, filename) = os.path.split(path)
-                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
-                    self._python_includes.append(filename)
-                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+                try:
+                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+                    if not os.path.exists(filepath):
+                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+                        # not added via SparkContext.addFile. Here we check if the file exists,
+                        # try to copy and then add it to the path. See SPARK-21945.
+                        shutil.copyfile(path, filepath)
+                    if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
+                        self._python_includes.append(filename)
+                        sys.path.insert(1, filepath)
+                except Exception:
+                    from pyspark import util
+                    warnings.warn(
--- End diff --
Likewise, I checked the warning manually:
```
.../pyspark/context.py:229: RuntimeWarning: Failed to add file [/home/spark/tmp.py] speficied in 'spark.submit.pyFiles' to Python path:
...
/usr/lib64/python27.zip
/usr/lib64/python2.7
...
```
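For reference, here is a minimal standalone sketch of the copy-then-add logic in the diff, plus a way to check the resulting warning programmatically instead of by hand. `add_py_files`, its parameters, and the `PACKAGE_EXTENSIONS` tuple are hypothetical stand-ins (for the loop in `_do_init`, `SparkFiles.getRootDirectory()`, and `self.PACKAGE_EXTENSIONS`), and the warning text only approximates the real one shown above.

```python
import os
import shutil
import sys
import tempfile
import warnings

# Assumption: stand-in for SparkContext.PACKAGE_EXTENSIONS; values chosen for illustration.
PACKAGE_EXTENSIONS = (".zip", ".egg", ".jar")


def add_py_files(py_files, root_dir, python_includes):
    """Hypothetical standalone version of the 'spark.submit.pyFiles' loop.

    py_files: comma-separated paths, like the 'spark.submit.pyFiles' value.
    root_dir: plays the role of SparkFiles.getRootDirectory().
    python_includes: list that collects package file names (like _python_includes).
    """
    for path in py_files.split(","):
        if path == "":
            continue
        _, filename = os.path.split(path)
        try:
            filepath = os.path.join(root_dir, filename)
            if not os.path.exists(filepath):
                # The file was not distributed into root_dir, so copy it there first.
                shutil.copyfile(path, filepath)
            if filename[-4:].lower() in PACKAGE_EXTENSIONS:
                # Only archives are recorded and prepended to the import path.
                python_includes.append(filename)
                sys.path.insert(1, filepath)
        except Exception:
            # Do not abort startup; report the failure as a RuntimeWarning instead.
            warnings.warn(
                "Failed to add file [%s] specified in 'spark.submit.pyFiles' to "
                "Python path:\n  %s" % (path, "\n  ".join(sys.path)),
                RuntimeWarning)


# Example: a missing source file yields a RuntimeWarning rather than an exception.
with tempfile.TemporaryDirectory() as root:
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        add_py_files("/nonexistent/tmp.py", root, [])
    print(caught[0].category.__name__)  # RuntimeWarning
```

Recording warnings with `warnings.catch_warnings(record=True)` makes the manual check above repeatable in a unit test.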