Hi Ilan/Yinan,
My observation is as follows:
The dependent files specified with “--py-files
http://10.75.145.25:80/Spark/getNN.py” are being downloaded and are available in
the container at
“/var/data/spark-c163f15e-d59d-4975-b9be-91b6be062da9/spark-61094ca2-125b-48de-a154-214304dbe74/”.
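For reference, this can be confirmed from inside the driver pod (the pod name
below is a placeholder, and the glob is based on the path above):

kubectl exec <driver-pod-name> -- sh -c 'ls /var/data/spark-*/spark-*'
kubectl exec <driver-pod-name> -- sh -c 'echo "$PYTHONPATH"'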
I guess we need to export PYTHONPATH with this path as well, with the following
code change in entrypoint.sh, from:
if [ -n "$PYSPARK_FILES" ]; then
PYTHONPATH="$PYTHONPATH:$PYSPARK_FILES"
fi
to:
if [ -n "$PYSPARK_FILES" ]; then
PYTHONPATH="$PYTHONPATH:<directory where the --py-files are downloaded>"
fi
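Since the actual download directory name is generated at run time, here is a
rough sketch of what I have in mind (the /var/data/spark-*/spark-* glob is an
assumption on my side, taken from the directory observed above; entrypoint.sh
does not guarantee that layout):

if [ -n "$PYSPARK_FILES" ]; then
  # Append each runtime download directory to PYTHONPATH.
  # NOTE: the glob pattern is assumed from the observed path, not guaranteed.
  for dir in /var/data/spark-*/spark-*; do
    [ -d "$dir" ] && PYTHONPATH="$PYTHONPATH:$dir"
  done
  export PYTHONPATH
fi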
Let me know if this approach is fine, and please correct me if my understanding
is wrong.
Regards
Surya
From: Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Sent: Wednesday, September 26, 2018 9:14 AM
To: Ilan Filonenko ; liyinan...@gmail.com
Cc: Spark dev list ; user@spark.apache.org
Subject: RE: Python kubernetes spark 2.4 branch
Hi Ilan/Yinan,
Yes, my test case is also similar to the one described in
https://issues.apache.org/jira/browse/SPARK-24736
My spark-submit is as follows:
./spark-submit --deploy-mode cluster --master
k8s://https://10.75.145.23:8443 --conf
spark.app.name=spark-py --properties-file /tmp/program_files/spark_py.conf
--py-files http://10.75.145.25:80/Spark/getNN.py
http://10.75.145.25:80/Spark/test.py
Following is the error observed:
+ exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf
spark.driver.bindAddress=192.168.1.22 --deploy-mode client --properties-file
/opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner
http://10.75.145.25:80/Spark/test.py
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/opt/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/opt/spark/jars/phoenix-4.13.1-HBase-1.3-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Traceback (most recent call last):
File "/tmp/spark-4c428c98-e123-4c29-a9f5-ef85f207e229/test.py", line 13, in
from getNN import *
ImportError: No module named getNN
2018-09-25 16:19:57 INFO ShutdownHookManager:54 - Shutdown hook called
2018-09-25 16:19:57 INFO ShutdownHookManager:54 - Deleting directory
/tmp/spark-4c428c98-e123-4c29-a9f5-ef85f207e229
I am observing the same kind of behaviour as mentioned in
https://issues.apache.org/jira/browse/SPARK-24736 (the file gets downloaded and
is available in the pod).
The same happens with local files as well:
./spark-submit --deploy-mode cluster --master
k8s://https://10.75.145.23:8443 --conf
spark.app.name=spark-py --properties-file /tmp/program_files/spark_py.conf
--py-files ./getNN.py http://10.75.145.25:80/Spark/test.py
test.py depends on getNN.py.
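For reference, a minimal pair of files that would reproduce this (the contents
below are my own hypothetical stand-ins; only the cross-file import matters):

# Hypothetical file contents for a minimal reproduction.
cat > getNN.py <<'EOF'
def get_nn():
    return "hello from getNN"
EOF

cat > test.py <<'EOF'
from getNN import *   # this is the import that raises ImportError on 2.4-rc1
print(get_nn())
EOF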
But the same is working on the Spark 2.2 k8s branch.
Regards
Surya
From: Ilan Filonenko <i...@cornell.edu>
Sent: Wednesday, September 26, 2018 2:06 AM
To: liyinan...@gmail.com
Cc: Garlapati, Suryanarayana (Nokia - IN/Bangalore)
<suryanarayana.garlap...@nokia.com>; Spark dev list <d...@spark.apache.org>;
user@spark.apache.org
Subject: Re: Python kubernetes spark 2.4 branch
Is this in reference to: https://issues.apache.org/jira/browse/SPARK-24736 ?
On Tue, Sep 25, 2018 at 12:38 PM Yinan Li <liyinan...@gmail.com> wrote:
Can you give more details on how you ran your app? Did you build your own
image, and which image are you using?
On Tue, Sep 25, 2018 at 10:23 AM Garlapati, Suryanarayana (Nokia -
IN/Bangalore) <suryanarayana.garlap...@nokia.com> wrote:
Hi,
I am trying to run Spark Python test cases on k8s based on tag spark-2.4-rc1.
When the dependent files are passed through the --py-files option, they are not
getting resolved by the main Python script. Please let me know if this is a
known issue.
Regards
Surya