RE: Python kubernetes spark 2.4 branch
Hi Ilan/Yinan,

My observation is as follows: the dependent file specified with "--py-files http://10.75.145.25:80/Spark/getNN.py" is being downloaded and is available in the container at "/var/data/spark-c163f15e-d59d-4975-b9be-91b6be062da9/spark-61094ca2-125b-48de-a154-214304dbe74/".

I guess we need to export PYTHONPATH with this path as well, with the following code change in entrypoint.sh, from

    if [ -n "$PYSPARK_FILES" ]; then
      PYTHONPATH="$PYTHONPATH:$PYSPARK_FILES"
    fi

to

    if [ -n "$PYSPARK_FILES" ]; then
      PYTHONPATH="$PYTHONPATH:<path where the files are downloaded>"
    fi

Let me know if this approach is fine. Please correct me if my understanding of this approach is wrong.

Regards
Surya

From: Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Sent: Wednesday, September 26, 2018 9:14 AM
To: Ilan Filonenko; liyinan...@gmail.com
Cc: Spark dev list; u...@spark.apache.org
Subject: RE: Python kubernetes spark 2.4 branch

Hi Ilan/Yinan,

Yes, my test case is also similar to the one described in https://issues.apache.org/jira/browse/SPARK-24736

My spark-submit is as follows:

    ./spark-submit --deploy-mode cluster --master k8s://https://10.75.145.23:8443 --conf spark.app.name=spark-py --properties-file /tmp/program_files/spark_py.conf --py-files http://10.75.145.25:80/Spark/getNN.py http://10.75.145.25:80/Spark/test.py

The following error is observed:

    + exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=192.168.1.22 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner http://10.75.145.25:80/Spark/test.py
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/opt/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/spark/jars/phoenix-4.13.1-HBase-1.3-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    Traceback (most recent call last):
      File "/tmp/spark-4c428c98-e123-4c29-a9f5-ef85f207e229/test.py", line 13, in <module>
        from getNN import *
    ImportError: No module named getNN
    2018-09-25 16:19:57 INFO ShutdownHookManager:54 - Shutdown hook called
    2018-09-25 16:19:57 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-4c428c98-e123-4c29-a9f5-ef85f207e229

I am observing the same kind of behaviour as mentioned in https://issues.apache.org/jira/browse/SPARK-24736 (the file gets downloaded and is available in the pod). The same happens with local files as well:

    ./spark-submit --deploy-mode cluster --master k8s://https://10.75.145.23:8443 --conf spark.app.name=spark-py --properties-file /tmp/program_files/spark_py.conf --py-files ./getNN.py http://10.75.145.25:80/Spark/test.py

test.py has dependencies on getNN.py. But the same is working in the spark 2.2 k8s branch.

Regards
Surya

From: Ilan Filonenko <i...@cornell.edu>
Sent: Wednesday, September 26, 2018 2:06 AM
To: liyinan...@gmail.com
Cc: Garlapati, Suryanarayana (Nokia - IN/Bangalore) <suryanarayana.garlap...@nokia.com>; Spark dev list <dev@spark.apache.org>; u...@spark.apache.org
Subject: Re: Python kubernetes spark 2.4 branch

Is this in reference to https://issues.apache.org/jira/browse/SPARK-24736 ?

On Tue, Sep 25, 2018 at 12:38 PM Yinan Li <liyinan...@gmail.com> wrote:

> Can you give more details on how you ran your app, did you build your own image, and which image are you using?

On Tue, Sep 25, 2018 at 10:23 AM Garlapati, Suryanarayana (Nokia - IN/Bangalore) <suryanarayana.garlap...@nokia.com> wrote:

> Hi,
>
> I am trying to run spark python testcases on k8s based on tag spark-2.4-rc1. When the dependent files are passed through the --py-files option, they are not getting resolved by the main python script. Please let me know, is this a known issue?
>
> Regards
> Surya
RE: Python kubernetes spark 2.4 branch
Hi Ilan/Yinan,

Yes, my test case is also similar to the one described in https://issues.apache.org/jira/browse/SPARK-24736

My spark-submit is as follows:

    ./spark-submit --deploy-mode cluster --master k8s://https://10.75.145.23:8443 --conf spark.app.name=spark-py --properties-file /tmp/program_files/spark_py.conf --py-files http://10.75.145.25:80/Spark/getNN.py http://10.75.145.25:80/Spark/test.py

The following error is observed:

    + exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=192.168.1.22 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner http://10.75.145.25:80/Spark/test.py
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/opt/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/spark/jars/phoenix-4.13.1-HBase-1.3-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    Traceback (most recent call last):
      File "/tmp/spark-4c428c98-e123-4c29-a9f5-ef85f207e229/test.py", line 13, in <module>
        from getNN import *
    ImportError: No module named getNN
    2018-09-25 16:19:57 INFO ShutdownHookManager:54 - Shutdown hook called
    2018-09-25 16:19:57 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-4c428c98-e123-4c29-a9f5-ef85f207e229

I am observing the same kind of behaviour as mentioned in https://issues.apache.org/jira/browse/SPARK-24736 (the file gets downloaded and is available in the pod). The same happens with local files as well:

    ./spark-submit --deploy-mode cluster --master k8s://https://10.75.145.23:8443 --conf spark.app.name=spark-py --properties-file /tmp/program_files/spark_py.conf --py-files ./getNN.py http://10.75.145.25:80/Spark/test.py

test.py has dependencies on getNN.py. But the same is working in the spark 2.2 k8s branch.

Regards
Surya

From: Ilan Filonenko
Sent: Wednesday, September 26, 2018 2:06 AM
To: liyinan...@gmail.com
Cc: Garlapati, Suryanarayana (Nokia - IN/Bangalore); Spark dev list; u...@spark.apache.org
Subject: Re: Python kubernetes spark 2.4 branch

Is this in reference to https://issues.apache.org/jira/browse/SPARK-24736 ?

On Tue, Sep 25, 2018 at 12:38 PM Yinan Li <liyinan...@gmail.com> wrote:

> Can you give more details on how you ran your app, did you build your own image, and which image are you using?

On Tue, Sep 25, 2018 at 10:23 AM Garlapati, Suryanarayana (Nokia - IN/Bangalore) <suryanarayana.garlap...@nokia.com> wrote:

> Hi,
>
> I am trying to run spark python testcases on k8s based on tag spark-2.4-rc1. When the dependent files are passed through the --py-files option, they are not getting resolved by the main python script. Please let me know, is this a known issue?
>
> Regards
> Surya
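The ImportError in the traceback above can be reproduced outside Spark. This is a hedged sketch (the contents of getNN.py are an assumption; only the file name and the failing import come from the thread): the file exists on disk, exactly as it does in the pod after being downloaded, but its directory is not on sys.path, so the import fails until the directory is appended.

```python
import os
import sys
import tempfile

# Stand-in for the directory --py-files downloads into inside the pod.
download_dir = tempfile.mkdtemp()
with open(os.path.join(download_dir, "getNN.py"), "w") as f:
    f.write("def getNN():\n    return 42\n")

# The file is present on disk, but its directory is not importable yet.
try:
    import getNN
    import_failed = False
except ImportError:
    import_failed = True  # same failure mode as the traceback above

# Appending the download directory (what the PYTHONPATH export would do)
# makes the import succeed.
sys.path.append(download_dir)
import getNN
print(import_failed, getNN.getNN())  # -> True 42
```

This matches the symptom described in SPARK-24736: downloading the file into the pod is not enough; the interpreter also has to be told where to look.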
Re: Python kubernetes spark 2.4 branch
Is this in reference to: https://issues.apache.org/jira/browse/SPARK-24736 ?

On Tue, Sep 25, 2018 at 12:38 PM Yinan Li wrote:

> Can you give more details on how you ran your app, did you build your own image, and which image are you using?
>
> On Tue, Sep 25, 2018 at 10:23 AM Garlapati, Suryanarayana (Nokia - IN/Bangalore) wrote:
>
>> Hi,
>>
>> I am trying to run spark python testcases on k8s based on tag spark-2.4-rc1. When the dependent files are passed through the --py-files option, they are not getting resolved by the main python script. Please let me know, is this a known issue?
>>
>> Regards
>> Surya
Re: Python kubernetes spark 2.4 branch
Can you give more details on how you ran your app, did you build your own image, and which image are you using?

On Tue, Sep 25, 2018 at 10:23 AM Garlapati, Suryanarayana (Nokia - IN/Bangalore) wrote:

> Hi,
>
> I am trying to run spark python testcases on k8s based on tag spark-2.4-rc1. When the dependent files are passed through the --py-files option, they are not getting resolved by the main python script. Please let me know, is this a known issue?
>
> Regards
> Surya
Python kubernetes spark 2.4 branch
Hi,

I am trying to run spark python testcases on k8s based on tag spark-2.4-rc1. When the dependent files are passed through the --py-files option, they are not getting resolved by the main python script. Please let me know, is this a known issue?

Regards
Surya
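What "resolved by the main python script" means can be illustrated with a minimal pair of files. Their bodies below are assumptions (only the names test.py and getNN.py appear in the thread): test.py can do `from getNN import *` only if the directory holding getNN.py is importable, which is what correct --py-files handling should arrange.

```python
import os
import runpy
import sys
import tempfile

# Write a minimal getNN.py (the dependency shipped via --py-files) and
# test.py (the main script) into one directory.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "getNN.py"), "w") as f:
    f.write("def getNN():\n    return 'getNN result'\n")
with open(os.path.join(workdir, "test.py"), "w") as f:
    f.write("from getNN import *\nresult = getNN()\n")

# Make the directory importable -- the step that --py-files handling
# should perform for the driver -- then run the main script.
sys.path.insert(0, workdir)
module_globals = runpy.run_path(os.path.join(workdir, "test.py"))
print(module_globals["result"])  # -> getNN result
```

Removing the sys.path.insert line reproduces the "No module named getNN" failure from the thread, since runpy.run_path does not add a plain script's directory to sys.path by itself.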