GitHub user ifilonenko opened a pull request:
https://github.com/apache/spark/pull/21092
[SPARK-23984][K8S][WIP] Initial Python Bindings for PySpark on K8s
## What changes were proposed in this pull request?
Introducing Python bindings for PySpark on Kubernetes; an example submission is sketched after the checklist below.
- [ ] Running PySpark Jobs
- [ ] Increased Default Memory Overhead value
- [ ] Dependency Management for virtualenv/conda
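As a rough sketch of what submitting one of these PySpark jobs to the Kubernetes backend might look like (the API server address, executor count, and image name are placeholders, and whether a dedicated Python-capable image property ends up being required is still open in this WIP):
```
bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --deploy-mode cluster \
  --name spark-pi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<spark-image-with-python> \
  local:///opt/spark/examples/src/main/python/pi.py
```
`spark.kubernetes.container.image` and the `local://` scheme are the existing Kubernetes-mode conventions for JVM jobs; reusing them unchanged for Python is an assumption of this sketch.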
## How was this patch tested?
This patch was tested with
- [ ] Unit Tests
- [ ] Integration tests with [this
addition](https://github.com/apache-spark-on-k8s/spark-integration/pull/46)
```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- Run SparkPi with a test secret mounted into the driver and executor pods
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run PySpark on simple pi.py example
Run completed in 4 minutes, 3 seconds.
Total number of tests run: 9
Suites: completed 2, aborted 0
Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```
## Problematic Comments from [ifilonenko]
- [ ] Currently the Docker image is built with Python 2 only --> it needs to be generic
across Python 2 and Python 3 (a sketch follows after this list)
- [ ] `--py-files` files are being distributed properly, but it seems that example
commands like
```
exec /sbin/tini -s -- /opt/spark/bin/spark-submit \
  --conf spark.driver.bindAddress=172.17.0.4 \
  --deploy-mode client \
  --properties-file /opt/spark/conf/spark.properties \
  --class org.apache.spark.deploy.PythonRunner \
  /opt/spark/examples/src/main/python/pi.py \
  /opt/spark/examples/src/main/python/sort.py
```
cause errors because `/opt/spark/examples/src/main/python/pi.py` treats
`/opt/spark/examples/src/main/python/sort.py` as an application argument
(a sketch of the expected command also follows after this list)
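On the Python 2/3 point, a minimal sketch of one way to make the image generic: install both interpreters at build time and select one in the container entrypoint. The `PYSPARK_MAJOR_PYTHON_VERSION` variable and the Alpine package names are assumptions of this sketch; only `PYSPARK_PYTHON` is an established Spark environment variable.
```
# Dockerfile layer (build time): install both interpreters.
# Package names assume an Alpine base image and are illustrative only.
#   RUN apk add --no-cache python python3

# entrypoint.sh (run time): choose the interpreter from an env var.
# PYSPARK_MAJOR_PYTHON_VERSION is a hypothetical name used for this sketch.
if [ "${PYSPARK_MAJOR_PYTHON_VERSION:-2}" = "3" ]; then
    export PYSPARK_PYTHON=python3
else
    export PYSPARK_PYTHON=python
fi
```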
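On the `--py-files` point, the driver command presumably needs to pass the extra files through spark-submit's `--py-files` flag and leave only the primary resource as a positional argument, roughly along these lines (whether the explicit `--class org.apache.spark.deploy.PythonRunner` should remain in the generated command is part of what this WIP still has to settle):
```
exec /sbin/tini -s -- /opt/spark/bin/spark-submit \
  --conf spark.driver.bindAddress=172.17.0.4 \
  --deploy-mode client \
  --properties-file /opt/spark/conf/spark.properties \
  --py-files /opt/spark/examples/src/main/python/sort.py \
  /opt/spark/examples/src/main/python/pi.py
```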
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ifilonenko/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21092.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21092
----
commit fb5b9ed83d4e5ed73bc44b9d719ac0e52702655e
Author: Ilan Filonenko <if56@...>
Date: 2018-04-16T03:23:43Z
initial architecture for PySpark w/o dockerfile work
commit b7b3db0abfbf425120fa21cc61e603c5d766f8af
Author: Ilan Filonenko <if56@...>
Date: 2018-04-17T19:13:45Z
included entrypoint logic
commit 98cef8ceb0f04cfcefbc482c2a0fe39c75f620c4
Author: Ilan Filonenko <if56@...>
Date: 2018-04-18T02:22:55Z
satisfying integration tests
commit dc670dcd07944ae30b9b425c26250a21986b2699
Author: Ilan Filonenko <if56@...>
Date: 2018-04-18T05:20:12Z
end-to-end working pyspark
commit eabe4b9b784f37cca3dd9bcff17110944b50f5c8
Author: Ilan Filonenko <ifilondz@...>
Date: 2018-04-18T05:20:42Z
Merge pull request #1 from ifilonenko/py-spark
Initial architecture for PySpark w/o dependency management
----