GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/21714

    [SPARK-24739][PYTHON] Make PySpark compatible with Python 3.7

    ## What changes were proposed in this pull request?
    
    This PR proposes to make PySpark compatible with Python 3.7. Python 3.7 makes a rather radical change to the semantics of `StopIteration` within a generator: a `StopIteration` raised inside a generator body is now re-raised as a `RuntimeError`.
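
As a rough illustration of that change (a sketch, not code from this patch; `first_of_each` and `run` are hypothetical names), a `StopIteration` that escapes from `next()` inside a generator body surfaces as a `RuntimeError` on Python 3.7 and later, where older versions ended the generator early instead:

```python
def first_of_each(iterators):
    # Hypothetical generator: next() raises StopIteration on an empty iterator,
    # and nothing in this body catches it.
    for it in iterators:
        yield next(it)


def run():
    try:
        # The second iterator is empty, so next() raises StopIteration
        # inside the generator body.
        return list(first_of_each([iter([1]), iter([])]))
    except RuntimeError as exc:
        # On Python 3.7+ the leaked StopIteration arrives here as RuntimeError.
        return "RuntimeError: {}".format(exc)


result = run()
print(result)
```

The sketch assumes it runs on Python 3.7 or later; the original `StopIteration` is still reachable through the `RuntimeError`'s `__cause__`.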
    
    To make it compatible, calls to `next(...)` inside generators should catch `StopIteration` and return explicitly:
    
    ```
    try:
        next(...)
    except StopIteration:
        return
    ```
    
    See the [release notes](https://docs.python.org/3/whatsnew/3.7.html#porting-to-python-3-7) and [PEP 479](https://www.python.org/dev/peps/pep-0479/).
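
A minimal sketch of the pattern applied to a complete generator, assuming a hypothetical helper `take_pairs` that is not part of this patch: catching `StopIteration` around `next()` and returning ends the generator normally, so no `RuntimeError` is raised.

```python
def take_pairs(iterator):
    """Yield consecutive pairs; stop cleanly when the input runs out."""
    while True:
        try:
            a = next(iterator)
            b = next(iterator)
        except StopIteration:
            # An explicit return finishes the generator without letting
            # StopIteration leak out of the generator body.
            return
        yield (a, b)


pairs = list(take_pairs(iter([1, 2, 3, 4, 5])))
print(pairs)
```

Here the trailing `5` has no partner, so the second `next()` call raises `StopIteration`, the generator returns, and `pairs` holds `[(1, 2), (3, 4)]`. Whether to drop or keep such a leftover element is a design choice of this example only.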
    
    ## How was this patch tested?
    
    Manually tested:
    
    ```
    $ ./run-tests --python-executables=python3.7
    Running PySpark tests. Output is in /Users/hkwon/workspace/repos/forked/spark/python/unit-tests.log
    Will test against the following Python executables: ['python3.7']
    Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
    Starting test(python3.7): pyspark.mllib.tests
    Starting test(python3.7): pyspark.sql.tests
    Starting test(python3.7): pyspark.streaming.tests
    Starting test(python3.7): pyspark.tests
    Finished test(python3.7): pyspark.streaming.tests (130s)
    Starting test(python3.7): pyspark.accumulators
    Finished test(python3.7): pyspark.accumulators (8s)
    Starting test(python3.7): pyspark.broadcast
    Finished test(python3.7): pyspark.broadcast (9s)
    Starting test(python3.7): pyspark.conf
    Finished test(python3.7): pyspark.conf (6s)
    Starting test(python3.7): pyspark.context
    Finished test(python3.7): pyspark.context (27s)
    Starting test(python3.7): pyspark.ml.classification
    Finished test(python3.7): pyspark.tests (200s) ... 3 tests were skipped
    Starting test(python3.7): pyspark.ml.clustering
    Finished test(python3.7): pyspark.mllib.tests (244s)
    Starting test(python3.7): pyspark.ml.evaluation
    Finished test(python3.7): pyspark.ml.classification (63s)
    Starting test(python3.7): pyspark.ml.feature
    Finished test(python3.7): pyspark.ml.clustering (48s)
    Starting test(python3.7): pyspark.ml.fpm
    Finished test(python3.7): pyspark.ml.fpm (0s)
    Starting test(python3.7): pyspark.ml.image
    Finished test(python3.7): pyspark.ml.evaluation (23s)
    Starting test(python3.7): pyspark.ml.linalg.__init__
    Finished test(python3.7): pyspark.ml.linalg.__init__ (0s)
    Starting test(python3.7): pyspark.ml.recommendation
    Finished test(python3.7): pyspark.ml.image (20s)
    Starting test(python3.7): pyspark.ml.regression
    Finished test(python3.7): pyspark.ml.regression (58s)
    Starting test(python3.7): pyspark.ml.stat
    Finished test(python3.7): pyspark.ml.feature (90s)
    Starting test(python3.7): pyspark.ml.tests
    Finished test(python3.7): pyspark.ml.recommendation (82s)
    Starting test(python3.7): pyspark.ml.tuning
    Finished test(python3.7): pyspark.ml.stat (27s)
    Starting test(python3.7): pyspark.mllib.classification
    Finished test(python3.7): pyspark.sql.tests (362s) ... 102 tests were skipped
    Starting test(python3.7): pyspark.mllib.clustering
    Finished test(python3.7): pyspark.ml.tuning (29s)
    Starting test(python3.7): pyspark.mllib.evaluation
    Finished test(python3.7): pyspark.mllib.classification (39s)
    Starting test(python3.7): pyspark.mllib.feature
    Finished test(python3.7): pyspark.mllib.evaluation (30s)
    Starting test(python3.7): pyspark.mllib.fpm
    Finished test(python3.7): pyspark.mllib.feature (44s)
    Starting test(python3.7): pyspark.mllib.linalg.__init__
    Finished test(python3.7): pyspark.mllib.linalg.__init__ (0s)
    Starting test(python3.7): pyspark.mllib.linalg.distributed
    Finished test(python3.7): pyspark.mllib.clustering (78s)
    Starting test(python3.7): pyspark.mllib.random
    Finished test(python3.7): pyspark.mllib.fpm (33s)
    Starting test(python3.7): pyspark.mllib.recommendation
    Finished test(python3.7): pyspark.mllib.random (12s)
    Starting test(python3.7): pyspark.mllib.regression
    Finished test(python3.7): pyspark.mllib.linalg.distributed (45s)
    Starting test(python3.7): pyspark.mllib.stat.KernelDensity
    Finished test(python3.7): pyspark.mllib.stat.KernelDensity (0s)
    Starting test(python3.7): pyspark.mllib.stat._statistics
    Finished test(python3.7): pyspark.mllib.recommendation (41s)
    Starting test(python3.7): pyspark.mllib.tree
    Finished test(python3.7): pyspark.mllib.regression (44s)
    Starting test(python3.7): pyspark.mllib.util
    Finished test(python3.7): pyspark.mllib.stat._statistics (20s)
    Starting test(python3.7): pyspark.profiler
    Finished test(python3.7): pyspark.mllib.tree (26s)
    Starting test(python3.7): pyspark.rdd
    Finished test(python3.7): pyspark.profiler (11s)
    Starting test(python3.7): pyspark.serializers
    Finished test(python3.7): pyspark.mllib.util (24s)
    Starting test(python3.7): pyspark.shuffle
    Finished test(python3.7): pyspark.shuffle (0s)
    Starting test(python3.7): pyspark.sql.catalog
    Finished test(python3.7): pyspark.serializers (15s)
    Starting test(python3.7): pyspark.sql.column
    Finished test(python3.7): pyspark.rdd (27s)
    Starting test(python3.7): pyspark.sql.conf
    Finished test(python3.7): pyspark.sql.catalog (24s)
    Starting test(python3.7): pyspark.sql.context
    Finished test(python3.7): pyspark.sql.conf (8s)
    Starting test(python3.7): pyspark.sql.dataframe
    Finished test(python3.7): pyspark.sql.column (29s)
    Starting test(python3.7): pyspark.sql.functions
    Finished test(python3.7): pyspark.sql.context (26s)
    Starting test(python3.7): pyspark.sql.group
    Finished test(python3.7): pyspark.sql.dataframe (51s)
    Starting test(python3.7): pyspark.sql.readwriter
    Finished test(python3.7): pyspark.ml.tests (266s)
    Starting test(python3.7): pyspark.sql.session
    Finished test(python3.7): pyspark.sql.group (36s)
    Starting test(python3.7): pyspark.sql.streaming
    Finished test(python3.7): pyspark.sql.functions (57s)
    Starting test(python3.7): pyspark.sql.types
    Finished test(python3.7): pyspark.sql.session (25s)
    Starting test(python3.7): pyspark.sql.udf
    Finished test(python3.7): pyspark.sql.types (10s)
    Starting test(python3.7): pyspark.sql.window
    Finished test(python3.7): pyspark.sql.readwriter (31s)
    Starting test(python3.7): pyspark.streaming.util
    Finished test(python3.7): pyspark.sql.streaming (22s)
    Starting test(python3.7): pyspark.util
    Finished test(python3.7): pyspark.util (0s)
    Finished test(python3.7): pyspark.streaming.util (0s)
    Finished test(python3.7): pyspark.sql.udf (16s)
    Finished test(python3.7): pyspark.sql.window (12s)
    Tests passed in 645 seconds
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-24739

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21714.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21714
    
----
commit fd5ce3877a65393e44fd5d003cf76c4392a7a73a
Author: hyukjinkwon <gurwls223@...>
Date:   2018-07-04T16:00:32Z

    Make PySpark compatible with Python 3.7

----

