GitHub user rxin opened a pull request:

    https://github.com/apache/spark/pull/5605

    [SPARK-6953] [PySpark] speed up python tests

    This PR tries to speed up some Python tests:
    
    ```
    tests.py                144s -> 103s     -41s
    mllib/classification.py  24s ->  17s      -7s
    mllib/regression.py      27s ->  15s     -12s
    mllib/tree.py            27s ->  13s     -14s
    mllib/tests.py           64s ->  31s     -33s
    streaming/tests.py      185s ->  84s    -101s
    ```
    Counting Python 3 as well, the total saving is 558s (almost 10 minutes),
    because the core and streaming tests run three times and the mllib tests run twice.
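The 558s figure can be checked from the table with a short calculation. This is a minimal sketch assuming, as the parenthetical implies, that `tests.py` is one of the core tests (so it runs three times alongside streaming) and that every `mllib/` file runs twice; the `runs` helper is purely illustrative.

```python
# Per-file savings in seconds (before - after), taken from the table above.
savings = {
    "tests.py": 144 - 103,
    "mllib/classification.py": 24 - 17,
    "mllib/regression.py": 27 - 15,
    "mllib/tree.py": 27 - 13,
    "mllib/tests.py": 64 - 31,
    "streaming/tests.py": 185 - 84,
}


def runs(name):
    """Hypothetical helper: how many interpreters run this file.

    Assumption: mllib runs twice; core (tests.py) and streaming run three times.
    """
    return 2 if name.startswith("mllib/") else 3


total = sum(saved * runs(name) for name, saved in savings.items())
print(total, "s ~", round(total / 60, 1), "min")  # 558 s ~ 9.3 min
```

The core and streaming savings (41s + 101s) contribute 426s over three runs, and the mllib savings (66s) contribute 132s over two runs.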
    
    During testing, the test runner shows the time taken by each test file:
    ```
    Run core tests ...
    Running test: pyspark/rdd.py ... ok (22s)
    Running test: pyspark/context.py ... ok (16s)
    Running test: pyspark/conf.py ... ok (4s)
    Running test: pyspark/broadcast.py ... ok (4s)
    Running test: pyspark/accumulators.py ... ok (4s)
    Running test: pyspark/serializers.py ... ok (6s)
    Running test: pyspark/profiler.py ... ok (5s)
    Running test: pyspark/shuffle.py ... ok (1s)
    Running test: pyspark/tests.py ... ok (103s)   144s
    ```
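A minimal sketch of how a runner could print per-file timings in the format shown above, assuming each test file can be executed as a standalone script with the current Python interpreter. `run_test` is a hypothetical helper written for illustration. It is not the runner changed in this PR.

```python
import subprocess
import sys
import time


def run_test(path, cmd=None):
    """Run one test file and print 'ok (Ns)' or 'FAILED (Ns)'.

    Hypothetical helper: `cmd` defaults to running `path` with the
    current Python interpreter. Returns (passed, elapsed_seconds).
    """
    if cmd is None:
        cmd = [sys.executable, path]
    print(f"Running test: {path} ... ", end="", flush=True)
    start = time.monotonic()
    # Discard the test's own output so only the summary line is shown.
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    elapsed = int(time.monotonic() - start)  # whole seconds, as in the log
    status = "ok" if result.returncode == 0 else "FAILED"
    print(f"{status} ({elapsed}s)")
    return result.returncode == 0, elapsed


if __name__ == "__main__":
    # Time every test file passed on the command line.
    for test_file in sys.argv[1:]:
        run_test(test_file)
```

`time.monotonic()` is used instead of `time.time()` so that a wall-clock adjustment during a long test cannot produce a negative or inflated duration.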

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark python-tests-speed

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5605.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5605
    
----
commit 3ad23871567eda755658bf043db0161317ff1a8e
Author: Reynold Xin <[email protected]>
Date:   2015-04-21T07:11:16Z

    Merge pull request #5427 from davies/python_tests
    
    [SPARK-6953] [PySpark] speed up python tests
    
    Signed-off-by: Reynold Xin <[email protected]>
    
    Conflicts:
        python/pyspark/streaming/tests.py
    
    (cherry picked from commit 21b15f5ad8098e2db1a89472228d1978f0b4b18c)
    Signed-off-by: Reynold Xin <[email protected]>

----


