GitHub user rxin opened a pull request:
https://github.com/apache/spark/pull/5605
[SPARK-6953] [PySpark] speed up python tests
This PR tries to speed up some Python tests:
```
tests.py 144s -> 103s -41s
mllib/classification.py 24s -> 17s -7s
mllib/regression.py 27s -> 15s -12s
mllib/tree.py 27s -> 13s -14s
mllib/tests.py 64s -> 31s -33s
streaming/tests.py 185s -> 84s -101s
```
Considering Python 3, the total saving will be 558s (almost 10 minutes),
because the core and streaming tests run three times and the mllib tests run twice.
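The 558s figure follows from the table. Each file's per-run saving is multiplied by the number of times its suite runs: three for core and streaming, two for mllib. Below is a minimal Python sketch that assumes the timings exactly as listed. The `TIMINGS` and `RUNS` names and the `total_saving` helper are illustrative only and are not part of the PR.

```python
# Before/after timings in seconds, copied from the table in the PR description.
# Each entry maps a test file to (suite, seconds_before, seconds_after).
TIMINGS = {
    "tests.py": ("core", 144, 103),
    "mllib/classification.py": ("mllib", 24, 17),
    "mllib/regression.py": ("mllib", 27, 15),
    "mllib/tree.py": ("mllib", 27, 13),
    "mllib/tests.py": ("mllib", 64, 31),
    "streaming/tests.py": ("streaming", 185, 84),
}

# Run counts per suite, as stated in the PR text:
# core and streaming run three times, mllib runs twice.
RUNS = {"core": 3, "streaming": 3, "mllib": 2}


def total_saving(timings, runs):
    """Sum each file's per-run saving multiplied by its suite's run count."""
    return sum((before - after) * runs[suite]
               for suite, before, after in timings.values())


saving = total_saving(TIMINGS, RUNS)
print(f"total saving: {saving}s (~{saving / 60:.1f} minutes)")
```

The breakdown is (41 + 101) × 3 = 426s for core and streaming, plus 66 × 2 = 132s for mllib.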
During testing, the time used by each test file is shown:
```
Run core tests ...
Running test: pyspark/rdd.py ... ok (22s)
Running test: pyspark/context.py ... ok (16s)
Running test: pyspark/conf.py ... ok (4s)
Running test: pyspark/broadcast.py ... ok (4s)
Running test: pyspark/accumulators.py ... ok (4s)
Running test: pyspark/serializers.py ... ok (6s)
Running test: pyspark/profiler.py ... ok (5s)
Running test: pyspark/shuffle.py ... ok (1s)
Running test: pyspark/tests.py ... ok (103s) 144s
```
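A per-file timing line like the ones above can be produced by wrapping each test invocation in a wall-clock timer. Below is a minimal Python sketch. It is not the actual `run-tests` implementation changed in this PR. The `run_test` helper and its arguments are hypothetical, and the commands passed to it in the usage section are trivial stand-ins for real test files.

```python
import subprocess
import sys
import time


def run_test(name, cmd):
    """Run one test command, print its status and elapsed time, return the exit code.

    `name` is the label printed in the log (hypothetical); `cmd` is the argument list to run.
    """
    print(f"Running test: {name} ... ", end="", flush=True)
    start = time.monotonic()
    # Discard the test's own output so only the one-line summary is printed.
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    elapsed = int(time.monotonic() - start)
    status = "ok" if result.returncode == 0 else "FAILED"
    print(f"{status} ({elapsed}s)")
    return result.returncode


if __name__ == "__main__":
    # Stand-in commands: one passing and one failing Python process.
    run_test("passing example", [sys.executable, "-c", "pass"])
    run_test("failing example", [sys.executable, "-c", "raise SystemExit(1)"])
```

A monotonic clock is used so that system clock adjustments during a long test run cannot distort the reported duration.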
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rxin/spark python-tests-speed
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/5605.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5605
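The closing keyword is plain text in the commit message, so a simple pattern match is enough to detect it. Below is a minimal Python sketch that assumes the exact phrase `This closes #<number>` quoted above. The `closed_prs` helper is hypothetical and is not the Apache infrastructure's actual implementation.

```python
import re

# Matches the phrase quoted above and captures the pull request number.
CLOSES_PATTERN = re.compile(r"\bThis closes #(\d+)\b")


def closed_prs(commit_message):
    """Return the PR numbers referenced by 'This closes #N' phrases (hypothetical helper)."""
    return [int(n) for n in CLOSES_PATTERN.findall(commit_message)]


message = """[SPARK-6953] [PySpark] speed up python tests

This closes #5605
"""
print(closed_prs(message))  # [5605]
```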
----
commit 3ad23871567eda755658bf043db0161317ff1a8e
Author: Reynold Xin <[email protected]>
Date: 2015-04-21T07:11:16Z
Merge pull request #5427 from davies/python_tests
[SPARK-6953] [PySpark] speed up python tests
Signed-off-by: Reynold Xin <[email protected]>
Conflicts:
python/pyspark/streaming/tests.py
(cherry picked from commit 21b15f5ad8098e2db1a89472228d1978f0b4b18c)
Signed-off-by: Reynold Xin <[email protected]>
----