GitHub user freeman-lab opened a pull request:
https://github.com/apache/spark/pull/1725
StatCounter on NumPy arrays [PYSPARK][SPARK-2012]
These changes allow StatCounter to work properly on NumPy arrays, fixing the
issue reported in SPARK-2012 (https://issues.apache.org/jira/browse/SPARK-2012).
If NumPy is installed, the NumPy functions ``maximum``, ``minimum``, and
``sqrt``, which operate element-wise on arrays, are used to merge statistics.
If not, we fall back on the scalar operators, so StatCounter handles arrays
when NumPy is available and still works on scalars without it.
New unit tests added, along with a check for NumPy in the tests.
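For readers skimming the description, here is a minimal sketch of the fallback pattern described above. It is not the code from this pull request: the class name ``MiniStatCounter`` and its fields are hypothetical, and only the idea (prefer NumPy's ``maximum``, ``minimum``, and ``sqrt``, otherwise use the scalar equivalents) comes from the description.

```python
# Hypothetical sketch of the import-time fallback; not the actual patch.
try:
    # Element-wise versions, so merged values may be NumPy arrays.
    from numpy import maximum, minimum, sqrt
except ImportError:
    # Scalar fallbacks when NumPy is not installed.
    from math import sqrt
    maximum = max
    minimum = min


class MiniStatCounter(object):
    """Tracks count, mean, variance, min, and max of merged values."""

    def __init__(self):
        self.n = 0
        self.mu = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.max_value = float("-inf")
        self.min_value = float("inf")

    def merge(self, value):
        # Running update of the mean and squared deviations (Welford's method).
        delta = value - self.mu
        self.n += 1
        self.mu = self.mu + delta / self.n
        self.m2 = self.m2 + delta * (value - self.mu)
        self.max_value = maximum(self.max_value, value)
        self.min_value = minimum(self.min_value, value)
        return self

    def stdev(self):
        # Population standard deviation of the merged values.
        return sqrt(self.m2 / self.n)


counter = MiniStatCounter()
for x in [1.0, 2.0, 3.0]:
    counter.merge(x)
```

With NumPy installed, the same ``merge`` calls should also accept equally shaped arrays, in which case each statistic is computed element-wise.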
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/freeman-lab/spark numpy-max-statcounter
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1725.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1725
----
commit 176a127c3c35512a2690ad8ccfb020ea94e42596
Author: Jeremy Freeman <[email protected]>
Date: 2014-08-01T22:47:50Z
Use numpy arrays in StatCounter
- If NumPy is installed, use maximum/minimum/sqrt so that StatCounters
work on NumPy arrays
- Otherwise, fall back on scalar operators
commit 1c8a832ac71dafad893b3f92d12d57c284496402
Author: Jeremy Freeman <[email protected]>
Date: 2014-08-01T22:48:04Z
Unit tests for StatCounter with NumPy arrays
commit 875414c6d79ef8e8a8938cf888eba71a9bdad070
Author: Jeremy Freeman <[email protected]>
Date: 2014-08-01T23:04:16Z
Fixed indents
commit 8e764dd0e77e1c32827859fe09019c9c912defb1
Author: Jeremy Freeman <[email protected]>
Date: 2014-08-01T23:07:31Z
Explicit numpy imports
----