GitHub user MechCoder opened a pull request:
https://github.com/apache/spark/pull/7854
[SPARK-9525] [PySpark] [MLlib] Optimize SparseVector initialization
1. Remove the sorting of indices and assume that the user passes indices
and values that are already sorted by index.
2. Avoid iterating twice to extract the indices and values when the
argument provided is a dict.
3. Add a check that the number of indices is less than the size provided.
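
For reviewers, a rough sketch of the three points above in plain Python. This
is not the code in the patch: `init_sparse` is a hypothetical helper, it uses
lists rather than NumPy arrays, and it returns a tuple instead of building a
SparseVector. It also assumes that a dict argument still has to be sorted
(a dict carries no index order) and that the size check permits as many
indices as the size, which is one reading of point 3.

```python
def init_sparse(size, *args):
    """Hypothetical helper (not the PySpark API): returns (size, indices, values)."""
    size = int(size)
    if len(args) == 1:
        pairs = args[0]
        if isinstance(pairs, dict):
            # Assumption: dict items are sorted here because a dict does not
            # promise index order. Other inputs are trusted to be sorted.
            pairs = sorted(pairs.items())
        indices, values = [], []
        # One pass fills both lists, instead of one pass per list.
        for index, value in pairs:
            indices.append(int(index))
            values.append(float(value))
    elif len(args) == 2:
        # Parallel sequences: taken as already sorted by index, so no sort.
        indices = [int(i) for i in args[0]]
        values = [float(v) for v in args[1]]
        if len(indices) != len(values):
            raise ValueError("indices and values must have the same length")
    else:
        raise TypeError("expected a dict, a list of pairs, or two sequences")
    # Size check: a vector of the given size cannot hold more entries than that.
    if len(indices) > size:
        raise ValueError(
            "got %d indices for a vector of size %d" % (len(indices), size))
    return size, indices, values
```

With this sketch, `init_sparse(4, {3: 5.5, 1: 1.0})` and
`init_sparse(4, [1, 3], [1.0, 5.5])` produce the same result, while
`init_sparse(2, [0, 1, 2], [1.0, 2.0, 3.0])` raises a ValueError.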
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MechCoder/spark init_sparse
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/7854.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #7854
----
commit f173c16b6dbc29175c93f39f3bff8410ff5ed906
Author: MechCoder <[email protected]>
Date: 2015-08-01T15:16:18Z
[SPARK-9525] Optimize SparseVector initialization
----