GitHub user facaiy opened a pull request:
https://github.com/apache/spark/pull/17383
[SPARK-3165][MLlib][WIP] DecisionTree does not use sparsity in data
## What changes were proposed in this pull request?
DecisionTree should take advantage of sparse feature vectors. Aggregation
over training data could handle the empty/zero-valued data elements more
efficiently.
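
As a rough illustration of the idea (not the code in this patch), the sketch below compares a dense pass over binned features with a sparse pass that visits only stored entries and then back-fills the bin holding the zero value. The function names, the dict-based sparse layout, and the `zero_bin` parameter are assumptions made for this example; they are not Spark APIs.

```python
def aggregate_dense(points, num_features, num_bins):
    """Baseline: visit every feature of every point."""
    hist = [[0] * num_bins for _ in range(num_features)]
    for binned in points:
        for f, b in enumerate(binned):
            hist[f][b] += 1
    return hist


def aggregate_sparse(sparse_points, num_features, num_bins, zero_bin):
    """Visit only the stored entries, then back-fill the implicit zeros.

    sparse_points: one dict {feature_index: bin_index} per point, holding
                   only features whose value is non-zero (hypothetical layout).
    zero_bin:      zero_bin[f] is the bin that a zero value of feature f
                   falls into (assumed to be known from the split thresholds).
    """
    hist = [[0] * num_bins for _ in range(num_features)]
    for entries in sparse_points:
        for f, b in entries.items():
            hist[f][b] += 1
    n = len(sparse_points)
    for f in range(num_features):
        # Every point without a stored entry for f contributes to zero_bin[f].
        hist[f][zero_bin[f]] += n - sum(hist[f])
    return hist


# Four points, three features, three bins; zero always falls into bin 0 here.
dense_points = [[0, 2, 0], [0, 0, 1], [1, 0, 0], [0, 0, 0]]
sparse_points = [{1: 2}, {2: 1}, {0: 1}, {}]

dense_hist = aggregate_dense(dense_points, 3, 3)
sparse_hist = aggregate_sparse(sparse_points, 3, 3, zero_bin=[0, 0, 0])
```

Both passes produce the same histograms, but the sparse one does work proportional to the number of stored entries plus the number of features, rather than points times features.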
## How was this patch tested?
This change only modifies the internal implementation and should not alter the
behavior of the DecisionTree module, so all existing unit tests should pass.
A performance benchmark is probably still needed.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/facaiy/spark ENH/use_sparsity_in_decision_tree
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/17383.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #17383
----
commit d2eea0645110b3bcc6c0b905bc55e43e0af9debb
Author: 颜发才 (Yan Facai) <[email protected]>
Date: 2017-03-22T05:45:58Z
CLN: use Vector to implement binnedFeatures in TreePoint
----