GitHub user facaiy reopened a pull request:
https://github.com/apache/spark/pull/17383
[SPARK-3165][MLlib][WIP] DecisionTree does not use sparsity in data
## What changes were proposed in this pull request?
DecisionTree should take advantage of sparse feature vectors. Aggregation
over training data could handle the empty/zero-valued data elements more
efficiently.
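
To illustrate the idea, here is a minimal Scala sketch of storing only the non-zero bin indices of one data point. It is not the code from this pull request: `SparseBinnedFeatures`, `fromDense`, and `foreachNonZero` are hypothetical names, and the sketch assumes that a zero-valued feature always falls into bin 0, which may not hold for every set of split thresholds.

```scala
// Hypothetical sketch; not Spark's actual TreePoint implementation.
// Assumption: features that are not stored fall into bin 0.
final case class SparseBinnedFeatures(size: Int, indices: Array[Int], bins: Array[Int]) {
  require(indices.length == bins.length, "indices and bins must have the same length")

  // Bin index of one feature; `indices` is sorted, so a binary search is enough.
  def apply(featureIndex: Int): Int = {
    val pos = java.util.Arrays.binarySearch(indices, featureIndex)
    if (pos >= 0) bins(pos) else 0
  }

  // Visit only the stored (non-zero) entries, skipping everything in bin 0.
  def foreachNonZero(f: (Int, Int) => Unit): Unit = {
    var i = 0
    while (i < indices.length) {
      f(indices(i), bins(i))
      i += 1
    }
  }
}

object SparseBinnedFeatures {
  // Build the sparse form from a dense array of bin indices.
  def fromDense(dense: Array[Int]): SparseBinnedFeatures = {
    val nonZero = dense.zipWithIndex.filter { case (bin, _) => bin != 0 }
    SparseBinnedFeatures(dense.length, nonZero.map(_._2), nonZero.map(_._1))
  }
}
```

With a layout like this, a per-point loop touches only the stored entries instead of every feature, which is where a saving on very sparse data would come from.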
## How was this patch tested?
This patch only modifies the internal implementation and should not change the
behavior of the DecisionTree module, so all existing unit tests should pass.
Performance benchmarks are probably still needed.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/facaiy/spark ENH/use_sparsity_in_decision_tree
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/17383.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #17383
----
commit d2eea0645110b3bcc6c0b905bc55e43e0af9debb
Author: 颜发才（Yan Facai） <[email protected]>
Date: 2017-03-22T05:45:58Z
CLN: use Vector to implement binnedFeatures in TreePoint
commit 9ce6b813beffb9d58e7b2907425a1262610256be
Author: 颜发才（Yan Facai） <[email protected]>
Date: 2017-03-22T09:15:30Z
BUG: fix for incompatible argument of predictImpl method
commit 37f05f9b0386acc8bea048e72aff2b9c37ca4ca6
Author: 颜发才（Yan Facai） <[email protected]>
Date: 2017-03-22T09:18:04Z
CLN: create sparse vector when converting to TreePoint
commit c9664ce6c94b98cbc76253817e637d9a968e4bd6
Author: 颜发才（Yan Facai） <[email protected]>
Date: 2017-03-22T09:21:59Z
CLN: change Array to Vector in TreePoint when created
commit d6ef9e512ea4a58db2dccf3e7cca95f9e8b0df8f
Author: 颜发才（Yan Facai） <[email protected]>
Date: 2017-03-23T02:12:22Z
PREP: use Vector[Int] to store binnedFeature
commit 59eb779a9d4f711e7b28d31d579cc49e3d3cc370
Author: 颜发才（Yan Facai） <[email protected]>
Date: 2017-03-23T03:50:14Z
CLN: change binnedFeatures from def to val
commit 9cbe577b408e987f3026d01316f5a7f2d4c5cfb2
Author: 颜发才（Yan Facai） <[email protected]>
Date: 2017-03-28T00:57:42Z
CLN: use filter to select non-zero bits
commit b5b0dc8683b6e2d7d274aa8d39932dec61e6193d
Author: 颜发才（Yan Facai） <[email protected]>
Date: 2017-03-28T01:03:55Z
BUG: fix, compile fails
commit cf7e3d8e03f73df725336d0d5a9dd6cc16e7bf95
Author: Yan Facai (颜发才) <[email protected]>
Date: 2017-07-05T05:42:09Z
Merge branch 'master' into ENH/use_sparsity_in_decision_tree
commit 032d50d8c8a851671ba2754cec817d0f6e9ae70f
Author: Yan Facai (颜发才) <[email protected]>
Date: 2017-07-05T06:20:38Z
CLN: use BSV in predictImpl
commit 257ddf773eb47499962d6cc57fd1323324dd4ab8
Author: Yan Facai (颜发才) <[email protected]>
Date: 2017-07-05T06:42:24Z
ENH: create subclass TreeSparsePoint
commit 8a919735f9474283d263df78feb2e176f66917f3
Author: Yan Facai (颜发才) <[email protected]>
Date: 2017-07-05T06:58:54Z
ENH: use TreeDensePoint when numFeatures < 10000
----
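
The last two commits describe a sparse point subclass and a switch to the dense form when `numFeatures < 10000`. A rough Scala sketch of that kind of selection follows. The types `BinnedPoint`, `DensePoint`, and `SparsePoint` are hypothetical and only mirror the commit names; they are not the classes from this pull request. Only the 10000 threshold is taken from the commit message, and treating missing entries as bin 0 is an assumption.

```scala
// Hypothetical sketch; these types are not the classes from the pull request.
sealed trait BinnedPoint {
  def label: Double
  def numFeatures: Int
  def bin(featureIndex: Int): Int
}

// Dense layout: one bin index per feature.
final class DensePoint(val label: Double, bins: Array[Int]) extends BinnedPoint {
  def numFeatures: Int = bins.length
  def bin(featureIndex: Int): Int = bins(featureIndex)
}

// Sparse layout: sorted feature indices plus their non-zero bin indices.
final class SparsePoint(
    val label: Double,
    val numFeatures: Int,
    indices: Array[Int],
    values: Array[Int]) extends BinnedPoint {
  def bin(featureIndex: Int): Int = {
    val pos = java.util.Arrays.binarySearch(indices, featureIndex)
    if (pos >= 0) values(pos) else 0 // assumption: missing entries mean bin 0
  }
}

object BinnedPoint {
  // Threshold taken from the last commit message above.
  val DenseThreshold = 10000

  // Pick the dense layout for narrow data and the sparse layout otherwise.
  def apply(label: Double, bins: Array[Int]): BinnedPoint =
    if (bins.length < DenseThreshold) {
      new DensePoint(label, bins)
    } else {
      val nonZero = bins.indices.filter(i => bins(i) != 0).toArray
      new SparsePoint(label, bins.length, nonZero, nonZero.map(bins(_)))
    }
}
```

Keeping both layouts behind one trait lets callers read bins the same way in either case, while narrow data avoids the lookup cost of the sparse form.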