GitHub user facaiy opened a pull request:

    https://github.com/apache/spark/pull/17383

    [SPARK-3165][MLlib][WIP] DecisionTree does not use sparsity in data

    ## What changes were proposed in this pull request?
    
    DecisionTree should take advantage of sparse feature vectors. Aggregation
    over training data could handle the empty/zero-valued data elements more
    efficiently.
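
A minimal, library-independent sketch of the idea described above, not the code in this pull request. It assumes a hypothetical sparse row type that stores only the features whose bin is not the default, and it assumes a zero-valued feature always falls in bin 0, which real binning does not guarantee. It also counts rows per (feature, bin) for simplicity, whereas an actual tree aggregator would accumulate label statistics. The names `SparseRow`, `dense_bin_counts`, and `sparse_bin_counts` are illustrative only.

```python
from typing import Dict, List

# Hypothetical sparse row: maps feature index -> bin index, storing only
# features whose bin is not the default. Assumption: a zero-valued feature
# always falls in bin 0, which real binning does not guarantee.
SparseRow = Dict[int, int]


def dense_bin_counts(rows: List[List[int]], num_bins: int) -> List[List[int]]:
    """Baseline: visit every (row, feature) pair, including default bins."""
    num_features = len(rows[0]) if rows else 0
    counts = [[0] * num_bins for _ in range(num_features)]
    for row in rows:
        for feature, bin_index in enumerate(row):
            counts[feature][bin_index] += 1
    return counts


def sparse_bin_counts(
    rows: List[SparseRow], num_features: int, num_bins: int
) -> List[List[int]]:
    """Visit only stored entries, then fill in the default bin by subtraction."""
    counts = [[0] * num_bins for _ in range(num_features)]
    for row in rows:
        for feature, bin_index in row.items():
            counts[feature][bin_index] += 1
    for feature_counts in counts:
        # Rows that stored nothing for this feature are in the default bin 0.
        feature_counts[0] += len(rows) - sum(feature_counts)
    return counts
```

With mostly-zero data, the sparse version touches only the stored entries per row plus one pass over the features at the end, instead of every feature of every row.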
    
    
    ## How was this patch tested?
    
    Modifying the internal implementation won't change the behavior of the
    DecisionTree module, so all existing unit tests should still pass.
    
    Some performance benchmarks may be needed.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/facaiy/spark ENH/use_sparsity_in_decision_tree

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17383.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17383
    
----
commit d2eea0645110b3bcc6c0b905bc55e43e0af9debb
Author: 颜发才(Yan Facai) <facai....@gmail.com>
Date:   2017-03-22T05:45:58Z

    CLN: use Vector to implement binnedFeatures in TreePoint

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
