GitHub user avulanov opened a pull request:

    https://github.com/apache/spark/pull/1484

    [MLLIB] [WIP] SPARK-1473: Feature selection for high dimensional datasets

    The following is implemented: 
    1) generic traits for feature selection and filtering
    2) trait for feature selection of LabeledPoint with discrete data
    3) traits for calculating the contingency table and the chi-squared statistic
    4) class for chi-squared feature selection
    5) tests for the above
    
    Needs some optimization in matrix operations.
    
    This request is an attempt to implement feature selection for MLlib; the
    previous work by the issue author, @izendejas, was not finished
    (https://issues.apache.org/jira/browse/SPARK-1473). This request is also
    related to the data discretization issues
    https://issues.apache.org/jira/browse/SPARK-1303 and
    https://issues.apache.org/jira/browse/SPARK-1216, which were not merged.
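
For readers unfamiliar with the technique, the idea behind items 3) and 4)
can be illustrated with a minimal sketch. This is not the code from this pull
request: the object and method names below (ChiSquaredSketch,
contingencyTable, chiSquared, selectTopK) are hypothetical, and the sketch
uses plain Scala collections instead of RDD[LabeledPoint]. It assumes
features and labels are already discretized to integer values.

```scala
object ChiSquaredSketch {

  /** Counts co-occurrences of each (feature value, label) pair.
    * Rows are the distinct feature values, columns the distinct labels.
    * (Hypothetical helper, not the trait from the pull request.) */
  def contingencyTable(feature: Seq[Int], labels: Seq[Int]): Array[Array[Double]] = {
    require(feature.size == labels.size, "feature and labels must align")
    val featureValues = feature.distinct.sorted
    val labelValues = labels.distinct.sorted
    val table = Array.fill(featureValues.size, labelValues.size)(0.0)
    feature.zip(labels).foreach { case (f, l) =>
      table(featureValues.indexOf(f))(labelValues.indexOf(l)) += 1.0
    }
    table
  }

  /** Pearson's chi-squared statistic: the sum over all cells of
    * (observed - expected)^2 / expected, where
    * expected = rowSum * colSum / total. */
  def chiSquared(observed: Array[Array[Double]]): Double = {
    val rowSums = observed.map(_.sum)
    val colSums = observed.transpose.map(_.sum)
    val total = rowSums.sum
    var statistic = 0.0
    for (i <- observed.indices; j <- observed(i).indices) {
      val expected = rowSums(i) * colSums(j) / total
      if (expected > 0.0) { // skip cells with no expected mass
        val diff = observed(i)(j) - expected
        statistic += diff * diff / expected
      }
    }
    statistic
  }

  /** Scores every feature column against the labels and returns the
    * indices of the k highest-scoring columns, in ascending index order. */
  def selectTopK(columns: Seq[Seq[Int]], labels: Seq[Int], k: Int): Seq[Int] =
    columns.zipWithIndex
      .map { case (column, index) => (index, chiSquared(contingencyTable(column, labels))) }
      .sortBy { case (_, score) => -score }
      .take(k)
      .map { case (index, _) => index }
      .sorted
}
```

A feature whose values are independent of the label scores close to zero,
while a feature that predicts the label scores high, so keeping the top k
scores drops the least informative columns.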

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/avulanov/spark featureselection

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1484.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1484
    
----
commit 560dc08d7e2cbc191016a3ebbec1eb8146630bc7
Author: Alexander Ulanov <[email protected]>
Date:   2014-07-08T08:25:57Z

    Chi Squared feature selection: initial version

commit 6a35bcf64ff9c71445dc48f8299f8f78a5e324d5
Author: Alexander Ulanov <[email protected]>
Date:   2014-07-08T09:43:27Z

    Code style

commit dfb09fbf2732682d0b86afcbe02eb097e7d9c09e
Author: Alexander Ulanov <[email protected]>
Date:   2014-07-09T10:06:54Z

    Feature selection filter

commit fa5fd1119c6cc0e2c48a74baed89d32c5a1b5a58
Author: Alexander Ulanov <[email protected]>
Date:   2014-07-09T15:55:07Z

    Traits for FeatureSelection, CombinationsCalculator and FeatureFilter

commit 9a8f968ef07ee9a3cfc372d6e4d335d45ef5c065
Author: Alexander Ulanov <[email protected]>
Date:   2014-07-11T09:14:29Z

    Feature selection redesign with vigdorchik

commit 099fb135e159407ae9acf0a1dcbaf23fbc5e781a
Author: Alexander Ulanov <[email protected]>
Date:   2014-07-11T16:04:36Z

    Feature selector, fix of lazyness

commit 774b5ca9d4155315b388aae12e58d32b90c479fe
Author: Alexander Ulanov <[email protected]>
Date:   2014-07-14T16:52:28Z

    Combinations and chi-squared values test

commit 43a1169687db70ea52753e7e86eccb55ed0bf43e
Author: Alexander Ulanov <[email protected]>
Date:   2014-07-17T15:20:31Z

    Chi Squared by contingency table. Refactoring

commit 6890617e47f03278d08ea17929adf74dfa668230
Author: Alexander Ulanov <[email protected]>
Date:   2014-07-18T09:41:05Z

    Scala style fix

commit 2565f6d9a24c892a6ed28ec1174b5a7077fd8c77
Author: Alexander Ulanov <[email protected]>
Date:   2014-07-18T11:10:48Z

    Tests, comments, apache headers and scala style

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
