GitHub user hhbyyh opened a pull request:

    https://github.com/apache/spark/pull/11601

    [SPARK-13568] [ML] Create feature transformer to impute missing values

    ## What changes were proposed in this pull request?
    
    It is quite common to encounter missing values in data sets. It would be
    useful to implement a Transformer that can impute missing data points, similar
    to, e.g., Imputer in scikit-learn.
    Initially, options for imputation could include mean, median, and most
    frequent, but we could add various other approaches. Where possible, existing
    DataFrame code can be used (e.g., for approximate quantiles).
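To make the three strategies concrete, here is a minimal sketch in plain Python. It is not Spark's API, and the names `is_missing`, `surrogate`, and `impute` are hypothetical. It assumes that both `None` and NaN count as missing, computes the replacement value from the observed entries only, and uses an exact median where a distributed implementation would more likely rely on approximate quantiles.

```python
import math
from collections import Counter
from statistics import mean, median


def is_missing(x):
    """Treat None and NaN as missing (an assumption for this sketch)."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def surrogate(values, strategy="mean"):
    """Compute the replacement value from the observed (non-missing) entries."""
    observed = [v for v in values if not is_missing(v)]
    if not observed:
        raise ValueError("column has no observed values")
    if strategy == "mean":
        return mean(observed)
    if strategy == "median":
        # Exact median; a distributed version would likely use approximate quantiles.
        return median(observed)
    if strategy == "most_frequent":
        # Counter.most_common breaks ties by first occurrence (Python 3.7+).
        return Counter(observed).most_common(1)[0][0]
    raise ValueError(f"unknown strategy: {strategy}")


def impute(values, strategy="mean"):
    """Return a copy of `values` with every missing entry replaced by the surrogate."""
    fill = surrogate(values, strategy)
    return [fill if is_missing(v) else v for v in values]
```

For `[1.0, None, 1.0, 3.0, nan, 7.0]` the observed values are `[1.0, 1.0, 3.0, 7.0]`, so the mean, median, and most-frequent surrogates are 3.0, 2.0, and 1.0 respectively. The three strategies differ only in how the surrogate is computed; the fill step is shared.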
    
    Currently this PR supports imputation for Double and Vector columns
    (including null and NaN values in Vector columns).
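For Vector columns, one plausible reading is that imputation happens per dimension. The sketch below assumes exactly that: a NaN entry is replaced by the mean of its dimension over the non-null rows, and a null row is replaced by the full vector of per-dimension means. Both rules, the use of plain Python lists in place of Spark Vectors, and the names `column_means` and `impute_vectors` are hypothetical illustrations, not the PR's confirmed behavior.

```python
import math
from statistics import mean


def column_means(rows, size):
    """Per-dimension mean over non-null rows, skipping NaN entries."""
    means = []
    for i in range(size):
        observed = [r[i] for r in rows if r is not None and not math.isnan(r[i])]
        if not observed:
            raise ValueError(f"dimension {i} has no observed values")
        means.append(mean(observed))
    return means


def impute_vectors(rows, size):
    """Fill NaN entries and null rows of a vector column with per-dimension means.

    Assumption: a null row is replaced by the whole surrogate vector.
    """
    means = column_means(rows, size)
    result = []
    for r in rows:
        if r is None:
            result.append(list(means))
        else:
            result.append([means[i] if math.isnan(v) else v for i, v in enumerate(r)])
    return result
```

For `[[1.0, nan], None, [3.0, 4.0], [nan, 8.0]]` the per-dimension means are `[2.0, 6.0]`, giving `[[1.0, 6.0], [2.0, 6.0], [3.0, 4.0], [2.0, 8.0]]`. Computing one surrogate per dimension keeps each feature's statistics independent, which matches how the Double case treats a single column.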
    
    
    ## How was this patch tested?
    
    New unit tests and manual testing.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hhbyyh/spark imputer

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11601.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11601
    
----
commit 2999b268192e244bd7a520d62a0914e4742ee45d
Author: Yuhao Yang <[email protected]>
Date:   2016-02-29T17:46:04Z

    initial commit for Imputer

commit 8335cf21ebde164a22f3447000a1c468a69f39fc
Author: Yuhao Yang <[email protected]>
Date:   2016-02-29T18:27:40Z

    adjust mean and most

commit 7be5e9bcb2c9cd7671d128b01f5090ee737d207a
Author: Yuhao Yang <[email protected]>
Date:   2016-03-02T17:44:50Z

    Merge remote-tracking branch 'upstream/master' into imputer

commit 131f7d5b061a75242e7c305ba14c8c759d09c532
Author: Yuhao Yang <[email protected]>
Date:   2016-03-03T03:07:21Z

    Merge remote-tracking branch 'upstream/master' into imputer

commit a72a3ea81f6f76439068650cf47e4f784e0c4b7c
Author: Yuhao Yang <[email protected]>
Date:   2016-03-05T19:00:37Z

    Merge remote-tracking branch 'upstream/master' into imputer

commit 78df589e488bbec963b3969012cf9266fe4895cb
Author: Yuhao Yang <[email protected]>
Date:   2016-03-07T20:26:00Z

    Merge remote-tracking branch 'upstream/master' into imputer

commit b949be5746608ca3861df672ccd76d9af4257ae2
Author: Yuhao Yang <[email protected]>
Date:   2016-03-09T02:19:32Z

    refine code and add ut

commit 79b1c62b644aa05f07a33f13cc78f47a99d7e861
Author: Yuhao Yang <[email protected]>
Date:   2016-03-09T02:19:39Z

    Merge remote-tracking branch 'upstream/master' into imputer

commit c3d5d554f5ee90a18d96ff043f03f51f49d2ca7f
Author: Yuhao Yang <[email protected]>
Date:   2016-03-09T03:52:04Z

    minor change

commit 1b3966800982fa980307d1b6ded6e28e5f5985e8
Author: Yuhao Yang <[email protected]>
Date:   2016-03-09T07:57:38Z

    add object Imputer and ut refine

----

