GitHub user rxin opened a pull request:

    https://github.com/apache/spark/pull/5274

    [SPARK-6119][SQL] DataFrame support for missing data handling

    This pull request adds variants of DataFrame.na.drop and DataFrame.na.fill
    to the Scala/Java API, and DataFrame.fillna and DataFrame.dropna to the
    Python API.
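
For readers unfamiliar with these operations, the following is a minimal pure-Python sketch of the row-level semantics that the description and commit log suggest: dropping rows by "how", restricting to a column subset, and filling with a scalar or a per-column map. It is not Spark code and not the implementation in this pull request. The function names mirror the Python API named above, but the parameter names (`how`, `subset`, `value`) and the list-of-dicts row model are assumptions made for illustration, and the sketch ignores column types entirely.

```python
from typing import Any, Dict, List, Optional, Union

Row = Dict[str, Any]  # hypothetical row model: column name -> value, None = missing


def dropna(rows: List[Row], how: str = "any",
           subset: Optional[List[str]] = None) -> List[Row]:
    """Return the rows that survive a null-based drop.

    how="any" drops a row if any considered column is None;
    how="all" drops a row only if every considered column is None.
    """
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    kept = []
    for row in rows:
        cols = subset if subset is not None else list(row)
        nulls = [row[c] is None for c in cols]
        if how == "any":
            drop = any(nulls)
        else:
            # Guard against an empty column list, where all([]) would be True.
            drop = bool(nulls) and all(nulls)
        if not drop:
            kept.append(row)
    return kept


def fillna(rows: List[Row], value: Union[Any, Dict[str, Any]],
           subset: Optional[List[str]] = None) -> List[Row]:
    """Return copies of the rows with None cells replaced.

    A dict value maps column name -> replacement and ignores `subset`;
    any other value is applied to `subset`, or to all columns if None.
    """
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for col, cell in row.items():
            if cell is not None:
                continue
            if isinstance(value, dict):
                if col in value:
                    new_row[col] = value[col]
            elif subset is None or col in subset:
                new_row[col] = value
        filled.append(new_row)
    return filled


# Small illustrative data set (made up for this sketch).
rows = [
    {"name": "Alice", "age": 30},
    {"name": None, "age": 25},
    {"name": None, "age": None},
]

complete_rows = dropna(rows)                    # drop rows with any null
non_empty_rows = dropna(rows, how="all")        # drop only fully-null rows
rows_with_age = dropna(rows, subset=["age"])    # consider only "age"
ages_filled = fillna(rows, 0, subset=["age"])   # scalar fill on a subset
names_filled = fillna(rows, {"name": "unknown"})  # per-column fill map
```

The sketch returns new lists rather than mutating its input, which loosely matches the fact that DataFrame operations produce new DataFrames. Anything beyond that, including how Spark treats type mismatches between the fill value and a column, should be checked against the patch itself.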

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark df-missing-value

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5274.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5274
    
----
commit 67d70035178de5cc630d585bdd2baa8e87a685a8
Author: Reynold Xin <[email protected]>
Date:   2015-03-29T06:01:17Z

    [SPARK-6119][SQL] DataFrame.na.drop (Scala/Java) and DataFrame.dropna (Python)

commit 6a73c68c80857e08718be2e3cf85dcac99c374e9
Author: Reynold Xin <[email protected]>
Date:   2015-03-29T06:04:46Z

    Missing file.

commit 249b94ec74f841396146f547105c360a98859880
Author: Reynold Xin <[email protected]>
Date:   2015-03-29T06:05:54Z

    Removing undefined functions.

commit 749eb47ded777a1d6d72792f986534cd6dfd3866
Author: Reynold Xin <[email protected]>
Date:   2015-03-29T22:28:01Z

    fillna

commit 185c67e6a5075fad549d53a6b683f312fd928c8f
Author: Reynold Xin <[email protected]>
Date:   2015-03-29T23:18:04Z

    Allow specifying column subsets in fill.

commit 914a3743801c7e1637fb43ef841d2d76fc3e4ce7
Author: Reynold Xin <[email protected]>
Date:   2015-03-30T07:22:09Z

    fill with map.

commit 2385d003597448bca38929e7f84850ddcd7052ec
Author: Reynold Xin <[email protected]>
Date:   2015-03-30T20:00:29Z

    Feedback from Xiangrui on "how".

commit d56f5a5c4036de7ba6344c5ef86fa070f039b909
Author: Reynold Xin <[email protected]>
Date:   2015-03-30T21:53:08Z

    Added replace for Scala/Java.

commit bc4fdbb7a634590dd15ca121284e8e2c1792862e
Author: Reynold Xin <[email protected]>
Date:   2015-03-30T22:06:53Z

    Added documentation for replace.

commit 33a330c1eb40c4e3f8e1346b20f8da0045bb8328
Author: Reynold Xin <[email protected]>
Date:   2015-03-30T22:22:38Z

    Remove replace for now.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
