GitHub user zero323 opened a pull request:

    https://github.com/apache/spark/pull/16793

    [SPARK-19454][PYTHON][SQL] DataFrame.replace improvements

    ## What changes were proposed in this pull request?
    
    - Allows skipping the `value` argument if `to_replace` is a `dict`:
        ```python
        In [1]: df = sc.parallelize([("Alice", 1, 3.0)]).toDF()
        
        In [2]: df.replace({"Alice": "Bob"}).show()
        +---+---+---+
        | _1| _2| _3|
        +---+---+---+
        |Bob|  1|3.0|
        +---+---+---+
        ```
    - Adds a validation step to ensure that values and replacements have homogeneous types.
    - Simplifies internal control flow.
    - Improves unit test coverage.
    
    ## How was this patch tested?
    
    Existing unit tests, additional unit tests, manual testing.
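
As a rough illustration of the argument handling summarized above (optional `value` when `to_replace` is a `dict`, a warning when both are given, and a check for homogeneous types), here is a minimal pure-Python sketch. It is not the code from this pull request: the helper name `normalize_replace_args`, the exact error messages, and the rule that keys and replacements must all be strings or all be numeric are assumptions made for illustration only.

```python
import warnings


def normalize_replace_args(to_replace, value=None):
    """Hypothetical helper: return parallel lists (keys, replacements)."""
    if isinstance(to_replace, dict):
        # With a dict, the replacements come from the dict itself.
        if value is not None:
            warnings.warn("to_replace is a dict; value will be ignored")
        keys = list(to_replace.keys())
        replacements = list(to_replace.values())
    else:
        if value is None:
            raise TypeError("value is required when to_replace is not a dict")
        keys = list(to_replace) if isinstance(to_replace, (list, tuple)) else [to_replace]
        if isinstance(value, (list, tuple)):
            replacements = list(value)
        else:
            # A scalar value is reused for every key.
            replacements = [value] * len(keys)
        if len(keys) != len(replacements):
            raise ValueError("to_replace and value must have the same length")

    # Assumed rule: everything is a string, or everything is numeric.
    # Note that bool is a subclass of int, so it passes as numeric here.
    items = keys + replacements
    all_strings = all(isinstance(x, str) for x in items)
    all_numeric = all(isinstance(x, (int, float)) for x in items)
    if not (all_strings or all_numeric):
        raise ValueError("Mixed type replacements are not supported")
    return keys, replacements
```

Under these assumptions, `normalize_replace_args({"Alice": "Bob"})` yields one key and one replacement, matching the `df.replace({"Alice": "Bob"})` call shown in the description, while `normalize_replace_args({"Alice": 1})` raises `ValueError` because it mixes a string with a number.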

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zero323/spark SPARK-19454

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16793.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16793
    
----
commit 9045a3587b8ad27df893182bff67800118c661d3
Author: zero323 <zero...@users.noreply.github.com>
Date:   2017-02-02T00:16:05Z

    Ignore value in DataFrame.replace if to_replace is dict

commit e13351f2635dd58c575c665d1626d4d3e495e91b
Author: zero323 <zero...@users.noreply.github.com>
Date:   2017-02-03T10:26:53Z

    Test if failure conditions are recognized

commit 5550ba7738cd44a4f50c2e22ffd2bcc5f8ececab
Author: zero323 <zero...@users.noreply.github.com>
Date:   2017-02-03T10:35:15Z

    Add tests for DataFrame.replace failures

commit 557c4fd8713324f3684a3dbdf4ef83c41278850a
Author: zero323 <zero...@users.noreply.github.com>
Date:   2017-02-03T10:42:24Z

    Group preconditions in DataFrame.replace

commit 609563b9b353d7a8e520dc13f83e3741c3a9a3f5
Author: zero323 <zero...@users.noreply.github.com>
Date:   2017-02-03T10:53:21Z

    Add tests for DataFrame.with tuple and multi-element sequence

commit bc1ed34d7695596c67199ae49b8d2e3339059b3d
Author: zero323 <zero...@users.noreply.github.com>
Date:   2017-02-03T10:57:24Z

    Remove obsolete casts to tuple

commit 68826073151e6bea9c25d67bdd3f49881940fcb9
Author: zero323 <zero...@users.noreply.github.com>
Date:   2017-02-03T10:58:26Z

    Simplify overall workflow

commit f6d9a5cd254ef81fc12f8c4d06865586007f3e98
Author: zero323 <zero...@users.noreply.github.com>
Date:   2017-02-03T11:03:57Z

    Reorder pre-conditions and extend error messages

commit de6167696645eecee216ac09cd33d89e62a7fff3
Author: zero323 <zero...@users.noreply.github.com>
Date:   2017-02-03T11:13:59Z

    Issue a warning when to_replace is dict but the value has been provided

commit 86db30b19d8986947d76cf578bb0f8fba9ff06ed
Author: zero323 <zero...@users.noreply.github.com>
Date:   2017-02-03T11:32:28Z

    Raise ValueError if received mixed types

commit 904db242be9dde4b05b46bfdc58758b85af28c90
Author: zero323 <zero...@users.noreply.github.com>
Date:   2017-02-03T11:35:15Z

    Explain purpose of each section

----
