GitHub user zero323 opened a pull request: https://github.com/apache/spark/pull/16793
[SPARK-19454][PYTHON][SQL] DataFrame.replace improvements

## What changes were proposed in this pull request?

- Allows skipping the `value` argument if `to_replace` is a `dict`:

```python
In [1]: df = sc.parallelize([("Alice", 1, 3.0)]).toDF()

In [2]: df.replace({"Alice": "Bob"}).show()
+---+---+---+
| _1| _2| _3|
+---+---+---+
|Bob|  1|3.0|
+---+---+---+
```

- Adds a validation step to ensure homogeneous values / replacements.
- Simplifies internal control flow.
- Improves unit test coverage.

## How was this patch tested?

Existing unit tests, additional unit tests, manual testing.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zero323/spark SPARK-19454

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16793.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #16793

----
commit 9045a3587b8ad27df893182bff67800118c661d3
Author: zero323 <zero...@users.noreply.github.com>
Date: 2017-02-02T00:16:05Z

    Ignore value in DataFrame.replace if to_replace is dict

commit e13351f2635dd58c575c665d1626d4d3e495e91b
Author: zero323 <zero...@users.noreply.github.com>
Date: 2017-02-03T10:26:53Z

    Test if failure conditions are recognized

commit 5550ba7738cd44a4f50c2e22ffd2bcc5f8ececab
Author: zero323 <zero...@users.noreply.github.com>
Date: 2017-02-03T10:35:15Z

    Add tests for DataFrame.replace failures

commit 557c4fd8713324f3684a3dbdf4ef83c41278850a
Author: zero323 <zero...@users.noreply.github.com>
Date: 2017-02-03T10:42:24Z

    Group preconditions in DataFrame.replace

commit 609563b9b353d7a8e520dc13f83e3741c3a9a3f5
Author: zero323 <zero...@users.noreply.github.com>
Date: 2017-02-03T10:53:21Z

    Add tests for DataFrame.with tuple and multi-element sequence

commit bc1ed34d7695596c67199ae49b8d2e3339059b3d
Author: zero323 <zero...@users.noreply.github.com>
Date: 2017-02-03T10:57:24Z

    Remove obsolete casts to tuple

commit 68826073151e6bea9c25d67bdd3f49881940fcb9
Author: zero323 <zero...@users.noreply.github.com>
Date: 2017-02-03T10:58:26Z

    Simplify overall workflow

commit f6d9a5cd254ef81fc12f8c4d06865586007f3e98
Author: zero323 <zero...@users.noreply.github.com>
Date: 2017-02-03T11:03:57Z

    Reorder pre-conditions and extend error messages

commit de6167696645eecee216ac09cd33d89e62a7fff3
Author: zero323 <zero...@users.noreply.github.com>
Date: 2017-02-03T11:13:59Z

    Issue a warning when to_replace is dict but the value has been provided

commit 86db30b19d8986947d76cf578bb0f8fba9ff06ed
Author: zero323 <zero...@users.noreply.github.com>
Date: 2017-02-03T11:32:28Z

    Raise ValuError if received mixed types

commit 904db242be9dde4b05b46bfdc58758b85af28c90
Author: zero323 <zero...@users.noreply.github.com>
Date: 2017-02-03T11:35:15Z

    Explain purpose of each section
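The argument handling this PR describes for `DataFrame.replace` (`value` optional when `to_replace` is a `dict`, a warning if `value` is passed anyway, and a `ValueError` for mixed types) can be illustrated with a small pure-Python sketch that does not need Spark. `normalize_replace_args` and `_type_group` are hypothetical names, not PySpark APIs. Using `None` as the "value not provided" sentinel, broadcasting a scalar `value`, and grouping types as bool / str / numeric are assumptions made for illustration; the real `DataFrame.replace` may differ in its details and error messages.

```python
import warnings
from numbers import Number

# Hypothetical helpers: a sketch of the argument handling described in the
# PR, NOT the actual PySpark implementation of DataFrame.replace.


def _type_group(x):
    """Classify x as 'bool', 'str', 'numeric', or None if unsupported."""
    # bool is checked before Number because bool is a subclass of int.
    if isinstance(x, bool):
        return "bool"
    if isinstance(x, str):
        return "str"
    if isinstance(x, Number):
        return "numeric"
    return None


def normalize_replace_args(to_replace, value=None):
    """Return an {old: new} mapping after validating the arguments.

    Assumption: value=None means "not provided".
    """
    if isinstance(to_replace, dict):
        # A dict already carries the replacements, so value may be skipped.
        if value is not None:
            warnings.warn("to_replace is a dict, so value is ignored",
                          stacklevel=2)
        mapping = dict(to_replace)
    else:
        if value is None:
            raise TypeError("value is required when to_replace is not a dict")
        keys = (list(to_replace) if isinstance(to_replace, (list, tuple))
                else [to_replace])
        if isinstance(value, (list, tuple)):
            if len(value) != len(keys):
                raise ValueError("to_replace and value should have the same length")
            values = list(value)
        else:
            # Assumption: a scalar value replaces every element of to_replace.
            values = [value] * len(keys)
        mapping = dict(zip(keys, values))

    # Homogeneity check: all keys and replacements must share one type group.
    groups = {_type_group(x) for x in list(mapping) + list(mapping.values())}
    if None in groups:
        raise TypeError("unsupported type in to_replace or value")
    if len(groups) > 1:
        raise ValueError("mixed type replacements: %s" % sorted(groups))
    return mapping
```

For example, `normalize_replace_args({"Alice": "Bob"})` returns `{"Alice": "Bob"}`, mirroring the `df.replace({"Alice": "Bob"})` call in the description, while `normalize_replace_args({"Alice": 1})` raises `ValueError` because it mixes a string with a number. Checking `bool` before `Number` matters here: without it, `True` would be accepted as a replacement for an integer.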