GitHub user zero323 opened a pull request:
https://github.com/apache/spark/pull/16793
[SPARK-19454][PYTHON][SQL] DataFrame.replace improvements
## What changes were proposed in this pull request?
- Allows skipping `value` argument if `to_replace` is a `dict`:
```python
In [1]: df = sc.parallelize([("Alice", 1, 3.0)]).toDF()
In [2]: df.replace({"Alice": "Bob"}).show()
+---+---+---+
| _1| _2| _3|
+---+---+---+
|Bob|  1|3.0|
+---+---+---+
```
- Adds validation step to ensure homogeneous values / replacements.
- Simplifies internal control flow.
- Improves unit test coverage.
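
To illustrate the first two points, here is a minimal pure-Python sketch of the kind of precondition handling described above. It is not the code from this patch: the helper name `normalize_replacement`, the error messages, and the exact type rules are assumptions made for illustration only.

```python
import warnings


def _kind(x):
    """Classify a value as 'string' or 'numeric' (assumed rules for this sketch)."""
    if isinstance(x, str):
        return "string"
    # bool is a subclass of int, so exclude it explicitly in this sketch.
    if isinstance(x, (int, float)) and not isinstance(x, bool):
        return "numeric"
    raise ValueError("Unsupported type: {0}".format(type(x).__name__))


def normalize_replacement(to_replace, value=None):
    """Hypothetical helper: return an {old: new} mapping or raise ValueError."""
    if isinstance(to_replace, dict):
        # With a dict, the replacements come from the dict itself,
        # so `value` can be skipped; warn if it was passed anyway.
        if value is not None:
            warnings.warn("to_replace is a dict; value will be ignored")
        mapping = dict(to_replace)
    else:
        if value is None:
            raise ValueError("value is required when to_replace is not a dict")
        olds = list(to_replace) if isinstance(to_replace, (list, tuple)) else [to_replace]
        # A scalar value is broadcast to every element of to_replace.
        news = list(value) if isinstance(value, (list, tuple)) else [value] * len(olds)
        if len(olds) != len(news):
            raise ValueError("to_replace and value must have the same length")
        mapping = dict(zip(olds, news))

    # Validation step: keys and replacements must all be of one kind.
    kinds = {_kind(x) for pair in mapping.items() for x in pair}
    if len(kinds) > 1:
        raise ValueError("Mixed type replacements are not supported")
    return mapping
```

In this sketch, `normalize_replacement({"Alice": "Bob"})` returns the mapping unchanged, while `normalize_replacement({"Alice": 1})` raises `ValueError` because a string key is paired with a numeric replacement.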
## How was this patch tested?
Existing unit tests, additional unit tests, manual testing.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zero323/spark SPARK-19454
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/16793.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #16793
----
commit 9045a3587b8ad27df893182bff67800118c661d3
Author: zero323 <[email protected]>
Date: 2017-02-02T00:16:05Z
Ignore value in DataFrame.replace if to_replace is dict
commit e13351f2635dd58c575c665d1626d4d3e495e91b
Author: zero323 <[email protected]>
Date: 2017-02-03T10:26:53Z
Test if failure conditions are recognized
commit 5550ba7738cd44a4f50c2e22ffd2bcc5f8ececab
Author: zero323 <[email protected]>
Date: 2017-02-03T10:35:15Z
Add tests for DataFrame.replace failures
commit 557c4fd8713324f3684a3dbdf4ef83c41278850a
Author: zero323 <[email protected]>
Date: 2017-02-03T10:42:24Z
Group preconditions in DataFrame.replace
commit 609563b9b353d7a8e520dc13f83e3741c3a9a3f5
Author: zero323 <[email protected]>
Date: 2017-02-03T10:53:21Z
Add tests for DataFrame.with tuple and multi-element sequence
commit bc1ed34d7695596c67199ae49b8d2e3339059b3d
Author: zero323 <[email protected]>
Date: 2017-02-03T10:57:24Z
Remove obsolete casts to tuple
commit 68826073151e6bea9c25d67bdd3f49881940fcb9
Author: zero323 <[email protected]>
Date: 2017-02-03T10:58:26Z
Simplify overall workflow
commit f6d9a5cd254ef81fc12f8c4d06865586007f3e98
Author: zero323 <[email protected]>
Date: 2017-02-03T11:03:57Z
Reorder pre-conditions and extend error messages
commit de6167696645eecee216ac09cd33d89e62a7fff3
Author: zero323 <[email protected]>
Date: 2017-02-03T11:13:59Z
Issue a warning when to_replace is dict but the value has been provided
commit 86db30b19d8986947d76cf578bb0f8fba9ff06ed
Author: zero323 <[email protected]>
Date: 2017-02-03T11:32:28Z
Raise ValuError if received mixed types
commit 904db242be9dde4b05b46bfdc58758b85af28c90
Author: zero323 <[email protected]>
Date: 2017-02-03T11:35:15Z
Explain purpose of each section
----