GitHub user holdenk commented on a diff in the pull request:
https://github.com/apache/spark/pull/16792#discussion_r99478139
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1272,16 +1272,18 @@ def replace(self, to_replace, value, subset=None):
"""Returns a new :class:`DataFrame` replacing a value with another value.
:func:`DataFrame.replace` and :func:`DataFrameNaFunctions.replace` are
aliases of each other.
+ Values `to_replace` and `value` should be homogeneous. Mixed string and numeric
--- End diff --
We can leave the truncation warning off if you think it's unnecessary,
but I do like the description you came up with for it.
I think the currently proposed text on the expected types is a little vague
(as you said). I think we can come up with something that is precise, accurate,
and easy to read, so I'd like us to give it a shot :)
How about something like: "The element(s) `to_replace` and `value` should
be the same type(s) (either all numerics, all booleans, or all strings)." This
way we've clarified that mixing different numeric types is OK, since they all
get converted to doubles at the end of the day. I'm open to other suggestions
too, though.
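To make the proposed wording concrete, here is a rough sketch of the rule it describes: ints and floats count as one "numeric" kind, while booleans and strings are separate kinds. This is plain Python 3 and is not Spark's actual validation code; the helper names `_kind`, `_as_list`, and `check_homogeneous` are made up for illustration.

```python
def _kind(x):
    """Classify a value as 'bool', 'numeric', or 'string' (hypothetical helper)."""
    # bool is a subclass of int in Python, so it has to be tested first.
    if isinstance(x, bool):
        return "bool"
    if isinstance(x, (int, float)):
        return "numeric"
    if isinstance(x, str):
        return "string"
    raise TypeError("unsupported type: %s" % type(x).__name__)


def _as_list(v):
    """Wrap a scalar in a list so scalars and lists are handled the same way."""
    return list(v) if isinstance(v, (list, tuple)) else [v]


def check_homogeneous(to_replace, value):
    """Return the shared kind of all elements, or raise ValueError if kinds are mixed."""
    kinds = {_kind(x) for x in _as_list(to_replace) + _as_list(value)}
    if len(kinds) > 1:
        raise ValueError("mixed types are not allowed: %s" % sorted(kinds))
    return kinds.pop() if kinds else None
```

Under this sketch, `check_homogeneous([1, 2.5], [3, 4])` is accepted as all-numeric, while `check_homogeneous([1, "a"], [2, "b"])` raises `ValueError`.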