bzhaoopenstack opened a new pull request, #37365:
URL: https://github.com/apache/spark/pull/37365
add_prefix/add_suffix should follow the pandas behavior when validating the
prefix/suffix parameter.
Currently, PySpark asserts that the argument is a str. pandas, however, accepts
any value that can be formatted as a string (i.e. implements __str__), so the two behave differently.
### What changes were proposed in this pull request?
Support any input that can be formatted as a string in add_prefix/add_suffix.
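
A minimal sketch of the relaxed behavior, written in plain Python rather than against the actual pyspark.pandas internals. The helper names `append_suffix` and `prepend_prefix` are hypothetical; they only illustrate replacing the `isinstance(suffix, str)` assertion with plain string formatting:

```python
from typing import Any, List


def append_suffix(labels: List[Any], suffix: Any) -> List[str]:
    """Hypothetical helper: format each label followed by the suffix.

    There is no isinstance(suffix, str) check; any object that can be
    formatted as a string is accepted.
    """
    return ["{}{}".format(label, suffix) for label in labels]


def prepend_prefix(labels: List[Any], prefix: Any) -> List[str]:
    """Hypothetical helper: format the prefix followed by each label."""
    return ["{}{}".format(prefix, label) for label in labels]


print(append_suffix(["A", "B"], 666))
print(prepend_prefix(["A", "B"], True))
```

With this approach an integer, float, or bool suffix is handled the same way as a str, which is what the pandas output in the next section shows.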
### Why are the changes needed?
pandas and PySpark behave differently when a non-string value is passed to
add_prefix/add_suffix.
PySpark
```
>>> from pyspark import pandas as ps
>>> df = ps.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]}, columns=['A', 'B'])
>>> df.add_suffix(666)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/spark/spark/python/pyspark/pandas/frame.py", line 9060, in add_suffix
    assert isinstance(suffix, str)
AssertionError
>>> df.add_suffix(True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/spark/spark/python/pyspark/pandas/frame.py", line 9060, in add_suffix
    assert isinstance(suffix, str)
AssertionError
```
Pandas: 1.3.X/1.4.X
```
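>>> import pandas as pd
>>> # pdf is assumed to be the pandas counterpart of df above
>>> pdf = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]}, columns=['A', 'B'])
>>> pdf.add_suffix(0.1)
   A0.1  B0.1
0     1     3
1     2     4
2     3     5
3     4     6
>>> pdf.add_suffix(True)
   ATrue  BTrue
0      1      3
1      2      4
2      3      5
3      4      6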
```
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By passing inputs that can be converted to strings to the add_prefix/add_suffix functions.
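
As a rough illustration of such a check (not the actual test added in this PR), the sketch below exercises several stringable inputs, including a custom class that implements __str__. The `Stringable` class and the `append_suffix` helper are hypothetical stand-ins for the real pyspark.pandas code path:

```python
from typing import Any, List


class Stringable:
    """Hypothetical object whose only string support is __str__."""

    def __str__(self) -> str:
        return "_x"


def append_suffix(labels: List[Any], suffix: Any) -> List[str]:
    # Hypothetical stand-in for add_suffix: format instead of asserting str.
    return ["{}{}".format(label, suffix) for label in labels]


# Each case pairs a suffix with the column names expected from it.
cases = [
    ("_s", ["A_s", "B_s"]),
    (666, ["A666", "B666"]),
    (0.1, ["A0.1", "B0.1"]),
    (True, ["ATrue", "BTrue"]),
    (Stringable(), ["A_x", "B_x"]),
]

results = [append_suffix(["A", "B"], suffix) == expected for suffix, expected in cases]
print(all(results))
```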