bzhaoopenstack opened a new pull request, #37366:
URL: https://github.com/apache/spark/pull/37366
PySpark raises an error when `shift` is called with `periods=0`, whereas pandas returns an unchanged copy of the object.
### What changes were proposed in this pull request?
Return `self.copy()` when `periods == 0`.
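The guard can be illustrated with a minimal sketch, assuming a plain Python list stands in for a Series. The function `shift` below is hypothetical and is not the PySpark implementation, which builds the shift from a Spark `lag` window expression (the source of the error in the traceback). It also fills with `None` rather than pandas' `NaN`, so it does not model pandas' upcast to `float64`.

```python
from typing import Any, List, Optional


def shift(values: List[Any], periods: int = 1, fill_value: Optional[Any] = None) -> List[Any]:
    """Hypothetical pure-Python model of a pandas-style shift on a list."""
    # Guard mirroring the proposed fix: a zero shift is a no-op, so return a
    # copy of the input instead of building any lag/lead expression.
    if periods == 0:
        return list(values)
    n = len(values)
    # Shifting by the full length (or more) leaves only fill values.
    if abs(periods) >= n:
        return [fill_value] * n
    if periods > 0:
        # Positive periods move values down; the first `periods` slots are filled.
        return [fill_value] * periods + values[: n - periods]
    # Negative periods move values up; the last `-periods` slots are filled.
    return values[-periods:] + [fill_value] * (-periods)


col1 = [10, 20, 15, 30, 45]
print(shift(col1, periods=3))  # [None, None, None, 10, 20]
print(shift(col1, periods=0))  # [10, 20, 15, 30, 45]
```

Returning a new list (rather than `values` itself) matches the "copy" semantics described above: mutating the result does not affect the input.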
### Why are the changes needed?
PySpark and pandas behave differently when `periods=0`.
PySpark:
```
>>> df = ps.DataFrame({'Col1': [10, 20, 15, 30, 45],
...                    'Col2': [13, 23, 18, 33, 48],
...                    'Col3': [17, 27, 22, 37, 52]},
...                   columns=['Col1', 'Col2', 'Col3'])
>>> df.Col1.shift(periods=3)
0 NaN
1 NaN
2 NaN
3 10.0
4 20.0
Name: Col1, dtype: float64
>>> df.Col1.shift(periods=0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/spark/spark/python/pyspark/pandas/base.py", line 1170, in shift
    return self._shift(periods, fill_value).spark.analyzed
  File "/home/spark/spark/python/pyspark/pandas/spark/accessors.py", line 256, in analyzed
    return first_series(DataFrame(self._data._internal.resolved_copy))
  File "/home/spark/spark/python/pyspark/pandas/utils.py", line 589, in wrapped_lazy_property
    setattr(self, attr_name, fn(self))
  File "/home/spark/spark/python/pyspark/pandas/internal.py", line 1173, in resolved_copy
    sdf = self.spark_frame.select(self.spark_columns + list(HIDDEN_COLUMNS))
  File "/home/spark/spark/python/pyspark/sql/dataframe.py", line 2073, in select
    jdf = self._jdf.select(self._jcols(*cols))
  File "/home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/py4j/java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "/home/spark/spark/python/pyspark/sql/utils.py", line 196, in deco
    raise converted from None
pyspark.sql.utils.AnalysisException: Cannot specify window frame for lag function
```
pandas:
```
>>> pdf = pd.DataFrame({'Col1': [10, 20, 15, 30, 45],
...                     'Col2': [13, 23, 18, 33, 48],
...                     'Col3': [17, 27, 22, 37, 52]},
...                    columns=['Col1', 'Col2', 'Col3'])
>>> pdf.Col1.shift(periods=3)
0 NaN
1 NaN
2 NaN
3 10.0
4 20.0
Name: Col1, dtype: float64
>>> pdf.Col1.shift(periods=0)
0 10
1 20
2 15
3 30
4 45
Name: Col1, dtype: int64
```
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Called `shift` with `periods=0`.