This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new c5dd72cff124 [SPARK-46553][PS] `FutureWarning` for `interpolate` with object dtype
c5dd72cff124 is described below

commit c5dd72cff1243507f47623c0f697873977b23380
Author: Haejoon Lee <haejoon....@databricks.com>
AuthorDate: Tue Jan 2 17:45:52 2024 +0900

[SPARK-46553][PS] `FutureWarning` for `interpolate` with object dtype

### What changes were proposed in this pull request?

This PR proposes to issue a `FutureWarning` for `(DataFrame|Series).interpolate` with object dtype.

### Why are the changes needed?

To match the behavior with Pandas. Using object dtype for `interpolate` is deprecated and will raise exception in the future version, so we should issue the proper warning such as Pandas does.

### Does this PR introduce _any_ user-facing change?

Given DataFrame below,

```python
>>> psdf = ps.DataFrame({"A": ['a', 'b', 'c'], "B": [1, 2, 3]})
>>> psdf
   A  B
0  a  1
1  b  2
2  c  3
```

**Before**

```python
>>> psdf.interpolate()  # Excluding column with object dtype without any warning unlike pandas
   B
0  1
1  2
2  3
```

**After**

```python
>>> psdf.interpolate()  # Issuing a proper warning
FutureWarning: DataFrame.interpolate with object dtype is deprecated and will raise in a future version. Call df.infer_objects(copy=False) before interpolating instead.
  warnings.warn(
   B
0  1
1  2
2  3
```

### How was this patch tested?

No behavior changes, so the existing CI should pass.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44550 from itholic/SPARK-46553.

Authored-by: Haejoon Lee <haejoon....@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 python/pyspark/pandas/frame.py  | 7 +++++++
 python/pyspark/pandas/series.py | 6 ++++++
 2 files changed, 13 insertions(+)

diff --git a/python/pyspark/pandas/frame.py b/python/pyspark/pandas/frame.py
index 9846dc0ae10b..a7edac5509b1 100644
--- a/python/pyspark/pandas/frame.py
+++ b/python/pyspark/pandas/frame.py
@@ -6126,6 +6126,13 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
             raise ValueError("invalid limit_direction: '{}'".format(limit_direction))
         if (limit_area is not None) and (limit_area not in ["inside", "outside"]):
             raise ValueError("invalid limit_area: '{}'".format(limit_area))
+        for dtype in self.dtypes.values:
+            if dtype == "object":
+                warnings.warn(
+                    "DataFrame.interpolate with object dtype is deprecated and will raise in a "
+                    "future version. Convert to a specific numeric type before interpolating.",
+                    FutureWarning,
+                )
 
         numeric_col_names = []
         for label in self._internal.column_labels:
diff --git a/python/pyspark/pandas/series.py b/python/pyspark/pandas/series.py
index 6d7a7c1f2e56..a35e19545d5a 100644
--- a/python/pyspark/pandas/series.py
+++ b/python/pyspark/pandas/series.py
@@ -2231,6 +2231,12 @@ class Series(Frame, IndexOpsMixin, Generic[T]):
         limit_direction: Optional[str] = None,
         limit_area: Optional[str] = None,
     ) -> "Series":
+        if self.dtype == "object":
+            warnings.warn(
+                "Series.interpolate with object dtype is deprecated and will raise in a "
+                "future version. Convert to a specific numeric type before interpolating.",
+                FutureWarning,
+            )
         if method not in ["linear"]:
             raise NotImplementedError("interpolate currently works only for method='linear'")
         if (limit is not None) and (not limit > 0):
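
For readers who want to see the warning pattern from this commit in isolation, here is a minimal standard-library sketch. It is not the pyspark code: `check_interpolate_dtypes` and its `dtypes` mapping (column name to dtype name) are hypothetical stand-ins for a DataFrame, and the sketch assumes that warning at most once per call is acceptable, whereas the loop in the diff calls `warnings.warn` once per object-dtype column.

```python
import warnings
from typing import Dict, List


def check_interpolate_dtypes(dtypes: Dict[str, str], owner: str = "DataFrame") -> List[str]:
    """Hypothetical helper: warn if any column is object dtype, return the other columns.

    `dtypes` maps column name -> dtype name and stands in for a real DataFrame.
    """
    # Assumption: one warning per call, no matter how many object columns exist.
    if any(dtype == "object" for dtype in dtypes.values()):
        warnings.warn(
            f"{owner}.interpolate with object dtype is deprecated and will raise in a "
            "future version. Convert to a specific numeric type before interpolating.",
            FutureWarning,
            stacklevel=2,  # attribute the warning to the caller, not this helper
        )
    # Object-dtype columns are excluded, as in the Before/After output of the PR.
    return [name for name, dtype in dtypes.items() if dtype != "object"]


# Capture the warning instead of printing it, e.g. to assert on it in a test.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # disable the once-per-location de-duplication
    kept = check_interpolate_dtypes({"A": "object", "B": "int64"})

print(kept)
print([w.category.__name__ for w in caught])
```

Under Python's default warning filters, repeated warnings raised from the same source location are displayed only once, so the per-column loop in the diff and the single `any(...)` check would usually look the same to a user; the two differ mainly when filters are set to `"always"` or when warnings are recorded in tests.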