zhengruifeng commented on code in PR #53730:
URL: https://github.com/apache/spark/pull/53730#discussion_r2672029145
##########
python/pyspark/sql/pandas/types.py:
##########
@@ -891,16 +921,17 @@ def _create_converter_to_pandas(
     pandas_type = _to_corrected_pandas_type(data_type)
     if pandas_type is not None:
-        # SPARK-21766: if an integer field is nullable and has null values, it can be
-        # inferred by pandas as a float column. If we convert the column with NaN back
-        # to integer type e.g., np.int16, we will hit an exception. So we use the
-        # pandas-inferred float type, rather than the corrected type from the schema
-        # in this case.
         if isinstance(data_type, IntegralType) and nullable:
+            if integer_object_nulls:
+                # pandas_type like np.int64 doesn't support nullable data
+                # use Pandas extension type instead
+                nullable_type = _to_corrected_pandas_ext_type(data_type)
+            else:
+                nullable_type = np.float64
 
             def correct_dtype(pser: pd.Series) -> pd.Series:
                 if pser.isnull().any():
-                    return pser.astype(np.float64, copy=False)
+                    return pser.astype(nullable_type, copy=False)
Review Comment:
The solution is to cast the object-typed series to a pandas nullable extension dtype:
```
In [56]: import pyarrow as pa
In [57]: import pandas as pd
In [58]: arr = pa.array([1, 2, 3, None])
In [59]: ser = arr.to_pandas(integer_object_nulls=True)
In [60]: ser = ser.astype(pd.Int64Dtype(), copy=False)
In [61]: ser
Out[61]:
0 1
1 2
2 3
3 <NA>
dtype: Int64
```
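For context, here is a minimal sketch of the kind of mapping a helper like `_to_corrected_pandas_ext_type` could perform; the body below is an illustrative assumption, not the actual implementation in this PR:
```
import pandas as pd
from pyspark.sql.types import ByteType, ShortType, IntegerType, LongType

# Hypothetical mapping from Spark integral types to pandas nullable
# extension dtypes, which represent missing values as pd.NA instead of
# forcing a cast to float.
_EXT_DTYPES = {
    ByteType: pd.Int8Dtype(),
    ShortType: pd.Int16Dtype(),
    IntegerType: pd.Int32Dtype(),
    LongType: pd.Int64Dtype(),
}


def _to_corrected_pandas_ext_type_sketch(data_type):
    return _EXT_DTYPES.get(type(data_type))
```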
Actually, a better solution is to construct the Series with the extension dtype directly:
```
In [62]: arr = pa.array([1, 2, 3, None])
In [63]: ser = pd.Series(arr, dtype=pd.Int64Dtype())
```
in which the intermediate object-typed `ser` can be optimized out. However, it doesn't fit the existing framework (`pa.Array.to_pandas` + `_create_converter_to_pandas`), so we will revisit and refactor the `converter` in the future.
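As a rough sketch of where such a refactor could go (the helper name below is hypothetical, and this assumes `pd.Series` accepts the Arrow array as shown above):
```
import pyarrow as pa
import pandas as pd

# Hypothetical refactor sketch: build the pandas Series directly from
# the Arrow array with the target extension dtype, skipping the
# object-dtype intermediate produced by pa.Array.to_pandas.
def arrow_to_pandas_direct(arr: pa.Array, ext_dtype) -> pd.Series:
    return pd.Series(arr, dtype=ext_dtype)


ser = arrow_to_pandas_direct(pa.array([1, 2, 3, None]), pd.Int64Dtype())
# ser.dtype -> Int64; the null slot shows as <NA>
```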