DAAworld commented on PR #37232:
URL: https://github.com/apache/spark/pull/37232#issuecomment-2378324786
> With the release of pandas 2.0, I think this PR should be re-opened,
> right?
>
> I can recreate the issue originally described with
>
> ```python
> Python 3.9.16 (main, May 3 2023, 09:54:39)
> [GCC 10.2.1 20210110] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyspark
> >>> pyspark.__version__
> '3.4.0'
> >>> import pandas
> >>> pandas.__version__
> '2.0.1'
> >>> import pyspark.pandas as ps
> >>> ps.DatetimeIndex(["1970-01-01", "1970-01-02", "1970-01-03"])
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
> 23/05/18 21:07:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 23/05/18 21:07:31 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pyspark/pandas/indexes/base.py", line 2705, in __repr__
>     pindex = self._psdf._get_or_create_repr_pandas_cache(max_display_count).index
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pyspark/pandas/frame.py", line 13347, in _get_or_create_repr_pandas_cache
>     self, "_repr_pandas_cache", {n: self.head(n + 1)._to_internal_pandas()}
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pyspark/pandas/frame.py", line 13342, in _to_internal_pandas
>     return self._internal.to_pandas_frame
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pyspark/pandas/utils.py", line 588, in wrapped_lazy_property
>     setattr(self, attr_name, fn(self))
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pyspark/pandas/internal.py", line 1056, in to_pandas_frame
>     pdf = sdf.toPandas()
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pyspark/sql/pandas/conversion.py", line 251, in toPandas
>     if (t is not None and not all([is_timedelta64_dtype(t),is_datetime64_dtype(t)])) or should_check_timedelta:
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/generic.py", line 6324, in astype
>     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 451, in astype
>     return self.apply(
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 352, in apply
>     applied = getattr(b, f)(**kwargs)
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 511, in astype
>     new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/dtypes/astype.py", line 242, in astype_array_safe
>     new_values = astype_array(values, dtype, copy=copy)
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/dtypes/astype.py", line 184, in astype_array
>     values = values.astype(dtype, copy=copy)
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 694, in astype
>     raise TypeError(
> TypeError: Casting to unit-less dtype 'datetime64' is not supported. Pass e.g. 'datetime64[ns]' instead.
> ```
With pandas==2.2.2 and pyspark==3.4.3, I also get `TypeError: Casting to unit-less dtype 'datetime64' is not supported. Pass e.g. 'datetime64[ns]' instead.`
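For context, the pandas-side change at the bottom of the traceback can be reproduced without Spark at all. This is a minimal sketch using only pandas (no pyspark assumed); the `unitless_ok` flag is just a hypothetical name to show the version-dependent behavior:

```python
import pandas as pd

# A plain datetime series, as produced internally during toPandas().
s = pd.Series(pd.to_datetime(["1970-01-01", "1970-01-02", "1970-01-03"]))

# On pandas >= 2.0, casting to the unit-less 'datetime64' dtype raises the
# TypeError seen in the traceback; on pandas < 2.0 it was accepted and
# implicitly meant 'datetime64[ns]'.
try:
    s.astype("datetime64")
    unitless_ok = True  # pandas < 2.0 behavior
except TypeError:
    unitless_ok = False  # pandas >= 2.0 behavior

# Passing an explicit unit works on both major versions.
fixed = s.astype("datetime64[ns]")
print(fixed.dtype)  # datetime64[ns]
```

This suggests the cast inside `pyspark/sql/pandas/conversion.py` needs a unit-qualified dtype string to stay compatible with pandas 2.x.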
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]