DAAworld commented on PR #37232:
URL: https://github.com/apache/spark/pull/37232#issuecomment-2378324786
> With the release of pandas 2.0, I think this PR should be re-opened,
> right?
>
> I can recreate the issue originally described with
>
> ```python
> Python 3.9.16 (main, May 3 2023, 09:54:39)
> [GCC 10.2.1 20210110] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyspark
> >>> pyspark.__version__
> '3.4.0'
> >>> import pandas
> >>> pandas.__version__
> '2.0.1'
> >>> import pyspark.pandas as ps
> >>> ps.DatetimeIndex(["1970-01-01", "1970-01-02", "1970-01-03"])
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
> 23/05/18 21:07:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 23/05/18 21:07:31 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pyspark/pandas/indexes/base.py", line 2705, in __repr__
>     pindex = self._psdf._get_or_create_repr_pandas_cache(max_display_count).index
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pyspark/pandas/frame.py", line 13347, in _get_or_create_repr_pandas_cache
>     self, "_repr_pandas_cache", {n: self.head(n + 1)._to_internal_pandas()}
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pyspark/pandas/frame.py", line 13342, in _to_internal_pandas
>     return self._internal.to_pandas_frame
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pyspark/pandas/utils.py", line 588, in wrapped_lazy_property
>     setattr(self, attr_name, fn(self))
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pyspark/pandas/internal.py", line 1056, in to_pandas_frame
>     pdf = sdf.toPandas()
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pyspark/sql/pandas/conversion.py", line 251, in toPandas
>     if (t is not None and not all([is_timedelta64_dtype(t),is_datetime64_dtype(t)])) or should_check_timedelta:
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/generic.py", line 6324, in astype
>     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 451, in astype
>     return self.apply(
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 352, in apply
>     applied = getattr(b, f)(**kwargs)
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 511, in astype
>     new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/dtypes/astype.py", line 242, in astype_array_safe
>     new_values = astype_array(values, dtype, copy=copy)
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/dtypes/astype.py", line 184, in astype_array
>     values = values.astype(dtype, copy=copy)
>   File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 694, in astype
>     raise TypeError(
> TypeError: Casting to unit-less dtype 'datetime64' is not supported. Pass e.g. 'datetime64[ns]' instead.
> ```
With pandas==2.2.2 and pyspark==3.4.3, I also get `TypeError: Casting to unit-less dtype 'datetime64' is not supported. Pass e.g. 'datetime64[ns]' instead.`
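For context, the pandas-side change at the bottom of the traceback can be reproduced without Spark at all. This is a minimal sketch using only pandas (no pyspark assumed); the `unitless_ok` flag is just a hypothetical name to show the version-dependent behavior:

```python
import pandas as pd

# A plain datetime series, as produced internally during toPandas().
s = pd.Series(pd.to_datetime(["1970-01-01", "1970-01-02", "1970-01-03"]))

# On pandas >= 2.0, casting to the unit-less 'datetime64' dtype raises the
# TypeError seen in the traceback; on pandas < 2.0 it was accepted and
# implicitly meant 'datetime64[ns]'.
try:
    s.astype("datetime64")
    unitless_ok = True  # pandas < 2.0 behavior
except TypeError:
    unitless_ok = False  # pandas >= 2.0 behavior

# Passing an explicit unit works on both major versions.
fixed = s.astype("datetime64[ns]")
print(fixed.dtype)  # datetime64[ns]
```

This suggests the cast inside `pyspark/sql/pandas/conversion.py` needs a unit-qualified dtype string to stay compatible with pandas 2.x.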
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]