bzhaoopenstack commented on PR #37232:
URL: https://github.com/apache/spark/pull/37232#issuecomment-1189975116
It looks like there are other test cases that need to be fixed. This is from my testing on
master without any changes:
```
spark@DESKTOP-U0I7MO9:~/spark$ python/run-tests --testnames
'pyspark.sql.tests.test_dataframe'
Running PySpark tests. Output is in /home/spark/spark/python/unit-tests.log
Will test against the following Python executables:
['/home/spark/.pyenv/versions/3.8.13/bin/python3']
Will test the following Python tests: ['pyspark.sql.tests.test_dataframe']
/home/spark/.pyenv/versions/3.8.13/bin/python3 python_implementation is
CPython
/home/spark/.pyenv/versions/3.8.13/bin/python3 version is: Python 3.8.13
Starting test(/home/spark/.pyenv/versions/3.8.13/bin/python3):
pyspark.sql.tests.test_dataframe (temp output:
/tmp/home_spark_.pyenv_versions_3.8.13_bin_python3__pyspark.sql.tests.test_dataframe__3gog72u3.log)
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use
setLogLevel(newLevel).
Running tests...
----------------------------------------------------------------------
test_cache (pyspark.sql.tests.test_dataframe.DataFrameTests) ... OK
(0.535s)
test_column_iterator (pyspark.sql.tests.test_dataframe.DataFrameTests) ...
OK (0.005s)
test_create_dataframe_from_array_of_long
(pyspark.sql.tests.test_dataframe.DataFrameTests) ... OK (1.341s)
test_create_dataframe_from_pandas_with_day_time_interval
(pyspark.sql.tests.test_dataframe.DataFrameTests) ...
/home/spark/spark/python/pyspark/sql/pandas/conversion.py:474: FutureWarning:
iteritems is deprecated and will be removed in a future version. Use .items
instead.
for column, series in pdf.iteritems():
/home/spark/spark/python/pyspark/sql/pandas/conversion.py:486:
FutureWarning: iteritems is deprecated and will be removed in a future version.
Use .items instead.
for column, series in pdf.iteritems():
OK (0.156s)
test_create_dataframe_from_pandas_with_dst
(pyspark.sql.tests.test_dataframe.DataFrameTests) ...
/home/spark/spark/python/pyspark/sql/pandas/conversion.py:474: FutureWarning:
iteritems is deprecated and will be removed in a future version. Use .items
instead.
for column, series in pdf.iteritems():
/home/spark/spark/python/pyspark/sql/pandas/conversion.py:486:
FutureWarning: iteritems is deprecated and will be removed in a future version.
Use .items instead.
for column, series in pdf.iteritems():
ERROR (0.140s)
test_create_dataframe_from_pandas_with_timestamp
(pyspark.sql.tests.test_dataframe.DataFrameTests) ...
/home/spark/spark/python/pyspark/sql/pandas/conversion.py:474: FutureWarning:
iteritems is deprecated and will be removed in a future version. Use .items
instead.
for column, series in pdf.iteritems():
/home/spark/spark/python/pyspark/sql/pandas/conversion.py:486:
FutureWarning: iteritems is deprecated and will be removed in a future version.
Use .items instead.
for column, series in pdf.iteritems():
OK (0.120s)
test_create_dataframe_required_pandas_not_found
(pyspark.sql.tests.test_dataframe.DataFrameTests) ... SKIP (0.000s)
test_create_nan_decimal_dataframe
(pyspark.sql.tests.test_dataframe.DataFrameTests) ... OK (0.207s)
test_df_show (pyspark.sql.tests.test_dataframe.DataFrameTests) ... OK
(0.940s)
test_drop_duplicates (pyspark.sql.tests.test_dataframe.DataFrameTests) ...
OK (1.007s)
test_dropna (pyspark.sql.tests.test_dataframe.DataFrameTests) ... OK
(1.602s)
test_duplicated_column_names
(pyspark.sql.tests.test_dataframe.DataFrameTests) ... OK (0.195s)
test_extended_hint_types (pyspark.sql.tests.test_dataframe.DataFrameTests)
... OK (0.114s)
test_fillna (pyspark.sql.tests.test_dataframe.DataFrameTests) ... OK
(1.543s)
test_freqItems (pyspark.sql.tests.test_dataframe.DataFrameTests) ... OK
(0.288s)
test_generic_hints (pyspark.sql.tests.test_dataframe.DataFrameTests) ...
OK (0.127s)
test_help_command (pyspark.sql.tests.test_dataframe.DataFrameTests) ... OK
(0.372s)
test_input_files (pyspark.sql.tests.test_dataframe.DataFrameTests) ... OK
(1.143s)
test_invalid_join_method (pyspark.sql.tests.test_dataframe.DataFrameTests)
... OK (0.050s)
test_join_without_on (pyspark.sql.tests.test_dataframe.DataFrameTests) ...
OK (0.152s)
test_observe (pyspark.sql.tests.test_dataframe.DataFrameTests) ... OK
(0.425s)
test_observe_str (pyspark.sql.tests.test_dataframe.DataFrameTests) ... OK
(10.161s)
test_pandas_api (pyspark.sql.tests.test_dataframe.DataFrameTests) ...
/home/spark/spark/python/pyspark/pandas/utils.py:976:
PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's
memory. It should only be used if the resulting pandas DataFrame is expected to
be small.
warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/home/spark/spark/python/pyspark/pandas/utils.py:976:
PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's
memory. It should only be used if the resulting pandas DataFrame is expected to
be small.
warnings.warn(message, PandasAPIOnSparkAdviceWarning)
OK (0.697s)
test_range (pyspark.sql.tests.test_dataframe.DataFrameTests) ... OK
(0.235s)
test_repartitionByRange_dataframe
(pyspark.sql.tests.test_dataframe.DataFrameTests) ... OK (1.960s)
test_replace (pyspark.sql.tests.test_dataframe.DataFrameTests) ...
/home/spark/spark/python/pyspark/sql/dataframe.py:2791: UserWarning: to_replace
is a dict and value is not None. value will be ignored.
warnings.warn("to_replace is a dict and value is not None. value will be
ignored.")
OK (2.918s)
test_repr_behaviors (pyspark.sql.tests.test_dataframe.DataFrameTests) ...
OK (0.835s)
test_require_cross (pyspark.sql.tests.test_dataframe.DataFrameTests) ...
OK (0.556s)
test_same_semantics_error
(pyspark.sql.tests.test_dataframe.DataFrameTests) ... OK (0.023s)
test_sample (pyspark.sql.tests.test_dataframe.DataFrameTests) ... OK
(0.042s)
test_toDF_with_schema_string
(pyspark.sql.tests.test_dataframe.DataFrameTests) ... OK (1.424s)
test_to_local_iterator (pyspark.sql.tests.test_dataframe.DataFrameTests)
... OK (0.535s)
test_to_local_iterator_not_fully_consumed
(pyspark.sql.tests.test_dataframe.DataFrameTests) ... OK (2.608s)
test_to_local_iterator_prefetch
(pyspark.sql.tests.test_dataframe.DataFrameTests) ... OK (0.262s)
test_to_pandas (pyspark.sql.tests.test_dataframe.DataFrameTests) ... ERROR
(0.093s)
test_to_pandas_avoid_astype
(pyspark.sql.tests.test_dataframe.DataFrameTests) ... OK (0.094s)
test_to_pandas_from_empty_dataframe
(pyspark.sql.tests.test_dataframe.DataFrameTests) ... ERROR (0.396s)
test_to_pandas_from_mixed_dataframe
(pyspark.sql.tests.test_dataframe.DataFrameTests) ... ERROR (0.167s)
test_to_pandas_from_null_dataframe
(pyspark.sql.tests.test_dataframe.DataFrameTests) ... OK (0.078s)
test_to_pandas_on_cross_join
(pyspark.sql.tests.test_dataframe.DataFrameTests) ... OK (0.162s)
test_to_pandas_required_pandas_not_found
(pyspark.sql.tests.test_dataframe.DataFrameTests) ... SKIP (0.000s)
test_to_pandas_with_duplicated_column_names
(pyspark.sql.tests.test_dataframe.DataFrameTests) ... OK (0.051s)
test_with_column_with_existing_name
(pyspark.sql.tests.test_dataframe.DataFrameTests) ... OK (0.085s)
test_with_columns (pyspark.sql.tests.test_dataframe.DataFrameTests) ... OK
(0.254s)
test_query_execution_listener_on_collect
(pyspark.sql.tests.test_dataframe.QueryExecutionListenerTests) ... OK (0.051s)
test_query_execution_listener_on_collect_with_arrow
(pyspark.sql.tests.test_dataframe.QueryExecutionListenerTests) ... OK (0.043s)
======================================================================
ERROR [0.140s]: test_create_dataframe_from_pandas_with_dst
(pyspark.sql.tests.test_dataframe.DataFrameTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/spark/spark/python/pyspark/sql/tests/test_dataframe.py", line
1008, in test_create_dataframe_from_pandas_with_dst
assert_frame_equal(pdf, df.toPandas())
File "/home/spark/spark/python/pyspark/sql/pandas/conversion.py", line
248, in toPandas
series = series.astype(t, copy=False)
File "/home/spark/upstream/pandas/pandas/core/generic.py", line 6095, in
astype
new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
File "/home/spark/upstream/pandas/pandas/core/internals/managers.py", line
386, in astype
return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
File "/home/spark/upstream/pandas/pandas/core/internals/managers.py", line
308, in apply
applied = getattr(b, f)(**kwargs)
File "/home/spark/upstream/pandas/pandas/core/internals/blocks.py", line
526, in astype
new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
File "/home/spark/upstream/pandas/pandas/core/dtypes/astype.py", line 299,
in astype_array_safe
new_values = astype_array(values, dtype, copy=copy)
File "/home/spark/upstream/pandas/pandas/core/dtypes/astype.py", line 227,
in astype_array
values = values.astype(dtype, copy=copy)
File "/home/spark/upstream/pandas/pandas/core/arrays/datetimes.py", line
631, in astype
return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy)
File "/home/spark/upstream/pandas/pandas/core/arrays/datetimelike.py",
line 504, in astype
raise TypeError(msg)
TypeError: Cannot cast DatetimeArray to dtype datetime64
======================================================================
ERROR [0.093s]: test_to_pandas
(pyspark.sql.tests.test_dataframe.DataFrameTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/spark/spark/python/pyspark/sql/tests/test_dataframe.py", line
797, in test_to_pandas
pdf = self._to_pandas()
File "/home/spark/spark/python/pyspark/sql/tests/test_dataframe.py", line
791, in _to_pandas
return df.toPandas()
File "/home/spark/spark/python/pyspark/sql/pandas/conversion.py", line
248, in toPandas
series = series.astype(t, copy=False)
File "/home/spark/upstream/pandas/pandas/core/generic.py", line 6095, in
astype
new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
File "/home/spark/upstream/pandas/pandas/core/internals/managers.py", line
386, in astype
return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
File "/home/spark/upstream/pandas/pandas/core/internals/managers.py", line
308, in apply
applied = getattr(b, f)(**kwargs)
File "/home/spark/upstream/pandas/pandas/core/internals/blocks.py", line
526, in astype
new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
File "/home/spark/upstream/pandas/pandas/core/dtypes/astype.py", line 299,
in astype_array_safe
new_values = astype_array(values, dtype, copy=copy)
File "/home/spark/upstream/pandas/pandas/core/dtypes/astype.py", line 227,
in astype_array
values = values.astype(dtype, copy=copy)
File "/home/spark/upstream/pandas/pandas/core/arrays/datetimes.py", line
631, in astype
return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy)
File "/home/spark/upstream/pandas/pandas/core/arrays/datetimelike.py",
line 504, in astype
raise TypeError(msg)
TypeError: Cannot cast DatetimeArray to dtype datetime64
======================================================================
ERROR [0.396s]: test_to_pandas_from_empty_dataframe
(pyspark.sql.tests.test_dataframe.DataFrameTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/spark/spark/python/pyspark/sql/tests/test_dataframe.py", line
886, in test_to_pandas_from_empty_dataframe
dtypes_when_nonempty_df = self.spark.sql(sql).toPandas().dtypes
File "/home/spark/spark/python/pyspark/sql/pandas/conversion.py", line
248, in toPandas
series = series.astype(t, copy=False)
File "/home/spark/upstream/pandas/pandas/core/generic.py", line 6095, in
astype
new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
File "/home/spark/upstream/pandas/pandas/core/internals/managers.py", line
386, in astype
return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
File "/home/spark/upstream/pandas/pandas/core/internals/managers.py", line
308, in apply
applied = getattr(b, f)(**kwargs)
File "/home/spark/upstream/pandas/pandas/core/internals/blocks.py", line
526, in astype
new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
File "/home/spark/upstream/pandas/pandas/core/dtypes/astype.py", line 299,
in astype_array_safe
new_values = astype_array(values, dtype, copy=copy)
File "/home/spark/upstream/pandas/pandas/core/dtypes/astype.py", line 227,
in astype_array
values = values.astype(dtype, copy=copy)
File "/home/spark/upstream/pandas/pandas/core/arrays/datetimes.py", line
631, in astype
return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy)
File "/home/spark/upstream/pandas/pandas/core/arrays/datetimelike.py",
line 504, in astype
raise TypeError(msg)
TypeError: Cannot cast DatetimeArray to dtype datetime64
======================================================================
ERROR [0.167s]: test_to_pandas_from_mixed_dataframe
(pyspark.sql.tests.test_dataframe.DataFrameTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/spark/spark/python/pyspark/sql/tests/test_dataframe.py", line
952, in test_to_pandas_from_mixed_dataframe
pdf_with_some_nulls = self.spark.sql(sql).toPandas()
File "/home/spark/spark/python/pyspark/sql/pandas/conversion.py", line
248, in toPandas
series = series.astype(t, copy=False)
File "/home/spark/upstream/pandas/pandas/core/generic.py", line 6095, in
astype
new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
File "/home/spark/upstream/pandas/pandas/core/internals/managers.py", line
386, in astype
return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
File "/home/spark/upstream/pandas/pandas/core/internals/managers.py", line
308, in apply
applied = getattr(b, f)(**kwargs)
File "/home/spark/upstream/pandas/pandas/core/internals/blocks.py", line
526, in astype
new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
File "/home/spark/upstream/pandas/pandas/core/dtypes/astype.py", line 299,
in astype_array_safe
new_values = astype_array(values, dtype, copy=copy)
File "/home/spark/upstream/pandas/pandas/core/dtypes/astype.py", line 227,
in astype_array
values = values.astype(dtype, copy=copy)
File "/home/spark/upstream/pandas/pandas/core/arrays/datetimes.py", line
631, in astype
return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy)
File "/home/spark/upstream/pandas/pandas/core/arrays/datetimelike.py",
line 504, in astype
raise TypeError(msg)
TypeError: Cannot cast DatetimeArray to dtype datetime64
----------------------------------------------------------------------
Ran 46 tests in 41.235s
FAILED (errors=4, skipped=2)
Generating XML reports...
+---+
| _1|
+---+
|foo|
+---+
+---+
| _1|
+---+
|foo|
+---+
-RECORD 0--
_1 | f
+---+
| _1|
+---+
| f|
+---+
+---+
| _1|
+---+
| f|
+---+
Had test failures in pyspark.sql.tests.test_dataframe with
/home/spark/.pyenv/versions/3.8.13/bin/python3; see logs.
spark@DESKTOP-U0I7MO9:~/spark$
```
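Both failure modes in the log trace back to pandas API changes rather than Spark itself. As a minimal sketch (assumption: plain pandas only, no Spark session needed), the deprecated `iteritems()` call can be replaced by `DataFrame.items()`, and the `Cannot cast DatetimeArray to dtype datetime64` error goes away when an explicit unit such as `datetime64[ns]` is used instead of the unit-less spelling:

```python
import pandas as pd

pdf = pd.DataFrame({"ts": pd.to_datetime(["2022-07-20", "2022-07-21"])})

# 1) FutureWarning above: DataFrame.iteritems() is deprecated; the
#    long-standing equivalent DataFrame.items() works on old and new
#    pandas alike, so conversion.py could iterate columns like this.
for column, series in pdf.items():
    assert column == "ts"

# 2) TypeError above: recent pandas rejects casting a datetime Series to
#    the unit-less "datetime64" dtype; an explicit unit is accepted across
#    versions, which is the kind of change toPandas() would likely need.
out = pdf["ts"].astype("datetime64[ns]", copy=False)
print(out.dtype)  # datetime64[ns]

try:
    pdf["ts"].astype("datetime64")
    print("unit-less cast accepted (older pandas)")
except TypeError as exc:
    print("unit-less cast rejected:", exc)
```

This is only a sketch of the incompatibilities under a newer pandas build; the actual fix would go in `python/pyspark/sql/pandas/conversion.py` where the `astype(t, copy=False)` call appears in the tracebacks.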
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]