Yikun edited a comment on pull request #34717: URL: https://github.com/apache/spark/pull/34717#issuecomment-981220341
Sure, thanks for the suggestion, I'd like to update it. I also added a simple test that installs pandas v1.0.1 ~and runs the tests on https://github.com/apache/spark/pull/34730 , waiting for the result.~ :(

Update: pandas only publishes an Ubuntu wheel after v1.2, so `pip install pandas==1.0.1` fails there unless we install many dependencies first. So I installed it in my local env instead (macOS, x86, which does have the 1.0.1 wheel) with `pip install 'pandas==1.0.1'` and ran `python/run-tests --modules=pyspark-pandas,pyspark-pandas-slow --parallelism=2 --python-executable=python3` to test it. It looks like some test cases fail:

```
======================================================================
ERROR: test_astype (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jiangyikun/spark/spark/python/pyspark/pandas/tests/data_type_ops/test_categorical_ops.py", line 204, in test_astype
    self.assert_eq(pser.astype(int), psser.astype(int))
  File "/Users/jiangyikun/spark/spark/python/pyspark/testing/pandasutils.py", line 224, in assert_eq
    robj = self._to_pandas(right)
  File "/Users/jiangyikun/spark/spark/python/pyspark/testing/pandasutils.py", line 245, in _to_pandas
    return obj.to_pandas()
  File "/Users/jiangyikun/spark/spark/python/pyspark/pandas/series.py", line 1588, in to_pandas
    return self._to_pandas()
  File "/Users/jiangyikun/spark/spark/python/pyspark/pandas/series.py", line 1594, in _to_pandas
    return self._to_internal_pandas().copy()
  File "/Users/jiangyikun/spark/spark/python/pyspark/pandas/series.py", line 6349, in _to_internal_pandas
    return self._psdf._internal.to_pandas_frame[self.name]
  File "/Users/jiangyikun/spark/spark/python/pyspark/pandas/utils.py", line 584, in wrapped_lazy_property
    setattr(self, attr_name, fn(self))
  File "/Users/jiangyikun/spark/spark/python/pyspark/pandas/internal.py", line 1049, in to_pandas_frame
    pdf = sdf.toPandas()
  File "/Users/jiangyikun/spark/spark/python/pyspark/sql/pandas/conversion.py", line 185, in toPandas
    pdf = pd.DataFrame(columns=tmp_column_names).astype(
  File "/Users/jiangyikun/venv/lib/python3.8/site-packages/pandas/core/frame.py", line 435, in __init__
    mgr = init_dict(data, index, columns, dtype=dtype)
  File "/Users/jiangyikun/venv/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 239, in init_dict
    val = construct_1d_arraylike_from_scalar(np.nan, len(index), nan_dtype)
  File "/Users/jiangyikun/venv/lib/python3.8/site-packages/pandas/core/dtypes/cast.py", line 1449, in construct_1d_arraylike_from_scalar
    dtype = dtype.dtype
AttributeError: type object 'object' has no attribute 'dtype'
----------------------------------------------------------------------
```
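The last Spark frame in the traceback is a bare `pd.DataFrame(columns=tmp_column_names)` call, so the failure can probably be checked without running the PySpark test suite. The following is only a minimal sketch under that assumption: `check_empty_frame` and the column names are hypothetical, and whether the `AttributeError` actually reproduces depends on the pandas and NumPy versions installed, which is why both are reported.

```python
# Sketch: replay the constructor call from the traceback outside of Spark.
# The helper name and column names are placeholders, not PySpark's own.
import importlib


def check_empty_frame(columns=("col_0", "col_1")):
    """Return (pandas_version, numpy_version, error_message_or_None).

    Returns None if pandas or NumPy is not installed.
    """
    try:
        pd = importlib.import_module("pandas")
        np = importlib.import_module("numpy")
    except ImportError:
        return None

    try:
        # Same shape of call as conversion.py line 185 in the traceback:
        # an empty DataFrame built from column names only.
        pd.DataFrame(columns=list(columns))
        error = None
    except AttributeError as exc:
        error = str(exc)
    return pd.__version__, np.__version__, error


if __name__ == "__main__":
    print(check_empty_frame())
```

Running this once in the environment with `pandas==1.0.1` and once in a known-good environment would show whether the error is specific to that combination of installed versions rather than to the PySpark test itself.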