[GitHub] [spark] Yikun edited a comment on pull request #34717: [SPARK-37465][PYTHON] Bump minimum pandas version to 1.0.0

GitBox Sun, 28 Nov 2021 23:07:06 -0800


Yikun edited a comment on pull request #34717:
URL: https://github.com/apache/spark/pull/34717#issuecomment-981220341



   Sure, thanks for your suggestion, I'd like to update. I also added a simple
test that installs pandas v1.0.1 ~and runs the tests on
https://github.com/apache/spark/pull/34730 ; waiting for the result.~
   
   :( Update: pandas only publishes Ubuntu wheels after v1.2, so
`pip install pandas==1.0.1` fails there unless we install many extra deps.
Instead, I installed it in my local env (macOS, x86, which does have the
1.0.1 wheel) with `pip install 'pandas==1.0.1'` and ran
`python/run-tests --modules=pyspark-pandas,pyspark-pandas-slow --parallelism=2 --python-executable=python3`
to test it.
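
Before running the suite, it can help to confirm which pandas version the interpreter actually picks up. The following is only a minimal sketch using the standard library: `parse_version`, `meets_minimum`, and `installed_pandas_version` are hypothetical helper names for illustration, not PySpark APIs, and the `1.0.0` floor is taken from the PR title.

```python
from importlib import metadata


def parse_version(text):
    """Turn '1.0.1' into (1, 0, 1); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:  # e.g. '0rc1': keep the 0, ignore the suffix
            break
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (numeric parts only)."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))  # pad so '1.0' compares equal to '1.0.0'
    b = b + (0,) * (width - len(b))
    return a >= b


def installed_pandas_version():
    """Return the installed pandas version string, or None if it is absent."""
    try:
        return metadata.version("pandas")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pandas_version()
    if found is None:
        print("pandas is not installed")
    else:
        print(found, "ok" if meets_minimum(found, "1.0.0") else "too old")
```

This ignores pre-release ordering, so it is a quick sanity check rather than a full version comparison.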
   
   It looks like some test cases failed:
   ```
   ======================================================================
   ERROR: test_astype (pyspark.pandas.tests.data_type_ops.test_categorical_ops.CategoricalOpsTest)
   ----------------------------------------------------------------------
   Traceback (most recent call last):
     File "/Users/jiangyikun/spark/spark/python/pyspark/pandas/tests/data_type_ops/test_categorical_ops.py", line 204, in test_astype
       self.assert_eq(pser.astype(int), psser.astype(int))
     File "/Users/jiangyikun/spark/spark/python/pyspark/testing/pandasutils.py", line 224, in assert_eq
       robj = self._to_pandas(right)
     File "/Users/jiangyikun/spark/spark/python/pyspark/testing/pandasutils.py", line 245, in _to_pandas
       return obj.to_pandas()
     File "/Users/jiangyikun/spark/spark/python/pyspark/pandas/series.py", line 1588, in to_pandas
       return self._to_pandas()
     File "/Users/jiangyikun/spark/spark/python/pyspark/pandas/series.py", line 1594, in _to_pandas
       return self._to_internal_pandas().copy()
     File "/Users/jiangyikun/spark/spark/python/pyspark/pandas/series.py", line 6349, in _to_internal_pandas
       return self._psdf._internal.to_pandas_frame[self.name]
     File "/Users/jiangyikun/spark/spark/python/pyspark/pandas/utils.py", line 584, in wrapped_lazy_property
       setattr(self, attr_name, fn(self))
     File "/Users/jiangyikun/spark/spark/python/pyspark/pandas/internal.py", line 1049, in to_pandas_frame
       pdf = sdf.toPandas()
     File "/Users/jiangyikun/spark/spark/python/pyspark/sql/pandas/conversion.py", line 185, in toPandas
       pdf = pd.DataFrame(columns=tmp_column_names).astype(
     File "/Users/jiangyikun/venv/lib/python3.8/site-packages/pandas/core/frame.py", line 435, in __init__
       mgr = init_dict(data, index, columns, dtype=dtype)
     File "/Users/jiangyikun/venv/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 239, in init_dict
       val = construct_1d_arraylike_from_scalar(np.nan, len(index), nan_dtype)
     File "/Users/jiangyikun/venv/lib/python3.8/site-packages/pandas/core/dtypes/cast.py", line 1449, in construct_1d_arraylike_from_scalar
       dtype = dtype.dtype
   AttributeError: type object 'object' has no attribute 'dtype'
   
   ----------------------------------------------------------------------
   ```
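
The traceback above ends inside the `pd.DataFrame(columns=tmp_column_names)` call in `toPandas`, so the failure may be reproducible without Spark at all. Below is a rough sketch of such a probe. It assumes the empty-frame constructor alone is enough to trigger the error, which the traceback suggests but does not prove; `probe_empty_frame` and the column names are made up for illustration.

```python
def probe_empty_frame(column_names):
    """Try to build an empty DataFrame with only column names.

    Returns a short status string instead of raising, so the probe can be
    run in any environment, including one without pandas.
    """
    try:
        import pandas as pd
    except ImportError:
        return "pandas not installed"
    try:
        # Same shape of call as the failing line in conversion.py.
        pd.DataFrame(columns=column_names)
    except AttributeError as exc:
        return "failed: {}".format(exc)
    return "ok"


if __name__ == "__main__":
    # Hypothetical column names; the real ones come from the Spark schema.
    print(probe_empty_frame(["col_0", "col_1"]))
```

If the probe reports `failed: ...` under pandas 1.0.1 but `ok` under a newer release, that would point at the pandas environment rather than at the pandas-on-Spark code.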


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


