aimtsou commented on PR #37817:
URL: https://github.com/apache/spark/pull/37817#issuecomment-1444549904

   Hi @srowen,
   
   Thank you for your very prompt reply.
   
   I don't think that is right about the error. `np.bool` was deprecated in NumPy 1.20.0 and has since been removed, so accessing it now raises an AttributeError:
   ```
             if attr in __former_attrs__:
   >           raise AttributeError(__former_attrs__[attr])
   E           AttributeError: module 'numpy' has no attribute 'bool'.
   E           `np.bool` was a deprecated alias for the builtin `bool`. To avoid this error in existing code, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
   E           The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
   E               https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
   
   /usr/local/lib/python3.9/site-packages/numpy/__init__.py:305: AttributeError
   ```
   
   That is the end of the traceback. It is raised after my tests call `toPandas()`:
   
   ```
   /usr/local/lib/python3.9/site-packages/<my-pkg>/unit/test_case_runner.py:26: in run_test
       self.assert_df_are_equal(expected_df, actual)
   /usr/local/lib/python3.9/site-packages/<my-pkg>/unit/test_case_runner.py:58: in assert_df_are_equal
       self.handler.compare_df(result, expected, config=self.compare_config)
   /usr/local/lib/python3.9/site-packages/<my-pkg>/spark_test_handler.py:38: in compare_df
       actual_pd = actual.toPandas().sort_values(by=sort_columns, ignore_index=True)
   /usr/local/lib/python3.9/site-packages/pyspark/sql/pandas/conversion.py:216: in toPandas
       pandas_type = PandasConversionMixin._to_corrected_pandas_type(field.dataType)
   /usr/local/lib/python3.9/site-packages/pyspark/sql/pandas/conversion.py:298: in _to_corrected_pandas_type
       return np.bool  # type: ignore[attr-defined]
   ```
   
   Note that the error is not triggered by my own use of NumPy but by the `np.bool` reference inside pyspark itself (`pyspark/sql/pandas/conversion.py`).
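   To illustrate the mechanism, here is a minimal, standard-library-only sketch. It does not import NumPy or PySpark: `fake_np`, `old_corrected_type` and `new_corrected_type` are hypothetical stand-ins that only mimic the module-level `__getattr__` check visible in the traceback and the failing `return np.bool` line. Returning the builtin `bool` is one possible fix, following the suggestion in the error message. It is not necessarily the change made in this PR.

```python
import types

# Stand-in for the numpy module (NOT the real library). It mimics how
# numpy/__init__.py in the traceback rejects removed aliases through a
# module-level __getattr__ (PEP 562).
fake_np = types.ModuleType("fake_np")

# Hypothetical table of removed aliases, shaped like __former_attrs__.
_former_attrs = {"bool": "module 'numpy' has no attribute 'bool'."}


def _module_getattr(attr):
    # Python calls this only when normal attribute lookup on the module fails.
    if attr in _former_attrs:
        raise AttributeError(_former_attrs[attr])
    raise AttributeError(f"module 'fake_np' has no attribute {attr!r}")


fake_np.__getattr__ = _module_getattr


def old_corrected_type(np_module):
    # Mirrors the failing line from the traceback: `return np.bool`.
    return np_module.bool


def new_corrected_type(np_module):
    # Assumed fix: return the builtin, as the error message recommends.
    return bool


try:
    old_corrected_type(fake_np)
    old_result = "no error"
except AttributeError as exc:
    old_result = f"AttributeError: {exc}"

new_result = new_corrected_type(fake_np)
print(old_result)
print(new_result)
```

   The point of the sketch is that the failure depends only on which NumPy version is installed next to pyspark, not on anything in the calling test code.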
   
   I agree with the comments about Databricks, but as shown above this fails on Spark 3.3.1 regardless of whether you want to be compliant with Databricks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

