[ 
https://issues.apache.org/jira/browse/SPARK-44670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17750959#comment-17750959
 ] 

Madhukar commented on SPARK-44670:
----------------------------------

Raised a PR to make the test read Excel files with openpyxl instead of xlrd:
[https://github.com/apache/spark/pull/42339] 
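
For illustration only, here is a minimal sketch of the idea, not the code from the PR. It assumes the failure comes from pandas defaulting to xlrd when reading the .xlsx file, and that passing the `engine` argument to `pd.read_excel` explicitly avoids it. The helper names `excel_engine_for` and `read_excel_explicit` are hypothetical.

```python
import os

# Hypothetical mapping (not taken from the PR): which pandas Excel engine
# can read which file extension.
_ENGINE_BY_EXTENSION = {
    ".xlsx": "openpyxl",  # xlrd 2.x rejects .xlsx ("Excel xlsx file; not supported")
    ".xlsm": "openpyxl",
    ".xls": "xlrd",       # legacy binary format, still handled by xlrd
}


def excel_engine_for(path):
    """Return the engine name to pass to pandas.read_excel for `path`."""
    extension = os.path.splitext(str(path))[1].lower()
    try:
        return _ENGINE_BY_EXTENSION[extension]
    except KeyError:
        raise ValueError(f"unsupported Excel extension: {extension!r}") from None


def read_excel_explicit(path, **kwargs):
    """Read an Excel file with an explicitly chosen engine."""
    # Imported lazily so that excel_engine_for has no third-party dependency.
    import pandas as pd

    return pd.read_excel(path, engine=excel_engine_for(path), **kwargs)
```

With such a helper, the call in `get_excel_dfs` could be written as `read_excel_explicit(pandas_on_spark_location, index_col=0)`, which needs only openpyxl for .xlsx output and does not depend on the xlrd version that happens to be installed.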

> Fix the `test_to_excel` tests for python3.7
> -------------------------------------------
>
>                 Key: SPARK-44670
>                 URL: https://issues.apache.org/jira/browse/SPARK-44670
>             Project: Spark
>          Issue Type: Bug
>          Components: Pandas API on Spark
>    Affects Versions: 3.4.1
>            Reporter: Madhukar
>            Priority: Minor
>
> With Python 3.7 and openpyxl installed (but not xlrd), `test_to_excel` fails with:
> ======================================================================
> ERROR: test_to_excel (pyspark.pandas.tests.test_dataframe_conversion.DataFrameConversionTest)
> Traceback (most recent call last):
>   File "/workspace/apache-spark/python/pyspark/pandas/tests/test_dataframe_conversion.py", line 102, in test_to_excel
>     dataframes = self.get_excel_dfs(pandas_on_spark_location, pandas_location)
>   File "/workspace/apache-spark/python/pyspark/pandas/tests/test_dataframe_conversion.py", line 89, in get_excel_dfs
>     "got": pd.read_excel(pandas_on_spark_location, index_col=0),
>   File "/opt/conda/lib/python3.7/site-packages/pandas/util/_decorators.py", line 296, in wrapper
>     return func(*args, **kwargs)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 304, in read_excel
>     io = ExcelFile(io, engine=engine)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 867, in __init__
>     self._reader = self._engines[engine](self._io)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_xlrd.py", line 21, in __init__
>     import_optional_dependency("xlrd", extra=err_msg)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/compat/_optional.py", line 110, in import_optional_dependency
>     raise ImportError(msg) from None
> ImportError: Missing optional dependency 'xlrd'. Install xlrd >= 1.0.0 for Excel support Use pip or conda to install xlrd.
> ----------------------------------------------------------------------
>
> With xlrd 2.0.1 installed, the test fails with a different error:
> ======================================================================
> ERROR: test_to_excel (pyspark.pandas.tests.test_dataframe_conversion.DataFrameConversionTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/workspace/apache-spark/python/pyspark/pandas/tests/test_dataframe_conversion.py", line 102, in test_to_excel
>     dataframes = self.get_excel_dfs(pandas_on_spark_location, pandas_location)
>   File "/workspace/apache-spark/python/pyspark/pandas/tests/test_dataframe_conversion.py", line 89, in get_excel_dfs
>     "got": pd.read_excel(pandas_on_spark_location, index_col=0),
>   File "/opt/conda/lib/python3.7/site-packages/pandas/util/_decorators.py", line 296, in wrapper
>     return func(*args, **kwargs)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 304, in read_excel
>     io = ExcelFile(io, engine=engine)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 867, in __init__
>     self._reader = self._engines[engine](self._io)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_xlrd.py", line 22, in __init__
>     super().__init__(filepath_or_buffer)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 353, in __init__
>     self.book = self.load_workbook(filepath_or_buffer)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_xlrd.py", line 37, in load_workbook
>     return open_workbook(filepath_or_buffer)
>   File "/opt/conda/lib/python3.7/site-packages/xlrd/__init__.py", line 170, in open_workbook
>     raise XLRDError(FILE_FORMAT_DESCRIPTIONS[file_format]+'; not supported')
> xlrd.biffh.XLRDError: Excel xlsx file; not supported
> ----------------------------------------------------------------------
>  


