[ https://issues.apache.org/jira/browse/ARROW-5665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868572#comment-16868572 ]
Joris Van den Bossche commented on ARROW-5665:
----------------------------------------------

[~tnesztler] Can you try to provide a reproducible example?

Based on the error message, it seems you have a column in your DataFrame that has Series objects as values in the rows. That's not supported by pyarrow. If that is intentional, and you want to save them as a nested List type, then you need to convert the column of Series objects to a column of arrays or lists.

> ArrowInvalid on converting Pandas Series with dtype float64
> -----------------------------------------------------------
>
>                 Key: ARROW-5665
>                 URL: https://issues.apache.org/jira/browse/ARROW-5665
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Thibaud Nesztler
>            Priority: Minor
>
> {code:java}
> ('Could not convert 0 70.699997\n0 73.000000\n0 0.000000\nName: fact_value,
> dtype: float64 with type Series: did not recognize Python value type when
> inferring an Arrow data type', 'Conversion failed for column fact_value with
> type float64'){code}
> We are experiencing a lot of random errors (will run the same code and not
> get the error at all) when converting Pandas Dataframe to parquet files using
> pyarrow.
> We use this line of code for the conversion:
> {code:java}
> dataframe.to_parquet(filePath, compression="snappy", index=False){code}
> Note: `filePath` is an AWS S3 URI.
> {code:java}
> ArrowInvalid: ('Could not convert 0 70.699997\n0 73.000000\n0 0.000000\nName:
> fact_value, dtype: float64 with type Series: did not recognize Python value
> type when inferring an Arrow data type', 'Conversion failed for column
> fact_value with type float64')
> File "store_manager.py", line 25, in _write_files_and_partitions
>   dataframe.to_parquet(filePath, compression="snappy", index=False)
> File "pandas/core/frame.py", line 2203, in to_parquet
>   partition_cols=partition_cols, **kwargs)
> File "pandas/io/parquet.py", line 252, in to_parquet
>   partition_cols=partition_cols, **kwargs)
> File "pandas/io/parquet.py", line 113, in write
>   table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
> File "pyarrow/table.pxi", line 1139, in pyarrow.lib.Table.from_pandas
>   names, arrays, metadata = dataframe_to_arrays(
> File "pyarrow/pandas_compat.py", line 474, in dataframe_to_arrays
>   convert_types))
> File "concurrent/futures/_base.py", line 586, in result_iterator
>   yield fs.pop().result()
> File "concurrent/futures/_base.py", line 425, in result
>   return self.__get_result()
> File "concurrent/futures/_base.py", line 384, in __get_result
>   raise self._exception
> File "concurrent/futures/thread.py", line 57, in run
>   result = self.fn(*self.args, **self.kwargs)
> File "pyarrow/pandas_compat.py", line 463, in convert_column
>   raise e
> File "pyarrow/pandas_compat.py", line 457, in convert_column
>   return pa.array(col, type=ty, from_pandas=True, safe=safe)
> File "pyarrow/array.pxi", line 173, in pyarrow.lib.array
>   return _sequence_to_array(obj, mask, size, type, pool, from_pandas)
> File "pyarrow/array.pxi", line 36, in pyarrow.lib._sequence_to_array
>   check_status(ConvertPySequence(sequence, mask, options, &out))
> File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
>   raise ArrowInvalid(message){code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
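
For illustration only, here is a minimal sketch of the conversion suggested in the comment: find the columns whose cells are Series objects and turn those cells into plain lists before writing. The helper names `columns_with_series_cells` and `series_cells_to_lists` and the sample data are hypothetical and not taken from the reporter's code. The example frame is built by hand, since the report does not show how the Series values ended up in `fact_value`.

```python
import numpy as np
import pandas as pd


def columns_with_series_cells(df):
    """Names of object-dtype columns holding at least one Series cell."""
    return [
        name
        for name in df.columns
        if df[name].dtype == object
        and df[name].map(lambda cell: isinstance(cell, pd.Series)).any()
    ]


def series_cells_to_lists(df, column):
    """Return a copy of df in which Series cells of `column` become lists."""
    out = df.copy()
    out[column] = out[column].map(
        lambda cell: cell.tolist() if isinstance(cell, pd.Series) else cell
    )
    return out


# Hypothetical data: an object column whose cells are themselves Series.
cells = np.empty(2, dtype=object)
cells[0] = pd.Series([70.7, 73.0])
cells[1] = pd.Series([0.0])
df = pd.DataFrame({"id": [1, 2], "fact_value": cells})

bad_columns = columns_with_series_cells(df)

# Convert every offending column; the original frame is left untouched.
fixed = df
for name in bad_columns:
    fixed = series_cells_to_lists(fixed, name)
```

After this step, `fixed` holds ordinary Python lists in `fact_value`, which is the shape the comment describes for saving the column as a nested List type with the same `to_parquet` call as in the report. If the Series cells are not intentional, running `columns_with_series_cells` just before the write may help narrow down where they come from.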