Indeed, it seems that structs are not handled as list items when the lists are represented as ndarrays, which is the case for data coming from pandas:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/python_to_arrow.cc#L759

Thanks for the report, I have filed
https://issues.apache.org/jira/browse/ARROW-9610

On Thu, Jul 30, 2020 at 6:20 PM Xiaozhen Liu <[email protected]> wrote:
>
> Hi,
>
> Sorry for not being clear.
> Pyarrow version is 0.17.1.
>
> Here is the full stack trace:
>
> Traceback (most recent call last):
>   File "tobacco_relevancy_classify.py", line 169, in do_action
>     output_data = pyarrow.Table.from_pandas(output_dataframe)
>   File "pyarrow\table.pxi", line 1451, in pyarrow.lib.Table.from_pandas
>   File "C:\Users\Jamie\AppData\Local\Programs\Python\Python37\lib\site-packages\pyarrow\pandas_compat.py", line 575, in dataframe_to_arrays
>     for c, f in zip(columns_to_convert, convert_fields)]
>   File "C:\Users\Jamie\AppData\Local\Programs\Python\Python37\lib\site-packages\pyarrow\pandas_compat.py", line 575, in <listcomp>
>     for c, f in zip(columns_to_convert, convert_fields)]
>   File "C:\Users\Jamie\AppData\Local\Programs\Python\Python37\lib\site-packages\pyarrow\pandas_compat.py", line 566, in convert_column
>     raise e
>   File "C:\Users\Jamie\AppData\Local\Programs\Python\Python37\lib\site-packages\pyarrow\pandas_compat.py", line 560, in convert_column
>     result = pa.array(col, type=type_, from_pandas=True, safe=safe)
>   File "pyarrow\array.pxi", line 265, in pyarrow.lib.array
>   File "pyarrow\array.pxi", line 80, in pyarrow.lib._ndarray_to_array
>   File "pyarrow\error.pxi", line 108, in pyarrow.lib.check_status
> pyarrow.lib.ArrowTypeError: ('Unknown list item type: struct<attributeName: string, end: int64, key: string, start: int64, tokenOffset: int64, value: string>', 'Conversion failed for column payload with type object')
>
> The column that causes this error has the following type:
>
> payload: list<Span: struct<attributeName: string, start: int32, end: int32, key: string, value: string, tokenOffset: int32>>
>   child 0, Span: struct<attributeName: string, start: int32, end: int32, key: string, value: string, tokenOffset: int32>
>       child 0, attributeName: string
>       child 1, start: int32
>       child 2, end: int32
>       child 3, key: string
>       child 4, value: string
>       child 5, tokenOffset: int32
>
> This column can be successfully converted to a DataFrame, but cannot be
> converted back to an Arrow Table.
>
> Thank you.
>
> Xiaozhen Liu
>
> From: Micah Kornfield
> Sent: Thursday, July 30, 2020 10:56 PM
> To: [email protected]
> Subject: Re: Error with Arrow Table and Pandas DataFrame conversion
>
> Please include pyarrow version as well.
>
> On Thursday, July 30, 2020, Wes McKinney <[email protected]> wrote:
>
> Could you provide more complete details about the error (an example if
> possible and the full error and stacktrace)?
>
> On Thu, Jul 30, 2020 at 4:32 AM Xiaozhen Liu <[email protected]> wrote:
> >
> > Hi everyone,
> >
> > I’m using pyarrow to convert an Arrow Table with a column whose type is
> > List<Struct> to pandas.DataFrame, and this table is passed from Java to
> > Python using Arrow Flight. It seems pyarrow has no problem converting this
> > to a DataFrame, but errors when converting this DataFrame back to Arrow
> > Table. The error I’m getting is ArrowTypeError. The Struct has 5 child
> > types that are either Int or Utf8.
> >
> > Why am I getting this kind of error when forward conversion (Arrow Table ->
> > Pandas Dataframe) is successful? Is this a feature not implemented? And,
> > how can I fix this?
> >
> > Thank you.
> >
> > Best,
> >
> > Xiaozhen Liu
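Until ARROW-9610 is resolved, one possible workaround is to turn the ndarray cells back into plain Python lists before calling Table.from_pandas, so that the conversion goes through the generic Python-sequence path, which should be able to build structs from dicts. The sketch below is an assumption rather than a confirmed fix and has not been checked against 0.17.1; the helper names to_plain_lists and table_from_pandas_plain are hypothetical and not part of pyarrow.

```python
def to_plain_lists(cell):
    """Recursively replace ndarray-like values with plain Python lists.

    Anything exposing tolist() (numpy arrays and numpy scalars do) is
    converted first; lists, tuples and dicts are then walked recursively.
    """
    if cell is None:
        return None
    if hasattr(cell, "tolist"):
        cell = cell.tolist()
    if isinstance(cell, (list, tuple)):
        return [to_plain_lists(item) for item in cell]
    if isinstance(cell, dict):
        return {key: to_plain_lists(value) for key, value in cell.items()}
    return cell


def table_from_pandas_plain(df, columns, schema=None):
    """Hypothetical wrapper: clean the given columns, then convert.

    pyarrow is imported lazily so that to_plain_lists stays dependency-free.
    """
    import pyarrow as pa

    fixed = df.copy()
    for name in columns:
        # Each cell of a list<struct> column arrives as an ndarray of dicts.
        fixed[name] = fixed[name].map(to_plain_lists)
    # Without an explicit schema the integer fields are inferred as int64.
    return pa.Table.from_pandas(fixed, schema=schema)
```

For the table in this thread the call would be something like table_from_pandas_plain(output_dataframe, ["payload"]). Passing the original table's schema as the schema argument is one way to try to keep the int32 fields, since the error message above shows they are otherwise inferred as int64.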
