Hi,
Sorry for not being clear.
Pyarrow version is 0.17.1.
Here is the full stack trace:
Traceback (most recent call last):
  File "tobacco_relevancy_classify.py", line 169, in do_action
    output_data = pyarrow.Table.from_pandas(output_dataframe)
  File "pyarrow\table.pxi", line 1451, in pyarrow.lib.Table.from_pandas
  File "C:\Users\Jamie\AppData\Local\Programs\Python\Python37\lib\site-packages\pyarrow\pandas_compat.py", line 575, in dataframe_to_arrays
    for c, f in zip(columns_to_convert, convert_fields)]
  File "C:\Users\Jamie\AppData\Local\Programs\Python\Python37\lib\site-packages\pyarrow\pandas_compat.py", line 575, in <listcomp>
    for c, f in zip(columns_to_convert, convert_fields)]
  File "C:\Users\Jamie\AppData\Local\Programs\Python\Python37\lib\site-packages\pyarrow\pandas_compat.py", line 566, in convert_column
    raise e
  File "C:\Users\Jamie\AppData\Local\Programs\Python\Python37\lib\site-packages\pyarrow\pandas_compat.py", line 560, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
  File "pyarrow\array.pxi", line 265, in pyarrow.lib.array
  File "pyarrow\array.pxi", line 80, in pyarrow.lib._ndarray_to_array
  File "pyarrow\error.pxi", line 108, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: ('Unknown list item type: struct<attributeName: string, end: int64, key: string, start: int64, tokenOffset: int64, value: string>', 'Conversion failed for column payload with type object')
The column that causes this error has the following type:
payload: list<Span: struct<attributeName: string, start: int32, end: int32, key: string, value: string, tokenOffset: int32>>
  child 0, Span: struct<attributeName: string, start: int32, end: int32, key: string, value: string, tokenOffset: int32>
      child 0, attributeName: string
      child 1, start: int32
      child 2, end: int32
      child 3, key: string
      child 4, value: string
      child 5, tokenOffset: int32
This column can be converted to a pandas DataFrame successfully, but the
resulting DataFrame cannot be converted back to an Arrow Table.
Thank you.
Xiaozhen Liu
From: Micah Kornfield
Sent: Thursday, July 30, 2020 10:56 PM
To: [email protected]
Subject: Re: Error with Arrow Table and Pandas DataFrame conversion
Please include pyarrow version as well.
On Thursday, July 30, 2020, Wes McKinney <[email protected]> wrote:
Could you provide more complete details about the error (an example if
possible and the full error and stacktrace)?
On Thu, Jul 30, 2020 at 4:32 AM Xiaozhen Liu <[email protected]> wrote:
>
> Hi everyone,
>
> I’m using pyarrow to convert an Arrow Table with a column whose type is
> List<Struct> to pandas.DataFrame, and this table is passed from Java to
> Python using Arrow Flight. It seems pyarrow has no problem converting this to
> a DataFrame, but errors when converting this DataFrame back to Arrow Table.
> The error I’m getting is ArrowTypeError. The Struct has 5 child types that
> are either Int or Utf8.
>
> Why am I getting this kind of error when forward conversion (Arrow Table ->
> Pandas Dataframe) is successful? Is this a feature not implemented? And, how
> can I fix this?
>
> Thank you.
>
> Best,
>
> Xiaozhen Liu