Joris Van den Bossche created ARROW-9078:
--------------------------------------------

             Summary: [C++] Parquet writing of extension type with nested storage type fails
                 Key: ARROW-9078
                 URL: https://issues.apache.org/jira/browse/ARROW-9078
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Joris Van den Bossche


A reproducer in Python:

{code:python}
import pyarrow as pa
import pyarrow.parquet as pq

class MyStructType(pa.PyExtensionType):

    def __init__(self):
        pa.PyExtensionType.__init__(self, pa.struct([('left', pa.int64()), ('right', pa.int64())]))

    def __reduce__(self):
        return MyStructType, ()

struct_array = pa.StructArray.from_arrays(
    [
        pa.array([0, 1], type="int64", from_pandas=True),
        pa.array([1, 2], type="int64", from_pandas=True),
    ],
    names=["left", "right"],
)

# works
table = pa.table({'a': struct_array})
pq.write_table(table, "test_struct.parquet")

# doesn't work
mystruct_array = pa.ExtensionArray.from_storage(MyStructType(), struct_array)
table = pa.table({'a': mystruct_array})
pq.write_table(table, "test_struct.parquet")
{code}

Writing the simple StructArray nowadays works (and reading it back in as well).

But when the struct array is the storage array of an ExtensionType, it fails with the following error:

{code}
ArrowException: Unknown error: data type leaf_count != builder_leaf_count1 2
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)