&res created ARROW-18439:
----------------------------

             Summary: Misleading message when loading parquet data with invalid 
null data
                 Key: ARROW-18439
                 URL: https://issues.apache.org/jira/browse/ARROW-18439
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
    Affects Versions: 10.0.1
            Reporter: &res


I'm saving an Arrow table to Parquet. One column is a list of structs whose 
elements are marked as non-nullable. However, the data isn't valid because 
I've put a null in one of the nested fields. 

When I save this data to parquet and try to load it back I get a very 
misleading message:
{code:java}
 Length spanned by list offsets (2) larger than values array (length 1){code}
I would rather Arrow complained when creating the table or when saving it to 
Parquet.

Here's how to reproduce the issue:
{code:java}
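import io

import pyarrow as pa
import pyarrow.parquet as pq

# Struct type whose only field is declared non-nullable.
struct = pa.struct(
    [
        pa.field("nested_string", pa.string(), nullable=False),
    ]
)

# List column whose struct items are also declared non-nullable.
schema = pa.schema(
    [pa.field("list_column", pa.list_(pa.field("item", struct, nullable=False)))]
)
# The second struct has a null in the non-nullable field, so the data is invalid.
table = pa.table(
    {"list_column": [[{"nested_string": ""}, {"nested_string": None}]]},
    schema=schema,
)
with io.BytesIO() as file:
    pq.write_table(table, file)
    file.seek(0)
    pq.read_table(file)  # Raises pa.ArrowInvalid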
 {code}
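
Until Arrow rejects such data itself, one possible stopgap is to check the Python values before building the table. The following is only a sketch under stated assumptions: {{find_unexpected_nulls}} and the {{spec}} dictionary are hypothetical helpers written for this example, not part of pyarrow, and the spec has to be kept in sync with the schema by hand.

```python
def find_unexpected_nulls(value, spec, path):
    """Return the paths of None values that sit where `spec` forbids nulls.

    `spec` is a hypothetical hand-written description, not a pyarrow object:
      {"nullable": bool}                          for a leaf value
      {"nullable": bool, "item": spec}            for a list
      {"nullable": bool, "fields": {name: spec}}  for a struct
    """
    if value is None:
        return [] if spec["nullable"] else [path]
    problems = []
    if "item" in spec:
        # List: check every element against the item spec.
        for i, element in enumerate(value):
            problems += find_unexpected_nulls(element, spec["item"], f"{path}[{i}]")
    elif "fields" in spec:
        # Struct: a missing key is treated like an explicit None.
        for name, child in spec["fields"].items():
            problems += find_unexpected_nulls(value.get(name), child, f"{path}.{name}")
    return problems


# Hand-written mirror of the schema from the reproduction (an assumption of
# this sketch; it is not derived from the pyarrow schema automatically).
spec = {
    "nullable": True,  # the list column itself keeps pa.field's default
    "item": {
        "nullable": False,
        "fields": {"nested_string": {"nullable": False}},
    },
}

rows = [[{"nested_string": ""}, {"nested_string": None}]]
problems = [
    p
    for i, row in enumerate(rows)
    for p in find_unexpected_nulls(row, spec, f"list_column[{i}]")
]
print(problems)
```

If {{problems}} is non-empty, the caller can raise its own error that names the offending path instead of hitting the list-offsets message at read time.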
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
