Joris Van den Bossche created ARROW-5572:
--------------------------------------------

             Summary: [Python] raise error message when passing invalid filter in parquet reading
                 Key: ARROW-5572
                 URL: https://issues.apache.org/jira/browse/ARROW-5572
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.13.0
            Reporter: Joris Van den Bossche


From https://stackoverflow.com/questions/56522977/using-predicates-to-filter-rows-from-pyarrow-parquet-parquetdataset

For example, when a filter names a normal column rather than a key in the
partitioned folder hierarchy, the filter is silently ignored. It would be
nice to raise an error in this case.
Reproducible example:

{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({'a': [0, 0, 1, 1], 'b': [0, 1, 0, 1], 'c': [1, 2, 3, 4]})
table = pa.Table.from_pandas(df)
pq.write_to_dataset(table, 'test_parquet_row_filters', partition_cols=['a'])
# filter on 'a' (partition column) -> works
pq.read_table('test_parquet_row_filters', filters=[('a', '=', 1)]).to_pandas()
# filter on normal column (in future could do row group filtering) -> silently does nothing
pq.read_table('test_parquet_row_filters', filters=[('b', '=', 1)]).to_pandas()
{code}
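
To illustrate the kind of check this asks for, here is a minimal sketch of a validation step that rejects filters on non-partition columns. The helper name `check_filter_columns` and its signature are hypothetical and not part of pyarrow. It assumes filters are given either as a flat list of `(column, op, value)` tuples or as a list of such lists, and that the caller already knows the partition key names.

```python
def check_filter_columns(filters, partition_names):
    """Raise ValueError if a filter refers to a column that is not a partition key.

    Hypothetical helper for illustration only; not an existing pyarrow function.
    """
    if not filters:
        return
    # Normalize a flat list of tuples into a single group, so both
    # [(col, op, val), ...] and [[(col, op, val), ...], ...] are handled.
    if isinstance(filters[0], tuple):
        groups = [filters]
    else:
        groups = filters
    known = set(partition_names)
    unknown = sorted(
        {col for group in groups for col, _op, _val in group if col not in known}
    )
    if unknown:
        raise ValueError(
            "filter column(s) {} are not partition keys {}; "
            "such filters would be silently ignored".format(unknown, sorted(known))
        )
```

With the example above, `check_filter_columns([('a', '=', 1)], ['a'])` would pass, while `check_filter_columns([('b', '=', 1)], ['a'])` would raise a `ValueError` instead of the read silently returning unfiltered data.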



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
