Hello Abe,
I think the problem is that you are mixing two syntaxes. We support either
a "list of tuples" or a "list of lists of tuples", but not a combination of
the two. Furthermore, the correct DNF for your filter is
(A ⋀ B ⋀ C) ⋁ (A ⋀ B ⋀ D), so you should use:
filters = [[("col", ">=", "<A>"), ("col", "<=", "<B>"), ("col", "=", "<C>")],
           [("col", ">=", "<A>"), ("col", "<=", "<B>"), ("col", "=", "<D>")]]
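
If you have more alternatives than just C and D, writing the expansion out by hand gets tedious. Below is a minimal sketch of doing the distribution in plain Python. The helper name `to_dnf` is hypothetical and not part of pyarrow, and the "<A>" to "<D>" strings are placeholders for your real values:

```python
def to_dnf(common, alternatives):
    """Distribute AND-ed predicates over OR-ed branches.

    common:       list of (column, op, value) tuples that must all hold (A, B)
    alternatives: list of lists of tuples, one inner list per OR branch (C, D)

    Returns a "list of lists of tuples": each inner list is one conjunction,
    and the outer list is the disjunction of those conjunctions.
    """
    return [list(common) + list(branch) for branch in alternatives]


# A and B apply to every branch; C and D are the alternatives.
common = [("col", ">=", "<A>"), ("col", "<=", "<B>")]
alternatives = [[("col", "=", "<C>")], [("col", "=", "<D>")]]

filters = to_dnf(common, alternatives)
```

The resulting `filters` has the same shape as the expression above and can be passed as the `filters` argument of `pq.ParquetDataset`.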
Uwe
On Wed, May 22, 2019, at 9:12 PM, Wes McKinney wrote:
> hi Abe -- you may have to open a JIRA about documentation improvement
> and/or bug fix for this. I don't know off-hand. Copying the dev@ list
>
> - Wes
>
> On Tue, May 21, 2019 at 12:05 PM Abraham Elmahrek <[email protected]> wrote:
> >
> > Folks
> >
> > Does anyone know how to do the following with filters for ParquetDataset
> > (DNF): A ⋀ B ⋀ (C ⋁ D)?
> >
> > I've tried the following without luck:
> >
> >> dataset = pq.ParquetDataset("<>", filesystem=s3fs.S3FileSystem(), filters=[
> >> ("col", ">=", "<>"),
> >> ("col", "<=", "<>"),
> >> [[("col", "=", "<>")], [("col", "=", "<>")]]
> >> ])
> >
> >
> > Where A = ("col", ">=", "<>"), B = ("col", "<=", "<>"), C = ("col", "=",
> > "<>"), and D = ("col", "=", "<>").
> >
> > In the above example, I get the following error:
> >>
> >> File
> >> "/opt/miniconda/envs/flatiron-cron/lib/python3.6/site-packages/pyarrow-0.13.0-py3.6-linux-x86_64.egg/pyarrow/parquet.py",
> >> line 961, in __init__
> >> filters = _check_filters(filters)
> >> File
> >> "/opt/miniconda/envs/flatiron-cron/lib/python3.6/site-packages/pyarrow-0.13.0-py3.6-linux-x86_64.egg/pyarrow/parquet.py",
> >> line 93, in _check_filters
> >> for col, op, val in conjunction:
> >> ValueError: not enough values to unpack (expected 3, got 2)
> >
> >
> > Abe
>