Re: ParquetDataset Filters Question

Uwe L. Korn Thu, 23 May 2019 04:06:18 -0700

Hello Abe,

I think the problem is that you are mixing two syntaxes. We support either a 
"list of tuples" (a single conjunction, i.e. AND) or a "list of lists of 
tuples" (a disjunction of conjunctions, i.e. OR of ANDs), but not a mix of 
both. Furthermore, the correct DNF for your filter would be 
(A ⋀ B ⋀ C)  ⋁  (A ⋀ B ⋀ D), thus you should use


filters = [
    [("col", ">=", "<A>"), ("col", "<=", "<B>"), ("col", "=", "<C>")],
    [("col", ">=", "<A>"), ("col", "<=", "<B>"), ("col", "=", "<D>")],
]
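
If you have more alternatives than just C and D, the expansion above can be generated instead of typed out. The following is only a minimal sketch: the helper name `to_dnf` is hypothetical (it is not part of pyarrow), and the "<A>" ... "<D>" strings are the same placeholders as above.

```python
def to_dnf(common, alternatives):
    """Expand `common AND (alt_1 OR alt_2 OR ...)` into DNF.

    `common` is a list of (column, op, value) tuples that must all hold.
    `alternatives` is a list of (column, op, value) tuples of which at
    least one must hold. The result is a "list of lists of tuples":
    one inner list (a conjunction) per alternative.
    """
    return [list(common) + [alt] for alt in alternatives]


# Placeholders as in the thread; replace them with real values.
common = [("col", ">=", "<A>"), ("col", "<=", "<B>")]   # A AND B
alternatives = [("col", "=", "<C>"), ("col", "=", "<D>")]  # C OR D

# (A AND B AND C) OR (A AND B AND D)
filters = to_dnf(common, alternatives)
```

The resulting `filters` list can then be passed as `filters=filters` to `pq.ParquetDataset`, in place of the mixed list in your original call.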

Uwe
 

On Wed, May 22, 2019, at 9:12 PM, Wes McKinney wrote:
> hi Abe -- you may have to open a JIRA about documentation improvement
> and/or bug fix for this. I don't know off-hand. Copying the dev@ list
> 
> - Wes
> 
> On Tue, May 21, 2019 at 12:05 PM Abraham Elmahrek <[email protected]> wrote:
> >
> > Folks
> >
> > Does anyone know how to do the following with filters for ParquetDataset 
> > (DNF): A ⋀ B ⋀ (C ⋁ D)?
> >
> > I've tried the following without luck:
> >
> >> dataset = pq.ParquetDataset("<>", filesystem=s3fs.S3FileSystem(), filters=[
> >>     ("col", ">=", "<>"),
> >>     ("col", "<=", "<>"),
> >>     [[("col", "=", "<>")], [("col", "=", "<>")]]
> >> ])
> >
> >
> > Where A = ("col", ">=", "<>"), B = ("col", "<=", "<>"), C = ("col", "=", 
> > "<>"), and D = ("col", "=", "<>").
> >
> > In the above example, I get the following error:
> >>
> >>   File 
> >> "/opt/miniconda/envs/flatiron-cron/lib/python3.6/site-packages/pyarrow-0.13.0-py3.6-linux-x86_64.egg/pyarrow/parquet.py",
> >>  line 961, in __init__
> >>     filters = _check_filters(filters)
> >>   File 
> >> "/opt/miniconda/envs/flatiron-cron/lib/python3.6/site-packages/pyarrow-0.13.0-py3.6-linux-x86_64.egg/pyarrow/parquet.py",
> >>  line 93, in _check_filters
> >>     for col, op, val in conjunction:
> >> ValueError: not enough values to unpack (expected 3, got 2)
> >
> >
> > Abe
>
