OK, I get it! It is indeed better to do it separately (loading the file into a dataset and then applying the filter).
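For anyone finding this thread later, here is a minimal sketch of that two-step approach. The file name 'myparquetFile.parquet' and the column 'my_field' are taken from the messages quoted below; adjust them to your own data.

import pyarrow.dataset as ds

# Step 1: load the Parquet file as a dataset (no data is read yet).
my_dataset = ds.dataset('myparquetFile.parquet', format='parquet')

# Step 2: materialize a table, keeping only rows where 'my_field' is null,
# using the dataset expression filter rather than read_table's tuple filters.
table = my_dataset.to_table(filter=~ds.field('my_field').is_valid())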
Thank you for your answer :)

> On 2 Aug 2021, at 20:37, Weston Pace <[email protected]> wrote:
>
> Hmm, it seems you managed to find a bit of an (I think) unintended use case :).
>
> The docs for pyarrow.parquet.read_table describe the "filters" property as:
>
>> Each tuple has format: (key, op, value) and compares the key with the value. The
>> supported op are: = or ==, !=, <, >, <=, >=, in and not in. If the op is in or not in,
>> the value must be a collection such as a list, a set or a tuple.
>>
>> Examples:
>>
>> ('x', '=', 0)
>> ('y', 'in', ['a', 'b', 'c'])
>> ('z', 'not in', {'a', 'b'})
>
> On the other hand, the filter you describe, "~ds.field('my_field').is_valid()", is one of
> the new pyarrow.dataset expression-based filters.
>
> pyarrow.parquet.read_table has been slowly migrating over to use the new dataset
> scanning (controlled by use_legacy_dataset). It seems in 3.0.0 we must have taken
> whatever filters argument was given and passed it directly as a filter. In 4.0.0 we try
> to take a list of the previously described tuples and convert them to dataset filters.
>
> So the easiest fix is probably to just use the new datasets API directly:
>
> TL;DR:
>
> my_dataset = ds.dataset('myparquetFile.parquet')
> table = my_dataset.to_table(filter=~ds.field('data').is_valid())
>
> On Mon, Aug 2, 2021 at 3:01 AM Fabrice Lefloch <[email protected]> wrote:
>>
>> Hello,
>>
>> Previously, when using pyarrow 3.0.0 and trying to filter null columns on read_table, I was doing it this way:
>> pq.read_table('myparquetFile.parquet', filters=~ds.field("my_field").is_valid())
>> It was working fine, but after upgrading to pyarrow 4.0.0 I am now receiving an error:
>> "ValueError: An Expression cannot be evaluated to python True or False. If you are using the 'and', 'or' or 'not' operators, use '&', '|' or '~' instead."
>> I tried to use is_null() instead of is_valid() but with no luck either.
>>
>> Is there some other way to apply this filter?
>>
>> Thank you.
