[ https://issues.apache.org/jira/browse/ARROW-12114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ben Kietzman closed ARROW-12114.
--------------------------------
    Resolution: Not A Problem

> [C++] Dataset to table filter expression API change
> ---------------------------------------------------
>
>                 Key: ARROW-12114
>                 URL: https://issues.apache.org/jira/browse/ARROW-12114
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Diana Clarke
>            Assignee: Ben Kietzman
>            Priority: Major
>
> Ben:
> Can you please confirm that we're aware and okay with the following API
> change? Thanks!
> {code}
> import pyarrow.dataset
> path_prefix = "ursa-labs-taxi-data-repartitioned-10k/"
> paths = [
>     f"ursa-labs-taxi-data-repartitioned-10k/{year}/{month:02}/{part:04}/data.parquet"
>     for year in range(2009, 2020)
>     for month in range(1, 13)
>     for part in range(101)
>     if not (year == 2019 and month > 6)  # Data ends in 2019/06
>     and not (year == 2010 and month == 3)  # Data is missing in 2010/03
> ]
> partitioning = pyarrow.dataset.DirectoryPartitioning.discover(
>     field_names=["year", "month", "part"],
>     infer_dictionary=True,
> )
> s3 = pyarrow.fs.S3FileSystem(region="us-east-2")
> dataset = pyarrow.dataset.dataset(
>     paths,
>     format="parquet",
>     filesystem=s3,
>     partitioning=partitioning,
>     partition_base_dir=path_prefix,
> )
> year = pyarrow.dataset.field("year")
> month = pyarrow.dataset.field("month")
> part = pyarrow.dataset.field("part")
> filter_expr = (year == "2011") & (month == 1) & (part == 2)
> dataset.to_table(filter=filter_expr)
> {code}
> In arrow 3.0, the above code executes without error.
> On head[1], {{year == "2011"}}, which should be {{year == 2011}} (no quotes),
> raises the following exception.
> {code}
> pyarrow.lib.ArrowNotImplementedError: Function equal has no kernel matching
> input types (array[int32], scalar[string])
> {code}
> This API change appears to have been introduced in ARROW-8919. Perhaps it was
> intentional, just figured we should double check. Thanks again!
>
> [1] {{51c97799b8302466b9dfbb657dc23fd3f0cd8e61}}