[ https://issues.apache.org/jira/browse/ARROW-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney updated ARROW-6923:
--------------------------------
    Fix Version/s: 1.0.0

> [C++] Option for Filter kernel how to handle nulls in the selection vector
> --------------------------------------------------------------------------
>
>                 Key: ARROW-6923
>                 URL: https://issues.apache.org/jira/browse/ARROW-6923
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Joris Van den Bossche
>            Priority: Major
>             Fix For: 1.0.0
>
>
> How nulls are handled in the boolean mask (selection vector) in a filter
> kernel varies between languages / data analytics systems (e.g. base R
> propagates nulls, dplyr R skips (sees as False), SQL generally skips them as
> well I think, Julia raises an error).
> Currently, in Arrow C++ we "propagate" nulls (null in the selection vector
> gives a null in the output):
> {code}
> In [7]: arr = pa.array([1, 2, 3])
> In [8]: mask = pa.array([True, False, None])
> In [9]: arr.filter(mask)
> Out[9]:
> <pyarrow.lib.Int64Array object at 0x7fefe44b3048>
> [
>   1,
>   null
> ]
> {code}
> Given the different ways this could be done (propagate, skip, error), should
> we provide an option to control this behaviour?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
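
For illustration, the three behaviours named in the issue (propagate, skip, error) can be sketched on plain Python lists, with None standing in for null. This is not the Arrow C++ kernel or the pyarrow API: the function name filter_with_null_policy and the null_policy parameter are hypothetical and exist only to make the semantics concrete.

```python
from typing import List, Optional, Sequence

# Hypothetical policy names, taken from the wording of the issue.
_POLICIES = ("propagate", "skip", "error")


def filter_with_null_policy(
    values: Sequence[Optional[int]],
    mask: Sequence[Optional[bool]],
    null_policy: str = "propagate",
) -> List[Optional[int]]:
    """Filter `values` by `mask`, treating a None mask entry per `null_policy`."""
    if null_policy not in _POLICIES:
        raise ValueError(f"unknown null_policy: {null_policy!r}")
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")

    out: List[Optional[int]] = []
    for value, keep in zip(values, mask):
        if keep is None:
            if null_policy == "propagate":
                # A null in the mask yields a null in the output.
                out.append(None)
            elif null_policy == "error":
                raise ValueError("null encountered in selection vector")
            # "skip": treat the null like False and drop the value.
        elif keep:
            out.append(value)
    return out


values = [1, 2, 3]
mask = [True, False, None]
propagated = filter_with_null_policy(values, mask, "propagate")
skipped = filter_with_null_policy(values, mask, "skip")
```

With the inputs from the issue, "propagate" keeps a null slot for the third element and so matches the current output shown above, while "skip" returns only the first element.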