That's a good point, Gang. To perform deletes, we definitely need the row
index, so we'll want that regardless of whether it's used in scans.
> I'm not sure a mask would be the ideal solution for Iceberg (though it is
> a reasonable feature in its own right) because I think position-based
> deletes,
IMO, adding a row_index column from the reader is orthogonal to
the mask implementation. Table formats (e.g. Apache Iceberg and
Delta) need to know the row index to finalize row deletion. It
would be trivial to natively support the row index in the file reader.
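
To illustrate why this should be cheap on the reader side, here is a minimal
sketch in plain Python. It is not based on any existing Arrow or Parquet API:
the function name read_batches_with_row_index and the idea that a batch is a
plain list of rows are assumptions made purely for illustration. The reader
only has to carry a running offset across batches.

```python
from typing import Any, Iterable, Iterator, List, Tuple


def read_batches_with_row_index(
    batches: Iterable[List[Any]],
) -> Iterator[List[Tuple[int, Any]]]:
    """Pair every row with its position in the file (hypothetical helper)."""
    offset = 0  # number of rows emitted by earlier batches
    for batch in batches:
        # The file-level row index is the running offset plus the
        # position of the row inside the current batch.
        yield [(offset + i, row) for i, row in enumerate(batch)]
        offset += len(batch)


# Three batches read in file order; the indices keep counting across batches.
batches = [["a", "b"], ["c"], ["d", "e"]]
indexed = list(read_batches_with_row_index(batches))
```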
Best,
Gang
On Fri, Jun
I agree that having a row_index is a good approach. I'm not sure a mask
would be the ideal solution for Iceberg (though it is a reasonable feature
in its own right) because I think position-based deletes, in Iceberg, are
still done using an anti-join and not a filter.
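
To make the anti-join vs. filter distinction concrete, here is a minimal
sketch in plain Python. It is not Iceberg's actual implementation: it assumes
rows already carry a row index and that a delete file reduces to a collection
of positions, and both function names are hypothetical.

```python
def apply_position_deletes(indexed_rows, deleted_positions):
    """Anti-join: keep rows whose row index has no match among the deletes."""
    deleted = set(deleted_positions)
    return [(idx, row) for idx, row in indexed_rows if idx not in deleted]


def mask_from_deletes(num_rows, deleted_positions):
    """Mask/filter alternative: one boolean per row, True means keep."""
    deleted = set(deleted_positions)
    return [i not in deleted for i in range(num_rows)]


rows = [(0, "a"), (1, "b"), (2, "c"), (3, "d")]
kept = apply_position_deletes(rows, [1, 3])
mask = mask_from_deletes(len(rows), [1, 3])
```

Either way the row index is the join key, which is why it is needed whether
or not the reader also supports a mask.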
That being said, we
I would like to invite anyone with opinions or perspectives from the
community to participate in two ongoing discussions about DataFusion and
its future.
* Move Apache Arrow DataFusion to a new top-level Apache project [1]
* Goals / Vision for DataFusion [2]
Thank you,
Andrew
[1]: