Re: [DISCUSS] Acero's ScanNode and Row Indexing across Scans

2023-06-01 Thread Will Jones
That's a good point, Gang. To perform deletes, we definitely need the row index, so we'll want that regardless of whether it's used in scans. > I'm not sure a mask would be the ideal solution for Iceberg (though it is a reasonable feature in its own right) because I think position-based deletes,

Re: [DISCUSS] Acero's ScanNode and Row Indexing across Scans

2023-06-01 Thread Gang Wu
IMO, adding a row_index column from the reader is orthogonal to the mask implementation. Table formats (e.g. Apache Iceberg and Delta) require knowledge of the row index to finalize row deletion. It would be trivial to natively support row index from the file reader. Best, Gang On Fri, Jun

Re: [DISCUSS] Acero's ScanNode and Row Indexing across Scans

2023-06-01 Thread Weston Pace
I agree that having a row_index is a good approach. I'm not sure a mask would be the ideal solution for Iceberg (though it is a reasonable feature in its own right) because I think position-based deletes, in Iceberg, are still done using an anti-join and not a filter. That being said, we
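
To make the distinction in this message concrete, here is a minimal sketch, in plain Python rather than Acero or Iceberg code, of applying position-based deletes once a reader exposes a row_index column. The names rows, deleted_positions, apply_deletes_anti_join and apply_deletes_mask are hypothetical and introduced only for illustration; the sketch assumes row_index is the row's 0-based position within its file.

```python
# Hypothetical rows as a reader might emit them, each tagged with its
# 0-based position in the file (the "row_index" column under discussion).
rows = [{"row_index": i, "value": v} for i, v in enumerate(["a", "b", "c", "d", "e"])]

# Hypothetical set of deleted positions, as a position-delete file might list them.
deleted_positions = {1, 3}


def apply_deletes_anti_join(rows, deleted_positions):
    """Anti-join: keep rows whose row_index has no match in the delete set.

    Works on any subset or ordering of rows, because the join key
    travels with each row.
    """
    return [r for r in rows if r["row_index"] not in deleted_positions]


def apply_deletes_mask(rows, deleted_positions):
    """Mask: build one boolean per file position, then filter by position.

    Assumes rows holds the whole file in file order, so the list offset
    equals the row's position in the file.
    """
    mask = [i not in deleted_positions for i in range(len(rows))]
    return [r for r, keep in zip(rows, mask) if keep]


surviving = apply_deletes_anti_join(rows, deleted_positions)
```

Both functions give the same result for a full, in-order scan. Only the anti-join form stays correct when rows arrive reordered or already filtered, which is one reason a row_index column is useful independently of any mask support.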

[DISCUSS] [DataFusion]

2023-06-01 Thread Andrew Lamb
I would like to invite anyone with opinions or perspectives from the community to participate in two ongoing discussions about DataFusion and its future. * Move Apache Arrow DataFusion to a new top-level Apache project [1] * Goals / Vision for DataFusion [2] Thank you, Andrew [1]: