aokolnychyi commented on PR #36304:
URL: https://github.com/apache/spark/pull/36304#issuecomment-1258618345
I want to resume working on this PR, but I need feedback on one point.
In the original implementation, @cloud-fan and I discussed supporting a
separate scan builder for runtime group filtering in row-level operations. That
way, we can prune columns and push down filters while looking for groups that
have matches. We can't do that in the main row-level scan for group-based data
sources as non-matching records in matching groups have to be copied over. See
PR #35395 for context.
The only challenge is ensuring the same version of the table is scanned in
the main row-level scan and in the scan that searches for matching groups to
rewrite. There are multiple solutions to consider.
**Option 1**
The first option is shown in this PR. We can add a new method to
`RowLevelOperation` that provides a scan builder for runtime group filtering.
```java
interface RowLevelOperation {
  // existing method
  ScanBuilder newScanBuilder(CaseInsensitiveStringMap options);

  // new method
  default ScanBuilder newAffectedGroupsScanBuilder(CaseInsensitiveStringMap options) {
    return newScanBuilder(options);
  }

  ...
}
```
Under this approach, it is up to data sources to ensure the same version is
scanned in both scans. It is a fairly simple option, but it complicates the
row-level API. On top of that, the new method is useless for data sources that
can handle a delta of rows.
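For illustration, here is a rough sketch of how a group-based source might implement the new method so that both scans read the same pinned snapshot. All names here (`GroupBasedOperation`, `SnapshotScan`, the snapshot-pinning fields) are hypothetical stand-ins, and `Map<String, String>` substitutes for `CaseInsensitiveStringMap` to keep the sketch self-contained:

```java
import java.util.Map;

// Simplified stand-ins for the DSv2 interfaces discussed above.
interface Scan {}
interface ScanBuilder { Scan build(); }

interface RowLevelOperation {
  ScanBuilder newScanBuilder(Map<String, String> options);
  default ScanBuilder newAffectedGroupsScanBuilder(Map<String, String> options) {
    return newScanBuilder(options);
  }
}

// Hypothetical group-based source: it pins a snapshot ID when the operation
// is created, so the main scan and the group-filter scan read the same data.
class GroupBasedOperation implements RowLevelOperation {
  private final long pinnedSnapshotId;

  GroupBasedOperation(long currentSnapshotId) {
    this.pinnedSnapshotId = currentSnapshotId;
  }

  @Override
  public ScanBuilder newScanBuilder(Map<String, String> options) {
    // main row-level scan: must keep non-matching rows in matching groups
    return () -> new SnapshotScan(pinnedSnapshotId, /* pruned */ false);
  }

  @Override
  public ScanBuilder newAffectedGroupsScanBuilder(Map<String, String> options) {
    // group-filter scan: free to prune columns and push down filters
    return () -> new SnapshotScan(pinnedSnapshotId, /* pruned */ true);
  }
}

class SnapshotScan implements Scan {
  final long snapshotId;
  final boolean pruned;

  SnapshotScan(long snapshotId, boolean pruned) {
    this.snapshotId = snapshotId;
    this.pruned = pruned;
  }
}
```

The key point is that version pinning happens once, at operation creation time, rather than separately in each scan.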
**Option 2**
The main row-level `Scan` can report the scanned `tableVersion`, and we can use
that information to load the correct table version in the rule that assigns the
runtime filter. This can be done via `TableCatalog$load(ident, version)`. The
only API change is to extend `Scan` with `tableVersion` so we know which table
version is being read in the main scan.
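A rough sketch of what this could look like, using simplified stand-in types (the rule class, the `load(ident, version)` signature, and the `Optional` return type are all assumptions for illustration, not the actual DSv2 API):

```java
import java.util.Optional;

// Simplified stand-ins for the DSv2 interfaces.
interface Table {}

interface Scan {
  // Sources that support versioned reads would override this.
  default Optional<String> tableVersion() { return Optional.empty(); }
}

interface TableCatalog {
  // Stands in for the versioned TableCatalog load mentioned above.
  Table load(String ident, String version);
}

// Hypothetical sketch of the rule that assigns the runtime group filter:
// it reloads the exact version read by the main scan before building the
// group-filter scan on the reloaded table.
class AssignRuntimeGroupFilter {
  Table tableForGroupFilter(TableCatalog catalog, String ident, Scan mainScan) {
    String version = mainScan.tableVersion().orElseThrow(
        () -> new IllegalStateException("main scan must report a table version"));
    return catalog.load(ident, version);
  }
}
```

The two scans are then guaranteed to agree because both are anchored to the version the main scan reports.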
**Option 3**
The rule that assigns a runtime group filter has access to the original
`Table` object. We could just call `newScanBuilder` on it. However, I don't see
anything in the API implying that reusing the `Table` instance guarantees the
same version of the table will be scanned. If we call `newScanBuilder` on the
same `Table` instance, do we expect the same version to be scanned? It seems
like that is NOT the assumption right now.
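To make the concern concrete, here is a hypothetical sketch (all names invented) of a table that resolves "latest" lazily at scan-creation time; reusing such a `Table` instance would not pin the version across the two scans:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical: a source that resolves its snapshot when a scan is created,
// not when the table is loaded. Two scans built from the same Table instance
// can therefore observe different snapshots if a commit happens in between.
class LazyVersionedTable {
  // Simulates the catalog's current snapshot, which can advance between
  // scans (e.g. due to a concurrent commit).
  static final AtomicLong CURRENT_SNAPSHOT = new AtomicLong(1L);

  long newScanSnapshot() {
    // Resolved lazily per scan, so nothing ties the two scans together.
    return CURRENT_SNAPSHOT.get();
  }
}
```

Unless the API spec requires sources to resolve the version eagerly at load time, Option 3 would rely on an unstated assumption.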
If we can somehow benefit from reusing the `Table` object, it will be the
cleanest option from the API perspective.
Any ideas on how to make Option 3 work?
cc @cloud-fan @rdblue @huaxingao @dongjoon-hyun @sunchao @viirya
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]