aokolnychyi commented on PR #36304:
URL: https://github.com/apache/spark/pull/36304#issuecomment-1258618345

   I want to resume working on this PR but I need feedback on one point.
   
   In the original implementation, @cloud-fan and I discussed supporting a separate scan builder for runtime group filtering in row-level operations. That way, we can prune columns and push down filters while looking for groups that have matches. We can't do that in the main row-level scan for group-based data sources because non-matching records in matching groups have to be copied over. See PR #35395 for context.
   
   The only challenge is ensuring the same version of the table is scanned in 
the main row-level scan and in the scan that searches for matching groups to 
rewrite. There are multiple solutions to consider.
   
   **Option 1**
   
   The first option is shown in this PR. We can add a new method to `RowLevelOperation` that provides a scan builder for runtime group filtering.
   
   ```java
   interface RowLevelOperation {
     // existing method
     ScanBuilder newScanBuilder(CaseInsensitiveStringMap options);
   
     // new method
     default ScanBuilder newAffectedGroupsScanBuilder(CaseInsensitiveStringMap options) {
       return newScanBuilder(options);
     }
   
     ...
   }
   ```
   
   Under this approach, it is up to data sources to ensure the same version is scanned in both scans. It is a fairly simple approach, but it complicates the row-level API. On top of that, the new method is useless for data sources that can handle a delta of rows.
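   
   For illustration, here is a minimal sketch of how a connector could satisfy that contract by pinning a snapshot when the operation is created. `PinnedRowLevelOperation`, `SnapshotScanBuilder`, and `snapshotId` are hypothetical names, not part of any existing connector (imports and remaining `RowLevelOperation` methods omitted):
   
   ```java
   // Hypothetical connector sketch: pin one snapshot for both scans.
   class PinnedRowLevelOperation implements RowLevelOperation {
     private final long snapshotId;
   
     PinnedRowLevelOperation(long snapshotId) {
       // capture the table version once, when the operation is created
       this.snapshotId = snapshotId;
     }
   
     @Override
     public ScanBuilder newScanBuilder(CaseInsensitiveStringMap options) {
       // the main row-level scan reads the pinned snapshot
       return new SnapshotScanBuilder(snapshotId, options);
     }
   
     @Override
     public ScanBuilder newAffectedGroupsScanBuilder(CaseInsensitiveStringMap options) {
       // the group-filtering scan reads the same snapshot but is free
       // to prune columns and push down filters
       return new SnapshotScanBuilder(snapshotId, options);
     }
   
     // command(), newWriteBuilder(), etc. omitted
   }
   ```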
   
   **Option 2**
   
   The main row-level `Scan` can report the scanned `tableVersion`, and we can use that information to load the correct table version in the rule that assigns the runtime filter. This can be done via `TableCatalog.loadTable(ident, version)`. The only API change is extending `Scan` with `tableVersion` so we know which table version is being read in the main scan.
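   
   To make this concrete, here is a rough sketch of the API change and the rule-side lookup. The `tableVersion` default and the rule snippet are illustrative, assuming the time-travel `loadTable(ident, version)` overload on `TableCatalog`:
   
   ```java
   interface Scan {
     // new method: the table version this scan is pinned to,
     // or null if the data source does not report one
     default String tableVersion() {
       return null;
     }
   
     ...
   }
   
   // in the rule that assigns the runtime group filter (illustrative):
   String version = mainRowLevelScan.tableVersion();
   Table table = version != null
       ? catalog.loadTable(ident, version)  // time-travel load
       : catalog.loadTable(ident);
   ScanBuilder builder = ((SupportsRead) table).newScanBuilder(options);
   ```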
   
   **Option 3**
   
   The rule that assigns a runtime group filter has access to the original `Table` object. We could just call `newScanBuilder` on it. However, I don't see anything in the API implying that reusing the `Table` instance guarantees the same version of the table will be scanned. If we call `newScanBuilder` twice on the same `Table` instance, do we expect both scans to see the same version? It seems that is NOT the assumption right now.
   
   If we can somehow rely on reusing the `Table` object, it would be the cleanest option from the API perspective.
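   
   For reference, this is roughly the guarantee Option 3 would require from connectors: pinning the version when the table is loaded. `SnapshotTable` and its fields are hypothetical, and nothing in the current API mandates this behavior:
   
   ```java
   // Hypothetical: a Table that pins its version at load time, which is
   // the guarantee Option 3 depends on.
   class SnapshotTable implements Table, SupportsRead {
     private final long snapshotId;
   
     SnapshotTable(long snapshotId) {
       // version captured once, when the catalog loads the table
       this.snapshotId = snapshotId;
     }
   
     @Override
     public ScanBuilder newScanBuilder(CaseInsensitiveStringMap options) {
       // every scan built from this instance reads the same snapshot
       return new SnapshotScanBuilder(snapshotId, options);
     }
   
     // name(), schema(), capabilities(), etc. omitted
   }
   ```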
   
   Any ideas on how to make Option 3 work?
   
   cc @cloud-fan @rdblue @huaxingao @dongjoon-hyun @sunchao @viirya
   
   

