aokolnychyi commented on code in PR #36304: URL: https://github.com/apache/spark/pull/36304#discussion_r855452395
########## sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/RowLevelOperation.java:
##########
@@ -69,6 +70,33 @@ default String description() {
    */
   ScanBuilder newScanBuilder(CaseInsensitiveStringMap options);
 
+  /**
+   * Returns a {@link ScanBuilder} to configure a {@link Scan} for runtime filtering of groups
+   * that are affected by this row-level operation.
+   * <p>
+   * Data sources that replace groups of data (e.g. files, partitions) may exclude entire groups
+   * using provided data source filters when building the primary scan for this row-level operation.
+   * However, such data skipping is limited as not all expressions can be converted into data source
+   * filters and some can only be evaluated by Spark (e.g. subqueries). Since rewriting groups is
+   * expensive, Spark allows group-based data sources to filter groups at runtime. The runtime
+   * filtering enables data sources to narrow down the scope of rewriting to only groups that must
+   * be rewritten. If the primary scan implements {@link SupportsRuntimeFiltering}, Spark will
+   * dynamically execute a query to find which records match the condition. The information about
+   * these records will then be passed back into the primary scan, allowing data sources to discard
+   * groups that don't have to be rewritten.
+   * <p>
+   * This method allows data sources to provide a dedicated scan builder for group filtering.
+   * Scans built for runtime group filtering are not required to produce all rows in a group
+   * if any are returned. Instead, they can push filters into files (file granularity) or
+   * prune files within partitions (partition granularity).
+   * <p>
+   * Data sources that rely on multi-version concurrency control must ensure the same version of
+   * the table is scanned in both scans.
+   */
+  default ScanBuilder newAffectedGroupsScanBuilder(CaseInsensitiveStringMap options) {

Review Comment:
   I am not entirely happy with the API (naming ideas are welcome), but let me explain my current thoughts.
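To make the Javadoc above concrete, here is a minimal, self-contained sketch (plain Java, with hypothetical names; the real API goes through Spark's `ScanBuilder` and `SupportsRuntimeFiltering`) of the idea: the primary scan starts with every group that survived static filter pushdown, a runtime query reports which groups actually contain matching rows, and only those groups are rewritten.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical model of group-based runtime filtering: candidate groups
// (e.g. files) that passed static data-source filters are intersected with
// the groups reported by a runtime filter query, so unaffected groups are
// never rewritten.
public class GroupFilterSketch {

    // Groups (files) that survived static data-source filter pushdown.
    static final List<String> candidateGroups =
        List.of("file-1", "file-2", "file-3", "file-4");

    // Stand-in for the runtime filter query Spark executes against the
    // (cheaper) group-filtering scan: groups with at least one matching row.
    static Set<String> runtimeMatchingGroups() {
        return Set.of("file-2", "file-4");
    }

    // The primary scan keeps only groups reported by the runtime filter.
    static List<String> groupsToRewrite() {
        Set<String> matching = runtimeMatchingGroups();
        return candidateGroups.stream()
            .filter(matching::contains)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(groupsToRewrite()); // [file-2, file-4]
    }
}
```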
   I like using runtime filtering to filter out groups, as it allows us to benefit from the existing runtime filtering framework and get features like reuse of subqueries for free. The runtime filter is also part of the same plan shown in the Spark UI, so users won't have to guess which Spark jobs belong to the DELETE operation. We also don't have to execute any queries in the optimizer, which keeps EXPLAIN fast and descriptive. The rule below shows when a runtime filter is injected. It is fairly straightforward.
   
   Instead of exposing this method, I considered scanning `Table`, which we can access via `RowLevelOperationTable`. However, we need to ensure the same version/snapshot of the table is scanned in both the primary and filtering scans. Just reusing the same `Table` instance in both queries does not seem to guarantee the same version of the table will be scanned.
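The version/snapshot concern above can be illustrated with a small, self-contained sketch (hypothetical classes, not Spark's API): if the row-level operation pins the table version once at creation time, both the primary scan and the group-filtering scan read that exact version, so a concurrent commit cannot make the two scans see different data.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of snapshot pinning for MVCC data sources: the
// operation captures the table version up front and hands it to both scans,
// rather than letting each scan resolve the latest version independently.
public class SnapshotPinningSketch {

    static class VersionedTable {
        private final AtomicLong version = new AtomicLong(41);

        long commit() { return version.incrementAndGet(); }
        long currentVersion() { return version.get(); }
    }

    static class RowLevelOp {
        final long pinnedVersion;

        RowLevelOp(VersionedTable table) {
            // Capture the version once, when the operation is created.
            this.pinnedVersion = table.currentVersion();
        }

        long primaryScanVersion() { return pinnedVersion; }
        long groupFilterScanVersion() { return pinnedVersion; }
    }

    public static void main(String[] args) {
        VersionedTable table = new VersionedTable();
        RowLevelOp op = new RowLevelOp(table);
        table.commit(); // a concurrent writer moves the table forward
        // Both scans still read the pinned version, not the latest one.
        System.out.println(op.primaryScanVersion() == op.groupFilterScanVersion());
        System.out.println(op.primaryScanVersion() == table.currentVersion());
    }
}
```

This is the property that simply sharing one `Table` instance does not guarantee: each scan built from it could still resolve "latest version" at a different time.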