aokolnychyi commented on code in PR #36304:
URL: https://github.com/apache/spark/pull/36304#discussion_r984982201
##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/RowLevelOperation.java:
##########
@@ -69,6 +70,33 @@ default String description() {
*/
ScanBuilder newScanBuilder(CaseInsensitiveStringMap options);
+ /**
+ * Returns a {@link ScanBuilder} to configure a {@link Scan} for runtime
filtering of groups
+ * that are affected by this row-level operation.
+ * <p>
+ * Data sources that replace groups of data (e.g. files, partitions) may
exclude entire groups
+ * using provided data source filters when building the primary scan for
this row-level operation.
+ * However, such data skipping is limited as not all expressions can be
converted into data source
+ * filters and some can only be evaluated by Spark (e.g. subqueries). Since
rewriting groups is
+ * expensive, Spark allows group-based data sources to filter groups at
runtime. The runtime
+ * filtering enables data sources to narrow down the scope of rewriting to
only groups that must
+ * be rewritten. If the primary scan implements {@link
SupportsRuntimeFiltering}, Spark will
+ * dynamically execute a query to find which records match the condition.
The information about
+ * these records will be then passed back into the primary scan, allowing
data sources to discard
+ * groups that don't have to be rewritten.
+ * <p>
+ * This method allows data sources to provide a dedicated scan builder for
group filtering.
+ * Scans built for runtime group filtering are not required to produce all
rows in a group
+ * if any are returned. Instead, they can push filters into files (file
granularity) or
+ * prune files within partitions (partition granularity).
+ * <p>
+ * Data sources that rely on multi-version concurrency control must ensure
the same version of
+ * the table is scanned in both scans.
+ */
+ default ScanBuilder newAffectedGroupsScanBuilder(CaseInsensitiveStringMap
options) {
Review Comment:
Resolving as it no longer applies.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]