gengliangwang commented on code in PR #56121:
URL: https://github.com/apache/spark/pull/56121#discussion_r3364760596


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/transactions/Transaction.java:
##########
@@ -66,6 +67,20 @@ public interface Transaction extends Closeable {
    */
   void abort();
 
+  /**
+   * Attempts to register materialized scans against this transaction's read 
set.
+   * <p>
+   * An example use case is cache reuse. Spark passes the scans of a candidate 
cached subtree
+   * for the transaction's catalog and the connector decides whether to accept 
them.
+   * <p>
+   * The connector must either accept (returning {@code true} after adding any 
relevant scans
+   * to the read set) or refuse (returning {@code false} without modifying the 
read set).
+   *
+   * @param scans the scans to register.

Review Comment:
   Two small doc points on this `@Evolving` API:
   
   1. The accept/refuse decision is all-or-nothing for the whole `scans` array, 
but the doc doesn't say so — "must either accept … or refuse" can read as if it 
were per-scan. Worth one sentence that the return value covers the entire 
batch: the connector accepts reuse of all the passed scans or none, there's no 
partial acceptance — a `false` makes Spark bypass the whole cached subtree.
   
   2. `@param scans the scans to register.` is circular (it restates the method 
name). For a public API, describe what they are:
   ```suggestion
      * @param scans the materialized scans Spark offers for registration 
against this transaction's read set.
   ```
   
   Both are contract-level, so they keep the body generic per @aokolnychyi's 
earlier point about not over-fitting to caching. The caller-specific scope — 
that Spark passes only this one cached subtree's scans, already filtered to the 
transaction's catalog — is better documented at the call site 
(`CacheManager.validateCachedEntryForTransaction`) than baked into this 
contract.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to