HeartSaVioR commented on a change in pull request #26935:
URL: https://github.com/apache/spark/pull/26935#discussion_r512328018



##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala
##########
@@ -89,16 +116,16 @@ trait StateStore {
   def commit(): Long
 
   /**
-   * Abort all the updates that have been made to the store. Implementations 
should ensure that
-   * no more updates (puts, removes) can be after an abort in order to avoid 
incorrect usage.
+   * Return an iterator containing all the key-value pairs in the StateStore. 
Implementations must
+   * ensure that updates (puts, removes) can be made while iterating over this 
iterator.
    */
-  def abort(): Unit
+  override def iterator(): Iterator[UnsafeRowPair]
 
   /**
-   * Return an iterator containing all the key-value pairs in the StateStore. 
Implementations must
-   * ensure that updates (puts, removes) can be made while iterating over this 
iterator.
+   * Abort all the updates that have been made to the store. Implementations 
should ensure that
+   * no more updates (puts, removes) can be after an abort in order to avoid 
incorrect usage.
    */
-  def iterator(): Iterator[UnsafeRowPair]
+  override def abort(): Unit

Review comment:
       The pattern `prepare -> commit or abort -> (abort if commit fails)` is 
widely used across Spark, which both `commit` and `abort` have the 
responsibility of cleaning up. (`abort` needs to be careful of not impacted by 
double cleanup)
   
   In ReadStateStore, the pattern is simplified to `prepare -> abort`, which 
`abort` has the responsibility of cleaning up.
   
   I see one case the issue may come up:
   
   - `class A implements ReadStateStore`
   - `class B extends A implements StateStore`
   - Both A and B have its own resources and needs to be cleaned up.
   
   In this case, correctly cleaning up A's resource from B becomes non-trivial. 
B can call `A.abort` in both `B.commit` and `B.abort` to make sure A's resource 
is cleaned up but it is B's responsibility to not call A.abort multiple times 
as A would think there's no double cleanup as the simplified pattern.
   
   The answer is probably the same as before: we know the problem - we need a 
`close` method, but backward compatibility also matters as we should provide 
the default implementation of `close` which is very tricky for current 
implementations (`close` should rely on resource cleanup in `commit`/`abort`) 
and we decided not to.
   
   My feeling is that once the provider has different implementations between 
ReadStateStore and StateStore they would have different implementations (no 
inheritance between them). If it isn't  quite sure and we don't want to open 
the possibility, then adding `close` could be probably reconsidered.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to