HeartSaVioR commented on a change in pull request #26935:
URL: https://github.com/apache/spark/pull/26935#discussion_r512328018
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala
##########
@@ -89,16 +116,16 @@ trait StateStore {
def commit(): Long
/**
- * Abort all the updates that have been made to the store. Implementations
should ensure that
- * no more updates (puts, removes) can be after an abort in order to avoid
incorrect usage.
+ * Return an iterator containing all the key-value pairs in the StateStore.
Implementations must
+ * ensure that updates (puts, removes) can be made while iterating over this
iterator.
*/
- def abort(): Unit
+ override def iterator(): Iterator[UnsafeRowPair]
/**
- * Return an iterator containing all the key-value pairs in the StateStore.
Implementations must
- * ensure that updates (puts, removes) can be made while iterating over this
iterator.
+ * Abort all the updates that have been made to the store. Implementations
should ensure that
+ * no more updates (puts, removes) can be after an abort in order to avoid
incorrect usage.
*/
- def iterator(): Iterator[UnsafeRowPair]
+ override def abort(): Unit
Review comment:
The pattern `prepare -> commit or abort -> (abort if commit fails)` is
widely used across Spark, which both `commit` and `abort` have the
responsibility of cleaning up. (`abort` needs to be careful of not impacted by
double cleanup)
In ReadStateStore, the pattern is simplified to `prepare -> abort`, which
`abort` has the responsibility of cleaning up.
I see one case the issue may come up:
- `class A implements ReadStateStore`
- `class B extends A implements StateStore`
- Both A and B have its own resources and needs to be cleaned up.
In this case, correctly cleaning up A's resource from B becomes non-trivial.
B can call `A.abort` in both `B.commit` and `B.abort` to make sure A's resource
is cleaned up but it is B's responsibility to not call A.abort multiple times
as A would think there's no double cleanup as the simplified pattern.
The answer is probably the same as before: we know the problem - we need a
`close` method, but backward compatibility also matters as we should provide
the default implementation of `close` which is very tricky for current
implementations (`close` should rely on resource cleanup in `commit`/`abort`)
and we decided not to.
My feeling is that once the provider has different implementations between
ReadStateStore and StateStore they would have different implementations (no
inheritance between them). If it isn't quite sure and we don't want to open
the possibility, then adding `close` could be probably reconsidered.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]