HeartSaVioR commented on PR #38853: URL: https://github.com/apache/spark/pull/38853#issuecomment-1333166532
Just to give you more context for your previous comment (https://github.com/apache/spark/pull/38853#issuecomment-1333073885)... We have two different set of code path, 1) two physical nodes for one stateful operator (streaming aggregation) 2) one physical node for one stateful operator (others). For the latter, it only initializes read-write state store, and at the task completion, it only calls abort if the task failed to do commit. For the former, it initializes read only state store as well, which we call abort at the task completion to clean up the resource (NOTE: not to rollback as this is read-only. The name is unfortunately due to compatibility issue during the addition of new interface. See https://github.com/apache/spark/commit/21413b7dd4e19f725b21b92cddfbe73d1b381a05). It is safe for read-only state store to call abort() even there is another read-write store referring the same, because read-write store would have completed to call commit() if the task works correctly and we expect (effectively) no-op from calling abort() after commit(). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
