Kimahriman commented on PR #38853:
URL: https://github.com/apache/spark/pull/38853#issuecomment-1333651886

   
   
   
   
   > Just to give you more context for your previous comment ([#38853 
(comment)](https://github.com/apache/spark/pull/38853#issuecomment-1333073885))...
   > 
   > We have two different set of code path, 1) two physical nodes for one 
stateful operator (streaming aggregation) 2) one physical node for one stateful 
operator (others). For the latter, it only initializes read-write state store, 
and at the task completion, it only calls abort if the task failed to do 
commit. For the former, it initializes read only state store as well, which we 
call abort at the task completion to clean up the resource (NOTE: not to 
rollback as this is read-only. The name is unfortunately due to compatibility 
issue during the addition of new interface. See 
[21413b7](https://github.com/apache/spark/commit/21413b7dd4e19f725b21b92cddfbe73d1b381a05)).
   > 
   > It is safe for read-only state store to call abort() even there is another 
read-write store referring the same, because read-write store would have 
completed to call commit() if the task works correctly and we expect 
(effectively) no-op from calling abort() after commit().
   
   So the abort is there to basically make sure any writes to the read only 
state store get reverted unless a downstream read-write instance has already 
committed changes. I know there are a few different code paths for stateful 
things (sessions, flatmapgroupswithstate, etc), do we know if that abort call 
covers all the cases for https://issues.apache.org/jira/browse/SPARK-38277? Or 
do some of the other code paths not end up using that read-only path at all? Or 
would it be better just do explicitly clear the write batch anyway even if it 
is redundant with the abort call?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to