zhengruifeng commented on pull request #31540:
URL: https://github.com/apache/spark/pull/31540#issuecomment-837702523


   > This does not necessarily solve the issue that @zsxwing detailed - the 
issue here is `registerAccumulator` should not be called in `readObject` before 
subclasses have completed readObject.
   > 
   > One possible solution would be to introduce two methods.
   > 
   > a) A protected method `doHandleDriverSideAccumulator()` in `AccumulatorV2` 
- which has all the code after `defaultReadObject` in readObject.
   > b) Call `handleDriverSideAccumulator` after `defaultReadObject` in 
`AccumulatorV2`. In `AccumulatorV2`, this protected method will simply delegate 
to `doHandleDriverSideAccumulator`.
   > c) In subclasses with local state, override 
`doHandleDriverSideAccumulator` to make it do nothing - and after readObject in 
subclass is done, invoke `doHandleDriverSideAccumulator`
   > 
   > This will ensure AccumulatorV2 and subclasses will register only after 
state has been initialized.
   > (Rough sketch, please change logic/names/etc as relevant).
   > 
   > Note, there are other accumulators with local state; we should do this for 
all.
   > Thoughts ?
   
   +1
   
   I recently impl some accv2 (some complex statistics containing transient 
lazy vars and using collections like openhashmap/array/etc) in my work, there 
are lots of NPE which make task probablly fail. I has tried the method like 
this PR, but it do not help evidently.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to