zhengruifeng commented on pull request #31540: URL: https://github.com/apache/spark/pull/31540#issuecomment-837702523
> This does not necessarily solve the issue that @zsxwing detailed - the issue here is `registerAccumulator` should not be called in `readObject` before subclasses have completed readObject. > > One possible solution would be to introduce two methods. > > a) A protected method `doHandleDriverSideAccumulator()` in `AccumulatorV2` - which has all the code after `defaultReadObject` in readObject. > b) Call `handleDriverSideAccumulator` after `defaultReadObject` in `AccumulatorV2`. In `AccumulatorV2`, this protected method will simply delegate to `doHandleDriverSideAccumulator`. > c) In subclasses with local state, override `doHandleDriverSideAccumulator` to make it do nothing - and after readObject in subclass is done, invoke `doHandleDriverSideAccumulator` > > This will ensure AccumulatorV2 and subclasses will register only after state has been initialized. > (Rough sketch, please change logic/names/etc as relevant). > > Note, there are other accumulators with local state; we should do this for all. > Thoughts ? +1 I recently impl some accv2 (some complex statistics containing transient lazy vars and using collections like openhashmap/array/etc) in my work, there are lots of NPE which make task probablly fail. I has tried the method like this PR, but it do not help evidently. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org