Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/21481 Hey @kiszk, thanks for tracking this down. This change looks good to me. I have a couple of questions, mostly aimed towards figuring out how we can categorically solve this problem: 1. What's the impact of this issue? Have you observed actual crashes or silent data corruption caused by this problem? I ask because it looks like this could be a plausible cause of a data corruption bug that I've been investigating. 2. Are these five files the only occurrence of this problem or are there potentially others? I've noticed that all of the files modified here are `.java` files: is that a coincidence or is that the result of running some Java linting tool? If the latter, is it possible that we have similar issues in other files which we haven't found yet? 3. Do you have any ideas for how we can categorically prevent this class of problem in the future? Are there compiler plugins or linters which can warn on these cases and turn them into compile-time errors? Or code-review checklists that we can employ to more easily spot these potential overflows? This is a hard problem, but I think it's worth brainstorming on categorical solutions since this isn't the first time we've hit this class of problem (but hopefully it can be the last!). I think we should definitely backport this fix, at least to 2.3 and possibly earlier.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org