Github user JoshRosen commented on the issue:

    https://github.com/apache/spark/pull/21481
  
    Hey @kiszk, thanks for tracking this down. This change looks good to me.
    
    I have a couple of questions, mostly aimed towards figuring out how we can 
categorically solve this problem:
    
    1. What's the impact of this issue? Have you observed actual crashes or 
silent data corruption caused by this problem? I ask because it looks like this 
could be a plausible cause of a data corruption bug that I've been 
investigating.
    2. Are these five files the only occurrence of this problem or are there 
potentially others? I've noticed that all of the files modified here are 
`.java` files: is that a coincidence or is that the result of running some Java 
linting tool? If the latter, is it possible that we have similar issues in 
other files which we haven't found yet?
    3. Do you have any ideas for how we can categorically prevent this class of 
problem in the future? Are there compiler plugins or linters which can warn on 
these cases and turn them into compile-time errors? Or code-review checklists 
that we can employ to more easily spot these potential overflows? This is a 
hard problem, but I think it's worth brainstorming on categorical solutions 
since this isn't the first time we've hit this class of problem (but hopefully 
it can be the last!).
    
    I think we should definitely backport this fix, at least to 2.3 and 
possibly earlier.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to