skambha commented on pull request #29125: URL: https://github.com/apache/spark/pull/29125#issuecomment-668764003
- The sum operation is very common and heavily used.
- Silently returning incorrect results is serious because a user has no way to know that their query returned incorrect results. Earlier, the user would get an error and could possibly increase the precision and rerun the query. Now they will not even know the results are incorrect unless they verify them manually, which may not even be possible for large data. With this backport we expose more cases that return incorrect results.

The [Spark website](https://spark.apache.org/contributing.html) states: "Note that, **data correctness/data loss bugs are very serious**. Make sure the corresponding bug report JIRA ticket is labeled as correctness or data-loss. If the bug report doesn't get enough attention, please send an email to [email protected], to draw more attentions."

Incorrect results and data correctness issues are very serious.

As already discussed, yes, UnsafeRow has far-reaching impact and unsafe side effects. In my opinion, we should not backport just this change to the v3.0.x and v2.4.x lines, especially in a point release, and expose wrong results to users for a common operation like sum. So my vote is to not have this UnsafeRow-only change in v3.0.x and v2.x.x.

@cloud-fan Regarding your question on backporting the sum change: I think the issue was the streaming backward-compatibility impact, which blocked that change from going in. I am not that familiar with the streaming backward-compatibility implications.
