skambha commented on pull request #29125:
URL: https://github.com/apache/spark/pull/29125#issuecomment-670210823


   IIUC, the solutions you mention were also discussed earlier, and you did not 
accept them. If you do not want to revert this backport, then I hope you 
agree it is critical to fix it so users do not run into this incorrectness 
issue. Please feel free to go ahead with the option you prefer.
   
   I have expressed the issues and will summarize them below and also put them 
in the JIRA.
   
   The important issue is that we should not return incorrect results. In 
general, it is not good practice to backport a change to a stable branch when 
it causes more queries to return incorrect results.
   
   Just to reiterate:
   
   1. This PR, which backports the UnsafeRow fix, causes queries to return 
incorrect results. This applies to the v2.4.x and v3.0.x lines. On its own, the 
change has unsafe side effects and leads to incorrect results being returned.
  
   2. It does not matter whether whole-stage codegen is on or off, or whether 
ANSI mode is on or off; more queries return incorrect results.
   ``` 
   
   scala> val decStr = "1" + "0" * 19
   decStr: String = 10000000000000000000
   
   scala> val d3 = spark.range(0, 1, 1, 1).union(spark.range(0, 11, 1, 1))
   d3: org.apache.spark.sql.Dataset[Long] = [id: bigint]
   
   scala> val d5 = d3.select(expr(s"cast('$decStr' as decimal (38, 18)) as d"), lit(1).as("key")).groupBy("key").agg(sum($"d").alias("sumd")).select($"sumd")
   d5: org.apache.spark.sql.DataFrame = [sumd: decimal(38,18)]
   
   scala> d5.show(false)   <----  INCORRECT RESULTS RETURNED
   +---------------------------------------+
   |sumd                                   |
   +---------------------------------------+
   |20000000000000000000.000000000000000000|
   +---------------------------------------+
   
   ```
   3. Returning incorrect results is very serious, and it is not good for Spark 
users to run into it for common operations like sum.
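
   To make it explicit why the value in the repro above is wrong, here is a 
minimal plain-Scala sketch (no Spark required) that redoes the arithmetic by 
hand. The names `value`, `rowCount`, `trueSum`, `limit`, and `returned` are 
illustrative only. It assumes, as I read the repro, that `d3` has 1 + 11 = 12 
rows and that decimal(38, 18) leaves 38 - 18 = 20 digits for the integer part.

   ```scala
   // Plain Scala, no Spark needed: check the expected sum by hand.
   val value    = BigDecimal("1" + "0" * 19)   // 10^19, the value cast in the repro
   val rowCount = 1 + 11                        // range(0, 1) union range(0, 11)
   val trueSum  = value * rowCount              // 1.2 * 10^20

   // decimal(38, 18) keeps 38 - 18 = 20 integer digits,
   // so representable values must stay below 10^20.
   val limit    = BigDecimal(10).pow(20)
   val overflow = trueSum >= limit              // the true sum does not fit

   // The value shown in the show() output of the repro.
   val returned = BigDecimal("20000000000000000000")
   val diff     = trueSum - returned            // how far off the returned value is
   ```

   Under these assumptions the true sum overflows decimal(38, 18), and the 
returned value differs from it by exactly 10^20, so the query should presumably 
signal the overflow (null or an error, depending on ANSI mode) rather than 
return a number.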
      

