skambha commented on pull request #29125:
URL: https://github.com/apache/spark/pull/29125#issuecomment-670210823
IIUC, the solutions you mention were also discussed earlier and were not
accepted by you. If you do not want to revert this backport, then I hope you
agree it is critical to fix it so users do not run into this incorrectness
issue. Please feel free to go ahead with the option you prefer.
I have already raised the issues; I will summarize them below and also add them
to the JIRA.
The important issue is that we should not return incorrect results. In general,
it is not good practice to backport a change to a stable branch when it causes
more queries to return incorrect results.
Just to reiterate:
1. This PR, which backports the UnsafeRow fix, causes queries to return
incorrect results on the v2.4.x and v3.0.x lines. The change by itself has
unsafe side effects.
2. It does not matter whether whole-stage codegen is on or off, or whether ANSI
mode is on or off; more queries will return incorrect results.
```
scala> val decStr = "1" + "0" * 19
decStr: String = 10000000000000000000
scala> val d3 = spark.range(0, 1, 1, 1).union(spark.range(0, 11, 1, 1))
d3: org.apache.spark.sql.Dataset[Long] = [id: bigint]
scala> val d5 = d3.select(expr(s"cast('$decStr' as decimal (38, 18)) as d"),lit(1).as("key")).groupBy("key").agg(sum($"d").alias("sumd")).select($"sumd")
d5: org.apache.spark.sql.DataFrame = [sumd: decimal(38,18)]
scala> d5.show(false) <---- INCORRECT RESULTS RETURNED
+---------------------------------------+
|sumd                                   |
+---------------------------------------+
|20000000000000000000.000000000000000000|
+---------------------------------------+
```
3. Incorrect results are very serious, and it is not good for Spark users to
run into them for common operations like sum.
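As a sanity check on the example in item 2, here is a minimal plain-Scala sketch (no Spark session needed) of why the value shown cannot be right. The object name `DecimalOverflowCheck` and its fields are made up for illustration. The union yields 1 + 11 = 12 rows, so the exact sum is 12 * 10^19, which needs 21 integer digits, while decimal(38, 18) has room for only 38 - 18 = 20. The sum therefore overflows, and 2 * 10^19 is not a valid answer. My understanding is that the expected behavior on overflow is null with ANSI off and an error with ANSI on.

```scala
// Plain-Scala check of the arithmetic behind the repro; illustrative only.
object DecimalOverflowCheck {
  // Same literal as in the repro: a 1 followed by 19 zeros, i.e. 10^19.
  val decStr: String = "1" + "0" * 19

  // spark.range(0, 1) contributes 1 row, spark.range(0, 11) contributes 11.
  val rowCount: Int = 1 + 11

  // Exact sum of the column, computed without any precision limit in play.
  val exactSum: BigDecimal = BigDecimal(decStr) * rowCount

  // decimal(38, 18) leaves 38 - 18 = 20 digits before the decimal point.
  val maxIntegerDigits: Int = 38 - 18
  val integerDigits: Int = exactSum.toBigInt.toString.length

  // True when the exact sum cannot be represented as decimal(38, 18).
  val overflows: Boolean = integerDigits > maxIntegerDigits

  def main(args: Array[String]): Unit =
    println(s"integer digits = $integerDigits, limit = $maxIntegerDigits, overflows = $overflows")
}
```

The check only counts integer digits, which is enough here because the input has no fractional part.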