peter-toth commented on code in PR #42559:
URL: https://github.com/apache/spark/pull/42559#discussion_r1300043130


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -4368,6 +4368,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val LEGACY_PERCENTILE_DISC_CALCULATION = buildConf("spark.sql.legacy.percentileDiscCalculation")
+    .internal()
+    .doc("If true, the old bogus percentile_disc calculation is used. The old calculation " +
+      "incorrectly mapped the requested percentile to the sorted range of values in some cases " +
+      "and so returned incorrect results. Also, the new implementation is faster as it doesn't " +
+      "contain the interpolation logic that the old percentile_cont based one did.")
+    .version("3.3.4")

Review Comment:
   This bug was introduced with the very first version of percentile_disc in 3.3.0 (https://issues.apache.org/jira/browse/SPARK-37691), so 3.3.4 seems to be the earliest still-active release to which we should backport this fix.
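
   For context, here is a minimal, hedged sketch (in Python, not Spark's actual Scala implementation) of the difference the doc string describes: a discrete percentile picks the smallest actual input value whose cumulative distribution reaches the requested percentile, while an interpolation-based (percentile_cont-style) calculation can return a value that is not in the input at all. The function names and the exact formulas below are illustrative assumptions, not Spark code.

   ```python
   import math

   def percentile_disc(values, p):
       # Discrete percentile (illustrative): the smallest sorted value v
       # whose cumulative distribution rank/n is >= p. No interpolation,
       # so the result is always one of the input values.
       assert 0.0 <= p <= 1.0
       s = sorted(values)
       n = len(s)
       for rank, v in enumerate(s, start=1):
           if rank / n >= p:
               return v
       return s[-1]

   def percentile_cont(values, p):
       # Continuous percentile (illustrative): linear interpolation
       # between the two nearest ranks, which can produce a value that
       # never occurred in the input.
       assert 0.0 <= p <= 1.0
       s = sorted(values)
       k = p * (len(s) - 1)
       lo, hi = math.floor(k), math.ceil(k)
       return s[lo] + (s[hi] - s[lo]) * (k - lo)

   # The discrete variant returns an actual input value; the
   # interpolating variant may not.
   print(percentile_disc([10, 20, 30, 40], 0.5))  # 20
   print(percentile_cont([10, 20, 30, 40], 0.5))  # 25.0
   ```
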



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

