Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/19438#discussion_r143324966
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala ---
@@ -157,21 +157,21 @@ class DataFrameStatSuite extends QueryTest with SharedSQLContext {
val error_single = 2 * 1000 * epsilon
val error_double = 2 * 2000 * epsilon
- assert(math.abs(single1 - q1 * n) < error_single)
- assert(math.abs(double2 - 2 * q2 * n) < error_double)
- assert(math.abs(s1 - q1 * n) < error_single)
- assert(math.abs(s2 - q2 * n) < error_single)
- assert(math.abs(d1 - 2 * q1 * n) < error_double)
- assert(math.abs(d2 - 2 * q2 * n) < error_double)
+ assert(math.abs(single1 - q1 * n) <= error_single)
--- End diff --
Were these failing?
I think the test is a little off. The input col "singles" is 0-999, not
1-1000. The median, for example, could really be any number between 499 and
500. It might conventionally be defined as 499.5, but given that this is
approximate and chooses an integral rank, 499 and 500 are OK, as are 498 and
501.
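To make the arithmetic concrete, here is a minimal standalone sketch (not the suite's code; epsilon = 0.001 is just an illustrative value) of which candidate medians pass under `<=` versus strict `<`:

```scala
object MedianSlackSketch {
  def main(args: Array[String]): Unit = {
    val n = 1000
    val q1 = 0.5
    val epsilon = 0.001                   // illustrative; the suite defines its own epsilons
    val errorSingle = 2 * 1000 * epsilon  // mirrors error_single in the diff above
    // Candidate medians an approximate, integral-rank algorithm might return
    // for the column 0..999 (conventional median 499.5):
    for (m <- Seq(498, 499, 500, 501)) {
      val diff = math.abs(m - q1 * n)
      println(s"candidate $m: |m - q1*n| = $diff, <= $errorSingle? ${diff <= errorSingle}")
    }
  }
}
```

Note that 498 and 501 sit exactly at the error bound (|498 - 500| = 2.0 = errorSingle), so they pass `<=` but would fail a strict `<`, which is consistent with the loosening in this diff.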
I think loosening the condition like this is OK; it's coherent. It also
strikes me that changing to `Seq.tabulate(n+1)...` above would make the
expected values implied here correct, and thus also fix it.
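For that alternative fix, a hypothetical snippet (again, not the actual suite setup) of why `Seq.tabulate(n + 1)` would line the data up with the expected values:

```scala
val n = 1000
val singles = Seq.tabulate(n + 1)(i => i)  // 0, 1, ..., 1000 (n + 1 elements)
val median = singles(singles.length / 2)   // exact middle of 1001 elements: index 500
assert(median == (0.5 * n).toInt)          // 500 == q1 * n, no half-integer ambiguity
```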
---