Github user holdenk commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42697800
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ---
@@ -221,4 +221,40 @@ class DataFrameAggregateSuite extends QueryTest with SharedSQLContext {
emptyTableData.agg(sumDistinct('a)),
Row(null))
}
+
+ test("moments") {
+ checkAnswer(
+ testData2.agg(skewness('a)),
+ Row(0.0))
+
+ checkAnswer(
--- End diff --
There are a few options for how to go about this. In spark-testing-base
(some of whose functionality I'd like to eventually merge) I made a
modified version of checkAnswer which also takes a tolerance and does
tolerance-based matching for floating point data (see
https://github.com/holdenk/spark-testing-base/blob/master/src/main/scala/com/holdenkarau/spark/testing/DataFrameSuiteBase.scala
and approxEqualDataFrames + approxEquals for comparing the rows). The other
option, since each of these tests checks just a single value, is to collect
the result back, extract the value, and do a regular assertion with a tolerance.
I think collecting the result and checking the single value is
probably the best plan for this PR, and eventually we could look at a more
general version of checkAnswer.
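
A rough sketch of the "collect and assert with a tolerance" option, kept free
of Spark so it runs on its own. The names ToleranceCheck and approxEqual and
the 1e-10 tolerance are made up for illustration and are not part of Spark or
this PR. In the suite, the computed value would presumably come from something
like testData2.agg(skewness('a)).collect().head.getDouble(0); here a stand-in
value carrying floating-point noise is used instead.

```scala
object ToleranceCheck {
  // Hypothetical helper: true when `actual` lies within `tol` of `expected`.
  def approxEqual(actual: Double, expected: Double, tol: Double): Boolean =
    math.abs(actual - expected) <= tol

  def main(args: Array[String]): Unit = {
    // Stand-in for a collected aggregate: mathematically 0.0, but the
    // floating-point result is a tiny non-zero number.
    val computed = 0.1 + 0.2 - 0.3

    // An exact comparison (what Row(0.0) in checkAnswer amounts to) fails...
    assert(computed != 0.0)
    // ...while a tolerance-based assertion passes.
    assert(approxEqual(computed, 0.0, 1e-10))
  }
}
```

The tolerance would need to be chosen per statistic; an absolute tolerance is
fine for values near zero, but a relative one may suit larger magnitudes
better.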