Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42889558
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -930,3 +930,327 @@ object HyperLogLogPlusPlus {
)
// scalastyle:on
}
+
+/**
+ * A central moment is the expected value of a specified power of the
deviation of a random
+ * variable from the mean. Central moments are often used to characterize
the shape of a distribution.
+ *
+ * This class implements online, one-pass algorithms for computing the
central moments of a set of
+ * points.
+ *
+ * References:
+ * - Xiangrui Meng. "Simpler Online Updates for Arbitrary-Order Central
Moments."
+ * 2015. http://arxiv.org/abs/1510.04923
+ *
+ * @see [[https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+ * Algorithms for calculating variance (Wikipedia)]]
+ *
+ * @param child the expression to compute central moments of.
+ */
+abstract class CentralMomentAgg(child: Expression) extends
ImperativeAggregate with Serializable {
+
+ /**
+ * The maximum central moment order to be computed.
+ */
+ protected def momentOrder: Int
+
+ /**
+ * Array of sufficient moments needed to compute the aggregate statistic.
+ */
+ protected def sufficientMoments: Array[Int]
--- End diff --
Removed this def and instead pass all moments up to the maximum moment to
the `getStatistic` function.
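
For reference, the kind of online, one-pass update the Scaladoc above describes can be sketched as follows. This is a minimal illustration of a single-pass second-central-moment update (Welford-style), not Spark's actual `CentralMomentAgg` implementation; the object and method names here are hypothetical.

```scala
// Minimal sketch of an online (one-pass) update for the mean and the
// second central moment M2 = sum((x - mean)^2). Not Spark's implementation.
object OnlineMoments {
  // Fold one new observation `x` into the running state (count, mean, m2).
  def update(count: Long, mean: Double, m2: Double, x: Double): (Long, Double, Double) = {
    val n = count + 1
    val delta = x - mean
    val newMean = mean + delta / n
    // Uses the value of delta against both the old and new mean,
    // which is numerically stabler than the naive sum-of-squares form.
    val newM2 = m2 + delta * (x - newMean)
    (n, newMean, newM2)
  }

  def main(args: Array[String]): Unit = {
    val xs = Seq(1.0, 2.0, 3.0, 4.0)
    val (n, mean, m2) = xs.foldLeft((0L, 0.0, 0.0)) {
      case ((c, m, s), x) => update(c, m, s, x)
    }
    // Population variance is m2 / n; here mean = 2.5, variance = 1.25.
    println(s"n=$n mean=$mean variance=${m2 / n}")
  }
}
```

Higher-order moments follow the same pattern, carrying one running sum per order up to the maximum, which is why passing all moments up to the maximum order makes sense.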
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]