GitHub user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/19156#discussion_r137603986
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -109,31 +108,47 @@ object Summarizer extends Logging {
}
@Since("2.3.0")
- def mean(col: Column): Column = getSingleMetric(col, "mean")
+ def mean(col: Column, weightCol: Column = lit(1.0)): Column = {
--- End diff --
I am not a fan of default parameters; they tend to cause issues with binary
compatibility. Unless you have a good reason to keep the default, you should
have two different functions:
```scala
def mean(col: Column): Column = mean(col, lit(1.0))
def mean(col: Column, weightCol: Column): Column = ...
```
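
To illustrate the concern, here is a minimal sketch of what the two options look like at the JVM level. It is not the Summarizer code: `Column` is a stand-in case class so the example runs without Spark, and `WithDefault`, `WithOverloads`, and `BinaryCompatDemo` are hypothetical names. It uses reflection to list the arities of the methods named `mean` that each variant actually exposes.

```scala
// Stand-in for org.apache.spark.sql.Column (assumption: lets the sketch run without Spark).
final case class Column(expr: String)

// Variant A: a single method with a default argument.
// The compiler emits mean(Column, Column) plus a synthetic getter for the default.
object WithDefault {
  def mean(col: Column, weightCol: Column = Column("1.0")): Column =
    Column(s"wmean(${col.expr}, ${weightCol.expr})")
}

// Variant B: two explicit overloads, as suggested above.
object WithOverloads {
  def mean(col: Column): Column = mean(col, Column("1.0"))
  def mean(col: Column, weightCol: Column): Column =
    Column(s"wmean(${col.expr}, ${weightCol.expr})")
}

object BinaryCompatDemo {
  // Parameter counts of every public JVM method named exactly `mean` on the object.
  def meanArities(obj: AnyRef): List[Int] =
    obj.getClass.getMethods.toList
      .filter(_.getName == "mean")
      .map(_.getParameterCount)
      .sorted

  def main(args: Array[String]): Unit = {
    println(s"default parameter: ${meanArities(WithDefault)}")
    println(s"two overloads:     ${meanArities(WithOverloads)}")
  }
}
```

With the default parameter, the one-argument `mean(Column)` no longer exists in the bytecode, so a caller compiled against the previous release would fail at link time with a `NoSuchMethodError`. With the two overloads, the original one-argument method is still there and existing binaries keep working.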
---