Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/20446#discussion_r165362148
--- Diff: docs/ml-statistics.md ---
@@ -89,4 +89,26 @@ Refer to the [`ChiSquareTest` Python
docs](api/python/index.html#pyspark.ml.stat
{% include_example python/ml/chi_square_test_example.py %}
</div>
+</div>
+
+## Summarizer
+
+We provide vector column summary statistics for `Dataframe` through `Summarizer`.
+Available metrics contain the column-wise max, min, mean, variance, and number of nonzeros, as well as the total count.
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+[`Summarizer`](api/scala/index.html#org.apache.spark.ml.stat.Summarizer$)
--- End diff --
Perhaps "The following example demonstrates using `Summarizer`(...) to
compute the mean and variance of the input `DataFrame`, with and without a
weight column"?
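
To make concrete what such an example would report, here is a minimal pure-Python sketch of column-wise mean and variance, with and without weights. It does not call the `Summarizer` API and is not Spark's implementation: the function name `summarize`, the sample rows, and the weights are hypothetical. The weighted-variance denominator (sum of weights minus sum of squared weights over sum of weights) is an assumption about the normalisation, chosen because it reduces to the usual n - 1 sample variance when every weight is 1.

```python
from typing import List, Optional, Tuple


def summarize(
    rows: List[List[float]],
    weights: Optional[List[float]] = None,
) -> Tuple[List[float], List[float]]:
    """Return (mean, variance) per column for equal-length dense vectors.

    Hypothetical helper for illustration only; not part of Spark.
    """
    if weights is None:
        # No weight column: treat every row as having weight 1.
        weights = [1.0] * len(rows)

    w_sum = sum(weights)
    w_sq_sum = sum(w * w for w in weights)
    dim = len(rows[0])

    # Weighted mean of each column.
    mean = [
        sum(w * row[j] for row, w in zip(rows, weights)) / w_sum
        for j in range(dim)
    ]

    # Assumed normalisation: equals n - 1 when all weights are 1.
    denom = w_sum - w_sq_sum / w_sum
    variance = [
        sum(w * (row[j] - mean[j]) ** 2 for row, w in zip(rows, weights)) / denom
        for j in range(dim)
    ]
    return mean, variance


# Hypothetical input: two 3-dimensional feature vectors.
rows = [[2.0, 3.0, 5.0], [4.0, 6.0, 7.0]]

mean_w, var_w = summarize(rows, weights=[1.0, 2.0])  # with a weight column
mean_u, var_u = summarize(rows)                      # without weights

print("weighted:  ", mean_w, var_w)
print("unweighted:", mean_u, var_u)
```

With these inputs the weighted and unweighted means differ while the variances happen to coincide, which is worth calling out in the doc text so readers do not assume the weight column was ignored.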
---