GitHub user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/20446#discussion_r165568368
--- Diff: docs/ml-statistics.md ---
@@ -89,4 +89,26 @@ Refer to the [`ChiSquareTest` Python docs](api/python/index.html#pyspark.ml.stat
{% include_example python/ml/chi_square_test_example.py %}
</div>
+</div>
+
+## Summarizer
+
+We provide vector column summary statistics for `Dataframe` through `Summarizer`.
+Available metrics are the column-wise max, min, mean, variance, and number of nonzeros, as well as the total count.
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+The following example demonstrates using [`Summarizer`](api/scala/index.html#org.apache.spark.ml.stat.Summarizer$)
+to compute the mean and variance for the input dataframe, with and without a weight column.
--- End diff --
Sorry, one more comment here: I think this should perhaps read "... to compute the mean and variance for a vector column of the input dataframe ..." (and the same for the similar sentence below).
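For context, "column-wise" here means per component of the vector column, computed across rows. A minimal plain-Python sketch of that idea follows. No Spark is involved, and the `summarize` helper and the toy rows are hypothetical. It computes the mean and variance with and without a weight column. The weighted variance uses the reliability-weight estimator, dividing by W - sum(w^2)/W, which reduces to the usual n - 1 sample variance when all weights are 1. Whether `Summarizer` uses exactly this estimator is an assumption here.

```python
from typing import List, Optional, Sequence, Tuple


def summarize(
    rows: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[List[float], List[float]]:
    """Column-wise (per vector component) weighted mean and sample variance.

    Each element of `rows` stands in for one cell of a vector column.
    Hypothetical helper for illustration only; not part of Spark.
    """
    if not rows:
        raise ValueError("no rows to summarize")
    if weights is None:
        # Unweighted is the same as giving every row weight 1.
        weights = [1.0] * len(rows)
    if len(weights) != len(rows):
        raise ValueError("need exactly one weight per row")

    dim = len(rows[0])
    total_w = sum(weights)
    total_w2 = sum(w * w for w in weights)

    # Weighted mean of component i across all rows.
    means = [
        sum(w * row[i] for row, w in zip(rows, weights)) / total_w
        for i in range(dim)
    ]

    # Reliability-weight sample variance (assumed convention, see lead-in):
    # denominator is W - sum(w^2) / W, i.e. n - 1 when all weights are 1.
    denom = total_w - total_w2 / total_w
    variances = []
    for i in range(dim):
        m2 = sum(w * (row[i] - means[i]) ** 2 for row, w in zip(rows, weights))
        variances.append(m2 / denom if denom > 0 else 0.0)
    return means, variances


if __name__ == "__main__":
    rows = [[2.0, 3.0, 5.0], [4.0, 6.0, 7.0]]
    print(summarize(rows))  # ([3.0, 4.5, 6.0], [2.0, 4.5, 2.0])
    print(summarize(rows, [1.0, 2.0]))  # weighted by a per-row "weight column"
```

The point relevant to the wording suggestion is that the input is a column whose cells are vectors, and each statistic comes back as a vector with one entry per component.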