Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/20446#discussion_r165360692
--- Diff: docs/ml-statistics.md ---
@@ -89,4 +89,26 @@ Refer to the [`ChiSquareTest` Python docs](api/python/index.html#pyspark.ml.stat
{% include_example python/ml/chi_square_test_example.py %}
</div>
+</div>
+
+## Summarizer
+
+We provide vector column summary statistics for `Dataframe` through `Summarizer`.
+Available metrics contain the column-wise max, min, mean, variance, and number of nonzeros, as well as the total count.
--- End diff --
Perhaps "contain" -> "are" or "include"?
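For context on what the listed metrics mean, here is a minimal plain-Python sketch. It does not use Spark. The function `summarize` and its dictionary keys are hypothetical, chosen to mirror the metric names in the diff. It computes each statistic column by column over equal-length dense vectors. It assumes the unbiased (n - 1) sample variance and returns 0.0 for a single row; check the `Summarizer` API docs for Spark's exact conventions.

```python
from typing import Dict, List, Sequence


def summarize(vectors: Sequence[Sequence[float]]) -> Dict[str, object]:
    """Column-wise summary of equal-length dense vectors (illustrative sketch, not Spark)."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same length")

    # Transpose rows into columns so each statistic is computed per column.
    columns: List[tuple] = list(zip(*vectors))
    mean = [sum(c) / n for c in columns]

    # Assumption: unbiased sample variance (divide by n - 1); one row yields 0.0.
    variance = [
        sum((x - m) ** 2 for x in c) / (n - 1) if n > 1 else 0.0
        for c, m in zip(columns, mean)
    ]

    return {
        "count": n,                                   # total number of rows
        "max": [max(c) for c in columns],
        "min": [min(c) for c in columns],
        "mean": mean,
        "variance": variance,
        "numNonZeros": [sum(1 for x in c if x != 0) for c in columns],
    }


if __name__ == "__main__":
    print(summarize([[1.0, 0.0, 3.0], [3.0, 2.0, 0.0]]))
```

For the two rows above, the sketch reports a count of 2, column means of `[2.0, 1.0, 1.5]` and nonzero counts of `[2, 1, 1]`.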