Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20446#discussion_r165360692

    --- Diff: docs/ml-statistics.md ---
    @@ -89,4 +89,26 @@ Refer to the [`ChiSquareTest` Python docs](api/python/index.html#pyspark.ml.stat
     {% include_example python/ml/chi_square_test_example.py %}
     </div>
    +</div>
    +
    +## Summarizer
    +
    +We provide vector column summary statistics for `Dataframe` through `Summarizer`.
    +Available metrics contain the column-wise max, min, mean, variance, and number of nonzeros, as well as the total count.
    --- End diff --

    Perhaps "contain" -> "are" or "include"?