Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/20446#discussion_r165362148

--- Diff: docs/ml-statistics.md ---
@@ -89,4 +89,26 @@ Refer to the [`ChiSquareTest` Python docs](api/python/index.html#pyspark.ml.stat
 {% include_example python/ml/chi_square_test_example.py %}
 </div>
+</div>
+
+## Summarizer
+
+We provide vector column summary statistics for `Dataframe` through `Summarizer`.
+Available metrics contain the column-wise max, min, mean, variance, and number of nonzeros, as well as the total count.
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+[`Summarizer`](api/scala/index.html#org.apache.spark.ml.stat.Summarizer$)
--- End diff --

Perhaps "The following example demonstrates using `Summarizer`(...) to compute the mean and variance for the input dataframe, with and without a weight column"?
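
For readers of this thread, a rough plain-Python sketch of what "mean and variance ... with and without a weight column" amounts to arithmetically. It does not call Spark and is not the example under review. The function name `column_mean_variance` and the sample rows are hypothetical, and the weighted-variance denominator is an assumption about how the summarizer normalizes, so it should be checked against the implementation before being relied on.

```python
def column_mean_variance(rows, weights=None):
    """Column-wise mean and variance for equal-length numeric rows.

    Hypothetical helper for illustration only; not part of Spark.
    With weights=None every row gets weight 1.0.
    """
    if weights is None:
        weights = [1.0] * len(rows)
    w_sum = sum(weights)
    w_sq_sum = sum(w * w for w in weights)
    dims = len(rows[0])

    # Weighted mean per column.
    means = [
        sum(w * row[j] for row, w in zip(rows, weights)) / w_sum
        for j in range(dims)
    ]

    # Assumed denominator: sum(w) - sum(w^2) / sum(w).
    # For unit weights this reduces to n - 1 (the usual sample variance).
    denom = w_sum - w_sq_sum / w_sum
    variances = [
        sum(w * (row[j] - means[j]) ** 2 for row, w in zip(rows, weights)) / denom
        for j in range(dims)
    ]
    return means, variances


# Made-up input: two feature vectors and one weight per row.
rows = [(2.0, 3.0, 5.0), (4.0, 6.0, 7.0)]
weights = [1.0, 2.0]

weighted_mean, weighted_var = column_mean_variance(rows, weights)
plain_mean, plain_var = column_mean_variance(rows)

print("with weight:   ", weighted_mean, weighted_var)
print("without weight:", plain_mean, plain_var)
```

On this input the weights shift the column means toward the second row, while the variances happen to coincide in both cases because there are only two rows. The sketch divides by zero for a single row, which a real implementation would need to handle.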