Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20446#discussion_r165362148
  
    --- Diff: docs/ml-statistics.md ---
    @@ -89,4 +89,26 @@ Refer to the [`ChiSquareTest` Python docs](api/python/index.html#pyspark.ml.stat
     {% include_example python/ml/chi_square_test_example.py %}
     </div>
     
    +</div>
    +
    +## Summarizer
    +
    +We provide vector column summary statistics for `Dataframe` through `Summarizer`.
    +Available metrics contain the column-wise max, min, mean, variance, and number of nonzeros, as well as the total count.
    +
    +<div class="codetabs">
    +<div data-lang="scala" markdown="1">
    +[`Summarizer`](api/scala/index.html#org.apache.spark.ml.stat.Summarizer$)
    --- End diff --
    
    Perhaps "The following example demonstrates using `Summarizer`(...) to compute the mean and variance for the input dataframe, with and without a weight column"?
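
For reference, here is a minimal plain-Python sketch of the quantities such an example would report: column-wise mean and variance, with and without a weight column. It does not call Spark. The helper name `summarize`, the sample rows, and the choice of variance denominator (`sum(w) - sum(w^2) / sum(w)`, which reduces to `n - 1` for unit weights) are assumptions made for illustration, not taken from the PR.

```python
from typing import List, Optional, Sequence, Tuple


def summarize(
    rows: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[List[float], List[float]]:
    """Column-wise mean and variance of equal-length rows (hypothetical helper).

    Without weights, every row counts once. The variance denominator is an
    assumed convention: sum(w) - sum(w^2) / sum(w).
    """
    if weights is None:
        weights = [1.0] * len(rows)
    dim = len(rows[0])
    w_sum = sum(weights)
    w_sq_sum = sum(w * w for w in weights)

    # Weighted mean per column.
    mean = [
        sum(w * r[j] for r, w in zip(rows, weights)) / w_sum
        for j in range(dim)
    ]

    # Weighted sum of squared deviations divided by the assumed denominator.
    denom = w_sum - w_sq_sum / w_sum
    variance = [
        (sum(w * (r[j] - mean[j]) ** 2 for r, w in zip(rows, weights)) / denom)
        if denom > 0 else 0.0
        for j in range(dim)
    ]
    return mean, variance


# Illustrative data: two feature vectors and a weight for each.
features = [(2.0, 3.0, 5.0), (4.0, 6.0, 7.0)]
row_weights = [1.0, 2.0]

mean_w, var_w = summarize(features, row_weights)  # with the weight column
mean_u, var_u = summarize(features)               # without weights

print("with weight:    mean =", mean_w, "variance =", var_w)
print("without weight: mean =", mean_u, "variance =", var_u)
```

The weighted call pulls each column mean toward the second row, which carries twice the weight; the unweighted call treats both rows equally.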

