[GitHub] spark pull request #20446: [SPARK-23254][ML] Add user guide entry for DataFr...

MLnick Thu, 01 Feb 2018 22:42:07 -0800

Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20446#discussion_r165568368
  
    --- Diff: docs/ml-statistics.md ---
    @@ -89,4 +89,26 @@ Refer to the [`ChiSquareTest` Python docs](api/python/index.html#pyspark.ml.stat
     {% include_example python/ml/chi_square_test_example.py %}
     </div>
     
    +</div>
    +
    +## Summarizer
    +
    +We provide vector column summary statistics for `Dataframe` through `Summarizer`.
    +Available metrics are the column-wise max, min, mean, variance, and number of nonzeros, as well as the total count.
    +
    +<div class="codetabs">
    +<div data-lang="scala" markdown="1">
    +The following example demonstrates using [`Summarizer`](api/scala/index.html#org.apache.spark.ml.stat.Summarizer$)
    +to compute the mean and variance for the input dataframe, with and without a weight column.
    --- End diff --
    
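    Sorry, one more comment here.
    
    I think this should perhaps read "... to compute the mean and variance for a vector column of the input dataframe ...", since the statistics are computed over a single vector column rather than over the dataframe as a whole.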
    
    (and same below)
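
To illustrate why the "vector column" wording matters, here is a minimal plain-Python sketch of column-wise summary statistics over one column of vectors. It is not Spark's implementation and does not use the `Summarizer` API: the function name `summarize`, the dictionary keys, and the sample rows are all hypothetical, and the use of the unweighted sample variance (dividing by count - 1) is an assumption.

```python
def summarize(vectors):
    """Column-wise summary statistics over a list of equal-length dense vectors.

    Illustrative only: a stand-in for a single vector column, not Spark code.
    """
    if not vectors:
        raise ValueError("need at least one vector")
    count = len(vectors)
    # Transpose so that each tuple holds one dimension across all vectors.
    columns = list(zip(*vectors))
    means = [sum(col) / count for col in columns]
    # Assumption: sample variance (divide by count - 1), 0.0 for a single row.
    variances = [
        sum((x - m) ** 2 for x in col) / (count - 1) if count > 1 else 0.0
        for col, m in zip(columns, means)
    ]
    return {
        "count": count,
        "max": [max(col) for col in columns],
        "min": [min(col) for col in columns],
        "mean": means,
        "variance": variances,
        "numNonzeros": [sum(1 for x in col if x != 0) for col in columns],
    }


# Hypothetical data: each inner list plays the role of one row's vector value.
rows = [[1.0, 0.0, 3.0], [3.0, 4.0, 0.0]]
stats = summarize(rows)
print(stats["mean"], stats["variance"])
```

Each statistic comes back as one value per vector dimension, computed across rows, which is what "column-wise" refers to in the text under review.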



