GitHub user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8154#discussion_r37105837
  
    --- Diff: docs/mllib-statistics.md ---
    @@ -438,22 +438,42 @@ run a 1-sample, 2-sided Kolmogorov-Smirnov test. The following example demonstra
     and interpret the hypothesis tests.
     
     {% highlight scala %}
    -import org.apache.spark.SparkContext
    -import org.apache.spark.mllib.stat.Statistics._
    +import org.apache.spark.mllib.stat.Statistics
     
     val data: RDD[Double] = ... // an RDD of sample data
     
     // run a KS test for the sample versus a standard normal distribution
     val testResult = Statistics.kolmogorovSmirnovTest(data, "norm", 0, 1)
     println(testResult) // summary of the test including the p-value, test statistic,
    -                      // and null hypothesis
    -                      // if our p-value indicates significance, we can reject the null hypothesis
    +                    // and null hypothesis
    +                    // if our p-value indicates significance, we can reject the null hypothesis
     
     // perform a KS test using a cumulative distribution function of our making
     val myCDF: Double => Double = ...
     val testResult2 = Statistics.kolmogorovSmirnovTest(data, myCDF)
     {% endhighlight %}
     </div>
    +
    +<div data-lang="python" markdown="1">
    +[`Statistics`](api/python/index.html#org.apache.spark.mllib.stat.Statistics$) provides methods to
    +run a 1-sample, 2-sided Kolmogorov-Smirnov test. The following example demonstrates how to run
    +and interpret the hypothesis tests.
    +
    +{% highlight python %}
    +from pyspark.mllib.stat import Statistics
    +
    +localData = [1.0, 2.0, ... ] #a list of doubles
    +parallelData =  sc.parallelize(localData) # an RDD of Double
    --- End diff --
    
    `data = sc.parallelize([1.0, 2.0, ...])` should be sufficient.
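
For reference, here is a minimal pure-Python sketch of the quantity that both snippets report as the test statistic. This is the one-sample, two-sided KS statistic D = sup_x |F_n(x) - F(x)|, where F_n is the empirical CDF of the sample and F is the hypothesized CDF. It uses a small local list instead of an RDD and does not call Spark, so it is only an illustration and not MLlib's implementation. `normal_cdf`, `ks_statistic` and `uniform_cdf` are hypothetical helpers, and the p-value computation is left out.

```python
import math


def normal_cdf(x, mean=0.0, std=1.0):
    """CDF of the normal distribution N(mean, std^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """One-sample, two-sided KS statistic D = sup_x |F_n(x) - F(x)|.

    F_n is the empirical CDF of `sample` and `cdf` is the hypothesized CDF.
    F_n is a step function, so the supremum is reached at a sample point,
    either just before or just after one of its jumps.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # At x, F_n jumps from i/n (left limit) to (i + 1)/n; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def uniform_cdf(x):
    """Hypothetical user-supplied CDF: Uniform(0, 1)."""
    return min(max(x, 0.0), 1.0)


# Illustrative local sample, standing in for an RDD of Double.
data = [0.1, 0.15, 0.2, 0.3, 0.25]

# Against a standard normal, analogous to kolmogorovSmirnovTest(data, "norm", 0, 1).
d_norm = ks_statistic(data, normal_cdf)
# Against a custom CDF, analogous to the `myCDF` variant in the Scala snippet.
d_unif = ks_statistic(data, uniform_cdf)
print(d_norm, d_unif)
```

Passing `normal_cdf` corresponds to the `"norm", 0, 1` arguments. Any other single-argument callable takes the role of `myCDF`.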

