GitHub user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/8154#discussion_r37105837
--- Diff: docs/mllib-statistics.md ---
@@ -438,22 +438,42 @@ run a 1-sample, 2-sided Kolmogorov-Smirnov test. The following example demonstra
and interpret the hypothesis tests.
{% highlight scala %}
-import org.apache.spark.SparkContext
-import org.apache.spark.mllib.stat.Statistics._
+import org.apache.spark.mllib.stat.Statistics
val data: RDD[Double] = ... // an RDD of sample data
// run a KS test for the sample versus a standard normal distribution
val testResult = Statistics.kolmogorovSmirnovTest(data, "norm", 0, 1)
println(testResult) // summary of the test including the p-value, test statistic,
- // and null hypothesis
- // if our p-value indicates significance, we can reject the null hypothesis
+ // and null hypothesis
+ // if our p-value indicates significance, we can reject the null hypothesis
// perform a KS test using a cumulative distribution function of our making
val myCDF: Double => Double = ...
val testResult2 = Statistics.kolmogorovSmirnovTest(data, myCDF)
{% endhighlight %}
</div>
+
+<div data-lang="python" markdown="1">
+[`Statistics`](api/python/index.html#org.apache.spark.mllib.stat.Statistics$) provides methods to
+run a 1-sample, 2-sided Kolmogorov-Smirnov test. The following example demonstrates how to run
+and interpret the hypothesis tests.
+
+{% highlight python %}
+from pyspark.mllib.stat import Statistics
+
+localData = [1.0, 2.0, ... ] #a list of doubles
+parallelData = sc.parallelize(localData) # an RDD of Double
--- End diff --
`data = sc.parallelize([1.0, 2.0, ...])` should be sufficient.
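For context on what the documented example computes, here is a minimal plain-Python sketch of the 1-sample, 2-sided KS statistic against a standard normal distribution. It is an illustration only, not Spark's implementation: the helper names `standard_normal_cdf` and `ks_statistic` are hypothetical, it runs on a local list rather than an RDD, and it returns only the statistic, not a p-value.

```python
import math


def standard_normal_cdf(x):
    """CDF of the standard normal distribution N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and `cdf`.

    Hypothetical helper for illustration; not part of pyspark.
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x,
        # so check the gap on both sides of the jump.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


local_data = [-1.0, 0.0, 1.0]  # made-up sample values
statistic = ks_statistic(local_data, standard_normal_cdf)
print(statistic)
```

A small statistic means the sample's empirical distribution stays close to the reference CDF; the test in the docs additionally turns this value into a p-value.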