GitHub user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/8154#discussion_r37255361
--- Diff: docs/mllib-statistics.md ---
@@ -438,22 +438,63 @@ run a 1-sample, 2-sided Kolmogorov-Smirnov test. The following example demonstra
and interpret the hypothesis tests.
{% highlight scala %}
-import org.apache.spark.SparkContext
-import org.apache.spark.mllib.stat.Statistics._
+import org.apache.spark.mllib.stat.Statistics
val data: RDD[Double] = ... // an RDD of sample data
// run a KS test for the sample versus a standard normal distribution
val testResult = Statistics.kolmogorovSmirnovTest(data, "norm", 0, 1)
println(testResult) // summary of the test including the p-value, test statistic,
- // and null hypothesis
- // if our p-value indicates significance, we can reject the null hypothesis
+ // and null hypothesis
+ // if our p-value indicates significance, we can reject the null hypothesis
// perform a KS test using a cumulative distribution function of our making
val myCDF: Double => Double = ...
val testResult2 = Statistics.kolmogorovSmirnovTest(data, myCDF)
{% endhighlight %}
</div>
+
+<div data-lang="java" markdown="1">
+[`Statistics`](api/java/org/apache/spark/mllib/stat/Statistics.html) provides methods to
+run a 1-sample, 2-sided Kolmogorov-Smirnov test. The following example demonstrates how to run
+and interpret the hypothesis tests.
+
+{% highlight java %}
+import com.google.common.collect.Lists;
+import org.apache.spark.api.java.JavaDoubleRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.mllib.stat.Statistics;
+import org.apache.spark.mllib.stat.test.KolmogorovSmirnovTestResult;
+
+JavaSparkContext jsc = ...
+JavaDoubleRDD data = jsc.parallelizeDoubles(Lists.newArrayList(0.2, 1.0, ...));
+KolmogorovSmirnovTestResult testResult = Statistics.kolmogorovSmirnovTest(data, "norm", 0.0, 1.0);
+// summary of the test including the p-value, test statistic,
+// and null hypothesis
+// if our p-value indicates significance, we can reject the null hypothesis
+System.out.println(testResult1);
+{% endhighlight %}
+</div>
+
+<div data-lang="python" markdown="1">
+[`Statistics`](api/python/index.html#pyspark.mllib.stat.Statistics$) provides methods to
--- End diff --
The link should be `api/python/pyspark.mllib.html#pyspark.mllib.stat.Statistics`.
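
For context on what the docs under review describe, here is a minimal plain-Python sketch of the statistic behind a 1-sample, 2-sided Kolmogorov-Smirnov test. It is not Spark's implementation and does not use the `Statistics` API; the helper names `standard_normal_cdf` and `ks_statistic` are hypothetical, and the p-value computation is deliberately left out.

```python
import math


def standard_normal_cdf(x: float) -> float:
    """CDF of N(0, 1), written with the error function from the stdlib."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_statistic(sample, cdf) -> float:
    """Largest gap between the empirical CDF of `sample` and `cdf`.

    Hypothetical helper for illustration only; it returns the test
    statistic D and does not compute a p-value.
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so the gap
        # has to be checked on both sides of the jump.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


if __name__ == "__main__":
    data = [0.2, 1.0, -0.5, 0.3, -1.2]  # made-up sample values
    print(ks_statistic(data, standard_normal_cdf))
```

Passing a different callable as `cdf` mirrors the custom-CDF variant shown in the Scala snippet of the diff.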