[GitHub] spark pull request: [SPARK 9902] [MLlib] Add Java and Python examp...

mengxr Mon, 17 Aug 2015 18:09:06 -0700

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8154#discussion_r37255361
  
    --- Diff: docs/mllib-statistics.md ---
    @@ -438,22 +438,63 @@ run a 1-sample, 2-sided Kolmogorov-Smirnov test. The following example demonstra
     and interpret the hypothesis tests.
     
     {% highlight scala %}
    -import org.apache.spark.SparkContext
    -import org.apache.spark.mllib.stat.Statistics._
    +import org.apache.spark.mllib.stat.Statistics
     
     val data: RDD[Double] = ... // an RDD of sample data
     
     // run a KS test for the sample versus a standard normal distribution
     val testResult = Statistics.kolmogorovSmirnovTest(data, "norm", 0, 1)
     println(testResult) // summary of the test including the p-value, test statistic,
    -                      // and null hypothesis
    -                      // if our p-value indicates significance, we can reject the null hypothesis
    +                    // and null hypothesis
    +                    // if our p-value indicates significance, we can reject the null hypothesis
     
     // perform a KS test using a cumulative distribution function of our making
     val myCDF: Double => Double = ...
     val testResult2 = Statistics.kolmogorovSmirnovTest(data, myCDF)
     {% endhighlight %}
     </div>
    +
    +<div data-lang="java" markdown="1">
    +[`Statistics`](api/java/org/apache/spark/mllib/stat/Statistics.html) provides methods to
    +run a 1-sample, 2-sided Kolmogorov-Smirnov test. The following example demonstrates how to run
    +and interpret the hypothesis tests.
    +
    +{% highlight java %}
    +import com.google.common.collect.Lists;
    +import org.apache.spark.api.java.JavaDoubleRDD;
    +import org.apache.spark.api.java.JavaSparkContext;
    +import org.apache.spark.mllib.stat.Statistics;
    +import org.apache.spark.mllib.stat.test.KolmogorovSmirnovTestResult;
    +
    +JavaSparkContext jsc = ...
    +JavaDoubleRDD data = jsc.parallelizeDoubles(Lists.newArrayList(0.2, 1.0, ...));
    +KolmogorovSmirnovTestResult testResult = Statistics.kolmogorovSmirnovTest(data, "norm", 0.0, 1.0);
    +// summary of the test including the p-value, test statistic,
    +// and null hypothesis
    +// if our p-value indicates significance, we can reject the null hypothesis
    +System.out.println(testResult);
    +{% endhighlight %}
    +</div>
    +
    +<div data-lang="python" markdown="1">
    +[`Statistics`](api/python/index.html#pyspark.mllib.stat.Statistics$) provides methods to
    --- End diff --
    
    The link should be `api/python/pyspark.mllib.html#pyspark.mllib.stat.Statistics`.
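
For context on what the examples in this diff compute, the one-sample, two-sided Kolmogorov-Smirnov statistic can be sketched in plain Python. This is only an illustration, not Spark's implementation, and it does not use pyspark. The names `std_normal_cdf` and `ks_statistic` are hypothetical, and the sketch returns the test statistic only, not a p-value.

```python
import math


def std_normal_cdf(x):
    """CDF of the standard normal distribution N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and `cdf`.

    Hypothetical helper for illustration; not part of any Spark API.
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF steps from i/n to (i+1)/n at x,
        # so compare the reference CDF against both sides of the step.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


if __name__ == "__main__":
    sample = [0.2, 1.0, -0.5, 0.3, -1.2]
    print(ks_statistic(sample, std_normal_cdf))
```

A small statistic means the sample's empirical distribution stays close to the reference CDF. Passing a different callable as `cdf` mirrors the custom-CDF variant shown in the Scala example.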



