GitHub user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/6994#discussion_r34077619
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/Statistics.scala ---
    @@ -158,4 +158,25 @@ object Statistics {
       def chiSqTest(data: RDD[LabeledPoint]): Array[ChiSqTestResult] = {
         ChiSqTest.chiSquaredFeatures(data)
       }
    +
    +  /**
    +   * Conduct a one-sample, two sided Kolmogorov Smirnov test for probability distribution equality
    +   * @param data an `RDD[Double]` containing the sample of data to test
    +   * @param cdf a `Double => Double` function to calculate the theoretical CDF at a given value
    +   * @return KSTestResult object containing test statistic, p-value, and null hypothesis.
    +   */
    +  def ksTest(data: RDD[Double], cdf: Double => Double): KSTestResult = {
    +    KSTest.testOneSample(data, cdf)
    +  }
    +
    +  /**
    +   * Convenience function to conduct a one-sample, two sided Kolmogorov Smirnov test for probability
    +   * distribution equality. Currently supports standard normal distribution only.
    +   * @param data an `RDD[Double]` containing the sample of data to test
    +   * @param name a `String` name for a theoretical distribution
    --- End diff --
    
    The API doesn't take a Math3 object, though. It only needs a function, and
    a function can be serializable. It could be a reusable wrapper around any
    Math3 distribution: one that uses a non-serializable implementation
    internally and handles serialization itself by recreating the distribution
    from its parameters.
    
    Actually, though, the distribution objects appear to be serializable
    already. `AbstractRealDistribution` implements `Serializable`, so
    `NormalDistribution` and the other subclasses are serializable too.
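
    A minimal sketch of the kind of wrapper described above, assuming Commons
    Math3 is on the classpath. The class name `NormalCdf` is hypothetical and
    not part of this PR; it serializes only the parameters and rebuilds the
    distribution lazily after deserialization.

```scala
import org.apache.commons.math3.distribution.{NormalDistribution, RealDistribution}

// Hypothetical wrapper (not part of the PR): a serializable Double => Double
// that carries only the distribution parameters across the wire.
class NormalCdf(mean: Double, sd: Double)
    extends (Double => Double) with Serializable {

  // Marked @transient so the Math3 object is never serialized; the lazy val
  // is re-evaluated from (mean, sd) on first use in each JVM.
  @transient private lazy val dist: RealDistribution =
    new NormalDistribution(mean, sd)

  // Theoretical CDF evaluated at x.
  override def apply(x: Double): Double = dist.cumulativeProbability(x)
}

// Usage, assuming `data: RDD[Double]` and the `ksTest` signature in the diff:
//   val result = Statistics.ksTest(data, new NormalCdf(0.0, 1.0))
```

    If the Math3 distribution is itself serializable, a plain closure over it
    should be enough, and the wrapper only matters for implementations that
    are not.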

