Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20904#discussion_r179832114

    --- Diff: python/pyspark/ml/stat.py ---
    @@ -134,6 +134,65 @@ def corr(dataset, column, method="pearson"):
             return _java2py(sc, javaCorrObj.corr(*args))
    +class KolmogorovSmirnovTest(object):
    +    """
    +    .. note:: Experimental
    +
    +    Conduct the two-sided Kolmogorov Smirnov (KS) test for data sampled from a continuous
    +    distribution.
    +
    +    By comparing the largest difference between the empirical cumulative
    +    distribution of the sample data and the theoretical distribution we can provide a test for the
    +    the null hypothesis that the sample data comes from that theoretical distribution.
    +
    +    :param dataset:
    --- End diff --

    I see you're following the example of ChiSquareTest, but this Param documentation belongs with the test method, not the class. Could you please shift it? (Feel free to correct it for ChiSquareTest here or in another PR.)
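A minimal sketch of the layout being requested, not the code from the PR: the class docstring keeps only the description of the test, and the `:param` fields move into the docstring of the static `test` method. The class name `KolmogorovSmirnovTestSketch`, the parameters other than `dataset`, and all parameter descriptions are assumptions made for illustration; the method body is a placeholder.

```python
class KolmogorovSmirnovTestSketch(object):
    """
    .. note:: Experimental

    Conduct the two-sided Kolmogorov Smirnov (KS) test for data sampled from a
    continuous distribution.
    """
    # No :param fields at class level: the class is never instantiated with
    # these arguments, so they are documented on the method that takes them.

    @staticmethod
    def test(dataset, sampleCol, distName, *params):
        """
        Run the KS test on one column of ``dataset``.

        Everything below except the name ``dataset`` is an assumption made
        for this sketch, not text taken from the PR.

        :param dataset: DataFrame containing the sample data.
        :param sampleCol: Name of the column holding the sample values.
        :param distName: Name of the theoretical distribution to test against.
        :param params: Parameters of the theoretical distribution.
        """
        # Placeholder body: this sketch only illustrates where the
        # documentation lives, not how the test is computed.
        raise NotImplementedError("documentation-layout sketch only")
```

With this arrangement `help(KolmogorovSmirnovTestSketch.test)` shows the parameters next to the signature they describe, which is the same reason the comment suggests correcting ChiSquareTest as well.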