Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20904#discussion_r179833156
  
    --- Diff: python/pyspark/ml/stat.py ---
    @@ -134,6 +134,65 @@ def corr(dataset, column, method="pearson"):
             return _java2py(sc, javaCorrObj.corr(*args))
     
     
    +class KolmogorovSmirnovTest(object):
    +    """
    +    .. note:: Experimental
    +
    +    Conduct the two-sided Kolmogorov Smirnov (KS) test for data sampled from a continuous
    +    distribution.
    +
    +    By comparing the largest difference between the empirical cumulative
    +    distribution of the sample data and the theoretical distribution we can provide a test for the
    +    the null hypothesis that the sample data comes from that theoretical distribution.
    +
    +    :param dataset:
    +      a dataset or a dataframe containing the sample of data to test.
    +    :param sampleCol:
    +      Name of sample column in dataset, of any numerical type.
    +    :param distName:
    +      a `string` name for a theoretical distribution, currently only support "norm".
    +    :param params:
    +      a list of `Double` values specifying the parameters to be used for the theoretical
    --- End diff --
    
    I realized we should list what the parameters are, both here and in the Scala docs.
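
    To illustrate what such a parameter list would mean, here is a minimal sketch
    in plain Python (not the Spark API) of the statistic the docstring describes:
    the largest gap between the empirical CDF of the sample and a theoretical CDF.
    It assumes that for "norm" the parameters are the mean followed by the standard
    deviation; that ordering is an assumption and should be checked against the
    Scala implementation. The names `normal_cdf` and `ks_statistic` are
    hypothetical helpers, not part of this PR.

```python
import math


def normal_cdf(x, mean=0.0, std=1.0):
    """CDF of a normal distribution (assumed parameters: mean, standard deviation)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """Largest absolute gap between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Hypothetical sample, tested against a standard normal (mean 0.0, std 1.0).
sample = [-1.0, 0.0, 1.0]
stat = ks_statistic(sample, lambda x: normal_cdf(x, 0.0, 1.0))
print(stat)
```

    This only computes the test statistic; turning it into a p-value needs the
    Kolmogorov distribution, which is omitted here.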


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
