[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

jkbradley Mon, 09 Apr 2018 15:07:03 -0700

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20904#discussion_r180228711
  
    --- Diff: python/pyspark/ml/stat.py ---
    @@ -127,13 +113,86 @@ class Correlation(object):
         def corr(dataset, column, method="pearson"):
             """
             Compute the correlation matrix with specified method using dataset.
    +
    +        :param dataset:
    +          A Dataset or a DataFrame.
    +        :param column:
    +          The name of the column of vectors for which the correlation coefficient needs
    +          to be computed. This must be a column of the dataset, and it must contain
    +          Vector objects.
    +        :param method:
    +          String specifying the method to use for computing correlation.
    +          Supported: `pearson` (default), `spearman`.
    +        :return:
    +          A DataFrame that contains the correlation matrix of the column of vectors. This
    +          DataFrame contains a single row and a single column of name
    +          '$METHODNAME($COLUMN)'.
             """
             sc = SparkContext._active_spark_context
             javaCorrObj = _jvm().org.apache.spark.ml.stat.Correlation
             args = [_py2java(sc, arg) for arg in (dataset, column, method)]
             return _java2py(sc, javaCorrObj.corr(*args))
     
     
    +class KolmogorovSmirnovTest(object):
    +    """
    +    .. note:: Experimental
    +
    +    Conduct the two-sided Kolmogorov-Smirnov (KS) test for data sampled from a continuous
    +    distribution.
    +
    +    By comparing the largest difference between the empirical cumulative
    +    distribution of the sample data and the theoretical distribution, we can provide a test for
    +    the null hypothesis that the sample data comes from that theoretical distribution.
    +
    +    >>> from pyspark.ml.stat import KolmogorovSmirnovTest
    --- End diff --
    
    Thanks for moving the method-specific documentation. These doctests are method-specific too, though, so can you please move them as well?
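A minimal sketch of the layout the reviewer is asking for. It uses a hypothetical `ExampleTest` class with a trivial body rather than the real `KolmogorovSmirnovTest` API. The class docstring keeps only the general description and the `.. note:: Experimental` marker, while the `:param:`/`:return:` fields and the doctests live on the static method they exercise, mirroring how `Correlation.corr` is documented in the diff.

```python
import doctest


class ExampleTest(object):
    """
    .. note:: Experimental

    Class-level docstring: a short, general description of the test only.
    """

    @staticmethod
    def test(sample):
        """
        Method-level docstring: parameters, return value and doctests.

        :param sample:
          A non-empty list of numbers (hypothetical parameter).
        :return:
          The largest absolute value in ``sample``.

        >>> ExampleTest.test([-3.0, 1.0, 2.0])
        3.0
        """
        return max(abs(x) for x in sample)


if __name__ == "__main__":
    # Collects and runs every doctest in this module, including the one
    # attached to ExampleTest.test; prints nothing when all of them pass.
    doctest.testmod()
```

Because the examples sit on `test` itself, they show up in that method's generated API docs next to the parameters they illustrate.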

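The class docstring in the diff describes the test as comparing the largest difference between the empirical CDF of the sample and the theoretical CDF. As a rough illustration, here is a pure-Python sketch of the one-sample, two-sided KS statistic. It is not Spark's implementation: `normal_cdf` and `ks_statistic` are hypothetical helpers, the theoretical distribution is assumed to be normal, and only the statistic D is computed, not the p-value that a full test also reports.

```python
import math


def normal_cdf(x, mean=0.0, std=1.0):
    # CDF of a normal distribution with the given mean and standard deviation.
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """Two-sided one-sample KS statistic D = sup_x |F_n(x) - F(x)|.

    F_n is the empirical CDF of ``sample`` and F is ``cdf``. Because F_n is a
    step function, the supremum is reached at (or just before) a sample point:
    it is the larger of i/n - F(x_i) and F(x_i) - (i-1)/n over sorted x_i.
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    # enumerate() is 0-based, so index i corresponds to x_{i+1} above.
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


# Small illustrative sample compared against N(0, 1).
print(round(ks_statistic([-1.0, 0.0, 1.0], normal_cdf), 3))  # 0.175
```

For this three-point sample the maximum gap, about 0.175, occurs at x = -1 and x = 1, where the empirical CDF differs from the standard normal CDF by 1/3 - Φ(-1).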

