[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-162258164 LGTM. @felixcheung I think the current resolution of not adding `cor` until we can support it is fine by me. Its better to not mask existing functions unless we can match existing functionality. Merging this to master, branch-1.6 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/9680 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-160810068 **[Test build #46911 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46911/consoleFull)** for PR 9680 at commit [`b8090f3`](https://github.com/apache/spark/commit/b8090f38a5d9be5cc50049531de87bcb1c405b07). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-160817390 Jenkins, retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-160815852 second is a git error (seems like having a lot these days?) `hudson.plugins.git.GitException: Failed to fetch from https://github.com/apache/spark.git at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:763)` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-160819835 **[Test build #46922 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46922/consoleFull)** for PR 9680 at commit [`659f805`](https://github.com/apache/spark/commit/659f8057de38e33d9a9fddc6ff60eebca834850e). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-160806942 **[Test build #46911 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46911/consoleFull)** for PR 9680 at commit [`b8090f3`](https://github.com/apache/spark/commit/b8090f38a5d9be5cc50049531de87bcb1c405b07). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-160810186 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46911/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-160810185 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-160804279 stats::cov name conflict: https://issues.apache.org/jira/browse/SPARK-11886 will open a new JIRA on cor alias once this is merged --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-16080 thanks, rebased. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-160810876 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-160810878 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46914/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-160845233 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-160845163 **[Test build #46922 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46922/consoleFull)** for PR 9680 at commit [`659f805`](https://github.com/apache/spark/commit/659f8057de38e33d9a9fddc6ff60eebca834850e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-160845234 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46922/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-160627755 @felixcheung, sorry for late response. Since there is no agreement now, I am fine that we don't add "cor" alias in this PR. Let's get this PR merged. Could you submit a new JIRA addressing the issue of adding alias of "cor" and also the issue of existing "cov" which masks stats::cov? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-160627949 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-159493667 @sun-rui ? I"m fine with adding `cor` - are we ok with masking `stats::cor`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/9680#discussion_r45259387 --- Diff: R/pkg/R/functions.R --- @@ -259,6 +259,21 @@ setMethod("column", function(x) { col(x) }) +#' corr +#' +#' Computes the Pearson Correlation Coefficient for two Columns. +#' +#' @rdname corr +#' @name corr +#' @family math_funcs +#' @export +#' @examples \dontrun{corr(df$c, df$d)} +setMethod("corr", signature(x = "Column"), --- End diff -- That's the same for `count`, `max`, `mean` and so on, so change we would need to change every function here - should we do that? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-157913206 @shivaram Can we go ahead with this? I think we could consider adding all character overload for DataFrame functions in a different JIRA. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-157975716 As per this https://stat.ethz.ch/R-manual/R-devel/library/stats/html/cor.html It support "vector, matrix or data frame", and it doesn't say a subset of a data frame. I'm not sure a column in a data frame is like a vector? Instead, I think #9366 should support 2 DataFrames. And we should come up with a signature similar to stats::cor --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-157973061 adding all character overload for DataFrame functions in a different JIRA is OK. But for alias of corr(), #9366 only supports inter-column cov and cor of a DataFrame, not between columns of two DataFrames. I think actually it is better to add alias in this PR. corr() operating on two columns is similar to R cor() on two vectors. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-157977761 Also, since here we are working with 2 columns, by adding a alias `cor` we will need to create a generic with a different signature that again masks `stats::cor`, like what's happening with `stats::cov`, and that seems problematic. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/9680#discussion_r45167221 --- Diff: R/pkg/R/functions.R --- @@ -259,6 +259,20 @@ setMethod("column", function(x) { col(x) }) +#' corr +#' +#' Computes the Pearson Correlation Coefficient for two Columns. +#' +#' @rdname corr +#' @name corr +#' @family math_funcs +#' @export +#' @examples \dontrun{corr(df$c, df$d)} +setMethod("corr", signature(x = "Column", col1 = "Column", col2 = "missing", method = "missing"), --- End diff -- #9366 only supports inter-column cov and cor of a DataFrame, not between columns of two DataFrames. I think actually it is better to add alias in this PR. corr() operating on two columns is similar to R cor() on two vectors. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/9680#discussion_r45166927 --- Diff: R/pkg/R/functions.R --- @@ -259,6 +259,21 @@ setMethod("column", function(x) { col(x) }) +#' corr +#' +#' Computes the Pearson Correlation Coefficient for two Columns. +#' +#' @rdname corr +#' @name corr +#' @family math_funcs +#' @export +#' @examples \dontrun{corr(df$c, df$d)} +setMethod("corr", signature(x = "Column"), --- End diff -- There are two versions of corr(): ``` def corr(column1: Column, column2: Column): Column def corr(columnName1: String, columnName2: String): Column ``` We'd better support both. Something like: ``` setMethod("corr", signature(x = "characterOrColumn"), ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-157513054 any more comment? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/9680#discussion_r44974143 --- Diff: R/pkg/R/functions.R --- @@ -259,6 +259,20 @@ setMethod("column", function(x) { col(x) }) +#' corr +#' +#' Computes the Pearson Correlation Coefficient for two Columns. +#' +#' @rdname corr +#' @name corr +#' @family math_funcs +#' @export +#' @examples \dontrun{corr(df$c, df$d)} +setMethod("corr", signature(x = "Column", col1 = "Column", col2 = "missing", method = "missing"), --- End diff -- as for doc, the DataFrame corr in stats.R has `@rdname statfunctions` this one has `@rdname corr` so they go to different HTML page generated by roxygen2 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/9680#discussion_r44973809 --- Diff: R/pkg/R/functions.R --- @@ -259,6 +259,20 @@ setMethod("column", function(x) { col(x) }) +#' corr +#' +#' Computes the Pearson Correlation Coefficient for two Columns. +#' +#' @rdname corr +#' @name corr +#' @family math_funcs +#' @export +#' @examples \dontrun{corr(df$c, df$d)} +setMethod("corr", signature(x = "Column", col1 = "Column", col2 = "missing", method = "missing"), --- End diff -- right, I like the approach of changing the existing generic definition. perhaps we should align the method signature with the stats::cor ``` cor(x, y = NULL, use = "everything", method = c("pearson", "kendall", "spearman")) ``` do you know why we decide to name it `corr` (vs. `cor`) in other places? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/9680#discussion_r45019825 --- Diff: R/pkg/R/functions.R --- @@ -259,6 +259,20 @@ setMethod("column", function(x) { col(x) }) +#' corr +#' +#' Computes the Pearson Correlation Coefficient for two Columns. +#' +#' @rdname corr +#' @name corr +#' @family math_funcs +#' @export +#' @examples \dontrun{corr(df$c, df$d)} +setMethod("corr", signature(x = "Column", col1 = "Column", col2 = "missing", method = "missing"), --- End diff -- Thinking more about this, I think what's being added in #9366 matches https://stat.ethz.ch/R-manual/R-devel/library/stats/html/cor.html better. When we are adding that support in R we could add it as `cor` matching stats::cor. Meanwhile I'll change `corr` to what you suggested with `function(x, ...)` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-157268953 @sun-rui I updated it. I think it's a bit not as strongly typed as I'd like but if I add `col2 = "Column"` to signature I get this error: ``` Error in match.call(definition, call, expand.dots, envir) : unused argument (col2 = c("Column", "")) ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/9680#discussion_r45018419 --- Diff: R/pkg/R/functions.R --- @@ -259,6 +259,20 @@ setMethod("column", function(x) { col(x) }) +#' corr +#' +#' Computes the Pearson Correlation Coefficient for two Columns. +#' +#' @rdname corr +#' @name corr +#' @family math_funcs +#' @export +#' @examples \dontrun{corr(df$c, df$d)} +setMethod("corr", signature(x = "Column", col1 = "Column", col2 = "missing", method = "missing"), --- End diff -- Maybe we can add cor() as alias for corr(), as you did in https://github.com/apache/spark/pull/9489 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-157272016 **[Test build #46055 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46055/consoleFull)** for PR 9680 at commit [`c5f5524`](https://github.com/apache/spark/commit/c5f5524b741ee908abb25f93d12c5256aa596413). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-157275713 **[Test build #46055 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46055/consoleFull)** for PR 9680 at commit [`c5f5524`](https://github.com/apache/spark/commit/c5f5524b741ee908abb25f93d12c5256aa596413). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-157275771 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-157275772 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46055/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-156957370 these are two different issues. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/9680#discussion_r44899365 --- Diff: R/pkg/R/functions.R --- @@ -259,6 +259,20 @@ setMethod("column", function(x) { col(x) }) +#' corr +#' +#' Computes the Pearson Correlation Coefficient for two Columns. +#' +#' @rdname corr +#' @name corr +#' @family math_funcs +#' @export +#' @examples \dontrun{corr(df$c, df$d)} +setMethod("corr", signature(x = "Column", col1 = "Column", col2 = "missing", method = "missing"), --- End diff -- this signature looks confusing. Maybe change the generic function definition of "corr" is better: ``` setGeneric("corr", function(x, ...) {standardGeneric("corr") }) setMethod("corr", signature(x = "DataFrame"), function(x, col1, col2, method = "pearson") { ... } setMethod("corr", signature(x = "characterOrColumn"), function(x, col2) { ... } ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/9680#discussion_r44899428 --- Diff: R/pkg/R/functions.R --- @@ -259,6 +259,20 @@ setMethod("column", function(x) { col(x) }) +#' corr +#' +#' Computes the Pearson Correlation Coefficient for two Columns. +#' +#' @rdname corr +#' @name corr +#' @family math_funcs +#' @export +#' @examples \dontrun{corr(df$c, df$d)} +setMethod("corr", signature(x = "Column", col1 = "Column", col2 = "missing", method = "missing"), --- End diff -- One concern how documentation for these "corr" methods are generated? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user NarineK commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-156372445 in R the general formula for correlation is the following: cor(x, y = NULL, use = "everything", method = c("pearson", "kendall", "spearman")) in #9366 is the special case when x = y, but in general x and y are 2 different dataframes with the same number of rows. I did the pull request specifically for x = y, but the jira is more general. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-156484659 So #9366 is for all columns in DataFrame x=y or different x, y DataFrames And this #9680 is for 2 columns in one DataFrame. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-156280168 **[Test build #45789 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45789/consoleFull)** for PR 9680 at commit [`940164c`](https://github.com/apache/spark/commit/940164c32aa75f3c74bdb966ecac69df602b5aa2). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-156278569 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-156278588 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
GitHub user felixcheung opened a pull request: https://github.com/apache/spark/pull/9680 [SPARK-11715][SPARKR] Add R support corr for Column Aggregration Need to match existing method signature You can merge this pull request into a Git repository by running: $ git pull https://github.com/felixcheung/spark rcorr Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9680.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9680 commit 68f1254ba525297857b0fc6959ca5d54c1509af9 Author: felixcheungDate: 2015-11-13T00:18:28Z Add R support corr for Column Aggregration commit 940164c32aa75f3c74bdb966ecac69df602b5aa2 Author: felixcheung Date: 2015-11-13T00:20:22Z fix doc text --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-156331258 I think 9366 is about computing corr or cov matrix whereas this is computing corr between two columns. They seem to be useful in their own ways. Also this is already supported in Scala and Python. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-156322704 cc @NarineK I think https://github.com/apache/spark/pull/9366 is a more general form of this ? If so we could probably just wait for #9366 ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-156283280 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45789/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-156283275 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9680#issuecomment-156283072 **[Test build #45789 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45789/consoleFull)** for PR 9680 at commit [`940164c`](https://github.com/apache/spark/commit/940164c32aa75f3c74bdb966ecac69df602b5aa2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org