[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-12-05 Thread shivaram
Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-162258164
  
LGTM. @felixcheung I think the current resolution of not adding `cor` until 
we can support it is fine by me. Its better to not mask existing functions 
unless we can match existing functionality. 

Merging this to master, branch-1.6


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-12-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/9680


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-160810068
  
**[Test build #46911 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46911/consoleFull)**
 for PR 9680 at commit 
[`b8090f3`](https://github.com/apache/spark/commit/b8090f38a5d9be5cc50049531de87bcb1c405b07).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-30 Thread shivaram
Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-160817390
  
Jenkins, retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-30 Thread felixcheung
Github user felixcheung commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-160815852
  
second is a git error (seems like having a lot these days?)
`hudson.plugins.git.GitException: Failed to fetch from 
https://github.com/apache/spark.git
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:763)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-160819835
  
**[Test build #46922 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46922/consoleFull)**
 for PR 9680 at commit 
[`659f805`](https://github.com/apache/spark/commit/659f8057de38e33d9a9fddc6ff60eebca834850e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-160806942
  
**[Test build #46911 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46911/consoleFull)**
 for PR 9680 at commit 
[`b8090f3`](https://github.com/apache/spark/commit/b8090f38a5d9be5cc50049531de87bcb1c405b07).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-160810186
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46911/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-160810185
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-30 Thread felixcheung
Github user felixcheung commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-160804279
  
stats::cov name conflict: https://issues.apache.org/jira/browse/SPARK-11886
will open a new JIRA on cor alias once this is merged




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-30 Thread felixcheung
Github user felixcheung commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-16080
  
thanks, rebased.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-160810876
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-160810878
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46914/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-160845233
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-160845163
  
**[Test build #46922 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46922/consoleFull)**
 for PR 9680 at commit 
[`659f805`](https://github.com/apache/spark/commit/659f8057de38e33d9a9fddc6ff60eebca834850e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-160845234
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46922/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-30 Thread sun-rui
Github user sun-rui commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-160627755
  
@felixcheung, sorry for late response.

Since there is no agreement now, I am fine that we don't add "cor" alias in 
this PR. Let's get this PR merged.

Could you submit a new JIRA addressing the issue of adding alias of "cor" 
and also the issue of existing "cov" which masks stats::cov?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-30 Thread sun-rui
Github user sun-rui commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-160627949
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-24 Thread felixcheung
Github user felixcheung commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-159493667
  
@sun-rui ? I"m fine with adding `cor` - are we ok with masking `stats::cor`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/9680#discussion_r45259387
  
--- Diff: R/pkg/R/functions.R ---
@@ -259,6 +259,21 @@ setMethod("column",
   function(x) {
 col(x)
   })
+#' corr
+#'
+#' Computes the Pearson Correlation Coefficient for two Columns.
+#'
+#' @rdname corr
+#' @name corr
+#' @family math_funcs
+#' @export
+#' @examples \dontrun{corr(df$c, df$d)}
+setMethod("corr", signature(x = "Column"),
--- End diff --

That's the same for `count`, `max`, `mean` and so on, so change we would 
need to change every function here - should we do that?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-18 Thread felixcheung
Github user felixcheung commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-157913206
  
@shivaram Can we go ahead with this? I think we could consider adding all 
character overload for DataFrame functions in a different JIRA.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-18 Thread felixcheung
Github user felixcheung commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-157975716
  
As per this 
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/cor.html
It support "vector, matrix or data frame", and it doesn't say a subset of a 
data frame.
I'm not sure a column in a data frame is like a vector?
Instead, I think #9366 should support 2 DataFrames. And we should come up 
with a signature similar to stats::cor

 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-18 Thread sun-rui
Github user sun-rui commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-157973061
  
adding all character overload for DataFrame functions in a different JIRA 
is OK.

But for alias of corr(), #9366 only supports inter-column cov and cor of a 
DataFrame, not between columns of two DataFrames. I think actually it is better 
to add alias in this PR. corr() operating on two columns is similar to R cor() 
on two vectors.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-18 Thread felixcheung
Github user felixcheung commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-157977761
  
Also, since here we are working with 2 columns, by adding a alias `cor` we 
will need to create a generic with a different signature that again masks 
`stats::cor`, like what's happening with `stats::cov`, and that seems 
problematic.
 
  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-17 Thread sun-rui
Github user sun-rui commented on a diff in the pull request:

https://github.com/apache/spark/pull/9680#discussion_r45167221
  
--- Diff: R/pkg/R/functions.R ---
@@ -259,6 +259,20 @@ setMethod("column",
   function(x) {
 col(x)
   })
+#' corr
+#'
+#' Computes the Pearson Correlation Coefficient for two Columns.
+#'
+#' @rdname corr
+#' @name corr
+#' @family math_funcs
+#' @export
+#' @examples \dontrun{corr(df$c, df$d)}
+setMethod("corr", signature(x = "Column", col1 = "Column", col2 = 
"missing", method = "missing"),
--- End diff --

#9366 only supports inter-column cov and cor of a DataFrame, not between 
columns of two DataFrames. I think actually it is better to add alias in this 
PR. corr() operating on two columns is similar to R cor() on two vectors.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-17 Thread sun-rui
Github user sun-rui commented on a diff in the pull request:

https://github.com/apache/spark/pull/9680#discussion_r45166927
  
--- Diff: R/pkg/R/functions.R ---
@@ -259,6 +259,21 @@ setMethod("column",
   function(x) {
 col(x)
   })
+#' corr
+#'
+#' Computes the Pearson Correlation Coefficient for two Columns.
+#'
+#' @rdname corr
+#' @name corr
+#' @family math_funcs
+#' @export
+#' @examples \dontrun{corr(df$c, df$d)}
+setMethod("corr", signature(x = "Column"),
--- End diff --

There are two versions of corr():
```
def corr(column1: Column, column2: Column): Column
def corr(columnName1: String, columnName2: String): Column
```
We'd better support both. Something like:
```
setMethod("corr", signature(x = "characterOrColumn"),
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-17 Thread felixcheung
Github user felixcheung commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-157513054
  
any more comment?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-16 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/9680#discussion_r44974143
  
--- Diff: R/pkg/R/functions.R ---
@@ -259,6 +259,20 @@ setMethod("column",
   function(x) {
 col(x)
   })
+#' corr
+#'
+#' Computes the Pearson Correlation Coefficient for two Columns.
+#'
+#' @rdname corr
+#' @name corr
+#' @family math_funcs
+#' @export
+#' @examples \dontrun{corr(df$c, df$d)}
+setMethod("corr", signature(x = "Column", col1 = "Column", col2 = 
"missing", method = "missing"),
--- End diff --

as for doc, the DataFrame corr in stats.R has `@rdname statfunctions`
this one has `@rdname corr`

so they go to different HTML page generated by roxygen2


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-16 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/9680#discussion_r44973809
  
--- Diff: R/pkg/R/functions.R ---
@@ -259,6 +259,20 @@ setMethod("column",
   function(x) {
 col(x)
   })
+#' corr
+#'
+#' Computes the Pearson Correlation Coefficient for two Columns.
+#'
+#' @rdname corr
+#' @name corr
+#' @family math_funcs
+#' @export
+#' @examples \dontrun{corr(df$c, df$d)}
+setMethod("corr", signature(x = "Column", col1 = "Column", col2 = 
"missing", method = "missing"),
--- End diff --

right, I like the approach of changing the existing generic definition.

perhaps we should align the method signature with the stats::cor
```
cor(x, y = NULL, use = "everything",
method = c("pearson", "kendall", "spearman"))
```

do you know why we decide to name it `corr` (vs. `cor`) in other places?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-16 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/9680#discussion_r45019825
  
--- Diff: R/pkg/R/functions.R ---
@@ -259,6 +259,20 @@ setMethod("column",
   function(x) {
 col(x)
   })
+#' corr
+#'
+#' Computes the Pearson Correlation Coefficient for two Columns.
+#'
+#' @rdname corr
+#' @name corr
+#' @family math_funcs
+#' @export
+#' @examples \dontrun{corr(df$c, df$d)}
+setMethod("corr", signature(x = "Column", col1 = "Column", col2 = 
"missing", method = "missing"),
--- End diff --

Thinking more about this, I think what's being added in #9366 matches 
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/cor.html better. When 
we are adding that support in R we could add it as `cor` matching stats::cor.

Meanwhile I'll change `corr` to what you suggested with `function(x, ...)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-16 Thread felixcheung
Github user felixcheung commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-157268953
  
@sun-rui I updated it. I think it's a bit not as strongly typed as I'd like 
but if I add `col2 = "Column"` to signature I get this error:
```
Error in match.call(definition, call, expand.dots, envir) :
  unused argument (col2 = c("Column", ""))
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-16 Thread sun-rui
Github user sun-rui commented on a diff in the pull request:

https://github.com/apache/spark/pull/9680#discussion_r45018419
  
--- Diff: R/pkg/R/functions.R ---
@@ -259,6 +259,20 @@ setMethod("column",
   function(x) {
 col(x)
   })
+#' corr
+#'
+#' Computes the Pearson Correlation Coefficient for two Columns.
+#'
+#' @rdname corr
+#' @name corr
+#' @family math_funcs
+#' @export
+#' @examples \dontrun{corr(df$c, df$d)}
+setMethod("corr", signature(x = "Column", col1 = "Column", col2 = 
"missing", method = "missing"),
--- End diff --

Maybe we can add cor() as alias for corr(), as you did in 
https://github.com/apache/spark/pull/9489



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-157272016
  
**[Test build #46055 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46055/consoleFull)**
 for PR 9680 at commit 
[`c5f5524`](https://github.com/apache/spark/commit/c5f5524b741ee908abb25f93d12c5256aa596413).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-157275713
  
**[Test build #46055 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46055/consoleFull)**
 for PR 9680 at commit 
[`c5f5524`](https://github.com/apache/spark/commit/c5f5524b741ee908abb25f93d12c5256aa596413).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-157275771
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-157275772
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46055/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-16 Thread sun-rui
Github user sun-rui commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-156957370
  
these are two different issues.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-16 Thread sun-rui
Github user sun-rui commented on a diff in the pull request:

https://github.com/apache/spark/pull/9680#discussion_r44899365
  
--- Diff: R/pkg/R/functions.R ---
@@ -259,6 +259,20 @@ setMethod("column",
   function(x) {
 col(x)
   })
+#' corr
+#'
+#' Computes the Pearson Correlation Coefficient for two Columns.
+#'
+#' @rdname corr
+#' @name corr
+#' @family math_funcs
+#' @export
+#' @examples \dontrun{corr(df$c, df$d)}
+setMethod("corr", signature(x = "Column", col1 = "Column", col2 = 
"missing", method = "missing"),
--- End diff --

this signature looks confusing. Maybe change the generic function 
definition of "corr" is better:
```
setGeneric("corr", function(x, ...) {standardGeneric("corr") })

setMethod("corr",
  signature(x = "DataFrame"),
  function(x, col1, col2, method = "pearson") {
  ...
}

setMethod("corr", signature(x = "characterOrColumn"),
  function(x, col2) {
  ...
}
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-16 Thread sun-rui
Github user sun-rui commented on a diff in the pull request:

https://github.com/apache/spark/pull/9680#discussion_r44899428
  
--- Diff: R/pkg/R/functions.R ---
@@ -259,6 +259,20 @@ setMethod("column",
   function(x) {
 col(x)
   })
+#' corr
+#'
+#' Computes the Pearson Correlation Coefficient for two Columns.
+#'
+#' @rdname corr
+#' @name corr
+#' @family math_funcs
+#' @export
+#' @examples \dontrun{corr(df$c, df$d)}
+setMethod("corr", signature(x = "Column", col1 = "Column", col2 = 
"missing", method = "missing"),
--- End diff --

One concern how documentation for these "corr" methods are generated?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-13 Thread NarineK
Github user NarineK commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-156372445
  
in R the general formula for correlation is the following: 
cor(x, y = NULL, use = "everything", method = c("pearson", "kendall", 
"spearman"))
in #9366 is the special case when x = y, but in general x and y are 2 
different dataframes with the same number of rows. 
I did the pull request specifically for x = y, but the jira is more general.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-13 Thread felixcheung
Github user felixcheung commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-156484659
  
So #9366 is for all columns in DataFrame x=y or different x, y DataFrames
And this #9680 is for 2 columns in one DataFrame.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-156280168
  
**[Test build #45789 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45789/consoleFull)**
 for PR 9680 at commit 
[`940164c`](https://github.com/apache/spark/commit/940164c32aa75f3c74bdb966ecac69df602b5aa2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-156278569
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-156278588
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-12 Thread felixcheung
GitHub user felixcheung opened a pull request:

https://github.com/apache/spark/pull/9680

[SPARK-11715][SPARKR] Add R support corr for Column Aggregration

Need to match existing method signature

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/felixcheung/spark rcorr

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9680.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9680


commit 68f1254ba525297857b0fc6959ca5d54c1509af9
Author: felixcheung 
Date:   2015-11-13T00:18:28Z

Add R support corr for Column Aggregration

commit 940164c32aa75f3c74bdb966ecac69df602b5aa2
Author: felixcheung 
Date:   2015-11-13T00:20:22Z

fix doc text




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-12 Thread felixcheung
Github user felixcheung commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-156331258
  
I think 9366 is about computing corr or cov matrix whereas this is 
computing corr between two columns. They seem to be useful in their own ways. 
Also this is already supported in Scala and Python.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-12 Thread shivaram
Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-156322704
  
cc @NarineK 

I think https://github.com/apache/spark/pull/9366 is a more general form of 
this ? If so we could probably just wait for #9366 ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-156283280
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45789/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-156283275
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11715][SPARKR] Add R support corr for C...

2015-11-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9680#issuecomment-156283072
  
**[Test build #45789 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45789/consoleFull)**
 for PR 9680 at commit 
[`940164c`](https://github.com/apache/spark/commit/940164c32aa75f3c74bdb966ecac69df602b5aa2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org