[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13394


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-15 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r67259794
  
--- Diff: R/pkg/R/mllib.R ---
@@ -402,6 +406,8 @@ setMethod("spark.naiveBayes", signature(data = 
"SparkDataFrame", formula = "form
 return(new("NaiveBayesModel", jobj = jobj))
 })
 
+#' Save fitted MLlib model to the input path
--- End diff --

@jkbradley Likewise, I changed title `write.ml` to `Save fitted MLlib model 
to the input path` rather than `Save the Bernoulli naive Bayes model to the 
input path.` for all four different models.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r67223092
  
--- Diff: R/pkg/R/mllib.R ---
@@ -197,7 +201,7 @@ print.summary.GeneralizedLinearRegressionModel <- 
function(x, ...) {
   invisible(x)
   }
 
-#' Make predictions from a generalized linear model
+#' predict
--- End diff --

Ohh, I see.  Thanks, yes, I think it's fine as you have it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-14 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66917105
  
--- Diff: R/pkg/R/mllib.R ---
@@ -197,7 +201,7 @@ print.summary.GeneralizedLinearRegressionModel <- 
function(x, ...) {
   invisible(x)
   }
 
-#' Make predictions from a generalized linear model
+#' predict
--- End diff --

@jkbradley The reason I want add title here is the title of current 
documentation is like  `Make predictions from a generalized linear model` not 
`predict`. 
![qq20160613-0 
2x](https://cloud.githubusercontent.com/assets/3419881/16033524/4ab5e288-31c1-11e6-892c-c9c15258cc05.png)
But with adding title `predict`, it looks like this
![qq20160613-1 
2x](https://cloud.githubusercontent.com/assets/3419881/16033671/24ab8fce-31c2-11e6-86a4-7b7771c10451.png)
So which one do you think is better?





---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66741569
  
--- Diff: R/pkg/R/mllib.R ---
@@ -218,9 +222,9 @@ setMethod("predict", signature(object = 
"GeneralizedLinearRegressionModel"),
 return(dataFrame(callJMethod(object@jobj, "transform", 
newData@sdf)))
   })
 
-#' Make predictions from a naive Bayes model
+#' predict
--- End diff --

Same here: keep longer title


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66741577
  
--- Diff: R/pkg/R/mllib.R ---
@@ -582,9 +586,9 @@ setMethod("summary", signature(object = 
"AFTSurvivalRegressionModel"),
 return(list(coefficients = coefficients))
   })
 
-#' Make predictions from an AFT survival regression model
+#' predict
--- End diff --

ditto: keep long title


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66741575
  
--- Diff: R/pkg/R/mllib.R ---
@@ -357,9 +361,9 @@ setMethod("summary", signature(object = "KMeansModel"),
cluster = cluster, is.loaded = is.loaded))
   })
 
-#' Make predictions from a k-means model
+#' predict
--- End diff --

ditto: keep long title


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66741568
  
--- Diff: R/pkg/R/mllib.R ---
@@ -197,7 +201,7 @@ print.summary.GeneralizedLinearRegressionModel <- 
function(x, ...) {
   invisible(x)
   }
 
-#' Make predictions from a generalized linear model
+#' predict
--- End diff --

No need for this change.  We can keep the longer title.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66741560
  
--- Diff: R/pkg/R/column.R ---
@@ -170,6 +172,8 @@ setMethod("between", signature(x = "Column"),
 }
   })
 
+#' cast
+#'
 #' Casts the column to a different data type.
--- End diff --

This can remain the title, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66741566
  
--- Diff: R/pkg/R/functions.R ---
@@ -249,10 +249,7 @@ col <- function(x) {
 #'
 #' Returns a Column based on the given column name.
 #'
-#' @rdname col
-#' @name column
 #' @family normal_funcs
-#' @export
--- End diff --

This function is exported, right?  It's ```col``` which is not exported.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-12 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66719273
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -851,6 +849,8 @@ setMethod("nrow",
 count(x)
   })
 
+#' ncol
--- End diff --

Yes, this doesn't seem to be consistent. I reverted these since the 
description is sufficient to explain.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-12 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66719192
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2766,18 +2780,21 @@ setMethod("histogram",
 return(histStats)
   })
 
-#' Saves the content of the SparkDataFrame to an external database table 
via JDBC
+#' Save the content of DataFrame to an external database table via JDBC.
 #'
-#' Additional JDBC database connection properties can be set (...)
+#' Saves the content of the SparkDataFrame to an external database table 
via JDBC. Additional JDBC
--- End diff --

I also think we should consociate the plurality of first word which is 
verbal in description


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-12 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66718939
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2766,18 +2780,21 @@ setMethod("histogram",
 return(histStats)
   })
 
-#' Saves the content of the SparkDataFrame to an external database table 
via JDBC
+#' Save the content of DataFrame to an external database table via JDBC.
 #'
-#' Additional JDBC database connection properties can be set (...)
+#' Saves the content of the SparkDataFrame to an external database table 
via JDBC. Additional JDBC
--- End diff --

It should be same and I changed both these two to plural.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-12 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66718189
  
--- Diff: R/pkg/R/mllib.R ---
@@ -197,11 +197,10 @@ print.summary.GeneralizedLinearRegressionModel <- 
function(x, ...) {
   invisible(x)
   }
 
-#' Make predictions from a generalized linear model
-#'
 #' Makes predictions from a generalized linear model produced by glm() or 
spark.glm(),
 #' similarly to R's predict().
 #'
+#' @title predict
--- End diff --

Ok, I will do this in this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-08 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66363113
  
--- Diff: R/pkg/R/mllib.R ---
@@ -197,11 +197,10 @@ print.summary.GeneralizedLinearRegressionModel <- 
function(x, ...) {
   invisible(x)
   }
 
-#' Make predictions from a generalized linear model
-#'
 #' Makes predictions from a generalized linear model produced by glm() or 
spark.glm(),
 #' similarly to R's predict().
 #'
+#' @title predict
--- End diff --

Ok - then I'd say lets use the first line as the title convention for all 
our documentation. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-08 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66357796
  
--- Diff: R/pkg/R/mllib.R ---
@@ -197,11 +197,10 @@ print.summary.GeneralizedLinearRegressionModel <- 
function(x, ...) {
   invisible(x)
   }
 
-#' Make predictions from a generalized linear model
-#'
 #' Makes predictions from a generalized linear model produced by glm() or 
spark.glm(),
 #' similarly to R's predict().
 #'
+#' @title predict
--- End diff --

I'm pretty sure roxgen2 only picks up the first title defined for that 
doc/rdname so that probably won't help us much in that case...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-08 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66299402
  
--- Diff: R/pkg/R/mllib.R ---
@@ -197,11 +197,10 @@ print.summary.GeneralizedLinearRegressionModel <- 
function(x, ...) {
   invisible(x)
   }
 
-#' Make predictions from a generalized linear model
-#'
 #' Makes predictions from a generalized linear model produced by glm() or 
spark.glm(),
 #' similarly to R's predict().
 #'
+#' @title predict
--- End diff --

I personally prefer using the first sentence as the convention to define 
the title. One thing though is I'm not sure how the titles get collated when we 
put multiple functions in the same doc page. @felixcheung do you know if using 
`@title` would make it easier to have a canonical title in that case ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-07 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66182476
  
--- Diff: R/pkg/R/mllib.R ---
@@ -197,11 +197,10 @@ print.summary.GeneralizedLinearRegressionModel <- 
function(x, ...) {
   invisible(x)
   }
 
-#' Make predictions from a generalized linear model
-#'
 #' Makes predictions from a generalized linear model produced by glm() or 
spark.glm(),
 #' similarly to R's predict().
 #'
+#' @title predict
--- End diff --

I think people have been trying to be consistent within the given R source 
file but total agree there are many different forms in used and it would be 
nice to have a single format for all R sources.







---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66177097
  
--- Diff: R/pkg/R/mllib.R ---
@@ -197,11 +197,10 @@ print.summary.GeneralizedLinearRegressionModel <- 
function(x, ...) {
   invisible(x)
   }
 
-#' Make predictions from a generalized linear model
-#'
 #' Makes predictions from a generalized linear model produced by glm() or 
spark.glm(),
 #' similarly to R's predict().
 #'
+#' @title predict
--- End diff --

Shouldn't we follow a single convention for defining the title, either 
using ```@title``` or using the first sentence in the description?  @shivaram 
do you have a preference?  Looking at the Github history, I see lots of SparkR 
contributors do both.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-07 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66030853
  
--- Diff: R/pkg/R/functions.R ---
@@ -249,6 +249,10 @@ col <- function(x) {
 #'
 #' Returns a Column based on the given column name.
 #'
+#' Though scala functions has "col" function, we don't expose it in SparkR
--- End diff --

Let's move this comment back to L241?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-07 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66030066
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2766,18 +2780,21 @@ setMethod("histogram",
 return(histStats)
   })
 
-#' Saves the content of the SparkDataFrame to an external database table 
via JDBC
+#' Save the content of DataFrame to an external database table via JDBC.
 #'
-#' Additional JDBC database connection properties can be set (...)
+#' Saves the content of the SparkDataFrame to an external database table 
via JDBC. Additional JDBC
--- End diff --

I think we have had similar discussions in another PR, but I'm not sure I 
follow the reasoning for L2783 to be "Save the content of" and L2785 to be 
"Saves the content of" - why is one singular another plural?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-07 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66029758
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2766,18 +2780,21 @@ setMethod("histogram",
 return(histStats)
   })
 
-#' Saves the content of the SparkDataFrame to an external database table 
via JDBC
+#' Save the content of DataFrame to an external database table via JDBC.
--- End diff --

ditto - "SparkDataFrame"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-07 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66029614
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2173,20 +2182,22 @@ setMethod("except",
 dataFrame(excepted)
   })
 
-#' Save the contents of the SparkDataFrame to a data source
+#' Save the contents of DataFrame to a data source.
 #'
-#' The data source is specified by the `source` and a set of options (...).
-#' If `source` is not specified, the default data source configured by
-#' spark.sql.sources.default will be used.
+#' Save the contents of the SparkDataFrame to a data source. The data 
source is specified by the
+#' `source` and a set of options (...). If `source` is not specified, the 
default data source
--- End diff --

hmm,
```The data source is specified by the
 `source` and a set of options (...).
```
that doesn't sound quite right? Data source should be in the source 
parameter.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-07 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66029225
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2046,6 +2054,7 @@ setMethod("merge",
 joinRes
   })
 
+#' generateAliasesForIntersectedCols
--- End diff --

this really shouldn't have a "title" - it's not a exported function.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-07 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66029121
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -851,6 +849,8 @@ setMethod("nrow",
 count(x)
   })
 
+#' ncol
--- End diff --

this doesn't seem consistent? in L713, or L675 we are changing doc title to 
something descriptive, yet here and below we are changing title to just the 
function name?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-07 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66028863
  
--- Diff: R/pkg/R/functions.R ---
@@ -238,9 +238,9 @@ setMethod("ceil",
 column(jc)
   })
 
-#' Though scala functions has "col" function, we don't expose it in SparkR
-#' because we don't want to conflict with the "col" function in the R base
-#' package and we also have "column" function exported which is an alias 
of "col".
+#' @rdname col
+#' @name column
+#' @export
--- End diff --

this is not exported - it should not have `@rdname`, `@name` or `@export` 
here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-07 Thread sun-rui
Github user sun-rui commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66019351
  
--- Diff: R/pkg/R/functions.R ---
@@ -249,6 +249,10 @@ col <- function(x) {
 #'
 #' Returns a Column based on the given column name.
 #'
+#' Though scala functions has "col" function, we don't expose it in SparkR
--- End diff --

No. "col" is not exposed in SparkR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r65969103
  
--- Diff: R/pkg/R/functions.R ---
@@ -249,6 +249,10 @@ col <- function(x) {
 #'
 #' Returns a Column based on the given column name.
 #'
+#' Though scala functions has "col" function, we don't expose it in SparkR
--- End diff --

Is this out of date?  Isn't ```col``` exposed now?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-06 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r65959598
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -647,7 +645,7 @@ setMethod("toJSON",
 RDD(jrdd, serializedMode = "string")
   })
 
-#' write.json
+#' Save the contents of DataFrame as a JSON file
--- End diff --

Can we use SparkDataFrame as opposed to DataFrame (see 
https://issues.apache.org/jira/browse/SPARK-12148 for some more details).  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r65946137
  
--- Diff: R/pkg/R/stats.R ---
@@ -135,13 +136,13 @@ setMethod("freqItems", signature(x = 
"SparkDataFrame", cols = "character"),
 #' Calculates the approximate quantiles of a numerical column of a 
SparkDataFrame.
 #'
 #' The result of this algorithm has the following deterministic bound:
-#' If the SparkDataFrame has N elements and if we request the quantile at 
probability `p` up to
-#' error `err`, then the algorithm will return a sample `x` from the 
SparkDataFrame so that the
-#' *exact* rank of `x` is close to (p * N). More precisely,
-#'   floor((p - err) * N) <= rank(x) <= ceil((p + err) * N).
-#' This method implements a variation of the Greenwald-Khanna algorithm 
(with some speed
-#' optimizations). The algorithm was first present in 
[[http://dx.doi.org/10.1145/375663.375670
-#' Space-efficient Online Computation of Quantile Summaries]] by Greenwald 
and Khanna.
+#' If the SparkDataFrame has N elements and if we request the quantile at 
probability \strong{p} up
--- End diff --

SGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-06 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r65849144
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -628,8 +628,6 @@ setMethod("repartition",
 #'
 #' @param x A SparkDataFrame
 #' @return A StringRRDD of JSON objects
-#' @family SparkDataFrame functions
--- End diff --

@felixcheung I removed these two lines in toJSON part. Correct me, if I am 
wrong.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-03 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r65691008
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1069,6 +1079,8 @@ setMethod("first",
 #'
 #' @param x A SparkDataFrame
 #'
+#' @family SparkDataFrame functions
+#' @rdname tordd
--- End diff --

@felixcheung ok, should we also remove these two lines in `toJSON` part in 
line 631?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-02 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r65656790
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1069,6 +1079,8 @@ setMethod("first",
 #'
 #' @param x A SparkDataFrame
 #'
+#' @family SparkDataFrame functions
+#' @rdname tordd
--- End diff --

let's revert these 2 lines as well? thanks


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-01 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r65483594
  
--- Diff: R/pkg/R/stats.R ---
@@ -135,13 +136,13 @@ setMethod("freqItems", signature(x = 
"SparkDataFrame", cols = "character"),
 #' Calculates the approximate quantiles of a numerical column of a 
SparkDataFrame.
 #'
 #' The result of this algorithm has the following deterministic bound:
-#' If the SparkDataFrame has N elements and if we request the quantile at 
probability `p` up to
-#' error `err`, then the algorithm will return a sample `x` from the 
SparkDataFrame so that the
-#' *exact* rank of `x` is close to (p * N). More precisely,
-#'   floor((p - err) * N) <= rank(x) <= ceil((p + err) * N).
-#' This method implements a variation of the Greenwald-Khanna algorithm 
(with some speed
-#' optimizations). The algorithm was first present in 
[[http://dx.doi.org/10.1145/375663.375670
-#' Space-efficient Online Computation of Quantile Summaries]] by Greenwald 
and Khanna.
+#' If the SparkDataFrame has N elements and if we request the quantile at 
probability \strong{p} up
--- End diff --

I think it would be great to leave this out and I will fix it in #13109? 
what do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-01 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r65434586
  
--- Diff: R/pkg/R/stats.R ---
@@ -135,13 +136,13 @@ setMethod("freqItems", signature(x = 
"SparkDataFrame", cols = "character"),
 #' Calculates the approximate quantiles of a numerical column of a 
SparkDataFrame.
 #'
 #' The result of this algorithm has the following deterministic bound:
-#' If the SparkDataFrame has N elements and if we request the quantile at 
probability `p` up to
-#' error `err`, then the algorithm will return a sample `x` from the 
SparkDataFrame so that the
-#' *exact* rank of `x` is close to (p * N). More precisely,
-#'   floor((p - err) * N) <= rank(x) <= ceil((p + err) * N).
-#' This method implements a variation of the Greenwald-Khanna algorithm 
(with some speed
-#' optimizations). The algorithm was first present in 
[[http://dx.doi.org/10.1145/375663.375670
-#' Space-efficient Online Computation of Quantile Summaries]] by Greenwald 
and Khanna.
+#' If the SparkDataFrame has N elements and if we request the quantile at 
probability \strong{p} up
--- End diff --

cc @felixcheung (we were partially discussing this in #13109)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-01 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r65418448
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2514,7 +2529,9 @@ setMethod("attach",
 #' environment. Then, the given expression is evaluated in this new
 #' environment.
 #'
+#' @title with
--- End diff --

@jkbradley I will do it ASAP.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-01 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r65410850
  
--- Diff: R/pkg/R/stats.R ---
@@ -135,13 +136,13 @@ setMethod("freqItems", signature(x = 
"SparkDataFrame", cols = "character"),
 #' Calculates the approximate quantiles of a numerical column of a 
SparkDataFrame.
 #'
 #' The result of this algorithm has the following deterministic bound:
-#' If the SparkDataFrame has N elements and if we request the quantile at 
probability `p` up to
-#' error `err`, then the algorithm will return a sample `x` from the 
SparkDataFrame so that the
-#' *exact* rank of `x` is close to (p * N). More precisely,
-#'   floor((p - err) * N) <= rank(x) <= ceil((p + err) * N).
-#' This method implements a variation of the Greenwald-Khanna algorithm 
(with some speed
-#' optimizations). The algorithm was first present in 
[[http://dx.doi.org/10.1145/375663.375670
-#' Space-efficient Online Computation of Quantile Summaries]] by Greenwald 
and Khanna.
+#' If the SparkDataFrame has N elements and if we request the quantile at 
probability \strong{p} up
--- End diff --

@vectorijk Are you able to separate them?  If you can't find a good way 
easily, ping & I can try to figure it out.  Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-01 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r65410753
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2514,7 +2529,9 @@ setMethod("attach",
 #' environment. Then, the given expression is evaluated in this new
 #' environment.
 #'
+#' @title with
--- End diff --

@vectorijk Could you please update the PR this way?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org