[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14705


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75566267
  
--- Diff: R/pkg/R/generics.R ---
@@ -735,6 +752,8 @@ setGeneric("between", function(x, bounds) { 
standardGeneric("between") })
 setGeneric("cast", function(x, dataType) { standardGeneric("cast") })
 
 #' @rdname columnfunctions
+#' @param x a Column object.
+#' @param ... additional argument(s).
--- End diff --

Ah, I recall. These are generated by code but are exported functions - you can
see their names match the generics.

For these I think we should put the roxygen block in the R file that has
the generator code, followed by NULL, like what we have for write.ml in mllib.R.
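
For reference, a minimal sketch of that pattern (the title, rdname and params
here are illustrative, not the actual file contents):

```
#' Column functions on a Column object.
#'
#' @rdname columnfunctions
#' @param x a Column object.
#' @param ... additional argument(s).
#' @name columnfunctions
NULL
```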







[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75516177
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -932,7 +932,7 @@ setMethod("sample_frac",
 #' @param x a SparkDataFrame.
 #' @family SparkDataFrame functions
 #' @rdname nrow
-#' @name count
+#' @name nrow
--- End diff --

Maybe leave @name out then? That seemed to fix the problem the other time.









[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75515842
  
--- Diff: R/pkg/R/functions.R ---
@@ -2276,9 +2276,8 @@ setMethod("n_distinct", signature(x = "Column"),
 countDistinct(x, ...)
   })
 
-#' @rdname nrow
-#' @param x Column to compute on
-#'
+#' @param x a Column.
+#' @rdname n
--- End diff --

I don't feel strongly about it. dplyr has n though.






[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-19 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75441096
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -932,7 +932,7 @@ setMethod("sample_frac",
 #' @param x a SparkDataFrame.
 #' @family SparkDataFrame functions
 #' @rdname nrow
-#' @name count
+#' @name nrow
--- End diff --

Then the \name{count} will appear in both docs and the check complains...





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75438540
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -932,7 +932,7 @@ setMethod("sample_frac",
 #' @param x a SparkDataFrame.
 #' @family SparkDataFrame functions
 #' @rdname nrow
-#' @name count
+#' @name nrow
--- End diff --

Hmm, I don't think we should change @name - this should match the function
name, "count".





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-19 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75437776
  
--- Diff: R/pkg/R/functions.R ---
@@ -2276,9 +2276,8 @@ setMethod("n_distinct", signature(x = "Column"),
 countDistinct(x, ...)
   })
 
-#' @rdname nrow
-#' @param x Column to compute on
-#'
+#' @param x a Column.
+#' @rdname n
--- End diff --

Perhaps count is a better name? I investigated a little more. In the generated
docs, each name and alias, say \name{count} and \alias{count}, can appear in
only one doc. Otherwise the check complains about duplicate names or aliases
found in 1.Rd and 2.Rd.





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-19 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75437423
  
--- Diff: R/pkg/R/functions.R ---
@@ -1335,7 +1336,7 @@ setMethod("rtrim",
 #' @note sd since 1.6.0
 setMethod("sd",
   signature(x = "Column"),
-  function(x) {
+  function(x, na.rm) {
--- End diff --

Yeah, I have to say the main (only) purpose is documentation... 





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-19 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75437149
  
--- Diff: R/pkg/R/mllib.R ---
@@ -917,14 +922,14 @@ setMethod("spark.lda", signature(data = 
"SparkDataFrame"),
 # Returns a summary of the AFT survival regression model produced by 
spark.survreg,
 # similarly to R's summary().
 
-#' @param object A fitted AFT survival regression model
+#' @param object a fitted AFT survival regression model.
 #' @return \code{summary} returns a list containing the model's 
coefficients,
 #' intercept and log(scale)
 #' @rdname spark.survreg
 #' @export
 #' @note summary(AFTSurvivalRegressionModel) since 2.0.0
 setMethod("summary", signature(object = "AFTSurvivalRegressionModel"),
-  function(object, ...) {
+  function(object) {
--- End diff --

Hmm that's a good point.





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-19 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75431046
  
--- Diff: R/pkg/R/functions.R ---
@@ -319,7 +316,7 @@ setMethod("column",
 #'
 #' Computes the Pearson Correlation Coefficient for two Columns.
 #'
-#' @param x Column to compute on.
+#' @param col2 a (second) Column object.
--- End diff --

Done. Thanks.





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75430253
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -727,6 +730,7 @@ dropTempView <- function(viewName) {
 #' @param source The name of external data source
 #' @param schema The data schema defined in structType
 #' @param na.strings Default string value for NA when source is "csv"
+#' @param ... additional external data source specific named propertie(s).
--- End diff --

ah sorry, I've no idea why this happened - pretty sure I've already 
corrected that.





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75430152
  
--- Diff: R/pkg/R/functions.R ---
@@ -1848,7 +1850,7 @@ setMethod("upper",
 #' @note var since 1.6.0
 setMethod("var",
   signature(x = "Column"),
-  function(x) {
+  function(x, y, na.rm, use) {
--- End diff --

This is done only for the purpose of documenting `y, na.rm, use`.





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75430048
  
--- Diff: R/pkg/R/functions.R ---
@@ -3115,6 +3166,11 @@ setMethod("dense_rank",
 #'
 #' This is equivalent to the LAG function in SQL.
 #'
+#' @param x the column as a character string or a Column to compute on.
+#' @param offset the number of rows back from the current row from which 
to obtain a value.
+#'   If not specified, the default is 1.
+#' @param defaultValue default to use when the offset row does not exist.
+#' @param ... further arguments to be passed to or from other methods.
--- End diff --

Since we can only give one description, I guess "further arguments" might 
sound more reasonable?





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r7543
  
--- Diff: R/pkg/R/functions.R ---
@@ -3115,6 +3166,11 @@ setMethod("dense_rank",
 #'
 #' This is equivalent to the LAG function in SQL.
 #'
+#' @param x the column as a character string or a Column to compute on.
+#' @param offset the number of rows back from the current row from which 
to obtain a value.
+#'   If not specified, the default is 1.
+#' @param defaultValue default to use when the offset row does not exist.
+#' @param ... further arguments to be passed to or from other methods.
--- End diff --

This is actually the issue I raised at the end of #14558. The same doc also
includes the generic definition, which also comes with `...`, and there `...`
has the actual meaning of "further arguments to be passed".
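
For example, the corresponding declaration in generics.R looks roughly like
this (sketch, not the exact line):

```
# The generic itself takes '...', so its doc entry describes "further arguments".
setGeneric("lag", function(x, ...) { standardGeneric("lag") })
```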





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75429772
  
--- Diff: R/pkg/R/mllib.R ---
@@ -620,11 +625,12 @@ setMethod("predict", signature(object = 
"KMeansModel"),
 #' predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
 #' Only categorical data is supported.
 #'
-#' @param data A \code{SparkDataFrame} of observations and labels for 
model fitting
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
+#' @param data a \code{SparkDataFrame} of observations and labels for 
model fitting.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
 #'   operators are supported, including '~', '.', ':', '+', 
and '-'.
-#' @param smoothing Smoothing parameter
-#' @return \code{spark.naiveBayes} returns a fitted naive Bayes model
+#' @param ... additional argument(s) passed to the method. Currently only 
\code{smoothing}.
--- End diff --

Thanks. I guess this might be caused by the merging...





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75429664
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3187,6 +3221,7 @@ setMethod("histogram",
 #' @param x A SparkDataFrame
 #' @param url JDBC database url of the form `jdbc:subprotocol:subname`
 #' @param tableName The name of the table in the external database
+#' @param ... additional JDBC database connection properties.
--- End diff --

Done. Thanks!





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75429658
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3003,9 +3036,10 @@ setMethod("str",
 #' Returns a new SparkDataFrame with columns dropped.
 #' This is a no-op if schema doesn't contain column name(s).
 #'
-#' @param x A SparkDataFrame.
-#' @param cols A character vector of column names or a Column.
-#' @return A SparkDataFrame
+#' @param x a SparkDataFrame.
+#' @param ... further arguments to be passed to or from other methods.
--- End diff --

Done.





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75429536
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2464,8 +2489,10 @@ setMethod("unionAll",
 #' Union two or more SparkDataFrames. This is equivalent to `UNION ALL` in 
SQL.
 #' Note that this does not remove duplicate rows across the two 
SparkDataFrames.
 #'
-#' @param x A SparkDataFrame
-#' @param ... Additional SparkDataFrame
+#' @param x a SparkDataFrame.
+#' @param ... additional SparkDataFrame(s).
+#' @param deparse.level currently not used (put here to match the 
signature of
--- End diff --

This follows the order of the arguments in the function declaration.





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75429434
  
--- Diff: R/pkg/R/generics.R ---
@@ -735,6 +752,8 @@ setGeneric("between", function(x, bounds) { 
standardGeneric("between") })
 setGeneric("cast", function(x, dataType) { standardGeneric("cast") })
 
 #' @rdname columnfunctions
+#' @param x a Column object.
+#' @param ... additional argument(s).
--- End diff --

It seems the functions are defined in the following way:

column_functions2 <- c("like", "rlike", "getField", "getItem", "contains")

createColumnFunction2 <- function(name) {
  setMethod(name,
            signature(x = "Column"),
            function(x, data) {
              if (class(data) == "Column") {
                data <- data@jc
              }
              jc <- callJMethod(x@jc, name, data)
              column(jc)
            })
}

It seems the functions are not exported. Perhaps we should still leave it there?
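
(For illustration only - a hypothetical registration loop showing how such a
generator would typically be applied; this is not the actual SparkR code:)

```
# Register one S4 method per generated column function name.
invisible(lapply(column_functions2, createColumnFunction2))
```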





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75429158
  
--- Diff: R/pkg/R/mllib.R ---
@@ -917,14 +922,14 @@ setMethod("spark.lda", signature(data = 
"SparkDataFrame"),
 # Returns a summary of the AFT survival regression model produced by 
spark.survreg,
 # similarly to R's summary().
 
-#' @param object A fitted AFT survival regression model
+#' @param object a fitted AFT survival regression model.
 #' @return \code{summary} returns a list containing the model's 
coefficients,
 #' intercept and log(scale)
 #' @rdname spark.survreg
 #' @export
 #' @note summary(AFTSurvivalRegressionModel) since 2.0.0
 setMethod("summary", signature(object = "AFTSurvivalRegressionModel"),
-  function(object, ...) {
+  function(object) {
--- End diff --

Probably I was more wondering why the CRAN check didn't flag this...





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75429017
  
--- Diff: R/pkg/R/mllib.R ---
@@ -504,14 +504,15 @@ setMethod("summary", signature(object = 
"IsotonicRegressionModel"),
 #' Users can call \code{summary} to print a summary of the fitted model, 
\code{predict} to make
 #' predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
 #'
-#' @param data SparkDataFrame for training
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
 #'operators are supported, including '~', '.', ':', '+', 
and '-'.
 #'Note that the response variable of formula is empty in 
spark.kmeans.
-#' @param k Number of centers
-#' @param maxIter Maximum iteration number
-#' @param initMode The initialization algorithm choosen to fit the model
-#' @return \code{spark.kmeans} returns a fitted k-means model
+#' @param ... additional argument(s) passed to the method.
--- End diff --

Yeah, didn't notice this. Done.





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75427819
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1719,12 +1732,13 @@ setMethod("[", signature(x = "SparkDataFrame"),
 #' Subset
 #'
 #' Return subsets of SparkDataFrame according to given conditions
-#' @param x A SparkDataFrame
-#' @param subset (Optional) A logical expression to filter on rows
-#' @param select expression for the single Column or a list of columns to 
select from the SparkDataFrame
+#' @param x a SparkDataFrame.
+#' @param i,subset (Optional) a logical expression to filter on rows.
+#' @param j,select expression for the single Column or a list of columns 
to select from the SparkDataFrame.
--- End diff --

Perhaps better to rename `i` -> `subset` and `j` -> `select`?
I didn't find a reason for having `i` and `j`.





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75427717
  
--- Diff: R/pkg/R/mllib.R ---
@@ -917,14 +922,14 @@ setMethod("spark.lda", signature(data = 
"SparkDataFrame"),
 # Returns a summary of the AFT survival regression model produced by 
spark.survreg,
 # similarly to R's summary().
 
-#' @param object A fitted AFT survival regression model
+#' @param object a fitted AFT survival regression model.
 #' @return \code{summary} returns a list containing the model's 
coefficients,
 #' intercept and log(scale)
 #' @rdname spark.survreg
 #' @export
 #' @note summary(AFTSurvivalRegressionModel) since 2.0.0
 setMethod("summary", signature(object = "AFTSurvivalRegressionModel"),
-  function(object, ...) {
+  function(object) {
--- End diff --

We have `...` for `summary`? That is used to match the `base::summary`
signature. I am not completely sure about the exact reason, but the [doc for
Methods](https://stat.ethz.ch/R-manual/R-devel/library/methods/html/Methods.html)
says:

"By default, the signature of the generic consists of all the formal
arguments except ..., in the order they appear in the function definition."

Does that perhaps explain that behavior?
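
A quick illustration of that sentence (hypothetical generic name, fresh R
session):

```
setGeneric("describeModel", function(object, ...) { standardGeneric("describeModel") })
# The dispatch signature omits '...':
getGeneric("describeModel")@signature
# [1] "object"
```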





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75427619
  
--- Diff: R/pkg/R/functions.R ---
@@ -1848,7 +1850,7 @@ setMethod("upper",
 #' @note var since 1.6.0
 setMethod("var",
   signature(x = "Column"),
-  function(x) {
+  function(x, y, na.rm, use) {
--- End diff --

please see the example for `sd`





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75427590
  
--- Diff: R/pkg/R/functions.R ---
@@ -1335,7 +1336,7 @@ setMethod("rtrim",
 #' @note sd since 1.6.0
 setMethod("sd",
   signature(x = "Column"),
-  function(x) {
+  function(x, na.rm) {
--- End diff --

It seems to work:
```
> setGeneric("sd", function(x, na.rm = FALSE) { standardGeneric("sd") })
Creating a new generic function for ‘sd’ in the global environment
[1] "sd"
> setMethod("sd", signature(x = "character"), function(x) { print("blah") })
[1] "sd"
> sd(1)
[1] NA
> sd(1:2)
[1] 0.7071068
> sd("abc")
[1] "blah"
```





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75427019
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1202,6 +1215,7 @@ setMethod("toRDD",
 #' Groups the SparkDataFrame using the specified columns, so we can run 
aggregation on them.
 #'
 #' @param x a SparkDataFrame
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.
 #' @return a GroupedData
--- End diff --

I basically follow the convention of the docs of many R base functions.





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75426918
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1202,6 +1215,7 @@ setMethod("toRDD",
 #' Groups the SparkDataFrame using the specified columns, so we can run 
aggregation on them.
 #'
 #' @param x a SparkDataFrame
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.
 #' @return a GroupedData
--- End diff --

Yeah, thanks for the catch. This makes perfect sense.





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75425834
  
--- Diff: R/pkg/R/functions.R ---
@@ -362,8 +357,8 @@ setMethod("cov", signature(x = "characterOrColumn"),
 
 #' @rdname cov
 #'
-#' @param col1 First column to compute cov_samp.
-#' @param col2 Second column to compute cov_samp.
+#' @param col1 the first Column object.
+#' @param col2 the second Column object.
--- End diff --

I'd say that applies to a couple of other cases in WindowsSpec.R or 
column.R too, but I'm ok either way.





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75425765
  
--- Diff: R/pkg/R/functions.R ---
@@ -362,8 +357,8 @@ setMethod("cov", signature(x = "characterOrColumn"),
 
 #' @rdname cov
 #'
-#' @param col1 First column to compute cov_samp.
-#' @param col2 Second column to compute cov_samp.
+#' @param col1 the first Column object.
+#' @param col2 the second Column object.
--- End diff --

I'd just say "the first Column", "the second Column` (without object) as in 
other places





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75425753
  
--- Diff: R/pkg/R/functions.R ---
@@ -319,7 +316,7 @@ setMethod("column",
 #'
 #' Computes the Pearson Correlation Coefficient for two Columns.
 #'
-#' @param x Column to compute on.
+#' @param col2 a (second) Column object.
--- End diff --

I'd just say "a (second) Column" (without "object"), as in other places.





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75425616
  
--- Diff: R/pkg/R/functions.R ---
@@ -1273,12 +1271,15 @@ setMethod("round",
 #' bround
 #'
 #' Returns the value of the column `e` rounded to `scale` decimal places 
using HALF_EVEN rounding
-#' mode if `scale` >= 0 or at integral part when `scale` < 0.
+#' mode if `scale` >= 0 or at integer part when `scale` < 0.
 #' Also known as Gaussian rounding or bankers' rounding that rounds to the 
nearest even number.
 #' bround(2.5, 0) = 2, bround(3.5, 0) = 4.
 #'
 #' @param x Column to compute on.
-#'
+#' @param scale round to \code{scale} digits to the right of the decimal 
point when \code{scale} > 0,
+#'the nearest even number when \code{scale} = 0, and `scale` 
digits to the left
--- End diff --

Do you want `\code{scale}` in place of the backticks?





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75424988
  
--- Diff: R/pkg/R/functions.R ---
@@ -2276,9 +2276,8 @@ setMethod("n_distinct", signature(x = "Column"),
 countDistinct(x, ...)
   })
 
-#' @rdname nrow
-#' @param x Column to compute on
-#'
+#' @param x a Column.
+#' @rdname n
--- End diff --

This is an alias for `count` - I'd suggest `@rdname count` here and in
generics.R.
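
Roughly what that suggestion would look like (sketch only; exact signatures
as in the existing files):

```
# In functions.R, above setMethod("n", ...):
#' @rdname count
#' @param x a Column.

# In generics.R, above the generic:
#' @rdname count
#' @export
setGeneric("n", function(x) { standardGeneric("n") })
```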





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75424491
  
--- Diff: R/pkg/R/functions.R ---
@@ -832,7 +827,10 @@ setMethod("kurtosis",
 #' The function by default returns the last values it sees. It will return 
the last non-missing
 #' value it sees when na.rm is set to true. If all values are missing, 
then NA is returned.
 #'
-#' @param x Column to compute on.
+#' @param x column to compute on.
+#' @param na.rm a logical value indicating whether NA values should be 
stripped
+#'before the computation proceeds.
+#' @param ... further arguments to be passed to or from other methods.
--- End diff --

should this be
`@param ... currently not used.`






[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75424452
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -727,6 +730,7 @@ dropTempView <- function(viewName) {
 #' @param source The name of external data source
 #' @param schema The data schema defined in structType
 #' @param na.strings Default string value for NA when source is "csv"
+#' @param ... additional external data source specific named propertie(s).
--- End diff --

as before, `properties`





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75424331
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -514,9 +519,10 @@ setMethod("registerTempTable",
 #'
 #' Insert the contents of a SparkDataFrame into a table registered in the 
current SparkSession.
 #'
-#' @param x A SparkDataFrame
-#' @param tableName A character vector containing the name of the table
-#' @param overwrite A logical argument indicating whether or not to 
overwrite
+#' @param x a SparkDataFrame.
+#' @param tableName a character vector containing the name of the table.
+#' @param overwrite a logical argument indicating whether or not to 
overwrite.
+#' @param ... further arguments to be passed to or from other methods.
--- End diff --

should this be
`@param ... currently not used.`






[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75424346
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -999,9 +1008,10 @@ setMethod("dim",
 
 #' Collects all the elements of a SparkDataFrame and coerces them into an 
R data.frame.
 #'
-#' @param x A SparkDataFrame
-#' @param stringsAsFactors (Optional) A logical indicating whether or not 
string columns
+#' @param x a SparkDataFrame.
+#' @param stringsAsFactors (Optional) a logical indicating whether or not 
string columns
 #' should be converted to factors. FALSE by default.
+#' @param ... further arguments to be passed to or from other methods.
--- End diff --

should this be
`@param ... currently not used.`






[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75424338
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -603,8 +611,9 @@ setMethod("persist",
 #' Mark this SparkDataFrame as non-persistent, and remove all blocks for 
it from memory and
 #' disk.
 #'
-#' @param x The SparkDataFrame to unpersist
-#' @param blocking Whether to block until all blocks are deleted
+#' @param x the SparkDataFrame to unpersist.
+#' @param blocking whether to block until all blocks are deleted.
+#' @param ... further arguments to be passed to or from other methods.
--- End diff --

should this be
`@param ... currently not used.`






[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75424376
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2759,31 +2793,27 @@ setMethod("dropna",
 dataFrame(sdf)
   })
 
+#' @param object a SparkDataFrame.
+#' @param ... further arguments to be passed to or from other methods.
--- End diff --

should this be
`@param ... currently not used.`






[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75424317
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -120,8 +120,9 @@ setMethod("schema",
 #'
 #' Print the logical and physical Catalyst plans to the console for 
debugging.
 #'
-#' @param x A SparkDataFrame
+#' @param x a SparkDataFrame.
 #' @param extended Logical. If extended is FALSE, explain() only prints 
the physical plan.
+#' @param ... further arguments to be passed to or from other methods.
--- End diff --

should this be
`@param ... currently not used.`






[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75424303
  
--- Diff: R/pkg/R/functions.R ---
@@ -1273,12 +1271,15 @@ setMethod("round",
 #' bround
 #'
 #' Returns the value of the column `e` rounded to `scale` decimal places 
using HALF_EVEN rounding
-#' mode if `scale` >= 0 or at integral part when `scale` < 0.
+#' mode if `scale` >= 0 or at integer part when `scale` < 0.
 #' Also known as Gaussian rounding or bankers' rounding that rounds to the 
nearest even number.
 #' bround(2.5, 0) = 2, bround(3.5, 0) = 4.
 #'
 #' @param x Column to compute on.
-#'
+#' @param scale round to \code{scale} digits to the right of the decimal 
point when \code{scale} > 0,
+#'the nearest even number when \code{scale} = 0, and `scale` 
digits to the left
+#'of the decimal point when \code{scale} < 0.
+#' @param ... further arguments to be passed to or from other methods.
--- End diff --

should this be
`@param ... currently not used.`






[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75424295
  
--- Diff: R/pkg/R/functions.R ---
@@ -1335,7 +1336,7 @@ setMethod("rtrim",
 #' @note sd since 1.6.0
 setMethod("sd",
   signature(x = "Column"),
-  function(x) {
+  function(x, na.rm) {
--- End diff --

does `function(x, ...)` work here?





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75424253
  
--- Diff: R/pkg/R/functions.R ---
@@ -1848,7 +1850,7 @@ setMethod("upper",
 #' @note var since 1.6.0
 setMethod("var",
   signature(x = "Column"),
-  function(x) {
+  function(x, y, na.rm, use) {
--- End diff --

does `function(x, ...)` work here?





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75424182
  
--- Diff: R/pkg/R/functions.R ---
@@ -2114,20 +2116,22 @@ setMethod("pmod", signature(y = "Column"),
 #' @rdname approxCountDistinct
 #' @name approxCountDistinct
 #'
+#' @param x Column to compute on.
 #' @param rsd maximum estimation error allowed (default = 0.05)
+#' @param ... further arguments to be passed to or from other methods.
--- End diff --

should this be
`@param ... currently not used.`






[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75424170
  
--- Diff: R/pkg/R/functions.R ---
@@ -2676,6 +2679,11 @@ setMethod("format_string", signature(format = 
"character", x = "Column"),
 #' representing the timestamp of that moment in the current system time 
zone in the given
 #' format.
 #'
+#' @param x a Column of unix timestamp.
+#' @param format the target format. See
+#'   
\href{http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html}{
+#'   Customizing Formats} for available options.
+#' @param ... further arguments to be passed to or from other methods.
--- End diff --

should this be
`@param ... currently not used.`






[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75424161
  
--- Diff: R/pkg/R/functions.R ---
@@ -2702,19 +2710,21 @@ setMethod("from_unixtime", signature(x = "Column"),
 #' [12:05,12:10) but not in [12:00,12:05). Windows can support microsecond 
precision. Windows in
 #' the order of months are not supported.
 #'
-#' The time column must be of TimestampType.
-#'
-#' Durations are provided as strings, e.g. '1 second', '1 day 12 hours', 
'2 minutes'. Valid
-#' interval strings are 'week', 'day', 'hour', 'minute', 'second', 
'millisecond', 'microsecond'.
-#' If the `slideDuration` is not provided, the windows will be tumbling 
windows.
-#'
-#' The startTime is the offset with respect to 1970-01-01 00:00:00 UTC 
with which to start
-#' window intervals. For example, in order to have hourly tumbling windows 
that start 15 minutes
-#' past the hour, e.g. 12:15-13:15, 13:15-14:15... provide `startTime` as 
`15 minutes`.
-#'
-#' The output column will be a struct called 'window' by default with the 
nested columns 'start'
-#' and 'end'.
-#'
+#' @param x a time Column. Must be of TimestampType.
+#' @param windowDuration a string specifying the width of the window, e.g. 
'1 second',
+#'   '1 day 12 hours', '2 minutes'. Valid interval 
strings are 'week',
+#'   'day', 'hour', 'minute', 'second', 'millisecond', 
'microsecond'.
+#' @param slideDuration a string specifying the sliding interval of the 
window. Same format as
+#'  \code{windowDuration}. A new window will be 
generated every
+#'  \code{slideDuration}. Must be less than or equal to
+#'  the \code{windowDuration}.
+#' @param startTime the offset with respect to 1970-01-01 00:00:00 UTC 
with which to start
+#'  window intervals. For example, in order to have hourly 
tumbling windows
+#'  that start 15 minutes past the hour, e.g. 12:15-13:15, 
13:15-14:15... provide
+#'  \code{startTime} as \code{"15 minutes"}.
+#' @param ... further arguments to be passed to or from other methods.
--- End diff --

should this be
`@param ... currently not used.`






[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75424152
  
--- Diff: R/pkg/R/functions.R ---
@@ -2766,6 +2776,10 @@ setMethod("window", signature(x = "Column"),
 #' NOTE: The position is not zero based, but 1 based index, returns 0 if 
substr
 #' could not be found in str.
 #'
+#' @param substr a character string to be matched.
+#' @param str a Column where matches are sought for each entry.
+#' @param pos start position of search.
+#' @param ... further arguments to be passed to or from other methods.
--- End diff --

should this be
`@param ... currently not used.`






[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75424138
  
--- Diff: R/pkg/R/functions.R ---
@@ -3115,6 +3166,11 @@ setMethod("dense_rank",
 #'
 #' This is equivalent to the LAG function in SQL.
 #'
+#' @param x the column as a character string or a Column to compute on.
+#' @param offset the number of rows back from the current row from which 
to obtain a value.
+#'   If not specified, the default is 1.
+#' @param defaultValue default to use when the offset row does not exist.
+#' @param ... further arguments to be passed to or from other methods.
--- End diff --

should this be
`@param ... currently not used.`
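
As an aside, a hedged usage sketch of `lag` with these parameters, assuming the 
window-spec helpers `windowPartitionBy`/`orderBy`/`over` from this release are available 
(`df`, `dept`, and `salary` are hypothetical):

```r
library(SparkR)

# Order rows within each department by salary.
ws <- orderBy(windowPartitionBy("dept"), "salary")

# Previous salary in that ordering; defaultValue = 0 when no previous row exists.
prev <- over(lag(df$salary, offset = 1, defaultValue = 0), ws)

df2 <- withColumn(df, "prev_salary", prev)
head(df2)
```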






[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75423807
  
--- Diff: R/pkg/R/mllib.R ---
@@ -620,11 +625,12 @@ setMethod("predict", signature(object = 
"KMeansModel"),
 #' predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
 #' Only categorical data is supported.
 #'
-#' @param data A \code{SparkDataFrame} of observations and labels for 
model fitting
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
+#' @param data a \code{SparkDataFrame} of observations and labels for 
model fitting.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
 #'   operators are supported, including '~', '.', ':', '+', 
and '-'.
-#' @param smoothing Smoothing parameter
-#' @return \code{spark.naiveBayes} returns a fitted naive Bayes model
+#' @param ... additional argument(s) passed to the method. Currently only 
\code{smoothing}.
--- End diff --

@param ... should be last





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75423748
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3187,6 +3221,7 @@ setMethod("histogram",
 #' @param x A SparkDataFrame
 #' @param url JDBC database url of the form `jdbc:subprotocol:subname`
 #' @param tableName The name of the table in the external database
+#' @param ... additional JDBC database connection properties.
--- End diff --

@param ... should be last
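
A small usage sketch of how extra connection properties flow through `...` (the URL, 
table name, and credentials below are hypothetical):

```r
library(SparkR)

# user/password are passed via ... as additional JDBC connection properties.
write.jdbc(df, url = "jdbc:postgresql://dbhost:5432/mydb",
           tableName = "people", mode = "overwrite",
           user = "spark", password = "secret")
```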





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75423733
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3003,9 +3036,10 @@ setMethod("str",
 #' Returns a new SparkDataFrame with columns dropped.
 #' This is a no-op if schema doesn't contain column name(s).
 #'
-#' @param x A SparkDataFrame.
-#' @param cols A character vector of column names or a Column.
-#' @return A SparkDataFrame
+#' @param x a SparkDataFrame.
+#' @param ... further arguments to be passed to or from other methods.
--- End diff --

@param ... should be last





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75423717
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2464,8 +2489,10 @@ setMethod("unionAll",
 #' Union two or more SparkDataFrames. This is equivalent to `UNION ALL` in 
SQL.
 #' Note that this does not remove duplicate rows across the two 
SparkDataFrames.
 #'
-#' @param x A SparkDataFrame
-#' @param ... Additional SparkDataFrame
+#' @param x a SparkDataFrame.
+#' @param ... additional SparkDataFrame(s).
+#' @param deparse.level currently not used (put here to match the 
signature of
--- End diff --

@param ... should be last?
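
One wrinkle here: if the SparkR generic is meant to mirror base R's `rbind`, the dots 
actually come before `deparse.level` in the signature itself, e.g.:

```r
# base R puts deparse.level after the dots:
args(rbind)
#> function (..., deparse.level = 1)
#> NULL
```

so documenting `deparse.level` after `@param ...` would at least match the formal 
argument order.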





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75423643
  
--- Diff: R/pkg/R/mllib.R ---
@@ -917,14 +922,14 @@ setMethod("spark.lda", signature(data = 
"SparkDataFrame"),
 # Returns a summary of the AFT survival regression model produced by 
spark.survreg,
 # similarly to R's summary().
 
-#' @param object A fitted AFT survival regression model
+#' @param object a fitted AFT survival regression model.
 #' @return \code{summary} returns a list containing the model's 
coefficients,
 #' intercept and log(scale)
 #' @rdname spark.survreg
 #' @export
 #' @note summary(AFTSurvivalRegressionModel) since 2.0.0
 setMethod("summary", signature(object = "AFTSurvivalRegressionModel"),
-  function(object, ...) {
+  function(object) {
--- End diff --

I really like the fact that we do not need to have `...` here in the func 
definition. But I'm not sure I understand why exactly - in generics.R we have 
`setGeneric("summary", function(object, ...) `, doesn't it mean we need to have 
 `...` here?
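
A minimal sketch in plain R (hypothetical `describe2` generic and `Box` class, nothing 
SparkR-specific) of what seems to be going on: when the generic carries a trailing 
`...`, `setMethod()` accepts a method that omits it and wraps the shorter function in a 
`.local` call, so dispatch still works:

```r
setGeneric("describe2", function(object, ...) standardGeneric("describe2"))

setClass("Box", representation(value = "numeric"))

# The method drops the generic's trailing `...`; setMethod() wraps it internally.
setMethod("describe2", signature(object = "Box"),
          function(object) {
            paste("Box holding", object@value)
          })

b <- new("Box", value = 42)
describe2(b)                     # "Box holding 42"
getMethod("describe2", "Box")    # typically shows the generated .local wrapper
```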





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75422922
  
--- Diff: R/pkg/R/generics.R ---
@@ -735,6 +752,8 @@ setGeneric("between", function(x, bounds) { 
standardGeneric("between") })
 setGeneric("cast", function(x, dataType) { standardGeneric("cast") })
 
 #' @rdname columnfunctions
+#' @param x a Column object.
+#' @param ... additional argument(s).
--- End diff --

not completely sure about `contains` - is it possible to put @param in the 
func definition?





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75422686
  
--- Diff: R/pkg/R/mllib.R ---
@@ -504,14 +504,15 @@ setMethod("summary", signature(object = 
"IsotonicRegressionModel"),
 #' Users can call \code{summary} to print a summary of the fitted model, 
\code{predict} to make
 #' predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
 #'
-#' @param data SparkDataFrame for training
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
 #'operators are supported, including '~', '.', ':', '+', 
and '-'.
 #'Note that the response variable of formula is empty in 
spark.kmeans.
-#' @param k Number of centers
-#' @param maxIter Maximum iteration number
-#' @param initMode The initialization algorithm choosen to fit the model
-#' @return \code{spark.kmeans} returns a fitted k-means model
+#' @param ... additional argument(s) passed to the method.
--- End diff --

roxygen would output these in the order they appear - don't we want `...` to be the last 
entry in the param list?
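
A hedged sketch of what the reordered tags might look like (a trimmed-down block, not 
the full doc for spark.kmeans):

```r
#' @param data a SparkDataFrame for training.
#' @param formula a symbolic description of the model to be fitted.
#' @param k number of centers.
#' @param maxIter maximum iteration number.
#' @param initMode the initialization algorithm chosen to fit the model.
#' @param ... additional argument(s) passed to the method.
```

roxygen2 writes the \arguments{} section in the order the @param tags appear, so moving 
the `...` tag below the named parameters puts it last on the generated help page.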





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75422499
  
--- Diff: R/pkg/R/mllib.R ---
@@ -917,14 +922,14 @@ setMethod("spark.lda", signature(data = 
"SparkDataFrame"),
 # Returns a summary of the AFT survival regression model produced by 
spark.survreg,
 # similarly to R's summary().
 
-#' @param object A fitted AFT survival regression model
+#' @param object a fitted AFT survival regression model.
 #' @return \code{summary} returns a list containing the model's 
coefficients,
 #' intercept and log(scale)
 #' @rdname spark.survreg
 #' @export
 #' @note summary(AFTSurvivalRegressionModel) since 2.0.0
 setMethod("summary", signature(object = "AFTSurvivalRegressionModel"),
-  function(object, ...) {
+  function(object) {
--- End diff --

I thought we need `...` in the generic for `summary` (since we have to 
match the base::summary signature and have variable parameters)?
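
For reference, base R's generic does keep the dots (a quick check, nothing 
SparkR-specific):

```r
args(summary)
#> function (object, ...)
#> NULL
```

so the `...` seems to belong on the generic; the question above is only about whether 
the method body needs to repeat it.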





[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/14705#discussion_r75421912
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1202,6 +1215,7 @@ setMethod("toRDD",
 #' Groups the SparkDataFrame using the specified columns, so we can run 
aggregation on them.
 #'
 #' @param x a SparkDataFrame
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.
 #' @return a GroupedData
--- End diff --

I know this is super minor and I am not that familiar with R. But it seems the 
description for `@param` starts with a lower-case letter while the description for 
`@return` starts with an upper-case letter, judging from the other code here and what 
you are trying to change.

If this is correct, we should fix this one too, to `a GroupedData.`.
Also, it seems your fix adds a `.` at the end of the 
descriptions for `@return` and `@param`? In that case, we should also fix `a 
SparkDataFrame` to `a SparkDataFrame.` above, just to avoid a lot of 
small follow-up PRs fixing the documentation in the future.







[GitHub] spark pull request #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/dupli...

2016-08-18 Thread junyangq
GitHub user junyangq opened a pull request:

https://github.com/apache/spark/pull/14705

[SPARK-16508][SparkR] Fix CRAN undocumented/duplicated arguments warnings.

## What changes were proposed in this pull request?

This PR tries to fix all the remaining "undocumented/duplicated arguments" 
warnings reported by the CRAN check.

One warning is left: the doc for R's `stats::glm`, which is exported in SparkR. To mute 
that warning, we would also have to document all arguments of that non-SparkR function.

Some earlier discussion is in #14558.

## How was this patch tested?

R unit tests and the `check-cran.sh` script (with no-test).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/junyangq/spark SPARK-16508-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14705.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14705


commit 8d606763fa3ebca8f7657384313f7805a3b086ad
Author: Junyang Qian 
Date:   2016-08-18T19:36:36Z

Fix CRAN undocumented/duplicated arguments warnings.



