[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-18 Thread junyangq
Github user junyangq closed the pull request at:

https://github.com/apache/spark/pull/14558


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-18 Thread junyangq
GitHub user junyangq reopened a pull request:

https://github.com/apache/spark/pull/14558

[SPARK-16508][SparkR] Fix warnings on undocumented/duplicated arguments by 
CRAN-check

## What changes were proposed in this pull request?

This PR tries to fix all the remaining "undocumented/duplicated arguments" 
warnings given by CRAN-check.

## How was this patch tested?

R unit test and check-cran.sh script.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/junyangq/spark SPARK-16508-branch-2.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14558.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14558


commit 82e2f09517e9f3d726af0046d251748f892f59c8
Author: Junyang Qian 
Date:   2016-08-09T04:52:34Z

Fix part of undocumented/duplicated arguments warnings by CRAN-check

commit 41d9dcac3e1d3ee1a27fe094ebb60c1c18d6bcff
Author: Mariusz Strzelecki 
Date:   2016-08-09T16:44:43Z

[SPARK-16950] [PYSPARK] fromOffsets parameter support in 
KafkaUtils.createDirectStream for python3

## What changes were proposed in this pull request?

Ability to use KafkaUtils.createDirectStream with starting offsets in 
Python 3 by using java.lang.Number instead of Long during param mapping in 
the Scala helper. This allows py4j to pass Integer or Long to the map and resolves 
ClassCastException problems.

## How was this patch tested?

unit tests

jerryshao  - could you please look at this PR?

Author: Mariusz Strzelecki 

Closes #14540 from szczeles/kafka_pyspark.

(cherry picked from commit 29081b587f3423bf5a3e0066357884d0c26a04bf)
Signed-off-by: Davies Liu 

commit 44115e90ef2a80d8ecf3965b97ce7bee21e29158
Author: Josh Rosen 
Date:   2016-08-09T18:21:45Z

[SPARK-16956] Make ApplicationState.MAX_NUM_RETRY configurable

## What changes were proposed in this pull request?

This patch introduces a new configuration, 
`spark.deploy.maxExecutorRetries`, to let users configure an obscure behavior 
in the standalone master where the master will kill Spark applications which 
have experienced too many back-to-back executor failures. The current setting 
is a hardcoded constant (10); this patch replaces that with a new cluster-wide 
configuration.

**Background:** This application-killing was added in 
6b5980da796e0204a7735a31fb454f312bc9daac (from September 2012) and I believe 
that it was designed to prevent a faulty application whose executors could 
never launch from DOS'ing the Spark cluster via an infinite series of executor 
launch attempts. In a subsequent patch (#1360), this feature was refined to 
prevent applications which have running executors from being killed by this 
code path.

**Motivation for making this configurable:** Previously, if a Spark 
Standalone application experienced more than `ApplicationState.MAX_NUM_RETRY` 
executor failures and was left with no executors running then the Spark master 
would kill that application, but this behavior is problematic in environments 
where the Spark executors run on unstable infrastructure and can all 
simultaneously die. For instance, if your Spark driver runs on an on-demand EC2 
instance while all workers run on ephemeral spot instances then it's possible 
for all executors to die at the same time while the driver stays alive. In this 
case, it may be desirable to keep the Spark application alive so that it can 
recover once new workers and executors are available. In order to accommodate 
this use-case, this patch modifies the Master to never kill faulty applications 
if `spark.deploy.maxExecutorRetries` is negative.

I'd like to merge this patch into master, branch-2.0, and branch-1.6.

## How was this patch tested?

I tested this manually using `spark-shell` and `local-cluster` mode. This 
is a tricky feature to unit test and historically this code has not changed 
very often, so I'd prefer to skip the additional effort of adding a testing 
framework and would rather rely on manual tests and review for now.

Author: Josh Rosen 

Closes #14544 from JoshRosen/add-setting-for-max-executor-failures.

(cherry picked from commit b89b3a5c8e391fcaebe7ef3c77ef16bb9431d6ab)
Signed-off-by: Josh Rosen 

commit 2d136dba415a40a04598068ac2cea0490a6fd091
Author: Davies Liu 
Date:   2016-08-09T17:04:36Z

[SPARK-16905] SQL DDL: MSCK REPAIR TABLE

MSCK REPAIR TABLE could be used to recover the partitions in external 
catalog based on partitions in 


[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r75215338
  
--- Diff: R/pkg/R/generics.R ---
@@ -1251,10 +1311,57 @@ setGeneric("year", function(x) { 
standardGeneric("year") })
 #' @export
 setGeneric("spark.glm", function(data, formula, ...) { 
standardGeneric("spark.glm") })
 
+#' @param formula a symbolic description of the model to be fitted. If 
\code{data} is a
+#'SparkDataFrame, currently only a few formula operators 
are supported,
+#'including '~', '.', ':', '+', and '-'.
+#' @param data a SparkDataFrame or (R glm) data.frame, list or environment 
for training.
+#' @param family a description of the error distribution and link function 
to be used in the model.
+#'   This can be a character string naming a family function, 
a family function or
+#'   the result of a call to a family function. Refer R family 
at
+#'   
\url{https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html}.
+#' @param epsilon positive convergence tolerance of iterations.
+#' @param maxit integer giving the maximal number of IRLS iterations.
+#' @param weights an optional vector of 'prior weights' to be used in the 
fitting process.
--- End diff --

This is from R's own `glm` doc. Why do we need to copy it here?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r75182227
  
--- Diff: .gitignore ---
@@ -77,3 +77,8 @@ spark-warehouse/
 # For R session data
 .RData
 .RHistory
+.Rhistory
--- End diff --

+1. This is not part of this PR.





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-17 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r75123714
  
--- Diff: R/pkg/R/generics.R ---
@@ -1251,10 +1311,57 @@ setGeneric("year", function(x) { 
standardGeneric("year") })
 #' @export
 setGeneric("spark.glm", function(data, formula, ...) { 
standardGeneric("spark.glm") })
 
+#' @param formula a symbolic description of the model to be fitted. If 
\code{data} is a
+#'SparkDataFrame, currently only a few formula operators 
are supported,
+#'including '~', '.', ':', '+', and '-'.
+#' @param data a SparkDataFrame or (R glm) data.frame, list or environment 
for training.
+#' @param family a description of the error distribution and link function 
to be used in the model.
+#'   This can be a character string naming a family function, 
a family function or
+#'   the result of a call to a family function. Refer R family 
at
+#'   
\url{https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html}.
+#' @param epsilon positive convergence tolerance of iterations.
+#' @param maxit integer giving the maximal number of IRLS iterations.
+#' @param weights an optional vector of 'prior weights' to be used in the 
fitting process.
+#'Should be NULL or a numeric vector.
+#' @param subset an optional vector specifying a subset of observations to 
be used in the
+#'   fitting process.
+#' @param na.action a function which indicates what should happen when the 
data contain NAs.
+#'  The default is set by the na.action setting of 
options, and is na.fail
+#'  if that is unset. The 'factory-fresh' default is 
na.omit. Another possible
+#'  value is NULL, no action. Value na.exclude can be 
useful.
+#' @param start starting values for the parameters in the linear predictor.
+#' @param etastart starting values for the linear predictor.
+#' @param mustart starting values for the vector of means.
+#' @param offset this can be used to specify an a priori known component 
to be included in
+#'   the linear predictor during fitting. This should be NULL 
or
+#'   a numeric vector of length equal to the number of cases. 
One or more offset
+#'   terms can be included in the formula instead or as well, 
and if more than
+#'   one is specified their sum is used. See model.offset.
+#' @param control a list of parameters for controlling the fitting 
process. For glm.fit
+#'this is passed to glm.control.
+#' @param model a logical value indicating whether model frame should be 
included as
+#'  a component of the returned value.
+#' @param method the method to be used in fitting the model. The default 
method
+#'   "glm.fit" uses iteratively reweighted least squares 
(IWLS): the alternative
+#'   "model.frame" returns the model frame and does no fitting.
+#'   User-supplied fitting functions can be supplied either as 
a function or
+#'   a character string naming a function, with a function 
which takes the same
+#'   arguments as glm.fit. If specified as a character string 
it is looked up from
+#'   within the stats namespace.
+#' @param x,y logical values indicating whether the response vector and 
model matrix
+#'used in the fitting process should be returned as components 
of the returned value.
+#' @param contrasts an optional list. See the contrasts.arg of 
model.matrix.default.
+#' @param ...  arguments to be used to form the default control 
argument if it is
+#'not supplied directly.
 #' @rdname glm
+#' @details If \code{data} is a data.frame, list or environment, 
\code{glm} behaves the same as
+#'  \code{glm} in the \code{stats} package. If \code{data} is a 
SparkDataFrame,
+#'  \code{spark.glm} is called.
 #' @export
 setGeneric("glm")
 
+#' @param object a fitted ML model object.
+#' @param ... additional argument(s) passed to the method.
--- End diff --

sorry, you are right - as you can see, having it in two places is hard to 
follow :)





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-17 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r75124591
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -510,9 +510,7 @@ setMethod("registerTempTable",
 #'
 #' Insert the contents of a SparkDataFrame into a table registered in the 
current SparkSession.
 #'
-#' @param x A SparkDataFrame
-#' @param tableName A character vector containing the name of the table
--- End diff --

I agree, that is an unfortunate side-effect. I wonder if there is a way to 
order this in the Rd generated by roxygen; maybe this could be a reasonable PR to 
change roxygen to special-case `...`.
The only other solution I've seen would be to hand-craft a .Rd file...

For now I think we should prioritize maintainability and keep the doc as close 
to the function as possible.





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-17 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r75123349
  
--- Diff: R/pkg/R/mllib.R ---
@@ -602,14 +599,14 @@ setMethod("spark.survreg", signature(data = 
"SparkDataFrame", formula = "formula
 # Returns a summary of the AFT survival regression model produced by 
spark.survreg,
 # similarly to R's summary().
 
-#' @param object A fitted AFT survival regression model
+#' @param object a fitted AFT survival regression model.
 #' @return \code{summary} returns a list containing the model's 
coefficients,
 #' intercept and log(scale)
 #' @rdname spark.survreg
 #' @export
 #' @note summary(AFTSurvivalRegressionModel) since 2.0.0
 setMethod("summary", signature(object = "AFTSurvivalRegressionModel"),
-  function(object, ...) {
+  function(object) {
--- End diff --

sorry, no, not the generic, just here. I think this is fine





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-16 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r75025476
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -510,9 +510,7 @@ setMethod("registerTempTable",
 #'
 #' Insert the contents of a SparkDataFrame into a table registered in the 
current SparkSession.
 #'
-#' @param x A SparkDataFrame
-#' @param tableName A character vector containing the name of the table
--- End diff --

Yes, I agree that ideally the doc should be kept near the function 
definition/body, and that's consistent with many other functions. The only 
remaining issue is that `...` would become first in the argument list of the doc. 
I'm not sure whether that would bother users.





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-16 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74974091
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -510,9 +510,7 @@ setMethod("registerTempTable",
 #'
 #' Insert the contents of a SparkDataFrame into a table registered in the 
current SparkSession.
 #'
-#' @param x A SparkDataFrame
-#' @param tableName A character vector containing the name of the table
--- End diff --

I see.

We typically have
```
setGeneric("foo", function(x, ...) {...}

setMethod("foo", signature(x = "A"), function(x, a, b, moreParameters) {...}
```

In many cases we define the generic with fewer parameters - whether 
because it has to match an existing generic or because some parameters are not 
put in the signature with a type - so we end up with one parameter and `...` in the 
generic.

I see your point that if a parameter appears in the generic then perhaps we 
should document it with @param there as well.

I think we should try to keep it near the function definition/body though, 
because
1) that's where the parameter is actually used and its meaning could 
change
2) it would be easier to review - we have a few changes out where the first 
parameter does not have a @param (it should help when we turn on check-cran in 
Jenkins)
3) many of our functions are like this, so if we were to move the @param for 
the first parameter to generics.R (whenever `...` is there) then dozens or 
hundreds of lines would change (e.g. bround, contains, subset and many 
more)

The exception, as we've discussed, is a generic that applies to multiple 
function definitions with different classes, where the first parameter could be 
of a different class in each and so needs a central place.

What do you think?
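A minimal sketch of the convention described above, using a hypothetical generic `foo` and class `A` (not actual SparkR code): `...` is documented once at the generic, while method-specific arguments are documented next to the method that uses them.

```r
setClass("A", representation(v = "numeric"))

#' @param x an object of class A.
#' @param ... further arguments passed to methods.
#' @rdname foo
setGeneric("foo", function(x, ...) { standardGeneric("foo") })

#' @param a a numeric offset used only by this method, so its @param
#'   tag lives next to the method definition.
#' @rdname foo
setMethod("foo", signature(x = "A"),
          function(x, a = 0) {
            x@v + a
          })

foo(new("A", v = 1:3), a = 10)   # returns 11 12 13
```

The method can add `a` after the generic's `x` only because the generic's formals end in `...`, which absorbs the extra argument during dispatch.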





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-15 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74874929
  
--- Diff: R/pkg/R/functions.R ---
@@ -1143,7 +1139,7 @@ setMethod("minute",
 #' @export
 #' @examples \dontrun{select(df, monotonically_increasing_id())}
 setMethod("monotonically_increasing_id",
-  signature(x = "missing"),
+  signature(),
--- End diff --

Automatic generation of S4 methods is not desirable. I hope this case can 
be better handled by roxygen. For now, I agree that (b) is a good solution.
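A self-contained illustration of the signature change being discussed (hypothetical generic, not the SparkR source): with an empty `signature()`, the method is registered for "ANY" and so also handles calls where the argument is missing.

```r
setGeneric("current_id", function(x) { standardGeneric("current_id") })

setMethod("current_id", signature(),   # was: signature(x = "missing")
          function(x) {
            if (missing(x)) "generated id column" else x
          })

current_id()   # dispatches with `x` missing
```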





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-15 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74874081
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -181,7 +181,7 @@ getDefaultSqlSource <- function() {
 #' @method createDataFrame default
 #' @note createDataFrame since 1.4.0
 # TODO(davies): support sampling and infer type from NA
-createDataFrame.default <- function(data, schema = NULL, samplingRatio = 
1.0) {
+createDataFrame.default <- function(data, schema = NULL) {
--- End diff --

Oh yes... Thanks!





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-15 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74873867
  
--- Diff: R/pkg/R/mllib.R ---
@@ -298,14 +304,15 @@ setMethod("summary", signature(object = 
"NaiveBayesModel"),
 #' Users can call \code{summary} to print a summary of the fitted model, 
\code{predict} to make
 #' predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
 #'
-#' @param data SparkDataFrame for training
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
 #'operators are supported, including '~', '.', ':', '+', 
and '-'.
 #'Note that the response variable of formula is empty in 
spark.kmeans.
-#' @param k Number of centers
-#' @param maxIter Maximum iteration number
-#' @param initMode The initialization algorithm choosen to fit the model
-#' @return \code{spark.kmeans} returns a fitted k-means model
+#' @param ... additional argument(s) passed to the method.
--- End diff --

Yeah agreed.





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-15 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74729338
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -510,9 +510,7 @@ setMethod("registerTempTable",
 #'
 #' Insert the contents of a SparkDataFrame into a table registered in the 
current SparkSession.
 #'
-#' @param x A SparkDataFrame
-#' @param tableName A character vector containing the name of the table
--- End diff --

The reasons are:

1. Its generic counterpart has `...`, and we have to document that.

2. I think it may be preferable for `...` to follow the other explicit arguments 
in order.

Does that make sense?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-15 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74729028
  
--- Diff: R/pkg/R/generics.R ---
@@ -1251,10 +1311,57 @@ setGeneric("year", function(x) { 
standardGeneric("year") })
 #' @export
 setGeneric("spark.glm", function(data, formula, ...) { 
standardGeneric("spark.glm") })
 
+#' @param formula a symbolic description of the model to be fitted. If 
\code{data} is a
+#'SparkDataFrame, currently only a few formula operators 
are supported,
+#'including '~', '.', ':', '+', and '-'.
+#' @param data a SparkDataFrame or (R glm) data.frame, list or environment 
for training.
+#' @param family a description of the error distribution and link function 
to be used in the model.
+#'   This can be a character string naming a family function, 
a family function or
+#'   the result of a call to a family function. Refer R family 
at
+#'   
\url{https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html}.
+#' @param epsilon positive convergence tolerance of iterations.
+#' @param maxit integer giving the maximal number of IRLS iterations.
+#' @param weights an optional vector of 'prior weights' to be used in the 
fitting process.
+#'Should be NULL or a numeric vector.
+#' @param subset an optional vector specifying a subset of observations to 
be used in the
+#'   fitting process.
+#' @param na.action a function which indicates what should happen when the 
data contain NAs.
+#'  The default is set by the na.action setting of 
options, and is na.fail
+#'  if that is unset. The 'factory-fresh' default is 
na.omit. Another possible
+#'  value is NULL, no action. Value na.exclude can be 
useful.
+#' @param start starting values for the parameters in the linear predictor.
+#' @param etastart starting values for the linear predictor.
+#' @param mustart starting values for the vector of means.
+#' @param offset this can be used to specify an a priori known component 
to be included in
+#'   the linear predictor during fitting. This should be NULL 
or
+#'   a numeric vector of length equal to the number of cases. 
One or more offset
+#'   terms can be included in the formula instead or as well, 
and if more than
+#'   one is specified their sum is used. See model.offset.
+#' @param control a list of parameters for controlling the fitting 
process. For glm.fit
+#'this is passed to glm.control.
+#' @param model a logical value indicating whether model frame should be 
included as
+#'  a component of the returned value.
+#' @param method the method to be used in fitting the model. The default 
method
+#'   "glm.fit" uses iteratively reweighted least squares 
(IWLS): the alternative
+#'   "model.frame" returns the model frame and does no fitting.
+#'   User-supplied fitting functions can be supplied either as 
a function or
+#'   a character string naming a function, with a function 
which takes the same
+#'   arguments as glm.fit. If specified as a character string 
it is looked up from
+#'   within the stats namespace.
+#' @param x,y logical values indicating whether the response vector and 
model matrix
+#'used in the fitting process should be returned as components 
of the returned value.
+#' @param contrasts an optional list. See the contrasts.arg of 
model.matrix.default.
+#' @param ...  arguments to be used to form the default control 
argument if it is
+#'not supplied directly.
 #' @rdname glm
+#' @details If \code{data} is a data.frame, list or environment, 
\code{glm} behaves the same as
+#'  \code{glm} in the \code{stats} package. If \code{data} is a 
SparkDataFrame,
+#'  \code{spark.glm} is called.
 #' @export
 setGeneric("glm")
 
+#' @param object a fitted ML model object.
+#' @param ... additional argument(s) passed to the method.
--- End diff --

There are additional arguments like `newData`?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-15 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74728822
  
--- Diff: R/pkg/R/generics.R ---
@@ -1277,8 +1384,11 @@ setGeneric("spark.naiveBayes", function(data, 
formula, ...) { standardGeneric("s
 
 #' @rdname spark.survreg
 #' @export
-setGeneric("spark.survreg", function(data, formula, ...) { 
standardGeneric("spark.survreg") })
+setGeneric("spark.survreg", function(data, formula) { 
standardGeneric("spark.survreg") })
 
+#' @param object a fitted ML model object.
+#' @param path the directory where the model is saved.
+#' @param ... additional argument(s) passed to the method.
 #' @rdname write.ml
 #' @export
 setGeneric("write.ml", function(object, path, ...) { 
standardGeneric("write.ml") })
--- End diff --

There is an `overwrite` argument? It seems that methods can add arguments 
to the generic only if '...' is an argument to the generic.
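A sketch of that rule (simplified, not the SparkR source): a method may introduce an extra named argument such as `overwrite` only because the generic's formals end in `...`.

```r
setGeneric("write.ml", function(object, path, ...) { standardGeneric("write.ml") })

setClass("DummyModel", representation(id = "character"))

setMethod("write.ml", signature(object = "DummyModel"),
          function(object, path, overwrite = FALSE) {
            # `overwrite` is matched through the generic's `...`
            message("saving ", object@id, " to ", path,
                    if (overwrite) " (overwriting)" else "")
          })

write.ml(new("DummyModel", id = "m1"), "/tmp/m1", overwrite = TRUE)
```

Dropping `...` from the generic would make a method's `overwrite` argument an error at `setMethod` time.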





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-15 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74728351
  
--- Diff: R/pkg/R/mllib.R ---
@@ -142,15 +143,6 @@ setMethod("spark.glm", signature(data = 
"SparkDataFrame", formula = "formula"),
 #' Generalized Linear Models (R-compliant)
 #'
 #' Fits a generalized linear model, similarly to R's glm().
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
--- End diff --

Well yes, we agree on that. The reason I moved these is that I thought 
when `setGeneric("glm")` is called, it looks up the existing R glm function, which 
includes many arguments. Some overlap with `glm` defined for SparkDataFrame. 
So I simply moved all the argument docs to the generic part, though that 
makes the doc there cumbersome...





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-15 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74727208
  
--- Diff: R/pkg/R/mllib.R ---
@@ -602,14 +599,14 @@ setMethod("spark.survreg", signature(data = 
"SparkDataFrame", formula = "formula
 # Returns a summary of the AFT survival regression model produced by 
spark.survreg,
 # similarly to R's summary().
 
-#' @param object A fitted AFT survival regression model
+#' @param object a fitted AFT survival regression model.
 #' @return \code{summary} returns a list containing the model's 
coefficients,
 #' intercept and log(scale)
 #' @rdname spark.survreg
 #' @export
 #' @note summary(AFTSurvivalRegressionModel) since 2.0.0
 setMethod("summary", signature(object = "AFTSurvivalRegressionModel"),
-  function(object, ...) {
+  function(object) {
--- End diff --

I guess you meant omitting `...` in the generic function as well? If so, the S3 base `summary` would not be added as the default method when R sees the S4 generic definition, and it would therefore be masked by SparkR.
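A minimal illustration of the masking concern (the `Model` class here is made up):

```r
setClass("Model", representation(coef = "numeric"))

# Calling setGeneric("summary") with no definition reuses base::summary's
# signature (which includes `...`) and registers the S3 function as the
# default method, so summary() on ordinary objects keeps working.
setGeneric("summary")
setMethod("summary", "Model", function(object, ...) { object@coef })

summary(data.frame(x = 1:3))            # still dispatches to base behavior
summary(new("Model", coef = c(a = 1)))  # dispatches to the S4 method
```

If the S4 generic were declared with a different argument list (no `...`), base `summary` would not become the default method and would be masked.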





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-14 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74707500
  
--- Diff: R/pkg/R/mllib.R ---
@@ -142,15 +143,6 @@ setMethod("spark.glm", signature(data = 
"SparkDataFrame", formula = "formula"),
 #' Generalized Linear Models (R-compliant)
 #'
 #' Fits a generalized linear model, similarly to R's glm().
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
--- End diff --

Is there a reason we are moving these?

As we have discussed, we should try to keep the documentation as close to 
the function definition/implementation as possible. That would make it a lot 
easier to keep the code and documentation in sync, and remind reviewers to 
check them for correctness and consistency.

In cases where a @param description applies to two or more implementations 
(because the function supports multiple classes), for lack of a common place 
we could put it in generics.R for now.

If `...` only shows up in generics.R, we should check whether it's there to 
match an existing function signature; if so, we add @param in generics.R. If 
it is not, and it is not used anywhere, we should consider removing the `...`.

Does that make sense?
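The layout being suggested might look like this in generics.R (the generic name and parameter docs are placeholders, not actual SparkR code):

```r
# generics.R -- when a @param description applies to several implementations,
# document it once next to the generic; each method file then documents only
# the arguments it adds.
#' @param x a SparkDataFrame.
#' @param ... further arguments passed to the specific method.
#' @rdname myFunc
#' @export
setGeneric("myFunc", function(x, ...) { standardGeneric("myFunc") })
```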







[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-14 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74707292
  
--- Diff: R/pkg/R/mllib.R ---
@@ -142,15 +143,6 @@ setMethod("spark.glm", signature(data = 
"SparkDataFrame", formula = "formula"),
 #' Generalized Linear Models (R-compliant)
 #'
 #' Fits a generalized linear model, similarly to R's glm().
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
--- End diff --

It was moved to the generic function definition.





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-14 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74707272
  
--- Diff: R/pkg/R/mllib.R ---
@@ -346,8 +339,11 @@ setMethod("spark.kmeans", signature(data = 
"SparkDataFrame", formula = "formula"
 #' Get fitted result from a k-means model, similarly to R's fitted().
 #' Note: A saved-loaded model does not support this method.
 #'
-#' @param object A fitted k-means model
-#' @return \code{fitted} returns a SparkDataFrame containing fitted values
+#' @param object a fitted k-means model.
+#' @param method type of fitted results, \code{"centers"} for cluster 
centers
+#'or \code{"classes"} for assigned classes.
+#' @param ... additional argument(s) passed to the method.
--- End diff --

The same as above...





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-14 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74707253
  
--- Diff: R/pkg/R/mllib.R ---
@@ -414,11 +411,12 @@ setMethod("predict", signature(object = 
"KMeansModel"),
 #' predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
 #' Only categorical data is supported.
 #'
-#' @param data A \code{SparkDataFrame} of observations and labels for 
model fitting
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
+#' @param data a \code{SparkDataFrame} of observations and labels for 
model fitting.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
 #'   operators are supported, including '~', '.', ':', '+', 
and '-'.
-#' @param smoothing Smoothing parameter
-#' @return \code{spark.naiveBayes} returns a fitted naive Bayes model
+#' @param ... additional argument(s) passed to the method. Currently only 
\code{smoothing}.
--- End diff --

This is actually for the generic function. Should I move these docs to the 
generic definition?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-14 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74706060
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3184,6 +3200,7 @@ setMethod("histogram",
 #' @param x A SparkDataFrame
 #' @param url JDBC database url of the form `jdbc:subprotocol:subname`
 #' @param tableName The name of the table in the external database
+#' @param ... additional JDBC database connection propertie(s).
--- End diff --

Done. Thanks!





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-13 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74694116
  
--- Diff: R/pkg/R/functions.R ---
@@ -1143,7 +1139,7 @@ setMethod("minute",
 #' @export
 #' @examples \dontrun{select(df, monotonically_increasing_id())}
 setMethod("monotonically_increasing_id",
-  signature(x = "missing"),
+  signature(),
--- End diff --

Hmm, I looked into this a bit

Below, "a" is what we have originally:
```
setGeneric("a", function(x) { standardGeneric("a") } )
setMethod("a", signature(x = "missing"), function() { 1 })
> a()
[1] 1
> a(1)
Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘a’ for signature 
‘"numeric"’
> showMethods(a)
Function: a (package .GlobalEnv)
x="missing"

setGeneric("b", function(x = "missing") { standardGeneric("b") } )
setMethod("b", signature("missing"), function() { 1 })
> b()
[1] 1
> b(1)
Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘b’ for signature 
‘"numeric"’
> showMethods(b)
Function: b (package .GlobalEnv)
x="missing"

setGeneric("tt", function(...) { standardGeneric("tt") } )
setMethod("tt", signature(), function() { 1 })
> tt(1)
Error in .local(...) : unused argument (1)
> tt("emme")
Error in .local(...) : unused argument ("emme")
> showMethods(tt)
Function: tt (package .GlobalEnv)
...="ANY"
...="character"
(inherited from: ...="ANY")
...="numeric"
(inherited from: ...="ANY")
```

I think the issue with `setMethod("foo", signature(), function() { 1 })` 
is twofold:
1. The error message when calling the function with a parameter is less 
clear than "unable to find function for signature ‘"numeric"’".
2. An S4 method is automatically generated for each parameter type that it is 
called with (see "numeric" or "character" from showMethods).

So perhaps "b" is a better approach?
Unfortunately we still have to "document" a parameter or `...` in either 
case, as `setGeneric` refuses to take it otherwise.
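Approach "b" with the required documentation might then look like this (sketch only; the roxygen wording is illustrative):

```r
# Keep a named argument with a "missing" default so setGeneric() accepts the
# signature and mistaken calls fail with a clear dispatch error.
#' @param x should be left empty; it exists only to satisfy setGeneric().
setGeneric("spark_partition_id",
           function(x = "missing") { standardGeneric("spark_partition_id") })

setMethod("spark_partition_id", signature("missing"),
          function() { 1 })
```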





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-13 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74693897
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -181,7 +181,7 @@ getDefaultSqlSource <- function() {
 #' @method createDataFrame default
 #' @note createDataFrame since 1.4.0
 # TODO(davies): support sampling and infer type from NA
-createDataFrame.default <- function(data, schema = NULL, samplingRatio = 
1.0) {
+createDataFrame.default <- function(data, schema = NULL) {
--- End diff --

we discussed we shouldn't remove `samplingRatio = 1.0` from the signature?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-13 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74693887
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2461,8 +2473,9 @@ setMethod("unionAll",
 #' Union two or more SparkDataFrames. This is equivalent to `UNION ALL` in 
SQL.
 #' Note that this does not remove duplicate rows across the two 
SparkDataFrames.
 #'
-#' @param x A SparkDataFrame
-#' @param ... Additional SparkDataFrame
+#' @param x a SparkDataFrame.
+#' @param ... additional SparkDataFrame(s).
+#' @param deparse.level dummy variable, currently not used.
--- End diff --

It's not a dummy variable per se - it was there to match the signature of 
the base implementation





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-13 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74693856
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1146,7 +1147,7 @@ setMethod("head",
 
 #' Return the first row of a SparkDataFrame
 #'
-#' @param x A SparkDataFrame
--- End diff --

Right - they are not - RDD functions are not exported from the package 
(not public) and we don't want Rd files generated for them. Please see PR #14626 
- we want a separate setGeneric for non-RDD functions, and then this line 
documenting both the DataFrame and Column parameters can go to generics.R.





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-13 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74693823
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -510,9 +510,7 @@ setMethod("registerTempTable",
 #'
 #' Insert the contents of a SparkDataFrame into a table registered in the 
current SparkSession.
 #'
-#' @param x A SparkDataFrame
-#' @param tableName A character vector containing the name of the table
--- End diff --

why is tableName moved?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-13 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74693204
  
--- Diff: R/pkg/R/functions.R ---
@@ -1497,7 +1493,7 @@ setMethod("soundex",
 #' \dontrun{select(df, spark_partition_id())}
 #' @note spark_partition_id since 2.0.0
 setMethod("spark_partition_id",
-  signature(x = "missing"),
+  signature(),
--- End diff --

what I mean is you could put it as `signature(missing)`





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-13 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74693197
  
--- Diff: R/pkg/R/generics.R ---
@@ -1251,10 +1311,57 @@ setGeneric("year", function(x) { 
standardGeneric("year") })
 #' @export
 setGeneric("spark.glm", function(data, formula, ...) { 
standardGeneric("spark.glm") })
 
+#' @param formula a symbolic description of the model to be fitted. If 
\code{data} is a
+#'SparkDataFrame, currently only a few formula operators 
are supported,
+#'including '~', '.', ':', '+', and '-'.
+#' @param data a SparkDataFrame or (R glm) data.frame, list or environment 
for training.
+#' @param family a description of the error distribution and link function 
to be used in the model.
+#'   This can be a character string naming a family function, 
a family function or
+#'   the result of a call to a family function. Refer R family 
at
+#'   
\url{https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html}.
+#' @param epsilon positive convergence tolerance of iterations.
+#' @param maxit integer giving the maximal number of IRLS iterations.
+#' @param weights an optional vector of 'prior weights' to be used in the 
fitting process.
+#'Should be NULL or a numeric vector.
+#' @param subset an optional vector specifying a subset of observations to 
be used in the
+#'   fitting process.
+#' @param na.action a function which indicates what should happen when the 
data contain NAs.
+#'  The default is set by the na.action setting of 
options, and is na.fail
+#'  if that is unset. The 'factory-fresh' default is 
na.omit. Another possible
+#'  value is NULL, no action. Value na.exclude can be 
useful.
+#' @param start starting values for the parameters in the linear predictor.
+#' @param etastart starting values for the linear predictor.
+#' @param mustart starting values for the vector of means.
+#' @param offset this can be used to specify an a priori known component 
to be included in
+#'   the linear predictor during fitting. This should be NULL 
or
+#'   a numeric vector of length equal to the number of cases. 
One or more offset
+#'   terms can be included in the formula instead or as well, 
and if more than
+#'   one is specified their sum is used. See model.offset.
+#' @param control a list of parameters for controlling the fitting 
process. For glm.fit
+#'this is passed to glm.control.
+#' @param model a logical value indicating whether model frame should be 
included as
+#'  a component of the returned value.
+#' @param method the method to be used in fitting the model. The default 
method
+#'   "glm.fit" uses iteratively reweighted least squares 
(IWLS): the alternative
+#'   "model.frame" returns the model frame and does no fitting.
+#'   User-supplied fitting functions can be supplied either as 
a function or
+#'   a character string naming a function, with a function 
which takes the same
+#'   arguments as glm.fit. If specified as a character string 
it is looked up from
+#'   within the stats namespace.
+#' @param x,y logical values indicating whether the response vector and 
model matrix
+#'used in the fitting process should be returned as components 
of the returned value.
+#' @param contrasts an optional list. See the contrasts.arg of 
model.matrix.default.
+#' @param ...  arguments to be used to form the default control 
argument if it is
+#'not supplied directly.
 #' @rdname glm
+#' @details If \code{data} is a data.frame, list or environment, 
\code{glm} behaves the same as
+#'  \code{glm} in the \code{stats} package. If \code{data} is a 
SparkDataFrame,
+#'  \code{spark.glm} is called.
 #' @export
 setGeneric("glm")
 
+#' @param object a fitted ML model object.
+#' @param ... additional argument(s) passed to the method.
--- End diff --

`Currently not used` - we don't use this in any `predict` implementation





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-13 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74693174
  
--- Diff: R/pkg/R/generics.R ---
@@ -1277,8 +1384,11 @@ setGeneric("spark.naiveBayes", function(data, 
formula, ...) { standardGeneric("s
 
 #' @rdname spark.survreg
 #' @export
-setGeneric("spark.survreg", function(data, formula, ...) { 
standardGeneric("spark.survreg") })
+setGeneric("spark.survreg", function(data, formula) { 
standardGeneric("spark.survreg") })
 
+#' @param object a fitted ML model object.
+#' @param path the directory where the model is saved.
+#' @param ... additional argument(s) passed to the method.
 #' @rdname write.ml
 #' @export
 setGeneric("write.ml", function(object, path, ...) { 
standardGeneric("write.ml") })
--- End diff --

again, no reason for this generics to have `...`?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-13 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74693158
  
--- Diff: R/pkg/R/mllib.R ---
@@ -142,15 +143,6 @@ setMethod("spark.glm", signature(data = 
"SparkDataFrame", formula = "formula"),
 #' Generalized Linear Models (R-compliant)
 #'
 #' Fits a generalized linear model, similarly to R's glm().
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
--- End diff --

why are we moving this?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-13 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74693142
  
--- Diff: R/pkg/R/mllib.R ---
@@ -298,14 +304,15 @@ setMethod("summary", signature(object = 
"NaiveBayesModel"),
 #' Users can call \code{summary} to print a summary of the fitted model, 
\code{predict} to make
 #' predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
 #'
-#' @param data SparkDataFrame for training
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
 #'operators are supported, including '~', '.', ':', '+', 
and '-'.
 #'Note that the response variable of formula is empty in 
spark.kmeans.
-#' @param k Number of centers
-#' @param maxIter Maximum iteration number
-#' @param initMode The initialization algorithm choosen to fit the model
-#' @return \code{spark.kmeans} returns a fitted k-means model
+#' @param ... additional argument(s) passed to the method.
--- End diff --

for all the `spark.something` functions, if we are not actually using `...` 
for optional parameters, let's remove it. There is no reason to have unused 
unnamed parameters in a function?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-13 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74693106
  
--- Diff: R/pkg/R/mllib.R ---
@@ -346,8 +339,11 @@ setMethod("spark.kmeans", signature(data = 
"SparkDataFrame", formula = "formula"
 #' Get fitted result from a k-means model, similarly to R's fitted().
 #' Note: A saved-loaded model does not support this method.
 #'
-#' @param object A fitted k-means model
-#' @return \code{fitted} returns a SparkDataFrame containing fitted values
+#' @param object a fitted k-means model.
+#' @param method type of fitted results, \code{"centers"} for cluster 
centers
+#'or \code{"classes"} for assigned classes.
+#' @param ... additional argument(s) passed to the method.
--- End diff --

let's remove `...` in the function





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-13 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74693097
  
--- Diff: R/pkg/R/mllib.R ---
@@ -414,11 +411,12 @@ setMethod("predict", signature(object = 
"KMeansModel"),
 #' predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
 #' Only categorical data is supported.
 #'
-#' @param data A \code{SparkDataFrame} of observations and labels for 
model fitting
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
+#' @param data a \code{SparkDataFrame} of observations and labels for 
model fitting.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
 #'   operators are supported, including '~', '.', ':', '+', 
and '-'.
-#' @param smoothing Smoothing parameter
-#' @return \code{spark.naiveBayes} returns a fitted naive Bayes model
+#' @param ... additional argument(s) passed to the method. Currently only 
\code{smoothing}.
--- End diff --

this is removed, right?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-13 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74693087
  
--- Diff: .gitignore ---
@@ -77,3 +77,8 @@ spark-warehouse/
 # For R session data
 .RData
 .RHistory
+.Rhistory
--- End diff --

I'm not sure if this is essential for this PR. I'd suggest leaving this out





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-13 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74693053
  
--- Diff: R/pkg/R/mllib.R ---
@@ -602,14 +599,14 @@ setMethod("spark.survreg", signature(data = 
"SparkDataFrame", formula = "formula
 # Returns a summary of the AFT survival regression model produced by 
spark.survreg,
 # similarly to R's summary().
 
-#' @param object A fitted AFT survival regression model
+#' @param object a fitted AFT survival regression model.
 #' @return \code{summary} returns a list containing the model's 
coefficients,
 #' intercept and log(scale)
 #' @rdname spark.survreg
 #' @export
 #' @note summary(AFTSurvivalRegressionModel) since 2.0.0
 setMethod("summary", signature(object = "AFTSurvivalRegressionModel"),
-  function(object, ...) {
+  function(object) {
--- End diff --

Thinking more about it, I think the reason is that R's base::summary does in 
fact have `...`: 
https://stat.ethz.ch/R-manual/R-devel/library/base/html/summary.html

Could you test whether base::summary still works? I think we would prefer 
omitting `...` for all our `summary` functions, but we need to make sure doing 
so doesn't mask it
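A quick interactive check along these lines (hypothetical, assuming SparkR's S4 `summary` methods are already defined in the session):

```r
# base::summary on ordinary R objects should still dispatch normally after
# SparkR registers its S4 summary methods.
df <- data.frame(x = 1:10)
summary(df)                    # expect the usual base::summary table
summary(lm(x ~ 1, data = df))  # S3 dispatch to summary.lm should be intact
```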





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-13 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74692979
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3184,6 +3200,7 @@ setMethod("histogram",
 #' @param x A SparkDataFrame
 #' @param url JDBC database url of the form `jdbc:subprotocol:subname`
 #' @param tableName The name of the table in the external database
+#' @param ... additional JDBC database connection propertie(s).
--- End diff --

The singular form is "property" and the plural is "properties", so the `()` 
suffix doesn't really work in this case. Let's just leave it as "properties".





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-10 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74350887
  
--- Diff: R/pkg/R/generics.R ---
@@ -1046,8 +1084,8 @@ setGeneric("ntile", function(x) { 
standardGeneric("ntile") })
 #' @export
 setGeneric("n_distinct", function(x, ...) { standardGeneric("n_distinct") 
})
 
-#' @rdname percent_rank
-#' @export
+# @rdname percent_rank
--- End diff --

Yeah, that works if we also set the argument list of the generic counterpart 
to `...`. Then we still have to document the `...` part. We could tell the user 
something like "this should be left empty", but I'm not sure whether that would 
cause confusion.





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-10 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74349926
  
--- Diff: R/pkg/R/mllib.R ---
@@ -57,6 +57,9 @@ setClass("KMeansModel", representation(jobj = "jobj"))
 #'
 #' Saves the MLlib model to the input path. For more information, see the 
specific
 #' MLlib model below.
+#' @param object a fitted ML model object.
+#' @param path the directory where the model is saved.
+#' @param ... additional argument(s) passed to the method.
--- End diff --

Thanks for the suggestion. I'll move those above the definition of generic 
functions.





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-10 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74349451
  
--- Diff: R/pkg/R/mllib.R ---
@@ -414,11 +425,12 @@ setMethod("predict", signature(object = 
"KMeansModel"),
 #' predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
 #' Only categorical data is supported.
 #'
-#' @param data A \code{SparkDataFrame} of observations and labels for 
model fitting
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
+#' @param data a \code{SparkDataFrame} of observations and labels for 
model fitting.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
 #'   operators are supported, including '~', '.', ':', '+', 
and '-'.
-#' @param smoothing Smoothing parameter
-#' @return \code{spark.naiveBayes} returns a fitted naive Bayes model
+#' @param smoothing smoothing parameter.
+#' @param ... additional parameter(s) passed to the method.
--- End diff --

I think that is for the optional `smoothing` parameter? If we put it in the 
signature, does it mean that when we call the function, that field would be 
required from the user?
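For reference, an argument given a default value in the method's signature stays optional for the caller, so it would not become required. A minimal sketch with hypothetical names (`fitNB` is illustrative, not the actual SparkR API):

```r
# The generic keeps `...`; the method names the optional parameter with a
# default, so callers may omit it.
setGeneric("fitNB", function(data, formula, ...) { standardGeneric("fitNB") })

setMethod("fitNB", signature(data = "data.frame", formula = "formula"),
          function(data, formula, smoothing = 1.0) {
            smoothing
          })

fitNB(iris, Species ~ ., smoothing = 0.5)  # explicit value
fitNB(iris, Species ~ .)                   # falls back to the default 1.0
```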





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-10 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74316468
  
--- Diff: R/pkg/R/generics.R ---
@@ -1046,8 +1084,8 @@ setGeneric("ntile", function(x) { 
standardGeneric("ntile") })
 #' @export
 setGeneric("n_distinct", function(x, ...) { standardGeneric("n_distinct") 
})
 
-#' @rdname percent_rank
-#' @export
+# @rdname percent_rank
--- End diff --

try changing the S4 signature from `(x = missing)` to `(missing)` - i think 
that should eliminate the need for a dummy parameter.
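A sketch of that suggestion, using the `percent_rank` example from the diff; the method body is a placeholder rather than the real JVM call:

```r
setGeneric("percent_rank",
           function(x = "missing") { standardGeneric("percent_rank") })

# Dispatching on the "missing" class means the user never supplies x,
# so no dummy parameter needs to be documented.
setMethod("percent_rank", signature(x = "missing"),
          function(x) {
            # placeholder; the real method would call into the JVM side
            "percent_rank()"
          })

percent_rank()  # dispatches on the missing argument
```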





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-10 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74316333
  
--- Diff: R/pkg/R/mllib.R ---
@@ -57,6 +57,9 @@ setClass("KMeansModel", representation(jobj = "jobj"))
 #'
 #' Saves the MLlib model to the input path. For more information, see the 
specific
 #' MLlib model below.
+#' @param object a fitted ML model object.
+#' @param path the directory where the model is saved.
+#' @param ... additional argument(s) passed to the method.
--- End diff --

this is more of a placeholder page to direct to other pages. there's no 
function signature on this output page and so I don't think we should add 
parameter here either.





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-10 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74316168
  
--- Diff: R/pkg/R/mllib.R ---
@@ -414,11 +425,12 @@ setMethod("predict", signature(object = 
"KMeansModel"),
 #' predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
 #' Only categorical data is supported.
 #'
-#' @param data A \code{SparkDataFrame} of observations and labels for 
model fitting
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
+#' @param data a \code{SparkDataFrame} of observations and labels for 
model fitting.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
 #'   operators are supported, including '~', '.', ':', '+', 
and '-'.
-#' @param smoothing Smoothing parameter
-#' @return \code{spark.naiveBayes} returns a fitted naive Bayes model
+#' @param smoothing smoothing parameter.
+#' @param ... additional parameter(s) passed to the method.
--- End diff --

It is, but what I mean is that there doesn't seem to be a reason for it to be 
in the generic. This is a very Spark-specific function name, so it doesn't need 
`...` for compatibility or anything like that, does it?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74178308
  
--- Diff: .gitignore ---
@@ -77,3 +77,8 @@ spark-warehouse/
 # For R session data
 .RData
 .RHistory
+.Rhistory
--- End diff --

Hmm... Is the upper/lower case difference caused by different R versions or 
platforms?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74177336
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -257,23 +257,24 @@ createDataFrame.default <- function(data, schema = 
NULL, samplingRatio = 1.0) {
 }
 
 createDataFrame <- function(x, ...) {
-  dispatchFunc("createDataFrame(data, schema = NULL, samplingRatio = 
1.0)", x, ...)
+  dispatchFunc("createDataFrame(data, schema = NULL)", x, ...)
 }
 
 #' @rdname createDataFrame
 #' @aliases createDataFrame
 #' @export
 #' @method as.DataFrame default
 #' @note as.DataFrame since 1.6.0
-as.DataFrame.default <- function(data, schema = NULL, samplingRatio = 1.0) 
{
--- End diff --

Good catch. I did not realize that. Thanks!





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74176995
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1146,7 +1147,7 @@ setMethod("head",
 
 #' Return the first row of a SparkDataFrame
 #'
-#' @param x A SparkDataFrame
--- End diff --

I don't have a good answer yet. When I tried to move the param there, it 
seems that the generic functions for the RDD actions and transformations are 
not exposed. Do you know the specific reason for that, by chance?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74176280
  
--- Diff: R/pkg/R/functions.R ---
@@ -1273,12 +1267,14 @@ setMethod("round",
 #' bround
 #'
 #' Returns the value of the column `e` rounded to `scale` decimal places 
using HALF_EVEN rounding
-#' mode if `scale` >= 0 or at integral part when `scale` < 0.
+#' mode if `scale` >= 0 or at integer part when `scale` < 0.
 #' Also known as Gaussian rounding or bankers' rounding that rounds to the 
nearest even number.
 #' bround(2.5, 0) = 2, bround(3.5, 0) = 4.
 #'
 #' @param x Column to compute on.
-#'
+#' @param scale round to `scale` digits to the right of the decimal point 
when `scale` > 0,
--- End diff --

They are actually talking about the same thing. In the description, I think 
it says when `scale` < 0, it rounds on the integer part?
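Base R's round() illustrates the same two behaviors being described here (half-even rounding, and negative digits rounding on the integer part), so the two wordings do agree:

```r
round(2.5, 0)    # 2 -- halves round to the nearest even number (HALF_EVEN)
round(3.5, 0)    # 4
round(1250, -2)  # 1200 -- digits < 0 rounds on the integer part
```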





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74172127
  
--- Diff: R/pkg/R/functions.R ---
@@ -3033,6 +3033,9 @@ setMethod("when", signature(condition = "Column", 
value = "ANY"),
 #' Evaluates a list of conditions and returns \code{yes} if the conditions 
are satisfied.
 #' Otherwise \code{no} is returned for unmatched conditions.
 #'
+#' @param test a Column expression that describes the condition.
+#' @param yes return values for true elements of test.
--- End diff --

Good catch, thanks!





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74172250
  
--- Diff: R/pkg/R/functions.R ---
@@ -2654,6 +2647,9 @@ setMethod("expr", signature(x = "character"),
 #'
 #' Formats the arguments in printf-style and returns the result as a 
string column.
 #'
+#' @param format a character object of format strings.
+#' @param x a Column object.
+#' @param ... additional columns.
--- End diff --

Done. Thanks!





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74172111
  
--- Diff: R/pkg/R/generics.R ---
@@ -1022,6 +1059,7 @@ setGeneric("month", function(x) { 
standardGeneric("month") })
 #' @export
 setGeneric("months_between", function(y, x) { 
standardGeneric("months_between") })
 
+#' @param x a SparkDataFrame or a Column object.
 #' @rdname nrow
--- End diff --

Done.





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74171391
  
--- Diff: R/pkg/R/mllib.R ---
@@ -57,6 +57,9 @@ setClass("KMeansModel", representation(jobj = "jobj"))
 #'
 #' Saves the MLlib model to the input path. For more information, see the 
specific
 #' MLlib model below.
+#' @param object a fitted ML model object.
+#' @param path the directory where the model is saved.
+#' @param ... additional argument(s) passed to the method.
--- End diff --

It doesn't complain about this, though it's a little weird.





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74171283
  
--- Diff: R/pkg/R/generics.R ---
@@ -1046,8 +1084,8 @@ setGeneric("ntile", function(x) { 
standardGeneric("ntile") })
 #' @export
 setGeneric("n_distinct", function(x, ...) { standardGeneric("n_distinct") 
})
 
-#' @rdname percent_rank
-#' @export
+# @rdname percent_rank
--- End diff --

This is perhaps not the best way, but I did this because otherwise the 
generated doc will show `percent_rank(x)`. Then we have to document `x`, which 
could be confusing to the user since this argument should be left missing in 
the actual function call.





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74170465
  
--- Diff: R/pkg/R/mllib.R ---
@@ -82,15 +87,16 @@ NULL
 #' Users can call \code{summary} to print a summary of the fitted model, 
\code{predict} to make
 #' predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
 #'
-#' @param data SparkDataFrame for training.
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
 #'operators are supported, including '~', '.', ':', '+', 
and '-'.
-#' @param family A description of the error distribution and link function 
to be used in the model.
+#' @param family a description of the error distribution and link function 
to be used in the model.
 #'   This can be a character string naming a family function, 
a family function or
 #'   the result of a call to a family function. Refer R family 
at
 #'   
\url{https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html}.
-#' @param tol Positive convergence tolerance of iterations.
-#' @param maxIter Integer giving the maximal number of IRLS iterations.
+#' @param tol positive convergence tolerance of iterations.
+#' @param maxIter integer giving the maximal number of IRLS iterations.
+#' @param ... additional arguments passed to the method.
--- End diff --

same as above





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74170438
  
--- Diff: R/pkg/R/mllib.R ---
@@ -298,14 +304,15 @@ setMethod("summary", signature(object = 
"NaiveBayesModel"),
 #' Users can call \code{summary} to print a summary of the fitted model, 
\code{predict} to make
 #' predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
 #'
-#' @param data SparkDataFrame for training
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
 #'operators are supported, including '~', '.', ':', '+', 
and '-'.
 #'Note that the response variable of formula is empty in 
spark.kmeans.
-#' @param k Number of centers
-#' @param maxIter Maximum iteration number
-#' @param initMode The initialization algorithm choosen to fit the model
-#' @return \code{spark.kmeans} returns a fitted k-means model
+#' @param ... additional argument(s) passed to the method.
--- End diff --

This is actually for the `...` argument in the generic function... 





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74170348
  
--- Diff: R/pkg/R/mllib.R ---
@@ -414,11 +425,12 @@ setMethod("predict", signature(object = 
"KMeansModel"),
 #' predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
 #' Only categorical data is supported.
 #'
-#' @param data A \code{SparkDataFrame} of observations and labels for 
model fitting
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
+#' @param data a \code{SparkDataFrame} of observations and labels for 
model fitting.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
 #'   operators are supported, including '~', '.', ':', '+', 
and '-'.
-#' @param smoothing Smoothing parameter
-#' @return \code{spark.naiveBayes} returns a fitted naive Bayes model
+#' @param smoothing smoothing parameter.
+#' @param ... additional parameter(s) passed to the method.
--- End diff --

Done. Thanks!





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74170002
  
--- Diff: R/pkg/R/mllib.R ---
@@ -563,11 +574,12 @@ read.ml <- function(path) {
 #' \code{predict} to make predictions on new data, and 
\code{write.ml}/\code{read.ml} to
 #' save/load fitted models.
 #'
-#' @param data A SparkDataFrame for training
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
 #'operators are supported, including '~', ':', '+', and 
'-'.
 #'Note that operator '.' is not supported currently
-#' @return \code{spark.survreg} returns a fitted AFT survival regression 
model
+#' @param ... additional argument(s) passed to the method.
--- End diff --

I noticed those and thought about leaving space for potential arguments in 
the future... but I guess a cleaner solution would be, as you said, to remove 
these arguments and add them back if needed in the future.





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74169009
  
--- Diff: R/pkg/R/generics.R ---
@@ -465,10 +477,14 @@ setGeneric("dapply", function(x, func, schema) { 
standardGeneric("dapply") })
 #' @export
 setGeneric("dapplyCollect", function(x, func) { 
standardGeneric("dapplyCollect") })
 
+#' @param x a SparkDataFrame or GroupedData.
--- End diff --

I think there is also one for 
[SparkDataFrame](https://github.com/apache/spark/pull/14558/files#diff-508641a8bd6c6b59f3e77c80cdcfa6a9L1401)





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74167878
  
--- Diff: R/pkg/R/generics.R ---
@@ -395,6 +396,9 @@ setGeneric("value", function(bcast) { 
standardGeneric("value") })
 
   SparkDataFrame Methods 
 
+#' @param x a SparkDataFrame or GroupedData.
--- End diff --

I was thinking that if, in the future, we want to add other methods associated 
with that generic, wouldn't it be more acceptable to change the doc of the 
generic than the doc of another associated method (seemingly parallel to the 
added ones)?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74146966
  
--- Diff: .gitignore ---
@@ -77,3 +77,8 @@ spark-warehouse/
 # For R session data
 .RData
 .RHistory
+.Rhistory
--- End diff --

Duplicate line. Can these be pushed down into a `.gitignore` in the 
subdirectory? This file is a mess.





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74105069
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -177,11 +176,10 @@ setMethod("isLocal",
 #'
 #' Print the first numRows rows of a SparkDataFrame
 #'
-#' @param x A SparkDataFrame
-#' @param numRows The number of rows to print. Defaults to 20.
-#' @param truncate Whether truncate long strings. If true, strings more 
than 20 characters will be
-#' truncated and all cells will be aligned right
-#'
+#' @param numRows the number of rows to print. Defaults to 20.
+#' @param truncate whether truncate long strings. If true, strings more 
than 20 characters will be
--- End diff --

true -> TRUE





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74104886
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1146,7 +1147,7 @@ setMethod("head",
 
 #' Return the first row of a SparkDataFrame
 #'
-#' @param x A SparkDataFrame
--- End diff --

I think similar to what you have for other functions, this could go to 
generic.R - do you have any other idea how to have functions working with 
multiple class documented?






[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74104522
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2845,8 +2844,11 @@ setMethod("fillna",
 #' Since data.frames are held in memory, ensure that you have enough memory
 #' in your system to accommodate the contents.
 #'
-#' @param x a SparkDataFrame
-#' @return a data.frame
+#' @param x a SparkDataFrame.
+#' @param row.names NULL or a character vector giving the row names for 
the data frame.
+#' @param optional If `TRUE`, converting column names is optional.
+#' @param ... additional arguments passed to the method.
--- End diff --

in this case "additional arguments to pass to base::as.data.frame"
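For illustration only (not from the PR), the roxygen line following this 
suggestion might read:

```r
#' @param ... additional arguments to pass to \code{base::as.data.frame}.
```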






[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74104039
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3049,8 +3050,8 @@ setMethod("drop",
 #'
 #' @name histogram
 #' @param nbins the number of bins (optional). Default value is 10.
+#' @param col the column (described by character or Column object) to 
build the histogram from.
--- End diff --

we generally don't say "Column object" - the object system in R is a bit 
different, and I think "S4 class" would make more sense here,
but I'd suggest simplifying it to:
"@param col the column as a character string or a Column to build the 
histogram from"






[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74103250
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3184,6 +3185,7 @@ setMethod("histogram",
 #' @param x A SparkDataFrame
 #' @param url JDBC database url of the form `jdbc:subprotocol:subname`
 #' @param tableName The name of the table in the external database
+#' @param ... additional argument(s) passed to the method
--- End diff --

ditto, something like "additional JDBC database connection properties"





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74103028
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -257,23 +257,24 @@ createDataFrame.default <- function(data, schema = 
NULL, samplingRatio = 1.0) {
 }
 
 createDataFrame <- function(x, ...) {
-  dispatchFunc("createDataFrame(data, schema = NULL, samplingRatio = 
1.0)", x, ...)
+  dispatchFunc("createDataFrame(data, schema = NULL)", x, ...)
 }
 
 #' @rdname createDataFrame
 #' @aliases createDataFrame
 #' @export
 #' @method as.DataFrame default
 #' @note as.DataFrame since 1.6.0
-as.DataFrame.default <- function(data, schema = NULL, samplingRatio = 1.0) 
{
--- End diff --

the `.default` methods are here for backward compatibility 
(please see SPARK-16693/PR #14330).
I don't think we should change the signature - it would break 
existing users. Perhaps just add the @param and say "Currently not used"?
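A sketch of that approach (the body shown is illustrative, not the actual 
implementation): keep the old signature for backward compatibility and 
document the retired argument as unused:

```r
#' @param samplingRatio currently not used.
#' @method as.DataFrame default
as.DataFrame.default <- function(data, schema = NULL, samplingRatio = 1.0) {
  # samplingRatio is kept in the signature only so existing callers do not break
  createDataFrame(data, schema)
}
```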





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74102403
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -727,6 +729,7 @@ dropTempView <- function(viewName) {
 #' @param source The name of external data source
 #' @param schema The data schema defined in structType
 #' @param na.strings Default string value for NA when source is "csv"
+#' @param ... additional argument(s) passed to the method.
--- End diff --

something like:
"additional external data source specific named properties"?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74102227
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -840,6 +844,7 @@ createExternalTable <- function(x, ...) {
 #'  clause expressions used to split the column 
`partitionColumn` evenly.
 #'  This defaults to SparkContext.defaultParallelism 
when unset.
 #' @param predicates a list of conditions in the where clause; each one 
defines one partition
+#' @param ... additional argument(s) passed to the method.
--- End diff --

in this case I'd reference the same as in L829, something like:
" additional JDBC database connection named properties"





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74022313
  
--- Diff: R/pkg/R/functions.R ---
@@ -1273,12 +1267,14 @@ setMethod("round",
 #' bround
 #'
 #' Returns the value of the column `e` rounded to `scale` decimal places 
using HALF_EVEN rounding
-#' mode if `scale` >= 0 or at integral part when `scale` < 0.
+#' mode if `scale` >= 0 or at integer part when `scale` < 0.
 #' Also known as Gaussian rounding or bankers' rounding that rounds to the 
nearest even number.
 #' bround(2.5, 0) = 2, bround(3.5, 0) = 4.
 #'
 #' @param x Column to compute on.
-#'
+#' @param scale round to `scale` digits to the right of the decimal point 
when `scale` > 0,
--- End diff --

it seems this is duplicating L1270 and might be confusing, since they seem 
to describe different behavior?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74021040
  
--- Diff: R/pkg/R/functions.R ---
@@ -1560,7 +1556,8 @@ setMethod("stddev_samp",
 #'
 #' Creates a new struct column that composes multiple input columns.
 #'
-#' @param x Column to compute on.
+#' @param x a column to compute on.
+#' @param ... additional column(s) to be included.
--- End diff --

optional?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74020775
  
--- Diff: R/pkg/R/functions.R ---
@@ -2654,6 +2647,9 @@ setMethod("expr", signature(x = "character"),
 #'
 #' Formats the arguments in printf-style and returns the result as a 
string column.
 #'
+#' @param format a character object of format strings.
+#' @param x a Column object.
+#' @param ... additional columns.
--- End diff --

Let's keep type in capital case? `Column` or `Columns`





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74017865
  
--- Diff: R/pkg/R/functions.R ---
@@ -3033,6 +3033,9 @@ setMethod("when", signature(condition = "Column", 
value = "ANY"),
 #' Evaluates a list of conditions and returns \code{yes} if the conditions 
are satisfied.
 #' Otherwise \code{no} is returned for unmatched conditions.
 #'
+#' @param test a Column expression that describes the condition.
+#' @param yes return values for true elements of test.
--- End diff --

true -> TRUE
false -> FALSE?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74017287
  
--- Diff: R/pkg/R/generics.R ---
@@ -1022,6 +1059,7 @@ setGeneric("month", function(x) { 
standardGeneric("month") })
 #' @export
 setGeneric("months_between", function(y, x) { 
standardGeneric("months_between") })
 
+#' @param x a SparkDataFrame or a Column object.
 #' @rdname nrow
--- End diff --

here, for the "n" column function, it shouldn't be under rdname nrow, which 
is for count of a DataFrame - I'd change this to a new rdname, `count`, and 
put "count" and "n" under that.
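A sketch of the suggested grouping in generics.R (the generic signatures 
here are assumptions, not copied from the PR):

```r
#' @rdname count
#' @export
setGeneric("count", function(x) { standardGeneric("count") })

#' @rdname count
#' @export
setGeneric("n", function(x) { standardGeneric("n") })
```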





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74016459
  
--- Diff: R/pkg/R/generics.R ---
@@ -1091,8 +1129,8 @@ setGeneric("reverse", function(x) { 
standardGeneric("reverse") })
 #' @export
 setGeneric("rint", function(x, ...) { standardGeneric("rint") })
 
-#' @rdname row_number
-#' @export
+# @rdname row_number
+# @export
--- End diff --

`#'` changed to `#`?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74016421
  
--- Diff: R/pkg/R/generics.R ---
@@ -1046,8 +1084,8 @@ setGeneric("ntile", function(x) { 
standardGeneric("ntile") })
 #' @export
 setGeneric("n_distinct", function(x, ...) { standardGeneric("n_distinct") 
})
 
-#' @rdname percent_rank
-#' @export
+# @rdname percent_rank
--- End diff --

`#'` changed to `#`?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74016338
  
--- Diff: R/pkg/R/mllib.R ---
@@ -57,6 +57,9 @@ setClass("KMeansModel", representation(jobj = "jobj"))
 #'
 #' Saves the MLlib model to the input path. For more information, see the 
specific
 #' MLlib model below.
+#' @param object a fitted ML model object.
+#' @param path the directory where the model is saved.
+#' @param ... additional argument(s) passed to the method.
--- End diff --

does it complain about this? This rd does not have a function signature, so 
it shouldn't ask to document parameters?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74016268
  
--- Diff: R/pkg/R/mllib.R ---
@@ -69,6 +72,8 @@ NULL
 #'
 #' Makes predictions from a MLlib model. For more information, see the 
specific
 #' MLlib model below.
+#' @param object a fitted ML model object.
+#' @param ... additional argument(s) passed to the method.
--- End diff --

does it complain about this? This rd does not have a function signature, so 
it shouldn't ask to document parameters?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74016085
  
--- Diff: R/pkg/R/mllib.R ---
@@ -82,15 +87,16 @@ NULL
 #' Users can call \code{summary} to print a summary of the fitted model, 
\code{predict} to make
 #' predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
 #'
-#' @param data SparkDataFrame for training.
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
 #'operators are supported, including '~', '.', ':', '+', 
and '-'.
-#' @param family A description of the error distribution and link function 
to be used in the model.
+#' @param family a description of the error distribution and link function 
to be used in the model.
 #'   This can be a character string naming a family function, 
a family function or
 #'   the result of a call to a family function. Refer R family 
at
 #'   
\url{https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html}.
-#' @param tol Positive convergence tolerance of iterations.
-#' @param maxIter Integer giving the maximal number of IRLS iterations.
+#' @param tol positive convergence tolerance of iterations.
+#' @param maxIter integer giving the maximal number of IRLS iterations.
+#' @param ... additional arguments passed to the method.
--- End diff --

there is no `...` here in the signature?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74015938
  
--- Diff: R/pkg/R/mllib.R ---
@@ -298,14 +304,15 @@ setMethod("summary", signature(object = 
"NaiveBayesModel"),
 #' Users can call \code{summary} to print a summary of the fitted model, 
\code{predict} to make
 #' predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
 #'
-#' @param data SparkDataFrame for training
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
 #'operators are supported, including '~', '.', ':', '+', 
and '-'.
 #'Note that the response variable of formula is empty in 
spark.kmeans.
-#' @param k Number of centers
-#' @param maxIter Maximum iteration number
-#' @param initMode The initialization algorithm choosen to fit the model
-#' @return \code{spark.kmeans} returns a fitted k-means model
+#' @param ... additional argument(s) passed to the method.
--- End diff --

there is no `...` here?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74015800
  
--- Diff: R/pkg/R/mllib.R ---
@@ -346,8 +353,11 @@ setMethod("spark.kmeans", signature(data = 
"SparkDataFrame", formula = "formula"
 #' Get fitted result from a k-means model, similarly to R's fitted().
 #' Note: A saved-loaded model does not support this method.
 #'
-#' @param object A fitted k-means model
-#' @return \code{fitted} returns a SparkDataFrame containing fitted values
+#' @param object a fitted k-means model.
+#' @param method type of fitted results, `"centers"` for cluster centers
--- End diff --

I wouldn't put it in both ` and " - roxygen2 doesn't really handle `
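Using roxygen's \code{} markup instead of backticks, the line might look 
like this (a sketch; the "classes" alternative is an assumption):

```r
#' @param method type of fitted results, \code{"centers"} for cluster centers
#'        or \code{"classes"} for assigned classes.
```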





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74014809
  
--- Diff: R/pkg/R/mllib.R ---
@@ -563,11 +574,12 @@ read.ml <- function(path) {
 #' \code{predict} to make predictions on new data, and 
\code{write.ml}/\code{read.ml} to
 #' save/load fitted models.
 #'
-#' @param data A SparkDataFrame for training
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
 #'operators are supported, including '~', ':', '+', and 
'-'.
 #'Note that operator '.' is not supported currently
-#' @return \code{spark.survreg} returns a fitted AFT survival regression 
model
+#' @param ... additional argument(s) passed to the method.
--- End diff --

or document as `Currently not used.` like 
http://ugrad.stat.ubc.ca/R/library/e1071/html/predict.naiveBayes.html





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74014676
  
--- Diff: R/pkg/R/mllib.R ---
@@ -414,11 +425,12 @@ setMethod("predict", signature(object = 
"KMeansModel"),
 #' predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
 #' Only categorical data is supported.
 #'
-#' @param data A \code{SparkDataFrame} of observations and labels for 
model fitting
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
+#' @param data a \code{SparkDataFrame} of observations and labels for 
model fitting.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
 #'   operators are supported, including '~', '.', ':', '+', 
and '-'.
-#' @param smoothing Smoothing parameter
-#' @return \code{spark.naiveBayes} returns a fitted naive Bayes model
+#' @param smoothing smoothing parameter.
+#' @param ... additional parameter(s) passed to the method.
--- End diff --

same here - `...` are unused





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74014544
  
--- Diff: R/pkg/R/mllib.R ---
@@ -563,11 +574,12 @@ read.ml <- function(path) {
 #' \code{predict} to make predictions on new data, and 
\code{write.ml}/\code{read.ml} to
 #' save/load fitted models.
 #'
-#' @param data A SparkDataFrame for training
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
 #'operators are supported, including '~', ':', '+', and 
'-'.
 #'Note that operator '.' is not supported currently
-#' @return \code{spark.survreg} returns a fitted AFT survival regression 
model
+#' @param ... additional argument(s) passed to the method.
--- End diff --

there are a few cases where it is not clear why `...` should be in the 
function signature. I think we should remove it since it is not used





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74013650
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -328,6 +328,7 @@ sparkRHive.init <- function(jsc = NULL) {
 #' @param sparkPackages Character vector of packages from 
spark-packages.org
 #' @param enableHiveSupport Enable support for Hive, fallback if not built 
with Hive support; once
 #'set, this cannot be turned off on an existing session
+#' @param ... additional parameters passed to the method
--- End diff --

I'd clarify it as in L317, for example, "named Spark properties passed to 
the method"





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74013042
  
--- Diff: R/pkg/R/generics.R ---
@@ -465,10 +477,14 @@ setGeneric("dapply", function(x, func, schema) { 
standardGeneric("dapply") })
 #' @export
 setGeneric("dapplyCollect", function(x, func) { 
standardGeneric("dapplyCollect") })
 
+#' @param x a SparkDataFrame or GroupedData.
+#' @param ... additional argument(s) passed to the method.
 #' @rdname gapply
 #' @export
 setGeneric("gapply", function(x, ...) { standardGeneric("gapply") })
 
+#' @param x a SparkDataFrame or GroupedData.
--- End diff --

same here for gapplyCollect





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74013005
  
--- Diff: R/pkg/R/generics.R ---
@@ -465,10 +477,14 @@ setGeneric("dapply", function(x, func, schema) { 
standardGeneric("dapply") })
 #' @export
 setGeneric("dapplyCollect", function(x, func) { 
standardGeneric("dapplyCollect") })
 
+#' @param x a SparkDataFrame or GroupedData.
--- End diff --

gapply is only for GroupedData?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74012847
  
--- Diff: R/pkg/R/generics.R ---
@@ -395,6 +396,9 @@ setGeneric("value", function(bcast) { 
standardGeneric("value") })
 
   SparkDataFrame Methods 
 
+#' @param x a SparkDataFrame or GroupedData.
--- End diff --

Hmm... I see why this would be a place for it. I think it would be easier to 
maintain if the documentation sat next to the function body instead of the 
generics, but I haven't completely figured out the best way to do that yet.

The approach we have taken so far is to keep most of the tags/docs on one of 
the definitions; do you think that would work here?





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread junyangq
GitHub user junyangq opened a pull request:

https://github.com/apache/spark/pull/14558

[SPARK-16508][SparkR] Fix warnings on undocumented/duplicated arguments by 
CRAN-check

## What changes were proposed in this pull request?

This PR fixes the "undocumented/duplicated arguments" warnings reported by 
the CRAN check.

Most of the warnings have been resolved, with only a few left that will be 
handled soon.

## How was this patch tested?

R unit tests and the check-cran.sh script.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/junyangq/spark SPARK-16508-branch-2.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14558.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14558


commit 82e2f09517e9f3d726af0046d251748f892f59c8
Author: Junyang Qian 
Date:   2016-08-09T04:52:34Z

Fix part of undocumented/duplicated arguments warnings by CRAN-check



