[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...
Github user junyangq closed the pull request at: https://github.com/apache/spark/pull/14558 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...
GitHub user junyangq reopened a pull request: https://github.com/apache/spark/pull/14558

[SPARK-16508][SparkR] Fix warnings on undocumented/duplicated arguments by CRAN-check

## What changes were proposed in this pull request?

This PR tries to fix all the remaining "undocumented/duplicated arguments" warnings reported by CRAN-check.

## How was this patch tested?

R unit tests and the check-cran.sh script.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/junyangq/spark SPARK-16508-branch-2.0

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14558.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14558

commit 82e2f09517e9f3d726af0046d251748f892f59c8
Author: Junyang Qian
Date: 2016-08-09T04:52:34Z

Fix part of undocumented/duplicated arguments warnings by CRAN-check

commit 41d9dcac3e1d3ee1a27fe094ebb60c1c18d6bcff
Author: Mariusz Strzelecki
Date: 2016-08-09T16:44:43Z

[SPARK-16950] [PYSPARK] fromOffsets parameter support in KafkaUtils.createDirectStream for python3

## What changes were proposed in this pull request?

Enables KafkaUtils.createDirectStream with starting offsets in Python 3 by using java.lang.Number instead of Long during param mapping in the Scala helper. This allows py4j to pass Integer or Long to the map and resolves ClassCastException problems.

## How was this patch tested?

Unit tests.

jerryshao - could you please look at this PR?

Author: Mariusz Strzelecki

Closes #14540 from szczeles/kafka_pyspark.

(cherry picked from commit 29081b587f3423bf5a3e0066357884d0c26a04bf)
Signed-off-by: Davies Liu

commit 44115e90ef2a80d8ecf3965b97ce7bee21e29158
Author: Josh Rosen
Date: 2016-08-09T18:21:45Z

[SPARK-16956] Make ApplicationState.MAX_NUM_RETRY configurable

## What changes were proposed in this pull request?

This patch introduces a new configuration, `spark.deploy.maxExecutorRetries`, to let users configure an obscure behavior in the standalone master where the master will kill Spark applications that have experienced too many back-to-back executor failures. The current setting is a hardcoded constant (10); this patch replaces that with a new cluster-wide configuration.

**Background:** This application-killing was added in 6b5980da796e0204a7735a31fb454f312bc9daac (from September 2012), and I believe it was designed to prevent a faulty application whose executors could never launch from DoS'ing the Spark cluster via an infinite series of executor launch attempts. In a subsequent patch (#1360), this feature was refined so that applications which have running executors are not killed by this code path.

**Motivation for making this configurable:** Previously, if a Spark Standalone application experienced more than `ApplicationState.MAX_NUM_RETRY` executor failures and was left with no executors running, the Spark master would kill that application. This behavior is problematic in environments where the Spark executors run on unstable infrastructure and can all die simultaneously. For instance, if your Spark driver runs on an on-demand EC2 instance while all workers run on ephemeral spot instances, it is possible for all executors to die at the same time while the driver stays alive. In this case, it may be desirable to keep the Spark application alive so that it can recover once new workers and executors are available. To accommodate this use case, this patch modifies the Master to never kill faulty applications if `spark.deploy.maxExecutorRetries` is negative. I'd like to merge this patch into master, branch-2.0, and branch-1.6.

## How was this patch tested?

I tested this manually using `spark-shell` and `local-cluster` mode. This is a tricky feature to unit test, and historically this code has not changed very often, so I'd prefer to skip the additional effort of adding a testing framework and rely on manual tests and review for now.

Author: Josh Rosen

Closes #14544 from JoshRosen/add-setting-for-max-executor-failures.

(cherry picked from commit b89b3a5c8e391fcaebe7ef3c77ef16bb9431d6ab)
Signed-off-by: Josh Rosen

commit 2d136dba415a40a04598068ac2cea0490a6fd091
Author: Davies Liu
Date: 2016-08-09T17:04:36Z

[SPARK-16905] SQL DDL: MSCK REPAIR TABLE

MSCK REPAIR TABLE could be used to recover the partitions in external catalog based on partitions in
[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...
Github user junyangq closed the pull request at: https://github.com/apache/spark/pull/14558
[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r75215338

--- Diff: R/pkg/R/generics.R ---
@@ -1251,10 +1311,57 @@ setGeneric("year", function(x) { standardGeneric("year") })
#' @export
setGeneric("spark.glm", function(data, formula, ...) { standardGeneric("spark.glm") })
+#' @param formula a symbolic description of the model to be fitted. If \code{data} is a
+#'        SparkDataFrame, currently only a few formula operators are supported,
+#'        including '~', '.', ':', '+', and '-'.
+#' @param data a SparkDataFrame or (R glm) data.frame, list or environment for training.
+#' @param family a description of the error distribution and link function to be used in the model.
+#'        This can be a character string naming a family function, a family function or
+#'        the result of a call to a family function. Refer R family at
+#'        \url{https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html}.
+#' @param epsilon positive convergence tolerance of iterations.
+#' @param maxit integer giving the maximal number of IRLS iterations.
+#' @param weights an optional vector of 'prior weights' to be used in the fitting process.
--- End diff --

This is from R's own `glm` doc. Why do we need to copy it here?
[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r75182227

--- Diff: .gitignore ---
@@ -77,3 +77,8 @@ spark-warehouse/
# For R session data
.RData
.RHistory
+.Rhistory
--- End diff --

+1. This is not part of this PR.
[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r75123714

--- Diff: R/pkg/R/generics.R ---
@@ -1251,10 +1311,57 @@ setGeneric("year", function(x) { standardGeneric("year") })
#' @export
setGeneric("spark.glm", function(data, formula, ...) { standardGeneric("spark.glm") })
+#' @param formula a symbolic description of the model to be fitted. If \code{data} is a
+#'        SparkDataFrame, currently only a few formula operators are supported,
+#'        including '~', '.', ':', '+', and '-'.
+#' @param data a SparkDataFrame or (R glm) data.frame, list or environment for training.
+#' @param family a description of the error distribution and link function to be used in the model.
+#'        This can be a character string naming a family function, a family function or
+#'        the result of a call to a family function. Refer R family at
+#'        \url{https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html}.
+#' @param epsilon positive convergence tolerance of iterations.
+#' @param maxit integer giving the maximal number of IRLS iterations.
+#' @param weights an optional vector of 'prior weights' to be used in the fitting process.
+#'        Should be NULL or a numeric vector.
+#' @param subset an optional vector specifying a subset of observations to be used in the
+#'        fitting process.
+#' @param na.action a function which indicates what should happen when the data contain NAs.
+#'        The default is set by the na.action setting of options, and is na.fail
+#'        if that is unset. The 'factory-fresh' default is na.omit. Another possible
+#'        value is NULL, no action. Value na.exclude can be useful.
+#' @param start starting values for the parameters in the linear predictor.
+#' @param etastart starting values for the linear predictor.
+#' @param mustart starting values for the vector of means.
+#' @param offset this can be used to specify an a priori known component to be included in
+#'        the linear predictor during fitting. This should be NULL or
+#'        a numeric vector of length equal to the number of cases. One or more offset
+#'        terms can be included in the formula instead or as well, and if more than
+#'        one is specified their sum is used. See model.offset.
+#' @param control a list of parameters for controlling the fitting process. For glm.fit
+#'        this is passed to glm.control.
+#' @param model a logical value indicating whether model frame should be included as
+#'        a component of the returned value.
+#' @param method the method to be used in fitting the model. The default method
+#'        "glm.fit" uses iteratively reweighted least squares (IWLS): the alternative
+#'        "model.frame" returns the model frame and does no fitting.
+#'        User-supplied fitting functions can be supplied either as a function or
+#'        a character string naming a function, with a function which takes the same
+#'        arguments as glm.fit. If specified as a character string it is looked up from
+#'        within the stats namespace.
+#' @param x,y logical values indicating whether the response vector and model matrix
+#'        used in the fitting process should be returned as components of the returned value.
+#' @param contrasts an optional list. See the contrasts.arg of model.matrix.default.
+#' @param ... arguments to be used to form the default control argument if it is
+#'        not supplied directly.
#' @rdname glm
+#' @details If \code{data} is a data.frame, list or environment, \code{glm} behaves the same as
+#'        \code{glm} in the \code{stats} package. If \code{data} is a SparkDataFrame,
+#'        \code{spark.glm} is called.
#' @export
setGeneric("glm")
+#' @param object a fitted ML model object.
+#' @param ... additional argument(s) passed to the method.
--- End diff --

sorry, you are right - as you can see, having it in 2 places is hard to follow :)
[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r75124591

--- Diff: R/pkg/R/DataFrame.R ---
@@ -510,9 +510,7 @@ setMethod("registerTempTable",
#'
#' Insert the contents of a SparkDataFrame into a table registered in the current SparkSession.
#'
-#' @param x A SparkDataFrame
-#' @param tableName A character vector containing the name of the table
--- End diff --

I agree, that is an unfortunate side effect. I wonder if there is a way to order this in the Rd generated by roxygen; maybe it would be a reasonable PR to change roxygen to special-case `...`. The only other solution I've seen would be to hand-craft a .Rd file... For now I think we should prioritize maintainability and keep the doc as close to the function as possible.
[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r75123349

--- Diff: R/pkg/R/mllib.R ---
@@ -602,14 +599,14 @@ setMethod("spark.survreg", signature(data = "SparkDataFrame", formula = "formula
# Returns a summary of the AFT survival regression model produced by spark.survreg,
# similarly to R's summary().
-#' @param object A fitted AFT survival regression model
+#' @param object a fitted AFT survival regression model.
#' @return \code{summary} returns a list containing the model's coefficients,
#'         intercept and log(scale)
#' @rdname spark.survreg
#' @export
#' @note summary(AFTSurvivalRegressionModel) since 2.0.0
setMethod("summary", signature(object = "AFTSurvivalRegressionModel"),
-          function(object, ...) {
+          function(object) {
--- End diff --

sorry, no, not the generic, just here. I think this is fine
[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r75025476

--- Diff: R/pkg/R/DataFrame.R ---
@@ -510,9 +510,7 @@ setMethod("registerTempTable",
#'
#' Insert the contents of a SparkDataFrame into a table registered in the current SparkSession.
#'
-#' @param x A SparkDataFrame
-#' @param tableName A character vector containing the name of the table
--- End diff --

Yes, I agree that ideally the doc should be kept near the function definition/body, and that's consistent with many other functions. So the only remaining issue is that `...` would come first in the argument list of the doc. I'm not sure whether that would bother users.
[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74974091

--- Diff: R/pkg/R/DataFrame.R ---
@@ -510,9 +510,7 @@ setMethod("registerTempTable",
#'
#' Insert the contents of a SparkDataFrame into a table registered in the current SparkSession.
#'
-#' @param x A SparkDataFrame
-#' @param tableName A character vector containing the name of the table
--- End diff --

I see. We typically have

```
setGeneric("foo", function(x, ...) { ... })

setMethod("foo", signature(x = "A"), function(x, a, b, moreParameters) { ... })
```

In many cases the setGeneric has the fewer number of parameters - whether because it has to match an existing generic, or because some parameters are not put in the signature with a type - so we have a parameter plus `...` in the generic. I see your point that if the parameter is in the generic then perhaps we should document it there with @param as well. I think we should try to keep it near the function definition/body though, because:

1) that's where the parameter is actually being used, and its meaning could change
2) it would be easier to review - we have a few changes out where the first parameter does not have a @param (it should help when we turn on check-cran in Jenkins)
3) many of our functions are like this, so if we were to move the @param for the first parameter to generics.R (whenever `...` is there) then there would be dozens or hundreds of lines changing (e.g. bround, contains, subset and many more)

The exception is where, as we've discussed, the generic applies to multiple function definitions with different classes, so the first parameter could be of different classes and needs a central place. What do you think?
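The generic/method pattern described in the comment above can be sketched as follows. This is a minimal standalone example, not SparkR code: the class `A` and the names `foo`, `a`, `b` are hypothetical stand-ins for a SparkDataFrame-like class and its method parameters.

```r
library(methods)

# Hypothetical class standing in for a SparkDataFrame-like object.
setClass("A", representation(v = "numeric"))

# The generic declares only `x` and `...`; roxygen docs for `x` can then
# live next to the method, where the parameter is actually used.
setGeneric("foo", function(x, ...) { standardGeneric("foo") })

# Because the generic carries `...`, the method may introduce extra named
# parameters (a, b) that are not part of the generic's signature.
setMethod("foo", signature(x = "A"),
          function(x, a = 1, b = 2) {
            x@v + a + b
          })
```

With this setup, `foo(new("A", v = 10))` uses the defaults, while callers can still pass the method-specific arguments by name, e.g. `foo(new("A", v = 10), a = 5)`.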
[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74874929

--- Diff: R/pkg/R/functions.R ---
@@ -1143,7 +1139,7 @@ setMethod("minute",
#' @export
#' @examples \dontrun{select(df, monotonically_increasing_id())}
setMethod("monotonically_increasing_id",
-          signature(x = "missing"),
+          signature(),
--- End diff --

Automatic generation of S4 methods is not desirable. I hope this case can be handled better by roxygen. For now, I agree that (b) is a good solution.
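For context on the `signature(x = "missing")` line being discussed: S4 can dispatch on the pseudo-class `"missing"`, which is how a zero-argument column function can still go through a generic. A minimal sketch, with a hypothetical generic name (`mono_id`) in place of `monotonically_increasing_id`:

```r
library(methods)

# The generic's only argument may be absent entirely.
setGeneric("mono_id", function(x) { standardGeneric("mono_id") })

# Dispatching on "missing" selects this method for the no-argument call.
setMethod("mono_id", signature(x = "missing"),
          function(x) {
            "column: monotonically_increasing_id()"
          })
```

Here `mono_id()` with no argument dispatches to the `"missing"` method, which is why replacing `signature(x = "missing")` with `signature()` changes which calls are matched.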
[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74874081

--- Diff: R/pkg/R/SQLContext.R ---
@@ -181,7 +181,7 @@ getDefaultSqlSource <- function() {
#' @method createDataFrame default
#' @note createDataFrame since 1.4.0
# TODO(davies): support sampling and infer type from NA
-createDataFrame.default <- function(data, schema = NULL, samplingRatio = 1.0) {
+createDataFrame.default <- function(data, schema = NULL) {
--- End diff --

Oh yes... Thanks!
[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74873867

--- Diff: R/pkg/R/mllib.R ---
@@ -298,14 +304,15 @@ setMethod("summary", signature(object = "NaiveBayesModel"),
#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make
#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
#'
-#' @param data SparkDataFrame for training
-#' @param formula A symbolic description of the model to be fitted. Currently only a few formula
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. Currently only a few formula
#'        operators are supported, including '~', '.', ':', '+', and '-'.
#'        Note that the response variable of formula is empty in spark.kmeans.
-#' @param k Number of centers
-#' @param maxIter Maximum iteration number
-#' @param initMode The initialization algorithm choosen to fit the model
-#' @return \code{spark.kmeans} returns a fitted k-means model
+#' @param ... additional argument(s) passed to the method.
--- End diff --

Yeah, agreed.
[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74729338

--- Diff: R/pkg/R/DataFrame.R ---
@@ -510,9 +510,7 @@ setMethod("registerTempTable",
#'
#' Insert the contents of a SparkDataFrame into a table registered in the current SparkSession.
#'
-#' @param x A SparkDataFrame
-#' @param tableName A character vector containing the name of the table
--- End diff --

The reasons are:
1. Its generic counterpart has `...` and we have to document that.
2. I think it may be preferable for `...` to follow the other explicit arguments in order.

Does that make sense?
[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74729028

--- Diff: R/pkg/R/generics.R ---
@@ -1251,10 +1311,57 @@ setGeneric("year", function(x) { standardGeneric("year") })
#' @export
setGeneric("spark.glm", function(data, formula, ...) { standardGeneric("spark.glm") })
+#' @param formula a symbolic description of the model to be fitted. If \code{data} is a
+#'        SparkDataFrame, currently only a few formula operators are supported,
+#'        including '~', '.', ':', '+', and '-'.
+#' @param data a SparkDataFrame or (R glm) data.frame, list or environment for training.
+#' @param family a description of the error distribution and link function to be used in the model.
+#'        This can be a character string naming a family function, a family function or
+#'        the result of a call to a family function. Refer R family at
+#'        \url{https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html}.
+#' @param epsilon positive convergence tolerance of iterations.
+#' @param maxit integer giving the maximal number of IRLS iterations.
+#' @param weights an optional vector of 'prior weights' to be used in the fitting process.
+#'        Should be NULL or a numeric vector.
+#' @param subset an optional vector specifying a subset of observations to be used in the
+#'        fitting process.
+#' @param na.action a function which indicates what should happen when the data contain NAs.
+#'        The default is set by the na.action setting of options, and is na.fail
+#'        if that is unset. The 'factory-fresh' default is na.omit. Another possible
+#'        value is NULL, no action. Value na.exclude can be useful.
+#' @param start starting values for the parameters in the linear predictor.
+#' @param etastart starting values for the linear predictor.
+#' @param mustart starting values for the vector of means.
+#' @param offset this can be used to specify an a priori known component to be included in
+#'        the linear predictor during fitting. This should be NULL or
+#'        a numeric vector of length equal to the number of cases. One or more offset
+#'        terms can be included in the formula instead or as well, and if more than
+#'        one is specified their sum is used. See model.offset.
+#' @param control a list of parameters for controlling the fitting process. For glm.fit
+#'        this is passed to glm.control.
+#' @param model a logical value indicating whether model frame should be included as
+#'        a component of the returned value.
+#' @param method the method to be used in fitting the model. The default method
+#'        "glm.fit" uses iteratively reweighted least squares (IWLS): the alternative
+#'        "model.frame" returns the model frame and does no fitting.
+#'        User-supplied fitting functions can be supplied either as a function or
+#'        a character string naming a function, with a function which takes the same
+#'        arguments as glm.fit. If specified as a character string it is looked up from
+#'        within the stats namespace.
+#' @param x,y logical values indicating whether the response vector and model matrix
+#'        used in the fitting process should be returned as components of the returned value.
+#' @param contrasts an optional list. See the contrasts.arg of model.matrix.default.
+#' @param ... arguments to be used to form the default control argument if it is
+#'        not supplied directly.
#' @rdname glm
+#' @details If \code{data} is a data.frame, list or environment, \code{glm} behaves the same as
+#'        \code{glm} in the \code{stats} package. If \code{data} is a SparkDataFrame,
+#'        \code{spark.glm} is called.
#' @export
setGeneric("glm")
+#' @param object a fitted ML model object.
+#' @param ... additional argument(s) passed to the method.
--- End diff --

There are additional arguments like `newData`?
[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74728822

--- Diff: R/pkg/R/generics.R ---
@@ -1277,8 +1384,11 @@ setGeneric("spark.naiveBayes", function(data, formula, ...) { standardGeneric("s
#' @rdname spark.survreg
#' @export
-setGeneric("spark.survreg", function(data, formula, ...) { standardGeneric("spark.survreg") })
+setGeneric("spark.survreg", function(data, formula) { standardGeneric("spark.survreg") })
+#' @param object a fitted ML model object.
+#' @param path the directory where the model is saved.
+#' @param ... additional argument(s) passed to the method.
#' @rdname write.ml
#' @export
setGeneric("write.ml", function(object, path, ...) { standardGeneric("write.ml") })
--- End diff --

There is an `overwrite` argument? It seems that methods can add arguments to the generic only if `...` is an argument of the generic.
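The point about `overwrite` can be illustrated with a toy version of the `write.ml` generic. This is a sketch, not SparkR's implementation: `ToyModel` is a hypothetical class, and the body just persists a slot with `saveRDS` to show how a method can add an `overwrite` argument only because the generic keeps `...`.

```r
library(methods)

# Hypothetical model class standing in for a fitted SparkR ML model.
setClass("ToyModel", representation(coef = "numeric"))

# The generic keeps `...` so individual methods can add arguments
# such as `overwrite` without changing the generic's signature.
setGeneric("write.ml", function(object, path, ...) { standardGeneric("write.ml") })

setMethod("write.ml", signature(object = "ToyModel", path = "character"),
          function(object, path, overwrite = FALSE) {
            if (file.exists(path) && !overwrite) {
              stop("path already exists; use overwrite = TRUE")
            }
            saveRDS(object@coef, path)  # toy persistence for the sketch
            invisible(object)
          })
```

Had the generic been declared as `function(object, path)` with no `...`, `setMethod` would reject a method adding `overwrite`, which is the constraint the comment describes.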
[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74728351

--- Diff: R/pkg/R/mllib.R ---
@@ -142,15 +143,6 @@ setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
#' Generalized Linear Models (R-compliant)
#'
#' Fits a generalized linear model, similarly to R's glm().
-#' @param formula A symbolic description of the model to be fitted. Currently only a few formula
--- End diff --

Well, yes, we agree on that. The reason I moved these is that I thought when `setGeneric("glm")` is called, it looks up the existing R glm function, which includes many arguments, some overlapping with the `glm` defined for SparkDataFrame. So I simply moved all the argument docs to the generic, though that makes the doc there cumbersome...
[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74727208

--- Diff: R/pkg/R/mllib.R ---
@@ -602,14 +599,14 @@ setMethod("spark.survreg", signature(data = "SparkDataFrame", formula = "formula
# Returns a summary of the AFT survival regression model produced by spark.survreg,
# similarly to R's summary().
-#' @param object A fitted AFT survival regression model
+#' @param object a fitted AFT survival regression model.
#' @return \code{summary} returns a list containing the model's coefficients,
#'         intercept and log(scale)
#' @rdname spark.survreg
#' @export
#' @note summary(AFTSurvivalRegressionModel) since 2.0.0
setMethod("summary", signature(object = "AFTSurvivalRegressionModel"),
-          function(object, ...) {
+          function(object) {
--- End diff --

I guess you meant omitting `...` in the generic function as well? If so, the S3 base `summary` would not be added as a default method when the S4 generic is defined, and would thus be masked by SparkR.
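The masking concern can be sketched in a few lines. The class name `ToyFit` is hypothetical; the point is that because base `summary(object, ...)` already has `...`, `setGeneric("summary")` installs the existing function as the default method, so `summary()` keeps working on ordinary R objects while an S4 method handles the new class.

```r
library(methods)

# Hypothetical fitted-model class.
setClass("ToyFit", representation(coef = "numeric"))

# Promote base summary() to an S4 generic; the existing function
# becomes the default method, so summary(data.frame(...)) etc.
# still dispatch to base behavior.
setGeneric("summary")

# S4 method for the new class; keeping `...` matches the generic.
setMethod("summary", signature(object = "ToyFit"),
          function(object, ...) {
            list(coefficients = object@coef)
          })
```

Dropping `...` from the generic itself would prevent base `summary` from serving as the default method, which is the masking problem described in the comment above.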
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74707500 --- Diff: R/pkg/R/mllib.R --- @@ -142,15 +143,6 @@ setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"), #' Generalized Linear Models (R-compliant) #' #' Fits a generalized linear model, similarly to R's glm(). -#' @param formula A symbolic description of the model to be fitted. Currently only a few formula --- End diff -- Is there a reason we are moving these? As we have discussed, we should try to keep the documentation as close to the function definition/implementation as possible. That makes it much easier to keep the code and documentation in sync, and reminds reviewers to check them for correctness and consistency. Where a @param description applies to two or more implementations (because it supports multiple classes), for lack of a common place we could put it in generics.R for now. If `...` only shows up in generics.R, we should check whether it is there to match an existing function signature; if it is, we add @param in generics.R. If it is not, and it is not used anywhere, we should consider removing the `...`. Does that make sense?
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74707292 --- Diff: R/pkg/R/mllib.R --- @@ -142,15 +143,6 @@ setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"), #' Generalized Linear Models (R-compliant) #' #' Fits a generalized linear model, similarly to R's glm(). -#' @param formula A symbolic description of the model to be fitted. Currently only a few formula --- End diff -- It was moved to the generic function definition.
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74707272 --- Diff: R/pkg/R/mllib.R --- @@ -346,8 +339,11 @@ setMethod("spark.kmeans", signature(data = "SparkDataFrame", formula = "formula" #' Get fitted result from a k-means model, similarly to R's fitted(). #' Note: A saved-loaded model does not support this method. #' -#' @param object A fitted k-means model -#' @return \code{fitted} returns a SparkDataFrame containing fitted values +#' @param object a fitted k-means model. +#' @param method type of fitted results, \code{"centers"} for cluster centers +#'or \code{"classes"} for assigned classes. +#' @param ... additional argument(s) passed to the method. --- End diff -- The same as above...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74707253 --- Diff: R/pkg/R/mllib.R --- @@ -414,11 +411,12 @@ setMethod("predict", signature(object = "KMeansModel"), #' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. #' Only categorical data is supported. #' -#' @param data A \code{SparkDataFrame} of observations and labels for model fitting -#' @param formula A symbolic description of the model to be fitted. Currently only a few formula +#' @param data a \code{SparkDataFrame} of observations and labels for model fitting. +#' @param formula a symbolic description of the model to be fitted. Currently only a few formula #' operators are supported, including '~', '.', ':', '+', and '-'. -#' @param smoothing Smoothing parameter -#' @return \code{spark.naiveBayes} returns a fitted naive Bayes model +#' @param ... additional argument(s) passed to the method. Currently only \code{smoothing}. --- End diff -- This is actually for the generic function. Should I move these docs to the generic definition?
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74706060 --- Diff: R/pkg/R/DataFrame.R --- @@ -3184,6 +3200,7 @@ setMethod("histogram", #' @param x A SparkDataFrame #' @param url JDBC database url of the form `jdbc:subprotocol:subname` #' @param tableName The name of the table in the external database +#' @param ... additional JDBC database connection propertie(s). --- End diff -- Done. Thanks!
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74694116 --- Diff: R/pkg/R/functions.R --- @@ -1143,7 +1139,7 @@ setMethod("minute", #' @export #' @examples \dontrun{select(df, monotonically_increasing_id())} setMethod("monotonically_increasing_id", - signature(x = "missing"), + signature(), --- End diff -- Hmm, I looked into this a bit below. "a" is what we have originally:

```
setGeneric("a", function(x) { standardGeneric("a") })
setMethod("a", signature(x = "missing"), function() { 1 })

> a()
[1] 1
> a(1)
Error in (function (classes, fdef, mtable) :
  unable to find an inherited method for function 'a' for signature '"numeric"'
> showMethods(a)
Function: a (package .GlobalEnv)
x="missing"

setGeneric("b", function(x = "missing") { standardGeneric("b") })
setMethod("b", signature("missing"), function() { 1 })

> b()
[1] 1
> b(1)
Error in (function (classes, fdef, mtable) :
  unable to find an inherited method for function 'b' for signature '"numeric"'
> showMethods(b)
Function: b (package .GlobalEnv)
x="missing"

setGeneric("tt", function(...) { standardGeneric("tt") })
setMethod("tt", signature(), function() { 1 })

> tt(1)
Error in .local(...) : unused argument (1)
> tt("emme")
Error in .local(...) : unused argument ("emme")
> showMethods(tt)
Function: tt (package .GlobalEnv)
...="ANY"
...="character" (inherited from: ...="ANY")
...="numeric" (inherited from: ...="ANY")
```

I think the issue with `setMethod("foo", signature(), function() { 1 })` is twofold: 1. The error message when calling the function with a parameter is less clear than 'unable to find an inherited method for function ... for signature "numeric"'. 2. An S4 method is automatically generated for each parameter type the function is called with (see "numeric" or "character" from showMethods). So perhaps "b" is the better approach? Unfortunately we still have to "document" a parameter or `...` in either case, as `setGeneric` refuses to take it otherwise.
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74693897 --- Diff: R/pkg/R/SQLContext.R --- @@ -181,7 +181,7 @@ getDefaultSqlSource <- function() { #' @method createDataFrame default #' @note createDataFrame since 1.4.0 # TODO(davies): support sampling and infer type from NA -createDataFrame.default <- function(data, schema = NULL, samplingRatio = 1.0) { +createDataFrame.default <- function(data, schema = NULL) { --- End diff -- we discussed we shouldn't remove `samplingRatio = 1.0` from the signature?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74693887 --- Diff: R/pkg/R/DataFrame.R --- @@ -2461,8 +2473,9 @@ setMethod("unionAll", #' Union two or more SparkDataFrames. This is equivalent to `UNION ALL` in SQL. #' Note that this does not remove duplicate rows across the two SparkDataFrames. #' -#' @param x A SparkDataFrame -#' @param ... Additional SparkDataFrame +#' @param x a SparkDataFrame. +#' @param ... additional SparkDataFrame(s). +#' @param deparse.level dummy variable, currently not used. --- End diff -- It's not a dummy variable per se - it was there to match the signature of the base implementation
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74693856 --- Diff: R/pkg/R/DataFrame.R --- @@ -1146,7 +1147,7 @@ setMethod("head", #' Return the first row of a SparkDataFrame #' -#' @param x A SparkDataFrame --- End diff -- Right - they are not - RDD functions are not exported from the package (not public) and we don't want Rd files generated for them. Please see PR #14626 - we want a separate setGeneric for non-RDD functions, and then this line documenting both the DataFrame and Column parameters can go to generics.R
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74693823 --- Diff: R/pkg/R/DataFrame.R --- @@ -510,9 +510,7 @@ setMethod("registerTempTable", #' #' Insert the contents of a SparkDataFrame into a table registered in the current SparkSession. #' -#' @param x A SparkDataFrame -#' @param tableName A character vector containing the name of the table --- End diff -- why is tableName moved?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74693204 --- Diff: R/pkg/R/functions.R --- @@ -1497,7 +1493,7 @@ setMethod("soundex", #' \dontrun{select(df, spark_partition_id())} #' @note spark_partition_id since 2.0.0 setMethod("spark_partition_id", - signature(x = "missing"), + signature(), --- End diff -- what I mean is you could put it as `signature(missing)`
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74693197 --- Diff: R/pkg/R/generics.R --- @@ -1251,10 +1311,57 @@ setGeneric("year", function(x) { standardGeneric("year") }) #' @export setGeneric("spark.glm", function(data, formula, ...) { standardGeneric("spark.glm") }) +#' @param formula a symbolic description of the model to be fitted. If \code{data} is a +#'SparkDataFrame, currently only a few formula operators are supported, +#'including '~', '.', ':', '+', and '-'. +#' @param data a SparkDataFrame or (R glm) data.frame, list or environment for training. +#' @param family a description of the error distribution and link function to be used in the model. +#' This can be a character string naming a family function, a family function or +#' the result of a call to a family function. Refer R family at +#' \url{https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html}. +#' @param epsilon positive convergence tolerance of iterations. +#' @param maxit integer giving the maximal number of IRLS iterations. +#' @param weights an optional vector of 'prior weights' to be used in the fitting process. +#'Should be NULL or a numeric vector. +#' @param subset an optional vector specifying a subset of observations to be used in the +#' fitting process. +#' @param na.action a function which indicates what should happen when the data contain NAs. +#' The default is set by the na.action setting of options, and is na.fail +#' if that is unset. The 'factory-fresh' default is na.omit. Another possible +#' value is NULL, no action. Value na.exclude can be useful. +#' @param start starting values for the parameters in the linear predictor. +#' @param etastart starting values for the linear predictor. +#' @param mustart starting values for the vector of means. +#' @param offset this can be used to specify an a priori known component to be included in +#' the linear predictor during fitting. 
This should be NULL or +#' a numeric vector of length equal to the number of cases. One or more offset +#' terms can be included in the formula instead or as well, and if more than +#' one is specified their sum is used. See model.offset. +#' @param control a list of parameters for controlling the fitting process. For glm.fit +#'this is passed to glm.control. +#' @param model a logical value indicating whether model frame should be included as +#' a component of the returned value. +#' @param method the method to be used in fitting the model. The default method +#' "glm.fit" uses iteratively reweighted least squares (IWLS): the alternative +#' "model.frame" returns the model frame and does no fitting. +#' User-supplied fitting functions can be supplied either as a function or +#' a character string naming a function, with a function which takes the same +#' arguments as glm.fit. If specified as a character string it is looked up from +#' within the stats namespace. +#' @param x,y logical values indicating whether the response vector and model matrix +#'used in the fitting process should be returned as components of the returned value. +#' @param contrasts an optional list. See the contrasts.arg of model.matrix.default. +#' @param ... arguments to be used to form the default control argument if it is +#'not supplied directly. #' @rdname glm +#' @details If \code{data} is a data.frame, list or environment, \code{glm} behaves the same as +#' \code{glm} in the \code{stats} package. If \code{data} is a SparkDataFrame, +#' \code{spark.glm} is called. #' @export setGeneric("glm") +#' @param object a fitted ML model object. +#' @param ... additional argument(s) passed to the method. --- End diff -- `Currently not used` - we don't use this in any `predict` implementation
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74693174 --- Diff: R/pkg/R/generics.R --- @@ -1277,8 +1384,11 @@ setGeneric("spark.naiveBayes", function(data, formula, ...) { standardGeneric("s #' @rdname spark.survreg #' @export -setGeneric("spark.survreg", function(data, formula, ...) { standardGeneric("spark.survreg") }) +setGeneric("spark.survreg", function(data, formula) { standardGeneric("spark.survreg") }) +#' @param object a fitted ML model object. +#' @param path the directory where the model is saved. +#' @param ... additional argument(s) passed to the method. #' @rdname write.ml #' @export setGeneric("write.ml", function(object, path, ...) { standardGeneric("write.ml") }) --- End diff -- again, no reason for this generic to have `...`?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74693158 --- Diff: R/pkg/R/mllib.R --- @@ -142,15 +143,6 @@ setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"), #' Generalized Linear Models (R-compliant) #' #' Fits a generalized linear model, similarly to R's glm(). -#' @param formula A symbolic description of the model to be fitted. Currently only a few formula --- End diff -- why are we moving this?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74693142 --- Diff: R/pkg/R/mllib.R --- @@ -298,14 +304,15 @@ setMethod("summary", signature(object = "NaiveBayesModel"), #' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make #' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. #' -#' @param data SparkDataFrame for training -#' @param formula A symbolic description of the model to be fitted. Currently only a few formula +#' @param data a SparkDataFrame for training. +#' @param formula a symbolic description of the model to be fitted. Currently only a few formula #'operators are supported, including '~', '.', ':', '+', and '-'. #'Note that the response variable of formula is empty in spark.kmeans. -#' @param k Number of centers -#' @param maxIter Maximum iteration number -#' @param initMode The initialization algorithm choosen to fit the model -#' @return \code{spark.kmeans} returns a fitted k-means model +#' @param ... additional argument(s) passed to the method. --- End diff -- for all the `spark.something` functions, if we are not actually using `...` for optional parameters, let's remove it. There is no reason to have unused unnamed parameters in a function.
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74693106 --- Diff: R/pkg/R/mllib.R --- @@ -346,8 +339,11 @@ setMethod("spark.kmeans", signature(data = "SparkDataFrame", formula = "formula" #' Get fitted result from a k-means model, similarly to R's fitted(). #' Note: A saved-loaded model does not support this method. #' -#' @param object A fitted k-means model -#' @return \code{fitted} returns a SparkDataFrame containing fitted values +#' @param object a fitted k-means model. +#' @param method type of fitted results, \code{"centers"} for cluster centers +#'or \code{"classes"} for assigned classes. +#' @param ... additional argument(s) passed to the method. --- End diff -- let's remove `...` in the function
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74693097 --- Diff: R/pkg/R/mllib.R --- @@ -414,11 +411,12 @@ setMethod("predict", signature(object = "KMeansModel"), #' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. #' Only categorical data is supported. #' -#' @param data A \code{SparkDataFrame} of observations and labels for model fitting -#' @param formula A symbolic description of the model to be fitted. Currently only a few formula +#' @param data a \code{SparkDataFrame} of observations and labels for model fitting. +#' @param formula a symbolic description of the model to be fitted. Currently only a few formula #' operators are supported, including '~', '.', ':', '+', and '-'. -#' @param smoothing Smoothing parameter -#' @return \code{spark.naiveBayes} returns a fitted naive Bayes model +#' @param ... additional argument(s) passed to the method. Currently only \code{smoothing}. --- End diff -- this is removed, right?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74693087 --- Diff: .gitignore --- @@ -77,3 +77,8 @@ spark-warehouse/ # For R session data .RData .RHistory +.Rhistory --- End diff -- I'm not sure if this is essential for this PR. I'd suggest leaving this out
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74693053 --- Diff: R/pkg/R/mllib.R --- @@ -602,14 +599,14 @@ setMethod("spark.survreg", signature(data = "SparkDataFrame", formula = "formula" # Returns a summary of the AFT survival regression model produced by spark.survreg, # similarly to R's summary(). -#' @param object A fitted AFT survival regression model +#' @param object a fitted AFT survival regression model. #' @return \code{summary} returns a list containing the model's coefficients, #' intercept and log(scale) #' @rdname spark.survreg #' @export #' @note summary(AFTSurvivalRegressionModel) since 2.0.0 setMethod("summary", signature(object = "AFTSurvivalRegressionModel"), - function(object, ...) { + function(object) { --- End diff -- thinking more about it, I think the reason is that R's base::summary in fact has the `...` https://stat.ethz.ch/R-manual/R-devel/library/base/html/summary.html Could you test if base::summary still works? I think we would prefer omitting `...` for all our `summary` functions, but we need to make sure doing so doesn't mask it
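As a side note, the masking question above can be checked with a small standalone sketch (a toy class, not SparkR code; `ToyModel` is made up for illustration): promoting the existing base function with `setGeneric("summary")` keeps its `(object, ...)` formals, so dispatch to the base implementation for ordinary objects should be preserved.

```r
# Run in a fresh R session. ToyModel is a hypothetical class.
setClass("ToyModel", representation(coef = "numeric"))

# setGeneric("summary") creates an implicit generic from base::summary,
# keeping its (object, ...) signature.
setGeneric("summary")
setMethod("summary", signature(object = "ToyModel"),
          function(object, ...) {
            list(coefficients = object@coef)
          })

summary(c(1, 2, 3))                   # still reaches base::summary
summary(new("ToyModel", coef = 0.5))  # dispatches to the S4 method
```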
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74692979 --- Diff: R/pkg/R/DataFrame.R --- @@ -3184,6 +3200,7 @@ setMethod("histogram", #' @param x A SparkDataFrame #' @param url JDBC database url of the form `jdbc:subprotocol:subname` #' @param tableName The name of the table in the external database +#' @param ... additional JDBC database connection propertie(s). --- End diff -- the singular form is property and the plural is properties, so the `()` doesn't really work in this case. Let's just leave it as properties.
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74350887 --- Diff: R/pkg/R/generics.R --- @@ -1046,8 +1084,8 @@ setGeneric("ntile", function(x) { standardGeneric("ntile") }) #' @export setGeneric("n_distinct", function(x, ...) { standardGeneric("n_distinct") }) -#' @rdname percent_rank -#' @export +# @rdname percent_rank --- End diff -- Yeah that works if we also set the argument list of the generic counterpart to `...`. Then we still have to document the `...` part. We could tell the user things like "this should be left empty". I'm not sure if that would cause some confusion?
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74349926 --- Diff: R/pkg/R/mllib.R --- @@ -57,6 +57,9 @@ setClass("KMeansModel", representation(jobj = "jobj")) #' #' Saves the MLlib model to the input path. For more information, see the specific #' MLlib model below. +#' @param object a fitted ML model object. +#' @param path the directory where the model is saved. +#' @param ... additional argument(s) passed to the method. --- End diff -- Thanks for the suggestion. I'll move those above the definition of generic functions.
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74349451 --- Diff: R/pkg/R/mllib.R --- @@ -414,11 +425,12 @@ setMethod("predict", signature(object = "KMeansModel"), #' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. #' Only categorical data is supported. #' -#' @param data A \code{SparkDataFrame} of observations and labels for model fitting -#' @param formula A symbolic description of the model to be fitted. Currently only a few formula +#' @param data a \code{SparkDataFrame} of observations and labels for model fitting. +#' @param formula a symbolic description of the model to be fitted. Currently only a few formula #' operators are supported, including '~', '.', ':', '+', and '-'. -#' @param smoothing Smoothing parameter -#' @return \code{spark.naiveBayes} returns a fitted naive Bayes model +#' @param smoothing smoothing parameter. +#' @param ... additional parameter(s) passed to the method. --- End diff -- I think that is for the optional `smoothing` parameter? If we put it in the signature, would that argument then be required when the user calls the function?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74316468 --- Diff: R/pkg/R/generics.R --- @@ -1046,8 +1084,8 @@ setGeneric("ntile", function(x) { standardGeneric("ntile") }) #' @export setGeneric("n_distinct", function(x, ...) { standardGeneric("n_distinct") }) -#' @rdname percent_rank -#' @export +# @rdname percent_rank --- End diff -- Try changing the S4 signature from `(x = missing)` to `(missing)` - I think that should eliminate the need for a dummy parameter.
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74316333 --- Diff: R/pkg/R/mllib.R --- @@ -57,6 +57,9 @@ setClass("KMeansModel", representation(jobj = "jobj")) #' #' Saves the MLlib model to the input path. For more information, see the specific #' MLlib model below. +#' @param object a fitted ML model object. +#' @param path the directory where the model is saved. +#' @param ... additional argument(s) passed to the method. --- End diff -- This is more of a placeholder page that directs to other pages. There's no function signature on this output page, so I don't think we should add parameters here either.
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74316168 --- Diff: R/pkg/R/mllib.R --- @@ -414,11 +425,12 @@ setMethod("predict", signature(object = "KMeansModel"), #' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. #' Only categorical data is supported. #' -#' @param data A \code{SparkDataFrame} of observations and labels for model fitting -#' @param formula A symbolic description of the model to be fitted. Currently only a few formula +#' @param data a \code{SparkDataFrame} of observations and labels for model fitting. +#' @param formula a symbolic description of the model to be fitted. Currently only a few formula #' operators are supported, including '~', '.', ':', '+', and '-'. -#' @param smoothing Smoothing parameter -#' @return \code{spark.naiveBayes} returns a fitted naive Bayes model +#' @param smoothing smoothing parameter. +#' @param ... additional parameter(s) passed to the method. --- End diff -- It is, but what I mean is that there doesn't seem to be a reason for it to be in the generic. This is a very Spark-specific function name, so it doesn't need `...` to stay compatible with anything.
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74178308 --- Diff: .gitignore --- @@ -77,3 +77,8 @@ spark-warehouse/ # For R session data .RData .RHistory +.Rhistory --- End diff -- Hmm... Is the upper/lower case difference caused by different R versions or platforms?
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74177336 --- Diff: R/pkg/R/SQLContext.R --- @@ -257,23 +257,24 @@ createDataFrame.default <- function(data, schema = NULL, samplingRatio = 1.0) { } createDataFrame <- function(x, ...) { - dispatchFunc("createDataFrame(data, schema = NULL, samplingRatio = 1.0)", x, ...) + dispatchFunc("createDataFrame(data, schema = NULL)", x, ...) } #' @rdname createDataFrame #' @aliases createDataFrame #' @export #' @method as.DataFrame default #' @note as.DataFrame since 1.6.0 -as.DataFrame.default <- function(data, schema = NULL, samplingRatio = 1.0) { --- End diff -- Good catch. I did not realize that. Thanks!
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74176995 --- Diff: R/pkg/R/DataFrame.R --- @@ -1146,7 +1147,7 @@ setMethod("head", #' Return the first row of a SparkDataFrame #' -#' @param x A SparkDataFrame --- End diff -- I don't have a good answer yet. When I tried to move the param there, it seems that the generic functions for RDD Actions and Transformations are not exposed. Do you know the specific reason for that, by chance?
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74176280 --- Diff: R/pkg/R/functions.R --- @@ -1273,12 +1267,14 @@ setMethod("round", #' bround #' #' Returns the value of the column `e` rounded to `scale` decimal places using HALF_EVEN rounding -#' mode if `scale` >= 0 or at integral part when `scale` < 0. +#' mode if `scale` >= 0 or at integer part when `scale` < 0. #' Also known as Gaussian rounding or bankers' rounding that rounds to the nearest even number. #' bround(2.5, 0) = 2, bround(3.5, 0) = 4. #' #' @param x Column to compute on. -#' +#' @param scale round to `scale` digits to the right of the decimal point when `scale` > 0, --- End diff -- They are actually talking about the same thing. In the description, I think it says that when `scale` < 0, rounding happens on the integer part.
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74172127 --- Diff: R/pkg/R/functions.R --- @@ -3033,6 +3033,9 @@ setMethod("when", signature(condition = "Column", value = "ANY"), #' Evaluates a list of conditions and returns \code{yes} if the conditions are satisfied. #' Otherwise \code{no} is returned for unmatched conditions. #' +#' @param test a Column expression that describes the condition. +#' @param yes return values for true elements of test. --- End diff -- Good catch, thanks!
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74172250 --- Diff: R/pkg/R/functions.R --- @@ -2654,6 +2647,9 @@ setMethod("expr", signature(x = "character"), #' #' Formats the arguments in printf-style and returns the result as a string column. #' +#' @param format a character object of format strings. +#' @param x a Column object. +#' @param ... additional columns. --- End diff -- Done. Thanks!
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74172111 --- Diff: R/pkg/R/generics.R --- @@ -1022,6 +1059,7 @@ setGeneric("month", function(x) { standardGeneric("month") }) #' @export setGeneric("months_between", function(y, x) { standardGeneric("months_between") }) +#' @param x a SparkDataFrame or a Column object. #' @rdname nrow --- End diff -- Done.
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74171391 --- Diff: R/pkg/R/mllib.R --- @@ -57,6 +57,9 @@ setClass("KMeansModel", representation(jobj = "jobj")) #' #' Saves the MLlib model to the input path. For more information, see the specific #' MLlib model below. +#' @param object a fitted ML model object. +#' @param path the directory where the model is saved. +#' @param ... additional argument(s) passed to the method. --- End diff -- It doesn't complain about this, though it is a little weird.
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74171283 --- Diff: R/pkg/R/generics.R --- @@ -1046,8 +1084,8 @@ setGeneric("ntile", function(x) { standardGeneric("ntile") }) #' @export setGeneric("n_distinct", function(x, ...) { standardGeneric("n_distinct") }) -#' @rdname percent_rank -#' @export +# @rdname percent_rank --- End diff -- This is perhaps not the best way, but I did this because otherwise the generated doc will show `percent_rank(x)`. Then we have to document `x`, which could be confusing to the user since this argument should be left missing in the actual function call.
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74170465 --- Diff: R/pkg/R/mllib.R --- @@ -82,15 +87,16 @@ NULL #' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make #' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. #' -#' @param data SparkDataFrame for training. -#' @param formula A symbolic description of the model to be fitted. Currently only a few formula +#' @param data a SparkDataFrame for training. +#' @param formula a symbolic description of the model to be fitted. Currently only a few formula #'operators are supported, including '~', '.', ':', '+', and '-'. -#' @param family A description of the error distribution and link function to be used in the model. +#' @param family a description of the error distribution and link function to be used in the model. #' This can be a character string naming a family function, a family function or #' the result of a call to a family function. Refer R family at #' \url{https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html}. -#' @param tol Positive convergence tolerance of iterations. -#' @param maxIter Integer giving the maximal number of IRLS iterations. +#' @param tol positive convergence tolerance of iterations. +#' @param maxIter integer giving the maximal number of IRLS iterations. +#' @param ... additional arguments passed to the method. --- End diff -- same as above
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74170438 --- Diff: R/pkg/R/mllib.R --- @@ -298,14 +304,15 @@ setMethod("summary", signature(object = "NaiveBayesModel"), #' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make #' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. #' -#' @param data SparkDataFrame for training -#' @param formula A symbolic description of the model to be fitted. Currently only a few formula +#' @param data a SparkDataFrame for training. +#' @param formula a symbolic description of the model to be fitted. Currently only a few formula #'operators are supported, including '~', '.', ':', '+', and '-'. #'Note that the response variable of formula is empty in spark.kmeans. -#' @param k Number of centers -#' @param maxIter Maximum iteration number -#' @param initMode The initialization algorithm choosen to fit the model -#' @return \code{spark.kmeans} returns a fitted k-means model +#' @param ... additional argument(s) passed to the method. --- End diff -- This is actually for the `...` argument in the generic function...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74170348 --- Diff: R/pkg/R/mllib.R --- @@ -414,11 +425,12 @@ setMethod("predict", signature(object = "KMeansModel"), #' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. #' Only categorical data is supported. #' -#' @param data A \code{SparkDataFrame} of observations and labels for model fitting -#' @param formula A symbolic description of the model to be fitted. Currently only a few formula +#' @param data a \code{SparkDataFrame} of observations and labels for model fitting. +#' @param formula a symbolic description of the model to be fitted. Currently only a few formula #' operators are supported, including '~', '.', ':', '+', and '-'. -#' @param smoothing Smoothing parameter -#' @return \code{spark.naiveBayes} returns a fitted naive Bayes model +#' @param smoothing smoothing parameter. +#' @param ... additional parameter(s) passed to the method. --- End diff -- Done. Thanks!
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74170002 --- Diff: R/pkg/R/mllib.R --- @@ -563,11 +574,12 @@ read.ml <- function(path) { #' \code{predict} to make predictions on new data, and \code{write.ml}/\code{read.ml} to #' save/load fitted models. #' -#' @param data A SparkDataFrame for training -#' @param formula A symbolic description of the model to be fitted. Currently only a few formula +#' @param data a SparkDataFrame for training. +#' @param formula a symbolic description of the model to be fitted. Currently only a few formula #'operators are supported, including '~', ':', '+', and '-'. #'Note that operator '.' is not supported currently -#' @return \code{spark.survreg} returns a fitted AFT survival regression model +#' @param ... additional argument(s) passed to the method. --- End diff -- I noticed those and thought about leaving space for potential arguments in the future... but I guess the cleaner solution is, as you said, to remove these arguments and add them back if needed in the future.
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74169009 --- Diff: R/pkg/R/generics.R --- @@ -465,10 +477,14 @@ setGeneric("dapply", function(x, func, schema) { standardGeneric("dapply") }) #' @export setGeneric("dapplyCollect", function(x, func) { standardGeneric("dapplyCollect") }) +#' @param x a SparkDataFrame or GroupedData. --- End diff -- I think there is also one for [SparkDataFrame](https://github.com/apache/spark/pull/14558/files#diff-508641a8bd6c6b59f3e77c80cdcfa6a9L1401)
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74167878 --- Diff: R/pkg/R/generics.R --- @@ -395,6 +396,9 @@ setGeneric("value", function(bcast) { standardGeneric("value") }) SparkDataFrame Methods +#' @param x a SparkDataFrame or GroupedData. --- End diff -- I was thinking that if in the future we want to add other methods associated with that generic, it might be more acceptable to change the doc of the generic than the doc of another associated method (seemingly parallel to the added ones)?
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74146966 --- Diff: .gitignore --- @@ -77,3 +77,8 @@ spark-warehouse/ # For R session data .RData .RHistory +.Rhistory --- End diff -- Duplicate line. Can these be pushed down into a `.gitignore` in the subdirectory? This file is a mess.
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74105069 --- Diff: R/pkg/R/DataFrame.R --- @@ -177,11 +176,10 @@ setMethod("isLocal", #' #' Print the first numRows rows of a SparkDataFrame #' -#' @param x A SparkDataFrame -#' @param numRows The number of rows to print. Defaults to 20. -#' @param truncate Whether truncate long strings. If true, strings more than 20 characters will be -#' truncated and all cells will be aligned right -#' +#' @param numRows the number of rows to print. Defaults to 20. +#' @param truncate whether truncate long strings. If true, strings more than 20 characters will be --- End diff -- true -> TRUE
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74104886 --- Diff: R/pkg/R/DataFrame.R --- @@ -1146,7 +1147,7 @@ setMethod("head", #' Return the first row of a SparkDataFrame #' -#' @param x A SparkDataFrame --- End diff -- I think, similar to what you have for other functions, this could go to generics.R - do you have any other idea how to document functions that work with multiple classes?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74104522 --- Diff: R/pkg/R/DataFrame.R --- @@ -2845,8 +2844,11 @@ setMethod("fillna", #' Since data.frames are held in memory, ensure that you have enough memory #' in your system to accommodate the contents. #' -#' @param x a SparkDataFrame -#' @return a data.frame +#' @param x a SparkDataFrame. +#' @param row.names NULL or a character vector giving the row names for the data frame. +#' @param optional If `TRUE`, converting column names is optional. +#' @param ... additional arguments passed to the method. --- End diff -- in this case "additional arguments to pass to base::as.data.frame"
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74104039 --- Diff: R/pkg/R/DataFrame.R --- @@ -3049,8 +3050,8 @@ setMethod("drop", #' #' @name histogram #' @param nbins the number of bins (optional). Default value is 10. +#' @param col the column (described by character or Column object) to build the histogram from. --- End diff -- we generally don't say "Column object" - the object system in R is a bit different and I think here "S4 class" would make more sense? but I'd suggest simplifying it to: "@param col the column as Character string or a Column to build the histogram from"
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74103250 --- Diff: R/pkg/R/DataFrame.R --- @@ -3184,6 +3185,7 @@ setMethod("histogram", #' @param x A SparkDataFrame #' @param url JDBC database url of the form `jdbc:subprotocol:subname` #' @param tableName The name of the table in the external database +#' @param ... additional argument(s) passed to the method --- End diff -- ditto, something like "additional JDBC database connection properties"
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74103028 --- Diff: R/pkg/R/SQLContext.R --- @@ -257,23 +257,24 @@ createDataFrame.default <- function(data, schema = NULL, samplingRatio = 1.0) { } createDataFrame <- function(x, ...) { - dispatchFunc("createDataFrame(data, schema = NULL, samplingRatio = 1.0)", x, ...) + dispatchFunc("createDataFrame(data, schema = NULL)", x, ...) } #' @rdname createDataFrame #' @aliases createDataFrame #' @export #' @method as.DataFrame default #' @note as.DataFrame since 1.6.0 -as.DataFrame.default <- function(data, schema = NULL, samplingRatio = 1.0) { --- End diff -- the `.default` methods are here for backward compatibility (please see SPARK-16693/PR #14330). I don't think we should change the signature - or it will break any existing user. Perhaps just add the @param and say "Currently not used"?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74102403 --- Diff: R/pkg/R/SQLContext.R --- @@ -727,6 +729,7 @@ dropTempView <- function(viewName) { #' @param source The name of external data source #' @param schema The data schema defined in structType #' @param na.strings Default string value for NA when source is "csv" +#' @param ... additional argument(s) passed to the method. --- End diff -- something like: "additional external data source specific named properties"?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74102227 --- Diff: R/pkg/R/SQLContext.R --- @@ -840,6 +844,7 @@ createExternalTable <- function(x, ...) { #' clause expressions used to split the column `partitionColumn` evenly. #' This defaults to SparkContext.defaultParallelism when unset. #' @param predicates a list of conditions in the where clause; each one defines one partition +#' @param ... additional argument(s) passed to the method. --- End diff -- in this case I'd reference the same in L829, something like: "additional JDBC database connection named properties"
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74022313 --- Diff: R/pkg/R/functions.R --- @@ -1273,12 +1267,14 @@ setMethod("round", #' bround #' #' Returns the value of the column `e` rounded to `scale` decimal places using HALF_EVEN rounding -#' mode if `scale` >= 0 or at integral part when `scale` < 0. +#' mode if `scale` >= 0 or at integer part when `scale` < 0. #' Also known as Gaussian rounding or bankers' rounding that rounds to the nearest even number. #' bround(2.5, 0) = 2, bround(3.5, 0) = 4. #' #' @param x Column to compute on. -#' +#' @param scale round to `scale` digits to the right of the decimal point when `scale` > 0, --- End diff -- it seems this is duplicating L1270, and might be confusing since the two descriptions seem to describe different behavior?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74021040 --- Diff: R/pkg/R/functions.R --- @@ -1560,7 +1556,8 @@ setMethod("stddev_samp", #' #' Creates a new struct column that composes multiple input columns. #' -#' @param x Column to compute on. +#' @param x a column to compute on. +#' @param ... additional column(s) to be included. --- End diff -- optional?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74020775 --- Diff: R/pkg/R/functions.R --- @@ -2654,6 +2647,9 @@ setMethod("expr", signature(x = "character"), #' #' Formats the arguments in printf-style and returns the result as a string column. #' +#' @param format a character object of format strings. +#' @param x a Column object. +#' @param ... additional columns. --- End diff -- Let's keep type in capital case? `Column` or `Columns`
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74017865 --- Diff: R/pkg/R/functions.R --- @@ -3033,6 +3033,9 @@ setMethod("when", signature(condition = "Column", value = "ANY"), #' Evaluates a list of conditions and returns \code{yes} if the conditions are satisfied. #' Otherwise \code{no} is returned for unmatched conditions. #' +#' @param test a Column expression that describes the condition. +#' @param yes return values for true elements of test. --- End diff -- true -> TRUE, false -> FALSE?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74017287 --- Diff: R/pkg/R/generics.R --- @@ -1022,6 +1059,7 @@ setGeneric("month", function(x) { standardGeneric("month") }) #' @export setGeneric("months_between", function(y, x) { standardGeneric("months_between") }) +#' @param x a SparkDataFrame or a Column object. #' @rdname nrow --- End diff -- here, the "n" column function shouldn't be under rdname nrow, which is for count on a DataFrame - I'd change this to a new rdname, `count`, and put "count" and "n" under that.
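The regrouping suggested above could look roughly like the following in generics.R - both generics share one Rd help page by carrying the same @rdname tag. This is a sketch of the roxygen2 mechanism, not the exact merged code.

```r
#' @rdname count
#' @export
setGeneric("count", function(x) { standardGeneric("count") })

#' @rdname count
#' @export
setGeneric("n", function(x) { standardGeneric("n") })
```

With this arrangement, roxygen2 emits a single count.Rd documenting both `count` and `n`, and the nrow page stays reserved for the DataFrame row-count semantics.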
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74016459 --- Diff: R/pkg/R/generics.R --- @@ -1091,8 +1129,8 @@ setGeneric("reverse", function(x) { standardGeneric("reverse") }) #' @export setGeneric("rint", function(x, ...) { standardGeneric("rint") }) -#' @rdname row_number -#' @export +# @rdname row_number +# @export --- End diff -- `#'` changed to `#`?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74016421 --- Diff: R/pkg/R/generics.R --- @@ -1046,8 +1084,8 @@ setGeneric("ntile", function(x) { standardGeneric("ntile") }) #' @export setGeneric("n_distinct", function(x, ...) { standardGeneric("n_distinct") }) -#' @rdname percent_rank -#' @export +# @rdname percent_rank --- End diff -- `#'` changed to `#`?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74016338 --- Diff: R/pkg/R/mllib.R --- @@ -57,6 +57,9 @@ setClass("KMeansModel", representation(jobj = "jobj")) #' #' Saves the MLlib model to the input path. For more information, see the specific #' MLlib model below. +#' @param object a fitted ML model object. +#' @param path the directory where the model is saved. +#' @param ... additional argument(s) passed to the method. --- End diff -- does CRAN-check complain about this? This rd does not have a function signature, so it shouldn't ask us to document parameters?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74016268 --- Diff: R/pkg/R/mllib.R --- @@ -69,6 +72,8 @@ NULL #' #' Makes predictions from a MLlib model. For more information, see the specific #' MLlib model below. +#' @param object a fitted ML model object. +#' @param ... additional argument(s) passed to the method. --- End diff -- does CRAN-check complain about this? This rd does not have a function signature, so it shouldn't ask us to document parameters?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74016085 --- Diff: R/pkg/R/mllib.R --- @@ -82,15 +87,16 @@ NULL #' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make #' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. #' -#' @param data SparkDataFrame for training. -#' @param formula A symbolic description of the model to be fitted. Currently only a few formula +#' @param data a SparkDataFrame for training. +#' @param formula a symbolic description of the model to be fitted. Currently only a few formula #'operators are supported, including '~', '.', ':', '+', and '-'. -#' @param family A description of the error distribution and link function to be used in the model. +#' @param family a description of the error distribution and link function to be used in the model. #' This can be a character string naming a family function, a family function or #' the result of a call to a family function. Refer R family at #' \url{https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html}. -#' @param tol Positive convergence tolerance of iterations. -#' @param maxIter Integer giving the maximal number of IRLS iterations. +#' @param tol positive convergence tolerance of iterations. +#' @param maxIter integer giving the maximal number of IRLS iterations. +#' @param ... additional arguments passed to the method. --- End diff -- there is no `...` here in the signature?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74015938 --- Diff: R/pkg/R/mllib.R --- @@ -298,14 +304,15 @@ setMethod("summary", signature(object = "NaiveBayesModel"), #' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make #' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. #' -#' @param data SparkDataFrame for training -#' @param formula A symbolic description of the model to be fitted. Currently only a few formula +#' @param data a SparkDataFrame for training. +#' @param formula a symbolic description of the model to be fitted. Currently only a few formula #'operators are supported, including '~', '.', ':', '+', and '-'. #'Note that the response variable of formula is empty in spark.kmeans. -#' @param k Number of centers -#' @param maxIter Maximum iteration number -#' @param initMode The initialization algorithm choosen to fit the model -#' @return \code{spark.kmeans} returns a fitted k-means model +#' @param ... additional argument(s) passed to the method. --- End diff -- there is no `...` here?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74015800 --- Diff: R/pkg/R/mllib.R --- @@ -346,8 +353,11 @@ setMethod("spark.kmeans", signature(data = "SparkDataFrame", formula = "formula" #' Get fitted result from a k-means model, similarly to R's fitted(). #' Note: A saved-loaded model does not support this method. #' -#' @param object A fitted k-means model -#' @return \code{fitted} returns a SparkDataFrame containing fitted values +#' @param object a fitted k-means model. +#' @param method type of fitted results, `"centers"` for cluster centers --- End diff -- I wouldn't wrap it in both backticks and quotes - roxygen2 doesn't really handle backticks
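Since the roxygen2 versions in use here don't render backticks in Rd output, the markup fix suggested above would typically use \code{} instead - an illustrative rewrite of that parameter doc, not the final patch:

```r
#' @param object a fitted k-means model.
#' @param method type of fitted results, \code{"centers"} for cluster centers
#'        or \code{"classes"} for the assigned classes.
```

\code{} is translated to proper Rd \code markup, so the value renders in a fixed-width font in both the PDF manual and HTML help.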
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74014809 --- Diff: R/pkg/R/mllib.R --- @@ -563,11 +574,12 @@ read.ml <- function(path) { #' \code{predict} to make predictions on new data, and \code{write.ml}/\code{read.ml} to #' save/load fitted models. #' -#' @param data A SparkDataFrame for training -#' @param formula A symbolic description of the model to be fitted. Currently only a few formula +#' @param data a SparkDataFrame for training. +#' @param formula a symbolic description of the model to be fitted. Currently only a few formula #'operators are supported, including '~', ':', '+', and '-'. #'Note that operator '.' is not supported currently -#' @return \code{spark.survreg} returns a fitted AFT survival regression model +#' @param ... additional argument(s) passed to the method. --- End diff -- or document as `Currently not used.` like http://ugrad.stat.ubc.ca/R/library/e1071/html/predict.naiveBayes.html
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74014676 --- Diff: R/pkg/R/mllib.R --- @@ -414,11 +425,12 @@ setMethod("predict", signature(object = "KMeansModel"), #' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. #' Only categorical data is supported. #' -#' @param data A \code{SparkDataFrame} of observations and labels for model fitting -#' @param formula A symbolic description of the model to be fitted. Currently only a few formula +#' @param data a \code{SparkDataFrame} of observations and labels for model fitting. +#' @param formula a symbolic description of the model to be fitted. Currently only a few formula #' operators are supported, including '~', '.', ':', '+', and '-'. -#' @param smoothing Smoothing parameter -#' @return \code{spark.naiveBayes} returns a fitted naive Bayes model +#' @param smoothing smoothing parameter. +#' @param ... additional parameter(s) passed to the method. --- End diff -- same here - `...` are unused
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74014544 --- Diff: R/pkg/R/mllib.R --- @@ -563,11 +574,12 @@ read.ml <- function(path) { #' \code{predict} to make predictions on new data, and \code{write.ml}/\code{read.ml} to #' save/load fitted models. #' -#' @param data A SparkDataFrame for training -#' @param formula A symbolic description of the model to be fitted. Currently only a few formula +#' @param data a SparkDataFrame for training. +#' @param formula a symbolic description of the model to be fitted. Currently only a few formula #'operators are supported, including '~', ':', '+', and '-'. #'Note that operator '.' is not supported currently -#' @return \code{spark.survreg} returns a fitted AFT survival regression model +#' @param ... additional argument(s) passed to the method. --- End diff -- there are a few cases where it is not clear why `...` should be in the function signature. I think we should remove them since they are not used
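Dropping the unused `...` as suggested would leave the method signature fully explicit, so every formal has a matching @param and CRAN check has nothing undocumented to flag. A shape-only sketch (the actual spark.survreg body and its full argument list are elided here):

```r
setMethod("spark.survreg", signature(data = "SparkDataFrame", formula = "formula"),
          function(data, formula) {
            # no `...` in the signature: all supported arguments are named
            # explicitly, and each one carries its own roxygen @param tag
            # ... fit the AFT survival regression model here ...
          })
```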
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74013650 --- Diff: R/pkg/R/sparkR.R --- @@ -328,6 +328,7 @@ sparkRHive.init <- function(jsc = NULL) { #' @param sparkPackages Character vector of packages from spark-packages.org #' @param enableHiveSupport Enable support for Hive, fallback if not built with Hive support; once #'set, this cannot be turned off on an existing session +#' @param ... additional parameters passed to the method --- End diff -- I'd clarify as in L 317, for example, "named Spark properties passed to the method"
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74013042 --- Diff: R/pkg/R/generics.R --- @@ -465,10 +477,14 @@ setGeneric("dapply", function(x, func, schema) { standardGeneric("dapply") }) #' @export setGeneric("dapplyCollect", function(x, func) { standardGeneric("dapplyCollect") }) +#' @param x a SparkDataFrame or GroupedData. +#' @param ... additional argument(s) passed to the method. #' @rdname gapply #' @export setGeneric("gapply", function(x, ...) { standardGeneric("gapply") }) +#' @param x a SparkDataFrame or GroupedData. --- End diff -- same here for gapplyCollect
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74013005 --- Diff: R/pkg/R/generics.R --- @@ -465,10 +477,14 @@ setGeneric("dapply", function(x, func, schema) { standardGeneric("dapply") }) #' @export setGeneric("dapplyCollect", function(x, func) { standardGeneric("dapplyCollect") }) +#' @param x a SparkDataFrame or GroupedData. --- End diff -- gapply is only for GroupedData?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14558#discussion_r74012847 --- Diff: R/pkg/R/generics.R --- @@ -395,6 +396,9 @@ setGeneric("value", function(bcast) { standardGeneric("value") }) SparkDataFrame Methods +#' @param x a SparkDataFrame or GroupedData. --- End diff -- Hmm.. I see why this would be a place for it. I think it would be easier to maintain if the documentation is next to the function body instead of the generics, but I haven't completely figured out the best way to do it yet. The approach we have so far is to keep most of the tag/doc on one of the definitions - do you think it would work here?
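The alternative described above - documenting parameters next to the method body rather than on the generic - might look roughly like this in DataFrame.R. Parameter names and the delegation shown are illustrative assumptions; roxygen2 merges @param tags from every block sharing an @rdname into the same Rd page.

```r
# In DataFrame.R, alongside the implementation rather than in generics.R:
#' @param x a SparkDataFrame grouped by the specified columns.
#' @param cols grouping columns.
#' @param func a function applied to each group.
#' @param schema the schema of the resulting SparkDataFrame.
#' @rdname gapply
setMethod("gapply", signature(x = "SparkDataFrame"),
          function(x, cols, func, schema) {
            # these @param tags are merged into the shared gapply Rd page,
            # keeping the docs next to the code they describe
            gapply(groupBy(x, cols), func, schema)
          })
```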
GitHub user junyangq opened a pull request: https://github.com/apache/spark/pull/14558 [SPARK-16508][SparkR] Fix warnings on undocumented/duplicated arguments by CRAN-check ## What changes were proposed in this pull request? This PR tries to fix all the "undocumented/duplicated arguments" warnings given by CRAN-check. Most have been resolved, with only a few left that will be handled soon. ## How was this patch tested? R unit test and check-cran.sh script. You can merge this pull request into a Git repository by running: $ git pull https://github.com/junyangq/spark SPARK-16508-branch-2.0 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14558.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14558 commit 82e2f09517e9f3d726af0046d251748f892f59c8 Author: Junyang Qian Date: 2016-08-09T04:52:34Z Fix part of undocumented/duplicated arguments warnings by CRAN-check