[GitHub] spark issue #14182: [SPARK-16444][SparkR]: Isotonic Regression wrapper in Sp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14182 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14182: [SPARK-16444][SparkR]: Isotonic Regression wrapper in Sp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14182 **[Test build #63663 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63663/consoleFull)** for PR 14182 at commit [`edd1ce0`](https://github.com/apache/spark/commit/edd1ce05275447ceff298d4640f8da988d73184f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14392: [SPARK-16446] [SparkR] [ML] Gaussian Mixture Model wrapp...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14392 Yeah, I am not sure `mvnormalmixEM` is very descriptive. @junyangq Any opinions on the name here?
[GitHub] spark issue #14431: [SPARK-16258][SparkR] Automatically append the grouping ...
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/14431 Yes, @shivaram, that would be one way to do it. Basically, it means adding a new public function to `RelationalGroupedDataset` that returns the column names. If that is fine from the SQL perspective, maybe I can make a separate pull request for it? cc: @liancheng
[GitHub] spark pull request #14559: [SPARK-16968]Add additional options in jdbc when ...
Github user GraceH commented on a diff in the pull request: https://github.com/apache/spark/pull/14559#discussion_r74542628 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala --- @@ -20,14 +20,21 @@ package org.apache.spark.sql.execution.datasources.jdbc /** * Options for the JDBC data source. */ -private[jdbc] class JDBCOptions( +private[sql] class JDBCOptions( --- End diff -- OK. I just intended to follow the original style. I will fix that.
[GitHub] spark issue #14559: [SPARK-16968]Add additional options in jdbc when creatin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14559 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63662/ Test FAILed.
[GitHub] spark issue #14559: [SPARK-16968]Add additional options in jdbc when creatin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14559 **[Test build #63662 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63662/consoleFull)** for PR 14559 at commit [`4fb5e55`](https://github.com/apache/spark/commit/4fb5e55a50531abf255169c275ad2ad2cf2d71f2). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14559: [SPARK-16968]Add additional options in jdbc when creatin...
Github user GraceH commented on the issue: https://github.com/apache/spark/pull/14559 Thanks, all. I have added a unit test in JDBCWriterSuite. If you have any further comments, please feel free to let me know. BTW, we could also point users to JDBCOptions for further configuration information.
[GitHub] spark issue #14559: [SPARK-16968]Add additional options in jdbc when creatin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14559 Merged build finished. Test FAILed.
[GitHub] spark issue #14558: [SPARK-16508][SparkR] Fix warnings on undocumented/dupli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14558 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63654/ Test PASSed.
[GitHub] spark issue #14558: [SPARK-16508][SparkR] Fix warnings on undocumented/dupli...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14558 **[Test build #63654 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63654/consoleFull)** for PR 14558 at commit [`d2c1d64`](https://github.com/apache/spark/commit/d2c1d641ef05692f629ef7cefa0b2b3131ba3475). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/14229 @felixcheung I added some aliases for the spark.lda-related functions. However, I don't quite understand how they work. From [here](https://cran.r-project.org/web/packages/roxygen2/vignettes/rd.html) I can see that *When you use ?x, help("x") or example("x") R looks for an Rd file containing \alias{x}. It then parses the file, converts it into html and displays it.* But when I use `?GroupedData-method`, sparkr-shell cannot find any related topics.
[GitHub] spark issue #14571: [SPARK-16983][SQL] Add `prettyName` for row_number, dens...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14571 Hi, @rxin. I added test files for window functions for SQLQueryTestSuite and removed the old `WindowQuerySuite.scala`. Could you review this again?
[GitHub] spark issue #14613: [SPARK-16883][SparkR]:SQL decimal type is not properly c...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/14613 @shivaram Sure. I will add unit tests.
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution in CTE by ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14452 **[Test build #63658 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63658/consoleFull)** for PR 14452 at commit [`bdb6e84`](https://github.com/apache/spark/commit/bdb6e843ea5e488b0004a82fbba5ec6862c983a1).
[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14617 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63656/ Test FAILed.
[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14617 **[Test build #63656 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63656/consoleFull)** for PR 14617 at commit [`a5e9d46`](https://github.com/apache/spark/commit/a5e9d46aedf6de47f1e93de0afd8b5d913f2f36e). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class SparkListenerBlockManagerAdded(` * `class StorageStatus(`
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution in CTE by ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14452 @gatorsmile I've made some changes. I will update this soon. The optimized plan for the query is:

    Join Inner
    :- Join Inner
    :  :- CommonSubquery [a#226, b#227, a#245, b#246]
    :  :  :  +- BroadcastNestedLoopJoin BuildRight, Inner, true
    :  :  :     :- LocalTableScan [a#226, b#227]
    :  :  :     +- BroadcastExchange IdentityBroadcastMode
    :  :  :        +- LocalTableScan [a#245, b#246]
    :  +- CommonSubquery [a#247, b#248, a#251, b#252]
    :     :  +- BroadcastNestedLoopJoin BuildRight, Inner, true
    :     :     :- LocalTableScan [a#226, b#227]
    :     :     +- BroadcastExchange IdentityBroadcastMode
    :     :        +- LocalTableScan [a#245, b#246]
    +- CommonSubquery [a#253, b#254, a#257, b#258]
       :  +- BroadcastNestedLoopJoin BuildRight, Inner, true
       :     :- LocalTableScan [a#226, b#227]
       :     +- BroadcastExchange IdentityBroadcastMode
       :        +- LocalTableScan [a#245, b#246]
[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14617 **[Test build #63656 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63656/consoleFull)** for PR 14617 at commit [`a5e9d46`](https://github.com/apache/spark/commit/a5e9d46aedf6de47f1e93de0afd8b5d913f2f36e).
[GitHub] spark issue #14608: [SPARK-17013][SQL] Parse negative numeric literals
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14608 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63652/ Test PASSed.
[GitHub] spark issue #14608: [SPARK-17013][SQL] Parse negative numeric literals
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14608 Merged build finished. Test PASSed.
[GitHub] spark issue #14608: [SPARK-17013][SQL] Parse negative numeric literals
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14608 **[Test build #63652 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63652/consoleFull)** for PR 14608 at commit [`908253b`](https://github.com/apache/spark/commit/908253b92f87c823b7104f7b0df6f8ae6b4fd814). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r74536489
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,159 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"),
           function(object, newData) {
             return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf)))
           })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via alternating least
+#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict}
+#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#'
+#' For more details, see
+#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#' Additional arguments can be passed to the methods.
+#' \describe{
+#'    \item{nonnegative}{logical value indicating whether to apply nonnegativity constraints.
+#'                       Default: FALSE}
+#'    \item{implicitPrefs}{logical value indicating whether to use implicit preference.
+#'                         Default: FALSE}
+#'    \item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0}
+#'    \item{seed}{integer seed for random number generation. Default: 0}
+#'    \item{numUserBlocks}{number of user blocks used to parallelize computation (> 0).
+#'                         Default: 10}
+#'    \item{numItemBlocks}{number of item blocks used to parallelize computation (> 0).
+#'                         Default: 10}
+#'    \item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1).
+#'                              Default: 10}
+#'    }
+#'
+#' @param data A SparkDataFrame for training
+#' @param ratingCol column name for ratings
+#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers
+#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers
+#' @param rank rank of the matrix factorization (> 0)
+#' @param reg regularization parameter (>= 0)
+#' @param maxIter maximum number of iterations (>= 0)
+
+#' @return \code{spark.als} returns a fitted ALS model
+#' @rdname spark.als
+#' @aliases spark.als,SparkDataFrame
+#' @name spark.als
+#' @export
+#' @examples
+#' \dontrun{
+#' ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0),
+#'                 list(2, 1, 1.0), list(2, 2, 5.0))
+#' df <- createDataFrame(ratings, c("user", "item", "rating"))
+#' model <- spark.als(df, "rating", "user", "item")
+#'
+#' # extract latent factors
+#' stats <- summary(model)
+#' userFactors <- stats$userFactors
+#' itemFactors <- stats$itemFactors
+#'
+#' # make predictions
+#' predicted <- predict(model, df)
+#' showDF(predicted)
+#'
+#' # save and load the model
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#'
+#' # set other arguments
+#' modelS <- spark.als(df, "rating", "user", "item", rank = 20,
+#'                     reg = 0.1, nonnegative = TRUE)
+#' statsS <- summary(modelS)
+#' }
+#' @note spark.als since 2.1.0
+setMethod("spark.als", signature(data = "SparkDataFrame"),
+          function(data, ratingCol = "rating", userCol = "user", itemCol = "item",
+                   rank = 10, reg = 1.0, maxIter = 10, ...) {
+
+            if (!is.numeric(rank) || rank <= 0) {
+              stop("rank should be a positive number.")
+            }
+            if (!is.numeric(reg) || reg < 0) {
+              stop("reg should be a nonnegative number.")
+            }
+            if (!is.numeric(maxIter) || maxIter <= 0) {
+              stop("maxIter should be a positive number.")
+            }
+
+            `%||%` <- function(a, b) if (!is.null(a)) a else b
+
+            args <- list(...)
+            numUserBlocks <- args$numUserBlocks %||% 10
+            numItemBlocks <- args$numItemBlocks %||% 10
+            implicitPrefs <- args$implicitPrefs %||% FALSE
+            alpha <- args$alpha %||% 1.0
+            nonnegative <- args$nonnegative %||% FALSE
+            checkpointInterval <- args$checkpointInterval %||% 10
+            seed <- args$seed %||% 0
+
+            features <- array(c(ratingCol, userCol, itemCol))
+            distParams <- array(as.integer(c(numUserBlocks, numItemBlocks,
+                                             checkpointInterval, seed)))
+
+            jobj
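The quoted wrapper validates its three required hyperparameters and then fills optional ones from `...` with a null-coalescing helper (`%||%`). A minimal Python sketch of that same pattern follows, purely for illustration: `als_params`, `_coalesce`, and `OPTIONAL_DEFAULTS` are hypothetical names that do not exist in SparkR or PySpark, and only the default values and error messages are taken from the diff above.

```python
from numbers import Real

# Defaults for the optional arguments, copied from the quoted R diff.
OPTIONAL_DEFAULTS = {
    "numUserBlocks": 10,
    "numItemBlocks": 10,
    "implicitPrefs": False,
    "alpha": 1.0,
    "nonnegative": False,
    "checkpointInterval": 10,
    "seed": 0,
}


def _is_number(x):
    """True for ints and floats, but not for booleans."""
    return isinstance(x, Real) and not isinstance(x, bool)


def _coalesce(value, default):
    """Python analogue of the R helper `%||%`: use default when value is None."""
    return value if value is not None else default


def als_params(rank=10, reg=1.0, max_iter=10, **extra):
    """Validate required hyperparameters and fill in optional ones (hypothetical helper)."""
    if not _is_number(rank) or rank <= 0:
        raise ValueError("rank should be a positive number.")
    if not _is_number(reg) or reg < 0:
        raise ValueError("reg should be a nonnegative number.")
    if not _is_number(max_iter) or max_iter <= 0:
        raise ValueError("maxIter should be a positive number.")

    params = {"rank": rank, "reg": reg, "maxIter": max_iter}
    # Like the R code, unknown extra arguments are silently ignored.
    for name, default in OPTIONAL_DEFAULTS.items():
        params[name] = _coalesce(extra.get(name), default)
    return params
```

For example, `als_params(rank=20, reg=0.1, nonnegative=True)` keeps the ten-block defaults while overriding `nonnegative`, which corresponds to the "set other arguments" call in the roxygen example.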
[GitHub] spark pull request #14102: [SPARK-16434][SQL] Avoid per-record type dispatch...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14102
[GitHub] spark issue #14551: [SPARK-16961][CORE] Fixed off-by-one error that biased r...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/14551 @nicklavers Please also change seed for ```GaussianMixture``` doctest in ```python/pyspark/ml/clustering.py```. And check whether we need to change seed for ```KMeans``` doctest. Thanks.
[GitHub] spark pull request #13146: [SPARK-13081][PYSPARK][SPARK_SUBMIT]. Allow set p...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13146
[GitHub] spark issue #14616: [SPARK-16955][SQL] Fix analysis error when using ordinal...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14616 **[Test build #63655 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63655/consoleFull)** for PR 14616 at commit [`4087365`](https://github.com/apache/spark/commit/40873650c7397a339210092f616c15aedbf13b17).
[GitHub] spark issue #13409: [SPARK-15667][SQL]Throw exception if columns number of o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13409 Merged build finished. Test FAILed.
[GitHub] spark pull request #14616: [SPARK-16955][SQL] Fix analysis error when using ...
GitHub user clockfly opened a pull request: https://github.com/apache/spark/pull/14616

[SPARK-16955][SQL] Fix analysis error when using ordinal in ORDER BY or GROUP BY

## What changes were proposed in this pull request?

This PR adds two unresolved expressions, `GroupByOrdinal` and `OrderByOrdinal`, to represent ordinals in GROUP BY or ORDER BY, and fixes the rules that resolve ordinals.

Ordinals in GROUP BY or ORDER BY, like `1` in `order by 1` or `group by 1`, should be treated as unresolved expressions before analysis. In the current code, however, an ordinal is represented directly as a `Literal` expression, which is a resolved expression. This may cause analysis failure if a rule requires the ordinal to be resolved before it is applied.

**For example:** Before this fix, rule `ResolveAggregateFunctions` tries to resolve the `Filter` before `Filter`'s child `Aggregate` is fully resolved (`Aggregate` contains an unresolved group by ordinal `2`).

```
'Filter ('a > 0)
+- Aggregate [2], [count(1) AS count(1)#83L, a#81]
   +- SubqueryAlias tmp
      +- Project [1 AS a#81]
         +- OneRowRelation$
```

### Before this change

The ordinal is stored as a `Literal` expression.

```
scala> sc.setLogLevel("TRACE")
scala> sql("select a from t group by 1 order by 1")
...
'Sort [1 ASC], true
+- 'Aggregate [1], ['a]
   +- 'UnresolvedRelation `t`
```

This causes an analysis error when rule ResolveAggregateFunctions is applied, because group by ordinal `2` claims to be resolved when it actually is not.

```
scala> Seq(1).toDF("a").createOrReplaceTempView("t")
scala> sql("select count(a), a from t group by 2 having a > 0").show
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to Group by position: '2' exceeds the size of the select list '1'. on unresolved object, tree:
Aggregate [2], [(a#9 > 0) AS havingCondition#15]
+- SubqueryAlias t
   +- Project [value#7 AS a#9]
      +- LocalRelation [value#7]
...
```

### After this change

Ordinals are stored as `GroupByOrdinal` or `OrderByOrdinal`.

```
scala> sc.setLogLevel("TRACE")
scala> sql("select a from t group by 1 order by 1")
...
'Sort [orderbyordinal(1) ASC], true
+- 'Aggregate [groupbyordinal(1)], ['a]
   +- 'UnresolvedRelation `t`
```

Rule ResolveAggregateFunctions can now be applied safely, because `GroupByOrdinal(2)` is explicitly resolved before the rule is applied.

```
scala> Seq(1).toDF("a").createOrReplaceTempView("t")
scala> sql("select count(a), a from t group by 2 having a > 0").show
+--------+---+
|count(a)|  a|
+--------+---+
|       1|  1|
+--------+---+
```

## How was this patch tested?

Unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/clockfly/spark spark-16955

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14616.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14616

commit 40873650c7397a339210092f616c15aedbf13b17
Author: Sean Zhong
Date: 2016-08-08T21:40:53Z

    [SPARK-16955][SQL] Fix analysis error when using ordinal in ORDER BY or GROUP BY
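The core idea of this PR description, that an ordinal should stay marked as unresolved until it has been replaced by the select-list expression it points to, can be illustrated with a toy model. The Python sketch below is not Spark's Catalyst code: the class names echo `Literal` and `GroupByOrdinal` from the description, while `Column`, the `resolved` flag, `resolve_group_by`, and `is_resolved` are hypothetical stand-ins chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Literal:
    value: int
    resolved: bool = True  # a plain literal always reports itself as resolved


@dataclass(frozen=True)
class Column:
    name: str
    resolved: bool = True


@dataclass(frozen=True)
class GroupByOrdinal:
    position: int  # 1-based index into the select list
    resolved: bool = False  # stays unresolved until it is substituted


Expr = Union[Literal, Column, GroupByOrdinal]


def is_resolved(exprs: List[Expr]) -> bool:
    """A later rule may only run once every expression is resolved."""
    return all(e.resolved for e in exprs)


def resolve_group_by(grouping: List[Expr], select_list: List[Expr]) -> List[Expr]:
    """Replace each ordinal with the select-list expression it refers to."""
    out = []
    for e in grouping:
        if isinstance(e, GroupByOrdinal):
            if not 1 <= e.position <= len(select_list):
                raise ValueError(
                    f"GROUP BY position {e.position} is not in select list "
                    f"(valid range is [1, {len(select_list)}])"
                )
            out.append(select_list[e.position - 1])
        else:
            out.append(e)
    return out
```

In this model, `[Literal(2)]` passes the `is_resolved` check even though it still stands for "second select-list item", which is the failure mode described under "Before this change"; `[GroupByOrdinal(2)]` fails the check until `resolve_group_by` has turned it into `Column("a")`.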
[GitHub] spark issue #14558: [SPARK-16508][SparkR] Fix warnings on undocumented/dupli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14558 Merged build finished. Test FAILed.
[GitHub] spark issue #14609: [MINOR][Core] fix warnings on depreciated methods in Mes...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14609 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63650/ Test FAILed.
[GitHub] spark issue #14530: [SPARK-16868][Web Ui] Fix executor be both dead and aliv...
Github user SaintBacchus commented on the issue: https://github.com/apache/spark/pull/14530 I will re-run this case, and dig into why the executor will double register.
[GitHub] spark issue #14607: [SPARK-16905] SQL DDL: MSCK REPAIR TABLE (follow-up)
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14607 Merged build finished. Test PASSed.
[GitHub] spark issue #14607: [SPARK-16905] SQL DDL: MSCK REPAIR TABLE (follow-up)
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14607 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63646/ Test PASSed.
[GitHub] spark issue #14433: [SPARK-16829][SparkR]:sparkR sc.setLogLevel doesn't work
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14433 Can we have this like [SparkSubmitAction](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L55) that extends `Enumeration`?
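As a rough illustration of the suggestion above, a set of log levels could be modelled as an object extending Scala's `Enumeration`. This is a minimal sketch, not the code from the pull request: the names `LogLevel`, `parse` and `LogLevelDemo`, and the particular list of levels, are assumptions made for the example.

```scala
// Hypothetical sketch: LogLevel and parse are illustrative names, not Spark's API.
object LogLevel extends Enumeration {
  type LogLevel = Value
  val ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN = Value

  // Case-insensitive lookup; returns None for an unknown level name.
  def parse(name: String): Option[Value] =
    values.find(_.toString.equalsIgnoreCase(name))
}

object LogLevelDemo extends App {
  println(LogLevel.parse("trace"))   // Some(TRACE)
  println(LogLevel.parse("verbose")) // None
}
```

Keeping the valid names in one enumeration means validation and error messages can both be derived from `LogLevel.values` rather than from a separately maintained list of strings.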
[GitHub] spark issue #14586: [SPARK-17003] [BUILD] [BRANCH-1.6] release-build.sh is m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14586 Merged build finished. Test PASSed.
[GitHub] spark issue #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptron Class...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14447 So there are a few competing implementations in R, and `mlp` might not be a very relevant name. @shivaram @mengxr any thoughts on `spark.mlp` here?
[GitHub] spark pull request #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptro...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14447#discussion_r74530136 --- Diff: R/pkg/R/mllib.R --- @@ -533,6 +626,26 @@ setMethod("write.ml", signature(object = "KMeansModel", path = "character"), invisible(callJMethod(writer, "save", path)) }) +# Saves the Multilayer Perceptron Classification Model to the input path. + +#' @param path The directory where the model is saved +#' @param overwrite Overwrites or not if the output path already exists. Default is FALSE +#' which means throw exception if the output path exists. +#' +#' @rdname spark.mlp +#' @export +#' @seealso \link{write.ml} +#' @note write.ml(MultilayerPerceptronClassificationModel, character) since 2.0.0 --- End diff -- since 2.1.0
[GitHub] spark pull request #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptro...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14447#discussion_r74530039 --- Diff: R/pkg/R/mllib.R --- @@ -414,6 +421,92 @@ setMethod("predict", signature(object = "KMeansModel"), return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) +#' Multilayer Perceptron Classification Model +#' +#' \code{spark.mlp} fits a multi-layer perceptron neural network model against a SparkDataFrame. +#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make +#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' Only categorical data is supported. +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html +#' #multilayer-perceptron-classifier}{Multilayerperceptron classifier}. +#' +#' @param data A \code{SparkDataFrame} of observations and labels for model fitting +#' @param blockSize BlockSize parameter +#' @param layers Layers parameter +#' @param solver Solver parameter, supported options: "gd" (minibatch gradient descent) or "l-bfgs" +#' @param maxIter Maximum iteration number +#' @param tol Convergence tolerance of iterations +#' @param stepSize StepSize parameter +#' @param seed Seed parameter for weights initialization +#' @return \code{spark.mlp} returns a fitted Multilayer Perceptron Classification Model +#' @rdname spark.mlp +#' @aliases spark.mlp,SparkDataFrame,formula-method +#' @name spark.mlp +#' @seealso \link{read.ml} +#' @export +#' @examples +#' \dontrun{ +#' df <- read.df("data/mllib/sample_multiclass_classification_data.txt", source = "libsvm") +#' +#' # fit a Multilayer Perceptron Classification Model +#' model <- spark.mlp(df, blockSize = 128, layers = c(4, 5, 4, 3), solver = "l-bfgs", +#'maxIter = 100, tol = 0.5, stepSize = 1, seed = 1) +#' +#' # get the summary of the model +#' summary(model) +#' +#' # make predictions +#' predictions <- predict(model, df) +#' +#' 
# save and load the model +#' path <- "path/to/model" +#' write.ml(model, path) +#' savedModel <- read.ml(path) +#' summary(savedModel) +#' } +#' @note spark.mlp since 2.1.0 +setMethod("spark.mlp", signature(data = "SparkDataFrame"), + function(data, blockSize = 128, layers = c(3, 5, 2), solver = "l-bfgs", maxIter = 100, + tol = 0.5, stepSize = 1, seed = 1, ...) { +jobj <- callJStatic("org.apache.spark.ml.r.MultilayerPerceptronClassifierWrapper", +"fit", data@sdf, as.integer(blockSize), as.array(layers), +solver, as.integer(maxIter), tol, stepSize, as.integer(seed)) +return(new("MultilayerPerceptronClassificationModel", jobj = jobj)) + }) + +# Makes predictions from a model produced by spark.mlp(). + +#' @param newData A SparkDataFrame for testing +#' @return \code{predict} returns a SparkDataFrame containing predicted labeled in a column named +#' "prediction" +#' @rdname spark.mlp +#' @export +#' @note predict(MultilayerPerceptronClassificationModel) since 2.0.0 --- End diff -- please add @aliases for each function introduced in this PR.
[GitHub] spark pull request #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptro...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14447#discussion_r74530043 --- Diff: R/pkg/R/mllib.R --- @@ -414,6 +421,92 @@ setMethod("predict", signature(object = "KMeansModel"), return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) +#' Multilayer Perceptron Classification Model +#' +#' \code{spark.mlp} fits a multi-layer perceptron neural network model against a SparkDataFrame. +#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make +#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' Only categorical data is supported. +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html +#' #multilayer-perceptron-classifier}{Multilayerperceptron classifier}. +#' +#' @param data A \code{SparkDataFrame} of observations and labels for model fitting +#' @param blockSize BlockSize parameter +#' @param layers Layers parameter +#' @param solver Solver parameter, supported options: "gd" (minibatch gradient descent) or "l-bfgs" +#' @param maxIter Maximum iteration number +#' @param tol Convergence tolerance of iterations +#' @param stepSize StepSize parameter +#' @param seed Seed parameter for weights initialization +#' @return \code{spark.mlp} returns a fitted Multilayer Perceptron Classification Model +#' @rdname spark.mlp +#' @aliases spark.mlp,SparkDataFrame,formula-method +#' @name spark.mlp +#' @seealso \link{read.ml} +#' @export +#' @examples +#' \dontrun{ +#' df <- read.df("data/mllib/sample_multiclass_classification_data.txt", source = "libsvm") +#' +#' # fit a Multilayer Perceptron Classification Model +#' model <- spark.mlp(df, blockSize = 128, layers = c(4, 5, 4, 3), solver = "l-bfgs", +#'maxIter = 100, tol = 0.5, stepSize = 1, seed = 1) +#' +#' # get the summary of the model +#' summary(model) +#' +#' # make predictions +#' predictions <- predict(model, df) +#' +#' 
# save and load the model +#' path <- "path/to/model" +#' write.ml(model, path) +#' savedModel <- read.ml(path) +#' summary(savedModel) +#' } +#' @note spark.mlp since 2.1.0 +setMethod("spark.mlp", signature(data = "SparkDataFrame"), + function(data, blockSize = 128, layers = c(3, 5, 2), solver = "l-bfgs", maxIter = 100, + tol = 0.5, stepSize = 1, seed = 1, ...) { +jobj <- callJStatic("org.apache.spark.ml.r.MultilayerPerceptronClassifierWrapper", +"fit", data@sdf, as.integer(blockSize), as.array(layers), +solver, as.integer(maxIter), tol, stepSize, as.integer(seed)) +return(new("MultilayerPerceptronClassificationModel", jobj = jobj)) + }) + +# Makes predictions from a model produced by spark.mlp(). + +#' @param newData A SparkDataFrame for testing +#' @return \code{predict} returns a SparkDataFrame containing predicted labeled in a column named +#' "prediction" +#' @rdname spark.mlp +#' @export +#' @note predict(MultilayerPerceptronClassificationModel) since 2.0.0 +setMethod("predict", signature(object = "MultilayerPerceptronClassificationModel"), + function(object, newData) { +return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) + }) + +# Returns the summary of a Multilayer Perceptron Classification Model produced by \code{spark.mlp} + +#' @param object A Multilayer Perceptron Classification Model fitted by \code{spark.mlp} +#' @return \code{summary} returns a list containing \code{layers}, the label distribution, and +#' \code{tables}, conditional probabilities given the target label +#' @rdname spark.mlp +#' @export +#' @note summary(MultilayerPerceptronClassificationModel) since 2.0.0 --- End diff -- please add @aliases
[GitHub] spark pull request #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptro...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14447#discussion_r74529955 --- Diff: R/pkg/R/mllib.R --- @@ -487,7 +580,7 @@ setMethod("write.ml", signature(object = "NaiveBayesModel", path = "character"), #' @rdname spark.survreg #' @export #' @note write.ml(AFTSurvivalRegressionModel, character) since 2.0.0 -#' @seealso \link{read.ml} +#' @seealso \link{write.ml} --- End diff -- same here, this was intentional to link to `read.ml`
[GitHub] spark pull request #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptro...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14447#discussion_r74529891 --- Diff: R/pkg/R/mllib.R --- @@ -414,6 +421,92 @@ setMethod("predict", signature(object = "KMeansModel"), return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) +#' Multilayer Perceptron Classification Model +#' +#' \code{spark.mlp} fits a multi-layer perceptron neural network model against a SparkDataFrame. +#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make +#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' Only categorical data is supported. +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html +#' #multilayer-perceptron-classifier}{Multilayerperceptron classifier}. +#' +#' @param data A \code{SparkDataFrame} of observations and labels for model fitting +#' @param blockSize BlockSize parameter +#' @param layers Layers parameter +#' @param solver Solver parameter, supported options: "gd" (minibatch gradient descent) or "l-bfgs" +#' @param maxIter Maximum iteration number +#' @param tol Convergence tolerance of iterations +#' @param stepSize StepSize parameter +#' @param seed Seed parameter for weights initialization +#' @return \code{spark.mlp} returns a fitted Multilayer Perceptron Classification Model +#' @rdname spark.mlp +#' @aliases spark.mlp,SparkDataFrame,formula-method +#' @name spark.mlp +#' @seealso \link{read.ml} +#' @export +#' @examples +#' \dontrun{ +#' df <- read.df("data/mllib/sample_multiclass_classification_data.txt", source = "libsvm") +#' +#' # fit a Multilayer Perceptron Classification Model +#' model <- spark.mlp(df, blockSize = 128, layers = c(4, 5, 4, 3), solver = "l-bfgs", +#'maxIter = 100, tol = 0.5, stepSize = 1, seed = 1) +#' +#' # get the summary of the model +#' summary(model) +#' +#' # make predictions +#' predictions <- predict(model, df) +#' +#' 
# save and load the model +#' path <- "path/to/model" +#' write.ml(model, path) +#' savedModel <- read.ml(path) +#' summary(savedModel) +#' } +#' @note spark.mlp since 2.1.0 +setMethod("spark.mlp", signature(data = "SparkDataFrame"), + function(data, blockSize = 128, layers = c(3, 5, 2), solver = "l-bfgs", maxIter = 100, + tol = 0.5, stepSize = 1, seed = 1, ...) { +jobj <- callJStatic("org.apache.spark.ml.r.MultilayerPerceptronClassifierWrapper", +"fit", data@sdf, as.integer(blockSize), as.array(layers), +solver, as.integer(maxIter), tol, stepSize, as.integer(seed)) +return(new("MultilayerPerceptronClassificationModel", jobj = jobj)) + }) + +# Makes predictions from a model produced by spark.mlp(). + +#' @param newData A SparkDataFrame for testing +#' @return \code{predict} returns a SparkDataFrame containing predicted labeled in a column named +#' "prediction" +#' @rdname spark.mlp +#' @export +#' @note predict(MultilayerPerceptronClassificationModel) since 2.0.0 +setMethod("predict", signature(object = "MultilayerPerceptronClassificationModel"), + function(object, newData) { +return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) + }) + +# Returns the summary of a Multilayer Perceptron Classification Model produced by \code{spark.mlp} + +#' @param object A Multilayer Perceptron Classification Model fitted by \code{spark.mlp} +#' @return \code{summary} returns a list containing \code{layers}, the label distribution, and +#' \code{tables}, conditional probabilities given the target label +#' @rdname spark.mlp +#' @export +#' @note summary(MultilayerPerceptronClassificationModel) since 2.0.0 --- End diff -- since 2.1.0
[GitHub] spark pull request #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptro...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14447#discussion_r74529897 --- Diff: R/pkg/R/mllib.R --- @@ -414,6 +421,92 @@ setMethod("predict", signature(object = "KMeansModel"), return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) +#' Multilayer Perceptron Classification Model +#' +#' \code{spark.mlp} fits a multi-layer perceptron neural network model against a SparkDataFrame. +#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make +#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' Only categorical data is supported. +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html +#' #multilayer-perceptron-classifier}{Multilayerperceptron classifier}. +#' +#' @param data A \code{SparkDataFrame} of observations and labels for model fitting +#' @param blockSize BlockSize parameter +#' @param layers Layers parameter +#' @param solver Solver parameter, supported options: "gd" (minibatch gradient descent) or "l-bfgs" +#' @param maxIter Maximum iteration number +#' @param tol Convergence tolerance of iterations +#' @param stepSize StepSize parameter +#' @param seed Seed parameter for weights initialization +#' @return \code{spark.mlp} returns a fitted Multilayer Perceptron Classification Model +#' @rdname spark.mlp +#' @aliases spark.mlp,SparkDataFrame,formula-method +#' @name spark.mlp +#' @seealso \link{read.ml} +#' @export +#' @examples +#' \dontrun{ +#' df <- read.df("data/mllib/sample_multiclass_classification_data.txt", source = "libsvm") +#' +#' # fit a Multilayer Perceptron Classification Model +#' model <- spark.mlp(df, blockSize = 128, layers = c(4, 5, 4, 3), solver = "l-bfgs", +#'maxIter = 100, tol = 0.5, stepSize = 1, seed = 1) +#' +#' # get the summary of the model +#' summary(model) +#' +#' # make predictions +#' predictions <- predict(model, df) +#' +#' 
# save and load the model +#' path <- "path/to/model" +#' write.ml(model, path) +#' savedModel <- read.ml(path) +#' summary(savedModel) +#' } +#' @note spark.mlp since 2.1.0 +setMethod("spark.mlp", signature(data = "SparkDataFrame"), + function(data, blockSize = 128, layers = c(3, 5, 2), solver = "l-bfgs", maxIter = 100, + tol = 0.5, stepSize = 1, seed = 1, ...) { +jobj <- callJStatic("org.apache.spark.ml.r.MultilayerPerceptronClassifierWrapper", +"fit", data@sdf, as.integer(blockSize), as.array(layers), +solver, as.integer(maxIter), tol, stepSize, as.integer(seed)) +return(new("MultilayerPerceptronClassificationModel", jobj = jobj)) + }) + +# Makes predictions from a model produced by spark.mlp(). + +#' @param newData A SparkDataFrame for testing +#' @return \code{predict} returns a SparkDataFrame containing predicted labeled in a column named +#' "prediction" +#' @rdname spark.mlp +#' @export +#' @note predict(MultilayerPerceptronClassificationModel) since 2.0.0 --- End diff -- since 2.1.0
[GitHub] spark pull request #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptro...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14447#discussion_r74529661 --- Diff: R/pkg/R/mllib.R --- @@ -414,6 +421,92 @@ setMethod("predict", signature(object = "KMeansModel"), return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) +#' Multilayer Perceptron Classification Model +#' +#' \code{spark.mlp} fits a multi-layer perceptron neural network model against a SparkDataFrame. +#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make +#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' Only categorical data is supported. +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html +#' #multilayer-perceptron-classifier}{Multilayerperceptron classifier}. +#' +#' @param data A \code{SparkDataFrame} of observations and labels for model fitting +#' @param blockSize BlockSize parameter +#' @param layers Layers parameter --- End diff -- something like "integer vector containing the number of nodes for each layer"
[GitHub] spark issue #14612: [SPARK-16803] [SQL] SaveAsTable does not work when sourc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14612 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63645/ Test PASSed.
[GitHub] spark pull request #14522: [Spark-16508][SparkR] Split docs for arrange and ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14522#discussion_r74529416 --- Diff: R/pkg/R/DataFrame.R --- @@ -2121,7 +2121,7 @@ setMethod("arrange", }) #' @rdname arrange -#' @name orderBy --- End diff -- hmm, so you are saying having a @name is ok?
[GitHub] spark pull request #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptro...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14447#discussion_r74529446 --- Diff: R/pkg/R/mllib.R --- @@ -53,6 +53,13 @@ setClass("AFTSurvivalRegressionModel", representation(jobj = "jobj")) #' @note KMeansModel since 2.0.0 setClass("KMeansModel", representation(jobj = "jobj")) +#' S4 class that represents a MultilayerPerceptronClassificationModel +#' +#' @param jobj a Java object reference to the backing Scala MultilayerPerceptronClassifierWrapper +#' @export +#' @note MultilayerPerceptronClassificationModel since 2.0.0 --- End diff -- since 2.1.0
[GitHub] spark issue #14612: [SPARK-16803] [SQL] SaveAsTable does not work when sourc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14612 **[Test build #63645 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63645/consoleFull)** for PR 14612 at commit [`71399f1`](https://github.com/apache/spark/commit/71399f1ca1e91af2a7d2a12c92d32bb691031c86). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14384#discussion_r74527784 --- Diff: R/pkg/R/mllib.R --- @@ -632,3 +642,159 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"), function(object, newData) { return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) + + +#' Alternating Least Squares (ALS) for Collaborative Filtering +#' +#' \code{spark.als} learns latent factors in collaborative filtering via alternating least +#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict} +#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' For more details, see +#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib: +#' Collaborative Filtering}. +#' Additional arguments can be passed to the methods. +#' \describe{ +#'\item{nonnegative}{logical value indicating whether to apply nonnegativity constraints. +#' Default: FALSE} +#'\item{implicitPrefs}{logical value indicating whether to use implicit preference. +#' Default: FALSE} +#'\item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0} +#'\item{seed}{integer seed for random number generation. Default: 0} +#'\item{numUserBlocks}{number of user blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{numItemBlocks}{number of item blocks used to parallelize computation (> 0). +#' Default: 10} +#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1). +#' Default: 10} --- End diff -- Is there a reason we are preferring `...` vs naming these out like `maxIter` in the function definition on L714? if it's well known it's probably better to name them?
[GitHub] spark pull request #14182: [SPARK-16444][SparkR]: Isotonic Regression wrappe...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14182#discussion_r74527541 --- Diff: R/pkg/R/mllib.R --- @@ -292,6 +299,85 @@ setMethod("summary", signature(object = "NaiveBayesModel"), return(list(apriori = apriori, tables = tables)) }) +#' Isotonic Regression Model +#' +#' Fits an Isotonic Regression model against a Spark DataFrame, similarly to R's isoreg(). +#' Users can print, make predictions on the produced model and save the model to the input path. +#' +#' @param data SparkDataFrame for training +#' @param formula A symbolic description of the model to be fitted. Currently only a few formula +#'operators are supported, including '~', '.', ':', '+', and '-'. +#' @param isotonic Whether the output sequence should be isotonic/increasing (TRUE) or +#' antitonic/decreasing (FALSE) +#' @param featureIndex The index of the feature if \code{featuresCol} is a vector column (default: `0`), +#' no effect otherwise +#' @param weightCol The weight column name. +#' @return \code{spark.isoreg} returns a fitted Isotonic Regression model +#' @rdname spark.isoreg +#' @aliases spark.isoreg,SparkDataFrame,formula-method +#' @name spark.isoreg +#' @export +#' @examples +#' \dontrun{ +#' sparkR.session() +#' data <- list(list(7.0, 0.0), list(5.0, 1.0), list(3.0, 2.0), +#' list(5.0, 3.0), list(1.0, 4.0)) +#' df <- createDataFrame(data, c("label", "feature")) +#' model <- spark.isoreg(df, label ~ feature, isotonic = FALSE) +#' # return model boundaries and prediction as lists --- End diff -- also please add `spark.isoreg` to @seealso of write.ml (around L63), predict like other ML
[GitHub] spark issue #14608: [SPARK-17013][SQL] Parse negative numeric literals
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14608 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63644/ Test FAILed.
[GitHub] spark issue #14608: [SPARK-17013][SQL] Parse negative numeric literals
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14608 Merged build finished. Test FAILed.
[GitHub] spark issue #14608: [SPARK-17013][SQL] Parse negative numeric literals
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14608 **[Test build #63644 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63644/consoleFull)** for PR 14608 at commit [`154abba`](https://github.com/apache/spark/commit/154abba4cff36f352e47927d8b707b1c3fa25668). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14384#discussion_r74527216

    --- Diff: R/pkg/R/mllib.R ---
    @@ -632,3 +642,159 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"),
               function(object, newData) {
                 return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf)))
               })
    +
    +
    +#' Alternating Least Squares (ALS) for Collaborative Filtering
    +#'
    +#' \code{spark.als} learns latent factors in collaborative filtering via alternating least
    +#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict}
    +#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
    +#'
    +#' For more details, see
    +#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
    +#' Collaborative Filtering}.
    +#' Additional arguments can be passed to the methods.
    +#' \describe{
    +#'    \item{nonnegative}{logical value indicating whether to apply nonnegativity constraints.
    +#'                       Default: FALSE}
    +#'    \item{implicitPrefs}{logical value indicating whether to use implicit preference.
    +#'                         Default: FALSE}
    +#'    \item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0}
    +#'    \item{seed}{integer seed for random number generation. Default: 0}
    +#'    \item{numUserBlocks}{number of user blocks used to parallelize computation (> 0).
    +#'                         Default: 10}
    +#'    \item{numItemBlocks}{number of item blocks used to parallelize computation (> 0).
    +#'                         Default: 10}
    +#'    \item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1).
    +#'                              Default: 10}
    +#'    }
    +#'
    +#' @param data A SparkDataFrame for training
    +#' @param ratingCol column name for ratings
    +#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers
    +#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers
    +#' @param rank rank of the matrix factorization (> 0)
    +#' @param reg regularization parameter (>= 0)
    +#' @param maxIter maximum number of iterations (>= 0)
    --- End diff --

    please add documentation for `...` as for example `@param ... additional name arguments such as nonnegative`
[GitHub] spark issue #14614: [SPARK-17027][ML] Avoid integer overflow in PolynomialEx...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14614 Merged build finished. Test PASSed.
[GitHub] spark issue #14614: [SPARK-17027][ML] Avoid integer overflow in PolynomialEx...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14614 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63649/ Test PASSed.
[GitHub] spark issue #14609: [MINOR][Core] fix warnings on depreciated methods in Mes...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14609 **[Test build #63650 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63650/consoleFull)** for PR 14609 at commit [`75b6c22`](https://github.com/apache/spark/commit/75b6c2254be381abef667a0fce1d47a6f8b40cf5).
[GitHub] spark issue #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHEMA
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14116 Merged build finished. Test PASSed.
[GitHub] spark issue #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHEMA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14116 **[Test build #63641 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63641/consoleFull)** for PR 14116 at commit [`dc5d1dc`](https://github.com/apache/spark/commit/dc5d1dc38d3cd92c08bedd2ee5ce6f0937353ca3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/14229 @felixcheung Yes. Sorry I missed the email.
[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14426 Merged build finished. Test PASSed.
[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14426 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63639/ Test PASSed.
[GitHub] spark pull request #14182: [SPARK-16444][SparkR]: Isotonic Regression wrappe...
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14182#discussion_r74526172

    --- Diff: R/pkg/R/mllib.R ---
    @@ -292,6 +299,83 @@ setMethod("summary", signature(object = "NaiveBayesModel"),
                 return(list(apriori = apriori, tables = tables))
               })
     
    +#' Isotonic Regression Model
    +#'
    +#' Fits an Isotonic Regression model against a Spark DataFrame, similarly to R's isoreg().
    +#' Users can print, make predictions on the produced model and save the model to the input path.
    +#'
    +#' @param data SparkDataFrame for training
    +#' @param formula A symbolic description of the model to be fitted. Currently only a few formula
    +#'                operators are supported, including '~', '.', ':', '+', and '-'.
    +#' @param isotonic Whether the output sequence should be isotonic/increasing (true) or
    +#'                 antitonic/decreasing (false)
    +#' @param featureIndex The index of the feature if \code{featuresCol} is a vector column (default: `0`),
    +#'                     no effect otherwise
    +#' @param weightCol The weight column name.
    +#' @return \code{spark.isoreg} returns a fitted Isotonic Regression model
    +#' @rdname spark.isoreg
    +#' @aliases spark.isoreg,SparkDataFrame,formula-method
    +#' @name spark.isoreg
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' sparkR.session()
    +#' data <- list(list(7.0, 0.0), list(5.0, 1.0), list(3.0, 2.0),
    +#'              list(5.0, 3.0), list(1.0, 4.0))
    +#' df <- createDataFrame(data, c("label", "feature"))
    +#' model <- spark.isoreg(df, label ~ feature, isotonic = FALSE)
    +#' # return model boundaries and prediction as lists
    +#' result <- summary(model, df)
    +#'
    +#' # save fitted model to input path
    +#' path <- "path/to/model"
    +#' write.ml(model, path)
    +#'
    +#' # can also read back the saved model and print
    +#' savedModel <- read.ml(path)
    +#' summary(savedModel)
    +#' }
    +#' @note spark.isoreg since 2.1.0
    +setMethod("spark.isoreg", signature(data = "SparkDataFrame", formula = "formula"),
    +          function(data, formula, isotonic = TRUE, featureIndex = 0, weightCol = NULL) {
    +            formula <- paste0(deparse(formula), collapse = "")
    +
    +            if (is.null(weightCol)) {
    +              weightCol <- ""
    +            }
    +
    +            jobj <- callJStatic("org.apache.spark.ml.r.IsotonicRegressionWrapper", "fit",
    +                                data@sdf, formula, as.logical(isotonic), as.integer(featureIndex), weightCol)
    +            return(new("IsotonicRegressionModel", jobj = jobj))
    +          })
    +
    +# Predicted values based on an isotonicRegression model
    +
    +#' @param object a fitted isotonicRegressionModel
    +#' @param newData SparkDataFrame for testing
    +#' @return \code{predict} returns a SparkDataFrame containing predicted values
    +#' @rdname spark.isoreg
    +#' @export
    +#' @note predict(isotonicRegressionModel) since 2.1.0
    +setMethod("predict", signature(object = "IsotonicRegressionModel"),
    +          function(object, newData) {
    +            return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf)))
    +          })
    +
    +# Get the summary of a isotonicRegressionModel model
    +
    +#' @return \code{summary} returns the model's boundaries and prediction as lists
    +#' @rdname spark.isoreg
    --- End diff --

    Please make sure we add this - otherwise it fails CRAN test
[GitHub] spark pull request #11157: [SPARK-11714][Mesos] Make Spark on Mesos honor po...
Github user skonto commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11157#discussion_r74526303

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala ---
    @@ -358,6 +376,109 @@ private[mesos] trait MesosSchedulerUtils extends Logging {
       }
     
       /**
    +   * Checks executor ports if they are within some range of the offered list of ports ranges,
    +   *
    +   * @param conf the Spark Config
    +   * @param ports the list of ports to check
    +   * @return true if ports are within range false otherwise
    +   */
    +  protected def checkPorts(conf: SparkConf, ports: List[(Long, Long)]): Boolean = {
    +
    +    def checkIfInRange(port: Long, ps: List[(Long, Long)]): Boolean = {
    +      ps.exists(r => r._1 <= port & r._2 >= port)
    +    }
    +
    +    val portsToCheck = nonZeroPortValuesFromConfig(conf)
    +    val withinRange = portsToCheck.forall(p => checkIfInRange(p, ports))
    +    // make sure we have enough ports to allocate per offer
    +    ports.map(r => r._2 - r._1 + 1).sum >= portsToCheck.size && withinRange
    +  }
    +
    +  /**
    +   * Partitions port resources.
    +   *
    +   * @param requestedPorts non-zero ports to assign
    +   * @param offeredResources the resources offered
    +   * @return resources left, port resources to be used.
    +   */
    +  def partitionPortResources(requestedPorts: List[Long], offeredResources: List[Resource])
    +    : (List[Resource], List[Resource]) = {
    +    if (requestedPorts.isEmpty) {
    +      (offeredResources, List[Resource]())
    +    }
    +    else {
    +      // partition port offers
    +      val (resourcesWithoutPorts, portResources) = filterPortResources(offeredResources)
    +
    +      val portsAndRoles = requestedPorts.
    +        map(x => (x, findPortAndGetAssignedRangeRole(x, portResources)))
    +
    +      val assignedPortResources = createResourcesFromPorts(portsAndRoles)
    +
    +      // ignore non-assigned port resources, they will be declined implicitly by mesos
    +      // no need for splitting port resources.
    +      (resourcesWithoutPorts, assignedPortResources)
    +    }
    +  }
    +
    +  val managedPortNames = List("spark.executor.port", "spark.blockManager.port")
    +
    +  /**
    +   * The values of the non-zero ports to be used by the executor process.
    +   * @param conf the spark config to use
    +   * @return the ono-zero values of the ports
    +   */
    +  def nonZeroPortValuesFromConfig(conf: SparkConf): List[Long] = {
    +    managedPortNames.map(conf.getLong(_, 0)).filter( _ != 0)
    +  }
    +
    +  /** Creates a mesos resource for a specific port number. */
    +  private def createResourcesFromPorts(portsAndRoles: List[(Long, String)]) : List[Resource] = {
    +    portsAndRoles.flatMap{port => createMesosPortResource(List((port._1, port._1)), Some(port._2))}
    +  }
    +
    +  /** Helper to create mesos resources for specific port ranges. */
    +  private def createMesosPortResource(
    +      ranges: List[(Long, Long)],
    +      role: Option[String] = None): List[Resource] = {
    +    ranges.map { range =>
    --- End diff --

    ok will try it.
[GitHub] spark pull request #11157: [SPARK-11714][Mesos] Make Spark on Mesos honor po...
Github user skonto commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11157#discussion_r74526278

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala ---
    @@ -358,6 +376,109 @@ private[mesos] trait MesosSchedulerUtils extends Logging {
       }
     
       /**
    +   * Checks executor ports if they are within some range of the offered list of ports ranges,
    +   *
    +   * @param conf the Spark Config
    +   * @param ports the list of ports to check
    +   * @return true if ports are within range false otherwise
    +   */
    +  protected def checkPorts(conf: SparkConf, ports: List[(Long, Long)]): Boolean = {
    +
    +    def checkIfInRange(port: Long, ps: List[(Long, Long)]): Boolean = {
    +      ps.exists(r => r._1 <= port & r._2 >= port)
    +    }
    +
    +    val portsToCheck = nonZeroPortValuesFromConfig(conf)
    +    val withinRange = portsToCheck.forall(p => checkIfInRange(p, ports))
    +    // make sure we have enough ports to allocate per offer
    +    ports.map(r => r._2 - r._1 + 1).sum >= portsToCheck.size && withinRange
    +  }
    +
    +  /**
    +   * Partitions port resources.
    +   *
    +   * @param requestedPorts non-zero ports to assign
    +   * @param offeredResources the resources offered
    +   * @return resources left, port resources to be used.
    +   */
    +  def partitionPortResources(requestedPorts: List[Long], offeredResources: List[Resource])
    +    : (List[Resource], List[Resource]) = {
    +    if (requestedPorts.isEmpty) {
    +      (offeredResources, List[Resource]())
    +    }
    +    else {
    +      // partition port offers
    +      val (resourcesWithoutPorts, portResources) = filterPortResources(offeredResources)
    +
    +      val portsAndRoles = requestedPorts.
    +        map(x => (x, findPortAndGetAssignedRangeRole(x, portResources)))
    +
    +      val assignedPortResources = createResourcesFromPorts(portsAndRoles)
    +
    +      // ignore non-assigned port resources, they will be declined implicitly by mesos
    +      // no need for splitting port resources.
    +      (resourcesWithoutPorts, assignedPortResources)
    +    }
    +  }
    +
    +  val managedPortNames = List("spark.executor.port", "spark.blockManager.port")
    +
    +  /**
    +   * The values of the non-zero ports to be used by the executor process.
    +   * @param conf the spark config to use
    +   * @return the ono-zero values of the ports
    +   */
    +  def nonZeroPortValuesFromConfig(conf: SparkConf): List[Long] = {
    +    managedPortNames.map(conf.getLong(_, 0)).filter( _ != 0)
    +  }
    +
    +  /** Creates a mesos resource for a specific port number. */
    +  private def createResourcesFromPorts(portsAndRoles: List[(Long, String)]) : List[Resource] = {
    +    portsAndRoles.flatMap{port => createMesosPortResource(List((port._1, port._1)), Some(port._2))}
    --- End diff --

    ok
[GitHub] spark pull request #11157: [SPARK-11714][Mesos] Make Spark on Mesos honor po...
Github user skonto commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11157#discussion_r74526246

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala ---
    @@ -358,6 +376,109 @@ private[mesos] trait MesosSchedulerUtils extends Logging {
       }
     
       /**
    +   * Checks executor ports if they are within some range of the offered list of ports ranges,
    +   *
    +   * @param conf the Spark Config
    +   * @param ports the list of ports to check
    +   * @return true if ports are within range false otherwise
    +   */
    +  protected def checkPorts(conf: SparkConf, ports: List[(Long, Long)]): Boolean = {
    +
    +    def checkIfInRange(port: Long, ps: List[(Long, Long)]): Boolean = {
    +      ps.exists(r => r._1 <= port & r._2 >= port)
    +    }
    +
    +    val portsToCheck = nonZeroPortValuesFromConfig(conf)
    +    val withinRange = portsToCheck.forall(p => checkIfInRange(p, ports))
    +    // make sure we have enough ports to allocate per offer
    +    ports.map(r => r._2 - r._1 + 1).sum >= portsToCheck.size && withinRange
    +  }
    +
    +  /**
    +   * Partitions port resources.
    +   *
    +   * @param requestedPorts non-zero ports to assign
    +   * @param offeredResources the resources offered
    +   * @return resources left, port resources to be used.
    +   */
    +  def partitionPortResources(requestedPorts: List[Long], offeredResources: List[Resource])
    +    : (List[Resource], List[Resource]) = {
    +    if (requestedPorts.isEmpty) {
    +      (offeredResources, List[Resource]())
    +    }
    +    else {
    --- End diff --

    ok
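For readers following the SPARK-11714 review, the port check quoted in the diff above can be restated outside Spark. The block below is a minimal Python sketch of that logic only, not the PR's code: the names `port_in_ranges` and `check_ports` are hypothetical, ranges are assumed to be inclusive `(start, end)` pairs as in the quoted `checkPorts`, and a port value of 0 is assumed to mean "not configured", as in `nonZeroPortValuesFromConfig`.

```python
from typing import List, Tuple

# Inclusive (start, end) port range, mirroring List[(Long, Long)] in the quoted diff.
PortRange = Tuple[int, int]


def port_in_ranges(port: int, ranges: List[PortRange]) -> bool:
    """True if `port` falls inside at least one offered range (hypothetical helper)."""
    return any(lo <= port <= hi for lo, hi in ranges)


def check_ports(requested: List[int], offered: List[PortRange]) -> bool:
    """Sketch of the quoted check: every non-zero requested port must lie in an
    offered range, and the offer must contain at least as many ports as requested."""
    non_zero = [p for p in requested if p != 0]  # 0 is assumed to mean "unset"
    within_range = all(port_in_ranges(p, offered) for p in non_zero)
    offered_count = sum(hi - lo + 1 for lo, hi in offered)
    return offered_count >= len(non_zero) and within_range
```

With an offer of `[(7000, 8000)]`, a request for port 7077 is accepted and a request for port 9000 is rejected; an empty request is trivially accepted.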
[GitHub] spark issue #14561: [SPARK-16972][CORE] Move DriverEndpoint out of CoarseGra...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14561 Agreed with @jerryshao. @lshmouse could you submit the whole refactoring PR in order to show why this one is necessary? It's better to not refactor stable code paths unless there is a strong reason.
[GitHub] spark pull request #14182: [SPARK-16444][SparkR]: Isotonic Regression wrappe...
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14182#discussion_r74525862

    --- Diff: R/pkg/R/mllib.R ---
    @@ -292,6 +299,85 @@ setMethod("summary", signature(object = "NaiveBayesModel"),
                 return(list(apriori = apriori, tables = tables))
               })
     
    +#' Isotonic Regression Model
    +#'
    +#' Fits an Isotonic Regression model against a Spark DataFrame, similarly to R's isoreg().
    +#' Users can print, make predictions on the produced model and save the model to the input path.
    +#'
    +#' @param data SparkDataFrame for training
    +#' @param formula A symbolic description of the model to be fitted. Currently only a few formula
    +#'                operators are supported, including '~', '.', ':', '+', and '-'.
    +#' @param isotonic Whether the output sequence should be isotonic/increasing (TRUE) or
    +#'                 antitonic/decreasing (FALSE)
    +#' @param featureIndex The index of the feature if \code{featuresCol} is a vector column (default: `0`),
    +#'                     no effect otherwise
    +#' @param weightCol The weight column name.
    +#' @return \code{spark.isoreg} returns a fitted Isotonic Regression model
    +#' @rdname spark.isoreg
    +#' @aliases spark.isoreg,SparkDataFrame,formula-method
    +#' @name spark.isoreg
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' sparkR.session()
    +#' data <- list(list(7.0, 0.0), list(5.0, 1.0), list(3.0, 2.0),
    +#'              list(5.0, 3.0), list(1.0, 4.0))
    +#' df <- createDataFrame(data, c("label", "feature"))
    +#' model <- spark.isoreg(df, label ~ feature, isotonic = FALSE)
    +#' # return model boundaries and prediction as lists
    +#' result <- summary(model, df)
    +#'
    +#' # save fitted model to input path
    +#' path <- "path/to/model"
    +#' write.ml(model, path)
    +#'
    +#' # can also read back the saved model and print
    +#' savedModel <- read.ml(path)
    +#' summary(savedModel)
    +#' }
    +#' @note spark.isoreg since 2.1.0
    +setMethod("spark.isoreg", signature(data = "SparkDataFrame", formula = "formula"),
    +          function(data, formula, isotonic = TRUE, featureIndex = 0, weightCol = NULL) {
    +            formula <- paste0(deparse(formula), collapse = "")
    +
    +            if (is.null(weightCol)) {
    +              weightCol <- ""
    +            }
    +
    +            jobj <- callJStatic("org.apache.spark.ml.r.IsotonicRegressionWrapper", "fit",
    +                                data@sdf, formula, as.logical(isotonic), as.integer(featureIndex),
    +                                as.character(weightCol))
    +            return(new("IsotonicRegressionModel", jobj = jobj))
    +          })
    +
    +# Predicted values based on an isotonicRegression model
    +
    +#' @param object a fitted isotonicRegressionModel
    +#' @param newData SparkDataFrame for testing
    +#' @return \code{predict} returns a SparkDataFrame containing predicted values
    +#' @rdname spark.isoreg
    +#' @export
    +#' @note predict(isotonicRegressionModel) since 2.1.0
    --- End diff --

    capital "IsotonicRegressionModel" since it's a class it needs to match? similarly in L355, 368, 372
[GitHub] spark pull request #14182: [SPARK-16444][SparkR]: Isotonic Regression wrappe...
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14182#discussion_r74525897

    --- Diff: R/pkg/R/mllib.R ---
    @@ -292,6 +299,85 @@ setMethod("summary", signature(object = "NaiveBayesModel"),
                 return(list(apriori = apriori, tables = tables))
               })
     
    +#' Isotonic Regression Model
    +#'
    +#' Fits an Isotonic Regression model against a Spark DataFrame, similarly to R's isoreg().
    +#' Users can print, make predictions on the produced model and save the model to the input path.
    +#'
    +#' @param data SparkDataFrame for training
    +#' @param formula A symbolic description of the model to be fitted. Currently only a few formula
    +#'                operators are supported, including '~', '.', ':', '+', and '-'.
    +#' @param isotonic Whether the output sequence should be isotonic/increasing (TRUE) or
    +#'                 antitonic/decreasing (FALSE)
    +#' @param featureIndex The index of the feature if \code{featuresCol} is a vector column (default: `0`),
    +#'                     no effect otherwise
    +#' @param weightCol The weight column name.
    +#' @return \code{spark.isoreg} returns a fitted Isotonic Regression model
    +#' @rdname spark.isoreg
    +#' @aliases spark.isoreg,SparkDataFrame,formula-method
    +#' @name spark.isoreg
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' sparkR.session()
    +#' data <- list(list(7.0, 0.0), list(5.0, 1.0), list(3.0, 2.0),
    +#'              list(5.0, 3.0), list(1.0, 4.0))
    +#' df <- createDataFrame(data, c("label", "feature"))
    +#' model <- spark.isoreg(df, label ~ feature, isotonic = FALSE)
    +#' # return model boundaries and prediction as lists
    --- End diff --

    could you add an example with `predict`?
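As background for the `isotonic = FALSE` example quoted in the diff above, the following is a small Python sketch of what an antitonic (non-increasing) fit does to the example labels. It uses the textbook pool-adjacent-violators idea on unweighted values that are already ordered by feature; it is an illustration under those assumptions and is not Spark's `IsotonicRegression` implementation or the SparkR wrapper. The function name `pava` is hypothetical.

```python
from typing import List


def pava(values: List[float], increasing: bool = True) -> List[float]:
    """Pool-adjacent-violators on unweighted values ordered by feature.

    For a decreasing (antitonic) fit the values are negated, fitted as
    increasing, and negated back.
    """
    sign = 1.0 if increasing else -1.0
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([sign * v, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted: List[float] = []
    for s, c in blocks:
        fitted.extend([sign * s / c] * c)  # every point in a block gets the block mean
    return fitted


# Labels from the quoted example, ordered by feature 0.0 .. 4.0.
labels = [7.0, 5.0, 3.0, 5.0, 1.0]
print(pava(labels, increasing=False))
```

On the quoted data the only violation of the decreasing order is the pair (3.0, 5.0), which is pooled to its mean, so the sketch prints `[7.0, 5.0, 4.0, 4.0, 1.0]`.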
[GitHub] spark issue #14613: [SPARK-16883][SparkR]:SQL decimal type is not properly c...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14613 @wangmiao1981 Thanks for the PR. Could we add a couple of test cases for this? It'll also help me understand what the expected behavior is -- one of them could be for `collect` with decimals and another one could be for `str` on a Spark DataFrame which contains decimals.
[GitHub] spark issue #14613: [SPARK-16883][SparkR]:SQL decimal type is not properly c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14613 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63648/ Test PASSed.
[GitHub] spark issue #14613: [SPARK-16883][SparkR]:SQL decimal type is not properly c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14613 Merged build finished. Test PASSed.
[GitHub] spark issue #14546: [SPARK-16955][SQL] Using ordinals in ORDER BY and GROUP ...
Github user clockfly commented on the issue: https://github.com/apache/spark/pull/14546 I think a proper fix would be to mark the ordinal as unresolved; the ordinal can exist in a GROUP BY or ORDER BY expression. Then we can make sure that ResolveAggregateFunctions and the other analyzer rules don't assume the ordinals are resolved and do premature analysis.
[GitHub] spark issue #14392: [SPARK-16446] [SparkR] [ML] Gaussian Mixture Model wrapp...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14392 btw, I think it'd be great to get some feedback on the naming of this. As per SPARK-14831, should we go with a more Spark-specific name like `gaussianmixture` rather than an R one? How well known is `mvnormalmixEM`? How close is the Spark implementation to that?
[GitHub] spark issue #14546: [SPARK-16955][SQL] Using ordinals in ORDER BY and GROUP ...
Github user clockfly commented on the issue: https://github.com/apache/spark/pull/14546

I think the root cause is that the Aggregate operator is treated as resolved even if it has GROUP BY ordinals. For example:
```
'Filter ('a > 0)
+- Aggregate [2], [count(1) AS count(1)#83L, a#81]
   +- SubqueryAlias tmp
      +- Project [1 AS a#81]
         +- OneRowRelation$
```
Aggregate is treated as resolved even though it has the GROUP BY ordinal "2". The analyzer then tries to resolve the `Filter` by putting it into the Aggregate as an aggregation expression:
```
!'Aggregate [2], [('a > 0) AS havingCondition#84]
+- SubqueryAlias tmp
   +- Project [1 AS a#81]
      +- OneRowRelation$
```
This plan is already wrong: we are asking for ordinal "2", but there is only one aggregation expression, `[('a > 0) AS havingCondition#84]`.
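The mismatch described above can be sketched outside Spark. The following is a minimal Python illustration, not the analyzer's code: the function name `resolve_group_by_ordinal` and the list-of-strings model of an aggregate list are hypothetical, and the error text simply mirrors the AnalysisException wording quoted later in this thread.

```python
def resolve_group_by_ordinal(ordinal: int, select_list: list[str]) -> str:
    """Return the select-list expression named by a 1-based GROUP BY ordinal.

    Hypothetical helper for illustration only; it is not a Spark API.
    """
    if not 1 <= ordinal <= len(select_list):
        raise ValueError(
            f"GROUP BY position {ordinal} is not in select list "
            f"(valid range is [1, {len(select_list)}])"
        )
    return select_list[ordinal - 1]


# Aggregate list of the original plan: ordinal 2 names column `a`.
original = ["count(1) AS count(1)#83L", "a#81"]
# Aggregate list after the Filter condition became the only expression.
rewritten = ["('a > 0) AS havingCondition#84"]

resolved = resolve_group_by_ordinal(2, original)

try:
    resolve_group_by_ordinal(2, rewritten)
    error = None
except ValueError as exc:
    error = str(exc)
```

Resolving the ordinal against the original two-element list succeeds, while resolving it against the rewritten one-element list fails the range check. That is why the ordinal has to be substituted before any rule rewrites the aggregate list.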
[GitHub] spark issue #14609: [MINOR][Core] fix warnings on depreciated methods in Mes...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14609 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63638/ Test FAILed.
[GitHub] spark pull request #14615: make toJSON not go through rdd form but operate o...
GitHub user robert3005 opened a pull request: https://github.com/apache/spark/pull/14615

make toJSON not go through rdd form but operate on dataset always

## What changes were proposed in this pull request?

Don't convert toRdd when doing toJSON

## How was this patch tested?

Existing unit tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/robert3005/spark robertk/correct-tojson

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14615.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14615

commit 98086f4fdf0d7464bed0bb4f23c3694da828e222
Author: Robert Kruszewski
Date: 2016-08-11T19:26:21Z
make toJSON not go through rdd form but operate on dataset always
[GitHub] spark issue #14609: [MINOR][Core] fix warnings on depreciated methods in Mes...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14609 Merged build finished. Test FAILed.
[GitHub] spark issue #14571: [SPARK-16983][SQL] Add `prettyName` for row_number, dens...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14571 I see. Then, I'll include only that today.
[GitHub] spark issue #12930: [SPARK-15153] [ML] [SparkR] Fix SparkR spark.naiveBayes ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/12930 Do we still need this?
[GitHub] spark issue #14546: [SPARK-16955][SQL] Using ordinals in ORDER BY and GROUP ...
Github user clockfly commented on the issue: https://github.com/apache/spark/pull/14546

@dongjoon-hyun The exception was muted by this line: https://github.com/apache/spark/pull/14546/files#diff-57b3d87be744b7d79a9beacf8e5e5eb2R1257

If you add a log message, you will find that it still throws an exception like:
```
org.apache.spark.sql.AnalysisException: GROUP BY position 2 is not in select list (valid range is [1, 1]); line 1 pos 53 ...
```
[GitHub] spark issue #14586: [SPARK-17003] [BUILD] [BRANCH-1.6] release-build.sh is m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14586 **[Test build #3222 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3222/consoleFull)** for PR 14586 at commit [`a785c01`](https://github.com/apache/spark/commit/a785c0190bf093f1e6deb0c46b5dbc89bc307603).
[GitHub] spark issue #14611: [SPARK-17028][Repl]Backport SI-9734 for Scala 2.10
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14611 Closing this one. I just found another issue with the current implementation and will reopen it in the future.
[GitHub] spark pull request #14611: [SPARK-17028][Repl]Backport SI-9734 for Scala 2.1...
Github user zsxwing closed the pull request at: https://github.com/apache/spark/pull/14611
[GitHub] spark issue #14397: [SPARK-16771][SQL] WITH clause should not fall into infi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14397 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63637/ Test PASSed.
[GitHub] spark issue #14613: [SPARK-16883][SparkR]:SQL decimal type is not properly c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14613 **[Test build #63648 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63648/consoleFull)** for PR 14613 at commit [`e95f557`](https://github.com/apache/spark/commit/e95f5575018d15782917b9b3d679b4f6da345ee6).
[GitHub] spark issue #14586: [SPARK-17003] [BUILD] [BRANCH-1.6] release-build.sh is m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14586 Merged build finished. Test PASSed.
[GitHub] spark issue #14614: [SPARK-17027][ML] Avoid integer overflow in PolynomialEx...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14614 **[Test build #63649 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63649/consoleFull)** for PR 14614 at commit [`47170b8`](https://github.com/apache/spark/commit/47170b80cad68baf073fff54f5505124508267fd).
[GitHub] spark pull request #14613: [SPARK-16883][SparkR]:SQL decimal type is not pro...
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/14613

[SPARK-16883][SparkR]:SQL decimal type is not properly cast to number when collecting SparkDataFrame

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

    registerTempTable(createDataFrame(iris), "iris")
    str(collect(sql("select cast('1' as double) as x, cast('2' as decimal) as y from iris limit 5")))
    'data.frame': 5 obs. of 2 variables:
     $ x: num 1 1 1 1 1
     $ y:List of 5
      ..$ : num 2
      ..$ : num 2
      ..$ : num 2
      ..$ : num 2
      ..$ : num 2

The problem is that Spark returns the column type `decimal(10, 0)` instead of `decimal`, and `decimal(10, 0)` is not handled correctly. It should be handled as "double".

As discussed in the JIRA thread, there are two potential fixes:

1). Scala side fix: add a new case when writing the object back. However, I can't use spark.sql.types._ in Spark core due to dependency issues, and I haven't found a way to do the type case match.

2). SparkR side fix: add a helper function that checks for special types like `"decimal(10, 0)"` and replaces them with `double`, which is a PRIMITIVE type. This helper is generic, so handling for new types can be added in the future.

I opened this PR to discuss the pros and cons of both approaches. If we want to do the Scala side fix, we need to find a way to match the case of DecimalType and StructType in Spark Core.

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

Manual test:

    > str(collect(sql("select cast('1' as double) as x, cast('2' as decimal) as y from iris limit 5")))
    'data.frame': 5 obs. of 2 variables:
     $ x: num 1 1 1 1 1
     $ y: num 2 2 2 2 2

R Unit tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangmiao1981/spark type

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14613.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14613

commit e95f5575018d15782917b9b3d679b4f6da345ee6
Author: wm...@hotmail.com
Date: 2016-08-11T23:15:15Z
add a type check helper
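The SparkR side fix proposed in this PR amounts to normalizing a type string before looking it up among the primitive types. The sketch below shows that idea in Python rather than R, purely as an illustration: `normalize_col_type` and the regular expression are assumptions made for this example and are not the helper added by the PR.

```python
import re

# Matches "decimal" with an optional "(precision, scale)" suffix,
# e.g. "decimal" or "decimal(10, 0)". Assumed pattern, for illustration.
_DECIMAL = re.compile(r"^decimal(\(\s*\d+\s*,\s*\d+\s*\))?$")


def normalize_col_type(col_type: str) -> str:
    """Map parameterized decimal type strings to "double".

    Hypothetical helper; any other type string is returned unchanged.
    """
    if _DECIMAL.match(col_type.strip().lower()):
        return "double"
    return col_type
```

With a mapping of this kind, a column reported as `decimal(10, 0)` would be collected through the same path as a `double` column instead of falling through to a list column. The design choice is that new special cases can be added in one place without touching the collection logic.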
[GitHub] spark pull request #14614: [SPARK-17027][ML] Avoid integer overflow in Polyn...
GitHub user zero323 opened a pull request: https://github.com/apache/spark/pull/14614

[SPARK-17027][ML] Avoid integer overflow in PolynomialExpansion.getPolySize

## What changes were proposed in this pull request?

Replaces custom choose function with o.a.commons.math3.CombinatoricsUtils.binomialCoefficient

## How was this patch tested?

Spark unit tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zero323/spark SPARK-17027

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14614.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14614

commit 47170b80cad68baf073fff54f5505124508267fd
Author: zero323
Date: 2016-08-11T17:44:32Z
Replace PolynomialExpansion.choose with CombinatoricsUtils.binomialCoefficient
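To show why a hand-rolled choose function can overflow, here is a small Python sketch. It assumes a naive "multiply the numerator, multiply the denominator, then divide" implementation on 32-bit signed integers; `wrap_int32` and `naive_choose_int32` are hypothetical names that emulate JVM `Int` arithmetic, and `math.comb` stands in for an exact binomial coefficient. It does not reproduce Spark's actual code.

```python
from math import comb


def wrap_int32(x: int) -> int:
    """Reduce x to a signed 32-bit value, mimicking JVM Int overflow."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def naive_choose_int32(n: int, k: int) -> int:
    """Compute n-choose-k as (n * ... * (n-k+1)) / k! in 32-bit arithmetic."""
    num = 1
    for i in range(n, n - k, -1):
        num = wrap_int32(num * i)  # silently wraps once the product is too large
    den = 1
    for i in range(k, 1, -1):
        den = wrap_int32(den * i)
    # JVM integer division truncates toward zero.
    # A denominator that wraps to 0 would raise ZeroDivisionError here.
    q = abs(num) // abs(den)
    return q if (num < 0) == (den < 0) else -q


small_ok = naive_choose_int32(5, 2) == comb(5, 2)
large_ok = naive_choose_int32(20, 10) == comb(20, 10)
```

For small inputs both versions agree, but for n = 20 and k = 10 the intermediate numerator no longer fits in 32 bits, so the naive result is wrong even though the true coefficient itself would fit. An exact or overflow-checked binomial coefficient avoids this silent corruption.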
[GitHub] spark issue #14571: [SPARK-16983][SQL] Add `prettyName` for row_number, dens...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14571 Oh, I think you are working on that transition somewhere else. BTW, what about the other tests? If you have a plan to enrich SQLQueryTestSuite, I would prefer to do all of them in a single PR this weekend. What do you think about making a single JIRA for that transition?
[GitHub] spark issue #14586: [SPARK-17003] [BUILD] [BRANCH-1.6] release-build.sh is m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14586 **[Test build #63647 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63647/consoleFull)** for PR 14586 at commit [`a785c01`](https://github.com/apache/spark/commit/a785c0190bf093f1e6deb0c46b5dbc89bc307603).
[GitHub] spark issue #14586: [SPARK-17003] [BUILD] [BRANCH-1.6] release-build.sh is m...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/14586 Done
[GitHub] spark issue #14571: [SPARK-16983][SQL] Add `prettyName` for row_number, dens...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14571 Do you think you can create a test file for window functions in the new SQLQueryTestSuite along with this fix?
[GitHub] spark issue #14612: [SPARK-16803] [SQL] SaveAsTable does not work when sourc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14612 **[Test build #63645 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63645/consoleFull)** for PR 14612 at commit [`71399f1`](https://github.com/apache/spark/commit/71399f1ca1e91af2a7d2a12c92d32bb691031c86).
[GitHub] spark issue #14607: [SPARK-16905] SQL DDL: MSCK REPAIR TABLE (follow-up)
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14607 **[Test build #63646 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63646/consoleFull)** for PR 14607 at commit [`c442b75`](https://github.com/apache/spark/commit/c442b758e8bf0fc1affd1daa08381d458c7a71a4).
[GitHub] spark issue #14607: [SPARK-16905] SQL DDL: MSCK REPAIR TABLE (follow-up)
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14607 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63636/ Test PASSed.
[GitHub] spark issue #14607: [SPARK-16905] SQL DDL: MSCK REPAIR TABLE (follow-up)
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14607 Merged build finished. Test PASSed.
[GitHub] spark issue #13146: [SPARK-13081][PYSPARK][SPARK_SUBMIT]. Allow set pythonEx...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13146 Merged build finished. Test PASSed.
[GitHub] spark issue #13146: [SPARK-13081][PYSPARK][SPARK_SUBMIT]. Allow set pythonEx...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13146 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63632/ Test PASSed.