[GitHub] spark issue #16595: [Minor][YARN] Move YarnSchedulerBackendSuite to resource...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16595

**[Test build #71427 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71427/testReport)** for PR 16595 at commit [`9301974`](https://github.com/apache/spark/commit/93019741bb94d955fc24e5b06d1dd1aa95672f70).
[GitHub] spark issue #16592: [SPARK-19235] [SQL] [TESTS] Enable Test Cases in DDLSuit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16592

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71420/
Test PASSed.
[GitHub] spark issue #16592: [SPARK-19235] [SQL] [TESTS] Enable Test Cases in DDLSuit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16592

Merged build finished. Test PASSed.
[GitHub] spark issue #16592: [SPARK-19235] [SQL] [TESTS] Enable Test Cases in DDLSuit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16592

**[Test build #71420 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71420/testReport)** for PR 16592 at commit [`0133463`](https://github.com/apache/spark/commit/01334635c5433f0515beb92660b79796c97677d5).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class InMemoryCatalogedDDLSuite extends DDLSuite with SharedSQLContext with BeforeAndAfterEach`
  * `abstract class DDLSuite extends QueryTest with SQLTestUtils`
  * `class HiveCatalogedDDLSuite extends DDLSuite with TestHiveSingleton with BeforeAndAfterEach`
[GitHub] spark pull request #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16566#discussion_r9617

--- Diff: R/pkg/R/mllib_clustering.R ---
@@ -38,6 +45,146 @@ setClass("KMeansModel", representation(jobj = "jobj"))
 #' @note LDAModel since 2.1.0
 setClass("LDAModel", representation(jobj = "jobj"))

+#' Bisecting K-Means Clustering Model
+#'
+#' Fits a bisecting k-means clustering model against a Spark DataFrame.
+#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make
+#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#'
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. Currently only a few formula
+#'                operators are supported, including '~', '.', ':', '+', and '-'.
+#'                Note that the response variable of formula is empty in spark.bisectingKmeans.
+#' @param k the desired number of leaf clusters. Must be > 1.
+#'          The actual number could be smaller if there are no divisible leaf clusters.
+#' @param maxIter maximum iteration number.
+#' @param minDivisibleClusterSize The minimum number of points (if greater than or equal to 1.0)
+#'                                or the minimum proportion of points (if less than 1.0) of a
+#'                                divisible cluster.
+#' @param seed the random seed.
+#' @param ... additional argument(s) passed to the method.
+#' @return \code{spark.bisectingKmeans} returns a fitted bisecting k-means model.
+#' @rdname spark.bisectingKmeans
+#' @aliases spark.bisectingKmeans,SparkDataFrame,formula-method
+#' @name spark.bisectingKmeans
+#' @export
+#' @examples
+#' \dontrun{
+#' sparkR.session()
+#' data(iris)
+#' df <- createDataFrame(iris)
+#' model <- spark.bisectingKmeans(df, Sepal_Length ~ Sepal_Width, k = 4)
+#' summary(model)
+#'
+#' # fitted values on training data
+#' fitted <- predict(model, df)
+#' head(select(fitted, "Sepal_Length", "prediction"))
+#'
+#' # save fitted model to input path
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#'
+#' # can also read back the saved model and print
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#' }
+#' @note spark.bisectingKmeans since 2.2.0
+#' @seealso \link{predict}, \link{read.ml}, \link{write.ml}
+setMethod("spark.bisectingKmeans", signature(data = "SparkDataFrame", formula = "formula"),
+          function(data, formula, k = 4, maxIter = 20, minDivisibleClusterSize = 1.0, seed = NULL) {
+            formula <- paste0(deparse(formula), collapse = "")
+            if (!is.null(seed)) {
+              seed <- as.character(as.integer(seed))
+            }
+            jobj <- callJStatic("org.apache.spark.ml.r.BisectingKMeansWrapper", "fit",
+                                data@sdf, formula, as.integer(k), as.integer(maxIter),
+                                as.numeric(minDivisibleClusterSize), seed)
+            new("BisectingKMeansModel", jobj = jobj)
+          })
+
+# Get the summary of a bisecting k-means model
+
+#' @param object a fitted bisecting k-means model.
+#' @return \code{summary} returns summary information of the fitted model, which is a list.
+#'         The list includes the model's \code{k} (number of cluster centers),
+#'         \code{coefficients} (model cluster centers),
+#'         \code{size} (number of data points in each cluster), and \code{cluster}
+#'         (cluster centers of the transformed data).
+#' @rdname spark.bisectingKmeans
+#' @export
+#' @note summary(BisectingKMeansModel) since 2.2.0
+setMethod("summary", signature(object = "BisectingKMeansModel"),
+          function(object) {
+            jobj <- object@jobj
+            is.loaded <- callJMethod(jobj, "isLoaded")
+            features <- callJMethod(jobj, "features")
+            coefficients <- callJMethod(jobj, "coefficients")
+            k <- callJMethod(jobj, "k")
+            size <- callJMethod(jobj, "size")
+            coefficients <- t(matrix(coefficients, ncol = k))
+            colnames(coefficients) <- unlist(features)
+            rownames(coefficients) <- 1:k
+            cluster <- if (is.loaded) {
+              NULL
+            } else {
+              dataFrame(callJMethod(jobj, "cluster"))
+            }
+            list(k = k, coefficients = coefficients, size = size,
+                 cluster = cluster, is.loaded = is.loaded)
+          })
+
+# Predicted values based on a bisecting k-means model
+
+#' @param newData a SparkDataFrame for testing.
+#' @return \code{predict} returns the predicted values
[GitHub] spark pull request #16595: [Minor][YARN] Move YarnSchedulerBackendSuite to r...
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/16595

[Minor][YARN] Move YarnSchedulerBackendSuite to resource-managers/yarn directory.

## What changes were proposed in this pull request?

#16092 moved the YARN resource manager related code to the resource-managers/yarn directory. The test case `YarnSchedulerBackendSuite` was added after that, but in the wrong place. This PR moves it to the correct directory.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark yarn

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16595.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16595

----
commit 93019741bb94d955fc24e5b06d1dd1aa95672f70
Author: Yanbo Liang
Date: 2017-01-16T07:46:26Z

    Move YarnSchedulerBackendSuite to resource-managers/yarn directory.
[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16561

LGTM
[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16561

**[Test build #71426 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71426/testReport)** for PR 16561 at commit [`21e63f8`](https://github.com/apache/spark/commit/21e63f8eb0540ff26c16804bffb222123a97c1c8).
[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16561#discussion_r96175295

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -89,6 +89,25 @@ object Cast {
     case _ => false
   }

+  /**
+   * Return false iff we may truncate during casting `from` type to `to` type. e.g. long -> int,
+   * timestamp -> date.
+   */
+  def canUpCast(from: DataType, to: DataType): Boolean = (from, to) match {
--- End diff --

How about `def mayTruncate`? `canUpCast` is not accurate; we may not be able to cast even if `canUpCast` returns true.
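A minimal sketch of the semantics under discussion, using a toy type hierarchy rather than Spark's actual `DataType` classes or its real rule set: renaming the check to `mayTruncate` frames it as "could this cast lose information?", which is what the body actually tests.

```scala
// Toy stand-ins for Spark's DataType hierarchy (assumption: not Spark code).
sealed trait DataType
case object IntType extends DataType
case object LongType extends DataType
case object DateType extends DataType
case object TimestampType extends DataType

object CastCheck {
  // true iff casting `from` to `to` may lose information (truncate)
  def mayTruncate(from: DataType, to: DataType): Boolean = (from, to) match {
    case (LongType, IntType)       => true  // 64-bit -> 32-bit narrows the range
    case (TimestampType, DateType) => true  // drops the time-of-day component
    case _                         => false
  }
}

object Demo extends App {
  assert(CastCheck.mayTruncate(LongType, IntType))
  assert(!CastCheck.mayTruncate(IntType, LongType)) // widening is safe
}
```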
[GitHub] spark pull request #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work f...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16474
[GitHub] spark issue #16387: [SPARK-18986][Core] ExternalAppendOnlyMap shouldn't fail...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16387

**[Test build #71425 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71425/testReport)** for PR 16387 at commit [`b1ef9ec`](https://github.com/apache/spark/commit/b1ef9ec749737125d833cd3a64922b4a9f8c32f1).
[GitHub] spark issue #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work for Parq...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16474

thanks, merging to master/2.1!
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16594

**[Test build #71424 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71424/testReport)** for PR 16594 at commit [`c3489fc`](https://github.com/apache/spark/commit/c3489fcad32caa1d6a9b7182e387a46aae5710fa).
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16594

cc @rxin @cloud-fan
[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain
GitHub user wzhfy opened a pull request: https://github.com/apache/spark/pull/16594

[SPARK-17078] [SQL] Show stats when explain

## What changes were proposed in this pull request?

Currently we can only check the estimated stats in logical plans by debugging. We need to provide an easier and more efficient way for developers/users. In this PR, we add an internal conf; when it is true, the stats can be checked through the explain extended command.

## How was this patch tested?

Added a test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wzhfy/spark showStats

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16594.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16594

----
commit c3489fcad32caa1d6a9b7182e387a46aae5710fa
Author: wangzhenhua
Date: 2017-01-16T07:24:23Z

    show stats in explain command
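A hedged usage sketch of what the PR describes. The conf key `spark.sql.explain.stats.enabled` below is a placeholder (the PR text only says "an internal conf"), and the exact output format is whatever the patch prints:

```scala
import org.apache.spark.sql.SparkSession

object ExplainStatsDemo extends App {
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("explain-stats")
    .getOrCreate()
  import spark.implicits._

  Seq((1, "a"), (2, "b")).toDF("id", "name").createOrReplaceTempView("t")

  // Hypothetical conf name; the PR only says the flag is internal.
  spark.conf.set("spark.sql.explain.stats.enabled", "true")

  // With the flag on, the extended plan would include per-operator estimated stats.
  spark.sql("EXPLAIN EXTENDED SELECT * FROM t WHERE id > 1").show(truncate = false)
}
```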
[GitHub] spark issue #16387: [SPARK-18986][Core] ExternalAppendOnlyMap shouldn't fail...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16387

@samkum Thanks for testing this. I think it is because every time `forceSpill` is called now, it spills the map unconditionally. I will add a check so that we only spill the map when it is not empty.
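A minimal sketch of the guard being described, not Spark's actual `ExternalAppendOnlyMap` code: `forceSpill` becomes a no-op when the in-memory map holds no entries, so an empty map never produces a spill.

```scala
// Toy spillable map (assumption: names and structure are illustrative only).
class SpillableMap[K, V] {
  private val currentMap = scala.collection.mutable.Map.empty[K, V]

  def insert(k: K, v: V): Unit = currentMap.put(k, v)

  /** Returns true if anything was actually spilled. */
  def forceSpill(): Boolean = {
    if (currentMap.isEmpty) {
      false // nothing to spill; previously this path spilled unconditionally
    } else {
      spill(currentMap)
      currentMap.clear()
      true
    }
  }

  private def spill(map: scala.collection.Map[K, V]): Unit = {
    // write entries to disk here; elided in this sketch
  }
}
```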
[GitHub] spark issue #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work for Parq...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16474

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71417/
Test PASSed.
[GitHub] spark issue #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work for Parq...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16474

Merged build finished. Test PASSed.
[GitHub] spark issue #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work for Parq...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16474

**[Test build #71417 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71417/testReport)** for PR 16474 at commit [`261e1b5`](https://github.com/apache/spark/commit/261e1b5f295ca35ed2635c75aa9f1b91d8805bd7).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16308#discussion_r96171888

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala ---
@@ -316,30 +329,43 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
     val zts = sd + " 00:00:00"
     val sts = sd + " 00:00:02"
     val nts = sts + ".1"
-    val ts = Timestamp.valueOf(nts)
-
-    var c = Calendar.getInstance()
-    c.set(2015, 2, 8, 2, 30, 0)
-    checkEvaluation(cast(cast(new Timestamp(c.getTimeInMillis), StringType), TimestampType),
-      c.getTimeInMillis * 1000)
-    c = Calendar.getInstance()
-    c.set(2015, 10, 1, 2, 30, 0)
-    checkEvaluation(cast(cast(new Timestamp(c.getTimeInMillis), StringType), TimestampType),
-      c.getTimeInMillis * 1000)
+    val ts = withDefaultTimeZone(TimeZoneGMT)(Timestamp.valueOf(nts))
+
+    for (tz <- ALL_TIMEZONES) {
+      val timeZoneId = Option(tz.getID)
--- End diff --

when will `timeZoneId` be None here?
[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16308#discussion_r96171901

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala ---
@@ -41,13 +46,18 @@ object ReplaceExpressions extends Rule[LogicalPlan] {
  */
 object ComputeCurrentTime extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = {
-    val dateExpr = CurrentDate()
+    val currentDates = mutable.Map.empty[String, Literal]
     val timeExpr = CurrentTimestamp()
-    val currentDate = Literal.create(dateExpr.eval(EmptyRow), dateExpr.dataType)
-    val currentTime = Literal.create(timeExpr.eval(EmptyRow), timeExpr.dataType)
+    val timestamp = timeExpr.eval(EmptyRow).asInstanceOf[Long]
+    val currentTime = Literal.create(timestamp, timeExpr.dataType)

     plan transformAllExpressions {
-      case CurrentDate() => currentDate
+      case CurrentDate(Some(timeZoneId)) =>
+        currentDates.getOrElseUpdate(timeZoneId, {
+          Literal.create(
+            DateTimeUtils.millisToDays(timestamp / 1000L, TimeZone.getTimeZone(timeZoneId)),
+            DateType)
+        })
       case CurrentTimestamp() => currentTime
--- End diff --

timestamp is an absolute value -- timezone only matters when converting a timestamp into a displayable value (string) or date.
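This point can be illustrated with plain `java.time`, independent of Spark: the same absolute instant maps to different local dates (and display strings) depending on the zone, which is why `CurrentDate` needs a time zone while `CurrentTimestamp` does not.

```scala
import java.time.{Instant, ZoneId}

object TimestampDemo extends App {
  // One absolute point on the timeline: 2017-01-16T07:00:00Z.
  val ts = Instant.ofEpochMilli(1484550000000L)

  // Same instant, different local dates once a zone is applied:
  println(ts.atZone(ZoneId.of("UTC")).toLocalDate)                 // 2017-01-16
  println(ts.atZone(ZoneId.of("America/Los_Angeles")).toLocalDate) // 2017-01-15
}
```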
[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16308#discussion_r96171760

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala ---
@@ -32,20 +34,20 @@ import org.apache.spark.unsafe.types.UTF8String
  */
 class CastSuite extends SparkFunSuite with ExpressionEvalHelper {

-  private def cast(v: Any, targetType: DataType): Cast = {
+  private def cast(v: Any, targetType: DataType, timeZoneId: Option[String] = None): Cast = {
     v match {
-      case lit: Expression => Cast(lit, targetType)
-      case _ => Cast(Literal(v), targetType)
+      case lit: Expression => Cast(lit, targetType, timeZoneId)
+      case _ => Cast(Literal(v), targetType, timeZoneId)
     }
   }

   // expected cannot be null
-  private def checkCast(v: Any, expected: Any): Unit = {
-    checkEvaluation(cast(v, Literal(expected).dataType), expected)
+  private def checkCast(v: Any, expected: Any, timeZoneId: Option[String] = None): Unit = {
--- End diff --

where do you call this method and set the `timeZoneId` parameter?
[GitHub] spark issue #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable work with ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16593

**[Test build #71423 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71423/testReport)** for PR 16593 at commit [`7c09a7c`](https://github.com/apache/spark/commit/7c09a7ca1b948368cf67505e8bd19d0ae6e6142b).
[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16308#discussion_r96171393

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
@@ -195,19 +231,26 @@ case class Hour(child: Expression) extends UnaryExpression with ImplicitCastInpu
       > SELECT _FUNC_('2009-07-30 12:58:59');
        58
   """)
-case class Minute(child: Expression) extends UnaryExpression with ImplicitCastInputTypes {
+case class Minute(child: Expression, timeZoneId: Option[String] = None)
--- End diff --

Logically `Minute`/`Second` are not timezone-aware, right?
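`Minute` does depend on the time zone in practice, because some zone offsets are not whole hours. A quick check with plain `java.time` (Asia/Kolkata is UTC+05:30):

```scala
import java.time.{Instant, ZoneId}

object MinuteDemo extends App {
  val ts = Instant.parse("2009-07-30T12:58:59Z")

  println(ts.atZone(ZoneId.of("UTC")).getMinute)          // 58
  println(ts.atZone(ZoneId.of("Asia/Kolkata")).getMinute) // 28, shifted by the :30 offset
}
```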
[GitHub] spark issue #16591: [SPARK-19227][CORE] remove unused imports and outdated c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16591

**[Test build #71422 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71422/testReport)** for PR 16591 at commit [`e98d9ab`](https://github.com/apache/spark/commit/e98d9abdb6b4073f8d75beee919081e1a4baf1dc).
[GitHub] spark issue #16591: [SPARK-19227][CORE] remove unused imports and outdated c...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16591

This work does not change any behavior; it just deletes unused imports and fixes some code style issues. cc @srowen
[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16308#discussion_r96171142

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala ---
@@ -41,13 +46,18 @@ object ReplaceExpressions extends Rule[LogicalPlan] {
  */
 object ComputeCurrentTime extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = {
-    val dateExpr = CurrentDate()
+    val currentDates = mutable.Map.empty[String, Literal]
     val timeExpr = CurrentTimestamp()
-    val currentDate = Literal.create(dateExpr.eval(EmptyRow), dateExpr.dataType)
-    val currentTime = Literal.create(timeExpr.eval(EmptyRow), timeExpr.dataType)
+    val timestamp = timeExpr.eval(EmptyRow).asInstanceOf[Long]
+    val currentTime = Literal.create(timestamp, timeExpr.dataType)

     plan transformAllExpressions {
-      case CurrentDate() => currentDate
+      case CurrentDate(Some(timeZoneId)) =>
+        currentDates.getOrElseUpdate(timeZoneId, {
+          Literal.create(
+            DateTimeUtils.millisToDays(timestamp / 1000L, TimeZone.getTimeZone(timeZoneId)),
+            DateType)
+        })
       case CurrentTimestamp() => currentTime
--- End diff --

why is `CurrentTimestamp` not timezone-aware?
[GitHub] spark issue #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable work with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16593

Merged build finished. Test FAILed.
[GitHub] spark issue #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable work with ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16593

**[Test build #71421 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71421/testReport)** for PR 16593 at commit [`6c31d01`](https://github.com/apache/spark/commit/6c31d017324b3c7f310103d2d4b5138bbef4b463).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable work with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16593

Test FAILed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71421/
Test FAILed.
[GitHub] spark issue #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable work with ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16593

**[Test build #71421 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71421/testReport)** for PR 16593 at commit [`6c31d01`](https://github.com/apache/spark/commit/6c31d017324b3c7f310103d2d4b5138bbef4b463).
[GitHub] spark pull request #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable wor...
GitHub user windpiger opened a pull request: https://github.com/apache/spark/pull/16593

[SPARK-19153][SQL] DataFrameWriter.saveAsTable work with create partitioned table

## What changes were proposed in this pull request?

After [SPARK-19107](https://issues.apache.org/jira/browse/SPARK-19107), we can now treat Hive as a data source and create Hive tables with DataFrameWriter and Catalog. However, the support is not complete; there are still some cases we do not support. This PR makes DataFrameWriter.saveAsTable work with the Hive format to create partitioned tables.

## How was this patch tested?

Unit test added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/windpiger/spark saveAsTableWithPartitionedTable

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16593.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16593

----
commit 6c31d017324b3c7f310103d2d4b5138bbef4b463
Author: windpiger
Date: 2017-01-16T06:23:09Z

    [SPARK-19153][SQL] DataFrameWriter.saveAsTable work with create partitioned table
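A hedged sketch of the usage this PR targets (assuming a Hive-enabled Spark build; the table and column names are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession

object SaveAsTableDemo extends App {
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("save-partitioned")
    .enableHiveSupport() // requires Hive classes on the classpath
    .getOrCreate()
  import spark.implicits._

  // Write a Hive-format table partitioned by `year` -- the case this PR adds.
  Seq((1, "a", "2017"), (2, "b", "2018")).toDF("id", "name", "year")
    .write
    .format("hive")
    .partitionBy("year")
    .saveAsTable("t_part")
}
```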
[GitHub] spark issue #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work for Parq...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16474

LGTM
[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16561#discussion_r96168755

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
@@ -28,22 +28,56 @@ import org.apache.spark.sql.catalyst.rules.Rule
  */

 /**
- * Make sure that a view's child plan produces the view's output attributes. We wrap the child
- * with a Project and add an alias for each output attribute. The attributes are resolved by
- * name. This should be only done after the batch of Resolution, because the view attributes are
- * not completely resolved during the batch of Resolution.
+ * Make sure that a view's child plan produces the view's output attributes. We try to wrap the
+ * child by:
+ * 1. Generate the `queryOutput` by:
+ *    1.1. If the query column names are defined, map the column names to attributes in the child
+ *         output by name;
+ *    1.2. Else set the child output attributes to `queryOutput`.
+ * 2. Map the `queryQutput` to view output by index, if the corresponding attributes don't match,
+ *    try to up cast and alias the attribute in `queryOutput` to the attribute in the view output.
+ * 3. Add a Project over the child, with the new output generated by the previous steps.
+ * If the view output doesn't have the same number of columns neither with the child output, nor
+ * with the query column names, throw an AnalysisException.
+ *
+ * This should be only done after the batch of Resolution, because the view attributes are not
+ * completely resolved during the batch of Resolution.
  */
 case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-    case v @ View(_, output, child) if child.resolved =>
+    case v @ View(desc, output, child) if child.resolved =>
       val resolver = conf.resolver
-      val newOutput = output.map { attr =>
-        val originAttr = findAttributeByName(attr.name, child.output, resolver)
-        // The dataType of the output attributes may be not the same with that of the view output,
-        // so we should cast the attribute to the dataType of the view output attribute. If the
-        // cast can't perform, will throw an AnalysisException.
-        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
-          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
+      val queryColumnNames = desc.viewQueryColumnNames
+      // If the view output doesn't have the same number of columns with the child output and the
+      // query column names, throw an AnalysisException.
+      if (output.length != child.output.length && output.length != queryColumnNames.length) {
+        throw new AnalysisException(
+          s"The view output ${output.mkString("[", ",", "]")} doesn't have the same number of " +
+            s"columns with the child output ${child.output.mkString("[", ",", "]")}")
+      }
+      // If the child output is the same with the view output, we don't need to generate the query
+      // output again.
+      val queryOutput = if (queryColumnNames.nonEmpty && output != child.output) {
--- End diff --

Oh I think that's better!
[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16464

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71418/
Test PASSed.
[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16464

Merged build finished. Test PASSed.
[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16464

**[Test build #71418 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71418/testReport)** for PR 16464 at commit [`e133ee6`](https://github.com/apache/spark/commit/e133ee64961beaf10b7885ece76ded021ae5).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16561#discussion_r96168453

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
@@ -28,22 +28,56 @@ import org.apache.spark.sql.catalyst.rules.Rule
  */

 /**
- * Make sure that a view's child plan produces the view's output attributes. We wrap the child
- * with a Project and add an alias for each output attribute. The attributes are resolved by
- * name. This should be only done after the batch of Resolution, because the view attributes are
- * not completely resolved during the batch of Resolution.
+ * Make sure that a view's child plan produces the view's output attributes. We try to wrap the
+ * child by:
+ * 1. Generate the `queryOutput` by:
+ *    1.1. If the query column names are defined, map the column names to attributes in the child
+ *         output by name;
+ *    1.2. Else set the child output attributes to `queryOutput`.
+ * 2. Map the `queryQutput` to view output by index, if the corresponding attributes don't match,
+ *    try to up cast and alias the attribute in `queryOutput` to the attribute in the view output.
+ * 3. Add a Project over the child, with the new output generated by the previous steps.
+ * If the view output doesn't have the same number of columns neither with the child output, nor
+ * with the query column names, throw an AnalysisException.
+ *
+ * This should be only done after the batch of Resolution, because the view attributes are not
+ * completely resolved during the batch of Resolution.
  */
 case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-    case v @ View(_, output, child) if child.resolved =>
+    case v @ View(desc, output, child) if child.resolved =>
       val resolver = conf.resolver
-      val newOutput = output.map { attr =>
-        val originAttr = findAttributeByName(attr.name, child.output, resolver)
-        // The dataType of the output attributes may be not the same with that of the view output,
-        // so we should cast the attribute to the dataType of the view output attribute. If the
-        // cast can't perform, will throw an AnalysisException.
-        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
-          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
+      val queryColumnNames = desc.viewQueryColumnNames
+      // If the view output doesn't have the same number of columns with the child output and the
+      // query column names, throw an AnalysisException.
+      if (output.length != child.output.length && output.length != queryColumnNames.length) {
+        throw new AnalysisException(
+          s"The view output ${output.mkString("[", ",", "]")} doesn't have the same number of " +
+            s"columns with the child output ${child.output.mkString("[", ",", "]")}")
+      }
+      // If the child output is the same with the view output, we don't need to generate the query
+      // output again.
+      val queryOutput = if (queryColumnNames.nonEmpty && output != child.output) {
+        desc.viewQueryColumnNames.map { colName =>
+          findAttributeByName(colName, child.output, resolver)
+        }
+      } else {
+        child.output
+      }
--- End diff --

how about

```
val queryOutput = if (queryColumnNames.nonEmpty) {
  if (output.length != queryColumnNames.length) throw ...
  desc.viewQueryColumnNames.map { colName =>
    findAttributeByName(colName, child.output, resolver)
  }
} else {
  // For view created before Spark 2.1, the view text is already fully qualified, the plan
  // output is view output.
  child.output
}
```
[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16561#discussion_r96168416

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
@@ -28,22 +28,56 @@ import org.apache.spark.sql.catalyst.rules.Rule
  */

 /**
- * Make sure that a view's child plan produces the view's output attributes. We wrap the child
- * with a Project and add an alias for each output attribute. The attributes are resolved by
- * name. This should be only done after the batch of Resolution, because the view attributes are
- * not completely resolved during the batch of Resolution.
+ * Make sure that a view's child plan produces the view's output attributes. We try to wrap the
+ * child by:
+ * 1. Generate the `queryOutput` by:
+ *    1.1. If the query column names are defined, map the column names to attributes in the child
+ *         output by name;
+ *    1.2. Else set the child output attributes to `queryOutput`.
+ * 2. Map the `queryQutput` to view output by index, if the corresponding attributes don't match,
+ *    try to up cast and alias the attribute in `queryOutput` to the attribute in the view output.
+ * 3. Add a Project over the child, with the new output generated by the previous steps.
+ * If the view output doesn't have the same number of columns neither with the child output, nor
+ * with the query column names, throw an AnalysisException.
+ *
+ * This should be only done after the batch of Resolution, because the view attributes are not
+ * completely resolved during the batch of Resolution.
  */
 case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-    case v @ View(_, output, child) if child.resolved =>
+    case v @ View(desc, output, child) if child.resolved =>
       val resolver = conf.resolver
-      val newOutput = output.map { attr =>
-        val originAttr = findAttributeByName(attr.name, child.output, resolver)
-        // The dataType of the output attributes may be not the same with that of the view output,
-        // so we should cast the attribute to the dataType of the view output attribute. If the
-        // cast can't perform, will throw an AnalysisException.
-        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
-          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
+      val queryColumnNames = desc.viewQueryColumnNames
+      // If the view output doesn't have the same number of columns with the child output and the
+      // query column names, throw an AnalysisException.
+      if (output.length != child.output.length && output.length != queryColumnNames.length) {
+        throw new AnalysisException(
+          s"The view output ${output.mkString("[", ",", "]")} doesn't have the same number of " +
+            s"columns with the child output ${child.output.mkString("[", ",", "]")}")
+      }
+      // If the child output is the same with the view output, we don't need to generate the query
+      // output again.
+      val queryOutput = if (queryColumnNames.nonEmpty && output != child.output) {
--- End diff --

For a nested view, the inner view operator may have been resolved; in that case the output is the same as child.output.
[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16561#discussion_r96168283

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
@@ -28,22 +28,56 @@ import org.apache.spark.sql.catalyst.rules.Rule
  */

 /**
- * Make sure that a view's child plan produces the view's output attributes. We wrap the child
- * with a Project and add an alias for each output attribute. The attributes are resolved by
- * name. This should be only done after the batch of Resolution, because the view attributes are
- * not completely resolved during the batch of Resolution.
+ * Make sure that a view's child plan produces the view's output attributes. We try to wrap the
+ * child by:
+ * 1. Generate the `queryOutput` by:
+ *    1.1. If the query column names are defined, map the column names to attributes in the child
+ *         output by name;
+ *    1.2. Else set the child output attributes to `queryOutput`.
+ * 2. Map the `queryQutput` to view output by index, if the corresponding attributes don't match,
+ *    try to up cast and alias the attribute in `queryOutput` to the attribute in the view output.
+ * 3. Add a Project over the child, with the new output generated by the previous steps.
+ * If the view output doesn't have the same number of columns neither with the child output, nor
+ * with the query column names, throw an AnalysisException.
+ *
+ * This should be only done after the batch of Resolution, because the view attributes are not
+ * completely resolved during the batch of Resolution.
  */
 case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-    case v @ View(_, output, child) if child.resolved =>
+    case v @ View(desc, output, child) if child.resolved =>
       val resolver = conf.resolver
-      val newOutput = output.map { attr =>
-        val originAttr = findAttributeByName(attr.name, child.output, resolver)
-        // The dataType of the output attributes may be not the same with that of the view output,
-        // so we should cast the attribute to the dataType of the view output attribute. If the
-        // cast can't perform, will throw an AnalysisException.
-        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
-          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
+      val queryColumnNames = desc.viewQueryColumnNames
+      // If the view output doesn't have the same number of columns with the child output and the
+      // query column names, throw an AnalysisException.
+      if (output.length != child.output.length && output.length != queryColumnNames.length) {
+        throw new AnalysisException(
+          s"The view output ${output.mkString("[", ",", "]")} doesn't have the same number of " +
+            s"columns with the child output ${child.output.mkString("[", ",", "]")}")
+      }
+      // If the child output is the same with the view output, we don't need to generate the query
+      // output again.
+      val queryOutput = if (queryColumnNames.nonEmpty && output != child.output) {
--- End diff --

`output != child.output` will always be true, right?
[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16561#discussion_r96168225

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
@@ -28,22 +28,56 @@ import org.apache.spark.sql.catalyst.rules.Rule
  */

 /**
- * Make sure that a view's child plan produces the view's output attributes. We wrap the child
- * with a Project and add an alias for each output attribute. The attributes are resolved by
- * name. This should be only done after the batch of Resolution, because the view attributes are
- * not completely resolved during the batch of Resolution.
+ * Make sure that a view's child plan produces the view's output attributes. We try to wrap the
+ * child by:
+ * 1. Generate the `queryOutput` by:
+ *    1.1. If the query column names are defined, map the column names to attributes in the child
+ *         output by name;
+ *    1.2. Else set the child output attributes to `queryOutput`.
+ * 2. Map the `queryQutput` to view output by index, if the corresponding attributes don't match,
+ *    try to up cast and alias the attribute in `queryOutput` to the attribute in the view output.
+ * 3. Add a Project over the child, with the new output generated by the previous steps.
+ * If the view output doesn't have the same number of columns neither with the child output, nor
+ * with the query column names, throw an AnalysisException.
+ *
+ * This should be only done after the batch of Resolution, because the view attributes are not
+ * completely resolved during the batch of Resolution.
  */
 case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-    case v @ View(_, output, child) if child.resolved =>
+    case v @ View(desc, output, child) if child.resolved =>
       val resolver = conf.resolver
-      val newOutput = output.map { attr =>
-        val originAttr = findAttributeByName(attr.name, child.output, resolver)
-        // The dataType of the output attributes may be not the same with that of the view output,
-        // so we should cast the attribute to the dataType of the view output attribute. If the
-        // cast can't perform, will throw an AnalysisException.
-        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
-          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
+      val queryColumnNames = desc.viewQueryColumnNames
+      // If the view output doesn't have the same number of columns with the child output and the
+      // query column names, throw an AnalysisException.
+      if (output.length != child.output.length && output.length != queryColumnNames.length) {
--- End diff --

This condition doesn't look very clear to me. How about `if (queryColumnNames.nonEmpty && output.length != queryColumnNames.length)`? When `queryColumnNames` is empty, it means this view is created prior to Spark 2.2, and we don't need to check anything.
[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16561#discussion_r96167779 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala --- @@ -28,22 +28,56 @@ import org.apache.spark.sql.catalyst.rules.Rule */ /** - * Make sure that a view's child plan produces the view's output attributes. We wrap the child - * with a Project and add an alias for each output attribute. The attributes are resolved by - * name. This should be only done after the batch of Resolution, because the view attributes are - * not completely resolved during the batch of Resolution. + * Make sure that a view's child plan produces the view's output attributes. We try to wrap the + * child by: + * 1. Generate the `queryOutput` by: + *1.1. If the query column names are defined, map the column names to attributes in the child + * output by name; --- End diff -- should we mention that this is mostly for `SELECT * ...`?
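A hypothetical illustration of the `SELECT *` point (assumes a `SparkSession` named `spark`; not code from the patch): star expansion is re-resolved every time the view is read, so the persisted query column names are what pin the view to its original columns.

```scala
// Hypothetical example of why query column names matter mostly for SELECT *.
spark.sql("CREATE TABLE t (a INT, b INT) USING parquet")
spark.sql("CREATE VIEW v AS SELECT * FROM t") // query output names recorded as (a, b)

// If t later gains a column c, `SELECT *` would expand to (a, b, c) on the
// next resolution of v; the recorded names (a, b) let the analyzer map the
// view output back to the original attributes by name instead of by position.
spark.sql("SELECT * FROM v").show()
```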
[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16561 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71416/
[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16561 Merged build finished. Test PASSed.
[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16561 **[Test build #71416 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71416/testReport)** for PR 16561 at commit [`16ec310`](https://github.com/apache/spark/commit/16ec310d96471579af916716f6c99df60fd20bc5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16387: [SPARK-18986][Core] ExternalAppendOnlyMap shouldn't fail...
Github user samkum commented on the issue: https://github.com/apache/spark/pull/16387 I have tested this, but I found something very strange: GC frequency has increased many fold, and the majority of the time is now spent in GC.
[GitHub] spark issue #16592: [SPARK-19235] [SQL] [TESTS] Enable Test Cases in DDLSuit...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16592 cc @cloud-fan I think we really need to do this ASAP to improve the test case coverage of DDL commands; I noticed it while working on the PR: https://github.com/apache/spark/pull/16587
[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/16344 Jenkins, test this please.
[GitHub] spark pull request #16592: [SPARK-19235] [SQL] [TESTS] Enable Test Cases in ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16592#discussion_r96166442 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -102,6 +76,198 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach { tracksPartitionsInCatalog = true) } + test("desc table for parquet data source table using in-memory catalog") { +val tabName = "tab1" +withTable(tabName) { + sql(s"CREATE TABLE $tabName(a int comment 'test') USING parquet ") + + checkAnswer( +sql(s"DESC $tabName").select("col_name", "data_type", "comment"), +Row("a", "int", "test") + ) +} + } + + test("select/insert into the managed table") { +val tabName = "tbl" +withTable(tabName) { + sql(s"CREATE TABLE $tabName (i INT, j STRING)") + val catalogTable = + spark.sessionState.catalog.getTableMetadata(TableIdentifier(tabName, Some("default"))) + assert(catalogTable.tableType == CatalogTableType.MANAGED) + + var message = intercept[AnalysisException] { +sql(s"INSERT OVERWRITE TABLE $tabName SELECT 1, 'a'") + }.getMessage + assert(message.contains("Hive support is required to insert into the following tables")) + message = intercept[AnalysisException] { +sql(s"SELECT * FROM $tabName") + }.getMessage + assert(message.contains("Hive support is required to select over the following tables")) +} + } + + test("select/insert into external table") { +withTempDir { tempDir => + val tabName = "tbl" + withTable(tabName) { +sql( + s""" + |CREATE EXTERNAL TABLE $tabName (i INT, j STRING) + |ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' + |LOCATION '$tempDir' + """.stripMargin) +val catalogTable = + spark.sessionState.catalog.getTableMetadata(TableIdentifier(tabName, Some("default"))) +assert(catalogTable.tableType == CatalogTableType.EXTERNAL) + +var message = intercept[AnalysisException] { + sql(s"INSERT OVERWRITE TABLE $tabName SELECT 1, 'a'") +}.getMessage +assert(message.contains("Hive support is required to insert into the following tables")) +message = intercept[AnalysisException] { + sql(s"SELECT * FROM $tabName") +}.getMessage +assert(message.contains("Hive support is required to select over the following tables")) + } +} + } + + test("Create Hive Table As Select") { +import testImplicits._ +withTable("t", "t1") { + var e = intercept[AnalysisException] { +sql("CREATE TABLE t SELECT 1 as a, 1 as b") + }.getMessage + assert(e.contains("Hive support is required to use CREATE Hive TABLE AS SELECT")) + + spark.range(1).select('id as 'a, 'id as 'b).write.saveAsTable("t1") + e = intercept[AnalysisException] { +sql("CREATE TABLE t SELECT a, b from t1") + }.getMessage + assert(e.contains("Hive support is required to use CREATE Hive TABLE AS SELECT")) +} + } + + test("alter table: set location (datasource table)") { +testSetLocation(isDatasourceTable = true) + } + + test("alter table: set properties (datasource table)") { +testSetProperties(isDatasourceTable = true) + } + + test("alter table: unset properties (datasource table)") { +testUnsetProperties(isDatasourceTable = true) + } + + test("alter table: set serde (datasource table)") { +testSetSerde(isDatasourceTable = true) + } + + test("alter table: set serde partition (datasource table)") { +testSetSerdePartition(isDatasourceTable = true) + } + + test("alter table: change column (datasource table)") { +testChangeColumn(isDatasourceTable = true) + } + + test("alter table: add partition (datasource table)") { +testAddPartitions(isDatasourceTable = true) + } + + test("alter table: drop 
partition (datasource table)") { +testDropPartitions(isDatasourceTable = true) + } + + test("alter table: rename partition (datasource table)") { +testRenamePartitions(isDatasourceTable = true) + } + + test("drop table - data source table") { +testDropTable(isDatasourceTable = true) + } --- End diff -- The above 10 test cases are currently running with `InMemoryCatalog` only. The reason is `HiveExternalCatalog` does not allow users to change the table provider from `hive` to the others. In future PRs, we need to improve these test cases so that altering the table provider is not needed.
[GitHub] spark issue #16592: [SPARK-19235] [SQL] [TESTS] Enable Test Cases in DDLSuit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16592 **[Test build #71420 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71420/testReport)** for PR 16592 at commit [`0133463`](https://github.com/apache/spark/commit/01334635c5433f0515beb92660b79796c97677d5).
[GitHub] spark pull request #16592: [SPARK-19235] [SQL] [TESTS] Enable Test Cases in ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16592#discussion_r96166347 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -102,6 +76,198 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach { tracksPartitionsInCatalog = true) } + test("desc table for parquet data source table using in-memory catalog") { +val tabName = "tab1" +withTable(tabName) { + sql(s"CREATE TABLE $tabName(a int comment 'test') USING parquet ") + + checkAnswer( +sql(s"DESC $tabName").select("col_name", "data_type", "comment"), +Row("a", "int", "test") + ) +} + } + + test("select/insert into the managed table") { +val tabName = "tbl" +withTable(tabName) { + sql(s"CREATE TABLE $tabName (i INT, j STRING)") + val catalogTable = + spark.sessionState.catalog.getTableMetadata(TableIdentifier(tabName, Some("default"))) + assert(catalogTable.tableType == CatalogTableType.MANAGED) + + var message = intercept[AnalysisException] { +sql(s"INSERT OVERWRITE TABLE $tabName SELECT 1, 'a'") + }.getMessage + assert(message.contains("Hive support is required to insert into the following tables")) + message = intercept[AnalysisException] { +sql(s"SELECT * FROM $tabName") + }.getMessage + assert(message.contains("Hive support is required to select over the following tables")) +} + } + + test("select/insert into external table") { +withTempDir { tempDir => + val tabName = "tbl" + withTable(tabName) { +sql( + s""" + |CREATE EXTERNAL TABLE $tabName (i INT, j STRING) + |ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' + |LOCATION '$tempDir' + """.stripMargin) +val catalogTable = + spark.sessionState.catalog.getTableMetadata(TableIdentifier(tabName, Some("default"))) +assert(catalogTable.tableType == CatalogTableType.EXTERNAL) + +var message = intercept[AnalysisException] { + sql(s"INSERT OVERWRITE TABLE $tabName SELECT 1, 'a'") +}.getMessage +assert(message.contains("Hive support is required to insert into the following tables")) +message = intercept[AnalysisException] { + sql(s"SELECT * FROM $tabName") +}.getMessage +assert(message.contains("Hive support is required to select over the following tables")) + } +} + } + + test("Create Hive Table As Select") { +import testImplicits._ +withTable("t", "t1") { + var e = intercept[AnalysisException] { +sql("CREATE TABLE t SELECT 1 as a, 1 as b") + }.getMessage + assert(e.contains("Hive support is required to use CREATE Hive TABLE AS SELECT")) + + spark.range(1).select('id as 'a, 'id as 'b).write.saveAsTable("t1") + e = intercept[AnalysisException] { +sql("CREATE TABLE t SELECT a, b from t1") + }.getMessage + assert(e.contains("Hive support is required to use CREATE Hive TABLE AS SELECT")) +} + } --- End diff -- The above four test cases are copied from the existing ones in DDLSuite. They only make sense for `InMemoryCatalog`.
[GitHub] spark pull request #16592: [SPARK-19235] [SQL] [TESTS] Enable Test Cases in ...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/16592 [SPARK-19235] [SQL] [TESTS] Enable Test Cases in DDLSuite with Hive Metastore ### What changes were proposed in this pull request? So far, the test cases in DDLSuite only verify the behaviors of InMemoryCatalog. That means, they do not cover the scenarios using HiveExternalCatalog. Thus, we need to improve the existing test suite to run these cases using the Hive metastore. When porting these test cases, a bug of `SET LOCATION` was found: `path` is not set when the location is not changed. After this PR, a few changes are made, as summarized below: - `DDLSuite` becomes an abstract class. Both `InMemoryCatalogedDDLSuite` and `HiveCatalogedDDLSuite` extend it. `InMemoryCatalogedDDLSuite` is using `InMemoryCatalog`. `HiveCatalogedDDLSuite` is using `HiveExternalCatalog`. - `InMemoryCatalogedDDLSuite` contains all the existing test cases in `DDLSuite`. - `HiveCatalogedDDLSuite` contains a subset of `DDLSuite`. The following test cases are excluded: 1. The following test cases only make sense for `InMemoryCatalog`: ``` test("desc table for parquet data source table using in-memory catalog") test("select/insert into the managed table") test("select/insert into external table") test("Create Hive Table As Select") ``` 2. The following test cases are unable to be ported because we are unable to alter the table provider when using the Hive metastore. In future PRs we need to improve the test cases so that altering the table provider is not needed: ``` test("alter table: set location (datasource table)") test("alter table: set properties (datasource table)") test("alter table: unset properties (datasource table)") test("alter table: set serde (datasource table)") test("alter table: set serde partition (datasource table)") test("alter table: change column (datasource table)") test("alter table: add partition (datasource table)") test("alter table: drop partition (datasource table)") test("alter table: rename partition (datasource table)") test("drop table - data source table") ``` **TODO**: in future PRs, we need to remove `HiveDDLSuite` and move the test cases to either `DDLSuite`, `InMemoryCatalogedDDLSuite` or `HiveCatalogedDDLSuite`. ### How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/gatorsmile/spark refactorDDLSuite Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16592.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16592 commit 01334635c5433f0515beb92660b79796c97677d5 Author: gatorsmile Date: 2017-01-16T04:52:55Z fix
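The layout described above is the standard ScalaTest abstract-suite pattern; a minimal sketch of the shape (class and trait names taken from the PR description, bodies elided, so this is not compilable on its own):

```scala
// Sketch of the suite hierarchy described above; the real suites carry
// many more tests and live under sql/core and sql/hive respectively.
abstract class DDLSuite extends QueryTest with SQLTestUtils {
  // catalog-agnostic DDL tests shared by both concrete suites
}

class InMemoryCatalogedDDLSuite extends DDLSuite
    with SharedSQLContext with BeforeAndAfterEach {
  // plus the tests that only make sense without Hive support,
  // e.g. the "Hive support is required ..." error cases
}

class HiveCatalogedDDLSuite extends DDLSuite
    with TestHiveSingleton with BeforeAndAfterEach {
  // the subset that runs against HiveExternalCatalog
}
```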
[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12064 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71419/
[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12064 Merged build finished. Test FAILed.
[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12064 **[Test build #71419 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71419/testReport)** for PR 12064 at commit [`fe2c424`](https://github.com/apache/spark/commit/fe2c424a4aa08f5f50387069db26f179e50395d4). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/16344 add to whitelist.
[GitHub] spark pull request #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper ...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16566#discussion_r96165457 --- Diff: R/pkg/R/mllib_clustering.R --- @@ -38,6 +45,146 @@ setClass("KMeansModel", representation(jobj = "jobj")) #' @note LDAModel since 2.1.0 setClass("LDAModel", representation(jobj = "jobj")) +#' Bisecting K-Means Clustering Model +#' +#' Fits a bisecting k-means clustering model against a Spark DataFrame. +#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make +#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' @param data a SparkDataFrame for training. +#' @param formula a symbolic description of the model to be fitted. Currently only a few formula +#'operators are supported, including '~', '.', ':', '+', and '-'. +#'Note that the response variable of formula is empty in spark.bisectingKmeans. +#' @param k the desired number of leaf clusters. Must be > 1. +#' The actual number could be smaller if there are no divisible leaf clusters. +#' @param maxIter maximum iteration number. +#' @param minDivisibleClusterSize The minimum number of points (if greater than or equal to 1.0) +#'or the minimum proportion of points (if less than 1.0) of a divisible cluster. +#' @param seed the random seed. +#' @param ... additional argument(s) passed to the method. +#' @return \code{spark.bisectingKmeans} returns a fitted bisecting k-means model. +#' @rdname spark.bisectingKmeans +#' @aliases spark.bisectingKmeans,SparkDataFrame,formula-method +#' @name spark.bisectingKmeans +#' @export +#' @examples +#' \dontrun{ +#' sparkR.session() +#' data(iris) +#' df <- createDataFrame(iris) +#' model <- spark.bisectingKmeans(df, Sepal_Length ~ Sepal_Width, k = 4) +#' summary(model) +#' +#' # fitted values on training data +#' fitted <- predict(model, df) +#' head(select(fitted, "Sepal_Length", "prediction")) +#' +#' # save fitted model to input path +#' path <- "path/to/model" +#' write.ml(model, path) +#' +#' # can also read back the saved model and print +#' savedModel <- read.ml(path) +#' summary(savedModel) +#' } +#' @note spark.bisectingKmeans since 2.2.0 +#' @seealso \link{predict}, \link{read.ml}, \link{write.ml} +setMethod("spark.bisectingKmeans", signature(data = "SparkDataFrame", formula = "formula"), + function(data, formula, k = 4, maxIter = 20, minDivisibleClusterSize = 1.0, seed = NULL) { +formula <- paste0(deparse(formula), collapse = "") +if (!is.null(seed)) { + seed <- as.character(as.integer(seed)) +} +jobj <- callJStatic("org.apache.spark.ml.r.BisectingKMeansWrapper", "fit", +data@sdf, formula, as.integer(k), as.integer(maxIter), +as.numeric(minDivisibleClusterSize), seed) +new("BisectingKMeansModel", jobj = jobj) + }) + +# Get the summary of a bisecting k-means model + +#' @param object a fitted bisecting k-means model. +#' @return \code{summary} returns summary information of the fitted model, which is a list. +#' The list includes the model's \code{k} (number of cluster centers), +#' \code{coefficients} (model cluster centers), +#' \code{size} (number of data points in each cluster), and \code{cluster} +#' (cluster centers of the transformed data). 
+#' @rdname spark.bisectingKmeans +#' @export +#' @note summary(BisectingKMeansModel) since 2.2.0 +setMethod("summary", signature(object = "BisectingKMeansModel"), + function(object) { +jobj <- object@jobj +is.loaded <- callJMethod(jobj, "isLoaded") +features <- callJMethod(jobj, "features") +coefficients <- callJMethod(jobj, "coefficients") +k <- callJMethod(jobj, "k") +size <- callJMethod(jobj, "size") +coefficients <- t(matrix(coefficients, ncol = k)) +colnames(coefficients) <- unlist(features) +rownames(coefficients) <- 1:k +cluster <- if (is.loaded) { + NULL +} else { + dataFrame(callJMethod(jobj, "cluster")) +} +list(k = k, coefficients = coefficients, size = size, +cluster = cluster, is.loaded = is.loaded) + }) + +# Predicted values based on a bisecting k-means model + +#' @param newData a SparkDataFrame for testing. +#' @return \code{predict} returns the predicted
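For context, the R wrapper under review delegates to the Scala estimator via `BisectingKMeansWrapper`. A minimal sketch of the underlying ML API it wraps (assumes a DataFrame `df` with the iris columns; the `RFormula` handling of the wrapper is omitted):

```scala
import org.apache.spark.ml.clustering.BisectingKMeans
import org.apache.spark.ml.feature.VectorAssembler

// Assemble a features vector column, standing in for the wrapper's formula.
val assembler = new VectorAssembler()
  .setInputCols(Array("Sepal_Width"))
  .setOutputCol("features")
val data = assembler.transform(df)

// The estimator the R parameters map onto: k, maxIter, minDivisibleClusterSize.
val bkm = new BisectingKMeans()
  .setK(4)
  .setMaxIter(20)
  .setMinDivisibleClusterSize(1.0)
val model = bkm.fit(data)

model.clusterCenters.foreach(println)        // analogous to `coefficients` in summary()
model.transform(data).select("prediction")   // analogous to predict() in R
```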
[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/16344 ok to test.
[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12064 **[Test build #71419 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71419/testReport)** for PR 12064 at commit [`fe2c424`](https://github.com/apache/spark/commit/fe2c424a4aa08f5f50387069db26f179e50395d4).
[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16464 **[Test build #71418 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71418/testReport)** for PR 16464 at commit [`e133ee6`](https://github.com/apache/spark/commit/e133ee64961beaf10b7885ece76ded021ae5).
[GitHub] spark issue #16588: [SPARK-19092] [SQL] [Backport-2.1] Save() API of DataFra...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16588 Thanks!
[GitHub] spark pull request #16588: [SPARK-19092] [SQL] [Backport-2.1] Save() API of ...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/16588
[GitHub] spark issue #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work for Parq...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16474 **[Test build #71417 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71417/testReport)** for PR 16474 at commit [`261e1b5`](https://github.com/apache/spark/commit/261e1b5f295ca35ed2635c75aa9f1b91d8805bd7).
[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16561 Merged build finished. Test PASSed.
[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16561 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71415/
[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16561 **[Test build #71415 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71415/testReport)** for PR 16561 at commit [`d6537a5`](https://github.com/apache/spark/commit/d6537a5d66ba88cb827a6fc84a7ec8be79af5277). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16591: [SPARK-19227][CORE] remove unused imports and outdated ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16591 I know this is acceptable, judging from the history. However, I have seen a lot of unused imports across the code base. I think it'd be nicer if the same instances were checked, at least within the same package.
[GitHub] spark issue #16591: [SPARK-19227][CORE] remove unused imports and outdated ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16591 Merged build finished. Test PASSed.
[GitHub] spark issue #16591: [SPARK-19227][CORE] remove unused imports and outdated ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16591 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71414/
[GitHub] spark issue #16591: [SPARK-19227][CORE] remove unused imports and outdated ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16591 **[Test build #71414 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71414/testReport)** for PR 16591 at commit [`22405d1`](https://github.com/apache/spark/commit/22405d19a1ca6944162ddc330ea2dfc5a7c4638c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16545: [SPARK-19166][SQL] rename from InsertIntoHadoopFsRelation...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16545 Build finished. Test PASSed.
[GitHub] spark issue #16545: [SPARK-19166][SQL] rename from InsertIntoHadoopFsRelation...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16545 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71413/
[GitHub] spark issue #16545: [SPARK-19166][SQL] rename from InsertIntoHadoopFsRelation...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16545 **[Test build #71413 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71413/testReport)** for PR 16545 at commit [`6d1defb`](https://github.com/apache/spark/commit/6d1defb57407a12c6bf6020ed18cb2249328e435). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work f...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16474#discussion_r96162649 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala --- @@ -135,11 +135,21 @@ class FileScanRDD( try { if (ignoreCorruptFiles) { currentIterator = new NextIterator[Object] { -private val internalIter = readFunction(currentFile) +private val internalIter = { + try { +// The readFunction may read files before consuming the iterator. +// E.g., vectorized Parquet reader. +readFunction(currentFile) + } catch { +case e @ (_: RuntimeException | _: IOException) => --- End diff -- yeah, I have this concern too in the PR description. One problem is that the error messages vary across data sources; listing all of them here doesn't look like a good idea.
[GitHub] spark pull request #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work f...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16474#discussion_r96162528 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala --- @@ -135,11 +135,21 @@ class FileScanRDD( try { if (ignoreCorruptFiles) { currentIterator = new NextIterator[Object] { -private val internalIter = readFunction(currentFile) +private val internalIter = { + try { +// The readFunction may read files before consuming the iterator. +// E.g., vectorized Parquet reader. +readFunction(currentFile) --- End diff -- I think it is hard to guarantee this because `readFunction` comes from the individual data source. Even if we can modify the current data sources, we may not be able to prevent other data sources from doing this.
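The gist of the fix, reduced to a self-contained sketch (the `NextIterator` plumbing and logging of `FileScanRDD` are replaced by stand-ins here):

```scala
import java.io.IOException

// readFunction may touch the file eagerly (e.g. the vectorized Parquet
// reader reads the footer on construction), so the try/catch must wrap the
// call itself, not only the consumption of the returned iterator.
def safeOpen[T](readFunction: String => Iterator[T], file: String): Iterator[T] = {
  try {
    readFunction(file)
  } catch {
    case e @ (_: RuntimeException | _: IOException) =>
      println(s"Skipped corrupted file: $file ($e)") // Spark logs a warning here
      Iterator.empty
  }
}
```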
[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15505 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71412/
[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15505 Merged build finished. Test PASSed.
[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15505 **[Test build #71412 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71412/testReport)** for PR 15505 at commit [`2d9569e`](https://github.com/apache/spark/commit/2d9569ea6cfeb09837897d4290f7605bc229c645). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16561 **[Test build #71416 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71416/testReport)** for PR 16561 at commit [`16ec310`](https://github.com/apache/spark/commit/16ec310d96471579af916716f6c99df60fd20bc5).
[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16561#discussion_r96159390 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -198,9 +203,44 @@ case class CatalogTable( /** * Return the default database name we use to resolve a view, should be None if the CatalogTable - * is not a View. + * is not a View or created by older versions of Spark(before 2.2.0). + */ + def viewDefaultDatabase: Option[String] = properties.get(VIEW_DEFAULT_DATABASE) + + /** + * Return the output column names of the query that creates a view, the column names are used to + * resolve a view, should be None if the CatalogTable is not a View or created by older versions + * of Spark(before 2.2.0). + */ + def viewQueryColumnNames: Seq[String] = { +for { + numCols <- properties.get(VIEW_QUERY_OUTPUT_COLUMN_NUM).toSeq --- End diff -- It is needed to generate the correct output.
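For a concrete picture of what this for-comprehension reads, consider a view whose query outputs columns `id` and `id1`, as in the test later in this thread. The literal property-key strings below are assumptions for illustration; the real code goes through the `CatalogTable` constants:

```scala
// Assumed key strings; the actual values come from
// CatalogTable.VIEW_QUERY_OUTPUT_COLUMN_NUM and
// CatalogTable.VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX.
val properties = Map(
  "view.query.out.numCols" -> "2",
  "view.query.out.col.0" -> "id",
  "view.query.out.col.1" -> "id1")

// What the for-comprehension effectively computes:
val numCols = properties("view.query.out.numCols").toInt
val names = (0 until numCols).map(i => properties(s"view.query.out.col.$i"))
assert(names == Seq("id", "id1"))
```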
[GitHub] spark issue #16209: [WIP][SPARK-10849][SQL] Adds option to the JDBC data sou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16209 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71410/
[GitHub] spark pull request #16517: [SPARK-18243][SQL] Port Hive writing to use FileF...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16517#discussion_r96158823 --- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala --- @@ -99,7 +99,7 @@ class HadoopMapReduceCommitProtocol(jobId: String, path: String) } private def getFilename(taskContext: TaskAttemptContext, ext: String): String = { -// The file name looks like part-r-0-2dd664f9-d2c4-4ffe-878f-c6c70c1fb0cb_3.gz.parquet +// The file name looks like part-0-2dd664f9-d2c4-4ffe-878f-c6c70c1fb0cb_3.gz.parquet --- End diff -- OK, I should update this string; `c000` is the file count, which was added recently.
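For readers following along, a simplified sketch of what `getFilename` builds (the signature and padding here are assumptions based on the comment above, not a verbatim copy of the Spark method):

```scala
import org.apache.hadoop.mapreduce.TaskAttemptContext

// Sketch: zero-padded split id, the job's UUID, then the extension, which
// in turn carries the recently added "-c000" file-count suffix.
def getFilename(taskContext: TaskAttemptContext, jobId: String, ext: String): String = {
  val split = taskContext.getTaskAttemptID.getTaskID.getId
  f"part-$split%05d-$jobId$ext"
}
// e.g. "part-00000-2dd664f9-d2c4-4ffe-878f-c6c70c1fb0cb-c000.gz.parquet" (illustrative)
```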
[GitHub] spark issue #16209: [WIP][SPARK-10849][SQL] Adds option to the JDBC data sou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16209 Merged build finished. Test PASSed.
[GitHub] spark issue #16209: [WIP][SPARK-10849][SQL] Adds option to the JDBC data sou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16209 **[Test build #71410 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71410/testReport)** for PR 16209 at commit [`ff71bac`](https://github.com/apache/spark/commit/ff71bac8162778f99e8985476498010c22268926). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16517: [SPARK-18243][SQL] Port Hive writing to use FileF...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16517#discussion_r96158740 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala --- @@ -69,34 +69,31 @@ import org.apache.spark.util.SerializableJobConf * {{{ * Map('a' -> Some('1'), 'b' -> None) * }}}. - * @param child the logical plan representing data to write to. + * @param query the logical plan representing data to write to. * @param overwrite overwrite existing table or partitions. * @param ifNotExists If true, only write if the table or partition does not exist. */ case class InsertIntoHiveTable( table: MetastoreRelation, partition: Map[String, Option[String]], -child: SparkPlan, +query: LogicalPlan, overwrite: Boolean, -ifNotExists: Boolean) extends UnaryExecNode { +ifNotExists: Boolean) extends RunnableCommand { - @transient private val sessionState = sqlContext.sessionState.asInstanceOf[HiveSessionState] - @transient private val externalCatalog = sqlContext.sharedState.externalCatalog + override protected def innerChildren: Seq[LogicalPlan] = query :: Nil --- End diff -- We can't. We only replace `InsertIntoTable` with `InsertIntoHiveTable` in the planner.
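For readers unfamiliar with the pattern: a `RunnableCommand` wraps a logical plan and surfaces it through `innerChildren` so that EXPLAIN still displays the query, which is not a regular child of the command node. A minimal sketch (hypothetical command, not the PR's code):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.command.RunnableCommand

// Hypothetical command holding a query plan, mirroring the shape that
// InsertIntoHiveTable takes in this PR.
case class MyInsertCommand(query: LogicalPlan) extends RunnableCommand {
  // Expose the wrapped query in plan output (EXPLAIN) even though it is
  // not a child of this node.
  override protected def innerChildren: Seq[LogicalPlan] = query :: Nil

  override def run(sparkSession: SparkSession): Seq[Row] = {
    // execute the query and write it out; elided in this sketch
    Seq.empty[Row]
  }
}
```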
[GitHub] spark issue #16588: [SPARK-19092] [SQL] [Backport-2.1] Save() API of DataFra...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16588 thanks, merging to 2.1
[GitHub] spark pull request #16587: [SPARK-19229] [SQL] Disallow Creating Hive Source...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16587#discussion_r96158395 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala --- @@ -278,7 +277,7 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEac schema = new StructType() .add("HelLo", "int", nullable = false) .add("WoRLd", "int", nullable = true), - provider = Some("hive"), + provider = Some("parquet"), --- End diff -- shall we also use `defaultProvider` here?
[GitHub] spark pull request #16587: [SPARK-19229] [SQL] Disallow Creating Hive Source...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16587#discussion_r96158356 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala --- @@ -186,6 +186,11 @@ class InMemoryCatalog( val db = tableDefinition.identifier.database.get requireDbExists(db) val table = tableDefinition.identifier.table +if (tableDefinition.provider.isDefined && tableDefinition.provider.get.toLowerCase == "hive") { --- End diff -- shall we put this in `HiveOnlyCheck`?
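For illustration, the shape such an analysis-time check could take (a sketch only; it assumes it lives alongside the analyzer, since `AnalysisException`'s constructor is package-private to `org.apache.spark.sql`, and the actual `HiveOnlyCheck` rule in sql/core may differ):

```scala
import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.datasources.CreateTable

// Sketch: reject Hive-format tables before they reach a catalog that
// cannot store them, instead of checking inside InMemoryCatalog itself.
object HiveOnlyCheckSketch extends (LogicalPlan => Unit) {
  override def apply(plan: LogicalPlan): Unit = plan.foreach {
    case CreateTable(tableDesc, _, _)
        if tableDesc.provider.exists(_.equalsIgnoreCase("hive")) =>
      throw new AnalysisException(
        "Hive support is required to CREATE Hive TABLE (AS SELECT)")
    case _ => // fine
  }
}
```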
[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16561#discussion_r96158153 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala --- @@ -28,22 +28,56 @@ import org.apache.spark.sql.catalyst.rules.Rule */ /** - * Make sure that a view's child plan produces the view's output attributes. We wrap the child - * with a Project and add an alias for each output attribute. The attributes are resolved by - * name. This should be only done after the batch of Resolution, because the view attributes are - * not completely resolved during the batch of Resolution. + * Make sure that a view's child plan produces the view's output attributes. We try to wrap the + * child by: + * 1. Generate the `queryOutput` by: + *1.1. If the query column names are defined, map the column names to attributes in the child + * output by name; + *1.2. Else set the child output attributes to `queryOutput`. + * 2. Map the `queryOutput` to view output by index, if the corresponding attributes don't match, + *try to up cast and alias the attribute in `queryOutput` to the attribute in the view output. + * 3. Add a Project over the child, with the new output generated by the previous steps. + * If the view output doesn't have the same number of columns either with the child output, or + * with the query column names, throw an AnalysisException. + * + * This should be only done after the batch of Resolution, because the view attributes are not + * completely resolved during the batch of Resolution. */ case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] { override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { -case v @ View(_, output, child) if child.resolved => +case v @ View(desc, output, child) if child.resolved => val resolver = conf.resolver - val newOutput = output.map { attr => -val originAttr = findAttributeByName(attr.name, child.output, resolver) -// The dataType of the output attributes may be not the same with that of the view output, -// so we should cast the attribute to the dataType of the view output attribute. If the -// cast can't perform, will throw an AnalysisException. -Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId, - qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata)) + val queryColumnNames = desc.viewQueryColumnNames + // If the view output doesn't have the same number of columns either with the child output, + // or with the query column names, throw an AnalysisException. + if (output.length != child.output.length && output.length != queryColumnNames.length) { --- End diff -- the comment says `or` but the code uses `&&`?
[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16561#discussion_r96157931 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala --- @@ -680,21 +700,70 @@ class SQLViewSuite extends QueryTest with SQLTestUtils with TestHiveSingleton { } } - test("correctly handle type casting between view output and child output") { + test("resolve a view with custom column names") { withTable("testTable") { + spark.range(1, 10).selectExpr("id", "id + 1 id1").write.saveAsTable("testTable") withView("testView") { -spark.range(1, 10).toDF("id1").write.format("json").saveAsTable("testTable") -sql("CREATE VIEW testView AS SELECT * FROM testTable") +val testView = CatalogTable( + identifier = TableIdentifier("testView"), + tableType = CatalogTableType.VIEW, + storage = CatalogStorageFormat.empty, + schema = new StructType().add("x", "long").add("y", "long"), + viewOriginalText = Some("SELECT * FROM testTable"), + viewText = Some("SELECT * FROM testTable"), + properties = Map(CatalogTable.VIEW_DEFAULT_DATABASE -> "default", +CatalogTable.VIEW_QUERY_OUTPUT_COLUMN_NUM -> "2", +s"${CatalogTable.VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX}0" -> "id", +s"${CatalogTable.VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX}1" -> "id1")) +hiveContext.sessionState.catalog.createTable(testView, ignoreIfExists = false) + +// Correctly resolve a view with custom column names. +checkAnswer(sql("SELECT * FROM testView ORDER BY x"), (1 to 9).map(i => Row(i, i + 1))) --- End diff -- can we use `SELECT x, y ...`, to test that the custom column names really work?
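Concretely, the suggestion amounts to something like this (adapted from the quoted test):

```scala
// Select the custom column names explicitly so the test fails if the
// aliases x/y are not actually resolvable on the view.
checkAnswer(
  sql("SELECT x, y FROM testView ORDER BY x"),
  (1 to 9).map(i => Row(i, i + 1)))
```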
[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16561#discussion_r96157834

```diff
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala ---
@@ -680,21 +700,70 @@ class SQLViewSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }

-  test("correctly handle type casting between view output and child output") {
+  test("resolve a view with custom column names") {
     withTable("testTable") {
+      spark.range(1, 10).selectExpr("id", "id + 1 id1").write.saveAsTable("testTable")
       withView("testView") {
-        spark.range(1, 10).toDF("id1").write.format("json").saveAsTable("testTable")
-        sql("CREATE VIEW testView AS SELECT * FROM testTable")
+        val testView = CatalogTable(
+          identifier = TableIdentifier("testView"),
+          tableType = CatalogTableType.VIEW,
+          storage = CatalogStorageFormat.empty,
+          schema = new StructType().add("x", "long").add("y", "long"),
```

--- End diff --

Let's think about how we will persist a view when we have custom column names. Ideally we will have a logical plan representing the view, a SQL statement for the view query, and a `Seq[String]` for the custom column names. Two options:

1. Call `plan.schema` to get the view schema, zip it with the custom column names to get the final schema, and save that. Then use `plan.schema.map(_.name)` to generate the `VIEW_QUERY_OUTPUT_COLUMN_NAME` entries in the table properties.
2. Call `plan.schema` to get the view schema and save it as the final schema. Then use the custom column names to generate the `VIEW_QUERY_OUTPUT_COLUMN_NAME` entries in the table properties.

Personally I think option 2 is more natural; what do you think?
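A minimal sketch of option 2 under the assumptions in this thread: `persistView` is a hypothetical helper name, while `withQueryColumnNames` is the method this PR adds to `CatalogTable`.

```scala
import org.apache.spark.sql.catalyst.catalog.CatalogTable
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Option 2: save the analyzed plan's own schema as the view's final schema,
// and record the custom column names in the VIEW_QUERY_OUTPUT_* table
// properties so the view query can be re-resolved against them later.
def persistView(table: CatalogTable, plan: LogicalPlan, customColumnNames: Seq[String]): CatalogTable = {
  table
    .copy(schema = plan.schema)               // final schema comes from the query itself
    .withQueryColumnNames(customColumnNames)  // custom names go into properties
}
```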
[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16561#discussion_r96157384

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -198,9 +203,44 @@ case class CatalogTable(
   /**
    * Return the default database name we use to resolve a view, should be None if the CatalogTable
-   * is not a View.
+   * is not a View or created by older versions of Spark (before 2.2.0).
+   */
+  def viewDefaultDatabase: Option[String] = properties.get(VIEW_DEFAULT_DATABASE)
+
+  /**
+   * Return the output column names of the query that creates a view, the column names are used to
+   * resolve a view, should be None if the CatalogTable is not a View or created by older versions
+   * of Spark (before 2.2.0).
+   */
+  def viewQueryColumnNames: Seq[String] = {
+    for {
+      numCols <- properties.get(VIEW_QUERY_OUTPUT_COLUMN_NUM).toSeq
+      index <- 0 until numCols.toInt
+    } yield properties.getOrElse(
+      s"$VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX$index",
+      throw new AnalysisException("Corrupted view query output column names in catalog: " +
+        s"$numCols parts expected, but part $index is missing.")
+    )
+  }
+
+  /**
+   * Insert/Update the view query output column names in `properties`.
    */
-  def viewDefaultDatabase: Option[String] = properties.get(CatalogTable.VIEW_DEFAULT_DATABASE)
+  def withQueryColumnNames(columns: Seq[String]): CatalogTable = {
+    val props = new mutable.HashMap[String, String]
```

--- End diff --

Let's follow the existing code for partition columns:

```
properties.put(DATASOURCE_SCHEMA_NUMPARTCOLS, partitionColumns.length.toString)
partitionColumns.zipWithIndex.foreach { case (partCol, index) =>
  properties.put(s"$DATASOURCE_SCHEMA_PARTCOL_PREFIX$index", partCol)
}
```
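Under that suggestion, `withQueryColumnNames` might look roughly like this. A sketch only: it assumes the `VIEW_QUERY_OUTPUT_*` constants introduced by this PR and writes the method as a standalone function rather than a `CatalogTable` member.

```scala
import scala.collection.mutable
import org.apache.spark.sql.catalyst.catalog.CatalogTable
import org.apache.spark.sql.catalyst.catalog.CatalogTable._

// Follow the partition-column persistence pattern quoted above: store the
// column count first, then one indexed property per column name.
def withQueryColumnNames(table: CatalogTable, columns: Seq[String]): CatalogTable = {
  val props = new mutable.HashMap[String, String]
  if (columns.nonEmpty) {
    props.put(VIEW_QUERY_OUTPUT_COLUMN_NUM, columns.length.toString)
    columns.zipWithIndex.foreach { case (colName, index) =>
      props.put(s"$VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX$index", colName)
    }
  }
  table.copy(properties = table.properties ++ props)
}
```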
[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16561#discussion_r96157302

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -198,9 +203,44 @@ case class CatalogTable(
   /**
    * Return the default database name we use to resolve a view, should be None if the CatalogTable
-   * is not a View.
+   * is not a View or created by older versions of Spark (before 2.2.0).
+   */
+  def viewDefaultDatabase: Option[String] = properties.get(VIEW_DEFAULT_DATABASE)
+
+  /**
+   * Return the output column names of the query that creates a view, the column names are used to
+   * resolve a view, should be None if the CatalogTable is not a View or created by older versions
+   * of Spark (before 2.2.0).
+   */
+  def viewQueryColumnNames: Seq[String] = {
+    for {
+      numCols <- properties.get(VIEW_QUERY_OUTPUT_COLUMN_NUM).toSeq
```

--- End diff --

`.toSeq` is not needed
[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16561#discussion_r96157263

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -254,6 +294,9 @@ case class CatalogTable(
 object CatalogTable {
   val VIEW_DEFAULT_DATABASE = "view.default.database"
+  val VIEW_QUERY_OUTPUT_PREFIX = "view.query.out."
+  val VIEW_QUERY_OUTPUT_COLUMN_NUM = VIEW_QUERY_OUTPUT_PREFIX + "numCols"
```

--- End diff --

nit: `xxx_NUM_COLUMNS`
[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16561#discussion_r96157156

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -198,9 +203,44 @@ case class CatalogTable(
   /**
    * Return the default database name we use to resolve a view, should be None if the CatalogTable
-   * is not a View.
+   * is not a View or created by older versions of Spark (before 2.2.0).
+   */
+  def viewDefaultDatabase: Option[String] = properties.get(VIEW_DEFAULT_DATABASE)
+
+  /**
+   * Return the output column names of the query that creates a view, the column names are used to
+   * resolve a view, should be None if the CatalogTable is not a View or created by older versions
```

--- End diff --

should be `Nil`
[GitHub] spark issue #16528: [SPARK-19148][SQL] do not expose the external table conc...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16528 cc @yhuai for final sign-off
[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16561 **[Test build #71415 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71415/testReport)** for PR 16561 at commit [`d6537a5`](https://github.com/apache/spark/commit/d6537a5d66ba88cb827a6fc84a7ec8be79af5277).
[GitHub] spark issue #16591: [SPARK-19227][CORE] remove unused imports and outdated ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16591 **[Test build #71414 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71414/testReport)** for PR 16591 at commit [`22405d1`](https://github.com/apache/spark/commit/22405d19a1ca6944162ddc330ea2dfc5a7c4638c).
[GitHub] spark pull request #16587: [SPARK-19229] [SQL] Disallow Creating Hive Source...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16587#discussion_r96155690

```diff
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -1692,20 +1678,27 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
   test("truncate table - external table, temporary table, view (not allowed)") {
     import testImplicits._
-    val path = Utils.createTempDir().getAbsolutePath
-    (1 to 10).map { i => (i, i) }.toDF("a", "b").createTempView("my_temp_tab")
-    sql(s"CREATE EXTERNAL TABLE my_ext_tab LOCATION '$path'")
-    sql(s"CREATE VIEW my_view AS SELECT 1")
-    intercept[NoSuchTableException] {
-      sql("TRUNCATE TABLE my_temp_tab")
+    withTempPath { tempDir =>
+      withTable("my_ext_tab") {
+        (("a", "b") :: Nil).toDF().write.parquet(tempDir.getCanonicalPath)
+        (1 to 10).map { i => (i, i) }.toDF("a", "b").createTempView("my_temp_tab")
+        sql(s"CREATE TABLE my_ext_tab using parquet LOCATION '$tempDir'")
```

--- End diff --

Ah, this is why you asked me in https://github.com/apache/spark/pull/16586#discussion_r96142347. I just ran a test for this to help:

```
- truncate table - external table, temporary table, view (not allowed) *** FAILED *** (188 milliseconds)
  org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark arget mpspark-9e70280d-56dc-4063-8f40-8e62fec18394;
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:382)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
```

Maybe it'd be okay to just use `toURI` if this test is not supposed to exercise Windows paths.
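The `toURI` change being floated would look roughly like this (a sketch based on the test quoted above, not the final patch):

```scala
withTempPath { tempDir =>
  withTable("my_ext_tab") {
    (("a", "b") :: Nil).toDF().write.parquet(tempDir.getCanonicalPath)
    // Embedding the file URI instead of the raw OS path avoids Windows
    // backslashes being swallowed as escape characters, which produces the
    // mangled `file:/C:projectsspark...` path in the failure above.
    sql(s"CREATE TABLE my_ext_tab USING parquet LOCATION '${tempDir.toURI}'")
  }
}
```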
[GitHub] spark pull request #16591: [SPARK-19227][CORE] remove unused imports and ou...
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/16591

[SPARK-19227][CORE] remove unused imports and outdated comments in `org.apache.spark.internal.config.ConfigEntry`

## What changes were proposed in this pull request?

Remove unused imports and outdated comments in `org.apache.spark.internal.config.ConfigEntry`.

## How was this patch tested?

Existing unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark SPARK-19227

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16591.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16591

commit 22405d19a1ca6944162ddc330ea2dfc5a7c4638c
Author: uncleGen
Date: 2017-01-16T01:47:53Z

    SPARK-19227: remove unused imports and outdated comments in `org.apache.spark.internal.config.ConfigEntry`