spark git commit: [SPARK-17318][TESTS] Fix ReplSuite replicating blocks of object with class defined in repl again
Repository: spark Updated Branches: refs/heads/master aaf632b21 -> 21c0a4fe9 [SPARK-17318][TESTS] Fix ReplSuite replicating blocks of object with class defined in repl again ## What changes were proposed in this pull request? After digging into the logs, I noticed the failure is because this test starts a local cluster with 2 executors. However, when SparkContext is created, executors may still not be up. When one of the executors is not up while the job is running, the blocks won't be replicated. This PR just adds a wait loop before running the job to fix the flaky test. ## How was this patch tested? Jenkins Author: Shixiong Zhu Closes #14905 from zsxwing/SPARK-17318-2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/21c0a4fe Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/21c0a4fe Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/21c0a4fe Branch: refs/heads/master Commit: 21c0a4fe9d8e21819ba96e7dc2b1f2999d3299ae Parents: aaf632b Author: Shixiong Zhu Authored: Wed Aug 31 23:25:20 2016 -0700 Committer: Shixiong Zhu Committed: Wed Aug 31 23:25:20 2016 -0700 -- .../src/test/scala/org/apache/spark/repl/ReplSuite.scala| 9 + 1 file changed, 9 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/21c0a4fe/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala -- diff --git a/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala b/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala index f1284b1..f7d7a4f 100644 --- a/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala +++ b/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala @@ -399,6 +399,15 @@ class ReplSuite extends SparkFunSuite { test("replicating blocks of object with class defined in repl") { val output = runInterpreter("local-cluster[2,1,1024]", """ +|val timeout = 60000 // 60 seconds +|val start = System.currentTimeMillis +|while(sc.getExecutorStorageStatus.size != 3 && +|(System.currentTimeMillis - start) < timeout) { +| Thread.sleep(10) +|} +|if (System.currentTimeMillis - start >= timeout) { +| throw new java.util.concurrent.TimeoutException("Executors were not up in 60 seconds") +|} |import org.apache.spark.storage.StorageLevel._ |case class Foo(i: Int) |val ret = sc.parallelize((1 to 100).map(Foo), 10).persist(MEMORY_AND_DISK_2) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
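The fix boils down to polling the number of registered block managers before the replication job runs. A minimal standalone sketch of the same idea, assuming a live `SparkContext` named `sc` and the Spark 2.0-era `getExecutorStorageStatus` API; the helper name, the 100 ms poll interval, and the default timeout are illustrative rather than part of the patch:

```scala
import java.util.concurrent.TimeoutException

import org.apache.spark.SparkContext

// Poll until `expected` block managers (executors plus the driver) have registered,
// giving up after `timeoutMs`. Mirrors the wait loop the patch adds to ReplSuite.
def waitForExecutors(sc: SparkContext, expected: Int, timeoutMs: Long = 60000L): Unit = {
  val start = System.currentTimeMillis
  while (sc.getExecutorStorageStatus.length != expected &&
      (System.currentTimeMillis - start) < timeoutMs) {
    Thread.sleep(100)
  }
  if (System.currentTimeMillis - start >= timeoutMs) {
    throw new TimeoutException(s"Executors were not up in ${timeoutMs / 1000} seconds")
  }
}

// For local-cluster[2,1,1024] the test expects 2 executors plus the driver:
// waitForExecutors(sc, expected = 3)
```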
spark git commit: [SPARK-17318][TESTS] Fix ReplSuite replicating blocks of object with class defined in repl again
Repository: spark Updated Branches: refs/heads/branch-2.0 8711b451d -> 6281b74b6 [SPARK-17318][TESTS] Fix ReplSuite replicating blocks of object with class defined in repl again ## What changes were proposed in this pull request? After digging into the logs, I noticed the failure is because this test starts a local cluster with 2 executors. However, when SparkContext is created, executors may still not be up. When one of the executors is not up while the job is running, the blocks won't be replicated. This PR just adds a wait loop before running the job to fix the flaky test. ## How was this patch tested? Jenkins Author: Shixiong Zhu Closes #14905 from zsxwing/SPARK-17318-2. (cherry picked from commit 21c0a4fe9d8e21819ba96e7dc2b1f2999d3299ae) Signed-off-by: Shixiong Zhu Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6281b74b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6281b74b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6281b74b Branch: refs/heads/branch-2.0 Commit: 6281b74b6965ffcd0600844cea168cbe71ca8248 Parents: 8711b45 Author: Shixiong Zhu Authored: Wed Aug 31 23:25:20 2016 -0700 Committer: Shixiong Zhu Committed: Wed Aug 31 23:25:27 2016 -0700 -- .../src/test/scala/org/apache/spark/repl/ReplSuite.scala| 9 + 1 file changed, 9 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6281b74b/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala -- diff --git a/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala b/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala index f1284b1..f7d7a4f 100644 --- a/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala +++ b/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala @@ -399,6 +399,15 @@ class ReplSuite extends SparkFunSuite { test("replicating blocks of object with class defined in repl") { val output = runInterpreter("local-cluster[2,1,1024]", """ +|val timeout = 60000 // 60 seconds +|val start = System.currentTimeMillis +|while(sc.getExecutorStorageStatus.size != 3 && +|(System.currentTimeMillis - start) < timeout) { +| Thread.sleep(10) +|} +|if (System.currentTimeMillis - start >= timeout) { +| throw new java.util.concurrent.TimeoutException("Executors were not up in 60 seconds") +|} |import org.apache.spark.storage.StorageLevel._ |case class Foo(i: Int) |val ret = sc.parallelize((1 to 100).map(Foo), 10).persist(MEMORY_AND_DISK_2) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: revert PR#10896 and PR#14865
Repository: spark Updated Branches: refs/heads/master 7a5000f39 -> aaf632b21 revert PR#10896 and PR#14865 ## What changes were proposed in this pull request? according to the discussion in the original PR #10896 and the new approach PR #14876 , we decided to revert these 2 PRs and go with the new approach. ## How was this patch tested? N/A Author: Wenchen Fan Closes #14909 from cloud-fan/revert. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/aaf632b2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/aaf632b2 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/aaf632b2 Branch: refs/heads/master Commit: aaf632b2132750c6970469b902d9308dbf36 Parents: 7a5000f Author: Wenchen Fan Authored: Thu Sep 1 13:19:15 2016 +0800 Committer: Wenchen Fan Committed: Thu Sep 1 13:19:15 2016 +0800 -- .../spark/sql/execution/SparkStrategies.scala | 17 +- .../sql/execution/aggregate/AggUtils.scala | 250 ++- .../sql/execution/aggregate/AggregateExec.scala | 56 - .../execution/aggregate/HashAggregateExec.scala | 22 +- .../execution/aggregate/SortAggregateExec.scala | 24 +- .../execution/exchange/EnsureRequirements.scala | 39 +-- .../org/apache/spark/sql/DataFrameSuite.scala | 15 +- .../spark/sql/execution/PlannerSuite.scala | 77 ++ 8 files changed, 223 insertions(+), 277 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/aaf632b2/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala index cda3b2b..4aaf454 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala @@ -259,17 +259,24 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] { } val aggregateOperator = - if (functionsWithDistinct.isEmpty) { + if (aggregateExpressions.map(_.aggregateFunction).exists(!_.supportsPartial)) { +if (functionsWithDistinct.nonEmpty) { + sys.error("Distinct columns cannot exist in Aggregate operator containing " + +"aggregate functions which don't support partial aggregation.") +} else { + aggregate.AggUtils.planAggregateWithoutPartial( +groupingExpressions, +aggregateExpressions, +resultExpressions, +planLater(child)) +} + } else if (functionsWithDistinct.isEmpty) { aggregate.AggUtils.planAggregateWithoutDistinct( groupingExpressions, aggregateExpressions, resultExpressions, planLater(child)) } else { -if (aggregateExpressions.map(_.aggregateFunction).exists(!_.supportsPartial)) { - sys.error("Distinct columns cannot exist in Aggregate operator containing " + -"aggregate functions which don't support partial aggregation.") -} aggregate.AggUtils.planAggregateWithOneDistinct( groupingExpressions, functionsWithDistinct, http://git-wip-us.apache.org/repos/asf/spark/blob/aaf632b2/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala index fe75ece..4fbb9d5 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala @@ -19,97 +19,34 @@ package org.apache.spark.sql.execution.aggregate import 
org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.expressions.aggregate._ -import org.apache.spark.sql.catalyst.plans.physical.Distribution import org.apache.spark.sql.execution.SparkPlan import org.apache.spark.sql.execution.streaming.{StateStoreRestoreExec, StateStoreSaveExec} /** - * A pattern that finds aggregate operators to support partial aggregations. - */ -object PartialAggregate { - - def unapply(plan: SparkPlan): Option[Distribution] = plan match { -case agg: AggregateExec if AggUtils.supportPartialAggregate(agg.aggregateExpressions) => - Some(agg.requiredChildDistribution.head) -case _ => - None - } -} - -/** * Utility functions used by the query
spark git commit: [SPARK-17241][SPARKR][MLLIB] SparkR spark.glm should have configurable regularization parameter
Repository: spark Updated Branches: refs/heads/master d008638fb -> 7a5000f39 [SPARK-17241][SPARKR][MLLIB] SparkR spark.glm should have configurable regularization parameter https://issues.apache.org/jira/browse/SPARK-17241 ## What changes were proposed in this pull request? Spark has configurable L2 regularization parameter for generalized linear regression. It is very important to have them in SparkR so that users can run ridge regression. ## How was this patch tested? Test manually on local laptop. Author: Xin Ren Closes #14856 from keypointt/SPARK-17241. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7a5000f3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7a5000f3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7a5000f3 Branch: refs/heads/master Commit: 7a5000f39ef4f195696836f8a4e8ab4ff5c14dd2 Parents: d008638 Author: Xin Ren Authored: Wed Aug 31 21:39:31 2016 -0700 Committer: Shivaram Venkataraman Committed: Wed Aug 31 21:39:31 2016 -0700 -- R/pkg/R/mllib.R | 10 +++-- R/pkg/inst/tests/testthat/test_mllib.R | 6 +++ .../r/GeneralizedLinearRegressionWrapper.scala | 4 +- .../GeneralizedLinearRegressionSuite.scala | 40 4 files changed, 55 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7a5000f3/R/pkg/R/mllib.R -- diff --git a/R/pkg/R/mllib.R b/R/pkg/R/mllib.R index 64d19fa..9a53f75 100644 --- a/R/pkg/R/mllib.R +++ b/R/pkg/R/mllib.R @@ -138,10 +138,11 @@ predict_internal <- function(object, newData) { #' This can be a character string naming a family function, a family function or #' the result of a call to a family function. Refer R family at #' \url{https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html}. -#' @param weightCol the weight column name. If this is not set or \code{NULL}, we treat all instance -#' weights as 1.0. #' @param tol positive convergence tolerance of iterations. #' @param maxIter integer giving the maximal number of IRLS iterations. +#' @param weightCol the weight column name. If this is not set or \code{NULL}, we treat all instance +#' weights as 1.0. +#' @param regParam regularization parameter for L2 regularization. #' @param ... additional arguments passed to the method. 
#' @aliases spark.glm,SparkDataFrame,formula-method #' @return \code{spark.glm} returns a fitted generalized linear model @@ -171,7 +172,8 @@ predict_internal <- function(object, newData) { #' @note spark.glm since 2.0.0 #' @seealso \link{glm}, \link{read.ml} setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"), - function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25, weightCol = NULL) { + function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25, weightCol = NULL, + regParam = 0.0) { if (is.character(family)) { family <- get(family, mode = "function", envir = parent.frame()) } @@ -190,7 +192,7 @@ setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"), jobj <- callJStatic("org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper", "fit", formula, data@sdf, family$family, family$link, -tol, as.integer(maxIter), as.character(weightCol)) +tol, as.integer(maxIter), as.character(weightCol), regParam) new("GeneralizedLinearRegressionModel", jobj = jobj) }) http://git-wip-us.apache.org/repos/asf/spark/blob/7a5000f3/R/pkg/inst/tests/testthat/test_mllib.R -- diff --git a/R/pkg/inst/tests/testthat/test_mllib.R b/R/pkg/inst/tests/testthat/test_mllib.R index 1e6da65..825a240 100644 --- a/R/pkg/inst/tests/testthat/test_mllib.R +++ b/R/pkg/inst/tests/testthat/test_mllib.R @@ -148,6 +148,12 @@ test_that("spark.glm summary", { baseModel <- stats::glm(Sepal.Width ~ Sepal.Length + Species, data = iris) baseSummary <- summary(baseModel) expect_true(abs(baseSummary$deviance - 12.19313) < 1e-4) + + # Test spark.glm works with regularization parameter + data <- as.data.frame(cbind(a1, a2, b)) + df <- suppressWarnings(createDataFrame(data)) + regStats <- summary(spark.glm(df, b ~ a1 + a2, regParam = 1.0)) + expect_equal(regStats$aic, 13.32836, tolerance = 1e-4) # 13.32836 is from summary() result }) test_that("spark.glm save/load", { http://git-wip-us.apache.org/repos/asf/spark/blob/7a5000f3/mllib/src/main/scala/org/apache/spark/ml/r/Gen
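On the JVM side the R wrapper delegates to `GeneralizedLinearRegression`, which already exposes the L2 term; the new `regParam` argument simply forwards it. A rough Scala equivalent of `spark.glm(df, b ~ a1 + a2, regParam = 1.0)`, assuming a DataFrame `trainingDF` that already carries the usual `"features"`/`"label"` columns (the dataset and column layout are assumptions, not part of the patch):

```scala
import org.apache.spark.ml.regression.GeneralizedLinearRegression

// Ridge-style fit: gaussian family plus an L2 penalty, the knob now reachable from SparkR.
val glr = new GeneralizedLinearRegression()
  .setFamily("gaussian")
  .setLink("identity")
  .setRegParam(1.0)
val model = glr.fit(trainingDF)
println(model.coefficients)
```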
spark git commit: [SPARKR][MINOR] Fix windowPartitionBy example
Repository: spark Updated Branches: refs/heads/branch-2.0 191d99692 -> 8711b451d [SPARKR][MINOR] Fix windowPartitionBy example ## What changes were proposed in this pull request? The usage in the original example is incorrect. This PR fixes it. ## How was this patch tested? Manual test. Author: Junyang Qian Closes #14903 from junyangq/SPARKR-FixWindowPartitionByDoc. (cherry picked from commit d008638fbedc857c1adc1dff399d427b8bae848e) Signed-off-by: Shivaram Venkataraman Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8711b451 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8711b451 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8711b451 Branch: refs/heads/branch-2.0 Commit: 8711b451d727074173748418a47cec210f84f2f7 Parents: 191d996 Author: Junyang Qian Authored: Wed Aug 31 21:28:53 2016 -0700 Committer: Shivaram Venkataraman Committed: Wed Aug 31 21:29:05 2016 -0700 -- R/pkg/R/window.R | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8711b451/R/pkg/R/window.R -- diff --git a/R/pkg/R/window.R b/R/pkg/R/window.R index 215d0e7..0799d84 100644 --- a/R/pkg/R/window.R +++ b/R/pkg/R/window.R @@ -21,9 +21,9 @@ #' #' Creates a WindowSpec with the partitioning defined. #' -#' @param col A column name or Column by which rows are partitioned to +#' @param col A column name or Column by which rows are partitioned to #'windows. -#' @param ... Optional column names or Columns in addition to col, by +#' @param ... Optional column names or Columns in addition to col, by #'which rows are partitioned to windows. #' #' @rdname windowPartitionBy @@ -32,10 +32,10 @@ #' @export #' @examples #' \dontrun{ -#' ws <- windowPartitionBy("key1", "key2") +#' ws <- orderBy(windowPartitionBy("key1", "key2"), "key3") #' df1 <- select(df, over(lead("value", 1), ws)) #' -#' ws <- windowPartitionBy(df$key1, df$key2) +#' ws <- orderBy(windowPartitionBy(df$key1, df$key2), df$key3) #' df1 <- select(df, over(lead("value", 1), ws)) #' } #' @note windowPartitionBy(character) since 2.0.0 @@ -70,9 +70,9 @@ setMethod("windowPartitionBy", #' #' Creates a WindowSpec with the ordering defined. #' -#' @param col A column name or Column by which rows are ordered within +#' @param col A column name or Column by which rows are ordered within #'windows. -#' @param ... Optional column names or Columns in addition to col, by +#' @param ... Optional column names or Columns in addition to col, by #'which rows are ordered within windows. #' #' @rdname windowOrderBy - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARKR][MINOR] Fix windowPartitionBy example
Repository: spark Updated Branches: refs/heads/master 2f9c27364 -> d008638fb [SPARKR][MINOR] Fix windowPartitionBy example ## What changes were proposed in this pull request? The usage in the original example is incorrect. This PR fixes it. ## How was this patch tested? Manual test. Author: Junyang Qian Closes #14903 from junyangq/SPARKR-FixWindowPartitionByDoc. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d008638f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d008638f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d008638f Branch: refs/heads/master Commit: d008638fbedc857c1adc1dff399d427b8bae848e Parents: 2f9c273 Author: Junyang Qian Authored: Wed Aug 31 21:28:53 2016 -0700 Committer: Shivaram Venkataraman Committed: Wed Aug 31 21:28:53 2016 -0700 -- R/pkg/R/window.R | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d008638f/R/pkg/R/window.R -- diff --git a/R/pkg/R/window.R b/R/pkg/R/window.R index 215d0e7..0799d84 100644 --- a/R/pkg/R/window.R +++ b/R/pkg/R/window.R @@ -21,9 +21,9 @@ #' #' Creates a WindowSpec with the partitioning defined. #' -#' @param col A column name or Column by which rows are partitioned to +#' @param col A column name or Column by which rows are partitioned to #'windows. -#' @param ... Optional column names or Columns in addition to col, by +#' @param ... Optional column names or Columns in addition to col, by #'which rows are partitioned to windows. #' #' @rdname windowPartitionBy @@ -32,10 +32,10 @@ #' @export #' @examples #' \dontrun{ -#' ws <- windowPartitionBy("key1", "key2") +#' ws <- orderBy(windowPartitionBy("key1", "key2"), "key3") #' df1 <- select(df, over(lead("value", 1), ws)) #' -#' ws <- windowPartitionBy(df$key1, df$key2) +#' ws <- orderBy(windowPartitionBy(df$key1, df$key2), df$key3) #' df1 <- select(df, over(lead("value", 1), ws)) #' } #' @note windowPartitionBy(character) since 2.0.0 @@ -70,9 +70,9 @@ setMethod("windowPartitionBy", #' #' Creates a WindowSpec with the ordering defined. #' -#' @param col A column name or Column by which rows are ordered within +#' @param col A column name or Column by which rows are ordered within #'windows. -#' @param ... Optional column names or Columns in addition to col, by +#' @param ... Optional column names or Columns in addition to col, by #'which rows are ordered within windows. #' #' @rdname windowOrderBy - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
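For reference, the corrected R example maps onto the Scala window API as below; a DataFrame `df` with columns `key1`, `key2`, `key3` and `value` is assumed. The point of the doc fix is that `lead()` needs a window that is ordered as well as partitioned:

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.lead

// Partitioned *and* ordered window, matching orderBy(windowPartitionBy(...), ...) in R.
val ws = Window.partitionBy("key1", "key2").orderBy("key3")
val df1 = df.select(lead("value", 1).over(ws))
```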
spark git commit: [SPARK-17180][SPARK-17309][SPARK-17323][SQL][2.0] create AlterViewAsCommand to handle ALTER VIEW AS
Repository: spark Updated Branches: refs/heads/branch-2.0 8d15c1a6a -> 191d99692 [SPARK-17180][SPARK-17309][SPARK-17323][SQL][2.0] create AlterViewAsCommand to handle ALTER VIEW AS ## What changes were proposed in this pull request? Currently we use `CreateViewCommand` to implement ALTER VIEW AS, which has 3 bugs: 1. SPARK-17180: ALTER VIEW AS should alter temp view if view name has no database part and temp view exists 2. SPARK-17309: ALTER VIEW AS should issue exception if view does not exist. 3. SPARK-17323: ALTER VIEW AS should keep the previous table properties, comment, create_time, etc. The root cause is, ALTER VIEW AS is quite different from CREATE VIEW, we need different code path to handle them. However, in `CreateViewCommand`, there is no way to distinguish ALTER VIEW AS and CREATE VIEW, we have to introduce extra flag. But instead of doing this, I think a more natural way is to separate the ALTER VIEW AS logic into a new command. backport https://github.com/apache/spark/pull/14874 to 2.0 ## How was this patch tested? new tests in SQLViewSuite Author: Wenchen Fan Closes #14893 from cloud-fan/minor4. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/191d9969 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/191d9969 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/191d9969 Branch: refs/heads/branch-2.0 Commit: 191d99692dc4315c371b566e3a9c5b114876ee49 Parents: 8d15c1a Author: Wenchen Fan Authored: Thu Sep 1 08:54:59 2016 +0800 Committer: Wenchen Fan Committed: Thu Sep 1 08:54:59 2016 +0800 -- .../spark/sql/execution/SparkSqlParser.scala| 74 .../spark/sql/execution/command/views.scala | 71 +-- .../spark/sql/hive/execution/SQLViewSuite.scala | 71 +++ 3 files changed, 167 insertions(+), 49 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/191d9969/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala index 876b334..3072a6d 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala @@ -1250,60 +1250,44 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { val schema = identifiers.map { ic => CatalogColumn(ic.identifier.getText, null, nullable = true, Option(ic.STRING).map(string)) } - createView( -ctx, -ctx.tableIdentifier, -comment = Option(ctx.STRING).map(string), -schema, -ctx.query, - Option(ctx.tablePropertyList).map(visitPropertyKeyValues).getOrElse(Map.empty), -ctx.EXISTS != null, -ctx.REPLACE != null, -ctx.TEMPORARY != null - ) + + val sql = Option(source(ctx.query)) + val tableDesc = CatalogTable( +identifier = visitTableIdentifier(ctx.tableIdentifier), +tableType = CatalogTableType.VIEW, +schema = schema, +storage = CatalogStorageFormat.empty, +properties = Option(ctx.tablePropertyList).map(visitPropertyKeyValues).getOrElse(Map.empty), +viewOriginalText = sql, +viewText = sql, +comment = Option(ctx.STRING).map(string)) + + CreateViewCommand( +tableDesc, +plan(ctx.query), +allowExisting = ctx.EXISTS != null, +replace = ctx.REPLACE != null, +isTemporary = ctx.TEMPORARY != null) } } /** - * Alter the query of a view. This creates a [[CreateViewCommand]] command. + * Alter the query of a view. This creates a [[AlterViewAsCommand]] command. 
+ * + * For example: + * {{{ + * ALTER VIEW [db_name.]view_name AS SELECT ...; + * }}} */ override def visitAlterViewQuery(ctx: AlterViewQueryContext): LogicalPlan = withOrigin(ctx) { -createView( - ctx, - ctx.tableIdentifier, - comment = None, - Seq.empty, - ctx.query, - Map.empty, - allowExist = false, - replace = true, - isTemporary = false) - } - - /** - * Create a [[CreateViewCommand]] command. - */ - private def createView( - ctx: ParserRuleContext, - name: TableIdentifierContext, - comment: Option[String], - schema: Seq[CatalogColumn], - query: QueryContext, - properties: Map[String, String], - allowExist: Boolean, - replace: Boolean, - isTemporary: Boolean): LogicalPlan = { -val sql = Option(source(query)) val tableDesc = CatalogTable( - identifier = visitTableIdentifier(name), +
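A small sketch of the behaviours the new command is meant to guarantee, driven through `spark.sql` (a `SparkSession` named `spark` is assumed and the view names are made up for the example):

```scala
spark.range(10).toDF("id").createOrReplaceTempView("src")
spark.range(10).toDF("id").createOrReplaceTempView("v")

// SPARK-17180: with no database qualifier and an existing temp view,
// ALTER VIEW AS alters the temp view rather than a catalog view.
spark.sql("ALTER VIEW v AS SELECT id * 2 AS id FROM src")

// SPARK-17309: altering a view that does not exist should fail instead of creating it.
// spark.sql("ALTER VIEW does_not_exist AS SELECT id FROM src")   // expected to throw

// SPARK-17323: for permanent views, the table properties, comment and create_time
// recorded by CREATE VIEW are preserved across ALTER VIEW ... AS.
```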
spark git commit: [SPARK-16581][SPARKR] Fix JVM API tests in SparkR
Repository: spark Updated Branches: refs/heads/branch-2.0 d01251c92 -> 8d15c1a6a [SPARK-16581][SPARKR] Fix JVM API tests in SparkR ## What changes were proposed in this pull request? Remove cleanup.jobj test. Use JVM wrapper API for other test cases. ## How was this patch tested? Run R unit tests with testthat 1.0 Author: Shivaram Venkataraman Closes #14904 from shivaram/sparkr-jvm-tests-fix. (cherry picked from commit 2f9c27364ea00473933213700edb93b63b55b313) Signed-off-by: Shivaram Venkataraman Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8d15c1a6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8d15c1a6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8d15c1a6 Branch: refs/heads/branch-2.0 Commit: 8d15c1a6a0ac2e57b537c370a8e8283d56ca290e Parents: d01251c Author: Shivaram Venkataraman Authored: Wed Aug 31 16:56:41 2016 -0700 Committer: Shivaram Venkataraman Committed: Wed Aug 31 16:56:51 2016 -0700 -- R/pkg/inst/tests/testthat/test_jvm_api.R | 15 --- 1 file changed, 4 insertions(+), 11 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8d15c1a6/R/pkg/inst/tests/testthat/test_jvm_api.R -- diff --git a/R/pkg/inst/tests/testthat/test_jvm_api.R b/R/pkg/inst/tests/testthat/test_jvm_api.R index 151c529..7348c89 100644 --- a/R/pkg/inst/tests/testthat/test_jvm_api.R +++ b/R/pkg/inst/tests/testthat/test_jvm_api.R @@ -20,24 +20,17 @@ context("JVM API") sparkSession <- sparkR.session(enableHiveSupport = FALSE) test_that("Create and call methods on object", { - jarr <- newJObject("java.util.ArrayList") + jarr <- sparkR.newJObject("java.util.ArrayList") # Add an element to the array - callJMethod(jarr, "add", 1L) + sparkR.callJMethod(jarr, "add", 1L) # Check if get returns the same element - expect_equal(callJMethod(jarr, "get", 0L), 1L) + expect_equal(sparkR.callJMethod(jarr, "get", 0L), 1L) }) test_that("Call static methods", { # Convert a boolean to a string - strTrue <- callJStatic("java.lang.String", "valueOf", TRUE) + strTrue <- sparkR.callJStatic("java.lang.String", "valueOf", TRUE) expect_equal(strTrue, "true") }) -test_that("Manually garbage collect objects", { - jarr <- newJObject("java.util.ArrayList") - cleanup.jobj(jarr) - # Using a jobj after GC should throw an error - expect_error(print(jarr), "Error in invokeJava.*") -}) - sparkR.session.stop() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-16581][SPARKR] Fix JVM API tests in SparkR
Repository: spark Updated Branches: refs/heads/master d375c8a3d -> 2f9c27364 [SPARK-16581][SPARKR] Fix JVM API tests in SparkR ## What changes were proposed in this pull request? Remove cleanup.jobj test. Use JVM wrapper API for other test cases. ## How was this patch tested? Run R unit tests with testthat 1.0 Author: Shivaram Venkataraman Closes #14904 from shivaram/sparkr-jvm-tests-fix. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2f9c2736 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2f9c2736 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2f9c2736 Branch: refs/heads/master Commit: 2f9c27364ea00473933213700edb93b63b55b313 Parents: d375c8a Author: Shivaram Venkataraman Authored: Wed Aug 31 16:56:41 2016 -0700 Committer: Shivaram Venkataraman Committed: Wed Aug 31 16:56:41 2016 -0700 -- R/pkg/inst/tests/testthat/test_jvm_api.R | 15 --- 1 file changed, 4 insertions(+), 11 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2f9c2736/R/pkg/inst/tests/testthat/test_jvm_api.R -- diff --git a/R/pkg/inst/tests/testthat/test_jvm_api.R b/R/pkg/inst/tests/testthat/test_jvm_api.R index 151c529..7348c89 100644 --- a/R/pkg/inst/tests/testthat/test_jvm_api.R +++ b/R/pkg/inst/tests/testthat/test_jvm_api.R @@ -20,24 +20,17 @@ context("JVM API") sparkSession <- sparkR.session(enableHiveSupport = FALSE) test_that("Create and call methods on object", { - jarr <- newJObject("java.util.ArrayList") + jarr <- sparkR.newJObject("java.util.ArrayList") # Add an element to the array - callJMethod(jarr, "add", 1L) + sparkR.callJMethod(jarr, "add", 1L) # Check if get returns the same element - expect_equal(callJMethod(jarr, "get", 0L), 1L) + expect_equal(sparkR.callJMethod(jarr, "get", 0L), 1L) }) test_that("Call static methods", { # Convert a boolean to a string - strTrue <- callJStatic("java.lang.String", "valueOf", TRUE) + strTrue <- sparkR.callJStatic("java.lang.String", "valueOf", TRUE) expect_equal(strTrue, "true") }) -test_that("Manually garbage collect objects", { - jarr <- newJObject("java.util.ArrayList") - cleanup.jobj(jarr) - # Using a jobj after GC should throw an error - expect_error(print(jarr), "Error in invokeJava.*") -}) - sparkR.session.stop() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17316][TESTS] Fix MesosCoarseGrainedSchedulerBackendSuite
Repository: spark Updated Branches: refs/heads/master 50bb14233 -> d375c8a3d [SPARK-17316][TESTS] Fix MesosCoarseGrainedSchedulerBackendSuite ## What changes were proposed in this pull request? The master is broken because #14882 didn't run mesos tests. ## How was this patch tested? Jenkins unit tests. Author: Shixiong Zhu Closes #14902 from zsxwing/hotfix. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d375c8a3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d375c8a3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d375c8a3 Branch: refs/heads/master Commit: d375c8a3de1d253c485078f55eb9c5b928ab96d5 Parents: 50bb142 Author: Shixiong Zhu Authored: Wed Aug 31 15:25:13 2016 -0700 Committer: Shixiong Zhu Committed: Wed Aug 31 15:25:13 2016 -0700 -- .../cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala| 2 ++ 1 file changed, 2 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d375c8a3/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala -- diff --git a/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala b/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala index c063797..d98ddb2 100644 --- a/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala +++ b/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala @@ -19,6 +19,7 @@ package org.apache.spark.scheduler.cluster.mesos import scala.collection.JavaConverters._ import scala.collection.mutable.ArrayBuffer +import scala.concurrent.Promise import scala.reflect.ClassTag import org.apache.mesos.{Protos, Scheduler, SchedulerDriver} @@ -511,6 +512,7 @@ class MesosCoarseGrainedSchedulerBackendSuite extends SparkFunSuite when(taskScheduler.sc).thenReturn(sc) externalShuffleClient = mock[MesosExternalShuffleClient] driverEndpoint = mock[RpcEndpointRef] +when(driverEndpoint.ask(any())(any())).thenReturn(Promise().future) backend = createSchedulerBackend(taskScheduler, driver, externalShuffleClient, driverEndpoint) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17316][TESTS] Fix MesosCoarseGrainedSchedulerBackendSuite
Repository: spark Updated Branches: refs/heads/branch-2.0 ad3689261 -> d01251c92 [SPARK-17316][TESTS] Fix MesosCoarseGrainedSchedulerBackendSuite ## What changes were proposed in this pull request? The master is broken because #14882 didn't run mesos tests. ## How was this patch tested? Jenkins unit tests. Author: Shixiong Zhu Closes #14902 from zsxwing/hotfix. (cherry picked from commit d375c8a3de1d253c485078f55eb9c5b928ab96d5) Signed-off-by: Shixiong Zhu Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d01251c9 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d01251c9 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d01251c9 Branch: refs/heads/branch-2.0 Commit: d01251c928ce76e22d081a3764134f44ffe9aa86 Parents: ad36892 Author: Shixiong Zhu Authored: Wed Aug 31 15:25:13 2016 -0700 Committer: Shixiong Zhu Committed: Wed Aug 31 15:25:21 2016 -0700 -- .../cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala| 2 ++ 1 file changed, 2 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d01251c9/core/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala index f6ec167..12c4a79 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala @@ -21,6 +21,7 @@ import java.util.Collections import scala.collection.JavaConverters._ import scala.collection.mutable.ArrayBuffer +import scala.concurrent.Promise import scala.reflect.ClassTag import org.apache.mesos.{Protos, Scheduler, SchedulerDriver} @@ -410,6 +411,7 @@ class MesosCoarseGrainedSchedulerBackendSuite extends SparkFunSuite when(taskScheduler.sc).thenReturn(sc) externalShuffleClient = mock[MesosExternalShuffleClient] driverEndpoint = mock[RpcEndpointRef] +when(driverEndpoint.ask(any())(any())).thenReturn(Promise().future) backend = createSchedulerBackend(taskScheduler, driver, externalShuffleClient, driverEndpoint) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17326][SPARKR] Fix tests with HiveContext in SparkR not to be skipped always
Repository: spark Updated Branches: refs/heads/branch-2.0 c17334e47 -> ad3689261 [SPARK-17326][SPARKR] Fix tests with HiveContext in SparkR not to be skipped always ## What changes were proposed in this pull request? Currently, `HiveContext` in SparkR is not being tested and always skipped. This is because the initiation of `TestHiveContext` is being failed due to trying to load non-existing data paths (test tables). This is introduced from https://github.com/apache/spark/pull/14005 This enables the tests with SparkR. ## How was this patch tested? Manually, **Before** (on Mac OS) ``` ... Skipped 1. create DataFrame from RDD (test_sparkSQL.R#200) - Hive is not build with SparkSQL, skipped 2. test HiveContext (test_sparkSQL.R#1041) - Hive is not build with SparkSQL, skipped 3. read/write ORC files (test_sparkSQL.R#1748) - Hive is not build with SparkSQL, skipped 4. enableHiveSupport on SparkSession (test_sparkSQL.R#2480) - Hive is not build with SparkSQL, skipped 5. sparkJars tag in SparkContext (test_Windows.R#21) - This test is only for Windows, skipped ... ``` **After** (on Mac OS) ``` ... Skipped 1. sparkJars tag in SparkContext (test_Windows.R#21) - This test is only for Windows, skipped ... ``` Please refer the tests below (on Windows) - Before: https://ci.appveyor.com/project/HyukjinKwon/spark/build/45-test123 - After: https://ci.appveyor.com/project/HyukjinKwon/spark/build/46-test123 Author: hyukjinkwon Closes #14889 from HyukjinKwon/SPARK-17326. (cherry picked from commit 50bb142332d1147861def692bf63f0055ecb8576) Signed-off-by: Shivaram Venkataraman Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ad368926 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ad368926 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ad368926 Branch: refs/heads/branch-2.0 Commit: ad368926101efadf7b9f95ec1c95989f0c0a2855 Parents: c17334e Author: hyukjinkwon Authored: Wed Aug 31 14:02:21 2016 -0700 Committer: Shivaram Venkataraman Committed: Wed Aug 31 14:02:32 2016 -0700 -- R/pkg/inst/tests/testthat/test_sparkSQL.R | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ad368926/R/pkg/inst/tests/testthat/test_sparkSQL.R -- diff --git a/R/pkg/inst/tests/testthat/test_sparkSQL.R b/R/pkg/inst/tests/testthat/test_sparkSQL.R index 0aea89d..279d512 100644 --- a/R/pkg/inst/tests/testthat/test_sparkSQL.R +++ b/R/pkg/inst/tests/testthat/test_sparkSQL.R @@ -39,7 +39,7 @@ setHiveContext <- function(sc) { # initialize once and reuse ssc <- callJMethod(sc, "sc") hiveCtx <- tryCatch({ - newJObject("org.apache.spark.sql.hive.test.TestHiveContext", ssc) + newJObject("org.apache.spark.sql.hive.test.TestHiveContext", ssc, FALSE) }, error = function(err) { skip("Hive is not build with SparkSQL, skipped") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17326][SPARKR] Fix tests with HiveContext in SparkR not to be skipped always
Repository: spark Updated Branches: refs/heads/master 5d84c7fd8 -> 50bb14233 [SPARK-17326][SPARKR] Fix tests with HiveContext in SparkR not to be skipped always ## What changes were proposed in this pull request? Currently, `HiveContext` in SparkR is not being tested and always skipped. This is because the initiation of `TestHiveContext` is being failed due to trying to load non-existing data paths (test tables). This is introduced from https://github.com/apache/spark/pull/14005 This enables the tests with SparkR. ## How was this patch tested? Manually, **Before** (on Mac OS) ``` ... Skipped 1. create DataFrame from RDD (test_sparkSQL.R#200) - Hive is not build with SparkSQL, skipped 2. test HiveContext (test_sparkSQL.R#1041) - Hive is not build with SparkSQL, skipped 3. read/write ORC files (test_sparkSQL.R#1748) - Hive is not build with SparkSQL, skipped 4. enableHiveSupport on SparkSession (test_sparkSQL.R#2480) - Hive is not build with SparkSQL, skipped 5. sparkJars tag in SparkContext (test_Windows.R#21) - This test is only for Windows, skipped ... ``` **After** (on Mac OS) ``` ... Skipped 1. sparkJars tag in SparkContext (test_Windows.R#21) - This test is only for Windows, skipped ... ``` Please refer the tests below (on Windows) - Before: https://ci.appveyor.com/project/HyukjinKwon/spark/build/45-test123 - After: https://ci.appveyor.com/project/HyukjinKwon/spark/build/46-test123 Author: hyukjinkwon Closes #14889 from HyukjinKwon/SPARK-17326. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/50bb1423 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/50bb1423 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/50bb1423 Branch: refs/heads/master Commit: 50bb142332d1147861def692bf63f0055ecb8576 Parents: 5d84c7f Author: hyukjinkwon Authored: Wed Aug 31 14:02:21 2016 -0700 Committer: Shivaram Venkataraman Committed: Wed Aug 31 14:02:21 2016 -0700 -- R/pkg/inst/tests/testthat/test_sparkSQL.R | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/50bb1423/R/pkg/inst/tests/testthat/test_sparkSQL.R -- diff --git a/R/pkg/inst/tests/testthat/test_sparkSQL.R b/R/pkg/inst/tests/testthat/test_sparkSQL.R index 3ccb8b6..8ff56eb 100644 --- a/R/pkg/inst/tests/testthat/test_sparkSQL.R +++ b/R/pkg/inst/tests/testthat/test_sparkSQL.R @@ -39,7 +39,7 @@ setHiveContext <- function(sc) { # initialize once and reuse ssc <- callJMethod(sc, "sc") hiveCtx <- tryCatch({ - newJObject("org.apache.spark.sql.hive.test.TestHiveContext", ssc) + newJObject("org.apache.spark.sql.hive.test.TestHiveContext", ssc, FALSE) }, error = function(err) { skip("Hive is not build with SparkSQL, skipped") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17332][CORE] Make Java Loggers static members
Repository: spark Updated Branches: refs/heads/master 9bcb33c54 -> 5d84c7fd8 [SPARK-17332][CORE] Make Java Loggers static members ## What changes were proposed in this pull request? Make all Java Loggers static members ## How was this patch tested? Jenkins Author: Sean Owen Closes #14896 from srowen/SPARK-17332. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5d84c7fd Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5d84c7fd Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5d84c7fd Branch: refs/heads/master Commit: 5d84c7fd83502aeb551d46a740502db4862508fe Parents: 9bcb33c Author: Sean Owen Authored: Wed Aug 31 11:09:14 2016 -0700 Committer: Marcelo Vanzin Committed: Wed Aug 31 11:09:14 2016 -0700 -- .../src/main/java/org/apache/spark/network/TransportContext.java | 2 +- .../java/org/apache/spark/network/client/TransportClient.java | 2 +- .../org/apache/spark/network/client/TransportClientFactory.java | 2 +- .../org/apache/spark/network/client/TransportResponseHandler.java | 2 +- .../java/org/apache/spark/network/protocol/MessageDecoder.java| 3 ++- .../java/org/apache/spark/network/protocol/MessageEncoder.java| 2 +- .../java/org/apache/spark/network/sasl/SaslClientBootstrap.java | 2 +- .../main/java/org/apache/spark/network/sasl/SparkSaslClient.java | 2 +- .../main/java/org/apache/spark/network/sasl/SparkSaslServer.java | 2 +- .../org/apache/spark/network/server/OneForOneStreamManager.java | 2 +- .../src/main/java/org/apache/spark/network/server/RpcHandler.java | 2 +- .../org/apache/spark/network/server/TransportChannelHandler.java | 2 +- .../org/apache/spark/network/server/TransportRequestHandler.java | 2 +- .../java/org/apache/spark/network/server/TransportServer.java | 2 +- .../java/org/apache/spark/network/sasl/ShuffleSecretManager.java | 3 ++- .../apache/spark/network/shuffle/ExternalShuffleBlockHandler.java | 2 +- .../org/apache/spark/network/shuffle/ExternalShuffleClient.java | 2 +- .../org/apache/spark/network/shuffle/OneForOneBlockFetcher.java | 2 +- .../org/apache/spark/network/shuffle/RetryingBlockFetcher.java| 2 +- .../spark/network/shuffle/mesos/MesosExternalShuffleClient.java | 2 +- .../java/org/apache/spark/network/yarn/YarnShuffleService.java| 2 +- core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java | 2 +- .../apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java | 2 +- .../java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java | 2 +- .../java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java | 2 +- .../main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java| 2 +- .../spark/util/collection/unsafe/sort/UnsafeExternalSorter.java | 2 +- 27 files changed, 29 insertions(+), 27 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5d84c7fd/common/network-common/src/main/java/org/apache/spark/network/TransportContext.java -- diff --git a/common/network-common/src/main/java/org/apache/spark/network/TransportContext.java b/common/network-common/src/main/java/org/apache/spark/network/TransportContext.java index 5320b28..5b69e2b 100644 --- a/common/network-common/src/main/java/org/apache/spark/network/TransportContext.java +++ b/common/network-common/src/main/java/org/apache/spark/network/TransportContext.java @@ -56,7 +56,7 @@ import org.apache.spark.network.util.TransportFrameDecoder; * processes to send messages back to the client on an existing channel. 
*/ public class TransportContext { - private final Logger logger = LoggerFactory.getLogger(TransportContext.class); + private static final Logger logger = LoggerFactory.getLogger(TransportContext.class); private final TransportConf conf; private final RpcHandler rpcHandler; http://git-wip-us.apache.org/repos/asf/spark/blob/5d84c7fd/common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java -- diff --git a/common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java b/common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java index a67683b..600b80e 100644 --- a/common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java +++ b/common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java @@ -72,7 +72,7 @@ import static org.apache.spark.network.util.NettyUtils.getRemoteAddress; * Concurrency: thread safe and can be called from multiple threads. */ public class TransportClient implements Closeable { - private final Logger logger = LoggerFactory.getLogger(TransportClient.class); + pri
spark git commit: [SPARK-17316][CORE] Make CoarseGrainedSchedulerBackend.removeExecutor non-blocking
Repository: spark Updated Branches: refs/heads/master 0611b3a2b -> 9bcb33c54 [SPARK-17316][CORE] Make CoarseGrainedSchedulerBackend.removeExecutor non-blocking ## What changes were proposed in this pull request? StandaloneSchedulerBackend.executorRemoved is a blocking call right now. It may cause some deadlock since it's called inside StandaloneAppClient.ClientEndpoint. This PR just changed CoarseGrainedSchedulerBackend.removeExecutor to be non-blocking. It's safe since the only two usages (StandaloneSchedulerBackend and YarnSchedulerEndpoint) don't need the return value). ## How was this patch tested? Jenkins unit tests. Author: Shixiong Zhu Closes #14882 from zsxwing/SPARK-17316. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9bcb33c5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9bcb33c5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9bcb33c5 Branch: refs/heads/master Commit: 9bcb33c54117cebc9e087017bf4e4163edaeff17 Parents: 0611b3a Author: Shixiong Zhu Authored: Wed Aug 31 10:56:02 2016 -0700 Committer: Marcelo Vanzin Committed: Wed Aug 31 10:56:02 2016 -0700 -- .../cluster/CoarseGrainedSchedulerBackend.scala| 17 + 1 file changed, 9 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/9bcb33c5/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala -- diff --git a/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala b/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala index 8259923..2db3a3b 100644 --- a/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala +++ b/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala @@ -406,14 +406,15 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp conf.getInt("spark.default.parallelism", math.max(totalCoreCount.get(), 2)) } - // Called by subclasses when notified of a lost worker - def removeExecutor(executorId: String, reason: ExecutorLossReason) { -try { - driverEndpoint.askWithRetry[Boolean](RemoveExecutor(executorId, reason)) -} catch { - case e: Exception => -throw new SparkException("Error notifying standalone scheduler's driver endpoint", e) -} + /** + * Called by subclasses when notified of a lost worker. It just fires the message and returns + * at once. + */ + protected def removeExecutor(executorId: String, reason: ExecutorLossReason): Unit = { +// Only log the failure since we don't care about the result. +driverEndpoint.ask(RemoveExecutor(executorId, reason)).onFailure { case t => + logError(t.getMessage, t) +}(ThreadUtils.sameThread) } def sufficientResourcesRegistered(): Boolean = true - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
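The shape of the change, reduced to plain Scala futures (the `askDriver` helper below is a stand-in for `driverEndpoint.ask`, not a real Spark API): instead of blocking on the reply, the message is fired and only a failure is logged.

```scala
import scala.concurrent.{ExecutionContext, Future, Promise}

// Stand-in for driverEndpoint.ask: hands back a Future reply instead of blocking.
def askDriver(message: Any): Future[Boolean] = {
  val reply = Promise[Boolean]()
  // ... an RPC layer would complete `reply` when (or if) the driver answers ...
  reply.future
}

// Fire-and-forget removal notice: return immediately and only log a failure,
// so a caller that is itself an RPC endpoint can no longer deadlock waiting here.
def removeExecutor(executorId: String, reason: String): Unit = {
  askDriver(("RemoveExecutor", executorId, reason)).onFailure {
    case t => Console.err.println(s"Error notifying driver endpoint: ${t.getMessage}")
  }(ExecutionContext.global)
}
```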
spark git commit: [SPARK-17316][CORE] Make CoarseGrainedSchedulerBackend.removeExecutor non-blocking
Repository: spark Updated Branches: refs/heads/branch-2.0 021aa28f4 -> c17334e47 [SPARK-17316][CORE] Make CoarseGrainedSchedulerBackend.removeExecutor non-blocking ## What changes were proposed in this pull request? StandaloneSchedulerBackend.executorRemoved is a blocking call right now. It may cause some deadlock since it's called inside StandaloneAppClient.ClientEndpoint. This PR just changed CoarseGrainedSchedulerBackend.removeExecutor to be non-blocking. It's safe since the only two usages (StandaloneSchedulerBackend and YarnSchedulerEndpoint) don't need the return value). ## How was this patch tested? Jenkins unit tests. Author: Shixiong Zhu Closes #14882 from zsxwing/SPARK-17316. (cherry picked from commit 9bcb33c54117cebc9e087017bf4e4163edaeff17) Signed-off-by: Marcelo Vanzin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c17334e4 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c17334e4 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c17334e4 Branch: refs/heads/branch-2.0 Commit: c17334e47e806e59ef65a8eefab632781bf9422c Parents: 021aa28 Author: Shixiong Zhu Authored: Wed Aug 31 10:56:02 2016 -0700 Committer: Marcelo Vanzin Committed: Wed Aug 31 10:56:17 2016 -0700 -- .../cluster/CoarseGrainedSchedulerBackend.scala| 17 + 1 file changed, 9 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c17334e4/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala -- diff --git a/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala b/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala index 8259923..2db3a3b 100644 --- a/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala +++ b/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala @@ -406,14 +406,15 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp conf.getInt("spark.default.parallelism", math.max(totalCoreCount.get(), 2)) } - // Called by subclasses when notified of a lost worker - def removeExecutor(executorId: String, reason: ExecutorLossReason) { -try { - driverEndpoint.askWithRetry[Boolean](RemoveExecutor(executorId, reason)) -} catch { - case e: Exception => -throw new SparkException("Error notifying standalone scheduler's driver endpoint", e) -} + /** + * Called by subclasses when notified of a lost worker. It just fires the message and returns + * at once. + */ + protected def removeExecutor(executorId: String, reason: ExecutorLossReason): Unit = { +// Only log the failure since we don't care about the result. +driverEndpoint.ask(RemoveExecutor(executorId, reason)).onFailure { case t => + logError(t.getMessage, t) +}(ThreadUtils.sameThread) } def sufficientResourcesRegistered(): Boolean = true - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17320] add build_profile_flags entry to mesos build module
Repository: spark Updated Branches: refs/heads/master 9953442ac -> 0611b3a2b [SPARK-17320] add build_profile_flags entry to mesos build module ## What changes were proposed in this pull request? add build_profile_flags entry to mesos build module ## How was this patch tested? unit tests Author: Michael Gummelt Closes #14885 from mgummelt/mesos-profile. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0611b3a2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0611b3a2 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0611b3a2 Branch: refs/heads/master Commit: 0611b3a2bf6d73ab62ee133fbb70430839bea7bc Parents: 9953442 Author: Michael Gummelt Authored: Wed Aug 31 10:17:05 2016 -0700 Committer: Marcelo Vanzin Committed: Wed Aug 31 10:17:05 2016 -0700 -- dev/sparktestsupport/modules.py | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/0611b3a2/dev/sparktestsupport/modules.py -- diff --git a/dev/sparktestsupport/modules.py b/dev/sparktestsupport/modules.py index f2aa241..d8e3989 100644 --- a/dev/sparktestsupport/modules.py +++ b/dev/sparktestsupport/modules.py @@ -462,6 +462,7 @@ mesos = Module( name="mesos", dependencies=[], source_file_regexes=["mesos/"], +build_profile_flags=["-Pmesos"], sbt_test_goals=["mesos/test"] ) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [MINOR][SPARKR] Verbose build comment in WINDOWS.md rather than promoting default build without Hive
Repository: spark Updated Branches: refs/heads/master 12fd0cd61 -> 9953442ac [MINOR][SPARKR] Verbose build comment in WINDOWS.md rather than promoting default build without Hive ## What changes were proposed in this pull request? This PR fixes `WINDOWS.md` to imply referring other profiles in http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn rather than directly pointing to run `mvn -DskipTests -Psparkr package` without Hive supports. ## How was this patch tested? Manually, https://cloud.githubusercontent.com/assets/6477701/18122549/f6297b2c-6fa4-11e6-9b5e-fd4347355d87.png";> Author: hyukjinkwon Closes #14890 from HyukjinKwon/minor-build-r. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9953442a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9953442a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9953442a Branch: refs/heads/master Commit: 9953442aca5a1528a6b85fa8713a56d36c9a199f Parents: 12fd0cd Author: hyukjinkwon Authored: Wed Aug 31 09:06:23 2016 -0700 Committer: Shivaram Venkataraman Committed: Wed Aug 31 09:06:23 2016 -0700 -- R/WINDOWS.md | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/9953442a/R/WINDOWS.md -- diff --git a/R/WINDOWS.md b/R/WINDOWS.md index f67a1c5..1afcbfc 100644 --- a/R/WINDOWS.md +++ b/R/WINDOWS.md @@ -4,13 +4,23 @@ To build SparkR on Windows, the following steps are required 1. Install R (>= 3.1) and [Rtools](http://cran.r-project.org/bin/windows/Rtools/). Make sure to include Rtools and R in `PATH`. + 2. Install [JDK7](http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html) and set `JAVA_HOME` in the system environment variables. + 3. Download and install [Maven](http://maven.apache.org/download.html). Also include the `bin` directory in Maven in `PATH`. + 4. Set `MAVEN_OPTS` as described in [Building Spark](http://spark.apache.org/docs/latest/building-spark.html). -5. Open a command shell (`cmd`) in the Spark directory and run `mvn -DskipTests -Psparkr package` + +5. Open a command shell (`cmd`) in the Spark directory and build Spark with [Maven](http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn) and include the `-Psparkr` profile to build the R package. For example to use the default Hadoop versions you can run + +```bash +mvn.cmd -DskipTests -Psparkr package +``` + +`.\build\mvn` is a shell script so `mvn.cmd` should be used directly on Windows. ## Unit tests - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-17243][WEB UI] Spark 2.0 History Server won't load with very large application history
Repository: spark Updated Branches: refs/heads/branch-2.0 bc6c0d9f9 -> 021aa28f4 [SPARK-17243][WEB UI] Spark 2.0 History Server won't load with very large application history ## What changes were proposed in this pull request? back port of #14835 addressing merge conflicts With the new History Server the summary page loads the application list via the the REST API, this makes it very slow to impossible to load with large (10K+) application history. This pr fixes this by adding the `spark.history.ui.maxApplications` conf to limit the number of applications the History Server displays. This is accomplished using a new optional `limit` param for the `applications` api. (Note this only applies to what the summary page displays, all the Application UI's are still accessible if the user knows the App ID and goes to the Application UI directly.) I've also added a new test for the `limit` param in `HistoryServerSuite.scala` ## How was this patch tested? Manual testing and dev/run-tests Author: Alex Bozarth Closes #14886 from ajbozarth/spark17243-branch-2.0. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/021aa28f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/021aa28f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/021aa28f Branch: refs/heads/branch-2.0 Commit: 021aa28f439443cda1bc7c5e3eee7c85b40c1a2d Parents: bc6c0d9 Author: Alex Bozarth Authored: Wed Aug 31 08:50:42 2016 -0500 Committer: Tom Graves Committed: Wed Aug 31 08:50:42 2016 -0500 -- .../org/apache/spark/ui/static/historypage.js | 8 ++- .../spark/deploy/history/HistoryPage.scala | 3 +- .../spark/deploy/history/HistoryServer.scala| 4 ++ .../apache/spark/internal/config/package.scala | 4 ++ .../status/api/v1/ApplicationListResource.scala | 10 ++- .../limit_app_list_json_expectation.json| 67 .../deploy/history/HistoryServerSuite.scala | 1 + docs/monitoring.md | 16 - 8 files changed, 106 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/021aa28f/core/src/main/resources/org/apache/spark/ui/static/historypage.js -- diff --git a/core/src/main/resources/org/apache/spark/ui/static/historypage.js b/core/src/main/resources/org/apache/spark/ui/static/historypage.js index d216166..177120a 100644 --- a/core/src/main/resources/org/apache/spark/ui/static/historypage.js +++ b/core/src/main/resources/org/apache/spark/ui/static/historypage.js @@ -15,6 +15,12 @@ * limitations under the License. */ +var appLimit = -1; + +function setAppLimit(val) { +appLimit = val; +} + // this function works exactly the same as UIUtils.formatDuration function formatDuration(milliseconds) { if (milliseconds < 100) { @@ -111,7 +117,7 @@ $(document).ready(function() { requestedIncomplete = getParameterByName("showIncomplete", searchString); requestedIncomplete = (requestedIncomplete == "true" ? 
true : false); -$.getJSON("api/v1/applications", function(response,status,jqXHR) { +$.getJSON("api/v1/applications?limit=" + appLimit, function(response,status,jqXHR) { var array = []; var hasMultipleAttempts = false; for (i in response) { http://git-wip-us.apache.org/repos/asf/spark/blob/021aa28f/core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala -- diff --git a/core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala b/core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala index 2fad112..a120b6c 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala @@ -44,7 +44,8 @@ private[history] class HistoryPage(parent: HistoryServer) extends WebUIPage("") if (allAppsSize > 0) { ++ ++ - + ++ + setAppLimit({parent.maxApplications}) } else if (requestedIncomplete) { No incomplete applications found! } else { http://git-wip-us.apache.org/repos/asf/spark/blob/021aa28f/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala -- diff --git a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala index d821474..c178917 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala @@ -28,6 +28,7 @@ import
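The new optional `limit` parameter described above can be exercised directly against the History Server's REST endpoint. A minimal Scala sketch follows; the host, port, and limit value are illustrative and not taken from the patch:

```scala
import scala.io.Source

object LimitedAppListing {
  def main(args: Array[String]): Unit = {
    // 18080 is the History Server's usual default port; adjust as needed.
    val url = "http://localhost:18080/api/v1/applications?limit=50"
    // The endpoint now returns at most `limit` applications instead of the full history.
    val json = Source.fromURL(url).mkString
    println(json)
  }
}
```

Setting `spark.history.ui.maxApplications` in the History Server's configuration caps what the summary page itself requests in the same way.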
[2/2] spark-website git commit: Re-sync Spark site HTML to output of latest jekyll
Re-sync Spark site HTML to output of latest jekyll Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/0845f49d Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/0845f49d Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/0845f49d Branch: refs/heads/asf-site Commit: 0845f49def1dd1fc5fb439d2f1e22f03297944ed Parents: fcd0bc3 Author: Sean Owen Authored: Wed Aug 31 12:38:33 2016 +0100 Committer: Sean Owen Committed: Wed Aug 31 12:38:33 2016 +0100 -- site/documentation.html | 5 +- site/examples.html | 60 ++-- site/news/index.html| 44 +++--- site/news/spark-0-9-1-released.html | 2 +- site/news/spark-0-9-2-released.html | 2 +- site/news/spark-1-1-0-released.html | 2 +- site/news/spark-1-2-2-released.html | 2 +- site/news/spark-and-shark-in-the-news.html | 2 +- .../spark-summit-east-2015-videos-posted.html | 2 +- site/releases/spark-release-0-8-0.html | 4 +- site/releases/spark-release-0-9-1.html | 20 +++ site/releases/spark-release-1-0-1.html | 8 +-- site/releases/spark-release-1-0-2.html | 2 +- site/releases/spark-release-1-1-0.html | 6 +- site/releases/spark-release-1-2-0.html | 2 +- site/releases/spark-release-1-3-0.html | 6 +- site/releases/spark-release-1-3-1.html | 6 +- site/releases/spark-release-1-4-0.html | 4 +- site/releases/spark-release-1-5-0.html | 30 +- site/releases/spark-release-1-6-0.html | 20 +++ site/releases/spark-release-2-0-0.html | 36 ++-- 21 files changed, 118 insertions(+), 147 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark-website/blob/0845f49d/site/documentation.html -- diff --git a/site/documentation.html b/site/documentation.html index 652281d..859b767 100644 --- a/site/documentation.html +++ b/site/documentation.html @@ -253,12 +253,13 @@ Meetup Talk Videos -In addition to the videos listed below, you can also view http://www.meetup.com/spark-users/files/";>all slides from Bay Area meetups here. +In addition to the videos listed below, you can also view http://www.meetup.com/spark-users/files/";>all slides from Bay Area meetups here. 
.video-meta-info { font-size: 0.95em; } - + + http://www.youtube.com/watch?v=NUQ-8to2XAk&list=PL-x35fyliRwiP3YteXbnhk0QGOtYLBT3a";>Spark 1.0 and Beyond (http://files.meetup.com/3138542/Spark%201.0%20Meetup.ppt";>slides) by Patrick Wendell, at Cisco in San Jose, 2014-04-23 http://git-wip-us.apache.org/repos/asf/spark-website/blob/0845f49d/site/examples.html -- diff --git a/site/examples.html b/site/examples.html index 5431f5d..1be96be 100644 --- a/site/examples.html +++ b/site/examples.html @@ -213,11 +213,11 @@ In this page, we will show examples using RDD API as well as examples using high -text_file = sc.textFile("hdfs://...") +text_file = sc.textFile("hdfs://...") counts = text_file.flatMap(lambda line: line.split(" ")) \ .map(lambda word: (word, 1)) \ .reduceByKey(lambda a, b: a + b) -counts.saveAsTextFile("hdfs://...") +counts.saveAsTextFile("hdfs://...") @@ -225,11 +225,11 @@ In this page, we will show examples using RDD API as well as examples using high -val textFile = sc.textFile("hdfs://...") +val textFile = sc.textFile("hdfs://...") val counts = textFile.flatMap(line => line.split(" ")) .map(word => (word, 1)) .reduceByKey(_ + _) -counts.saveAsTextFile("hdfs://...") +counts.saveAsTextFile("hdfs://...") @@ -237,7 +237,7 @@ In this page, we will show examples using RDD API as well as examples using high -JavaRDDtextFile = sc.textFile("hdfs://..."); +JavaRDD textFile = sc.textFile("hdfs://..."); JavaRDD words = textFile.flatMap(new FlatMapFunction () { public Iterable call(String s) { return Arrays.asList(s.split(" ")); } }); @@ -247,7 +247,7 @@ In this page, we will show examples using RDD API as well as examples using high JavaPairRDD counts = pairs.reduceByKey(new Function2 () { public Integer call(Integer a, Integer b) { return a + b; } }); -counts.saveAsTextFile("hdfs://..."); +counts.saveAsTextFile("hdfs://..."); @@ -266,13 +266,13 @@ In this page, we will show examples using RDD API as well as examples using high -def sample(p): +def sample(p): x, y = random(
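For reference, the word-count example rendered on the examples page above, written as a self-contained Scala application; the `hdfs://...` paths are placeholders exactly as on the site:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WordCount"))
    val textFile = sc.textFile("hdfs://...")   // placeholder input path
    val counts = textFile.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.saveAsTextFile("hdfs://...")        // placeholder output path
    sc.stop()
  }
}
```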
[1/2] spark-website git commit: Re-sync Spark site HTML to output of latest jekyll
Repository: spark-website Updated Branches: refs/heads/asf-site fcd0bc3dd -> 0845f49de http://git-wip-us.apache.org/repos/asf/spark-website/blob/0845f49d/site/releases/spark-release-1-1-0.html -- diff --git a/site/releases/spark-release-1-1-0.html b/site/releases/spark-release-1-1-0.html index 7f88126..2c3bffc 100644 --- a/site/releases/spark-release-1-1-0.html +++ b/site/releases/spark-release-1-1-0.html @@ -197,7 +197,7 @@ Spark SQL adds a number of new features and performance improvements in this release. A http://spark.apache.org/docs/1.1.0/sql-programming-guide.html#running-the-thrift-jdbc-server";>JDBC/ODBC server allows users to connect to SparkSQL from many different applications and provides shared access to cached tables. A new module provides http://spark.apache.org/docs/1.1.0/sql-programming-guide.html#json-datasets";>support for loading JSON data directly into Sparkâs SchemaRDD format, including automatic schema inference. Spark SQL introduces http://spark.apache.org/docs/1.1.0/sql-programming-guide.html#other-configuration-options";>dynamic bytecode generation in this release, a technique which significantly speeds up execution for queries that perform complex expression evaluation. This release also adds support for registering Python, Scala, and Java lambda functions as UDFs, which can then be called directly in SQL. Spark 1.1 adds a public types API to allow users to create SchemaRDDâs from custom data sources. Finally, many optimizations have been added to the native Parquet support as well as throughout the engine. MLlib -MLlib adds several new algorithms and optimizations in this release. 1.1 introduces a https://issues.apache.org/jira/browse/SPARK-2359";>new library of statistical packages which provides exploratory analytic functions. These include stratified sampling, correlations, chi-squared tests and support for creating random datasets. This release adds utilities for feature extraction (https://issues.apache.org/jira/browse/SPARK-2510";>Word2Vec and https://issues.apache.org/jira/browse/SPARK-2511";>TF-IDF) and feature transformation (https://issues.apache.org/jira/browse/SPARK-2272";>normalization and standard scaling). Also new are support for https://issues.apache.org/jira/browse/SPARK-1553";>nonnegative matrix factorization and https://issues.apache.org/jira/browse/SPARK-1782";>SVD via Lanczos. The decision tree algorithm has been https://issues.apache.org/jira/browse/SPARK-2478";>added in Python and Java< /a>. A tree aggregation primitive has been added to help optimize many existing algorithms. Performance improves across the board in MLlib 1.1, with improvements of around 2-3X for many algorithms and up to 5X for large scale decision tree problems. +MLlib adds several new algorithms and optimizations in this release. 1.1 introduces a https://issues.apache.org/jira/browse/SPARK-2359";>new library of statistical packages which provides exploratory analytic functions. These include stratified sampling, correlations, chi-squared tests and support for creating random datasets. This release adds utilities for feature extraction (https://issues.apache.org/jira/browse/SPARK-2510";>Word2Vec and https://issues.apache.org/jira/browse/SPARK-2511";>TF-IDF) and feature transformation (https://issues.apache.org/jira/browse/SPARK-2272";>normalization and standard scaling). Also new are support for https://issues.apache.org/jira/browse/SPARK-1553";>nonnegative matrix factorization and https://issues.apache.org/jira/browse/SPARK-1782";>SVD via Lanczos. 
The decision tree algorithm has been https://issues.apache.org/jira/browse/SPARK-2478";>added in Python and Java< /a>. A tree aggregation primitive has been added to help optimize many existing algorithms. Performance improves across the board in MLlib 1.1, with improvements of around 2-3X for many algorithms and up to 5X for large scale decision tree problems. GraphX and Spark Streaming Spark streaming adds a new data source https://issues.apache.org/jira/browse/SPARK-1981";>Amazon Kinesis. For the Apache Flume, a new mode is supported which https://issues.apache.org/jira/browse/SPARK-1729";>pulls data from Flume, simplifying deployment and providing high availability. The first of a set of https://issues.apache.org/jira/browse/SPARK-2438";>streaming machine learning algorithms is introduced with streaming linear regression. Finally, https://issues.apache.org/jira/browse/SPARK-1341";>rate limiting has been added for streaming inputs. GraphX adds https://issues.apache.org/jira/browse/SPARK-1991";>custom storage levels for vertices and edges along with https://issues.apache.org/jira/browse/SPARK-2748";>improved numerical precision across the board. Finally, GraphX adds a new label propagation algorithm. @@ -215,7 +215,7 @@ The default value of spark.io.compression.codec is now snappy for improved m
spark-website git commit: Push only fix to contributor list to 2.0.0 for Spark
Repository: spark-website Updated Branches: refs/heads/asf-site b2cf71427 -> fcd0bc3dd Push only fix to contributor list to 2.0.0 for Spark Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/fcd0bc3d Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/fcd0bc3d Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/fcd0bc3d Branch: refs/heads/asf-site Commit: fcd0bc3dd9812c263a17bd4df4e85ce3a89d2b8b Parents: b2cf714 Author: Sean Owen Authored: Wed Aug 31 12:31:09 2016 +0100 Committer: Sean Owen Committed: Wed Aug 31 12:31:09 2016 +0100 -- site/releases/spark-release-2-0-0.html | 47 - 1 file changed, 46 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark-website/blob/fcd0bc3d/site/releases/spark-release-2-0-0.html -- diff --git a/site/releases/spark-release-2-0-0.html b/site/releases/spark-release-2-0-0.html index ddd979a..7aa2fc3 100644 --- a/site/releases/spark-release-2-0-0.html +++ b/site/releases/spark-release-2-0-0.html @@ -379,7 +379,52 @@ Credits -Last but not least, this release would not have been possible without the following contributors: Aaron Tokhy, Abhinav Gupta, Abou Haydar Elias, Adam Budde, Adam Roberts, Ahmed Kamal, Ahmed Mahran, Alex Bozarth, Alexander Ulanov, Allen, Anatoliy Plastinin, Andrew, Andrew Ash, Andrew Or, Andrew Ray, Anthony Truchet, Anton Okolnychyi, Antonio Murgia, Antonio Murgia, Arun Allamsetty, Azeem Jiva, Ben McCann, BenFradet, Bertrand Bossy, Bill Chambers, Bjorn Jonsson, Bo Meng, Bo Meng, Brandon Bradley, Brian O’Neill, BrianLondon, Bryan Cutler, Burak Köse, Burak Yavuz, Carson Wang, Cazen, Cedar Pan, Charles Allen, Cheng Hao, Cheng Lian, Claes Redestad, CodingCat, Cody Koeninger, DB Tsai, DLucky, Daniel Jalova, Daoyuan Wang, Darek Blasiak, David Tolpin, Davies Liu, Devaraj K, Dhruve Ashar, Dilip Biswal, Dmitry Erastov, Dominik JastrzÄbski, Dongjoon Hyun, Earthson Lu, Egor Pakhomov, Ehsan M.Kermani, Ergin Seyfe, Eric Liang, Ernest, Felix Cheung, Felix Cheung, Feynman Liang, Fokko Dr iesprong, Fonso Li, Franklyn D’souza, François Garillot, Fred Reiss, Gabriele Nizzoli, Gary King, GayathriMurali, Gio Borje, Grace, Greg Michalopoulos, Grzegorz Chilkiewicz, Guillaume Poulin, Gábor Lipták, Hemant Bhanawat, Herman van Hovell, Herman van Hövell tot Westerflier, Hiroshi Inoue, Holden Karau, Hossein, Huaxin Gao, Hyukjin Kwon, Imran Rashid, Imran Younus, Ioana Delaney, Iulian Dragos, Jacek Laskowski, Jacek Lewandowski, Jakob Odersky, James Lohse, James Thomas, Jason Lee, Jason Moore, Jason White, Jean Lyn, Jean-Baptiste Onofré, Jeff L, Jeff Zhang, Jeremy Derr, JeremyNixon, Jia Li, Jo Voordeckers, Joan, Jon Maurer, Joseph K. 
Bradley, Josh Howes, Josh Rosen, Joshi, Juarez Bochi, Julien Baley, Junyang, Junyang Qian, Jurriaan Pruis, Kai Jiang, KaiXinXiaoLei, Kay Ousterhout, Kazuaki Ishizaki, Kevin Yu, Koert Kuipers, Kousuke Saruta, Koyo Yoshida, Krishna Kalyan, Krishna Kalyan, Lewuathe, Liang-Chi Hsieh, Lianhui Wang, Lin Zhao, Lining Sun, Liu Xiang, Liwei Lin, Liw ei Lin, Liye, Luc Bourlier, Luciano Resende, Lukasz, Maciej Brynski, Malte, Maciej Szymkiewicz, Marcelo Vanzin, Marcin Tustin, Mark Grover, Mark Yang, Martin Menestret, Masayoshi TSUZUKI, Matei Zaharia, Mathieu Longtin, Matthew Wise, Miao Wang, Michael Allman, Michael Armbrust, Michael Gummelt, Michel Lemay, Mike Dusenberry, Mortada Mehyar, Nakul Jindal, Nam Pham, Narine Kokhlikyan, NarineK, Neelesh Srinivas Salian, Nezih Yigitbasi, Nicholas Chammas, Nicholas Tietz, Nick Pentreath, Nilanjan Raychaudhuri, Nirman Narang, Nishkam Ravi, Nong, Nong Li, Oleg Danilov, Oliver Pierson, Oscar D. Lara Yejas, Parth Brahmbhatt, Patrick Wendell, Pete Robbins, Peter Ableda, Pierre Borckmans, Prajwal Tuladhar, Prashant Sharma, Pravin Gadakh, QiangCai, Qifan Pu, Raafat Akkad, Rahul Tanwani, Rajesh Balamohan, Rekha Joshi, Reynold Xin, Richard W. Eggert II, Robert Dodier, Robert Kruszewski, Robin East, Ruifeng Zheng, Ryan Blue, Sachin Aggarwal, Saisai Shao, Sameer Agarwal, Sandeep Singh, Sanket, Sasak i Toru, Sean Owen, Sean Zhong, Sebastien Rainville, Sebastián RamÃrez, Sela, Sergiusz Urbaniak, Seth Hendrickson, Shally Sangal, Sheamus K. Parkes, Shi Jinkui, Shivaram Venkataraman, Shixiong Zhu, Shuai Lin, Shubhanshu Mishra, Sin Wu, Sital Kedia, Stavros Kontopoulos, Stephan Kessler, Steve Loughran, Subhobrata Dey, Subroto Sanyal, Sumedh Mungee, Sun Rui, Sunitha Kambhampati, Suresh Thalamati, Takahashi Hiroshi, Takeshi YAMAMURO, Takuya Kuwahara, Takuya UESHIN, Tathagata Das, Ted Yu, Tejas Patil, Terence Yim, Thomas Graves, Timothy Chen, Timothy Hunter, Tom Graves, Tom Magrino, Tommy YU, Travis Crawford, Tristan
spark-website git commit: Hot-fix ec2-scripts.html in Spark docs/2.0.0, which for some reason built correctly in 2.0.0-preview, builds in master, but didn't deploy in 2.0.0
Repository: spark-website Updated Branches: refs/heads/asf-site d37a3afce -> b2cf71427 Hot-fix ec2-scripts.html in Spark docs/2.0.0, which for some reason built correctly in 2.0.0-preview, builds in master, but didn't deploy in 2.0.0 Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/b2cf7142 Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/b2cf7142 Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/b2cf7142 Branch: refs/heads/asf-site Commit: b2cf71427338b1bdc108c1bbc9ec642b547a9eb4 Parents: d37a3af Author: Sean Owen Authored: Wed Aug 31 10:09:18 2016 +0100 Committer: Sean Owen Committed: Wed Aug 31 10:09:18 2016 +0100 -- site/docs/2.0.0/ec2-scripts.html | 160 ++ site/docs/2.0.0/ec2-scripts.md | 7 -- 2 files changed, 160 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark-website/blob/b2cf7142/site/docs/2.0.0/ec2-scripts.html -- diff --git a/site/docs/2.0.0/ec2-scripts.html b/site/docs/2.0.0/ec2-scripts.html new file mode 100644 index 000..abe211c --- /dev/null +++ b/site/docs/2.0.0/ec2-scripts.html @@ -0,0 +1,160 @@ + + + + + + + + + +Running Spark on EC2 - Spark 2.0.0 Documentation + + + + https://github.com/amplab/spark-ec2#readme";> + https://github.com/amplab/spark-ec2#readme"; /> + + + + +body { +padding-top: 60px; +padding-bottom: 40px; +} + + + + + + + + + + + + + + + + + + + + + + 2.0.0 + + + +Overview + + +Programming Guides + +Quick Start +Spark Programming Guide + +Spark Streaming +DataFrames, Datasets and SQL +MLlib (Machine Learning) +GraphX (Graph Processing) +SparkR (R on Spark) + + + + +API Docs + +Scala +Java +Python +R + + + + +Deploying + +Overview +Submitting Applications + +Spark Standalone +Mesos +YARN + + + + +More + +Configuration +Monitoring +Tuning Guide +Job Scheduling +Security +Hardware Provisioning + +Building Spark +https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark";>Contributing to Spark +https://cwiki.apache.org/confluence/display/SPARK/Supplemental+Spark+Projects";>Supplemental Projects + + + + + + + + + + + + + +Running Spark on EC2 + + +This document has been superseded and replaced by documentation at https://github.com/amplab/spark-ec2#readme + + + + + + + + + + + + + + +MathJax.Hub.Config({ +TeX: { equationNumbers: { autoNumber: "AMS" } } +}); + + +// Note that we load MathJax this way to work with local file (file://), HTTP and HTTPS. +// We could use "//cdn.mathjax...", but that won't support "fil
spark git commit: [SPARK-17180][SPARK-17309][SPARK-17323][SQL] create AlterViewAsCommand to handle ALTER VIEW AS
Repository: spark Updated Branches: refs/heads/master fa6347938 -> 12fd0cd61 [SPARK-17180][SPARK-17309][SPARK-17323][SQL] create AlterViewAsCommand to handle ALTER VIEW AS ## What changes were proposed in this pull request? Currently we use `CreateViewCommand` to implement ALTER VIEW AS, which has 3 bugs: 1. SPARK-17180: ALTER VIEW AS should alter the temp view if the view name has no database part and a temp view exists. 2. SPARK-17309: ALTER VIEW AS should issue an exception if the view does not exist. 3. SPARK-17323: ALTER VIEW AS should keep the previous table properties, comment, create_time, etc. The root cause is that ALTER VIEW AS is quite different from CREATE VIEW, so we need a different code path to handle them. However, in `CreateViewCommand` there is no way to distinguish ALTER VIEW AS from CREATE VIEW; we would have to introduce an extra flag. Instead of doing this, I think a more natural way is to separate the ALTER VIEW AS logic into a new command. ## How was this patch tested? New tests in SQLViewSuite. Author: Wenchen Fan Closes #14874 from cloud-fan/minor4. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/12fd0cd6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/12fd0cd6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/12fd0cd6 Branch: refs/heads/master Commit: 12fd0cd615683cd4c3e9094ce71a1e6fc33b8d6a Parents: fa63479 Author: Wenchen Fan Authored: Wed Aug 31 17:08:08 2016 +0800 Committer: Wenchen Fan Committed: Wed Aug 31 17:08:08 2016 +0800 -- .../spark/sql/execution/SparkSqlParser.scala| 63 +--- .../spark/sql/execution/command/views.scala | 77 +--- .../spark/sql/hive/execution/SQLViewSuite.scala | 77 +++- 3 files changed, 157 insertions(+), 60 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/12fd0cd6/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala index e32d301..656494d 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala @@ -1254,60 +1254,33 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { ic.identifier.getText -> Option(ic.STRING).map(string) } } - createView( -ctx, -ctx.tableIdentifier, + + CreateViewCommand( +name = visitTableIdentifier(ctx.tableIdentifier), +userSpecifiedColumns = userSpecifiedColumns, comment = Option(ctx.STRING).map(string), -userSpecifiedColumns, -ctx.query, - Option(ctx.tablePropertyList).map(visitPropertyKeyValues).getOrElse(Map.empty), +properties = Option(ctx.tablePropertyList).map(visitPropertyKeyValues).getOrElse(Map.empty), +originalText = Option(source(ctx.query)), +child = plan(ctx.query), allowExisting = ctx.EXISTS != null, replace = ctx.REPLACE != null, -isTemporary = ctx.TEMPORARY != null - ) +isTemporary = ctx.TEMPORARY != null) } } /** - * Alter the query of a view. This creates a [[CreateViewCommand]] command. + * Alter the query of a view. This creates a [[AlterViewAsCommand]] command.
+ * + * For example: + * {{{ + * ALTER VIEW [db_name.]view_name AS SELECT ...; + * }}} */ override def visitAlterViewQuery(ctx: AlterViewQueryContext): LogicalPlan = withOrigin(ctx) { -createView( - ctx, - name = ctx.tableIdentifier, - comment = None, - userSpecifiedColumns = Seq.empty, - query = ctx.query, - properties = Map.empty, - allowExisting = false, - replace = true, - isTemporary = false) - } - - /** - * Create a [[CreateViewCommand]] command. - */ - private def createView( - ctx: ParserRuleContext, - name: TableIdentifierContext, - comment: Option[String], - userSpecifiedColumns: Seq[(String, Option[String])], - query: QueryContext, - properties: Map[String, String], - allowExisting: Boolean, - replace: Boolean, - isTemporary: Boolean): LogicalPlan = { -val originalText = source(query) -CreateViewCommand( - visitTableIdentifier(name), - userSpecifiedColumns, - comment, - properties, - Some(originalText), - plan(query), - allowExisting = allowExisting, - replace = replace, - isTemporary = isTemporary) +AlterViewAsCommand( + name = visitTableIdentifier(ctx.tableIdentifier), + originalText = source(ctx.query), +
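A minimal sketch of the corrected behavior, assuming a local `SparkSession`; the view and column names are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession

object AlterViewAsExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("AlterViewAsExample").master("local[*]").getOrCreate()
    spark.range(10).createOrReplaceTempView("src")
    spark.sql("CREATE TEMPORARY VIEW v AS SELECT id FROM src")
    // SPARK-17180: with an unqualified name, ALTER VIEW AS now alters the existing temp view.
    spark.sql("ALTER VIEW v AS SELECT id * 2 AS doubled FROM src")
    spark.sql("SELECT * FROM v").show()
    // SPARK-17309: altering a view that does not exist now fails instead of silently creating one.
    // spark.sql("ALTER VIEW no_such_view AS SELECT id FROM src")  // throws an exception
    spark.stop()
  }
}
```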
spark git commit: [SPARK-17178][SPARKR][SPARKSUBMIT] Allow to set sparkr shell command through --conf
Repository: spark Updated Branches: refs/heads/master d92cd227c -> fa6347938 [SPARK-17178][SPARKR][SPARKSUBMIT] Allow to set sparkr shell command through --conf ## What changes were proposed in this pull request? Allow the user to set the sparkr shell command through `--conf spark.r.shell.command`. ## How was this patch tested? A unit test is added, and it was also verified manually through ``` bin/sparkr --master yarn-client --conf spark.r.shell.command=/usr/local/bin/R ``` Author: Jeff Zhang Closes #14744 from zjffdu/SPARK-17178. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fa634793 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fa634793 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fa634793 Branch: refs/heads/master Commit: fa6347938fc1c72ddc03a5f3cd2e929b5694f0a6 Parents: d92cd22 Author: Jeff Zhang Authored: Wed Aug 31 00:20:41 2016 -0700 Committer: Felix Cheung Committed: Wed Aug 31 00:20:41 2016 -0700 -- docs/configuration.md | 11 ++- .../org/apache/spark/launcher/SparkLauncher.java | 2 ++ .../spark/launcher/SparkSubmitCommandBuilder.java | 3 ++- .../launcher/SparkSubmitCommandBuilderSuite.java | 18 ++ 4 files changed, 32 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/fa634793/docs/configuration.md -- diff --git a/docs/configuration.md b/docs/configuration.md index d0c76aa..6e98f67 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -1786,6 +1786,14 @@ showDF(properties, numRows = 200, truncate = FALSE) Executable for executing R scripts in client modes for driver. Ignored in cluster modes. + + spark.r.shell.command + R + +Executable for executing sparkR shell in client modes for driver. Ignored in cluster modes. It is the same as environment variable SPARKR_DRIVER_R, but take precedence over it. +spark.r.shell.command is used for sparkR shell while spark.r.driver.command is used for running R script. + + Deploy @@ -1852,7 +1860,8 @@ The following variables can be set in `spark-env.sh`: SPARKR_DRIVER_R -R binary executable to use for SparkR shell (default is R). +R binary executable to use for SparkR shell (default is R). +Property spark.r.shell.command take precedence if it is set SPARK_LOCAL_IP http://git-wip-us.apache.org/repos/asf/spark/blob/fa634793/launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java -- diff --git a/launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java b/launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java index 7b7a7bf..ea56214 100644 --- a/launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java +++ b/launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java @@ -68,6 +68,8 @@ public class SparkLauncher { static final String PYSPARK_PYTHON = "spark.pyspark.python"; + static final String SPARKR_R_SHELL = "spark.r.shell.command"; + /** Logger name to use when launching a child process.
*/ public static final String CHILD_PROCESS_LOGGER_NAME = "spark.launcher.childProcLoggerName"; http://git-wip-us.apache.org/repos/asf/spark/blob/fa634793/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java -- diff --git a/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java b/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java index f6da644..29c6d82 100644 --- a/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java +++ b/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java @@ -336,7 +336,8 @@ class SparkSubmitCommandBuilder extends AbstractCommandBuilder { join(File.separator, sparkHome, "R", "lib", "SparkR", "profile", "shell.R")); List args = new ArrayList<>(); -args.add(firstNonEmpty(System.getenv("SPARKR_DRIVER_R"), "R")); +args.add(firstNonEmpty(conf.get(SparkLauncher.SPARKR_R_SHELL), + System.getenv("SPARKR_DRIVER_R"), "R")); return args; } http://git-wip-us.apache.org/repos/asf/spark/blob/fa634793/launcher/src/test/java/org/apache/spark/launcher/SparkSubmitCommandBuilderSuite.java -- diff --git a/launcher/src/test/java/org/apache/spark/launcher/SparkSubmitCommandBuilderSuite.java b/launcher/src/test/java/org/apache/spark/launcher/SparkSubmitCommandBuilderSuite.java index 16e5a22..ad2e7a7 100644 --- a/launcher/src/test/java/org/apache/spark
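The lookup order implemented above (the `spark.r.shell.command` conf first, then the `SPARKR_DRIVER_R` environment variable, then plain `R`) can be summarized in a small Scala sketch; this is illustrative only, and the real launcher logic is the Java code shown in the diff:

```scala
object RShellResolution {
  // Resolution order described in the patch: conf value, then env var, then "R".
  def resolveRShell(conf: Map[String, String], env: Map[String, String]): String =
    conf.get("spark.r.shell.command")
      .orElse(env.get("SPARKR_DRIVER_R"))
      .getOrElse("R")

  def main(args: Array[String]): Unit = {
    // Hypothetical conf map; in a real launch this comes from --conf arguments.
    println(resolveRShell(Map("spark.r.shell.command" -> "/usr/local/bin/R"), sys.env))
  }
}
```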