[GitHub] spark issue #16595: [Minor][YARN] Move YarnSchedulerBackendSuite to resource...

2017-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16595
  
**[Test build #71427 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71427/testReport)** for PR 16595 at commit [`9301974`](https://github.com/apache/spark/commit/93019741bb94d955fc24e5b06d1dd1aa95672f70).





[GitHub] spark issue #16592: [SPARK-19235] [SQL] [TESTS] Enable Test Cases in DDLSuit...

2017-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16592
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71420/
Test PASSed.





[GitHub] spark issue #16592: [SPARK-19235] [SQL] [TESTS] Enable Test Cases in DDLSuit...

2017-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16592
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16592: [SPARK-19235] [SQL] [TESTS] Enable Test Cases in DDLSuit...

2017-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16592
  
**[Test build #71420 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71420/testReport)** for PR 16592 at commit [`0133463`](https://github.com/apache/spark/commit/01334635c5433f0515beb92660b79796c97677d5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class InMemoryCatalogedDDLSuite extends DDLSuite with SharedSQLContext with BeforeAndAfterEach `
   * `abstract class DDLSuite extends QueryTest with SQLTestUtils `
   * `class HiveCatalogedDDLSuite extends DDLSuite with TestHiveSingleton with BeforeAndAfterEach `





[GitHub] spark pull request #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper ...

2017-01-15 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16566#discussion_r9617
  
--- Diff: R/pkg/R/mllib_clustering.R ---
@@ -38,6 +45,146 @@ setClass("KMeansModel", representation(jobj = "jobj"))
 #' @note LDAModel since 2.1.0
 setClass("LDAModel", representation(jobj = "jobj"))
 
+#' Bisecting K-Means Clustering Model
+#'
+#' Fits a bisecting k-means clustering model against a Spark DataFrame.
+#' Users can call \code{summary} to print a summary of the fitted model, 
\code{predict} to make
+#' predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
+#'
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
+#'operators are supported, including '~', '.', ':', '+', 
and '-'.
+#'Note that the response variable of formula is empty in 
spark.bisectingKmeans.
+#' @param k the desired number of leaf clusters. Must be > 1.
+#'  The actual number could be smaller if there are no divisible 
leaf clusters.
+#' @param maxIter maximum iteration number.
+#' @param minDivisibleClusterSize The minimum number of points (if greater 
than or equal to 1.0)
+#'or the minimum proportion of points (if 
less than 1.0) of a divisible cluster.
+#' @param seed the random seed.
+#' @param ... additional argument(s) passed to the method.
+#' @return \code{spark.bisectingKmeans} returns a fitted bisecting k-means 
model.
+#' @rdname spark.bisectingKmeans
+#' @aliases spark.bisectingKmeans,SparkDataFrame,formula-method
+#' @name spark.bisectingKmeans
+#' @export
+#' @examples
+#' \dontrun{
+#' sparkR.session()
+#' data(iris)
+#' df <- createDataFrame(iris)
+#' model <- spark.bisectingKmeans(df, Sepal_Length ~ Sepal_Width, k = 4)
+#' summary(model)
+#'
+#' # fitted values on training data
+#' fitted <- predict(model, df)
+#' head(select(fitted, "Sepal_Length", "prediction"))
+#'
+#' # save fitted model to input path
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#'
+#' # can also read back the saved model and print
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#' }
+#' @note spark.bisectingKmeans since 2.2.0
+#' @seealso \link{predict}, \link{read.ml}, \link{write.ml}
+setMethod("spark.bisectingKmeans", signature(data = "SparkDataFrame", 
formula = "formula"),
+  function(data, formula, k = 4, maxIter = 20, 
minDivisibleClusterSize = 1.0, seed = NULL) {
+formula <- paste0(deparse(formula), collapse = "")
+if (!is.null(seed)) {
+  seed <- as.character(as.integer(seed))
+}
+jobj <- 
callJStatic("org.apache.spark.ml.r.BisectingKMeansWrapper", "fit",
+data@sdf, formula, as.integer(k), 
as.integer(maxIter),
+as.numeric(minDivisibleClusterSize), seed)
+new("BisectingKMeansModel", jobj = jobj)
+  })
+
+#  Get the summary of a bisecting k-means model
+
+#' @param object a fitted bisecting k-means model.
+#' @return \code{summary} returns summary information of the fitted model, 
which is a list.
+#' The list includes the model's \code{k} (number of cluster 
centers),
+#' \code{coefficients} (model cluster centers),
+#' \code{size} (number of data points in each cluster), and 
\code{cluster}
+#' (cluster centers of the transformed data).
+#' @rdname spark.bisectingKmeans
+#' @export
+#' @note summary(BisectingKMeansModel) since 2.2.0
+setMethod("summary", signature(object = "BisectingKMeansModel"),
+  function(object) {
+jobj <- object@jobj
+is.loaded <- callJMethod(jobj, "isLoaded")
+features <- callJMethod(jobj, "features")
+coefficients <- callJMethod(jobj, "coefficients")
+k <- callJMethod(jobj, "k")
+size <- callJMethod(jobj, "size")
+coefficients <- t(matrix(coefficients, ncol = k))
+colnames(coefficients) <- unlist(features)
+rownames(coefficients) <- 1:k
+cluster <- if (is.loaded) {
+  NULL
+} else {
+  dataFrame(callJMethod(jobj, "cluster"))
+}
+list(k = k, coefficients = coefficients, size = size,
+cluster = cluster, is.loaded = is.loaded)
+  })
+
+#  Predicted values based on a bisecting k-means model
+
+#' @param newData a SparkDataFrame for testing.
+#' @return \code{predict} returns the predicted values 

[GitHub] spark pull request #16595: [Minor][YARN] Move YarnSchedulerBackendSuite to r...

2017-01-15 Thread yanboliang
GitHub user yanboliang opened a pull request:

https://github.com/apache/spark/pull/16595

[Minor][YARN] Move YarnSchedulerBackendSuite to resource-managers/yarn 
directory.

## What changes were proposed in this pull request?
#16092 moved YARN resource manager related code to the resource-managers/yarn 
directory. The test case ```YarnSchedulerBackendSuite``` was added after that, 
but in the wrong place. I move it to the correct directory in this PR.

## How was this patch tested?
Existing test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanboliang/spark yarn

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16595.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16595


commit 93019741bb94d955fc24e5b06d1dd1aa95672f70
Author: Yanbo Liang 
Date:   2017-01-16T07:46:26Z

Move YarnSchedulerBackendSuite to resource-managers/yarn directory.







[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

2017-01-15 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16561
  
LGTM





[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

2017-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16561
  
**[Test build #71426 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71426/testReport)** for PR 16561 at commit [`21e63f8`](https://github.com/apache/spark/commit/21e63f8eb0540ff26c16804bffb222123a97c1c8).





[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

2017-01-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16561#discussion_r96175295
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -89,6 +89,25 @@ object Cast {
 case _ => false
   }
 
+  /**
+   * Return false iff we may truncate during casting `from` type to `to` 
type. e.g. long -> int,
+   * timestamp -> date.
+   */
+  def canUpCast(from: DataType, to: DataType): Boolean = (from, to) match {
--- End diff --

how about `def mayTruncate`? `canUpCast` is not accurate; we may not be able 
to cast even if `canUpCast` returns true.
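
For reference, here is a hedged, self-contained sketch of the semantics under discussion: a predicate that is true when casting `from` to `to` may lose information (e.g. long -> int, timestamp -> date). The names and cases are illustrative only, not the actual Catalyst code.

```scala
import org.apache.spark.sql.types._

// Illustrative only: true when a cast from `from` to `to` may lose information.
def mayTruncate(from: DataType, to: DataType): Boolean = (from, to) match {
  case (LongType, IntegerType)   => true  // may overflow the narrower integral type
  case (DoubleType, FloatType)   => true  // may lose floating-point precision
  case (TimestampType, DateType) => true  // drops the time-of-day component
  case _                         => false
}
```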





[GitHub] spark pull request #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work f...

2017-01-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16474





[GitHub] spark issue #16387: [SPARK-18986][Core] ExternalAppendOnlyMap shouldn't fail...

2017-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16387
  
**[Test build #71425 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71425/testReport)** for PR 16387 at commit [`b1ef9ec`](https://github.com/apache/spark/commit/b1ef9ec749737125d833cd3a64922b4a9f8c32f1).





[GitHub] spark issue #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work for Parq...

2017-01-15 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16474
  
thanks, merging to master/2.1!





[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

2017-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16594
  
**[Test build #71424 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71424/testReport)** for PR 16594 at commit [`c3489fc`](https://github.com/apache/spark/commit/c3489fcad32caa1d6a9b7182e387a46aae5710fa).





[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

2017-01-15 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/16594
  
cc @rxin @cloud-fan 





[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-01-15 Thread wzhfy
GitHub user wzhfy opened a pull request:

https://github.com/apache/spark/pull/16594

[SPARK-17078] [SQL] Show stats when explain

## What changes were proposed in this pull request?

Currently we can only check the estimated stats in logical plans by debugging. 
We need to provide an easier and more efficient way for developers/users.
In this PR, we add an internal conf; when it is true, we can check the stats 
through the explain extended command.
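
As a hedged illustration of the workflow (assuming a SparkSession named `spark`; the conf key below is a placeholder, since the actual key is not named in this description):

```scala
// Hypothetical conf key, for illustration only.
spark.conf.set("spark.sql.statistics.showStatsInExplain", "true")
// With the flag enabled, estimated stats would show up in the extended plan output.
spark.sql("EXPLAIN EXTENDED SELECT * FROM src WHERE key > 10").show(false)
```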

## How was this patch tested?

Add test case.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wzhfy/spark showStats

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16594.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16594


commit c3489fcad32caa1d6a9b7182e387a46aae5710fa
Author: wangzhenhua 
Date:   2017-01-16T07:24:23Z

show stats in explain command







[GitHub] spark issue #16387: [SPARK-18986][Core] ExternalAppendOnlyMap shouldn't fail...

2017-01-15 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16387
  
@samkum Thanks for testing this. I think it is because every time 
`forceSpill` is called now, it will spill the map anyway. I will add a check to 
only spill the map if the map is not empty.
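
A hypothetical, self-contained sketch of that check (names and structure are illustrative only, not the actual `ExternalAppendOnlyMap` code):

```scala
import scala.collection.mutable

class SpillableBuffer[K, V] {
  private var currentMap = mutable.HashMap.empty[K, V]
  private var spills = Vector.empty[Map[K, V]]

  def insert(key: K, value: V): Unit = currentMap.update(key, value)

  /** Spill only when the in-memory map holds data; report whether memory was released. */
  def forceSpill(): Boolean = {
    if (currentMap.isEmpty) {
      false                               // nothing buffered: no empty spill, no memory freed
    } else {
      spills :+= currentMap.toMap         // stand-in for writing the map's contents to disk
      currentMap = mutable.HashMap.empty  // release the in-memory map
      true
    }
  }
}
```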





[GitHub] spark issue #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work for Parq...

2017-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16474
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71417/
Test PASSed.





[GitHub] spark issue #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work for Parq...

2017-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16474
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work for Parq...

2017-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16474
  
**[Test build #71417 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71417/testReport)** for PR 16474 at commit [`261e1b5`](https://github.com/apache/spark/commit/261e1b5f295ca35ed2635c75aa9f1b91d8805bd7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...

2017-01-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r96171888
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
 ---
@@ -316,30 +329,43 @@ class CastSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 val zts = sd + " 00:00:00"
 val sts = sd + " 00:00:02"
 val nts = sts + ".1"
-val ts = Timestamp.valueOf(nts)
-
-var c = Calendar.getInstance()
-c.set(2015, 2, 8, 2, 30, 0)
-checkEvaluation(cast(cast(new Timestamp(c.getTimeInMillis), 
StringType), TimestampType),
-  c.getTimeInMillis * 1000)
-c = Calendar.getInstance()
-c.set(2015, 10, 1, 2, 30, 0)
-checkEvaluation(cast(cast(new Timestamp(c.getTimeInMillis), 
StringType), TimestampType),
-  c.getTimeInMillis * 1000)
+val ts = withDefaultTimeZone(TimeZoneGMT)(Timestamp.valueOf(nts))
+
+for (tz <- ALL_TIMEZONES) {
+  val timeZoneId = Option(tz.getID)
--- End diff --

when will `timeZoneId` be None here?





[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...

2017-01-15 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r96171901
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala
 ---
@@ -41,13 +46,18 @@ object ReplaceExpressions extends Rule[LogicalPlan] {
  */
 object ComputeCurrentTime extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = {
-val dateExpr = CurrentDate()
+val currentDates = mutable.Map.empty[String, Literal]
 val timeExpr = CurrentTimestamp()
-val currentDate = Literal.create(dateExpr.eval(EmptyRow), 
dateExpr.dataType)
-val currentTime = Literal.create(timeExpr.eval(EmptyRow), 
timeExpr.dataType)
+val timestamp = timeExpr.eval(EmptyRow).asInstanceOf[Long]
+val currentTime = Literal.create(timestamp, timeExpr.dataType)
 
 plan transformAllExpressions {
-  case CurrentDate() => currentDate
+  case CurrentDate(Some(timeZoneId)) =>
+currentDates.getOrElseUpdate(timeZoneId, {
+  Literal.create(
+DateTimeUtils.millisToDays(timestamp / 1000L, 
TimeZone.getTimeZone(timeZoneId)),
+DateType)
+})
   case CurrentTimestamp() => currentTime
--- End diff --

timestamp is an absolute value -- timezone only matters when converting a 
timestamp into a displayable value (string) or date.
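
A small illustration of that point using plain java.time (no Spark API assumed): the same absolute instant maps to different local dates, and different display strings, depending on the zone used for rendering.

```scala
import java.time.{Instant, ZoneId}

val instant = Instant.ofEpochMilli(1484524800000L)  // 2017-01-16T00:00:00Z, one absolute point in time
val utcDate = instant.atZone(ZoneId.of("UTC")).toLocalDate                  // 2017-01-16
val laDate  = instant.atZone(ZoneId.of("America/Los_Angeles")).toLocalDate  // 2017-01-15
```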






[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...

2017-01-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r96171760
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
 ---
@@ -32,20 +34,20 @@ import org.apache.spark.unsafe.types.UTF8String
  */
 class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
 
-  private def cast(v: Any, targetType: DataType): Cast = {
+  private def cast(v: Any, targetType: DataType, timeZoneId: 
Option[String] = None): Cast = {
 v match {
-  case lit: Expression => Cast(lit, targetType)
-  case _ => Cast(Literal(v), targetType)
+  case lit: Expression => Cast(lit, targetType, timeZoneId)
+  case _ => Cast(Literal(v), targetType, timeZoneId)
 }
   }
 
   // expected cannot be null
-  private def checkCast(v: Any, expected: Any): Unit = {
-checkEvaluation(cast(v, Literal(expected).dataType), expected)
+  private def checkCast(v: Any, expected: Any, timeZoneId: Option[String] 
= None): Unit = {
--- End diff --

where do you call this method and set the `timeZoneId` parameter?





[GitHub] spark issue #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable work with ...

2017-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16593
  
**[Test build #71423 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71423/testReport)** for PR 16593 at commit [`7c09a7c`](https://github.com/apache/spark/commit/7c09a7ca1b948368cf67505e8bd19d0ae6e6142b).





[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...

2017-01-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r96171393
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -195,19 +231,26 @@ case class Hour(child: Expression) extends 
UnaryExpression with ImplicitCastInpu
   > SELECT _FUNC_('2009-07-30 12:58:59');
58
   """)
-case class Minute(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
+case class Minute(child: Expression, timeZoneId: Option[String] = None)
--- End diff --

Logically `Minute`/`Second` are not timezone-aware right?





[GitHub] spark issue #16591: [SPARK-19227][CORE] remove unused imports and outdated c...

2017-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16591
  
**[Test build #71422 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71422/testReport)** for PR 16591 at commit [`e98d9ab`](https://github.com/apache/spark/commit/e98d9abdb6b4073f8d75beee919081e1a4baf1dc).





[GitHub] spark issue #16591: [SPARK-19227][CORE] remove unused imports and outdated c...

2017-01-15 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16591
  
This work does not change any logic; it just deletes unused imports and fixes 
some code style issues. 
cc @srowen





[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...

2017-01-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r96171142
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala
 ---
@@ -41,13 +46,18 @@ object ReplaceExpressions extends Rule[LogicalPlan] {
  */
 object ComputeCurrentTime extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = {
-val dateExpr = CurrentDate()
+val currentDates = mutable.Map.empty[String, Literal]
 val timeExpr = CurrentTimestamp()
-val currentDate = Literal.create(dateExpr.eval(EmptyRow), 
dateExpr.dataType)
-val currentTime = Literal.create(timeExpr.eval(EmptyRow), 
timeExpr.dataType)
+val timestamp = timeExpr.eval(EmptyRow).asInstanceOf[Long]
+val currentTime = Literal.create(timestamp, timeExpr.dataType)
 
 plan transformAllExpressions {
-  case CurrentDate() => currentDate
+  case CurrentDate(Some(timeZoneId)) =>
+currentDates.getOrElseUpdate(timeZoneId, {
+  Literal.create(
+DateTimeUtils.millisToDays(timestamp / 1000L, 
TimeZone.getTimeZone(timeZoneId)),
+DateType)
+})
   case CurrentTimestamp() => currentTime
--- End diff --

why `CurrentTimestamp` is not timezone aware?
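
As a side note, the per-time-zone caching introduced in the quoted diff can be sketched in isolation as follows (illustrative names only, not the Catalyst classes):

```scala
import java.util.TimeZone
import scala.collection.mutable

val cachedDays = mutable.Map.empty[String, Int]

// Compute the "days since epoch" value at most once per time zone id.
def daysFor(timestampMs: Long, timeZoneId: String): Int =
  cachedDays.getOrElseUpdate(timeZoneId, {
    val offset = TimeZone.getTimeZone(timeZoneId).getOffset(timestampMs)
    ((timestampMs + offset) / 86400000L).toInt  // adequate for positive timestamps in this sketch
  })
```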





[GitHub] spark issue #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable work with ...

2017-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16593
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable work with ...

2017-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16593
  
**[Test build #71421 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71421/testReport)** for PR 16593 at commit [`6c31d01`](https://github.com/apache/spark/commit/6c31d017324b3c7f310103d2d4b5138bbef4b463).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable work with ...

2017-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16593
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71421/
Test FAILed.





[GitHub] spark issue #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable work with ...

2017-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16593
  
**[Test build #71421 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71421/testReport)** for PR 16593 at commit [`6c31d01`](https://github.com/apache/spark/commit/6c31d017324b3c7f310103d2d4b5138bbef4b463).





[GitHub] spark pull request #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable wor...

2017-01-15 Thread windpiger
GitHub user windpiger opened a pull request:

https://github.com/apache/spark/pull/16593

[SPARK-19153][SQL]DataFrameWriter.saveAsTable work with create partitioned 
table

## What changes were proposed in this pull request?

After [SPARK-19107](https://issues.apache.org/jira/browse/SPARK-19107), we can 
now treat Hive as a data source and create Hive tables with DataFrameWriter 
and Catalog. However, the support is not complete; there are still some cases 
we do not support.

This PR makes DataFrameWriter.saveAsTable work with the Hive format to create 
partitioned tables.
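
A minimal sketch of the call this PR aims to support (assuming a SparkSession `spark` built with Hive support; the table and column names are made up for illustration):

```scala
spark.range(100)
  .selectExpr("id", "id % 10 AS part")
  .write
  .format("hive")          // treat Hive as a data source, as enabled by SPARK-19107
  .partitionBy("part")     // the partitioned-table case this PR adds
  .saveAsTable("t_hive_partitioned")
```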

## How was this patch tested?
unit test added


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/windpiger/spark 
saveAsTableWithPartitionedTable

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16593.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16593


commit 6c31d017324b3c7f310103d2d4b5138bbef4b463
Author: windpiger 
Date:   2017-01-16T06:23:09Z

[SPARK-19153][SQL]DataFrameWriter.saveAsTable work with create partitioned 
table







[GitHub] spark issue #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work for Parq...

2017-01-15 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16474
  
LGTM





[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

2017-01-15 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16561#discussion_r96168755
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala 
---
@@ -28,22 +28,56 @@ import org.apache.spark.sql.catalyst.rules.Rule
  */
 
 /**
- * Make sure that a view's child plan produces the view's output 
attributes. We wrap the child
- * with a Project and add an alias for each output attribute. The 
attributes are resolved by
- * name. This should be only done after the batch of Resolution, because 
the view attributes are
- * not completely resolved during the batch of Resolution.
+ * Make sure that a view's child plan produces the view's output 
attributes. We try to wrap the
+ * child by:
+ * 1. Generate the `queryOutput` by:
+ *1.1. If the query column names are defined, map the column names to 
attributes in the child
+ * output by name;
+ *1.2. Else set the child output attributes to `queryOutput`.
+ * 2. Map the `queryQutput` to view output by index, if the corresponding 
attributes don't match,
+ *try to up cast and alias the attribute in `queryOutput` to the 
attribute in the view output.
+ * 3. Add a Project over the child, with the new output generated by the 
previous steps.
+ * If the view output doesn't have the same number of columns neither with 
the child output, nor
+ * with the query column names, throw an AnalysisException.
+ *
+ * This should be only done after the batch of Resolution, because the 
view attributes are not
+ * completely resolved during the batch of Resolution.
  */
 case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = plan 
resolveOperators {
-case v @ View(_, output, child) if child.resolved =>
+case v @ View(desc, output, child) if child.resolved =>
   val resolver = conf.resolver
-  val newOutput = output.map { attr =>
-val originAttr = findAttributeByName(attr.name, child.output, 
resolver)
-// The dataType of the output attributes may be not the same with 
that of the view output,
-// so we should cast the attribute to the dataType of the view 
output attribute. If the
-// cast can't perform, will throw an AnalysisException.
-Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = 
attr.exprId,
-  qualifier = attr.qualifier, explicitMetadata = 
Some(attr.metadata))
+  val queryColumnNames = desc.viewQueryColumnNames
+  // If the view output doesn't have the same number of columns with 
the child output and the
+  // query column names, throw an AnalysisException.
+  if (output.length != child.output.length && output.length != 
queryColumnNames.length) {
+throw new AnalysisException(
+  s"The view output ${output.mkString("[", ",", "]")} doesn't have 
the same number of " +
+s"columns with the child output ${child.output.mkString("[", 
",", "]")}")
+  }
+  // If the child output is the same with the view output, we don't 
need to generate the query
+  // output again.
+  val queryOutput = if (queryColumnNames.nonEmpty && output != 
child.output) {
--- End diff --

Oh I think that's better!





[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

2017-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16464
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71418/
Test PASSed.





[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

2017-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16464
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

2017-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16464
  
**[Test build #71418 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71418/testReport)** for PR 16464 at commit [`e133ee6`](https://github.com/apache/spark/commit/e133ee64961beaf10b7885ece76ded021ae5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

2017-01-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16561#discussion_r96168453
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala 
---
@@ -28,22 +28,56 @@ import org.apache.spark.sql.catalyst.rules.Rule
  */
 
 /**
- * Make sure that a view's child plan produces the view's output 
attributes. We wrap the child
- * with a Project and add an alias for each output attribute. The 
attributes are resolved by
- * name. This should be only done after the batch of Resolution, because 
the view attributes are
- * not completely resolved during the batch of Resolution.
+ * Make sure that a view's child plan produces the view's output 
attributes. We try to wrap the
+ * child by:
+ * 1. Generate the `queryOutput` by:
+ *1.1. If the query column names are defined, map the column names to 
attributes in the child
+ * output by name;
+ *1.2. Else set the child output attributes to `queryOutput`.
+ * 2. Map the `queryQutput` to view output by index, if the corresponding 
attributes don't match,
+ *try to up cast and alias the attribute in `queryOutput` to the 
attribute in the view output.
+ * 3. Add a Project over the child, with the new output generated by the 
previous steps.
+ * If the view output doesn't have the same number of columns neither with 
the child output, nor
+ * with the query column names, throw an AnalysisException.
+ *
+ * This should be only done after the batch of Resolution, because the 
view attributes are not
+ * completely resolved during the batch of Resolution.
  */
 case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = plan 
resolveOperators {
-case v @ View(_, output, child) if child.resolved =>
+case v @ View(desc, output, child) if child.resolved =>
   val resolver = conf.resolver
-  val newOutput = output.map { attr =>
-val originAttr = findAttributeByName(attr.name, child.output, 
resolver)
-// The dataType of the output attributes may be not the same with 
that of the view output,
-// so we should cast the attribute to the dataType of the view 
output attribute. If the
-// cast can't perform, will throw an AnalysisException.
-Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = 
attr.exprId,
-  qualifier = attr.qualifier, explicitMetadata = 
Some(attr.metadata))
+  val queryColumnNames = desc.viewQueryColumnNames
+  // If the view output doesn't have the same number of columns with 
the child output and the
+  // query column names, throw an AnalysisException.
+  if (output.length != child.output.length && output.length != 
queryColumnNames.length) {
+throw new AnalysisException(
+  s"The view output ${output.mkString("[", ",", "]")} doesn't have 
the same number of " +
+s"columns with the child output ${child.output.mkString("[", 
",", "]")}")
+  }
+  // If the child output is the same with the view output, we don't 
need to generate the query
+  // output again.
+  val queryOutput = if (queryColumnNames.nonEmpty && output != 
child.output) {
+desc.viewQueryColumnNames.map { colName =>
+  findAttributeByName(colName, child.output, resolver)
+}
+  } else {
+child.output
+  }
--- End diff --

how about
```
val queryOutput = if (queryColumnNames.nonEmpty) {
  if (output.length != queryColumnNames.length) throw ...
  desc.viewQueryColumnNames.map { colName =>
findAttributeByName(colName, child.output, resolver)
  }
} else {
  // For view created before Spark 2.1, the view text is already fully 
qualified, the plan output is view output.
  child.output
}
```





[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

2017-01-15 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16561#discussion_r96168416
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala 
---
@@ -28,22 +28,56 @@ import org.apache.spark.sql.catalyst.rules.Rule
  */
 
 /**
- * Make sure that a view's child plan produces the view's output 
attributes. We wrap the child
- * with a Project and add an alias for each output attribute. The 
attributes are resolved by
- * name. This should be only done after the batch of Resolution, because 
the view attributes are
- * not completely resolved during the batch of Resolution.
+ * Make sure that a view's child plan produces the view's output 
attributes. We try to wrap the
+ * child by:
+ * 1. Generate the `queryOutput` by:
+ *1.1. If the query column names are defined, map the column names to 
attributes in the child
+ * output by name;
+ *1.2. Else set the child output attributes to `queryOutput`.
+ * 2. Map the `queryQutput` to view output by index, if the corresponding 
attributes don't match,
+ *try to up cast and alias the attribute in `queryOutput` to the 
attribute in the view output.
+ * 3. Add a Project over the child, with the new output generated by the 
previous steps.
+ * If the view output doesn't have the same number of columns neither with 
the child output, nor
+ * with the query column names, throw an AnalysisException.
+ *
+ * This should be only done after the batch of Resolution, because the 
view attributes are not
+ * completely resolved during the batch of Resolution.
  */
 case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = plan 
resolveOperators {
-case v @ View(_, output, child) if child.resolved =>
+case v @ View(desc, output, child) if child.resolved =>
   val resolver = conf.resolver
-  val newOutput = output.map { attr =>
-val originAttr = findAttributeByName(attr.name, child.output, 
resolver)
-// The dataType of the output attributes may be not the same with 
that of the view output,
-// so we should cast the attribute to the dataType of the view 
output attribute. If the
-// cast can't perform, will throw an AnalysisException.
-Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = 
attr.exprId,
-  qualifier = attr.qualifier, explicitMetadata = 
Some(attr.metadata))
+  val queryColumnNames = desc.viewQueryColumnNames
+  // If the view output doesn't have the same number of columns with 
the child output and the
+  // query column names, throw an AnalysisException.
+  if (output.length != child.output.length && output.length != 
queryColumnNames.length) {
+throw new AnalysisException(
+  s"The view output ${output.mkString("[", ",", "]")} doesn't have 
the same number of " +
+s"columns with the child output ${child.output.mkString("[", 
",", "]")}")
+  }
+  // If the child output is the same with the view output, we don't 
need to generate the query
+  // output again.
+  val queryOutput = if (queryColumnNames.nonEmpty && output != 
child.output) {
--- End diff --

For a nested view, the inner view operator may have been resolved; in that 
case the output is the same as child.output.





[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

2017-01-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16561#discussion_r96168283
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala 
---
@@ -28,22 +28,56 @@ import org.apache.spark.sql.catalyst.rules.Rule
  */
 
 /**
- * Make sure that a view's child plan produces the view's output 
attributes. We wrap the child
- * with a Project and add an alias for each output attribute. The 
attributes are resolved by
- * name. This should be only done after the batch of Resolution, because 
the view attributes are
- * not completely resolved during the batch of Resolution.
+ * Make sure that a view's child plan produces the view's output 
attributes. We try to wrap the
+ * child by:
+ * 1. Generate the `queryOutput` by:
+ *1.1. If the query column names are defined, map the column names to 
attributes in the child
+ * output by name;
+ *1.2. Else set the child output attributes to `queryOutput`.
+ * 2. Map the `queryQutput` to view output by index, if the corresponding 
attributes don't match,
+ *try to up cast and alias the attribute in `queryOutput` to the 
attribute in the view output.
+ * 3. Add a Project over the child, with the new output generated by the 
previous steps.
+ * If the view output doesn't have the same number of columns neither with 
the child output, nor
+ * with the query column names, throw an AnalysisException.
+ *
+ * This should be only done after the batch of Resolution, because the 
view attributes are not
+ * completely resolved during the batch of Resolution.
  */
 case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = plan 
resolveOperators {
-case v @ View(_, output, child) if child.resolved =>
+case v @ View(desc, output, child) if child.resolved =>
   val resolver = conf.resolver
-  val newOutput = output.map { attr =>
-val originAttr = findAttributeByName(attr.name, child.output, 
resolver)
-// The dataType of the output attributes may be not the same with 
that of the view output,
-// so we should cast the attribute to the dataType of the view 
output attribute. If the
-// cast can't perform, will throw an AnalysisException.
-Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = 
attr.exprId,
-  qualifier = attr.qualifier, explicitMetadata = 
Some(attr.metadata))
+  val queryColumnNames = desc.viewQueryColumnNames
+  // If the view output doesn't have the same number of columns with 
the child output and the
+  // query column names, throw an AnalysisException.
+  if (output.length != child.output.length && output.length != 
queryColumnNames.length) {
+throw new AnalysisException(
+  s"The view output ${output.mkString("[", ",", "]")} doesn't have 
the same number of " +
+s"columns with the child output ${child.output.mkString("[", 
",", "]")}")
+  }
+  // If the child output is the same with the view output, we don't 
need to generate the query
+  // output again.
+  val queryOutput = if (queryColumnNames.nonEmpty && output != 
child.output) {
--- End diff --

`output != child.output` will always be true right?





[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

2017-01-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16561#discussion_r96168225
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala 
---
@@ -28,22 +28,56 @@ import org.apache.spark.sql.catalyst.rules.Rule
  */
 
 /**
- * Make sure that a view's child plan produces the view's output 
attributes. We wrap the child
- * with a Project and add an alias for each output attribute. The 
attributes are resolved by
- * name. This should be only done after the batch of Resolution, because 
the view attributes are
- * not completely resolved during the batch of Resolution.
+ * Make sure that a view's child plan produces the view's output 
attributes. We try to wrap the
+ * child by:
+ * 1. Generate the `queryOutput` by:
+ *1.1. If the query column names are defined, map the column names to 
attributes in the child
+ * output by name;
+ *1.2. Else set the child output attributes to `queryOutput`.
+ * 2. Map the `queryOutput` to the view output by index; if the corresponding attributes don't
+ *match, try to up-cast and alias the attribute in `queryOutput` to the attribute in the
+ *view output.
+ * 3. Add a Project over the child, with the new output generated by the 
previous steps.
+ * If the view output doesn't have the same number of columns as either the child output or
+ * the query column names, throw an AnalysisException.
+ *
+ * This should be only done after the batch of Resolution, because the 
view attributes are not
+ * completely resolved during the batch of Resolution.
  */
 case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = plan 
resolveOperators {
-case v @ View(_, output, child) if child.resolved =>
+case v @ View(desc, output, child) if child.resolved =>
   val resolver = conf.resolver
-  val newOutput = output.map { attr =>
-val originAttr = findAttributeByName(attr.name, child.output, 
resolver)
-// The dataType of the output attributes may be not the same with 
that of the view output,
-// so we should cast the attribute to the dataType of the view 
output attribute. If the
-// cast can't perform, will throw an AnalysisException.
-Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = 
attr.exprId,
-  qualifier = attr.qualifier, explicitMetadata = 
Some(attr.metadata))
+  val queryColumnNames = desc.viewQueryColumnNames
+  // If the view output doesn't have the same number of columns as either the child output or
+  // the query column names, throw an AnalysisException.
+  if (output.length != child.output.length && output.length != 
queryColumnNames.length) {
--- End diff --

This condition doesn't look very clear to me. How about `if (queryColumnNames.nonEmpty && output.length != queryColumnNames.length)`? When `queryColumnNames` is empty, it means this view was created prior to Spark 2.2, and we don't need to check anything.
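To make the suggestion concrete, the two guards side by side (a sketch; the parameter types are stand-ins for the `Seq[Attribute]` / `Seq[String]` used in the rule):

```scala
// Check as written in the diff: throw only when the view output matches neither the child
// output nor the stored query column names.
def shouldThrowAsWritten(output: Seq[Any], childOutput: Seq[Any], queryColumnNames: Seq[String]): Boolean =
  output.length != childOutput.length && output.length != queryColumnNames.length

// Suggested check: compare only against the stored query column names, and skip the check for
// views created before Spark 2.2, where `queryColumnNames` is empty.
def shouldThrowAsSuggested(output: Seq[Any], queryColumnNames: Seq[String]): Boolean =
  queryColumnNames.nonEmpty && output.length != queryColumnNames.length
```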


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

2017-01-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16561#discussion_r96167779
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala 
---
@@ -28,22 +28,56 @@ import org.apache.spark.sql.catalyst.rules.Rule
  */
 
 /**
- * Make sure that a view's child plan produces the view's output 
attributes. We wrap the child
- * with a Project and add an alias for each output attribute. The 
attributes are resolved by
- * name. This should be only done after the batch of Resolution, because 
the view attributes are
- * not completely resolved during the batch of Resolution.
+ * Make sure that a view's child plan produces the view's output 
attributes. We try to wrap the
+ * child by:
+ * 1. Generate the `queryOutput` by:
+ *1.1. If the query column names are defined, map the column names to 
attributes in the child
+ * output by name;
--- End diff --

should we mention that, this is mostly for `SELECT * ...`?
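A sketch of the `SELECT *` scenario this by-name mapping mostly exists for (hypothetical table and view names; assumes a `SparkSession` named `spark` and a Spark version that supports `ALTER TABLE ... ADD COLUMNS`):

```scala
spark.sql("CREATE TABLE t (a INT, b STRING) USING parquet")
spark.sql("CREATE VIEW v AS SELECT * FROM t")      // query column names stored as [a, b]
spark.sql("ALTER TABLE t ADD COLUMNS (c DOUBLE)")  // the child output becomes [a, b, c]
// When v is resolved again, the stored names [a, b] are matched against the child output by
// name, so the view still produces only a and b instead of picking up c by position.
spark.sql("SELECT * FROM v").printSchema()
```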


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

2017-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16561
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71416/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

2017-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16561
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

2017-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16561
  
**[Test build #71416 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71416/testReport)**
 for PR 16561 at commit 
[`16ec310`](https://github.com/apache/spark/commit/16ec310d96471579af916716f6c99df60fd20bc5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16387: [SPARK-18986][Core] ExternalAppendOnlyMap shouldn't fail...

2017-01-15 Thread samkum
Github user samkum commented on the issue:

https://github.com/apache/spark/pull/16387
  
I have tested this, but I made a very strange observation: GC frequency has increased many-fold, and the majority of the time is now spent in GC.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16592: [SPARK-19235] [SQL] [TESTS] Enable Test Cases in DDLSuit...

2017-01-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16592
  
cc @cloud-fan I think we really need to do this ASAP to improve the test case coverage of DDL commands; I noticed this while working on the PR: 
https://github.com/apache/spark/pull/16587 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-15 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/16344
  
Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16592: [SPARK-19235] [SQL] [TESTS] Enable Test Cases in ...

2017-01-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16592#discussion_r96166442
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -102,6 +76,198 @@ class DDLSuite extends QueryTest with SharedSQLContext 
with BeforeAndAfterEach {
   tracksPartitionsInCatalog = true)
   }
 
+  test("desc table for parquet data source table using in-memory catalog") 
{
+val tabName = "tab1"
+withTable(tabName) {
+  sql(s"CREATE TABLE $tabName(a int comment 'test') USING parquet ")
+
+  checkAnswer(
+sql(s"DESC $tabName").select("col_name", "data_type", "comment"),
+Row("a", "int", "test")
+  )
+}
+  }
+
+  test("select/insert into the managed table") {
+val tabName = "tbl"
+withTable(tabName) {
+  sql(s"CREATE TABLE $tabName (i INT, j STRING)")
+  val catalogTable =
+
spark.sessionState.catalog.getTableMetadata(TableIdentifier(tabName, 
Some("default")))
+  assert(catalogTable.tableType == CatalogTableType.MANAGED)
+
+  var message = intercept[AnalysisException] {
+sql(s"INSERT OVERWRITE TABLE $tabName SELECT 1, 'a'")
+  }.getMessage
+  assert(message.contains("Hive support is required to insert into the 
following tables"))
+  message = intercept[AnalysisException] {
+sql(s"SELECT * FROM $tabName")
+  }.getMessage
+  assert(message.contains("Hive support is required to select over the 
following tables"))
+}
+  }
+
+  test("select/insert into external table") {
+withTempDir { tempDir =>
+  val tabName = "tbl"
+  withTable(tabName) {
+sql(
+  s"""
+ |CREATE EXTERNAL TABLE $tabName (i INT, j STRING)
+ |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+ |LOCATION '$tempDir'
+   """.stripMargin)
+val catalogTable =
+  
spark.sessionState.catalog.getTableMetadata(TableIdentifier(tabName, 
Some("default")))
+assert(catalogTable.tableType == CatalogTableType.EXTERNAL)
+
+var message = intercept[AnalysisException] {
+  sql(s"INSERT OVERWRITE TABLE $tabName SELECT 1, 'a'")
+}.getMessage
+assert(message.contains("Hive support is required to insert into 
the following tables"))
+message = intercept[AnalysisException] {
+  sql(s"SELECT * FROM $tabName")
+}.getMessage
+assert(message.contains("Hive support is required to select over 
the following tables"))
+  }
+}
+  }
+
+  test("Create Hive Table As Select") {
+import testImplicits._
+withTable("t", "t1") {
+  var e = intercept[AnalysisException] {
+sql("CREATE TABLE t SELECT 1 as a, 1 as b")
+  }.getMessage
+  assert(e.contains("Hive support is required to use CREATE Hive TABLE 
AS SELECT"))
+
+  spark.range(1).select('id as 'a, 'id as 'b).write.saveAsTable("t1")
+  e = intercept[AnalysisException] {
+sql("CREATE TABLE t SELECT a, b from t1")
+  }.getMessage
+  assert(e.contains("Hive support is required to use CREATE Hive TABLE 
AS SELECT"))
+}
+  }
+
+  test("alter table: set location (datasource table)") {
+testSetLocation(isDatasourceTable = true)
+  }
+
+  test("alter table: set properties (datasource table)") {
+testSetProperties(isDatasourceTable = true)
+  }
+
+  test("alter table: unset properties (datasource table)") {
+testUnsetProperties(isDatasourceTable = true)
+  }
+
+  test("alter table: set serde (datasource table)") {
+testSetSerde(isDatasourceTable = true)
+  }
+
+  test("alter table: set serde partition (datasource table)") {
+testSetSerdePartition(isDatasourceTable = true)
+  }
+
+  test("alter table: change column (datasource table)") {
+testChangeColumn(isDatasourceTable = true)
+  }
+
+  test("alter table: add partition (datasource table)") {
+testAddPartitions(isDatasourceTable = true)
+  }
+
+  test("alter table: drop partition (datasource table)") {
+testDropPartitions(isDatasourceTable = true)
+  }
+
+  test("alter table: rename partition (datasource table)") {
+testRenamePartitions(isDatasourceTable = true)
+  }
+
+  test("drop table - data source table") {
+testDropTable(isDatasourceTable = true)
+  }
--- End diff --

The above 10 test cases currently run with `InMemoryCatalog` only. The reason is that `HiveExternalCatalog` does not allow users to change the table provider from `hive` to any other provider. In the
[GitHub] spark issue #16592: [SPARK-19235] [SQL] [TESTS] Enable Test Cases in DDLSuit...

2017-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16592
  
**[Test build #71420 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71420/testReport)**
 for PR 16592 at commit 
[`0133463`](https://github.com/apache/spark/commit/01334635c5433f0515beb92660b79796c97677d5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16592: [SPARK-19235] [SQL] [TESTS] Enable Test Cases in ...

2017-01-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16592#discussion_r96166347
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -102,6 +76,198 @@ class DDLSuite extends QueryTest with SharedSQLContext 
with BeforeAndAfterEach {
   tracksPartitionsInCatalog = true)
   }
 
+  test("desc table for parquet data source table using in-memory catalog") 
{
+val tabName = "tab1"
+withTable(tabName) {
+  sql(s"CREATE TABLE $tabName(a int comment 'test') USING parquet ")
+
+  checkAnswer(
+sql(s"DESC $tabName").select("col_name", "data_type", "comment"),
+Row("a", "int", "test")
+  )
+}
+  }
+
+  test("select/insert into the managed table") {
+val tabName = "tbl"
+withTable(tabName) {
+  sql(s"CREATE TABLE $tabName (i INT, j STRING)")
+  val catalogTable =
+
spark.sessionState.catalog.getTableMetadata(TableIdentifier(tabName, 
Some("default")))
+  assert(catalogTable.tableType == CatalogTableType.MANAGED)
+
+  var message = intercept[AnalysisException] {
+sql(s"INSERT OVERWRITE TABLE $tabName SELECT 1, 'a'")
+  }.getMessage
+  assert(message.contains("Hive support is required to insert into the 
following tables"))
+  message = intercept[AnalysisException] {
+sql(s"SELECT * FROM $tabName")
+  }.getMessage
+  assert(message.contains("Hive support is required to select over the 
following tables"))
+}
+  }
+
+  test("select/insert into external table") {
+withTempDir { tempDir =>
+  val tabName = "tbl"
+  withTable(tabName) {
+sql(
+  s"""
+ |CREATE EXTERNAL TABLE $tabName (i INT, j STRING)
+ |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+ |LOCATION '$tempDir'
+   """.stripMargin)
+val catalogTable =
+  
spark.sessionState.catalog.getTableMetadata(TableIdentifier(tabName, 
Some("default")))
+assert(catalogTable.tableType == CatalogTableType.EXTERNAL)
+
+var message = intercept[AnalysisException] {
+  sql(s"INSERT OVERWRITE TABLE $tabName SELECT 1, 'a'")
+}.getMessage
+assert(message.contains("Hive support is required to insert into 
the following tables"))
+message = intercept[AnalysisException] {
+  sql(s"SELECT * FROM $tabName")
+}.getMessage
+assert(message.contains("Hive support is required to select over 
the following tables"))
+  }
+}
+  }
+
+  test("Create Hive Table As Select") {
+import testImplicits._
+withTable("t", "t1") {
+  var e = intercept[AnalysisException] {
+sql("CREATE TABLE t SELECT 1 as a, 1 as b")
+  }.getMessage
+  assert(e.contains("Hive support is required to use CREATE Hive TABLE 
AS SELECT"))
+
+  spark.range(1).select('id as 'a, 'id as 'b).write.saveAsTable("t1")
+  e = intercept[AnalysisException] {
+sql("CREATE TABLE t SELECT a, b from t1")
+  }.getMessage
+  assert(e.contains("Hive support is required to use CREATE Hive TABLE 
AS SELECT"))
+}
+  }
--- End diff --

The above four cases are copied from the existing ones in DDLSuites. These test cases only make sense for InMemoryCatalog. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16592: [SPARK-19235] [SQL] [TESTS] Enable Test Cases in ...

2017-01-15 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/16592

[SPARK-19235] [SQL] [TESTS] Enable Test Cases in DDLSuite with Hive 
Metastore

### What changes were proposed in this pull request?
So far, the test cases in DDLSuite only verify the behaviors of InMemoryCatalog. That means they do not cover the scenarios that use HiveExternalCatalog. Thus, we need to improve the existing test suite so that these cases also run against the Hive metastore.

When porting these test cases, a bug in `SET LOCATION` was found: `path` is not set when the location is not changed. 

After this PR, a few changes are made, as summarized below,
- `DDLSuite` becomes an abstract class. Both `InMemoryCatalogedDDLSuite` 
and `HiveCatalogedDDLSuite` extend it. `InMemoryCatalogedDDLSuite` is using 
`InMemoryCatalog`. `HiveCatalogedDDLSuite` is using `HiveExternalCatalog`.
- `InMemoryCatalogedDDLSuite` contains all the existing test cases in 
`DDLSuite`. 
- `HiveCatalogedDDLSuite` contains a subset of `DDLSuite`. The following 
test cases are excluded:

1. The following test cases only make sense for `InMemoryCatalog`:
```
  test("desc table for parquet data source table using in-memory catalog")
  test("select/insert into the managed table")
  test("select/insert into external table")
  test("Create Hive Table As Select")
```

2. The following test cases cannot be ported because we are unable to alter the table provider when using the Hive metastore. In future PRs we need to improve these test cases so that altering the table provider is not needed:
```
  test("alter table: set location (datasource table)")
  test("alter table: set properties (datasource table)")
  test("alter table: unset properties (datasource table)")
  test("alter table: set serde (datasource table)")
  test("alter table: set serde partition (datasource table)")
  test("alter table: change column (datasource table)")
  test("alter table: add partition (datasource table)")
  test("alter table: drop partition (datasource table)")
  test("alter table: rename partition (datasource table)")
  test("drop table - data source table")
```

**TODO** : in the future PRs, we need to remove `HiveDDLSuite` and move the 
test cases to either `DDLSuite`,  `InMemoryCatalogedDDLSuite` or 
`HiveCatalogedDDLSuite`. 
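To summarize the layout described above in code form (a sketch only; parent traits and bodies are elided or assumed):

```scala
abstract class DDLSuite extends QueryTest /* + shared test utilities */ {
  // test cases shared by both catalog implementations
}

class InMemoryCatalogedDDLSuite extends DDLSuite /* backed by InMemoryCatalog */ {
  // all existing DDLSuite tests, plus the InMemoryCatalog-only cases listed above
}

class HiveCatalogedDDLSuite extends DDLSuite /* backed by HiveExternalCatalog */ {
  // the shared tests, minus the excluded cases listed above
}
```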

### How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark refactorDDLSuite

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16592.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16592


commit 01334635c5433f0515beb92660b79796c97677d5
Author: gatorsmile 
Date:   2017-01-16T04:52:55Z

fix




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...

2017-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12064
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71419/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...

2017-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12064
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...

2017-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12064
  
**[Test build #71419 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71419/testReport)**
 for PR 12064 at commit 
[`fe2c424`](https://github.com/apache/spark/commit/fe2c424a4aa08f5f50387069db26f179e50395d4).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-15 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/16344
  
add to whitelist.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper ...

2017-01-15 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16566#discussion_r96165457
  
--- Diff: R/pkg/R/mllib_clustering.R ---
@@ -38,6 +45,146 @@ setClass("KMeansModel", representation(jobj = "jobj"))
 #' @note LDAModel since 2.1.0
 setClass("LDAModel", representation(jobj = "jobj"))
 
+#' Bisecting K-Means Clustering Model
+#'
+#' Fits a bisecting k-means clustering model against a Spark DataFrame.
+#' Users can call \code{summary} to print a summary of the fitted model, 
\code{predict} to make
+#' predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
+#'
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
+#'operators are supported, including '~', '.', ':', '+', 
and '-'.
+#'Note that the response variable of formula is empty in 
spark.bisectingKmeans.
+#' @param k the desired number of leaf clusters. Must be > 1.
+#'  The actual number could be smaller if there are no divisible 
leaf clusters.
+#' @param maxIter maximum iteration number.
+#' @param minDivisibleClusterSize The minimum number of points (if greater 
than or equal to 1.0)
+#'or the minimum proportion of points (if 
less than 1.0) of a divisible cluster.
+#' @param seed the random seed.
+#' @param ... additional argument(s) passed to the method.
+#' @return \code{spark.bisectingKmeans} returns a fitted bisecting k-means 
model.
+#' @rdname spark.bisectingKmeans
+#' @aliases spark.bisectingKmeans,SparkDataFrame,formula-method
+#' @name spark.bisectingKmeans
+#' @export
+#' @examples
+#' \dontrun{
+#' sparkR.session()
+#' data(iris)
+#' df <- createDataFrame(iris)
+#' model <- spark.bisectingKmeans(df, Sepal_Length ~ Sepal_Width, k = 4)
+#' summary(model)
+#'
+#' # fitted values on training data
+#' fitted <- predict(model, df)
+#' head(select(fitted, "Sepal_Length", "prediction"))
+#'
+#' # save fitted model to input path
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#'
+#' # can also read back the saved model and print
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#' }
+#' @note spark.bisectingKmeans since 2.2.0
+#' @seealso \link{predict}, \link{read.ml}, \link{write.ml}
+setMethod("spark.bisectingKmeans", signature(data = "SparkDataFrame", 
formula = "formula"),
+  function(data, formula, k = 4, maxIter = 20, 
minDivisibleClusterSize = 1.0, seed = NULL) {
+formula <- paste0(deparse(formula), collapse = "")
+if (!is.null(seed)) {
+  seed <- as.character(as.integer(seed))
+}
+jobj <- 
callJStatic("org.apache.spark.ml.r.BisectingKMeansWrapper", "fit",
+data@sdf, formula, as.integer(k), 
as.integer(maxIter),
+as.numeric(minDivisibleClusterSize), seed)
+new("BisectingKMeansModel", jobj = jobj)
+  })
+
+#  Get the summary of a bisecting k-means model
+
+#' @param object a fitted bisecting k-means model.
+#' @return \code{summary} returns summary information of the fitted model, 
which is a list.
+#' The list includes the model's \code{k} (number of cluster 
centers),
+#' \code{coefficients} (model cluster centers),
+#' \code{size} (number of data points in each cluster), and 
\code{cluster}
+#' (cluster centers of the transformed data).
+#' @rdname spark.bisectingKmeans
+#' @export
+#' @note summary(BisectingKMeansModel) since 2.2.0
+setMethod("summary", signature(object = "BisectingKMeansModel"),
+  function(object) {
+jobj <- object@jobj
+is.loaded <- callJMethod(jobj, "isLoaded")
+features <- callJMethod(jobj, "features")
+coefficients <- callJMethod(jobj, "coefficients")
+k <- callJMethod(jobj, "k")
+size <- callJMethod(jobj, "size")
+coefficients <- t(matrix(coefficients, ncol = k))
+colnames(coefficients) <- unlist(features)
+rownames(coefficients) <- 1:k
+cluster <- if (is.loaded) {
+  NULL
+} else {
+  dataFrame(callJMethod(jobj, "cluster"))
+}
+list(k = k, coefficients = coefficients, size = size,
+cluster = cluster, is.loaded = is.loaded)
+  })
+
+#  Predicted values based on a bisecting k-means model
+
+#' @param newData a SparkDataFrame for testing.
+#' @return \code{predict} returns the predicted 

[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-15 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/16344
  
ok to test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...

2017-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12064
  
**[Test build #71419 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71419/testReport)**
 for PR 12064 at commit 
[`fe2c424`](https://github.com/apache/spark/commit/fe2c424a4aa08f5f50387069db26f179e50395d4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

2017-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16464
  
**[Test build #71418 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71418/testReport)**
 for PR 16464 at commit 
[`e133ee6`](https://github.com/apache/spark/commit/e133ee64961beaf10b7885ece76ded021ae5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16588: [SPARK-19092] [SQL] [Backport-2.1] Save() API of DataFra...

2017-01-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16588
  
Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16588: [SPARK-19092] [SQL] [Backport-2.1] Save() API of ...

2017-01-15 Thread gatorsmile
Github user gatorsmile closed the pull request at:

https://github.com/apache/spark/pull/16588


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work for Parq...

2017-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16474
  
**[Test build #71417 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71417/testReport)**
 for PR 16474 at commit 
[`261e1b5`](https://github.com/apache/spark/commit/261e1b5f295ca35ed2635c75aa9f1b91d8805bd7).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

2017-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16561
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

2017-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16561
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71415/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

2017-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16561
  
**[Test build #71415 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71415/testReport)**
 for PR 16561 at commit 
[`d6537a5`](https://github.com/apache/spark/commit/d6537a5d66ba88cb827a6fc84a7ec8be79af5277).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16591: [SPARK-19227][CORE] remove ununsed imports and outdated ...

2017-01-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16591
  
I know this is acceptable, judging from the history. However, I have seen a lot of unused imports across the code base. I think it'd be nicer if the same kind of instances were checked at least within the same package. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16591: [SPARK-19227][CORE] remove ununsed imports and outdated ...

2017-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16591
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16591: [SPARK-19227][CORE] remove ununsed imports and outdated ...

2017-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16591
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71414/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16591: [SPARK-19227][CORE] remove ununsed imports and outdated ...

2017-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16591
  
**[Test build #71414 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71414/testReport)**
 for PR 16591 at commit 
[`22405d1`](https://github.com/apache/spark/commit/22405d19a1ca6944162ddc330ea2dfc5a7c4638c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16545: [SPARK-19166][SQL]rename from InsertIntoHadoopFsRelation...

2017-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16545
  
Build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16545: [SPARK-19166][SQL]rename from InsertIntoHadoopFsRelation...

2017-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16545
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71413/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16545: [SPARK-19166][SQL]rename from InsertIntoHadoopFsRelation...

2017-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16545
  
**[Test build #71413 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71413/testReport)**
 for PR 16545 at commit 
[`6d1defb`](https://github.com/apache/spark/commit/6d1defb57407a12c6bf6020ed18cb2249328e435).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work f...

2017-01-15 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16474#discussion_r96162649
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala
 ---
@@ -135,11 +135,21 @@ class FileScanRDD(
   try {
 if (ignoreCorruptFiles) {
   currentIterator = new NextIterator[Object] {
-private val internalIter = readFunction(currentFile)
+private val internalIter = {
+  try {
+// The readFunction may read files before consuming 
the iterator.
+// E.g., vectorized Parquet reader.
+readFunction(currentFile)
+  } catch {
+case e @(_: RuntimeException | _: IOException) =>
--- End diff --

Yeah, I raised this concern in the PR description too.

One problem is that the error messages vary across data sources. Listing all of them here doesn't look like a good idea.
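For context, a stripped-down sketch of the pattern the diff adds (stand-in names; not the actual `FileScanRDD` code, and the logging call is replaced by a plain println):

```scala
import java.io.IOException

// Creating the per-file iterator can already touch the file (e.g. the vectorized Parquet
// reader reads the footer eagerly), so corrupt-file errors must be caught here as well as
// while consuming the iterator.
def safeIterator[T](file: String, ignoreCorruptFiles: Boolean)
                   (readFile: String => Iterator[T]): Iterator[T] = {
  if (!ignoreCorruptFiles) {
    readFile(file)
  } else {
    try {
      readFile(file)
    } catch {
      case e @ (_: RuntimeException | _: IOException) =>
        println(s"Skipping corrupt file $file: $e") // stand-in for Spark's logWarning
        Iterator.empty
    }
  }
}
// Failures while consuming the returned iterator would need the same treatment; elided here.
```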


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work f...

2017-01-15 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16474#discussion_r96162528
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala
 ---
@@ -135,11 +135,21 @@ class FileScanRDD(
   try {
 if (ignoreCorruptFiles) {
   currentIterator = new NextIterator[Object] {
-private val internalIter = readFunction(currentFile)
+private val internalIter = {
+  try {
+// The readFunction may read files before consuming 
the iterator.
+// E.g., vectorized Parquet reader.
+readFunction(currentFile)
--- End diff --

I think it is hard to guarantee this because `readFunction` comes from the individual data source. Even if we can modify the current data sources, we may not be able to prevent other data sources from doing this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...

2017-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15505
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71412/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...

2017-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15505
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...

2017-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15505
  
**[Test build #71412 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71412/testReport)**
 for PR 15505 at commit 
[`2d9569e`](https://github.com/apache/spark/commit/2d9569ea6cfeb09837897d4290f7605bc229c645).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

2017-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16561
  
**[Test build #71416 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71416/testReport)**
 for PR 16561 at commit 
[`16ec310`](https://github.com/apache/spark/commit/16ec310d96471579af916716f6c99df60fd20bc5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

2017-01-15 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16561#discussion_r96159390
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 ---
@@ -198,9 +203,44 @@ case class CatalogTable(
 
   /**
* Return the default database name we use to resolve a view, should be 
None if the CatalogTable
-   * is not a View.
+   * is not a View or created by older versions of Spark(before 2.2.0).
+   */
+  def viewDefaultDatabase: Option[String] = 
properties.get(VIEW_DEFAULT_DATABASE)
+
+  /**
+   * Return the output column names of the query that creates a view, the 
column names are used to
+   * resolve a view, should be None if the CatalogTable is not a View or 
created by older versions
+   * of Spark(before 2.2.0).
+   */
+  def viewQueryColumnNames: Seq[String] = {
+for {
+  numCols <- properties.get(VIEW_QUERY_OUTPUT_COLUMN_NUM).toSeq
--- End diff --

It is needed to generate the correct output.
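A rough sketch of the property round trip that `viewQueryColumnNames` relies on; the key names below ("view.query.out.numCols", "view.query.out.col.N") are illustrative stand-ins, not the actual constants used by `CatalogTable`:

```scala
val NumColsKey = "view.query.out.numCols"
def colKey(i: Int): String = s"view.query.out.col.$i"

// Written at view creation time: the number of query output columns plus one property per name.
def writeQueryColumnNames(names: Seq[String]): Map[String, String] =
  Map(NumColsKey -> names.length.toString) ++
    names.zipWithIndex.map { case (n, i) => colKey(i) -> n }

// Read back when the view is resolved, in the same order the names were written.
def readQueryColumnNames(properties: Map[String, String]): Seq[String] =
  for {
    numCols <- properties.get(NumColsKey).toSeq
    i <- 0 until numCols.toInt
  } yield properties.getOrElse(colKey(i), sys.error(s"missing column name $i"))

println(readQueryColumnNames(writeQueryColumnNames(Seq("a", "b"))))  // List(a, b)
```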


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16209: [WIP][SPARK-10849][SQL] Adds option to the JDBC data sou...

2017-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16209
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71410/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16517: [SPARK-18243][SQL] Port Hive writing to use FileF...

2017-01-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16517#discussion_r96158823
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
 ---
@@ -99,7 +99,7 @@ class HadoopMapReduceCommitProtocol(jobId: String, path: 
String)
   }
 
   private def getFilename(taskContext: TaskAttemptContext, ext: String): 
String = {
-// The file name looks like 
part-r-0-2dd664f9-d2c4-4ffe-878f-c6c70c1fb0cb_3.gz.parquet
+// The file name looks like 
part-0-2dd664f9-d2c4-4ffe-878f-c6c70c1fb0cb_3.gz.parquet
--- End diff --

OK, I should update this string; `c000` is the files count, which was added recently.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16209: [WIP][SPARK-10849][SQL] Adds option to the JDBC data sou...

2017-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16209
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16209: [WIP][SPARK-10849][SQL] Adds option to the JDBC data sou...

2017-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16209
  
**[Test build #71410 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71410/testReport)**
 for PR 16209 at commit 
[`ff71bac`](https://github.com/apache/spark/commit/ff71bac8162778f99e8985476498010c22268926).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16517: [SPARK-18243][SQL] Port Hive writing to use FileF...

2017-01-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16517#discussion_r96158740
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
 ---
@@ -69,34 +69,31 @@ import org.apache.spark.util.SerializableJobConf
  *  {{{
  *  Map('a' -> Some('1'), 'b' -> None)
  *  }}}.
- * @param child the logical plan representing data to write to.
+ * @param query the logical plan representing data to write to.
  * @param overwrite overwrite existing table or partitions.
  * @param ifNotExists If true, only write if the table or partition does 
not exist.
  */
 case class InsertIntoHiveTable(
 table: MetastoreRelation,
 partition: Map[String, Option[String]],
-child: SparkPlan,
+query: LogicalPlan,
 overwrite: Boolean,
-ifNotExists: Boolean) extends UnaryExecNode {
+ifNotExists: Boolean) extends RunnableCommand {
 
-  @transient private val sessionState = 
sqlContext.sessionState.asInstanceOf[HiveSessionState]
-  @transient private val externalCatalog = 
sqlContext.sharedState.externalCatalog
+  override protected def innerChildren: Seq[LogicalPlan] = query :: Nil
--- End diff --

We can't. We only replace `InsertIntoTable` with `InsertIntoHiveTable` in the planner.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16588: [SPARK-19092] [SQL] [Backport-2.1] Save() API of DataFra...

2017-01-15 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16588
  
thanks, merging to 2.1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16587: [SPARK-19229] [SQL] Disallow Creating Hive Source...

2017-01-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16587#discussion_r96158395
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala
 ---
@@ -278,7 +277,7 @@ abstract class ExternalCatalogSuite extends 
SparkFunSuite with BeforeAndAfterEac
   schema = new StructType()
 .add("HelLo", "int", nullable = false)
 .add("WoRLd", "int", nullable = true),
-  provider = Some("hive"),
+  provider = Some("parquet"),
--- End diff --

shall we also use `defaultProvider` here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16587: [SPARK-19229] [SQL] Disallow Creating Hive Source...

2017-01-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16587#discussion_r96158356
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala ---
@@ -186,6 +186,11 @@ class InMemoryCatalog(
 val db = tableDefinition.identifier.database.get
 requireDbExists(db)
 val table = tableDefinition.identifier.table
+if (tableDefinition.provider.isDefined && 
tableDefinition.provider.get.toLowerCase == "hive") {
--- End diff --

shall we put this in `HiveOnlyCheck`?
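
For reference, a rough sketch of what such an analysis-time check could look like (illustrative only; the real `HiveOnlyCheck` rule may differ in shape and message):

```
import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.datasources.CreateTable

// Illustrative sketch only: fail at analysis time when CREATE TABLE uses the
// "hive" provider, instead of checking inside InMemoryCatalog.createTable.
object HiveProviderCheck extends (LogicalPlan => Unit) {
  override def apply(plan: LogicalPlan): Unit = plan.foreach {
    case CreateTable(tableDesc, _, _)
        if tableDesc.provider.exists(_.equalsIgnoreCase("hive")) =>
      throw new AnalysisException(
        "Hive support is required to CREATE Hive TABLE (AS SELECT)")
    case _ => // other plans pass the check
  }
}
```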





[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

2017-01-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16561#discussion_r96158153
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
@@ -28,22 +28,56 @@ import org.apache.spark.sql.catalyst.rules.Rule
  */
 
 /**
- * Make sure that a view's child plan produces the view's output 
attributes. We wrap the child
- * with a Project and add an alias for each output attribute. The 
attributes are resolved by
- * name. This should be only done after the batch of Resolution, because 
the view attributes are
- * not completely resolved during the batch of Resolution.
+ * Make sure that a view's child plan produces the view's output 
attributes. We try to wrap the
+ * child by:
+ * 1. Generate the `queryOutput` by:
+ *1.1. If the query column names are defined, map the column names to 
attributes in the child
+ * output by name;
+ *1.2. Else set the child output attributes to `queryOutput`.
+ * 2. Map the `queryQutput` to view output by index, if the corresponding 
attributes don't match,
+ *try to up cast and alias the attribute in `queryOutput` to the 
attribute in the view output.
+ * 3. Add a Project over the child, with the new output generated by the 
previous steps.
+ * If the view output doesn't have the same number of columns neither with 
the child output, nor
+ * with the query column names, throw an AnalysisException.
+ *
+ * This should be only done after the batch of Resolution, because the 
view attributes are not
+ * completely resolved during the batch of Resolution.
  */
 case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = plan 
resolveOperators {
-case v @ View(_, output, child) if child.resolved =>
+case v @ View(desc, output, child) if child.resolved =>
   val resolver = conf.resolver
-  val newOutput = output.map { attr =>
-val originAttr = findAttributeByName(attr.name, child.output, 
resolver)
-// The dataType of the output attributes may be not the same with 
that of the view output,
-// so we should cast the attribute to the dataType of the view 
output attribute. If the
-// cast can't perform, will throw an AnalysisException.
-Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = 
attr.exprId,
-  qualifier = attr.qualifier, explicitMetadata = 
Some(attr.metadata))
+  val queryColumnNames = desc.viewQueryColumnNames
+  // If the view output doesn't have the same number of columns either 
with the child output,
+  // or with the query column names, throw an AnalysisException.
+  if (output.length != child.output.length && output.length != 
queryColumnNames.length) {
--- End diff --

the comment says `or`, but the code uses `&&`?





[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

2017-01-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16561#discussion_r96157931
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala ---
@@ -680,21 +700,70 @@ class SQLViewSuite extends QueryTest with 
SQLTestUtils with TestHiveSingleton {
 }
   }
 
-  test("correctly handle type casting between view output and child 
output") {
+  test("resolve a view with custom column names") {
 withTable("testTable") {
+  spark.range(1, 10).selectExpr("id", "id + 1 
id1").write.saveAsTable("testTable")
   withView("testView") {
-spark.range(1, 
10).toDF("id1").write.format("json").saveAsTable("testTable")
-sql("CREATE VIEW testView AS SELECT * FROM testTable")
+val testView = CatalogTable(
+  identifier = TableIdentifier("testView"),
+  tableType = CatalogTableType.VIEW,
+  storage = CatalogStorageFormat.empty,
+  schema = new StructType().add("x", "long").add("y", "long"),
+  viewOriginalText = Some("SELECT * FROM testTable"),
+  viewText = Some("SELECT * FROM testTable"),
+  properties = Map(CatalogTable.VIEW_DEFAULT_DATABASE -> "default",
+CatalogTable.VIEW_QUERY_OUTPUT_COLUMN_NUM -> "2",
+s"${CatalogTable.VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX}0" -> 
"id",
+s"${CatalogTable.VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX}1" -> 
"id1"))
+hiveContext.sessionState.catalog.createTable(testView, 
ignoreIfExists = false)
+
+// Correctly resolve a view with custom column names.
+checkAnswer(sql("SELECT * FROM testView ORDER BY x"), (1 to 
9).map(i => Row(i, i + 1)))
--- End diff --

can we use `select x, y ...` to test that the custom column names really work?
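
For example, a sketch of the suggested assertion (exact form up to the author):

```
// Sketch: select the custom column names explicitly, so the test fails if
// `x` and `y` are not actually resolvable on the view.
checkAnswer(
  sql("SELECT x, y FROM testView ORDER BY x"),
  (1 to 9).map(i => Row(i, i + 1)))
```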





[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

2017-01-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16561#discussion_r96157834
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala ---
@@ -680,21 +700,70 @@ class SQLViewSuite extends QueryTest with 
SQLTestUtils with TestHiveSingleton {
 }
   }
 
-  test("correctly handle type casting between view output and child 
output") {
+  test("resolve a view with custom column names") {
 withTable("testTable") {
+  spark.range(1, 10).selectExpr("id", "id + 1 
id1").write.saveAsTable("testTable")
   withView("testView") {
-spark.range(1, 
10).toDF("id1").write.format("json").saveAsTable("testTable")
-sql("CREATE VIEW testView AS SELECT * FROM testTable")
+val testView = CatalogTable(
+  identifier = TableIdentifier("testView"),
+  tableType = CatalogTableType.VIEW,
+  storage = CatalogStorageFormat.empty,
+  schema = new StructType().add("x", "long").add("y", "long"),
--- End diff --

let's think about how we will persist a view when we have custom column names. Ideally we will have a logical plan representing the view, a SQL statement for the view query, and a `Seq[String]` for the custom column names.

1. call `plan.schema` to get the view schema, zip it with the custom column names to get the final schema, and save that. Then use `plan.schema.map(_.name)` to generate the `VIEW_QUERY_OUTPUT_COLUMN_NAME` entries in the table properties.
2. call `plan.schema` to get the view schema and save it as the final schema. Then use the custom column names to generate the `VIEW_QUERY_OUTPUT_COLUMN_NAME` entries in the table properties.

Personally I think option 2 is more natural, what do you think?
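
A minimal sketch of option 2 (`plan`, `viewSql`, `customColumnNames` and `catalog` are placeholders for whatever the create-view command has at hand):

```
// Option 2 (sketch): persist plan.schema unchanged as the view schema, and
// record the user-specified column names in the table properties, so view
// resolution can later map them back to the view output by index.
val rawView = CatalogTable(
  identifier = TableIdentifier("testView"),
  tableType = CatalogTableType.VIEW,
  storage = CatalogStorageFormat.empty,
  schema = plan.schema,            // final schema comes straight from the analyzed plan
  viewText = Some(viewSql))
catalog.createTable(rawView.withQueryColumnNames(customColumnNames), ignoreIfExists = false)
```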





[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

2017-01-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16561#discussion_r96157384
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -198,9 +203,44 @@ case class CatalogTable(
 
   /**
* Return the default database name we use to resolve a view, should be 
None if the CatalogTable
-   * is not a View.
+   * is not a View or created by older versions of Spark(before 2.2.0).
+   */
+  def viewDefaultDatabase: Option[String] = 
properties.get(VIEW_DEFAULT_DATABASE)
+
+  /**
+   * Return the output column names of the query that creates a view, the 
column names are used to
+   * resolve a view, should be None if the CatalogTable is not a View or 
created by older versions
+   * of Spark(before 2.2.0).
+   */
+  def viewQueryColumnNames: Seq[String] = {
+for {
+  numCols <- properties.get(VIEW_QUERY_OUTPUT_COLUMN_NUM).toSeq
+  index <- 0 until numCols.toInt
+} yield properties.getOrElse(
+  s"$VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX$index",
+  throw new AnalysisException("Corrupted view query output column 
names in catalog: " +
+s"$numCols parts expected, but part $index is missing.")
+)
+  }
+
+  /**
+   * Insert/Update the view query output column names in `properties`.
*/
-  def viewDefaultDatabase: Option[String] = 
properties.get(CatalogTable.VIEW_DEFAULT_DATABASE)
+  def withQueryColumnNames(columns: Seq[String]): CatalogTable = {
+val props = new mutable.HashMap[String, String]
--- End diff --

let's follow the existing code for partition columns
```
properties.put(DATASOURCE_SCHEMA_NUMPARTCOLS, partitionColumns.length.toString)
partitionColumns.zipWithIndex.foreach { case (partCol, index) =>
  properties.put(s"$DATASOURCE_SCHEMA_PARTCOL_PREFIX$index", partCol)
}
```
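
Applied to `withQueryColumnNames`, that pattern would look roughly like this (sketch only, reusing the constant names from this PR):

```
// Sketch: mirror the partition-column bookkeeping for the view query output columns.
def withQueryColumnNames(columns: Seq[String]): CatalogTable = {
  val props = new mutable.HashMap[String, String]
  if (columns.nonEmpty) {
    props.put(VIEW_QUERY_OUTPUT_COLUMN_NUM, columns.length.toString)
    columns.zipWithIndex.foreach { case (colName, index) =>
      props.put(s"$VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX$index", colName)
    }
  }
  copy(properties = properties ++ props)
}
```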





[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

2017-01-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16561#discussion_r96157302
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -198,9 +203,44 @@ case class CatalogTable(
 
   /**
* Return the default database name we use to resolve a view, should be 
None if the CatalogTable
-   * is not a View.
+   * is not a View or created by older versions of Spark(before 2.2.0).
+   */
+  def viewDefaultDatabase: Option[String] = 
properties.get(VIEW_DEFAULT_DATABASE)
+
+  /**
+   * Return the output column names of the query that creates a view, the 
column names are used to
+   * resolve a view, should be None if the CatalogTable is not a View or 
created by older versions
+   * of Spark(before 2.2.0).
+   */
+  def viewQueryColumnNames: Seq[String] = {
+for {
+  numCols <- properties.get(VIEW_QUERY_OUTPUT_COLUMN_NUM).toSeq
--- End diff --

`.toSeq` is not needed





[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

2017-01-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16561#discussion_r96157263
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -254,6 +294,9 @@ case class CatalogTable(
 
 object CatalogTable {
   val VIEW_DEFAULT_DATABASE = "view.default.database"
+  val VIEW_QUERY_OUTPUT_PREFIX = "view.query.out."
+  val VIEW_QUERY_OUTPUT_COLUMN_NUM = VIEW_QUERY_OUTPUT_PREFIX + "numCols"
--- End diff --

nit: `xxx_NUM_COLUMNS`





[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

2017-01-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16561#discussion_r96157156
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -198,9 +203,44 @@ case class CatalogTable(
 
   /**
* Return the default database name we use to resolve a view, should be 
None if the CatalogTable
-   * is not a View.
+   * is not a View or created by older versions of Spark(before 2.2.0).
+   */
+  def viewDefaultDatabase: Option[String] = 
properties.get(VIEW_DEFAULT_DATABASE)
+
+  /**
+   * Return the output column names of the query that creates a view, the 
column names are used to
+   * resolve a view, should be None if the CatalogTable is not a View or 
created by older versions
--- End diff --

should be `Nil`





[GitHub] spark issue #16528: [SPARK-19148][SQL] do not expose the external table conc...

2017-01-15 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16528
  
cc @yhuai for final sign-off





[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

2017-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16561
  
**[Test build #71415 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71415/testReport)**
 for PR 16561 at commit 
[`d6537a5`](https://github.com/apache/spark/commit/d6537a5d66ba88cb827a6fc84a7ec8be79af5277).





[GitHub] spark issue #16591: [SPARK-19227][CORE] remove unused imports and outdated ...

2017-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16591
  
**[Test build #71414 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71414/testReport)**
 for PR 16591 at commit 
[`22405d1`](https://github.com/apache/spark/commit/22405d19a1ca6944162ddc330ea2dfc5a7c4638c).





[GitHub] spark pull request #16587: [SPARK-19229] [SQL] Disallow Creating Hive Source...

2017-01-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16587#discussion_r96155690
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -1692,20 +1678,27 @@ class DDLSuite extends QueryTest with 
SharedSQLContext with BeforeAndAfterEach {
 
   test("truncate table - external table, temporary table, view (not 
allowed)") {
 import testImplicits._
-val path = Utils.createTempDir().getAbsolutePath
-(1 to 10).map { i => (i, i) }.toDF("a", 
"b").createTempView("my_temp_tab")
-sql(s"CREATE EXTERNAL TABLE my_ext_tab LOCATION '$path'")
-sql(s"CREATE VIEW my_view AS SELECT 1")
-intercept[NoSuchTableException] {
-  sql("TRUNCATE TABLE my_temp_tab")
+withTempPath { tempDir =>
+  withTable("my_ext_tab") {
+(("a", "b") :: Nil).toDF().write.parquet(tempDir.getCanonicalPath)
+(1 to 10).map { i => (i, i) }.toDF("a", 
"b").createTempView("my_temp_tab")
+sql(s"CREATE TABLE my_ext_tab using parquet LOCATION '$tempDir'")
--- End diff --

Ah, this is why you asked me in 
https://github.com/apache/spark/pull/16586#discussion_r96142347. I just ran a 
test for this to help.

```
 - truncate table - external table, temporary table, view (not allowed) *** 
FAILED *** (188 milliseconds)
   org.apache.spark.sql.AnalysisException: Path does not exist: 
file:/C:projectsspark   arget   
mpspark-9e70280d-56dc-4063-8f40-8e62fec18394;
   at 
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:382)
   at 
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
   at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
   at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
```

Maybe it'd be okay to just use `toURI` if this test is not supposed to test Windows-style paths.
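
i.e., roughly (a sketch of the `toURI` variant):

```
// Sketch: pass a file: URI instead of the raw OS path so it also parses on Windows.
sql(s"CREATE TABLE my_ext_tab using parquet LOCATION '${tempDir.toURI}'")
```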





[GitHub] spark pull request #16591: [SPARK-19227][CORE] remove unused imports and ou...

2017-01-15 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/16591

[SPARK-19227][CORE] remove unused imports and outdated comments in `org.apache.spark.internal.config.ConfigEntry`

## What changes were proposed in this pull request?
remove unused imports and outdated comments in `org.apache.spark.internal.config.ConfigEntry`

## How was this patch tested?
Existing unit tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark SPARK-19227

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16591.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16591


commit 22405d19a1ca6944162ddc330ea2dfc5a7c4638c
Author: uncleGen 
Date:   2017-01-16T01:47:53Z

SPARK-19227: remove unused imports and outdated comments in `org.apache.spark.internal.config.ConfigEntry`






