date:20161206

[GitHub] spark pull request #16154: [SPARK-17822] [R] Make JVMObjectTracker a member ...

2016-12-06 Thread shivaram

Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/16154#discussion_r91018557
  
--- Diff: core/src/main/scala/org/apache/spark/api/r/JVMObjectTracker.scala 
---
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.api.r
+
+import java.util.concurrent.atomic.AtomicInteger
+import java.util.concurrent.ConcurrentHashMap
+
+/** JVM object ID wrapper */
+private[r] case class JVMObjectId(id: String)
--- End diff --

Is there a reason we should use Strings instead of Long - Just wondering as 
this will also add to memory overhead


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16154: [SPARK-17822] [R] Make JVMObjectTracker a member ...

2016-12-06 Thread shivaram

Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/16154#discussion_r91145714
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala 
---
@@ -247,7 +247,7 @@ private[sql] object SQLUtils extends Logging {
 dataType match {
   case 's' =>
 // Read StructType for DataFrame
-val fields = SerDe.readList(dis).asInstanceOf[Array[Object]]
+val fields = SerDe.readList(dis, jvmObjectTracker = 
null).asInstanceOf[Array[Object]]
--- End diff --

The two changes below are functions used in serializing and deserializing 
StructType from R to JVM (when calling collect or parallelize)

Again I think since StructTypes themselves can't contain other JVM 
references I think its fine, but I'm less sure about this. (i.e what happens 
when we say embed an array into the structType). 

Another option here is to figure out which RBackend is calling this 
function and use its JVMObjectTracker ? Do we want to support cases where there 
are multiple active RBackends ? If not we can denote an `activeRBackend` and 
then use its jvm object tracker in these places ? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16154: [SPARK-17822] [R] Make JVMObjectTracker a member ...

2016-12-06 Thread shivaram

Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/16154#discussion_r91136204
  
--- Diff: core/src/main/scala/org/apache/spark/api/r/RBackendHandler.scala 
---
@@ -143,12 +142,8 @@ private[r] class RBackendHandler(server: RBackend)
   val cls = if (isStatic) {
 Utils.classForName(objId)
   } else {
-JVMObjectTracker.get(objId) match {
-  case None => throw new IllegalArgumentException("Object not 
found " + objId)
-  case Some(o) =>
-obj = o
-o.getClass
-}
+obj = server.jvmObjectTracker.get(JVMObjectId(objId))
--- End diff --

This is minor but the exception now thrown will be a `NullPointerException` 
in place of what it was before  ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16154: [SPARK-17822] [R] Make JVMObjectTracker a member ...

2016-12-06 Thread shivaram

Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/16154#discussion_r91140911
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala 
---
@@ -158,7 +158,7 @@ private[sql] object SQLUtils extends Logging {
 val dis = new DataInputStream(bis)
 val num = SerDe.readInt(dis)
 Row.fromSeq((0 until num).map { i =>
-  doConversion(SerDe.readObject(dis), schema.fields(i).dataType)
+  doConversion(SerDe.readObject(dis, jvmObjectTracker = null), 
schema.fields(i).dataType)
--- End diff --

I think this one is safe because this is called on the output from the R 
UDF and the R UDF is assumed to return native types (i.e. no references to 
other JVM objects)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15923: [SPARK-4105] retry the fetch or stage if shuffle block i...

2016-12-06 Thread davies

Github user davies commented on the issue:

https://github.com/apache/spark/pull/15923
  
ping @JoshRosen 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16174: [SPARK-18741][STREAMING] Reuse or clean-up SparkC...

2016-12-06 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16174#discussion_r91141519
  
--- Diff: 
streaming/src/test/java/org/apache/spark/streaming/LocalJavaStreamingContext.java
 ---
@@ -28,6 +29,7 @@
 
 @Before
 public void setUp() {
+SparkContext$.MODULE$.stopActiveContext();
--- End diff --

Can't you just call `SparkContext.stopActiveContext()` from Java? I see 
static methods in the bytecode for things like `jarOfObject`.

The line is also indented incorrectly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16161: [SPARK-18717][SQL] Make code generation for Scala Map wo...

2016-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16161
  
**[Test build #69741 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69741/consoleFull)**
 for PR 16161 at commit 
[`f325e75`](https://github.com/apache/spark/commit/f325e75ab61ac903a605090b756461f4b50b64ff).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16174: [SPARK-18741][STREAMING] Reuse or clean-up SparkC...

2016-12-06 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16174#discussion_r91144141
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -2350,6 +2350,16 @@ object SparkContext extends Logging {
 }
   }
 
+  private[spark] def getActiveContext(): Option[SparkContext] = {
+SPARK_CONTEXT_CONSTRUCTOR_LOCK.synchronized {
+  Option(activeContext.get())
+}
+  }
+
+  private[spark] def stopActiveContext(): Unit = {
--- End diff --

I don't know. I'm not a big fan of the approach you're taking here: calling 
this method before running tests. That feels like a sledgehammer to fix flaky 
tests. I think it would be better for test code to be more careful about 
cleaning after itself. Kinda like most tests in spark-core use 
`LocalSparkContext` to more or less automatically do that without the need for 
these methods.

The `ReuseableSparkContext` trait you have is a step in that direction. If 
you make sure all needed streaming tests are using it, and keep this state 
within that class, I think it would be a better change.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16068: [SPARK-18637][SQL]Stateful UDF should be consider...

2016-12-06 Thread zhzhan

Github user zhzhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16068#discussion_r91142141
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala 
---
@@ -487,6 +488,52 @@ class HiveUDFSuite extends QueryTest with 
TestHiveSingleton with SQLTestUtils {
 assert(count4 == 1)
 sql("DROP TABLE parquet_tmp")
   }
+
+  test("Hive Stateful UDF") {
+withUserDefinedFunction("statefulUDF" -> true, "statelessUDF" -> true) 
{
+  sql(s"CREATE TEMPORARY FUNCTION statefulUDF AS 
'${classOf[StatefulUDF].getName}'")
+  sql(s"CREATE TEMPORARY FUNCTION statelessUDF AS 
'${classOf[StatelessUDF].getName}'")
+  withTempView("inputTable") {
--- End diff --

The test cannot resolve the function and throw the error if I use:
  test("Hive Stateful UDF") {
withUserDefinedFunction("statefulUDF" -> true, "statelessUDF" -> true) {
  sql(s"CREATE TEMPORARY FUNCTION statefulUDF AS 
'${classOf[StatefulUDF].getName}'")
  sql(s"CREATE TEMPORARY FUNCTION statelessUDF AS 
'${classOf[StatelessUDF].getName}'")
  val testData = spark.range(10).repartition(1)
  println(s"start session: $spark")
  val m = testData.select("statefulUDF() as s, ")
  checkAnswer(testData.select("statefulUDF() as s").agg(max($"s")), 
Row(10))
  ...

Do I miss anything, or is it a bug? I will investigate why this happens.

Failed to analyze query: org.apache.spark.sql.AnalysisException: cannot 
resolve '`statefulUDF() as s`' given input columns: [id];;
'Project ['statefulUDF() as s]
+- Repartition 1, true
   +- Range (0, 10, step=1, splits=Some(1))

org.apache.spark.sql.AnalysisException: cannot resolve '`statefulUDF() as 
s`' given input columns: [id];;
'Project ['statefulUDF() as s]
+- Repartition 1, true
   +- Range (0, 10, step=1, splits=Some(1))

at 
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:77)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:314)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:314)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:313)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionUp$1(QueryPlan.scala:282)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:292)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2$1.apply(QueryPlan.scala:296)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:296)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$7.apply(QueryPlan.scala:301)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:192)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:301)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:74)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:67)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:132)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:67)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:57)
at 
org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:48)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16175: [SPARK-17460][SQL]check if statistics.sizeInBytes >=0 in...

2016-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16175
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16175: [SPARK-17460][SQL]check if statistics.sizeInBytes >=0 in...

2016-12-06 Thread huaxingao

Github user huaxingao commented on the issue:

https://github.com/apache/spark/pull/16175
  
@gatorsmile Could you please take a look when you have time?  Thanks a lot!!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16174: [SPARK-18741][STREAMING] Reuse or clean-up SparkContext ...

2016-12-06 Thread zsxwing

Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/16174
  
> This is resource intensive and it can lead to unneeded test failures 
(flakyness) when park.driver.allowMultipleContexts is disabled (this happens 
when the order of tests changes).

Do you mean the recent failures? It was fixed by #16105.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16175: [SPARK-17460][SQL]check if statistics.sizeInBytes...

2016-12-06 Thread huaxingao

GitHub user huaxingao opened a pull request:

https://github.com/apache/spark/pull/16175

[SPARK-17460][SQL]check if statistics.sizeInBytes >=0 in canBroadcast

## What changes were proposed in this pull request?

1. In SparkStrategies.canBroadcast, I will add the check   
plan.statistics.sizeInBytes >= 0
2. In LocalRelations.statistics, when calculate the statistics, I will 
change the size to BigInt so it won't overflow. 

  
## How was this patch tested?

I will add a test case to make sure the statistics.sizeInBytes won't 
overflow. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/huaxingao/spark spark-17460

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16175.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16175


commit 9ad6725975e36e513ce89f54582409bd3e9d2cfe
Author: Huaxin Gao 
Date:   2016-12-06T18:22:38Z

[SPARK-17460][SQL]check if statistics.sizeInBytes >=0 in canBroadcast




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16128: [SPARK-18671][SS][TEST] Added tests to ensure stability ...

2016-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16128
  
**[Test build #69740 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69740/consoleFull)**
 for PR 16128 at commit 
[`26a86d6`](https://github.com/apache/spark/commit/26a86d64f2f492094960b19332cabd7457f95e61).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16128: [SPARK-18671][SS][TEST] Added tests to ensure stability ...

2016-12-06 Thread zsxwing

Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/16128
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16172: [SPARK-18740] Log spark.app.name in driver logs

2016-12-06 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16172


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16172: [SPARK-18740] Log spark.app.name in driver logs

2016-12-06 Thread vanzin

Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/16172
  
LGTM. Merging to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16014: [SPARK-18590][SPARKR] build R source package when...

2016-12-06 Thread shivaram

Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/16014#discussion_r91135457
  
--- Diff: dev/create-release/release-build.sh ---
@@ -221,14 +235,13 @@ if [[ "$1" == "package" ]]; then
 
   # We increment the Zinc port each time to avoid OOM's and other 
craziness if multiple builds
   # share the same Zinc server.
-  # Make R source package only once. (--r)
   FLAGS="-Psparkr -Phive -Phive-thriftserver -Pyarn -Pmesos"
   make_binary_release "hadoop2.3" "-Phadoop-2.3 $FLAGS" "3033" &
   make_binary_release "hadoop2.4" "-Phadoop-2.4 $FLAGS" "3034" &
   make_binary_release "hadoop2.6" "-Phadoop-2.6 $FLAGS" "3035" &
   make_binary_release "hadoop2.7" "-Phadoop-2.7 $FLAGS" "3036" "withpip" &
   make_binary_release "hadoop2.4-without-hive" "-Psparkr -Phadoop-2.4 
-Pyarn -Pmesos" "3037" &
-  make_binary_release "without-hadoop" "--r -Psparkr -Phadoop-provided 
-Pyarn -Pmesos" "3038" &
+  make_binary_release "without-hadoop" "-Psparkr -Phadoop-provided -Pyarn 
-Pmesos" "3038" "withr" &
--- End diff --

Any specific reason to use the `without-hadoop` build for the R package ? 
Just wondering if this will affect the users in any fashion 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16168: [SPARK-18209][SQL] More robust view canonicalization wit...

2016-12-06 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16168
  
also cc @cloud-fan 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16168: [SPARK-18209][SQL] More robust view canonicalization wit...

2016-12-06 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16168
  
@jiangxb1987 We need a low level design about the changes you made here, 
especially document the issue you hit and the solution you choose. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16138: [WIP][SPARK-16609] Add to_date/to_timestamp with format ...

2016-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16138
  
**[Test build #69739 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69739/consoleFull)**
 for PR 16138 at commit 
[`760650c`](https://github.com/apache/spark/commit/760650c312cb3e514f85404bf307ec4b41f00bb9).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16138: [WIP][SPARK-16609] Add to_date/to_timestamp with ...

2016-12-06 Thread anabranch

Github user anabranch commented on a diff in the pull request:

https://github.com/apache/spark/pull/16138#discussion_r91134199
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -1047,6 +1047,53 @@ case class ToDate(child: Expression) extends 
UnaryExpression with ImplicitCastIn
 }
 
 /**
+ * Parses a column with a format to a timestamp.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(timestamp_str, fmt) - Parses the `timestamp` expression 
" +
+"with the `fmt` expression.",
+  extended = """
+Examples:
+  > SELECT _FUNC_('2016-12-31', '-MM-dd');
+   2016-12-31
+ """)
+case class ParseToDate(left: Expression, right: Expression, child: 
Expression)
+  extends RuntimeReplaceable with ImplicitCastInputTypes {
+
+  def this(left: Expression, format: Expression) = {
+this(left, format, ToDate(new ParseToTimestamp(left, format)))
+  }
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(StringType, 
StringType)
+  override def dataType: DataType = DateType
+  override def flatArguments: Iterator[Any] = Iterator(left, right)
+  override def sql: String = s"$prettyName(${left.sql}, ${right.sql})"
+}
+
+/**
+ * Parses a column with a format to a timestamp.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(timestamp, fmt) - Parses the `timestamp` expression with 
the `fmt` expression.",
+  extended = """
+Examples:
+  > SELECT _FUNC_('2016-12-31', '-MM-dd');
+   2016-12-31 00:00:00.0
+  """)
+case class ParseToTimestamp(left: Expression, right: Expression, child: 
Expression)
+  extends RuntimeReplaceable with ImplicitCastInputTypes {
+
+  def this(left: Expression, format: Expression) = {
+  this(left, format, Cast(UnixTimestamp(left, format), TimestampType))
--- End diff --

This is returning the date in millis (1970) as opposed to seconds. But if 
you use the `UnixTimestamp` expression via the function, then cast, it will 
work just fine.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16138: [WIP][SPARK-16609] Add to_date/to_timestamp with ...

2016-12-06 Thread anabranch

Github user anabranch commented on a diff in the pull request:

https://github.com/apache/spark/pull/16138#discussion_r91134077
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -1047,6 +1047,53 @@ case class ToDate(child: Expression) extends 
UnaryExpression with ImplicitCastIn
 }
 
 /**
+ * Parses a column with a format to a timestamp.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(timestamp_str, fmt) - Parses the `timestamp` expression 
" +
+"with the `fmt` expression.",
+  extended = """
+Examples:
+  > SELECT _FUNC_('2016-12-31', '-MM-dd');
+   2016-12-31
+ """)
+case class ParseToDate(left: Expression, right: Expression, child: 
Expression)
+  extends RuntimeReplaceable with ImplicitCastInputTypes {
+
+  def this(left: Expression, format: Expression) = {
+this(left, format, ToDate(new ParseToTimestamp(left, format)))
+  }
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(StringType, 
StringType)
+  override def dataType: DataType = DateType
+  override def flatArguments: Iterator[Any] = Iterator(left, right)
+  override def sql: String = s"$prettyName(${left.sql}, ${right.sql})"
+}
+
+/**
+ * Parses a column with a format to a timestamp.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(timestamp, fmt) - Parses the `timestamp` expression with 
the `fmt` expression.",
+  extended = """
+Examples:
+  > SELECT _FUNC_('2016-12-31', '-MM-dd');
+   2016-12-31 00:00:00.0
+  """)
+case class ParseToTimestamp(left: Expression, right: Expression, child: 
Expression)
+  extends RuntimeReplaceable with ImplicitCastInputTypes {
+
+  def this(left: Expression, format: Expression) = {
+  this(left, format, Cast(UnixTimestamp(left, format), TimestampType))
--- End diff --

For reasons unbeknownst to me, Cast(, TimestampType) is not equivalent 
to cast("timestamp")


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91133855
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -557,9 +557,12 @@ class SessionCatalog(
* If the relation is a view, the relation will be wrapped in a 
[[SubqueryAlias]] which will
* track the name of the view.
*/
-  def lookupRelation(name: TableIdentifier, alias: Option[String] = None): 
LogicalPlan = {
+  def lookupRelation(
+  name: TableIdentifier,
+  alias: Option[String] = None,
+  databaseHint: Option[String] = None): LogicalPlan = {
--- End diff --

Need a document for these parameters.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91133644
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -509,32 +509,42 @@ class Analyzer(
* Replaces [[UnresolvedRelation]]s with concrete relations from the 
catalog.
*/
   object ResolveRelations extends Rule[LogicalPlan] {
-private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan 
= {
+private def lookupTableFromCatalog(
+u: UnresolvedRelation, db: Option[String] = None): LogicalPlan = {
   try {
-catalog.lookupRelation(u.tableIdentifier, u.alias)
+catalog.lookupRelation(u.tableIdentifier, u.alias, db)
   } catch {
 case _: NoSuchTableException =>
   u.failAnalysis(s"Table or view not found: ${u.tableName}")
   }
 }
 
-def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-  case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) 
if child.resolved =>
-i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
-  case u: UnresolvedRelation =>
-val table = u.tableIdentifier
-if (table.database.isDefined && conf.runSQLonFile && 
!catalog.isTemporaryTable(table) &&
+def apply(plan: LogicalPlan): LogicalPlan = {
+  var currentDatabase = catalog.getCurrentDatabase
+  plan resolveOperators {
+case i@InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) 
if child.resolved =>
+  i.copy(table = 
EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+case u: UnresolvedRelation =>
+  val table = u.tableIdentifier
+  if (table.database.isDefined && conf.runSQLonFile && 
!catalog.isTemporaryTable(table) &&
 (!catalog.databaseExists(table.database.get) || 
!catalog.tableExists(table))) {
-  // If the database part is specified, and we support running SQL 
directly on files, and
-  // it's not a temporary view, and the table does not exist, then 
let's just return the
-  // original UnresolvedRelation. It is possible we are matching a 
query like "select *
-  // from parquet.`/path/to/query`". The plan will get resolved 
later.
-  // Note that we are testing (!db_exists || !table_exists) 
because the catalog throws
-  // an exception from tableExists if the database does not exist.
-  u
-} else {
-  lookupTableFromCatalog(u)
-}
+// If the database part is specified, and we support running 
SQL directly on files, and
+// it's not a temporary view, and the table does not exist, 
then let's just return the
+// original UnresolvedRelation. It is possible we are matching 
a query like "select *
+// from parquet.`/path/to/query`". The plan will get resolved 
later.
+// Note that we are testing (!db_exists || !table_exists) 
because the catalog throws
+// an exception from tableExists if the database does not 
exist.
+u
+  } else {
+val logicalPlan = lookupTableFromCatalog(u, 
Some(currentDatabase))
+currentDatabase = logicalPlan.collectFirst {
--- End diff --

Please avoid using `var` here. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91133302
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -509,32 +509,42 @@ class Analyzer(
* Replaces [[UnresolvedRelation]]s with concrete relations from the 
catalog.
*/
   object ResolveRelations extends Rule[LogicalPlan] {
-private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan 
= {
+private def lookupTableFromCatalog(
+u: UnresolvedRelation, db: Option[String] = None): LogicalPlan = {
   try {
-catalog.lookupRelation(u.tableIdentifier, u.alias)
+catalog.lookupRelation(u.tableIdentifier, u.alias, db)
   } catch {
 case _: NoSuchTableException =>
   u.failAnalysis(s"Table or view not found: ${u.tableName}")
   }
 }
 
-def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-  case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) 
if child.resolved =>
-i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
-  case u: UnresolvedRelation =>
-val table = u.tableIdentifier
-if (table.database.isDefined && conf.runSQLonFile && 
!catalog.isTemporaryTable(table) &&
+def apply(plan: LogicalPlan): LogicalPlan = {
+  var currentDatabase = catalog.getCurrentDatabase
--- End diff --

Why we need to do it here? This should be moved to 
https://github.com/jiangxb1987/spark/blob/434009ef7fcfc552314dca35dc9395bb98f5e9c6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L539


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16138: [WIP][SPARK-16609] Add to_date/to_timestamp with format ...

2016-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16138
  
**[Test build #69738 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69738/consoleFull)**
 for PR 16138 at commit 
[`f1e075c`](https://github.com/apache/spark/commit/f1e075c49ba66f81e9cc40b9d96030f6afe6d9a2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91131928
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 ---
@@ -221,11 +223,14 @@ case class CatalogTable(
 s"Last Access: ${new Date(lastAccessTime).toString}",
 s"Type: ${tableType.name}",
 if (schema.nonEmpty) s"Schema: ${schema.mkString("[", ", ", "]")}" 
else "",
+if (originalSchema.isDefined) s"Original Schema: 
${originalSchema.mkString("[", ", ", "]")}"
+else "",
--- End diff --

If one line is not enough, you can create a variable and then use the 
variable here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91131668
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -557,9 +557,12 @@ class SessionCatalog(
* If the relation is a view, the relation will be wrapped in a 
[[SubqueryAlias]] which will
* track the name of the view.
*/
-  def lookupRelation(name: TableIdentifier, alias: Option[String] = None): 
LogicalPlan = {
+  def lookupRelation(
--- End diff --

General suggestion: when you made a change like this, please update the 
above function descriptions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2016-12-06 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15819
  
Could you post the layout of that staging folder?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16138: [WIP][SPARK-16609] Add to_date/to_timestamp with format ...

2016-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16138
  
**[Test build #69737 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69737/consoleFull)**
 for PR 16138 at commit 
[`8e8339c`](https://github.com/apache/spark/commit/8e8339cf0dc522616d7eba8f4b7ca3b35dd4229b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16000: [SPARK-18537][Web UI]Add a REST api to spark stre...

2016-12-06 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16000#discussion_r91123740
  
--- Diff: 
streaming/src/main/java/org/apache/spark/streaming/status/api/v1/BatchStatus.java
 ---
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.status.api.v1;
--- End diff --

Pretty much.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16174: [SPARK-18741][STREAMING] Reuse or clean-up SparkContext ...

2016-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16174
  
**[Test build #69736 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69736/consoleFull)**
 for PR 16174 at commit 
[`04ce488`](https://github.com/apache/spark/commit/04ce488e9d6927f3cb172e854928c78ec094fb8d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16172: [SPARK-18740] Log spark.app.name in driver logs

2016-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16172
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16172: [SPARK-18740] Log spark.app.name in driver logs

2016-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16172
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69731/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16172: [SPARK-18740] Log spark.app.name in driver logs

2016-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16172
  
**[Test build #69731 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69731/consoleFull)**
 for PR 16172 at commit 
[`5401e64`](https://github.com/apache/spark/commit/5401e64552325353104bc7addf180a5a37f35047).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16092: [SPARK-18662] Move resource managers to separate directo...

2016-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16092
  
**[Test build #69735 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69735/consoleFull)**
 for PR 16092 at commit 
[`3e78591`](https://github.com/apache/spark/commit/3e7859134b9deeec2a160c9e60d0ad69eee719f1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16174: [SPARK-18741][STREAMING] Reuse or clean-up SparkC...

2016-12-06 Thread hvanhovell

GitHub user hvanhovell opened a pull request:

https://github.com/apache/spark/pull/16174

[SPARK-18741][STREAMING] Reuse or clean-up SparkContext in streaming tests

## What changes were proposed in this pull request?
Tests in Spark Streaming currently create a `SparkContext` for each test, 
and sometimes do not clean-up afterwards. This is resource intensive and it can 
lead to unneeded test failures (flakyness) when 
park.driver.allowMultipleContexts is disabled (this happens when the order of 
tests changes).

This PR makes most test re-use a `SparkContext`. For tests that have to 
create a new context (for instance `CheckpointSuite`) we make sure that no 
active `SparkContext` exists before the test, and that the created 
`SparkContext` is cleaned up afterwards. I have refactored the `TestSuiteBase` 
into two classes `TestSuiteBase` and a parent class `ReusableSparkContext`; 
this to make `SparkContext` management relatively straightforward for most 
tests.

I have done a simple very unscientific benchmark (n=1), and streaming tests 
with this patch took 212 seconds and streaming tests without this patch took 
252 seconds.

## How was this patch tested?
The patch only covers test code.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hvanhovell/spark SPARK-18741

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16174.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16174


commit 04ce488e9d6927f3cb172e854928c78ec094fb8d
Author: Herman van Hovell 
Date:   2016-12-06T16:29:23Z

Reuse or clean-up SparkContext in streaming tests




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16174: [SPARK-18741][STREAMING] Reuse or clean-up SparkContext ...

2016-12-06 Thread hvanhovell

Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/16174
  
cc @tdas @zsxwing 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16092: [SPARK-18662] Move resource managers to separate directo...

2016-12-06 Thread foxish

Github user foxish commented on the issue:

https://github.com/apache/spark/pull/16092
  
@vanzin Done. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Us...

2016-12-06 Thread koeninger

Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16006#discussion_r91118893
  
--- Diff: 
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala
 ---
@@ -143,9 +147,14 @@ private[spark] class DirectKafkaInputDStream[K, V](
 
 lagPerPartition.map { case (tp, lag) =>
   val maxRateLimitPerPartition = ppc.maxRatePerPartition(tp)
-  val backpressureRate = Math.round(lag / totalLag.toFloat * rate)
-  tp -> (if (maxRateLimitPerPartition > 0) {
-Math.min(backpressureRate, maxRateLimitPerPartition)} else 
backpressureRate)
+  val effectiveRate = if (rate >= 0) rate else 
backpressureInitialRate
+  val estimateRate = Math.round(lag / totalLag.toFloat * 
effectiveRate)
+  val backpressureRate =
+if (estimateRate > maxRateLimitPerPartition && 
maxRateLimitPerPartition > 0) {
+  maxRateLimitPerPartition
+}
+else estimateRate
--- End diff --

http://spark.apache.org/contributing.html
use braces around the else clause unless it's all on one line


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Us...

2016-12-06 Thread koeninger

Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16006#discussion_r91118458
  
--- Diff: 
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala
 ---
@@ -67,6 +67,10 @@ private[spark] class DirectKafkaInputDStream[K, V](
 ekp
   }
 
+  val backpressureInitialRate: Long =
+
_ssc.sparkContext.conf.getLong("spark.streaming.backpressure.initialRate",
+  
_ssc.sparkContext.conf.getDouble("spark.streaming.backpressure.pid.minRate", 
100).round)
--- End diff --

Shouldn't these be using ssc instead of _ssc for consistency?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16173: [SPARK-18742][CORE]readd spark.broadcast.factory conf to...

2016-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16173
  
**[Test build #69734 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69734/consoleFull)**
 for PR 16173 at commit 
[`f23becc`](https://github.com/apache/spark/commit/f23beccf1d484ea5121f6cf605b92090400808ad).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16173: [SPARK-18742][CORE]readd spark.broadcast.factory conf to...

2016-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16173
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16173: [SPARK-18742][CORE]readd spark.broadcast.factory conf to...

2016-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16173
  
**[Test build #69733 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69733/consoleFull)**
 for PR 16173 at commit 
[`1b4d4b6`](https://github.com/apache/spark/commit/1b4d4b632523f33b38e6997dddbba954f324ea59).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16173: [SPARK-18742][CORE]readd spark.broadcast.factory conf to...

2016-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16173
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69733/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16173: [SPARK-18742][CORE]readd spark.broadcast.factory conf to...

2016-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16173
  
**[Test build #69733 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69733/consoleFull)**
 for PR 16173 at commit 
[`1b4d4b6`](https://github.com/apache/spark/commit/1b4d4b632523f33b38e6997dddbba954f324ea59).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread nsyca

Github user nsyca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r9663
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala 
---
@@ -448,19 +476,105 @@ class SQLViewSuite extends QueryTest with 
SQLTestUtils with TestHiveSingleton {
 }
   }
 
+  test("Using view after change the origin view") {
+withView("v1", "v2") {
+  sql("CREATE VIEW v1 AS SELECT id FROM jt")
+  sql("CREATE VIEW v2 AS SELECT * FROM v1")
+  withTable("jt2", "jt3") {
+// Don't change the view schema
+val df2 = (1 until 10).map(i => i + i).toDF("id")
+df2.write.format("json").saveAsTable("jt2")
+sql("ALTER VIEW v1 AS SELECT * FROM jt2")
+// the view v2 should have the same output with the view v1
+checkAnswer(sql("SELECT * FROM v2"), sql("SELECT * FROM v1"))
+// the view v2 should have the same output with the table jt2
+checkAnswer(sql("SELECT * FROM v2"), sql("SELECT * FROM jt2"))
+
+// Change the view schema
+val df3 = (1 until 10).map(i => i -> i).toDF("i", "j")
+df3.write.format("json").saveAsTable("jt3")
+sql("ALTER VIEW v1 AS SELECT * FROM jt3")
+val e = intercept[AnalysisException] {
+  sql("SELECT * FROM v2")
+}
+assert(e.message.contains(
+  "The underlying schema doesn't match the original schema, 
expected " +
+"STRUCT<`id`: INT> but got STRUCT<`i`: INT, `j`: INT>"))
+  }
+}
+  }
+
+  test("Using view after drop the origin view") {
+withView("v1", "v2") {
+  sql("CREATE VIEW v1 AS SELECT id FROM jt")
+  sql("CREATE VIEW v2 AS SELECT * FROM v1")
+  // Drop the referenced view
+  sql("DROP VIEW v1")
+  val e = intercept[RuntimeException] {
+sql("SELECT * FROM v2")
+  }
+  assert(e.getMessage.contains(
+"Failed to analyze the canonicalized SQL"))
+}
+  }
+
+  test("Using view after change the origin table") {
+withTable("tab1") {
+  val df = (1 until 10).map(i => i).toDF("i")
+  df.write.format("json").saveAsTable("tab1")
+  withView("v1", "v2") {
+sql("CREATE VIEW v1 AS SELECT * FROM tab1")
+sql("CREATE VIEW v2 AS SELECT * FROM v1")
+// Don't change the table schema
+val df2 = (1 until 10).map(i => i * i).toDF("i")
+df2.write.format("json").mode("overwrite").saveAsTable("tab1")
+// the view v2 should have the same output with the view v1
+checkAnswer(sql("SELECT * FROM v2"), sql("SELECT * FROM v1"))
+// the view v2 should have the same output with the table testTable
--- End diff --

`testTable` or `tab1`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16173: [SPARK-18742][CORE]readd spark.broadcast.factory conf to...

2016-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16173
  
**[Test build #69732 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69732/consoleFull)**
 for PR 16173 at commit 
[`a17d325`](https://github.com/apache/spark/commit/a17d3250d4472619fa987d9b389d48b165c44745).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16173: [SPARK-18742][CORE]readd spark.broadcast.factory conf to...

2016-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16173
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16173: [SPARK-18742][CORE]readd spark.broadcast.factory conf to...

2016-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16173
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69732/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16173: [SPARK-18742][CORE]readd spark.broadcast.factory conf to...

2016-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16173
  
**[Test build #69732 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69732/consoleFull)**
 for PR 16173 at commit 
[`a17d325`](https://github.com/apache/spark/commit/a17d3250d4472619fa987d9b389d48b165c44745).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16173: [SPARK-18742][CORE]readd spark.broadcast.factory ...

2016-12-06 Thread windpiger

GitHub user windpiger opened a pull request:

https://github.com/apache/spark/pull/16173

[SPARK-18742][CORE]readd spark.broadcast.factory conf to implement 
er-defined BroadcastFactory

## What changes were proposed in this pull request?
After SPARK-12588 Remove HTTPBroadcast [1], the one and only
implementation of BroadcastFactory is TorrentBroadcastFactory. No code
in Spark 2 uses BroadcastFactory (but TorrentBroadcastFactory) however
the scaladoc says [2]:

/**
 * An interface for all the broadcast implementations in Spark (to allow
 * multiple broadcast implementations). SparkContext uses a user-specified
 * BroadcastFactory implementation to instantiate a particular broadcast 
for the
 * entire Spark job.
 */

which is not correct since there is no way to plug in a custom
user-specified BroadcastFactory.

It is better to readd spark.broadcast.factory for user-defined 
BroadcastFactory
 
[1] https://issues.apache.org/jira/browse/SPARK-12588
[2] 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/broadcast/BroadcastFactory.scala#L25-L30

## How was this patch tested?
unit test added

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/windpiger/spark addBroadFactoryConf

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16173.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16173


commit a17d3250d4472619fa987d9b389d48b165c44745
Author: root 
Date:   2016-12-06T16:14:26Z

[SPARK-18742][CORE]readd spark.broadcast.factory conf to implement 
user-defined BroadcastFactory




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread nsyca

Github user nsyca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91110663
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala 
---
@@ -448,19 +476,105 @@ class SQLViewSuite extends QueryTest with 
SQLTestUtils with TestHiveSingleton {
 }
   }
 
+  test("Using view after change the origin view") {
+withView("v1", "v2") {
+  sql("CREATE VIEW v1 AS SELECT id FROM jt")
+  sql("CREATE VIEW v2 AS SELECT * FROM v1")
+  withTable("jt2", "jt3") {
+// Don't change the view schema
+val df2 = (1 until 10).map(i => i + i).toDF("id")
--- End diff --

I suggest to twist this test case a bit.
The first part of the test case has the table jt defined as columns ("j", 
"i") and then the second part jt2 defined as, what you currently have, 
columns("i", "j") to test that the code checks the order of the columns 
correctly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread nsyca

Github user nsyca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91108821
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -207,31 +205,56 @@ case class CreateViewCommand(
   }
 
   /**
-   * Returns a [[CatalogTable]] that can be used to save in the catalog. 
This comment canonicalize
-   * SQL based on the analyzed plan, and also creates the proper schema 
for the view.
+   * Returns a [[CatalogTable]] that can be used to save in the catalog. 
This stores the following
+   * properties for a view:
+   * 1. The `viewText` which is used to generate a logical plan when we 
resolve a view;
+   * 2. The `currentDatabase` which sets the current database on Analyze 
stage;
+   * 3. The `schema` which ensure we generate the correct output.
*/
   private def prepareTable(sparkSession: SparkSession, aliasedPlan: 
LogicalPlan): CatalogTable = {
-val viewSQL: String = new SQLBuilder(aliasedPlan).toSQL
+val currentDatabase = 
sparkSession.sessionState.catalog.getCurrentDatabase
 
-// Validate the view SQL - make sure we can parse it and analyze it.
-// If we cannot analyze the generated query, there is probably a bug 
in SQL generation.
-try {
-  sparkSession.sql(viewSQL).queryExecution.assertAnalyzed()
-} catch {
-  case NonFatal(e) =>
-throw new RuntimeException(s"Failed to analyze the canonicalized 
SQL: $viewSQL", e)
-}
+if (originalText.isDefined) {
+  val viewSQL = originalText.get
+
+  // Validate the view SQL - make sure we can resolve it with 
currentDatabase.
+  val originalSchema = try {
+val unresolvedPlan = 
sparkSession.sessionState.sqlParser.parsePlan(viewSQL)
+val resolvedPlan = 
sparkSession.sessionState.analyzer.execute(unresolvedPlan)
+sparkSession.sessionState.analyzer.checkAnalysis(resolvedPlan)
+
+resolvedPlan.schema
+  } catch {
+case NonFatal(e) =>
+  throw new RuntimeException(s"Failed to analyze the canonicalized 
SQL: $viewSQL", e)
+  }
 
-CatalogTable(
-  identifier = name,
-  tableType = CatalogTableType.VIEW,
-  storage = CatalogStorageFormat.empty,
-  schema = aliasedPlan.schema,
-  properties = properties,
-  viewOriginalText = originalText,
-  viewText = Some(viewSQL),
-  comment = comment
-)
+  CatalogTable(
+identifier = name,
+tableType = CatalogTableType.VIEW,
+storage = CatalogStorageFormat.empty,
+schema = aliasedPlan.schema,
+originalSchema = Some(originalSchema),
+properties = properties,
+viewOriginalText = originalText,
+viewText = Some(viewSQL),
+currentDatabase = Some(currentDatabase),
+comment = comment
+  )
+} else {
--- End diff --

I guess the code in the `else` happens when the object is a table, not a 
view, correct? Please add this to the prologue of the function for clarity.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread nsyca

Github user nsyca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r9345
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala 
---
@@ -448,19 +476,105 @@ class SQLViewSuite extends QueryTest with 
SQLTestUtils with TestHiveSingleton {
 }
   }
 
+  test("Using view after change the origin view") {
+withView("v1", "v2") {
+  sql("CREATE VIEW v1 AS SELECT id FROM jt")
+  sql("CREATE VIEW v2 AS SELECT * FROM v1")
+  withTable("jt2", "jt3") {
+// Don't change the view schema
+val df2 = (1 until 10).map(i => i + i).toDF("id")
+df2.write.format("json").saveAsTable("jt2")
+sql("ALTER VIEW v1 AS SELECT * FROM jt2")
+// the view v2 should have the same output with the view v1
+checkAnswer(sql("SELECT * FROM v2"), sql("SELECT * FROM v1"))
+// the view v2 should have the same output with the table jt2
+checkAnswer(sql("SELECT * FROM v2"), sql("SELECT * FROM jt2"))
+
+// Change the view schema
+val df3 = (1 until 10).map(i => i -> i).toDF("i", "j")
+df3.write.format("json").saveAsTable("jt3")
+sql("ALTER VIEW v1 AS SELECT * FROM jt3")
+val e = intercept[AnalysisException] {
+  sql("SELECT * FROM v2")
+}
+assert(e.message.contains(
+  "The underlying schema doesn't match the original schema, 
expected " +
+"STRUCT<`id`: INT> but got STRUCT<`i`: INT, `j`: INT>"))
+  }
+}
+  }
+
+  test("Using view after drop the origin view") {
+withView("v1", "v2") {
+  sql("CREATE VIEW v1 AS SELECT id FROM jt")
+  sql("CREATE VIEW v2 AS SELECT * FROM v1")
+  // Drop the referenced view
+  sql("DROP VIEW v1")
+  val e = intercept[RuntimeException] {
+sql("SELECT * FROM v2")
+  }
+  assert(e.getMessage.contains(
+"Failed to analyze the canonicalized SQL"))
+}
+  }
--- End diff --

I suggest adding a variation of this test case, dropping the underlining 
table "jt" and v2 should be invalid. This will test the multiple-level 
expansion of a view. It is also no harm to check that v1 is also invalid in 
this new test case. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread nsyca

Github user nsyca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91106867
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 ---
@@ -151,6 +151,7 @@ case class CatalogTable(
 tableType: CatalogTableType,
 storage: CatalogStorageFormat,
 schema: StructType,
+originalSchema: Option[StructType] = None,
--- End diff --

I guess this is to pass 1) the persistent schema of the view at the last 
call to `(Simple)Analyzer`, and 2) the schema of the view from calling 
`Analyzer` on the definition of the view definition captured in SQL string 
input from the user. If this is correct, we need better variable names (and a 
few comments will greatly help.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread nsyca

Github user nsyca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91112094
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala 
---
@@ -448,19 +476,105 @@ class SQLViewSuite extends QueryTest with 
SQLTestUtils with TestHiveSingleton {
 }
   }
 
+  test("Using view after change the origin view") {
+withView("v1", "v2") {
+  sql("CREATE VIEW v1 AS SELECT id FROM jt")
+  sql("CREATE VIEW v2 AS SELECT * FROM v1")
+  withTable("jt2", "jt3") {
+// Don't change the view schema
+val df2 = (1 until 10).map(i => i + i).toDF("id")
+df2.write.format("json").saveAsTable("jt2")
+sql("ALTER VIEW v1 AS SELECT * FROM jt2")
+// the view v2 should have the same output with the view v1
+checkAnswer(sql("SELECT * FROM v2"), sql("SELECT * FROM v1"))
+// the view v2 should have the same output with the table jt2
+checkAnswer(sql("SELECT * FROM v2"), sql("SELECT * FROM jt2"))
+
+// Change the view schema
+val df3 = (1 until 10).map(i => i -> i).toDF("i", "j")
+df3.write.format("json").saveAsTable("jt3")
+sql("ALTER VIEW v1 AS SELECT * FROM jt3")
+val e = intercept[AnalysisException] {
+  sql("SELECT * FROM v2")
+}
+assert(e.message.contains(
+  "The underlying schema doesn't match the original schema, 
expected " +
+"STRUCT<`id`: INT> but got STRUCT<`i`: INT, `j`: INT>"))
+  }
+}
+  }
+
+  test("Using view after drop the origin view") {
+withView("v1", "v2") {
+  sql("CREATE VIEW v1 AS SELECT id FROM jt")
+  sql("CREATE VIEW v2 AS SELECT * FROM v1")
+  // Drop the referenced view
+  sql("DROP VIEW v1")
+  val e = intercept[RuntimeException] {
+sql("SELECT * FROM v2")
+  }
+  assert(e.getMessage.contains(
+"Failed to analyze the canonicalized SQL"))
+}
+  }
+
+  test("Using view after change the origin table") {
+withTable("tab1") {
+  val df = (1 until 10).map(i => i).toDF("i")
+  df.write.format("json").saveAsTable("tab1")
+  withView("v1", "v2") {
+sql("CREATE VIEW v1 AS SELECT * FROM tab1")
+sql("CREATE VIEW v2 AS SELECT * FROM v1")
+// Don't change the table schema
+val df2 = (1 until 10).map(i => i * i).toDF("i")
+df2.write.format("json").mode("overwrite").saveAsTable("tab1")
+// the view v2 should have the same output with the view v1
+checkAnswer(sql("SELECT * FROM v2"), sql("SELECT * FROM v1"))
+// the view v2 should have the same output with the table testTable
+checkAnswer(sql("SELECT * FROM v2"), sql("SELECT * FROM tab1"))
+
+// Change the table schema
+val df3 = (1 until 10).map(i => i -> i).toDF("a", "b")
+df3.write.format("json").mode("overwrite").saveAsTable("tab1")
+val e = intercept[RuntimeException] {
+  sql("SELECT * FROM v2")
+}
+assert(e.getMessage.contains(
+  "Failed to analyze the canonicalized SQL"))
+  }
+}
+  }
+
+  test("Using view after drop the origin table") {
+withTable("tab1") {
+  val df = (1 until 10).map(i => i).toDF("i")
+  df.write.format("json").saveAsTable("tab1")
+  withView("v1") {
+sql("CREATE VIEW v1 AS SELECT * FROM tab1")
+// Drop the referenced table
+sql("DROP TABLE tab1")
+val e = intercept[RuntimeException] {
+  sql("SELECT * FROM v1")
+}
+assert(e.getMessage.contains(
+  "Failed to analyze the canonicalized SQL"))
+  }
+}
+  }
+
--- End diff --

If you add a new test case to test the multiple-level expansion as I 
suggested before, this test case is just a duplicate. It may be removed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Use spark...

2016-12-06 Thread omuravskiy

Github user omuravskiy commented on the issue:

https://github.com/apache/spark/pull/16006
  
I added new commit, the default value of `backpressureInitialRate` if 
`spark.streaming.backpressure.initialRate` is now taken from 
`spark.streaming.backpressure.pid.minRate`, or 100 if that is also not set 
(which is the same as in 
`org.apache.spark.streaming.scheduler.rate.RateEstimator`).
And `backpressureInitialRate`, when applied, is now shared proportionally 
among all partitions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16168: [SPARK-18209][SQL] More robust view canonicalization wit...

2016-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16168
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16168: [SPARK-18209][SQL] More robust view canonicalization wit...

2016-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16168
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69730/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread nsyca

Github user nsyca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91104224
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 ---
@@ -221,11 +223,14 @@ case class CatalogTable(
 s"Last Access: ${new Date(lastAccessTime).toString}",
 s"Type: ${tableType.name}",
 if (schema.nonEmpty) s"Schema: ${schema.mkString("[", ", ", "]")}" 
else "",
+if (originalSchema.isDefined) s"Original Schema: 
${originalSchema.mkString("[", ", ", "]")}"
+else "",
 if (provider.isDefined) s"Provider: ${provider.get}" else "",
 if (partitionColumnNames.nonEmpty) s"Partition Columns: 
$partitionColumns" else ""
   ) ++ bucketStrings ++ Seq(
 viewOriginalText.map("Original View: " + _).getOrElse(""),
 viewText.map("View: " + _).getOrElse(""),
+currentDatabase.map("Database Hint: " + _).getOrElse(""),
--- End diff --

Same comment. This is an optional database name. Should we change the 
string to `"Database name: "`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16168: [SPARK-18209][SQL] More robust view canonicalization wit...

2016-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16168
  
**[Test build #69730 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69730/consoleFull)**
 for PR 16168 at commit 
[`434009e`](https://github.com/apache/spark/commit/434009ef7fcfc552314dca35dc9395bb98f5e9c6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread nsyca

Github user nsyca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91102867
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -557,9 +557,12 @@ class SessionCatalog(
* If the relation is a view, the relation will be wrapped in a 
[[SubqueryAlias]] which will
* track the name of the view.
*/
-  def lookupRelation(name: TableIdentifier, alias: Option[String] = None): 
LogicalPlan = {
+  def lookupRelation(
+  name: TableIdentifier,
+  alias: Option[String] = None,
+  databaseHint: Option[String] = None): LogicalPlan = {
--- End diff --

I don't think the variable `databaseHint` is properly named. It is simply 
an optional database name passed into this function. Should it be named 
`databaseName`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15722: [SPARK-18208] [Shuffle] Executor OOM due to a growing Lo...

2016-12-06 Thread hvanhovell

Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/15722
  
@jiexiong PR descriptions are used in git commit messages, and should be 
clear and concise. The fix LGTM, but the description should be improved for 
future reference. How about we change it into the following (I only redacted 
the comments in the PR):

## What changes were proposed in this pull request?
`BytesToBytesMap` currently does not release the in-memory storage (the 
`longArray` variable) after it spills to disk. This is typically not a problem 
during aggregation because the longArray should be much smaller than the pages, 
and because we grow the `longArray` at a conservative rate.

However this can lead to an OOM when an already running task is allocated 
more than its fair share, this can happen because of a scheduling delay. In 
this case the `longArray` can grow beyond the fair share of memory for the 
task. This becomes problematic when the task spills and the long array is not 
freed, that causes subsequent memory allocation requests to be denied by the 
memory manager resulting in an OOM.

This PR fixes this issuing by freeing the `longArray` when the 
`BytesToBytesMap` spills.

## How was this patch tested?
Existing tests and tested on realworld workloads.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16172: [SPARK-18740] Log spark.app.name in driver logs

2016-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16172
  
**[Test build #69731 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69731/consoleFull)**
 for PR 16172 at commit 
[`5401e64`](https://github.com/apache/spark/commit/5401e64552325353104bc7addf180a5a37f35047).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16172: [SPARK-18740] Log spark.app.name in driver logs

2016-12-06 Thread peterableda

GitHub user peterableda opened a pull request:

https://github.com/apache/spark/pull/16172

[SPARK-18740] Log spark.app.name in driver logs

## What changes were proposed in this pull request?

Added simple logInfo line to print out the `spark.app.name` in the driver 
logs

## How was this patch tested?

Spark was built and tested with SparkPi app. Example log:
```
16/12/06 05:49:50 INFO spark.SparkContext: Running Spark version 2.0.0
16/12/06 05:49:52 INFO spark.SparkContext: Submitted application: Spark Pi
16/12/06 05:49:52 INFO spark.SecurityManager: Changing view acls to: root
16/12/06 05:49:52 INFO spark.SecurityManager: Changing modify acls to: root
```


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peterableda/spark feature/print_appname

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16172.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16172


commit 5401e64552325353104bc7addf180a5a37f35047
Author: Peter Ableda 
Date:   2016-12-06T14:18:03Z

log out spark.app.name in the Spark driver logs




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16169: [SPARK-18326][SPARKR][ML] Review SparkR ML wrappers API ...

2016-12-06 Thread yanboliang

Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/16169
  
cc @felixcheung @jkbradley 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-06 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/13909
  
@cloud-fan thank you for your thoughtful review comments. The latest one 
looks better.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13909
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69728/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13909
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13909
  
**[Test build #69728 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69728/consoleFull)**
 for PR 13909 at commit 
[`185ddf5`](https://github.com/apache/spark/commit/185ddf58b9d2e30c76589de4f566911e7855ea79).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-12-06 Thread a-roberts

Github user a-roberts commented on the issue:

https://github.com/apache/spark/pull/15736
  
New data for us, inlined comparator scores here (code provided below to 
check I've not profiled something useless!):
```
ScalaSparkPagerank 2016-12-05 13:44:41 25992811548.149  
 5398411  5398411
ScalaSparkPagerank 2016-12-05 13:46:43 25992811546.897  
 5542531  5542531
ScalaSparkPagerank 2016-12-05 13:48:46 25992811549.130  
 5290619  5290619
ScalaSparkPagerank 2016-12-05 13:50:49 25992811549.793  
 5220173  5220173
ScalaSparkPagerank 2016-12-05 13:52:50 25992811548.061  
 5408296  5408296
ScalaSparkPagerank 2016-12-05 13:54:52 25992811546.468  
 5593701  5593701
ScalaSparkPagerank 2016-12-05 13:56:56 25992811551.385  
 5058443  5058443
ScalaSparkPagerank 2016-12-05 13:58:59 25992811547.857  
 5431349  5431349
ScalaSparkPagerank 2016-12-05 14:00:59 25992811546.515  
 5588049  5588049
ScalaSparkPagerank 2016-12-05 14:03:03 25992811547.791  
 5438850  5438850
Avg 48.2046s
```

Remember our "vanilla" average time is 47.752s and our first commit 
averaged 47.229s (so not much of a difference really).

I think we're splitting hairs and I've got another PR I am seeing good 
results on that I plan to focus on instead: the SizeEstimator. 

This is what I've benchmarked, PartitionedAppendOnlyMap first, so let me 
know if there any further suggestions, otherwise I propose leaving this one for 
later as actually against the Spark master codebase I'm not noticing anything 
exciting.

```
  def partitionedDestructiveSortedIterator(keyComparator: 
Option[Comparator[K]])
: Iterator[((Int, K), V)] = {
val comparator = {
  if (keyComparator.isDefined) {
val theKeyComp = keyComparator.get
new Comparator[(Int, K)] {
  // We know we have a non-empty comparator here
  override def compare(a: (Int, K), b: (Int, K)): Int = {
if (a._1 != b._1) {
  a._1 - b._1
} else {
 theKeyComp.compare(a._2, b._2)
}
  }
}
  } else {
new Comparator[(Int, K)] {
  override def compare(a: (Int, K), b: (Int, K)): Int = {
a._1 - b._1
  }
}
  }
}
destructiveSortedIterator(comparator)
  }
```

In PartitionedPairBuffer

```

  /** Iterate through the data in a given order. For this class this is not 
really destructive. */
  override def partitionedDestructiveSortedIterator(keyComparator: 
Option[Comparator[K]])
: Iterator[((Int, K), V)] = {
val comparator = {
  if (keyComparator.isDefined) {
val theKeyComp = keyComparator.get
new Comparator[(Int, K)] {
  // We know we have a non-empty comparator here
  override def compare(a: (Int, K), b: (Int, K)): Int = {
if (a._1 != b._1) {
  a._1 - b._1
} else {
 theKeyComp.compare(a._2, b._2)
}
  }
}
  } else {
new Comparator[(Int, K)] {
  override def compare(a: (Int, K), b: (Int, K)): Int = {
a._1 - b._1
  }
}
  }
}
new Sorter(new KVArraySortDataFormat[(Int, K), AnyRef]).sort(data, 0, 
curSize, comparator)
iterator
  }
```

WritablePartitionedPairCollection remains unchanged.







---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16170: [SPARK-18634][SQL][TRIVIAL] Touch-up Generate

2016-12-06 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16170


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16170: [SPARK-18634][SQL][TRIVIAL] Touch-up Generate

2016-12-06 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16170
  
@hvanhovell Thanks for helping fixing this I missed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16170: [SPARK-18634][SQL][TRIVIAL] Touch-up Generate

2016-12-06 Thread hvanhovell

Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/16170
  
Thanks for the review. Merging this one.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16170: [SPARK-18634][SQL][TRIVIAL] Touch-up Generate

2016-12-06 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16170
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91054484
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -114,9 +117,26 @@ private[hive] class HiveMetastoreCatalog(sparkSession: 
SparkSession) extends Log
   alias.map(a => SubqueryAlias(a, qualifiedTable, 
None)).getOrElse(qualifiedTable)
 } else if (table.tableType == CatalogTableType.VIEW) {
   val viewText = table.viewText.getOrElse(sys.error("Invalid view 
without text."))
+  val unresolvedPlan = 
sparkSession.sessionState.sqlParser.parsePlan(viewText).transform {
+case u: UnresolvedRelation if u.tableIdentifier.database.isEmpty =>
+  u.copy(tableIdentifier = 
TableIdentifier(u.tableIdentifier.table, table.currentDatabase))
+  }
+  // Resolve the plan and check whether the analyzed plan is valid.
+  val resolvedPlan = try {
--- End diff --

Duplicate of what is in `views.scala`, put this in a common object.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91051441
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -207,31 +205,56 @@ case class CreateViewCommand(
   }
 
   /**
-   * Returns a [[CatalogTable]] that can be used to save in the catalog. 
This comment canonicalize
-   * SQL based on the analyzed plan, and also creates the proper schema 
for the view.
+   * Returns a [[CatalogTable]] that can be used to save in the catalog. 
This stores the following
+   * properties for a view:
+   * 1. The `viewText` which is used to generate a logical plan when we 
resolve a view;
+   * 2. The `currentDatabase` which sets the current database on Analyze 
stage;
+   * 3. The `schema` which ensure we generate the correct output.
*/
   private def prepareTable(sparkSession: SparkSession, aliasedPlan: 
LogicalPlan): CatalogTable = {
-val viewSQL: String = new SQLBuilder(aliasedPlan).toSQL
+val currentDatabase = 
sparkSession.sessionState.catalog.getCurrentDatabase
 
-// Validate the view SQL - make sure we can parse it and analyze it.
-// If we cannot analyze the generated query, there is probably a bug 
in SQL generation.
-try {
-  sparkSession.sql(viewSQL).queryExecution.assertAnalyzed()
-} catch {
-  case NonFatal(e) =>
-throw new RuntimeException(s"Failed to analyze the canonicalized 
SQL: $viewSQL", e)
-}
+if (originalText.isDefined) {
--- End diff --

When does this happen?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91053965
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -509,32 +509,42 @@ class Analyzer(
* Replaces [[UnresolvedRelation]]s with concrete relations from the 
catalog.
*/
   object ResolveRelations extends Rule[LogicalPlan] {
-private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan 
= {
+private def lookupTableFromCatalog(
+u: UnresolvedRelation, db: Option[String] = None): LogicalPlan = {
   try {
-catalog.lookupRelation(u.tableIdentifier, u.alias)
+catalog.lookupRelation(u.tableIdentifier, u.alias, db)
   } catch {
 case _: NoSuchTableException =>
   u.failAnalysis(s"Table or view not found: ${u.tableName}")
   }
 }
 
-def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-  case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) 
if child.resolved =>
-i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
-  case u: UnresolvedRelation =>
-val table = u.tableIdentifier
-if (table.database.isDefined && conf.runSQLonFile && 
!catalog.isTemporaryTable(table) &&
+def apply(plan: LogicalPlan): LogicalPlan = {
+  var currentDatabase = catalog.getCurrentDatabase
+  plan resolveOperators {
+case i@InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) 
if child.resolved =>
+  i.copy(table = 
EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+case u: UnresolvedRelation =>
+  val table = u.tableIdentifier
+  if (table.database.isDefined && conf.runSQLonFile && 
!catalog.isTemporaryTable(table) &&
 (!catalog.databaseExists(table.database.get) || 
!catalog.tableExists(table))) {
-  // If the database part is specified, and we support running SQL 
directly on files, and
-  // it's not a temporary view, and the table does not exist, then 
let's just return the
-  // original UnresolvedRelation. It is possible we are matching a 
query like "select *
-  // from parquet.`/path/to/query`". The plan will get resolved 
later.
-  // Note that we are testing (!db_exists || !table_exists) 
because the catalog throws
-  // an exception from tableExists if the database does not exist.
-  u
-} else {
-  lookupTableFromCatalog(u)
-}
+// If the database part is specified, and we support running 
SQL directly on files, and
+// it's not a temporary view, and the table does not exist, 
then let's just return the
+// original UnresolvedRelation. It is possible we are matching 
a query like "select *
+// from parquet.`/path/to/query`". The plan will get resolved 
later.
+// Note that we are testing (!db_exists || !table_exists) 
because the catalog throws
+// an exception from tableExists if the database does not 
exist.
+u
+  } else {
+val logicalPlan = lookupTableFromCatalog(u, 
Some(currentDatabase))
+currentDatabase = logicalPlan.collectFirst {
--- End diff --

BTW we could also choose to move most of the logic here into the 
`SessionCatalog`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91053897
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -509,32 +509,42 @@ class Analyzer(
* Replaces [[UnresolvedRelation]]s with concrete relations from the 
catalog.
*/
   object ResolveRelations extends Rule[LogicalPlan] {
-private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan 
= {
+private def lookupTableFromCatalog(
+u: UnresolvedRelation, db: Option[String] = None): LogicalPlan = {
   try {
-catalog.lookupRelation(u.tableIdentifier, u.alias)
+catalog.lookupRelation(u.tableIdentifier, u.alias, db)
   } catch {
 case _: NoSuchTableException =>
   u.failAnalysis(s"Table or view not found: ${u.tableName}")
   }
 }
 
-def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-  case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) 
if child.resolved =>
-i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
-  case u: UnresolvedRelation =>
-val table = u.tableIdentifier
-if (table.database.isDefined && conf.runSQLonFile && 
!catalog.isTemporaryTable(table) &&
+def apply(plan: LogicalPlan): LogicalPlan = {
+  var currentDatabase = catalog.getCurrentDatabase
+  plan resolveOperators {
+case i@InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) 
if child.resolved =>
+  i.copy(table = 
EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+case u: UnresolvedRelation =>
+  val table = u.tableIdentifier
+  if (table.database.isDefined && conf.runSQLonFile && 
!catalog.isTemporaryTable(table) &&
 (!catalog.databaseExists(table.database.get) || 
!catalog.tableExists(table))) {
-  // If the database part is specified, and we support running SQL 
directly on files, and
-  // it's not a temporary view, and the table does not exist, then 
let's just return the
-  // original UnresolvedRelation. It is possible we are matching a 
query like "select *
-  // from parquet.`/path/to/query`". The plan will get resolved 
later.
-  // Note that we are testing (!db_exists || !table_exists) 
because the catalog throws
-  // an exception from tableExists if the database does not exist.
-  u
-} else {
-  lookupTableFromCatalog(u)
-}
+// If the database part is specified, and we support running 
SQL directly on files, and
+// it's not a temporary view, and the table does not exist, 
then let's just return the
+// original UnresolvedRelation. It is possible we are matching 
a query like "select *
+// from parquet.`/path/to/query`". The plan will get resolved 
later.
+// Note that we are testing (!db_exists || !table_exists) 
because the catalog throws
+// an exception from tableExists if the database does not 
exist.
+u
+  } else {
+val logicalPlan = lookupTableFromCatalog(u, 
Some(currentDatabase))
+currentDatabase = logicalPlan.collectFirst {
--- End diff --

This looks scary; for instance it depends on the iteration order for 
`collectFirst`.

Why is this needed and why can't we use the current database?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91053671
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -509,32 +509,42 @@ class Analyzer(
* Replaces [[UnresolvedRelation]]s with concrete relations from the 
catalog.
*/
   object ResolveRelations extends Rule[LogicalPlan] {
-private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan 
= {
+private def lookupTableFromCatalog(
+u: UnresolvedRelation, db: Option[String] = None): LogicalPlan = {
   try {
-catalog.lookupRelation(u.tableIdentifier, u.alias)
+catalog.lookupRelation(u.tableIdentifier, u.alias, db)
   } catch {
 case _: NoSuchTableException =>
   u.failAnalysis(s"Table or view not found: ${u.tableName}")
   }
 }
 
-def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-  case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) 
if child.resolved =>
-i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
-  case u: UnresolvedRelation =>
-val table = u.tableIdentifier
-if (table.database.isDefined && conf.runSQLonFile && 
!catalog.isTemporaryTable(table) &&
+def apply(plan: LogicalPlan): LogicalPlan = {
+  var currentDatabase = catalog.getCurrentDatabase
+  plan resolveOperators {
+case i@InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) 
if child.resolved =>
+  i.copy(table = 
EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+case u: UnresolvedRelation =>
+  val table = u.tableIdentifier
+  if (table.database.isDefined && conf.runSQLonFile && 
!catalog.isTemporaryTable(table) &&
--- End diff --

Why not put all this into a helper function and put the if statement into 
`case ... if ...`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91078953
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 ---
@@ -151,6 +151,7 @@ case class CatalogTable(
 tableType: CatalogTableType,
 storage: CatalogStorageFormat,
 schema: StructType,
+originalSchema: Option[StructType] = None,
--- End diff --

The only difference between originalSchema and schema are the column names, 
right? If that is the case why not store the `source` column name in the schema 
field metadata?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91053117
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 ---
@@ -151,6 +151,7 @@ case class CatalogTable(
 tableType: CatalogTableType,
 storage: CatalogStorageFormat,
 schema: StructType,
+originalSchema: Option[StructType] = None,
--- End diff --

It is nice to add some documentation for this. What is its purpose? The 
same goes for currentDatabase.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16168: [SPARK-18209][SQL] More robust view canonicalization wit...

2016-12-06 Thread hvanhovell

Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/16168
  
also cc @gatorsmile


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91064615
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -126,6 +146,55 @@ private[hive] class HiveMetastoreCatalog(sparkSession: 
SparkSession) extends Log
 }
   }
 
+  /**
+   * Apply Projection on unresolved logical plan to:
+   * 1. Omit the columns which are not referenced by the view;
+   * 2. Reorder the columns to keep the same order with the view;
+   */
+  private def withProjection(plan: LogicalPlan, schema: StructType): 
LogicalPlan = {
+// All fields in schema should exist in plan.schema, or we should 
throw an AnalysisException
+// to notify the underlying schema has been changed.
+if (schema.fields.forall { field =>
+  plan.schema.fields.exists(other => compareStructField(field, 
other))}) {
+  val output = schema.fields.map { field =>
+plan.output.find { expr =>
+  expr.name == field.name && expr.dataType == 
field.dataType}.getOrElse(
--- End diff --

This is a case-sensitive match; spark's default is case-insensitive 
matching. Please use a resolver for such a check.

A more general thought is that we really should leverage the analyzer's 
capabilities here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91051621
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -207,31 +205,56 @@ case class CreateViewCommand(
   }
 
   /**
-   * Returns a [[CatalogTable]] that can be used to save in the catalog. 
This comment canonicalize
-   * SQL based on the analyzed plan, and also creates the proper schema 
for the view.
+   * Returns a [[CatalogTable]] that can be used to save in the catalog. 
This stores the following
+   * properties for a view:
+   * 1. The `viewText` which is used to generate a logical plan when we 
resolve a view;
+   * 2. The `currentDatabase` which sets the current database on Analyze 
stage;
+   * 3. The `schema` which ensure we generate the correct output.
*/
   private def prepareTable(sparkSession: SparkSession, aliasedPlan: 
LogicalPlan): CatalogTable = {
-val viewSQL: String = new SQLBuilder(aliasedPlan).toSQL
+val currentDatabase = 
sparkSession.sessionState.catalog.getCurrentDatabase
 
-// Validate the view SQL - make sure we can parse it and analyze it.
-// If we cannot analyze the generated query, there is probably a bug 
in SQL generation.
-try {
-  sparkSession.sql(viewSQL).queryExecution.assertAnalyzed()
-} catch {
-  case NonFatal(e) =>
-throw new RuntimeException(s"Failed to analyze the canonicalized 
SQL: $viewSQL", e)
-}
+if (originalText.isDefined) {
+  val viewSQL = originalText.get
+
+  // Validate the view SQL - make sure we can resolve it with 
currentDatabase.
+  val originalSchema = try {
+val unresolvedPlan = 
sparkSession.sessionState.sqlParser.parsePlan(viewSQL)
+val resolvedPlan = 
sparkSession.sessionState.analyzer.execute(unresolvedPlan)
+sparkSession.sessionState.analyzer.checkAnalysis(resolvedPlan)
+
+resolvedPlan.schema
+  } catch {
+case NonFatal(e) =>
+  throw new RuntimeException(s"Failed to analyze the canonicalized 
SQL: $viewSQL", e)
--- End diff --

Why do you throw a RuntimeException? Just throw an AnalysisException. The 
message is also outdated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91051761
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -275,20 +297,26 @@ case class AlterViewAsCommand(
   throw new AnalysisException(s"${viewMeta.identifier} is not a view.")
 }
 
-val viewSQL: String = new SQLBuilder(analyzedPlan).toSQL
-// Validate the view SQL - make sure we can parse it and analyze it.
-// If we cannot analyze the generated query, there is probably a bug 
in SQL generation.
-try {
-  session.sql(viewSQL).queryExecution.assertAnalyzed()
+val currentDatabase = session.sessionState.catalog.getCurrentDatabase
+
+// Validate the view SQL - make sure we can resolve it with 
currentDatabase.
+val originalSchema = try {
--- End diff --

This is duplicate - should we put this into a common trait?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91065113
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -126,6 +146,55 @@ private[hive] class HiveMetastoreCatalog(sparkSession: 
SparkSession) extends Log
 }
   }
 
+  /**
+   * Apply Projection on unresolved logical plan to:
+   * 1. Omit the columns which are not referenced by the view;
+   * 2. Reorder the columns to keep the same order with the view;
+   */
+  private def withProjection(plan: LogicalPlan, schema: StructType): 
LogicalPlan = {
+// All fields in schema should exist in plan.schema, or we should 
throw an AnalysisException
+// to notify the underlying schema has been changed.
+if (schema.fields.forall { field =>
+  plan.schema.fields.exists(other => compareStructField(field, 
other))}) {
+  val output = schema.fields.map { field =>
+plan.output.find { expr =>
+  expr.name == field.name && expr.dataType == 
field.dataType}.getOrElse(
+throw new AnalysisException("The underlying schema doesn't 
match the original " +
+  s"schema, expected ${schema.sql} but got ${plan.schema.sql}")
+  )}
+  Project(output, plan)
+} else {
+  throw new AnalysisException("The underlying schema doesn't match the 
original schema, " +
+s"expected ${schema.sql} but got ${plan.schema.sql}")
+}
+  }
+
+  /**
+   * Compare the both [[StructField]] to verify whether they have the same 
name and dataType.
+   */
+  private def compareStructField(field: StructField, other: StructField): 
Boolean = {
+field.name == other.name && field.dataType == other.dataType
--- End diff --

case-sensitive checking, see my previous comment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91078407
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -114,9 +117,26 @@ private[hive] class HiveMetastoreCatalog(sparkSession: 
SparkSession) extends Log
   alias.map(a => SubqueryAlias(a, qualifiedTable, 
None)).getOrElse(qualifiedTable)
 } else if (table.tableType == CatalogTableType.VIEW) {
   val viewText = table.viewText.getOrElse(sys.error("Invalid view 
without text."))
+  val unresolvedPlan = 
sparkSession.sessionState.sqlParser.parsePlan(viewText).transform {
--- End diff --

This will probably break with common table expressions. We could just set 
the current database, have the analyzer solve it, and set the current database 
back to its old value.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91066027
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -126,6 +146,55 @@ private[hive] class HiveMetastoreCatalog(sparkSession: 
SparkSession) extends Log
 }
   }
 
+  /**
+   * Apply Projection on unresolved logical plan to:
+   * 1. Omit the columns which are not referenced by the view;
+   * 2. Reorder the columns to keep the same order with the view;
+   */
+  private def withProjection(plan: LogicalPlan, schema: StructType): 
LogicalPlan = {
+// All fields in schema should exist in plan.schema, or we should 
throw an AnalysisException
+// to notify the underlying schema has been changed.
+if (schema.fields.forall { field =>
+  plan.schema.fields.exists(other => compareStructField(field, 
other))}) {
+  val output = schema.fields.map { field =>
+plan.output.find { expr =>
+  expr.name == field.name && expr.dataType == 
field.dataType}.getOrElse(
+throw new AnalysisException("The underlying schema doesn't 
match the original " +
+  s"schema, expected ${schema.sql} but got ${plan.schema.sql}")
+  )}
+  Project(output, plan)
+} else {
+  throw new AnalysisException("The underlying schema doesn't match the 
original schema, " +
+s"expected ${schema.sql} but got ${plan.schema.sql}")
+}
+  }
+
+  /**
+   * Compare the both [[StructField]] to verify whether they have the same 
name and dataType.
+   */
+  private def compareStructField(field: StructField, other: StructField): 
Boolean = {
+field.name == other.name && field.dataType == other.dataType
+  }
+
+  /**
+   * Aliases the schema of the LogicalPlan to the view attribute names
+   */
+  private def aliasColumns(plan: LogicalPlan, fields: Seq[StructField]): 
LogicalPlan = {
+val output = fields.map(field => (field.name, field.getComment))
+if (plan.output.size != output.size) {
--- End diff --

Is this even possible? This should have been caught in `withProjection.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91052968
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -207,31 +205,56 @@ case class CreateViewCommand(
   }
 
   /**
-   * Returns a [[CatalogTable]] that can be used to save in the catalog. 
This comment canonicalize
-   * SQL based on the analyzed plan, and also creates the proper schema 
for the view.
+   * Returns a [[CatalogTable]] that can be used to save in the catalog. 
This stores the following
+   * properties for a view:
+   * 1. The `viewText` which is used to generate a logical plan when we 
resolve a view;
+   * 2. The `currentDatabase` which sets the current database on Analyze 
stage;
+   * 3. The `schema` which ensure we generate the correct output.
*/
   private def prepareTable(sparkSession: SparkSession, aliasedPlan: 
LogicalPlan): CatalogTable = {
-val viewSQL: String = new SQLBuilder(aliasedPlan).toSQL
+val currentDatabase = 
sparkSession.sessionState.catalog.getCurrentDatabase
 
-// Validate the view SQL - make sure we can parse it and analyze it.
-// If we cannot analyze the generated query, there is probably a bug 
in SQL generation.
-try {
-  sparkSession.sql(viewSQL).queryExecution.assertAnalyzed()
-} catch {
-  case NonFatal(e) =>
-throw new RuntimeException(s"Failed to analyze the canonicalized 
SQL: $viewSQL", e)
-}
+if (originalText.isDefined) {
+  val viewSQL = originalText.get
+
+  // Validate the view SQL - make sure we can resolve it with 
currentDatabase.
+  val originalSchema = try {
+val unresolvedPlan = 
sparkSession.sessionState.sqlParser.parsePlan(viewSQL)
+val resolvedPlan = 
sparkSession.sessionState.analyzer.execute(unresolvedPlan)
+sparkSession.sessionState.analyzer.checkAnalysis(resolvedPlan)
+
+resolvedPlan.schema
+  } catch {
+case NonFatal(e) =>
+  throw new RuntimeException(s"Failed to analyze the canonicalized 
SQL: $viewSQL", e)
--- End diff --

I think we are expecting an AnalysisException here, why not catch that 
instead?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91051675
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -275,20 +297,26 @@ case class AlterViewAsCommand(
   throw new AnalysisException(s"${viewMeta.identifier} is not a view.")
 }
 
-val viewSQL: String = new SQLBuilder(analyzedPlan).toSQL
-// Validate the view SQL - make sure we can parse it and analyze it.
-// If we cannot analyze the generated query, there is probably a bug 
in SQL generation.
-try {
-  session.sql(viewSQL).queryExecution.assertAnalyzed()
+val currentDatabase = session.sessionState.catalog.getCurrentDatabase
+
+// Validate the view SQL - make sure we can resolve it with 
currentDatabase.
+val originalSchema = try {
+  val unresolvedPlan = 
session.sessionState.sqlParser.parsePlan(originalText)
+  val resolvedPlan = 
session.sessionState.analyzer.execute(unresolvedPlan)
+  session.sessionState.analyzer.checkAnalysis(resolvedPlan)
+
+  resolvedPlan.schema
 } catch {
   case NonFatal(e) =>
-throw new RuntimeException(s"Failed to analyze the canonicalized 
SQL: $viewSQL", e)
+throw new RuntimeException(s"Failed to analyze the canonicalized 
SQL: $originalText", e)
--- End diff --

Same a before


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91064795
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -114,9 +117,26 @@ private[hive] class HiveMetastoreCatalog(sparkSession: 
SparkSession) extends Log
   alias.map(a => SubqueryAlias(a, qualifiedTable, 
None)).getOrElse(qualifiedTable)
 } else if (table.tableType == CatalogTableType.VIEW) {
   val viewText = table.viewText.getOrElse(sys.error("Invalid view 
without text."))
+  val unresolvedPlan = 
sparkSession.sessionState.sqlParser.parsePlan(viewText).transform {
+case u: UnresolvedRelation if u.tableIdentifier.database.isEmpty =>
+  u.copy(tableIdentifier = 
TableIdentifier(u.tableIdentifier.table, table.currentDatabase))
+  }
+  // Resolve the plan and check whether the analyzed plan is valid.
+  val resolvedPlan = try {
+val resolvedPlan = 
sparkSession.sessionState.analyzer.execute(unresolvedPlan)
+sparkSession.sessionState.analyzer.checkAnalysis(resolvedPlan)
+
+resolvedPlan
+  } catch {
+case NonFatal(e) =>
+  throw new RuntimeException(s"Failed to analyze the canonicalized 
SQL: $viewText", e)
+  }
+  val planWithProjection = 
table.originalSchema.map(withProjection(resolvedPlan, _))
--- End diff --

So this is correct, but it is really hard to grasp. Could you use an if ... 
else .. here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91064962
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -114,9 +117,26 @@ private[hive] class HiveMetastoreCatalog(sparkSession: 
SparkSession) extends Log
   alias.map(a => SubqueryAlias(a, qualifiedTable, 
None)).getOrElse(qualifiedTable)
 } else if (table.tableType == CatalogTableType.VIEW) {
   val viewText = table.viewText.getOrElse(sys.error("Invalid view 
without text."))
+  val unresolvedPlan = 
sparkSession.sessionState.sqlParser.parsePlan(viewText).transform {
+case u: UnresolvedRelation if u.tableIdentifier.database.isEmpty =>
+  u.copy(tableIdentifier = 
TableIdentifier(u.tableIdentifier.table, table.currentDatabase))
+  }
+  // Resolve the plan and check whether the analyzed plan is valid.
+  val resolvedPlan = try {
+val resolvedPlan = 
sparkSession.sessionState.analyzer.execute(unresolvedPlan)
+sparkSession.sessionState.analyzer.checkAnalysis(resolvedPlan)
+
+resolvedPlan
+  } catch {
+case NonFatal(e) =>
+  throw new RuntimeException(s"Failed to analyze the canonicalized 
SQL: $viewText", e)
+  }
+  val planWithProjection = 
table.originalSchema.map(withProjection(resolvedPlan, _))
--- End diff --

This is only for here for legacy reasons right? So a view using the new 
code path always has an originalSchema defined?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91065717
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -114,9 +117,26 @@ private[hive] class HiveMetastoreCatalog(sparkSession: 
SparkSession) extends Log
   alias.map(a => SubqueryAlias(a, qualifiedTable, 
None)).getOrElse(qualifiedTable)
 } else if (table.tableType == CatalogTableType.VIEW) {
   val viewText = table.viewText.getOrElse(sys.error("Invalid view 
without text."))
+  val unresolvedPlan = 
sparkSession.sessionState.sqlParser.parsePlan(viewText).transform {
+case u: UnresolvedRelation if u.tableIdentifier.database.isEmpty =>
+  u.copy(tableIdentifier = 
TableIdentifier(u.tableIdentifier.table, table.currentDatabase))
+  }
+  // Resolve the plan and check whether the analyzed plan is valid.
+  val resolvedPlan = try {
+val resolvedPlan = 
sparkSession.sessionState.analyzer.execute(unresolvedPlan)
+sparkSession.sessionState.analyzer.checkAnalysis(resolvedPlan)
+
+resolvedPlan
+  } catch {
+case NonFatal(e) =>
+  throw new RuntimeException(s"Failed to analyze the canonicalized 
SQL: $viewText", e)
+  }
+  val planWithProjection = 
table.originalSchema.map(withProjection(resolvedPlan, _))
+.getOrElse(resolvedPlan)
+
   SubqueryAlias(
 alias.getOrElse(table.identifier.table),
-sparkSession.sessionState.sqlParser.parsePlan(viewText),
+aliasColumns(planWithProjection, table.schema.fields),
--- End diff --

Could we try to do one projection instead of two? And could we also avoid a 
projection if we don't need it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91064259
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -126,6 +146,55 @@ private[hive] class HiveMetastoreCatalog(sparkSession: 
SparkSession) extends Log
 }
   }
 
+  /**
+   * Apply Projection on unresolved logical plan to:
+   * 1. Omit the columns which are not referenced by the view;
+   * 2. Reorder the columns to keep the same order with the view;
+   */
+  private def withProjection(plan: LogicalPlan, schema: StructType): 
LogicalPlan = {
+// All fields in schema should exist in plan.schema, or we should 
throw an AnalysisException
+// to notify the underlying schema has been changed.
+if (schema.fields.forall { field =>
--- End diff --

It took me while to figure out what was going on here. Please restructure 
this. You could just do the check in a different code block since you are 
throwing an exception.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16170: [SPARK-18634][SQL][TRIVIAL] Touch-up Generate

2016-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16170
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69726/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16170: [SPARK-18634][SQL][TRIVIAL] Touch-up Generate

2016-12-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16170
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16170: [SPARK-18634][SQL][TRIVIAL] Touch-up Generate

2016-12-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16170
  
**[Test build #69726 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69726/consoleFull)**
 for PR 16170 at commit 
[`8bf57e0`](https://github.com/apache/spark/commit/8bf57e04cf50790f8788791353c35a3a847712eb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

< 1 2 3 4 5 6 >

401 - 500 of 596 matches

Mail list logo