[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-25 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21320
  
do we have comments other than code style issues? Generally we should not 
block a PR just for code style issues, as long as the PR passes the style check.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21830: [SPARK-24878][SQL] Fix reverse function for array...

2018-07-25 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21830#discussion_r205338930
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1244,46 +1244,50 @@ case class Reverse(child: Expression) extends 
UnaryExpression with ImplicitCastI
   }
 
   private def arrayCodeGen(ctx: CodegenContext, ev: ExprCode, childName: 
String): String = {
-val length = ctx.freshName("length")
-val javaElementType = CodeGenerator.javaType(elementType)
+
 val isPrimitiveType = CodeGenerator.isPrimitiveType(elementType)
 
+val numElements = ctx.freshName("numElements")
+val arrayData = ctx.freshName("arrayData")
+
 val initialization = if (isPrimitiveType) {
-  s"$childName.copy()"
+  ctx.createUnsafeArray(arrayData, numElements, elementType, s" 
$prettyName failed.")
 } else {
-  s"new ${classOf[GenericArrayData].getName()}(new Object[$length])"
-}
-
-val numberOfIterations = if (isPrimitiveType) s"$length / 2" else 
length
-
-val swapAssigments = if (isPrimitiveType) {
-  val setFunc = "set" + CodeGenerator.primitiveTypeName(elementType)
-  val getCall = (index: String) => CodeGenerator.getValue(ev.value, 
elementType, index)
-  s"""|boolean isNullAtK = ${ev.value}.isNullAt(k);
-  |boolean isNullAtL = ${ev.value}.isNullAt(l);
-  |if(!isNullAtK) {
-  |  $javaElementType el = ${getCall("k")};
-  |  if(!isNullAtL) {
-  |${ev.value}.$setFunc(k, ${getCall("l")});
-  |  } else {
-  |${ev.value}.setNullAt(k);
-  |  }
-  |  ${ev.value}.$setFunc(l, el);
-  |} else if (!isNullAtL) {
-  |  ${ev.value}.$setFunc(k, ${getCall("l")});
-  |  ${ev.value}.setNullAt(l);
-  |}""".stripMargin
+  val arrayDataClass = classOf[GenericArrayData].getName
+  s"$arrayDataClass $arrayData = new $arrayDataClass(new 
Object[$numElements]);"
+}
+
+val i = ctx.freshName("i")
+val j = ctx.freshName("j")
+
+val getValue = CodeGenerator.getValue(childName, elementType, i)
+
+val setFunc = if (isPrimitiveType) {
+  s"set${CodeGenerator.primitiveTypeName(elementType)}"
+} else {
+  "update"
+}
+
+val assignment = if (isPrimitiveType && 
dataType.asInstanceOf[ArrayType].containsNull) {
--- End diff --

We can't override `dataType` only for `ArrayType` because `Reverse` is also 
used for `StringType`.
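
A minimal sketch of the constraint, using hypothetical stand-in types rather than Spark's real classes: when an expression's result type can be either a string or an array, array-specific flags are read only after checking the runtime type, because an unconditional narrowing to `ArrayType` would fail for string input.

```scala
// Hypothetical stand-ins for Catalyst's type hierarchy; not Spark's real classes.
sealed trait DataType
case object StringType extends DataType
final case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType

// An expression whose result type simply mirrors its input type.
final case class Reverse(childType: DataType) {
  def dataType: DataType = childType

  // Read the array-only flag after checking the runtime type,
  // instead of narrowing `dataType` to ArrayType for every input.
  def needsNullCheck: Boolean = dataType match {
    case ArrayType(_, containsNull) => containsNull
    case _ => false
  }
}

object ReverseTypeDemo {
  def main(args: Array[String]): Unit = {
    println(Reverse(ArrayType(StringType, containsNull = true)).needsNullCheck)
    println(Reverse(StringType).needsNullCheck)
    // An unconditional narrowing throws for string input.
    val narrowed = scala.util.Try(Reverse(StringType).dataType.asInstanceOf[ArrayType])
    println(narrowed.isFailure)
  }
}
```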


---




[GitHub] spark pull request #21830: [SPARK-24878][SQL] Fix reverse function for array...

2018-07-25 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21830#discussion_r205338428
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1244,46 +1244,50 @@ case class Reverse(child: Expression) extends 
UnaryExpression with ImplicitCastI
   }
 
   private def arrayCodeGen(ctx: CodegenContext, ev: ExprCode, childName: 
String): String = {
-val length = ctx.freshName("length")
-val javaElementType = CodeGenerator.javaType(elementType)
+
 val isPrimitiveType = CodeGenerator.isPrimitiveType(elementType)
 
+val numElements = ctx.freshName("numElements")
+val arrayData = ctx.freshName("arrayData")
+
 val initialization = if (isPrimitiveType) {
-  s"$childName.copy()"
+  ctx.createUnsafeArray(arrayData, numElements, elementType, s" 
$prettyName failed.")
 } else {
-  s"new ${classOf[GenericArrayData].getName()}(new Object[$length])"
-}
-
-val numberOfIterations = if (isPrimitiveType) s"$length / 2" else 
length
-
-val swapAssigments = if (isPrimitiveType) {
-  val setFunc = "set" + CodeGenerator.primitiveTypeName(elementType)
-  val getCall = (index: String) => CodeGenerator.getValue(ev.value, 
elementType, index)
-  s"""|boolean isNullAtK = ${ev.value}.isNullAt(k);
-  |boolean isNullAtL = ${ev.value}.isNullAt(l);
-  |if(!isNullAtK) {
-  |  $javaElementType el = ${getCall("k")};
-  |  if(!isNullAtL) {
-  |${ev.value}.$setFunc(k, ${getCall("l")});
-  |  } else {
-  |${ev.value}.setNullAt(k);
-  |  }
-  |  ${ev.value}.$setFunc(l, el);
-  |} else if (!isNullAtL) {
-  |  ${ev.value}.$setFunc(k, ${getCall("l")});
-  |  ${ev.value}.setNullAt(l);
-  |}""".stripMargin
+  val arrayDataClass = classOf[GenericArrayData].getName
+  s"$arrayDataClass $arrayData = new $arrayDataClass(new 
Object[$numElements]);"
+}
+
+val i = ctx.freshName("i")
+val j = ctx.freshName("j")
+
+val getValue = CodeGenerator.getValue(childName, elementType, i)
+
+val setFunc = if (isPrimitiveType) {
+  s"set${CodeGenerator.primitiveTypeName(elementType)}"
+} else {
+  "update"
+}
+
+val assignment = if (isPrimitiveType && 
dataType.asInstanceOf[ArrayType].containsNull) {
+  s"""
+ |if ($childName.isNullAt($i)) {
+ |  $arrayData.setNullAt($j);
+ |} else {
+ |  $arrayData.$setFunc($j, $getValue);
+ |}
+   """.stripMargin
 } else {
-  s"${ev.value}.update(k, ${CodeGenerator.getValue(childName, 
elementType, "l")});"
+  s"$arrayData.$setFunc($j, $getValue);"
 }
 
 s"""
-   |final int $length = $childName.numElements();
-   |${ev.value} = $initialization;
-   |for(int k = 0; k < $numberOfIterations; k++) {
-   |  int l = $length - k - 1;
-   |  $swapAssigments
+   |final int $numElements = $childName.numElements();
+   |$initialization
+   |for (int $i = 0; $i < $numElements; $i++) {
+   |  int $j = $numElements - $i - 1;
--- End diff --

We still need to calculate the index of the opposite side?
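
For reference, a simplified model of the copy loop in the diff above, written in plain Scala rather than generated Java. `Option[Int]` stands in for a nullable primitive slot, and `reverseCopy` is a hypothetical name used only for this sketch.

```scala
// Simplified model: Option[Int] stands in for a nullable primitive slot.
object ReverseCopyDemo {
  def reverseCopy(src: Array[Option[Int]]): Array[Option[Int]] = {
    val numElements = src.length
    val dst = new Array[Option[Int]](numElements)
    var i = 0
    while (i < numElements) {
      val j = numElements - i - 1 // index of the opposite side
      dst(j) = src(i)             // a None (null) slot is carried over unchanged
      i += 1
    }
    dst
  }

  def main(args: Array[String]): Unit = {
    println(reverseCopy(Array(Some(1), None, Some(3))).mkString(", "))
  }
}
```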


---




[GitHub] spark pull request #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uui...

2018-07-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20861#discussion_r205337069
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1994,6 +1996,20 @@ class Analyzer(
 }
   }
 
+  /**
+   * Set the seed for random number generation in Uuid expressions.
+   */
+  object ResolvedUuidExpressions extends Rule[LogicalPlan] {
+private lazy val random = new Random()
+
+override def apply(plan: LogicalPlan): LogicalPlan = plan.transformUp {
+  case p if p.resolved => p
+  case p => p transformExpressionsUp {
+case Uuid(None) => Uuid(Some(random.nextLong()))
--- End diff --

what's the current behavior for rand in streaming?
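
To make the question concrete, here is a minimal sketch of what a seed-assigning rule like the one in the diff does, using a hypothetical miniature expression tree instead of Catalyst's classes. The seed is fixed at the moment the rule runs; whether that seed is reused across streaming micro-batches depends on when the plan is analyzed again, which this sketch does not model.

```scala
import scala.util.Random

// Hypothetical miniature expression tree; not Catalyst's real classes.
sealed trait Expr
final case class Uuid(seed: Option[Long]) extends Expr
final case class Concat(children: Seq[Expr]) extends Expr
final case class Literal(value: String) extends Expr

object ResolveUuidSeeds {
  // Recursive rewrite: every Uuid without a seed receives one from `random`;
  // Uuid nodes that already carry a seed are left untouched.
  def apply(expr: Expr, random: Random): Expr = expr match {
    case Uuid(None) => Uuid(Some(random.nextLong()))
    case Concat(children) => Concat(children.map(apply(_, random)))
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val plan = Concat(Seq(Literal("id-"), Uuid(None), Uuid(Some(7L))))
    println(ResolveUuidSeeds(plan, new Random(42L)))
  }
}
```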


---




[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

2018-07-25 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21403
  
@maryannxue as I said, my initial proposal was like that. I think that this 
has the advantage of avoiding some code duplication, as the same logic which is 
added in ResolveInValues would otherwise have to be spread over all the places 
where an In is built, and of avoiding a change to the In signature, so that if a 
user is using In directly in their code we don't break it. On the other hand, I 
agree with you that the approach with a `Seq[Expression]` is cleaner IMO (that's 
why it was my original proposal). @cloud-fan @gatorsmile what do you think about this?


---




[GitHub] spark issue #21830: [SPARK-24878][SQL] Fix reverse function for array type o...

2018-07-25 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21830
  
LGTM


---




[GitHub] spark pull request #21830: [SPARK-24878][SQL] Fix reverse function for array...

2018-07-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21830#discussion_r205336425
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1244,46 +1244,50 @@ case class Reverse(child: Expression) extends 
UnaryExpression with ImplicitCastI
   }
 
   private def arrayCodeGen(ctx: CodegenContext, ev: ExprCode, childName: 
String): String = {
-val length = ctx.freshName("length")
-val javaElementType = CodeGenerator.javaType(elementType)
+
 val isPrimitiveType = CodeGenerator.isPrimitiveType(elementType)
 
+val numElements = ctx.freshName("numElements")
+val arrayData = ctx.freshName("arrayData")
+
 val initialization = if (isPrimitiveType) {
-  s"$childName.copy()"
+  ctx.createUnsafeArray(arrayData, numElements, elementType, s" 
$prettyName failed.")
 } else {
-  s"new ${classOf[GenericArrayData].getName()}(new Object[$length])"
-}
-
-val numberOfIterations = if (isPrimitiveType) s"$length / 2" else 
length
-
-val swapAssigments = if (isPrimitiveType) {
-  val setFunc = "set" + CodeGenerator.primitiveTypeName(elementType)
-  val getCall = (index: String) => CodeGenerator.getValue(ev.value, 
elementType, index)
-  s"""|boolean isNullAtK = ${ev.value}.isNullAt(k);
-  |boolean isNullAtL = ${ev.value}.isNullAt(l);
-  |if(!isNullAtK) {
-  |  $javaElementType el = ${getCall("k")};
-  |  if(!isNullAtL) {
-  |${ev.value}.$setFunc(k, ${getCall("l")});
-  |  } else {
-  |${ev.value}.setNullAt(k);
-  |  }
-  |  ${ev.value}.$setFunc(l, el);
-  |} else if (!isNullAtL) {
-  |  ${ev.value}.$setFunc(k, ${getCall("l")});
-  |  ${ev.value}.setNullAt(l);
-  |}""".stripMargin
+  val arrayDataClass = classOf[GenericArrayData].getName
+  s"$arrayDataClass $arrayData = new $arrayDataClass(new 
Object[$numElements]);"
+}
+
+val i = ctx.freshName("i")
+val j = ctx.freshName("j")
+
+val getValue = CodeGenerator.getValue(childName, elementType, i)
+
+val setFunc = if (isPrimitiveType) {
+  s"set${CodeGenerator.primitiveTypeName(elementType)}"
+} else {
+  "update"
+}
+
+val assignment = if (isPrimitiveType && 
dataType.asInstanceOf[ArrayType].containsNull) {
--- End diff --

nit: we can simplify the code if we do `override def dataType: ArrayType = 
child.dataType.asInstanceOf[ArrayType]`
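
A minimal sketch of the suggested simplification, assuming an expression that only ever receives array input. The classes below are hypothetical stand-ins, not Spark's: narrowing the return type of `dataType` once lets later uses read `containsNull` without repeating the cast.

```scala
// Hypothetical stand-ins; not Spark's real classes.
sealed trait DataType
case object IntegerType extends DataType
final case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType

trait Expression { def dataType: DataType }
final case class Column(dataType: DataType) extends Expression

// An expression that only accepts array input can narrow its result type once...
final case class ArrayOnlyReverse(child: Expression) extends Expression {
  override def dataType: ArrayType = child.dataType.asInstanceOf[ArrayType]

  // ...so later uses need no further casts.
  def needsNullCheck: Boolean = dataType.containsNull
}

object CovariantOverrideDemo {
  def main(args: Array[String]): Unit = {
    val input = Column(ArrayType(IntegerType, containsNull = true))
    println(ArrayOnlyReverse(input).needsNullCheck)
  }
}
```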


---




[GitHub] spark pull request #21830: [SPARK-24878][SQL] Fix reverse function for array...

2018-07-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21830#discussion_r205336334
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1244,46 +1244,50 @@ case class Reverse(child: Expression) extends 
UnaryExpression with ImplicitCastI
   }
 
   private def arrayCodeGen(ctx: CodegenContext, ev: ExprCode, childName: 
String): String = {
-val length = ctx.freshName("length")
-val javaElementType = CodeGenerator.javaType(elementType)
+
 val isPrimitiveType = CodeGenerator.isPrimitiveType(elementType)
 
+val numElements = ctx.freshName("numElements")
+val arrayData = ctx.freshName("arrayData")
+
 val initialization = if (isPrimitiveType) {
-  s"$childName.copy()"
+  ctx.createUnsafeArray(arrayData, numElements, elementType, s" 
$prettyName failed.")
 } else {
-  s"new ${classOf[GenericArrayData].getName()}(new Object[$length])"
-}
-
-val numberOfIterations = if (isPrimitiveType) s"$length / 2" else 
length
-
-val swapAssigments = if (isPrimitiveType) {
-  val setFunc = "set" + CodeGenerator.primitiveTypeName(elementType)
-  val getCall = (index: String) => CodeGenerator.getValue(ev.value, 
elementType, index)
-  s"""|boolean isNullAtK = ${ev.value}.isNullAt(k);
-  |boolean isNullAtL = ${ev.value}.isNullAt(l);
-  |if(!isNullAtK) {
-  |  $javaElementType el = ${getCall("k")};
-  |  if(!isNullAtL) {
-  |${ev.value}.$setFunc(k, ${getCall("l")});
-  |  } else {
-  |${ev.value}.setNullAt(k);
-  |  }
-  |  ${ev.value}.$setFunc(l, el);
-  |} else if (!isNullAtL) {
-  |  ${ev.value}.$setFunc(k, ${getCall("l")});
-  |  ${ev.value}.setNullAt(l);
-  |}""".stripMargin
+  val arrayDataClass = classOf[GenericArrayData].getName
+  s"$arrayDataClass $arrayData = new $arrayDataClass(new 
Object[$numElements]);"
+}
+
+val i = ctx.freshName("i")
+val j = ctx.freshName("j")
+
+val getValue = CodeGenerator.getValue(childName, elementType, i)
+
+val setFunc = if (isPrimitiveType) {
+  s"set${CodeGenerator.primitiveTypeName(elementType)}"
+} else {
+  "update"
+}
+
+val assignment = if (isPrimitiveType && 
dataType.asInstanceOf[ArrayType].containsNull) {
+  s"""
+ |if ($childName.isNullAt($i)) {
+ |  $arrayData.setNullAt($j);
+ |} else {
+ |  $arrayData.$setFunc($j, $getValue);
+ |}
+   """.stripMargin
 } else {
-  s"${ev.value}.update(k, ${CodeGenerator.getValue(childName, 
elementType, "l")});"
+  s"$arrayData.$setFunc($j, $getValue);"
 }
 
 s"""
-   |final int $length = $childName.numElements();
-   |${ev.value} = $initialization;
-   |for(int k = 0; k < $numberOfIterations; k++) {
-   |  int l = $length - k - 1;
-   |  $swapAssigments
+   |final int $numElements = $childName.numElements();
+   |$initialization
+   |for (int $i = 0; $i < $numElements; $i++) {
+   |  int $j = $numElements - $i - 1;
--- End diff --

we don't need `j` if we do
```
for (int i = numElements - 1; i >= 0; i--)
```
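
One reading of this suggestion, sketched in plain Scala under the assumption that the output can be filled by appending in order (`reverse` is a hypothetical helper, not code from the PR). With an index-based setter such as the one in the diff, the destination index would still have to be computed as `numElements - 1 - i`.

```scala
object DescendingReverseDemo {
  // Walk the source from the last element down to the first and append to the output,
  // so the only explicit index is `i`.
  def reverse(src: Array[Int]): Array[Int] = {
    val out = Array.newBuilder[Int]
    out.sizeHint(src.length)
    var i = src.length - 1
    while (i >= 0) {
      out += src(i)
      i -= 1
    }
    out.result()
  }

  def main(args: Array[String]): Unit = {
    println(reverse(Array(1, 2, 3)).mkString(", "))
  }
}
```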


---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21306
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-07-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21306
  
**[Test build #93581 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93581/testReport)**
 for PR 21306 at commit 
[`f95800c`](https://github.com/apache/spark/commit/f95800c737f160255122da6bbe336309a4e1532e).
 * This patch **fails Java style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21306
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93581/
Test FAILed.


---




[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread ajacques
Github user ajacques commented on a diff in the pull request:

https://github.com/apache/spark/pull/21320#discussion_r205329769
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/SelectedField.scala
 ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.planning
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.types._
+
+/**
+ * A Scala extractor that builds a 
[[org.apache.spark.sql.types.StructField]] from a Catalyst
+ * complex type extractor. For example, consider a relation with the 
following schema:
+ *
+ *   {{{
+ *   root
+ *|-- name: struct (nullable = true)
+ *||-- first: string (nullable = true)
+ *||-- last: string (nullable = true)
+ *}}}
+ *
+ * Further, suppose we take the select expression `name.first`. This will 
parse into an
+ * `Alias(child, "first")`. Ignoring the alias, `child` matches the 
following pattern:
+ *
+ *   {{{
+ *   GetStructFieldObject(
+ * AttributeReference("name", StructType(_), _, _),
+ * StructField("first", StringType, _, _))
+ *   }}}
+ *
+ * [[SelectedField]] converts that expression into
+ *
+ *   {{{
+ *   StructField("name", StructType(Array(StructField("first", 
StringType
+ *   }}}
+ *
+ * by mapping each complex type extractor to a 
[[org.apache.spark.sql.types.StructField]] with the
+ * same name as its child (or "parent" going right to left in the select 
expression) and a data
+ * type appropriate to the complex type extractor. In our example, the 
name of the child expression
+ * is "name" and its data type is a 
[[org.apache.spark.sql.types.StructType]] with a single string
+ * field named "first".
+ *
+ * @param expr the top-level complex type extractor
+ */
+object SelectedField {
+  def unapply(expr: Expression): Option[StructField] = {
--- End diff --

```
Error:(61, 12) constructor cannot be instantiated to expected type;
 found   : org.apache.spark.sql.catalyst.expressions.Alias
 required: org.apache.spark.sql.catalyst.expressions.ExtractValue
  case Alias(child, _) => child
```

Alias takes: `Alias(child: Expression, name: String)`
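
A plausible reading of the error is that the value being matched is statically typed as `ExtractValue`, and a constructor pattern for `Alias` cannot conform to that type. A minimal sketch with hypothetical stand-in classes (not Spark's definitions) that matches on the wider `Expression` type instead:

```scala
// Hypothetical stand-ins for the Catalyst classes named in the error; not Spark's definitions.
trait Expression
trait ExtractValue extends Expression
final case class Alias(child: Expression, name: String) extends Expression
final case class GetField(child: Expression, field: String) extends ExtractValue
final case class Attribute(name: String) extends Expression

object UnwrapAliasDemo {
  // The scrutinee is typed as Expression, so the Alias constructor pattern type-checks.
  // Typing `expr` as ExtractValue would be rejected, because Alias is not an ExtractValue.
  def unwrap(expr: Expression): Expression = expr match {
    case Alias(child, _) => child
    case other => other
  }

  def main(args: Array[String]): Unit = {
    println(unwrap(Alias(GetField(Attribute("name"), "first"), "first")))
  }
}
```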


---




[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread ajacques
Github user ajacques commented on a diff in the pull request:

https://github.com/apache/spark/pull/21320#discussion_r205329633
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/ProjectionOverSchema.scala
 ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.planning
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.types._
+
+/**
+ * A Scala extractor that projects an expression over a given schema. Data 
types,
+ * field indexes and field counts of complex type extractors and attributes
+ * are adjusted to fit the schema. All other expressions are left as-is. 
This
+ * class is motivated by columnar nested schema pruning.
+ */
+case class ProjectionOverSchema(schema: StructType) {
--- End diff --

We can move this to `sql.execution` if we move all three classes: 
`ProjectionOverSchema`, `GetStructFieldObject`, and `SelectedField`. Is there a 
difference between the `catalyst.planning` and the `execution` packages?


---




[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-25 Thread ajacques
Github user ajacques commented on the issue:

https://github.com/apache/spark/pull/21320
  
@HyukjinKwon, I'm not totally familiar with Spark internals yet, so to be 
honest I don't feel confident making big changes and hopefully can keep them 
simple at first.

I've gone through the code review comments and made as many changes as 
possible 
[here](https://github.com/apache/spark/compare/master...ajacques:spark-4502-parquet_column_pruning-foundation).
 If this PR is mostly feature complete and it's just small things, then I can 
push forward.

If the feedback comments push past simple refactoring level right now I 
would prefer to let someone else take over, but feel free to use what I've done.


---




[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function

2018-07-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21103
  
**[Test build #93582 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93582/testReport)**
 for PR 21103 at commit 
[`cf76c1f`](https://github.com/apache/spark/commit/cf76c1f4a2a41ec88fcd744470a113321e897a71).


---




[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21103
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1340/
Test PASSed.


---




[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21103
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...

2018-07-25 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/21875
  
@maryannxue It looks good to me. As a minor comment, could we state the 
default value for this parameter as well? For some of the other parameters, we 
specify the default value.


---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-07-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21306
  
**[Test build #93581 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93581/testReport)**
 for PR 21306 at commit 
[`f95800c`](https://github.com/apache/spark/commit/f95800c737f160255122da6bbe336309a4e1532e).


---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21306
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21306
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1339/
Test PASSed.


---




[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21852
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21852
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93578/
Test PASSed.


---




[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...

2018-07-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21852
  
**[Test build #93578 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93578/testReport)**
 for PR 21852 at commit 
[`4acda6f`](https://github.com/apache/spark/commit/4acda6fbf4fb5b1be30a0ad213cd5369b64b02b5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.

2018-07-25 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21857#discussion_r205331401
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -52,7 +52,7 @@ trait CheckAnalysis extends PredicateHelper {
   }
 
   protected def mapColumnInSetOperation(plan: LogicalPlan): 
Option[Attribute] = plan match {
-case _: Intersect | _: Except | _: Distinct =>
+case _: Intersect | _: ExceptBase | _: Distinct =>
--- End diff --

@gatorsmile @maropu OK


---




[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21878
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93577/
Test PASSed.


---




[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21878
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...

2018-07-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21878
  
**[Test build #93577 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93577/testReport)**
 for PR 21878 at commit 
[`d2759cc`](https://github.com/apache/spark/commit/d2759cce48eb9a85145e90d8a126fb83351d0fda).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21852
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21852
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93576/
Test PASSed.


---




[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...

2018-07-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21852
  
**[Test build #93576 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93576/testReport)**
 for PR 21852 at commit 
[`0b67e2e`](https://github.com/apache/spark/commit/0b67e2efcb6f827248ee11fffe9eca44a86fceaa).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21789: [SPARK-24829][STS]In Spark Thrift Server, CAST AS FLOAT ...

2018-07-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21789
  
Let me leave this open for a few days in case some reviewers have more 
comments on this.


---




[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

2018-07-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21102
  
I agree with @ueshin's comment. I wouldn't guarantee the order of the returned 
elements in the documentation yet, though.
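
To illustrate what an ordering guarantee would commit to, here is a sketch of one possible ordering. `intersectInLeftOrder` is a hypothetical helper, not the PR's implementation; a hash-based implementation could return the same elements in a different order.

```scala
object ArrayIntersectDemo {
  // Keeps the distinct elements of `left` that also occur in `right`,
  // in the order in which they first appear in `left`.
  def intersectInLeftOrder[A](left: Seq[A], right: Seq[A]): Seq[A] = {
    val rightSet = right.toSet
    left.distinct.filter(rightSet.contains)
  }

  def main(args: Array[String]): Unit = {
    println(intersectInLeftOrder(Seq(3, 1, 2, 3), Seq(2, 3, 4)))
  }
}
```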


---




[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21867
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93574/
Test PASSed.





[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21867
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.

2018-07-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21857#discussion_r205325587
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -52,7 +52,7 @@ trait CheckAnalysis extends PredicateHelper {
   }
 
   protected def mapColumnInSetOperation(plan: LogicalPlan): 
Option[Attribute] = plan match {
-case _: Intersect | _: Except | _: Distinct =>
+case _: Intersect | _: ExceptBase | _: Distinct =>
--- End diff --

I am fine with that. Please make the change and avoid introducing a new 
LogicalPlan node. 





[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.

2018-07-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21867
  
**[Test build #93574 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93574/testReport)**
 for PR 21867 at commit 
[`a5b00b8`](https://github.com/apache/spark/commit/a5b00b8a05538a6adb3a4525c2fecc1e15575f7c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #21789: [SPARK-24829][STS]In Spark Thrift Server, CAST AS FLOAT ...

2018-07-25 Thread zuotingbing
Github user zuotingbing commented on the issue:

https://github.com/apache/spark/pull/21789
  
@HyukjinKwon could you help merge this into the master branch? Thanks.





[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode

2018-07-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21758
  
**[Test build #93580 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93580/testReport)**
 for PR 21758 at commit 
[`c7600c2`](https://github.com/apache/spark/commit/c7600c24221d29fde31dca921d9d5863af2666e9).





[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode

2018-07-25 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/21758
  
> What's the failure mode if there are not enough slots for the barrier 
mode? We should throw an exception right?

Yes, as mentioned in 
https://github.com/apache/spark/pull/21758/files/c16a47f0d15998133b9d61d8df5310f1f66b11b0#diff-d4000438827afe3a185ae75b24987a61R372
 , we shall fail the job on submit if there are not enough slots for the barrier 
stage. I'll submit another PR to add this check (tracked by SPARK-24819).
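
A minimal sketch of that fail-on-submit check, written in Python for brevity. 
The names `check_barrier_slots` and `NotEnoughSlotsError` are hypothetical and 
are not part of Spark; the actual check is left to the follow-up PR.

```python
class NotEnoughSlotsError(Exception):
    """Raised when a barrier stage cannot get all of its slots at once."""


def check_barrier_slots(num_tasks: int, total_slots: int) -> None:
    # A barrier stage needs every task launched together, so reject the job
    # at submit time instead of waiting for slots that may never appear.
    if num_tasks > total_slots:
        raise NotEnoughSlotsError(
            f"barrier stage needs {num_tasks} slots, only {total_slots} available"
        )
```

Raising at submit gives the user an immediate error rather than a job that 
hangs while waiting for resources.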





[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21758
  
Merged build finished. Test PASSed.





[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21758
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1338/
Test PASSed.





[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...

2018-07-25 Thread maryannxue
Github user maryannxue commented on the issue:

https://github.com/apache/spark/pull/21875
  
Programming guide updated. Thank you, @dilipbiswal and @HyukjinKwon!





[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...

2018-07-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21875
  
**[Test build #93579 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93579/testReport)**
 for PR 21875 at commit 
[`027b6c4`](https://github.com/apache/spark/commit/027b6c43f8c448d3231d19b21c64ab8306881fde).





[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21875
  
Merged build finished. Test PASSed.





[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21875
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1337/
Test PASSed.





[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21857
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93573/
Test PASSed.





[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21878
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93575/
Test FAILed.





[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21857
  
Merged build finished. Test PASSed.





[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21878
  
Merged build finished. Test FAILed.





[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.

2018-07-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21857
  
**[Test build #93573 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93573/testReport)**
 for PR 21857 at commit 
[`b201b88`](https://github.com/apache/spark/commit/b201b8890b8f5f580f80b652d9da09186d32c824).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...

2018-07-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21878
  
**[Test build #93575 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93575/testReport)**
 for PR 21878 at commit 
[`d95ba40`](https://github.com/apache/spark/commit/d95ba4081ac1188515b7e6363640700d56f2c93f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-25 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r205318258
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -359,20 +366,55 @@ private[spark] class TaskSchedulerImpl(
 // of locality levels so that it gets a chance to launch local tasks 
on all of them.
 // NOTE: the preferredLocality order: PROCESS_LOCAL, NODE_LOCAL, 
NO_PREF, RACK_LOCAL, ANY
 for (taskSet <- sortedTaskSets) {
-  var launchedAnyTask = false
-  var launchedTaskAtCurrentMaxLocality = false
-  for (currentMaxLocality <- taskSet.myLocalityLevels) {
-do {
-  launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet(
-taskSet, currentMaxLocality, shuffledOffers, availableCpus, 
tasks)
-  launchedAnyTask |= launchedTaskAtCurrentMaxLocality
-} while (launchedTaskAtCurrentMaxLocality)
-  }
-  if (!launchedAnyTask) {
-taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
+  // Skip the barrier taskSet if the available slots are less than the 
number of pending tasks.
+  if (taskSet.isBarrier && availableSlots < taskSet.numTasks) {
--- End diff --

We plan to fail the job on submit if it requires more slots than are available. 
Are there other scenarios where we should fail fast with dynamic allocation? IIUC, 
the barrier tasks that have not been launched are still counted in the number of 
pending tasks, so dynamic resource allocation should still be able to compute the 
correct expected number of executors.
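
To illustrate the point about pending tasks, here is a rough Python model of 
one offer round. `TaskSet` and `offer_round` are hypothetical stand-ins for 
illustration, not the `TaskSchedulerImpl` code, and locality is ignored.

```python
from dataclasses import dataclass


@dataclass
class TaskSet:
    name: str
    num_tasks: int
    is_barrier: bool = False


def offer_round(task_sets, available_slots):
    """Return (launched, pending) task counts per task set for one offer round."""
    launched, pending = {}, {}
    for ts in task_sets:
        if ts.is_barrier and available_slots < ts.num_tasks:
            # Skip: a barrier task set launches all of its tasks or none.
            n = 0
        else:
            n = min(ts.num_tasks, available_slots)
        available_slots -= n
        launched[ts.name] = n
        # Tasks that were not launched stay pending, so an allocator that
        # sizes the cluster from pending counts still sees them.
        pending[ts.name] = ts.num_tasks - n
    return launched, pending
```

In this model a skipped barrier task set keeps its full task count in 
`pending`, which is the quantity dynamic allocation would need.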





[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-25 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r205317494
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierTaskInfo.scala ---
@@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import org.apache.spark.annotation.{Experimental, Since}
+
+
+/**
+ * :: Experimental ::
+ * Carries all task infos of a barrier task.
+ *
+ * @param address the IPv4 address(host:port) of the executor that a 
barrier task is running on
+ */
+@Experimental
+@Since("2.4.0")
+class BarrierTaskInfo(val address: String)
--- End diff --

If we don't mind making TaskInfo a public API, then I think it would be 
fine to just put address into TaskInfo. The major concern is that TaskInfo has 
been stable for a long time; do we want to potentially make frequent changes 
to it (e.g. adding more variables useful for barrier tasks, though I don't 
really have an example at hand)?





[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...

2018-07-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21875
  
which is here 
https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md#jdbc-to-other-databases





[GitHub] spark pull request #21867: [SPARK-24307][CORE] Add conf to revert to old cod...

2018-07-25 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21867#discussion_r205312971
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
@@ -731,7 +731,14 @@ private[spark] class BlockManager(
   }
 
   if (data != null) {
-return Some(ChunkedByteBuffer.fromManagedBuffer(data, chunkSize))
+// SPARK-24307 undocumented "escape-hatch" in case there are any 
issues in converting to
+// to ChunkedByteBuffer, to go back to old code-path.  Can be 
removed post Spark 2.4 if
+// new path is stable.
+if (conf.getBoolean("spark.fetchToNioBuffer", false)) {
--- End diff --

Maybe we should also rename "spark.maxRemoteBlockSizeFetchToMem"?
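
For readers unfamiliar with the escape-hatch pattern in the diff above, a 
small Python sketch follows. Only the key "spark.fetchToNioBuffer" comes from 
the diff; the dict-based conf, the function name `fetch_remote_bytes`, and 
both return shapes are assumptions made for illustration.

```python
def fetch_remote_bytes(conf: dict, data: bytes, chunk_size: int):
    # The flag defaults to off, so the new chunked path is used unless a
    # user explicitly opts back into the old behaviour.
    if conf.get("spark.fetchToNioBuffer", "false").lower() == "true":
        return data  # old path: one contiguous buffer
    # New path: split the payload into fixed-size chunks.
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
```

Whatever the flag ends up being called, keeping the default on the new path 
means the old path can be deleted later without changing default behaviour.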





[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...

2018-07-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21650
  
ehh .. @BryanCutler, WDYT about just doing the previous one for now? The 
approach you suggested sounds efficient of course, but this is not a hot path, 
so I think the previous way is fine too, since it's a bit cleaner (though a 
bit less efficient), and partly because the code freeze is close. 





[GitHub] spark pull request #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Pyt...

2018-07-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21650#discussion_r205311130
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -5060,6 +5049,147 @@ def test_type_annotation(self):
 df = self.spark.range(1).select(pandas_udf(f=_locals['noop'], 
returnType='bigint')('id'))
 self.assertEqual(df.first()[0], 0)
 
+def test_mixed_udf(self):
+import pandas as pd
+from pyspark.sql.functions import col, udf, pandas_udf
+
+df = self.spark.range(0, 1).toDF('v')
+
+# Test mixture of multiple UDFs and Pandas UDFs
+
+@udf('int')
+def f1(x):
+assert type(x) == int
+return x + 1
+
+@pandas_udf('int')
+def f2(x):
+assert type(x) == pd.Series
+return x + 10
+
+@udf('int')
+def f3(x):
+assert type(x) == int
+return x + 100
+
+@pandas_udf('int')
+def f4(x):
+assert type(x) == pd.Series
+return x + 1000
+
+# Test mixed udfs in a single projection
+df1 = df \
+.withColumn('f1', f1(col('v'))) \
+.withColumn('f2', f2(col('v'))) \
+.withColumn('f3', f3(col('v'))) \
+.withColumn('f4', f4(col('v'))) \
+.withColumn('f2_f1', f2(col('f1'))) \
+.withColumn('f3_f1', f3(col('f1'))) \
+.withColumn('f4_f1', f4(col('f1'))) \
+.withColumn('f3_f2', f3(col('f2'))) \
+.withColumn('f4_f2', f4(col('f2'))) \
+.withColumn('f4_f3', f4(col('f3'))) \
+.withColumn('f3_f2_f1', f3(col('f2_f1'))) \
+.withColumn('f4_f2_f1', f4(col('f2_f1'))) \
+.withColumn('f4_f3_f1', f4(col('f3_f1'))) \
+.withColumn('f4_f3_f2', f4(col('f3_f2'))) \
+.withColumn('f4_f3_f2_f1', f4(col('f3_f2_f1')))
+
+# Test mixed udfs in a single expression
+df2 = df \
+.withColumn('f1', f1(col('v'))) \
+.withColumn('f2', f2(col('v'))) \
+.withColumn('f3', f3(col('v'))) \
+.withColumn('f4', f4(col('v'))) \
+.withColumn('f2_f1', f2(f1(col('v')))) \
+.withColumn('f3_f1', f3(f1(col('v')))) \
+.withColumn('f4_f1', f4(f1(col('v')))) \
+.withColumn('f3_f2', f3(f2(col('v')))) \
+.withColumn('f4_f2', f4(f2(col('v')))) \
+.withColumn('f4_f3', f4(f3(col('v')))) \
+.withColumn('f3_f2_f1', f3(f2(f1(col('v'))))) \
+.withColumn('f4_f2_f1', f4(f2(f1(col('v'))))) \
+.withColumn('f4_f3_f1', f4(f3(f1(col('v'))))) \
+.withColumn('f4_f3_f2', f4(f3(f2(col('v'))))) \
+.withColumn('f4_f3_f2_f1', f4(f3(f2(f1(col('v'))))))
+
+# expected result
+df3 = df \
+.withColumn('f1', df['v'] + 1) \
+.withColumn('f2', df['v'] + 10) \
+.withColumn('f3', df['v'] + 100) \
+.withColumn('f4', df['v'] + 1000) \
+.withColumn('f2_f1', df['v'] + 11) \
+.withColumn('f3_f1', df['v'] + 101) \
+.withColumn('f4_f1', df['v'] + 1001) \
+.withColumn('f3_f2', df['v'] + 110) \
+.withColumn('f4_f2', df['v'] + 1010) \
+.withColumn('f4_f3', df['v'] + 1100) \
+.withColumn('f3_f2_f1', df['v'] + 111) \
+.withColumn('f4_f2_f1', df['v'] + 1011) \
+.withColumn('f4_f3_f1', df['v'] + 1101) \
+.withColumn('f4_f3_f2', df['v'] + 1110) \
+.withColumn('f4_f3_f2_f1', df['v'] + 1111)
+
+self.assertEquals(df3.collect(), df1.collect())
+self.assertEquals(df3.collect(), df2.collect())
+
+def test_mixed_udf_and_sql(self):
+import pandas as pd
+from pyspark.sql.functions import udf, pandas_udf
+
+df = self.spark.range(0, 1).toDF('v')
+
+# Test mixture of UDFs, Pandas UDFs and SQL expression.
+
+@udf('int')
+def f1(x):
+assert type(x) == int
+return x + 1
+
+def f2(x):
+return x + 10
+
+@pandas_udf('int')
+def f3(x):
+assert type(x) == pd.Series
+return x + 100
+
+df1 = df.withColumn('f1', f1(df['v'])) \
+.withColumn('f2', f2(df['v'])) \
+.withColumn('f3', f3(df['v'])) \
+.withColumn('f1_f2', f1(f2(df['v']))) \
+.withColumn('f1_f3', f1(f3(df['v']))) \
+

[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function

2018-07-25 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21103#discussion_r205310335
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -3805,3 +3799,330 @@ object ArrayUnion {
 new GenericArrayData(arrayBuffer)
   }
 }
+
+/**
+ * Returns an array of the elements in the intersect of x and y, without 
duplicates
+ */
+@ExpressionDescription(
+  usage = """
+  _FUNC_(array1, array2) - Returns an array of the elements in array1 but 
not in array2,
+without duplicates.
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+   array(2)
+  """,
+  since = "2.4.0")
+case class ArrayExcept(left: Expression, right: Expression) extends 
ArraySetLike {
+  override def dataType: DataType = left.dataType
+
+  var hsInt: OpenHashSet[Int] = _
+  var hsLong: OpenHashSet[Long] = _
+
+  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: 
Int): Boolean = {
+val elem = array.getInt(idx)
+if (!hsInt.contains(elem)) {
+  if (resultArray != null) {
+resultArray.setInt(pos, elem)
+  }
+  hsInt.add(elem)
+  true
+} else {
+  false
+}
+  }
+
+  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: 
Int): Boolean = {
+val elem = array.getLong(idx)
+if (!hsLong.contains(elem)) {
+  if (resultArray != null) {
+resultArray.setLong(pos, elem)
+  }
+  hsLong.add(elem)
+  true
+} else {
+  false
+}
+  }
+
+  def evalIntLongPrimitiveType(
+  array1: ArrayData,
+  array2: ArrayData,
+  resultArray: ArrayData,
+  isLongType: Boolean): Int = {
+// store elements into resultArray
+var notFoundNullElement = true
+var i = 0
+while (i < array2.numElements()) {
+  if (array2.isNullAt(i)) {
+notFoundNullElement = false
+  } else {
+val assigned = if (!isLongType) {
+  hsInt.add(array2.getInt(i))
+} else {
+  hsLong.add(array2.getLong(i))
+}
+  }
+  i += 1
+}
+var pos = 0
+i = 0
+while (i < array1.numElements()) {
+  if (array1.isNullAt(i)) {
+if (notFoundNullElement) {
+  if (resultArray != null) {
+resultArray.setNullAt(pos)
+  }
+  pos += 1
+  notFoundNullElement = false
+}
+  } else {
+val assigned = if (!isLongType) {
+  assignInt(array1, i, resultArray, pos)
+} else {
+  assignLong(array1, i, resultArray, pos)
+}
+if (assigned) {
+  pos += 1
+}
+  }
+  i += 1
+}
+pos
+  }
+
+  override def nullSafeEval(input1: Any, input2: Any): Any = {
+val array1 = input1.asInstanceOf[ArrayData]
+val array2 = input2.asInstanceOf[ArrayData]
+
+if (elementTypeSupportEquals) {
+  elementType match {
+case IntegerType =>
+  // avoid boxing of primitive int array elements
+  // calculate result array size
+  hsInt = new OpenHashSet[Int]
+  val elements = evalIntLongPrimitiveType(array1, array2, null, 
false)
+  // allocate result array
+  hsInt = new OpenHashSet[Int]
+  val resultArray = if (UnsafeArrayData.shouldUseGenericArrayData(
+IntegerType.defaultSize, elements)) {
+new GenericArrayData(new Array[Any](elements))
+  } else {
+UnsafeArrayData.forPrimitiveArray(
+  Platform.INT_ARRAY_OFFSET, elements, IntegerType.defaultSize)
+  }
+  // assign elements into the result array
+  evalIntLongPrimitiveType(array1, array2, resultArray, false)
+  resultArray
+case LongType =>
+  // avoid boxing of primitive long array elements
+  // calculate result array size
+  hsLong = new OpenHashSet[Long]
+  val elements = evalIntLongPrimitiveType(array1, array2, null, 
true)
+  // allocate result array
+  hsLong = new OpenHashSet[Long]
+  val resultArray = if (UnsafeArrayData.shouldUseGenericArrayData(
+LongType.defaultSize, elements)) {
+new GenericArrayData(new Array[Any](elements))
+  } else {
+UnsafeArrayData.forPrimitiveArray(
+  Platform.LONG_ARRAY_OFFSET, elements, LongType.defaultSize)
+  }
+  

[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21852#discussion_r205309619
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -416,6 +416,21 @@ object SimplifyConditionals extends Rule[LogicalPlan] 
with PredicateHelper {
 // these branches can be pruned away
 val (h, t) = branches.span(_._1 != TrueLiteral)
 CaseWhen( h :+ t.head, None)
+
+  case e @ CaseWhen(branches, Some(elseValue)) if {
+val list = branches.map(_._2) :+ elseValue
+list.tail.forall(list.head.semanticEquals)
+  } =>
+// For non-deterministic conditions with side effect, we can not 
remove it.
+// Since the output of all the branches are semantic equivalence, 
`elseValue`
+// is picked for all the branches.
+val newBranches = 
branches.map(_._1).filter(!_.deterministic).map(cond => (cond, elseValue))
--- End diff --

All conditions must be deterministic; otherwise a non-deterministic condition 
that was not evaluated before could be evaluated after this rule is applied.
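
A rough Python model of the guard I have in mind. `Cond` and 
`simplify_case_when` are hypothetical names, and plain `==` on strings stands 
in for `semanticEquals`; this is a sketch of the idea, not the optimizer rule.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Cond:
    name: str
    deterministic: bool = True


def simplify_case_when(branches: List[Tuple[Cond, str]], else_value: Optional[str]):
    """Return the shared value if the whole CASE WHEN can be removed, else None."""
    if else_value is None:
        return None
    values = [v for _, v in branches] + [else_value]
    all_same = all(v == values[0] for v in values)
    # Short-circuit evaluation may have skipped a later condition; only drop
    # the conditions when none of them can have side effects.
    all_deterministic = all(c.deterministic for c, _ in branches)
    return else_value if all_same and all_deterministic else None
```

With this guard the rule only fires when every condition is deterministic, 
and leaves the expression untouched otherwise.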





[GitHub] spark issue #21876: [SPARK-24802][SQL][FOLLOW-UP] Add a new config for Optim...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21876
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93571/
Test PASSed.





[GitHub] spark issue #21876: [SPARK-24802][SQL][FOLLOW-UP] Add a new config for Optim...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21876
  
Merged build finished. Test PASSed.





[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21320
  
@ajacques, if you are willing to take over this, please go ahead. I would 
appreciate it.





[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21320
  
@mallman and @ajacques, if you find any difficulty with it, I will take 
over this. Please review this. Let me know if you think there is a better way 
to get through this.





[GitHub] spark issue #21876: [SPARK-24802][SQL][FOLLOW-UP] Add a new config for Optim...

2018-07-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21876
  
**[Test build #93571 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93571/testReport)**
 for PR 21876 at commit 
[`3730053`](https://github.com/apache/spark/commit/3730053d7386188042b2f2d4bd6784c3de722df6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-25 Thread ajacques
Github user ajacques commented on the issue:

https://github.com/apache/spark/pull/21320
  
Hey @mallman, I want to thank you for your work on this so far. I've been 
watching this pull request hoping this would get merged into 2.4 since it would 
be a benefit to me, but can see how it might be frustrating.

Unfortunately, I've only been following the comments and not the 
code/architecture itself, so I can't take over effectively, but I did try to 
address the minor comments as requested, hopefully to help out. I've started in 
7ee616076f93d6cfd55b6646314f3d4a6d1530d3. This may not be super helpful right 
now, but it could help if these were the only blockers for getting this change 
into mainline in time for 2.4.





[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21221
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93570/
Test PASSed.





[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21221
  
Merged build finished. Test PASSed.





[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-07-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21221
  
**[Test build #93570 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93570/testReport)**
 for PR 21221 at commit 
[`8905d23`](https://github.com/apache/spark/commit/8905d231c3a959f70266223d3546b17a655cee39).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21320
  
> After more than two years of off and on review, discussion/debate, 
nitpicking, commits, steps forward and backwards, to have someone swoop in at 
this time with a new raft of nitpicking and stylistic issues that set the 
review back again further is beyond maddening. 

I think that's primarily because the change looks incomplete, while the 
feature itself sounds good to have. I think that's why so many people keep 
taking a look.

Going back and forth is bad. That's why I am sticking with this PR: to get 
this change in, help you address other people's comments, and prevent that 
back-and-forth.

The stylistic comments are essentially based on 
https://github.com/databricks/scala-style-guide .

My nitpicks basically come from referring to other code or PRs in Spark, 
or to other committers' preferences, so that we can get through this. I guess 
nits are still good to fix if you happen to push more changes; it should take 
only a few seconds to address them. If not, please ignore my nits or minor 
comments. They don't usually block the PR.

For clarification, few comments mentioned in 
https://github.com/apache/spark/pull/21320#issuecomment-407714036 are pretty 
reject comments in general in other PRs too.

> Contributing to this PR is a tax on what is completely voluntary, unpaid 
time.

FWIW, all my work was unpaid and completely voluntary for more than 3 years, 
except for the most recent 6 months (which basically means until I became a 
committer). To be honest, I believe I still work on Spark the same way I did 
when I worked on it individually.

> I have no professional responsibility to this effort. Maybe it's better 
off done by someone who does.

I completely agree. In general, there should be no professional 
responsibility, like an assigned task, in open source. I think no one has a 
professional responsibility to take this, and we should be transparent about 
that here. If you want someone else to take over, anyone interested may take 
it over _voluntarily_ by leaving a comment saying they want to do so.

I might cc some people who may be interested, in order to inform them, but 
that doesn't mean I am handing it off to someone else.

I am sorry if you felt I was pushing or rushing you. I was trying to get this 
change in since people find it a good feature to have. That's why I 
prioritized this and stuck with this PR.




---




[GitHub] spark issue #4: SPARK-1137: Make ZK PersistenceEngine not crash for wrong se...

2018-07-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/4
  
**[Test build #41 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/ubuntuSparkPRB/41/testReport)**
 for PR 4 at commit 
[`414d267`](https://github.com/apache/spark/commit/414d2673b31a72d8a9edb4f5da71f4b12a8a1555).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21852
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21852
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1336/
Test PASSed.


---




[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...

2018-07-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21852
  
**[Test build #93578 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93578/testReport)**
 for PR 21852 at commit 
[`4acda6f`](https://github.com/apache/spark/commit/4acda6fbf4fb5b1be30a0ad213cd5369b64b02b5).


---




[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-25 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/21852#discussion_r205306098
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -416,6 +416,22 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
 // these branches can be pruned away
 val (h, t) = branches.span(_._1 != TrueLiteral)
 CaseWhen( h :+ t.head, None)
+
+  case e @ CaseWhen(branches, Some(elseValue)) if {
+// With previous rules, it's guaranteed that there must be one branch.
--- End diff --

You're right. I removed the comment. Thanks.


---




[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-25 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/21852#discussion_r205305691
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyConditionalSuite.scala
 ---
@@ -122,4 +126,25 @@ class SimplifyConditionalSuite extends PlanTest with PredicateHelper {
 None),
   CaseWhen(normalBranch :: trueBranch :: Nil, None))
   }
+
+  test("remove entire CaseWhen if all the outputs are semantic equivalence") {
--- End diff --

Yes, I plan to add a couple more tests tonight.


---




[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...

2018-07-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21878
  
**[Test build #93577 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93577/testReport)**
 for PR 21878 at commit 
[`d2759cc`](https://github.com/apache/spark/commit/d2759cce48eb9a85145e90d8a126fb83351d0fda).


---




[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21878
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21878
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1335/
Test PASSed.


---




[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21852#discussion_r205303174
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyConditionalSuite.scala
 ---
@@ -122,4 +126,25 @@ class SimplifyConditionalSuite extends PlanTest with PredicateHelper {
 None),
   CaseWhen(normalBranch :: trueBranch :: Nil, None))
   }
+
+  test("remove entire CaseWhen if all the outputs are semantic equivalence") {
--- End diff --

We may need a test case that includes a non-deterministic condition.
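
To illustrate why such a test matters, here is a minimal, self-contained sketch. It uses a toy expression model (`Expr`, `Lit`, `Attr`, `Rand`, `GreaterThan`, and a toy `CaseWhen`); these are hypothetical stand-ins, not Spark's Catalyst classes. It also assumes one conservative policy, which may differ from what the PR ends up doing: collapse the `CaseWhen` only when every branch value equals the else value and every condition is deterministic.

```scala
// Toy expression model for illustration only; NOT Spark's Catalyst classes.
sealed trait Expr { def deterministic: Boolean }
final case class Lit(value: Any) extends Expr { val deterministic = true }
final case class Attr(name: String) extends Expr { val deterministic = true }
// Stand-in for a non-deterministic expression such as rand().
final case class Rand(seed: Long) extends Expr { val deterministic = false }
final case class GreaterThan(left: Expr, right: Expr) extends Expr {
  val deterministic: Boolean = left.deterministic && right.deterministic
}
final case class CaseWhen(branches: Seq[(Expr, Expr)], elseValue: Option[Expr]) extends Expr {
  val deterministic: Boolean =
    branches.forall { case (c, v) => c.deterministic && v.deterministic } &&
      elseValue.forall(_.deterministic)
}

object ToySimplify {
  // Assumed policy: every branch must yield the else value, and no condition
  // may be non-deterministic, because dropping it would skip its evaluation.
  private def canCollapse(branches: Seq[(Expr, Expr)], elseValue: Expr): Boolean =
    branches.forall { case (cond, value) => cond.deterministic && value == elseValue }

  def simplify(e: Expr): Expr = e match {
    case CaseWhen(branches, Some(elseValue)) if canCollapse(branches, elseValue) => elseValue
    case other => other
  }
}
```

Under this policy, a `CaseWhen` whose only condition is `GreaterThan(Attr("a"), Lit(0))` collapses to its else value, while one whose condition involves `Rand` is returned unchanged. That second case is the behavior a non-deterministic test would pin down.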


---




[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21852#discussion_r205303069
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -416,6 +416,22 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
 // these branches can be pruned away
 val (h, t) = branches.span(_._1 != TrueLiteral)
 CaseWhen( h :+ t.head, None)
+
+  case e @ CaseWhen(branches, Some(elseValue)) if {
+// With previous rules, it's guaranteed that there must be one branch.
--- End diff --

Is this comment correct?


---




[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21852
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1334/
Test PASSed.


---




[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21852
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFrameWrit...

2018-07-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21821
  
Thanks! Merged to master/2.3


---




[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21596
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93569/
Test PASSed.


---




[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21596
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version

2018-07-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21596
  
**[Test build #93569 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93569/testReport)**
 for PR 21596 at commit 
[`e16f7a1`](https://github.com/apache/spark/commit/e16f7a130b4287b7e4dcbd5132b3e7208b91a8f9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...

2018-07-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21852
  
**[Test build #93576 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93576/testReport)**
 for PR 21852 at commit 
[`0b67e2e`](https://github.com/apache/spark/commit/0b67e2efcb6f827248ee11fffe9eca44a86fceaa).


---




[GitHub] spark issue #19528: [SPARK-20393][WEBU UI][1.6] Strengthen Spark to prevent ...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19528
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #21871: [SPARK-24916][SQL] Fix type coercion for IN expre...

2018-07-25 Thread wangyum
Github user wangyum closed the pull request at:

https://github.com/apache/spark/pull/21871


---




[GitHub] spark pull request #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFr...

2018-07-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21821


---




[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...

2018-07-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21878
  
**[Test build #93575 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93575/testReport)**
 for PR 21878 at commit 
[`d95ba40`](https://github.com/apache/spark/commit/d95ba4081ac1188515b7e6363640700d56f2c93f).


---




[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21878
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21878
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1333/
Test PASSed.


---




[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...

2018-07-25 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/21878
  
cc @gengliangwang and @gatorsmile 


---




[GitHub] spark pull request #21878: [SPARK-24924][SQL] Add mapping for built-in Avro ...

2018-07-25 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/21878

[SPARK-24924][SQL] Add mapping for built-in Avro data source

## What changes were proposed in this pull request?

This PR aims to do the following:
1. Like `com.databricks.spark.csv` mapping, we had better map 
`com.databricks.spark.avro` to built-in Avro data source.
2. Remove incorrect error message, `Please find an Avro package at ...`.
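
As a rough illustration of the first item, the sketch below shows the general idea of a backward-compatibility lookup: a legacy provider name is translated to a built-in implementation before the data source is resolved. Everything here is a hypothetical stand-in (the `ToyDataSourceLookup` object, its `resolve` method, and the `builtin.*` class names); it is not the code in this PR.

```scala
// Toy sketch of a provider-name compatibility mapping; names are hypothetical.
object ToyDataSourceLookup {
  // Placeholder class names, used only for illustration.
  private val builtInCsv = "builtin.csv.CsvSource"
  private val builtInAvro = "builtin.avro.AvroSource"

  // Legacy external package names mapped to their built-in replacements.
  val backwardCompatibilityMap: Map[String, String] = Map(
    "com.databricks.spark.csv" -> builtInCsv,
    "com.databricks.spark.avro" -> builtInAvro
  )

  // Translate a legacy name if one is registered; otherwise pass it through.
  def resolve(provider: String): String =
    backwardCompatibilityMap.getOrElse(provider, provider)
}
```

With a mapping like this, a caller that still passes `com.databricks.spark.avro` as the format name would be routed to the built-in source, and any unmapped name is left unchanged.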

## How was this patch tested?

Pass the newly added tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-24924

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21878.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21878


commit d95ba4081ac1188515b7e6363640700d56f2c93f
Author: Dongjoon Hyun 
Date:   2018-07-25T22:51:56Z

[SPARK-24924][SQL] Add mapping for built-in Avro data source




---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21306
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93568/
Test PASSed.


---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21306
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-07-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21306
  
**[Test build #93568 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93568/testReport)**
 for PR 21306 at commit 
[`0ee938b`](https://github.com/apache/spark/commit/0ee938bb2e17a9981062042b97e8036179a9eae8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.

2018-07-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21867
  
**[Test build #93574 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93574/testReport)**
 for PR 21867 at commit 
[`a5b00b8`](https://github.com/apache/spark/commit/a5b00b8a05538a6adb3a4525c2fecc1e15575f7c).


---



