[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r205674364

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---

@@ -3805,3 +3799,233 @@ object ArrayUnion {
     new GenericArrayData(arrayBuffer)
   }
 }
+
+/**
+ * Returns an array of the elements in the intersect of x and y, without duplicates
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2,
+      without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(2)
+  """,
+  since = "2.4.0")
+case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike {

--- End diff --

The check at `dataType` in `ComplexTypeMergingExpression` should be useful. Based on [the discussion](https://github.com/apache/spark/pull/21103#discussion_r203746577), the result of `dataType` should be only `left.dataType`. Will we use only those checks?
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r205674265

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---

@@ -787,6 +782,7 @@ class Analyzer(
           right
         case Some((oldRelation, newRelation)) =>
           val attributeRewrites = AttributeMap(oldRelation.output.zip(newRelation.output))
+          // TODO(rxin): Why do we need transformUp here?

--- End diff --

We still need to transform the resolved plan here to resolve self-joins. Imagine
```
val df = ...
df.as("a").join(df.as("b"), ...)
```
We need to look into the resolved plan to replace the conflicting attributes.
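(A minimal sketch of the self-join case described above, assuming a SparkSession named `spark`; the data and column name are illustrative, not from the PR.)

```scala
import spark.implicits._

val df = spark.range(3).toDF("id")
// "a" and "b" wrap the same resolved plan, so a.id and b.id initially carry
// the same attribute ID; the analyzer's transformUp rewrite over the resolved
// plan is what makes the join condition unambiguous.
val joined = df.as("a").join(df.as("b"), $"a.id" === $"b.id")
```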
[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r205674062

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---

@@ -3805,3 +3799,233 @@ object ArrayUnion {
     new GenericArrayData(arrayBuffer)
   }
 }
+
+/**
+ * Returns an array of the elements in the intersect of x and y, without duplicates
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2,
+      without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(2)
+  """,
+  since = "2.4.0")
+case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike {
+  override def dataType: DataType = left.dataType
+
+  var hsInt: OpenHashSet[Int] = _
+  var hsLong: OpenHashSet[Long] = _
+
+  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getInt(idx)
+    if (!hsInt.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setInt(pos, elem)
+      }
+      hsInt.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getLong(idx)
+    if (!hsLong.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setLong(pos, elem)
+      }
+      hsLong.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def evalIntLongPrimitiveType(
+      array1: ArrayData,
+      array2: ArrayData,
+      resultArray: ArrayData,
+      isLongType: Boolean): Int = {
+    // store elements into resultArray
+    var notFoundNullElement = true
+    var i = 0
+    while (i < array2.numElements()) {
+      if (array2.isNullAt(i)) {
+        notFoundNullElement = false
+      } else {
+        val assigned = if (!isLongType) {
+          hsInt.add(array2.getInt(i))
+        } else {
+          hsLong.add(array2.getLong(i))
+        }
+      }
+      i += 1
+    }
+    var pos = 0
+    i = 0
+    while (i < array1.numElements()) {
+      if (array1.isNullAt(i)) {
+        if (notFoundNullElement) {
+          if (resultArray != null) {
+            resultArray.setNullAt(pos)
+          }
+          pos += 1
+          notFoundNullElement = false
+        }
+      } else {
+        val assigned = if (!isLongType) {
+          assignInt(array1, i, resultArray, pos)
+        } else {
+          assignLong(array1, i, resultArray, pos)
+        }
+        if (assigned) {
+          pos += 1
+        }
+      }
+      i += 1
+    }
+    pos
+  }
+
+  @transient lazy val evalExcept: (ArrayData, ArrayData) => ArrayData = {
+    if (elementTypeSupportEquals) {
+      elementType match {
+        case IntegerType =>

--- End diff --

Ah, if we usually go through the generated code, we will optimize these cases only in the generated code and, in the interpreted path, leave only the generic case as a fallback.
[GitHub] spark issue #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21886

Merged build finished. Test PASSed.
[GitHub] spark issue #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21886

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93651/ Test PASSed.
[GitHub] spark issue #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21886

**[Test build #93651 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93651/testReport)** for PR 21886 at commit [`7268736`](https://github.com/apache/spark/commit/7268736897885c26b65c459056d5c0a7bae5fedf).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r205672980

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---

@@ -3805,3 +3799,233 @@ object ArrayUnion {
     new GenericArrayData(arrayBuffer)
   }
 }
+
+/**
+ * Returns an array of the elements in the intersect of x and y, without duplicates
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2,
+      without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(2)
+  """,
+  since = "2.4.0")
+case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike {
+  override def dataType: DataType = left.dataType
+
+  var hsInt: OpenHashSet[Int] = _
+  var hsLong: OpenHashSet[Long] = _
+
+  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getInt(idx)
+    if (!hsInt.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setInt(pos, elem)
+      }
+      hsInt.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getLong(idx)
+    if (!hsLong.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setLong(pos, elem)
+      }
+      hsLong.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def evalIntLongPrimitiveType(
+      array1: ArrayData,
+      array2: ArrayData,
+      resultArray: ArrayData,
+      isLongType: Boolean): Int = {
+    // store elements into resultArray
+    var notFoundNullElement = true
+    var i = 0
+    while (i < array2.numElements()) {
+      if (array2.isNullAt(i)) {
+        notFoundNullElement = false
+      } else {
+        val assigned = if (!isLongType) {
+          hsInt.add(array2.getInt(i))
+        } else {
+          hsLong.add(array2.getLong(i))
+        }
+      }
+      i += 1
+    }
+    var pos = 0
+    i = 0
+    while (i < array1.numElements()) {
+      if (array1.isNullAt(i)) {
+        if (notFoundNullElement) {
+          if (resultArray != null) {
+            resultArray.setNullAt(pos)
+          }
+          pos += 1
+          notFoundNullElement = false
+        }
+      } else {
+        val assigned = if (!isLongType) {
+          assignInt(array1, i, resultArray, pos)
+        } else {
+          assignLong(array1, i, resultArray, pos)
+        }
+        if (assigned) {
+          pos += 1
+        }
+      }
+      i += 1
+    }
+    pos
+  }
+
+  @transient lazy val evalExcept: (ArrayData, ArrayData) => ArrayData = {
+    if (elementTypeSupportEquals) {
+      elementType match {
+        case IntegerType =>

--- End diff --

Yeah, since the generated code also uses this path, we can optimize byte/short/boolean. However, it is not easy to optimize float and double with `OpenHashSet`. Let me look for a hash set that can store primitive float/double.
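(A hedged sketch of one possible direction for the float/double case mentioned above, not necessarily what the PR adopted: reuse the Int/Long-specialized `OpenHashSet` by storing raw bit patterns. Note bit-pattern equality differs from SQL equality for NaN and for 0.0 vs. -0.0.)

```scala
import org.apache.spark.util.collection.OpenHashSet

// Floats fit in an OpenHashSet[Int] via their 32-bit IEEE-754 representation...
val hsFloat = new OpenHashSet[Int]
def addFloat(f: Float): Unit = hsFloat.add(java.lang.Float.floatToIntBits(f))
def containsFloat(f: Float): Boolean = hsFloat.contains(java.lang.Float.floatToIntBits(f))

// ...and doubles in an OpenHashSet[Long] via their 64-bit representation.
val hsDouble = new OpenHashSet[Long]
def addDouble(d: Double): Unit = hsDouble.add(java.lang.Double.doubleToLongBits(d))
def containsDouble(d: Double): Boolean = hsDouble.contains(java.lang.Double.doubleToLongBits(d))
```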
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21822

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93648/ Test PASSed.
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21822

Merged build finished. Test PASSed.
[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21879

Seems right to me too.
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21822

**[Test build #93648 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93648/testReport)** for PR 21822 at commit [`fe52801`](https://github.com/apache/spark/commit/fe528010f98ccbafaa486a8d57afced0e6f10393).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #21880: [SPARK-24929][INFRA] Make merge script don't swal...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21880
[GitHub] spark pull request #21789: [SPARK-24829][STS]In Spark Thrift Server, CAST AS...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21789
[GitHub] spark issue #21880: [SPARK-24929][INFRA] Make merge script don't swallow Key...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21880

Thanks @squito.
[GitHub] spark issue #21880: [SPARK-24929][INFRA] Make merge script don't swallow Key...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21880

Manually tested. Merged to master.
[GitHub] spark issue #21789: [SPARK-24829][STS]In Spark Thrift Server, CAST AS FLOAT ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21789

Merged to master.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21889

If there's no objection within a few days, let me get this in (cc @cloud-fan and @gatorsmile) and handle the other work and comments separately. @mallman, if we are all happy here, mind taking a look at https://github.com/apache/spark/pull/21320#issuecomment-408271470 and https://github.com/apache/spark/pull/21320#issuecomment-406765851? I will fix my own comments in a follow-up myself. Will credit this to you FWIW.
[GitHub] spark issue #21882: [SPARK-24934][SQL] Explicitly whitelist supported types ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21882

Merged build finished. Test PASSed.
[GitHub] spark issue #21882: [SPARK-24934][SQL] Explicitly whitelist supported types ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21882

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93649/ Test PASSed.
[GitHub] spark issue #21882: [SPARK-24934][SQL] Explicitly whitelist supported types ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21882

**[Test build #93649 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93649/testReport)** for PR 21882 at commit [`7f1040e`](https://github.com/apache/spark/commit/7f1040e5e218c60a52afdff015fb84ad2c386b52).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/21879

@cloud-fan I didn't try to actually reproduce this issue in branches other than branch-2.3, but just by checking the POM files, this issue has existed since at least 1.6.
[GitHub] spark pull request #21699: [SPARK-24722][SQL] pivot() with Column type argum...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21699#discussion_r205669387

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---

@@ -340,29 +399,23 @@ class RelationalGroupedDataset protected[sql](
 
   /**
    * Pivots a column of the current `DataFrame` and performs the specified aggregation.
-   * There are two versions of pivot function: one that requires the caller to specify the list

--- End diff --

Shall we note this in the `Column` API too, or note that this is an overloaded version of the String one?
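(For context, a short sketch of the two call styles the comment contrasts, assuming `import spark.implicits._` and the usual year/course/earnings example columns; the `Column`-typed overload is the one this PR adds.)

```scala
// String-based pivot: a column name, optionally with an explicit value list.
df.groupBy($"year").pivot("course", Seq("dotNET", "Java")).sum("earnings")

// Column-based overload added by this PR.
df.groupBy($"year").pivot($"course").sum("earnings")
```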
[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r205668821

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---

@@ -3805,3 +3799,233 @@ object ArrayUnion {
     new GenericArrayData(arrayBuffer)
   }
 }
+
+/**
+ * Returns an array of the elements in the intersect of x and y, without duplicates
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2,
+      without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(2)
+  """,
+  since = "2.4.0")
+case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike {
+  override def dataType: DataType = left.dataType
+
+  var hsInt: OpenHashSet[Int] = _
+  var hsLong: OpenHashSet[Long] = _
+
+  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getInt(idx)
+    if (!hsInt.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setInt(pos, elem)
+      }
+      hsInt.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getLong(idx)
+    if (!hsLong.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setLong(pos, elem)
+      }
+      hsLong.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def evalIntLongPrimitiveType(
+      array1: ArrayData,
+      array2: ArrayData,
+      resultArray: ArrayData,
+      isLongType: Boolean): Int = {
+    // store elements into resultArray
+    var notFoundNullElement = true
+    var i = 0
+    while (i < array2.numElements()) {
+      if (array2.isNullAt(i)) {
+        notFoundNullElement = false
+      } else {
+        val assigned = if (!isLongType) {
+          hsInt.add(array2.getInt(i))
+        } else {
+          hsLong.add(array2.getLong(i))
+        }
+      }
+      i += 1
+    }
+    var pos = 0
+    i = 0
+    while (i < array1.numElements()) {
+      if (array1.isNullAt(i)) {
+        if (notFoundNullElement) {
+          if (resultArray != null) {
+            resultArray.setNullAt(pos)
+          }
+          pos += 1
+          notFoundNullElement = false
+        }
+      } else {
+        val assigned = if (!isLongType) {
+          assignInt(array1, i, resultArray, pos)
+        } else {
+          assignLong(array1, i, resultArray, pos)
+        }
+        if (assigned) {
+          pos += 1
+        }
+      }
+      i += 1
+    }
+    pos
+  }
+
+  @transient lazy val evalExcept: (ArrayData, ArrayData) => ArrayData = {
+    if (elementTypeSupportEquals) {
+      elementType match {
+        case IntegerType =>

--- End diff --

If they are all worth optimizing, I feel we should not optimize them in the interpreted code path and should leave it to the codegen path. That's a strong reason to add codegen support.
[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r205668541

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---

@@ -3805,3 +3799,233 @@ object ArrayUnion {
     new GenericArrayData(arrayBuffer)
   }
 }
+
+/**
+ * Returns an array of the elements in the intersect of x and y, without duplicates
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2,
+      without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(2)
+  """,
+  since = "2.4.0")
+case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike {
+  override def dataType: DataType = left.dataType
+
+  var hsInt: OpenHashSet[Int] = _
+  var hsLong: OpenHashSet[Long] = _
+
+  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getInt(idx)
+    if (!hsInt.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setInt(pos, elem)
+      }
+      hsInt.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getLong(idx)
+    if (!hsLong.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setLong(pos, elem)
+      }
+      hsLong.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def evalIntLongPrimitiveType(
+      array1: ArrayData,
+      array2: ArrayData,
+      resultArray: ArrayData,
+      isLongType: Boolean): Int = {
+    // store elements into resultArray
+    var notFoundNullElement = true
+    var i = 0
+    while (i < array2.numElements()) {
+      if (array2.isNullAt(i)) {
+        notFoundNullElement = false
+      } else {
+        val assigned = if (!isLongType) {
+          hsInt.add(array2.getInt(i))
+        } else {
+          hsLong.add(array2.getLong(i))
+        }
+      }
+      i += 1
+    }
+    var pos = 0
+    i = 0
+    while (i < array1.numElements()) {
+      if (array1.isNullAt(i)) {
+        if (notFoundNullElement) {
+          if (resultArray != null) {
+            resultArray.setNullAt(pos)
+          }
+          pos += 1
+          notFoundNullElement = false
+        }
+      } else {
+        val assigned = if (!isLongType) {
+          assignInt(array1, i, resultArray, pos)
+        } else {
+          assignLong(array1, i, resultArray, pos)
+        }
+        if (assigned) {
+          pos += 1
+        }
+      }
+      i += 1
+    }
+    pos
+  }
+
+  @transient lazy val evalExcept: (ArrayData, ArrayData) => ArrayData = {
+    if (elementTypeSupportEquals) {
+      elementType match {
+        case IntegerType =>

--- End diff --

Why do we only optimize for int and long? How about byte, short, float, and double?
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103

Merged build finished. Test PASSed.
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93644/ Test PASSed.
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21103

**[Test build #93644 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93644/testReport)** for PR 21103 at commit [`ad0c318`](https://github.com/apache/spark/commit/ad0c318c1fb63bd08c69995d192ed3ab9a98e4c2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889

Merged build finished. Test PASSed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93646/ Test PASSed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889

**[Test build #93646 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93646/testReport)** for PR 21889 at commit [`4b847ac`](https://github.com/apache/spark/commit/4b847acd8279d8c40115638faac30a4bc1736307).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r205667881

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---

@@ -3805,3 +3799,233 @@ object ArrayUnion {
     new GenericArrayData(arrayBuffer)
   }
 }
+
+/**
+ * Returns an array of the elements in the intersect of x and y, without duplicates
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2,
+      without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(2)
+  """,
+  since = "2.4.0")
+case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike {

--- End diff --

does this need to extend `ComplexTypeMergingExpression`?
[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21857

**[Test build #93656 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93656/testReport)** for PR 21857 at commit [`1f107aa`](https://github.com/apache/spark/commit/1f107aaa1fb4e6f261c1720058877b943c46706d).
[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21857

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1396/ Test PASSed.
[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21857

Merged build finished. Test PASSed.
[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/21879

LGTM
[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21879

LGTM, how far back shall we backport this patch?
[GitHub] spark pull request #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/21857#discussion_r205667472

--- Diff: sql/core/src/test/resources/sql-tests/inputs/except-all.sql ---

@@ -0,0 +1,146 @@
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+    (0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+    (1), (2), (2), (3), (5), (5), (null) AS tab2(c1);
+CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES
+    (1, 2),
+    (1, 2),
+    (1, 3),
+    (2, 3),
+    (2, 2)
+    AS tab3(k, v);
+CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES
+    (1, 2),
+    (2, 3),
+    (2, 2),
+    (2, 2),
+    (2, 20)
+    AS tab4(k, v);
+
+-- Basic ExceptAll
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- ExceptAll same table in both branches
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE c1 IS NOT NULL;
+
+-- Empty left relation
+SELECT * FROM tab1 WHERE c1 > 5
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- Empty right relation
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE c1 > 6;
+
+-- Type Coerced ExceptAll
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT CAST(1 AS BIGINT);
+
+-- Error as types of two side are not compatible
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT array(1);
+
+-- Basic
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4;
+
+-- Basic
+SELECT * FROM tab4
+EXCEPT ALL
+SELECT * FROM tab3;
+
+-- ExceptAll + Intersect
+SELECT * FROM tab4
+EXCEPT ALL
+SELECT * FROM tab3
+INTERSECT DISTINCT
+SELECT * FROM tab4;
+
+-- ExceptAll + Except
+SELECT * FROM tab4
+EXCEPT ALL
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Chain of set operations
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4
+UNION ALL
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Mismatch on number of columns across both branches
+SELECT k FROM tab3
+EXCEPT ALL
+SELECT k, v FROM tab4;
+
+-- Chain of set operations
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4
+UNION
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Chain of set operations
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4
+EXCEPT DISTINCT
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Join under except all. Should produce empty resultset since both left and right sets
+-- are same.
+SELECT *
+FROM   (SELECT tab3.k,
+               tab4.v
+        FROM   tab3
+               JOIN tab4
+                 ON tab3.k = tab4.k)
+EXCEPT ALL
+SELECT *
+FROM   (SELECT tab3.k,
+               tab4.v
+        FROM   tab3
+               JOIN tab4
+                 ON tab3.k = tab4.k);
+
+-- Join under except all (2)
+SELECT *
+FROM   (SELECT tab3.k,
+               tab4.v
+        FROM   tab3
+               JOIN tab4
+                 ON tab3.k = tab4.k)
+EXCEPT ALL
+SELECT *
+FROM   (SELECT tab4.v AS k,
+               tab3.k AS v
+        FROM   tab3
+               JOIN tab4
+                 ON tab3.k = tab4.k);
+
+-- Group by under ExceptAll
+SELECT v FROM tab3 GROUP BY v
+EXCEPT ALL
+SELECT k FROM tab4 GROUP BY k

--- End diff --

@gatorsmile Thank you. Fixed.
[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21875

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93647/ Test PASSed.
[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21875

Merged build finished. Test PASSed.
[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21875

**[Test build #93647 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93647/testReport)** for PR 21875 at commit [`6884ed6`](https://github.com/apache/spark/commit/6884ed6949c4b8b61bec31248d23cb827bfbc944).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21852

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93643/ Test PASSed.
[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21879

**[Test build #93655 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93655/testReport)** for PR 21879 at commit [`93c34da`](https://github.com/apache/spark/commit/93c34da713136eb7b4ed8bb8775353c8219efa22).
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21852

Merged build finished. Test PASSed.
[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21879

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1395/ Test PASSed.
[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21852

**[Test build #93643 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93643/testReport)** for PR 21852 at commit [`9171773`](https://github.com/apache/spark/commit/91717737ed39dd3f972603d1febb7c6c459b5060).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21879

Merged build finished. Test PASSed.
[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/21879

retest this please
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21103

**[Test build #93654 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93654/testReport)** for PR 21103 at commit [`e902974`](https://github.com/apache/spark/commit/e9029746a9cbc204d043cb7a0f9c1c3285284b54).
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103

Merged build finished. Test PASSed.
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1394/ Test PASSed.
[GitHub] spark pull request #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21857#discussion_r205663395

--- Diff: sql/core/src/test/resources/sql-tests/inputs/except-all.sql ---

@@ -0,0 +1,146 @@
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+    (0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+    (1), (2), (2), (3), (5), (5), (null) AS tab2(c1);
+CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES
+    (1, 2),
+    (1, 2),
+    (1, 3),
+    (2, 3),
+    (2, 2)
+    AS tab3(k, v);
+CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES
+    (1, 2),
+    (2, 3),
+    (2, 2),
+    (2, 2),
+    (2, 20)
+    AS tab4(k, v);
+
+-- Basic ExceptAll
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- ExceptAll same table in both branches
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE c1 IS NOT NULL;
+
+-- Empty left relation
+SELECT * FROM tab1 WHERE c1 > 5
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- Empty right relation
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE c1 > 6;
+
+-- Type Coerced ExceptAll
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT CAST(1 AS BIGINT);
+
+-- Error as types of two side are not compatible
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT array(1);
+
+-- Basic
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4;
+
+-- Basic
+SELECT * FROM tab4
+EXCEPT ALL
+SELECT * FROM tab3;
+
+-- ExceptAll + Intersect
+SELECT * FROM tab4
+EXCEPT ALL
+SELECT * FROM tab3
+INTERSECT DISTINCT
+SELECT * FROM tab4;
+
+-- ExceptAll + Except
+SELECT * FROM tab4
+EXCEPT ALL
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Chain of set operations
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4
+UNION ALL
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Mismatch on number of columns across both branches
+SELECT k FROM tab3
+EXCEPT ALL
+SELECT k, v FROM tab4;
+
+-- Chain of set operations
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4
+UNION
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Chain of set operations
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4
+EXCEPT DISTINCT
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Join under except all. Should produce empty resultset since both left and right sets
+-- are same.
+SELECT *
+FROM   (SELECT tab3.k,
+               tab4.v
+        FROM   tab3
+               JOIN tab4
+                 ON tab3.k = tab4.k)
+EXCEPT ALL
+SELECT *
+FROM   (SELECT tab3.k,
+               tab4.v
+        FROM   tab3
+               JOIN tab4
+                 ON tab3.k = tab4.k);
+
+-- Join under except all (2)
+SELECT *
+FROM   (SELECT tab3.k,
+               tab4.v
+        FROM   tab3
+               JOIN tab4
+                 ON tab3.k = tab4.k)
+EXCEPT ALL
+SELECT *
+FROM   (SELECT tab4.v AS k,
+               tab3.k AS v
+        FROM   tab3
+               JOIN tab4
+                 ON tab3.k = tab4.k);
+
+-- Group by under ExceptAll
+SELECT v FROM tab3 GROUP BY v
+EXCEPT ALL
+SELECT k FROM tab4 GROUP BY k

--- End diff --

;
[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r205663032

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---

@@ -3805,3 +3799,230 @@ object ArrayUnion {
     new GenericArrayData(arrayBuffer)
   }
 }
+
+/**
+ * Returns an array of the elements in the intersect of x and y, without duplicates
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2,
+      without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(2)
+  """,
+  since = "2.4.0")
+case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike {
+  override def dataType: DataType = left.dataType
+
+  var hsInt: OpenHashSet[Int] = _
+  var hsLong: OpenHashSet[Long] = _
+
+  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getInt(idx)
+    if (!hsInt.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setInt(pos, elem)
+      }
+      hsInt.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getLong(idx)
+    if (!hsLong.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setLong(pos, elem)
+      }
+      hsLong.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def evalIntLongPrimitiveType(
+      array1: ArrayData,
+      array2: ArrayData,
+      resultArray: ArrayData,
+      isLongType: Boolean): Int = {
+    // store elements into resultArray
+    var notFoundNullElement = true
+    var i = 0
+    while (i < array2.numElements()) {
+      if (array2.isNullAt(i)) {
+        notFoundNullElement = false
+      } else {
+        val assigned = if (!isLongType) {
+          hsInt.add(array2.getInt(i))
+        } else {
+          hsLong.add(array2.getLong(i))
+        }
+      }
+      i += 1
+    }
+    var pos = 0
+    i = 0
+    while (i < array1.numElements()) {
+      if (array1.isNullAt(i)) {
+        if (notFoundNullElement) {
+          if (resultArray != null) {
+            resultArray.setNullAt(pos)
+          }
+          pos += 1
+          notFoundNullElement = false
+        }
+      } else {
+        val assigned = if (!isLongType) {
+          assignInt(array1, i, resultArray, pos)
+        } else {
+          assignLong(array1, i, resultArray, pos)
+        }
+        if (assigned) {
+          pos += 1
+        }
+      }
+      i += 1
+    }
+    pos
+  }
+
+  val exceptEquals: (ArrayData, ArrayData) => ArrayData = {

--- End diff --

Ah, you are right. Thank you.
[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205660610

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---

@@ -1647,6 +1647,14 @@ abstract class RDD[T: ClassTag](
     }
   }
 
+  /**
+   * :: Experimental ::
+   * Indicates that Spark must launch the tasks together for the current stage.
+   */
+  @Experimental
+  @Since("2.4.0")
+  def barrier(): RDDBarrier[T] = withScope(new RDDBarrier[T](this))

--- End diff --

I opened https://issues.apache.org/jira/browse/SPARK-24941 for this.
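(A minimal usage sketch of the experimental `barrier()` API shown in the diff above, assuming a SparkContext `sc`; the partition body is illustrative.)

```scala
val rdd = sc.parallelize(1 to 100, numSlices = 4)

// All 4 tasks of the barrier stage are launched together, or not at all.
val doubled = rdd.barrier().mapPartitions { iter =>
  iter.map(_ * 2)
}
doubled.collect()
```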
[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205660568

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---

@@ -359,20 +366,55 @@ private[spark] class TaskSchedulerImpl(
     // of locality levels so that it gets a chance to launch local tasks on all of them.
     // NOTE: the preferredLocality order: PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY
     for (taskSet <- sortedTaskSets) {
-      var launchedAnyTask = false
-      var launchedTaskAtCurrentMaxLocality = false
-      for (currentMaxLocality <- taskSet.myLocalityLevels) {
-        do {
-          launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet(
-            taskSet, currentMaxLocality, shuffledOffers, availableCpus, tasks)
-          launchedAnyTask |= launchedTaskAtCurrentMaxLocality
-        } while (launchedTaskAtCurrentMaxLocality)
-      }
-      if (!launchedAnyTask) {
-        taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
+      // Skip the barrier taskSet if the available slots are less than the number of pending tasks.
+      if (taskSet.isBarrier && availableSlots < taskSet.numTasks) {

--- End diff --

Yeah, you made a really good point here. I've opened https://issues.apache.org/jira/browse/SPARK-24942 to track the cluster resource management issue.

> what exactly do you mean by "available"? Its not so well defined for dynamic allocation. The resources you have right when the job is submitted? Also can you point me to where that is being done? I didn't see it here -- is it another jira?

This is tracked by https://issues.apache.org/jira/browse/SPARK-24819: we shall check all the barrier stages on job submission, to see whether a barrier stage requires more slots (to be able to launch all the barrier tasks in the same stage together) than currently active slots in the cluster. If the job requires more slots than available (both busy and free slots), fail the job on submit.
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21103

**[Test build #93653 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93653/testReport)** for PR 21103 at commit [`4d01c98`](https://github.com/apache/spark/commit/4d01c9848e021006e2412ebb2db3e37782b5f41a).
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103

Merged build finished. Test PASSed.
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1393/ Test PASSed.
[GitHub] spark pull request #21811: [SPARK-24801][CORE] Avoid memory waste by empty b...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21811
[GitHub] spark issue #21811: [SPARK-24801][CORE] Avoid memory waste by empty byte[] a...
Github user squito commented on the issue: https://github.com/apache/spark/pull/21811

merged to master, thanks!
[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21883

**[Test build #93652 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93652/testReport)** for PR 21883 at commit [`b44e578`](https://github.com/apache/spark/commit/b44e578cf5cf1ac0e25dab779739ef253786c366).
[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21883

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1392/ Test PASSed.
[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21883

Merged build finished. Test PASSed.
[GitHub] spark issue #21877: [SPARK-24923][SQL][WIP] Add unpartitioned CTAS and RTAS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21877

Merged build finished. Test PASSed.
[GitHub] spark issue #21877: [SPARK-24923][SQL][WIP] Add unpartitioned CTAS and RTAS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21877

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93638/ Test PASSed.
[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/21883

retest this please
[GitHub] spark issue #21877: [SPARK-24923][SQL][WIP] Add unpartitioned CTAS and RTAS ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21877

**[Test build #93638 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93638/testReport)** for PR 21877 at commit [`b6b29d8`](https://github.com/apache/spark/commit/b6b29d809bb76ea59b7ec66e8a80b224a938495b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class CreateTableAsSelect(`
  * `case class ReplaceTableAsSelect(`
  * `case class TableV2Relation(`
  * `case class AppendDataExec(`
  * `case class CreateTableAsSelectExec(`
  * `case class ReplaceTableAsSelectExec(`
  * `case class WriteToDataSourceV2Exec(`
  * `abstract class V2TableWriteExec(`
  * `  implicit class CatalogHelper(catalog: CatalogProvider) `
  * `  implicit class TableHelper(table: Table) `
  * `  implicit class SourceHelper(source: DataSourceV2) `
  * `  implicit class OptionsHelper(options: Map[String, String]) `
[GitHub] spark pull request #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21857#discussion_r205656881

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---

@@ -1400,13 +1401,71 @@ object ReplaceIntersectWithSemiJoin extends Rule[LogicalPlan] {
  */
 object ReplaceExceptWithAntiJoin extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
-    case Except(left, right) =>
+    case Except(left, right, false) =>
       assert(left.output.size == right.output.size)
       val joinCond = left.output.zip(right.output).map { case (l, r) => EqualNullSafe(l, r) }
       Distinct(Join(left, right, LeftAnti, joinCond.reduceLeftOption(And)))
   }
 }
+
+/**
+ * Replaces logical [[ExceptAll]] operator using a combination of Union, Aggregate
+ * and Generate operator.
+ *
+ * Input Query :
+ * {{{
+ *   SELECT c1 FROM ut1 EXCEPT ALL SELECT c1 FROM ut2
+ * }}}
+ *
+ * Rewritten Query:
+ * {{{
+ *   SELECT c1
+ *   FROM (
+ *     SELECT replicate_rows(sum_val, c1) AS (sum_val, c1)

--- End diff --

Please don't forget to update the pr description as well.
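(To make the bag semantics behind this rewrite concrete, a worked example using the data from the tests in this thread; each value survives max(0, countLeft - countRight) times. This only runs once the EXCEPT ALL support from this PR is in place.)

```scala
spark.sql("""
  SELECT * FROM VALUES (0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1)
  EXCEPT ALL
  SELECT * FROM VALUES (1), (2), (2), (3), (5), (5), (null) AS tab2(c1)
""").show()
// c1 = 0: 1 - 0 = 1 copy; c1 = 2: 4 - 2 = 2 copies; null: 2 - 1 = 1 copy;
// 1 and 3 cancel out, so the result is {0, 2, 2, null}.
```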
[GitHub] spark issue #21890: [SPARK-24932] Allow update mode for streaming queries wi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21890

Can one of the admins verify this patch?
[GitHub] spark issue #21890: [SPARK-24932] Allow update mode for streaming queries wi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21890

Can one of the admins verify this patch?
[GitHub] spark issue #21890: [SPARK-24932] Allow update mode for streaming queries wi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21890

Can one of the admins verify this patch?
[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21857

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93639/ Test PASSed.
[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21857

Merged build finished. Test PASSed.
[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21883

Merged build finished. Test FAILed.
[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21857

**[Test build #93639 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93639/testReport)** for PR 21857 at commit [`f0da978`](https://github.com/apache/spark/commit/f0da978cad619547b7f77caf29a3799bb4aa2884).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21883

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93645/ Test FAILed.
[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21883

**[Test build #93645 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93645/testReport)** for PR 21883 at commit [`b44e578`](https://github.com/apache/spark/commit/b44e578cf5cf1ac0e25dab779739ef253786c366).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #21890: [SPARK-24932] Allow update mode for streaming que...
GitHub user fuyufjh opened a pull request: https://github.com/apache/spark/pull/21890

[SPARK-24932] Allow update mode for streaming queries with join

## What changes were proposed in this pull request?

In issue SPARK-19140 we supported update output mode for non-aggregation streaming queries. This should also be applied to streaming joins to keep the semantics consistent.

PS. The streaming join feature was added after SPARK-19140.

When using *update* output mode, the join will work exactly as in *append* mode. However, for example, this will allow users to run an aggregation-after-join query in *update* mode in order to get a more real-time result output.

## How was this patch tested?

See changes in UnsupportedOperationsSuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fuyufjh/spark SPARK-19140-allow-update-for-stream-join

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21890.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21890

commit f2daf62c4e5d9daf397fc804ed9365204933ddbd
Author: Eric Fu
Date: 2018-07-27T02:55:17Z

    [SPARK-24932] Allow update mode for streaming queries with join
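(A hypothetical sketch of the aggregation-after-join query the description alludes to; the sources, schema, and sink are made up for illustration.)

```scala
import org.apache.spark.sql.functions._

val impressions = spark.readStream.format("rate").load()
  .withColumnRenamed("value", "adId")
val clicks = spark.readStream.format("rate").load()
  .withColumnRenamed("value", "adId")

// Inner stream-stream join followed by an aggregation.
val counts = impressions.join(clicks, "adId").groupBy(col("adId")).count()

// With this change, update mode emits only the rows whose counts changed in
// each trigger, rather than the full result set.
counts.writeStream.outputMode("update").format("console").start()
```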
[GitHub] spark issue #21837: [SPARK-24881][SQL] New Avro option - compression
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21837 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93637/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21837: [SPARK-24881][SQL] New Avro option - compression
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21837 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21837: [SPARK-24881][SQL] New Avro option - compression
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21837 **[Test build #93637 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93637/testReport)** for PR 21837 at commit [`ebaf327`](https://github.com/apache/spark/commit/ebaf327d17ffda55a35490e080cde5b2948cc655). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21879 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93640/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21879 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21879 **[Test build #93640 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93640/testReport)** for PR 21879 at commit [`93c34da`](https://github.com/apache/spark/commit/93c34da713136eb7b4ed8bb8775353c8219efa22). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205652334

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1647,6 +1647,14 @@ abstract class RDD[T: ClassTag](
     }
   }
+
+  /**
+   * :: Experimental ::
+   * Indicates that Spark must launch the tasks together for the current stage.
+   */
+  @Experimental
+  @Since("2.4.0")
+  def barrier(): RDDBarrier[T] = withScope(new RDDBarrier[T](this))
--- End diff --

was this addressed at all? is there another jira for it?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
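As context for the API quoted above, a usage sketch, assuming the `RDDBarrier` wrapper returned by `barrier()` exposes a `mapPartitions` as proposed in this PR (not a confirmed final API):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BarrierSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("barrier-sketch"))
    val rdd = sc.parallelize(1 to 100, numSlices = 4)

    val doubled = rdd
      .barrier() // mark the stage: all 4 tasks must be launched together
      .mapPartitions { iter =>
        // every task of this stage is running concurrently at this point,
        // which is what gang-scheduled workloads (e.g. MPI-style jobs) need
        iter.map(_ * 2)
      }
      .collect()

    println(doubled.mkString(","))
    sc.stop()
  }
}
```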
[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205652317

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -359,20 +366,55 @@ private[spark] class TaskSchedulerImpl(
     // of locality levels so that it gets a chance to launch local tasks on all of them.
     // NOTE: the preferredLocality order: PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY
     for (taskSet <- sortedTaskSets) {
-      var launchedAnyTask = false
-      var launchedTaskAtCurrentMaxLocality = false
-      for (currentMaxLocality <- taskSet.myLocalityLevels) {
-        do {
-          launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet(
-            taskSet, currentMaxLocality, shuffledOffers, availableCpus, tasks)
-          launchedAnyTask |= launchedTaskAtCurrentMaxLocality
-        } while (launchedTaskAtCurrentMaxLocality)
-      }
-      if (!launchedAnyTask) {
-        taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
+      // Skip the barrier taskSet if the available slots are less than the number of pending tasks.
+      if (taskSet.isBarrier && availableSlots < taskSet.numTasks) {
--- End diff --

You'll request the slots, but I think there are a lot more complications. The whole point of using dynamic allocation is running on a multi-tenant cluster, so resources will come and go. If there aren't enough resources available on the cluster no matter what, then you'll see executors get acquired, have their idle timeout expire, get released, and then acquired again. This will be really confusing to the user, as it might look like there is some progress with the constant logging about executors getting acquired and released, though really it would just wait indefinitely. Or you might get deadlock with two concurrent applications. Even if they could fit on the cluster by themselves, they might both acquire some resources, which would prevent either of them from getting enough. Again, they'd both go through the same loop: acquiring some resources, having them hit the idle timeout, releasing them, then acquiring again; they might just continually trade resources with each other. They'd only advance by luck. You have similar problems with concurrent jobs within one Spark application, but it's a bit easier to control since at least the Spark scheduler knows about everything.

> We plan to fail the job on submit if it requires more slots than available.

What exactly do you mean by "available"? It's not so well defined for dynamic allocation. The resources you have right when the job is submitted? Also, can you point me to where that is being done? I didn't see it here -- is it another jira?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
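To make the all-or-nothing condition in the diff above concrete, a standalone illustration with hypothetical names (not Spark's actual scheduler code):

```scala
// Hypothetical illustration of the barrier check discussed above: a barrier
// task set is launched only when every one of its tasks can get a slot in
// the same scheduling round; otherwise the whole set is skipped this round.
case class Offer(executorId: String, freeCores: Int)

object BarrierSlotCheck {
  def hasSlotsForBarrier(offers: Seq[Offer], cpusPerTask: Int, numTasks: Int): Boolean = {
    val availableSlots = offers.map(_.freeCores / cpusPerTask).sum
    availableSlots >= numTasks
  }

  def main(args: Array[String]): Unit = {
    // Two executors with 4 free cores each, 1 cpu per task, 10 barrier tasks:
    // only 8 slots are available, so the barrier stage is skipped this round.
    println(hasSlotsForBarrier(Seq(Offer("e1", 4), Offer("e2", 4)), 1, 10)) // false
    // The dynamic-allocation concern: this check can keep failing while
    // executors are acquired and idle-timeout away, so the stage waits forever.
  }
}
```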
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21650 @HyukjinKwon I think Bryan's implementation looks promising. Please let me take a look. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21809: [SPARK-24851][UI] Map a Stage ID to it's Associated Job ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21809 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21809: [SPARK-24851][UI] Map a Stage ID to it's Associated Job ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21809 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93631/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r205650766

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -3805,3 +3799,230 @@ object ArrayUnion {
     new GenericArrayData(arrayBuffer)
   }
 }
+
+/**
+ * Returns an array of the elements in the intersect of x and y, without duplicates
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2,
+      without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(2)
+  """,
+  since = "2.4.0")
+case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike {
+  override def dataType: DataType = left.dataType
+
+  var hsInt: OpenHashSet[Int] = _
+  var hsLong: OpenHashSet[Long] = _
+
+  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getInt(idx)
+    if (!hsInt.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setInt(pos, elem)
+      }
+      hsInt.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getLong(idx)
+    if (!hsLong.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setLong(pos, elem)
+      }
+      hsLong.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def evalIntLongPrimitiveType(
+      array1: ArrayData,
+      array2: ArrayData,
+      resultArray: ArrayData,
+      isLongType: Boolean): Int = {
+    // store elements into resultArray
+    var notFoundNullElement = true
+    var i = 0
+    while (i < array2.numElements()) {
+      if (array2.isNullAt(i)) {
+        notFoundNullElement = false
+      } else {
+        val assigned = if (!isLongType) {
+          hsInt.add(array2.getInt(i))
+        } else {
+          hsLong.add(array2.getLong(i))
+        }
+      }
+      i += 1
+    }
+    var pos = 0
+    i = 0
+    while (i < array1.numElements()) {
+      if (array1.isNullAt(i)) {
+        if (notFoundNullElement) {
+          if (resultArray != null) {
+            resultArray.setNullAt(pos)
+          }
+          pos += 1
+          notFoundNullElement = false
+        }
+      } else {
+        val assigned = if (!isLongType) {
+          assignInt(array1, i, resultArray, pos)
+        } else {
+          assignLong(array1, i, resultArray, pos)
+        }
+        if (assigned) {
+          pos += 1
+        }
+      }
+      i += 1
+    }
+    pos
+  }
+
+  val exceptEquals: (ArrayData, ArrayData) => ArrayData = {
--- End diff --

Btw, `@transient lazy val`?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
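On the `@transient lazy val` question: a minimal sketch of what the annotation buys for a closure-valued field (illustrative class, not the PR's code):

```scala
// Illustrative sketch, not the PR's actual code: a @transient lazy val is
// excluded from Java serialization and rebuilt lazily on first use after the
// enclosing object is deserialized, e.g. on an executor.
class ExceptSketch extends Serializable {
  @transient lazy val evalExcept: (Seq[Int], Seq[Int]) => Seq[Int] =
    (a, b) => {
      val seen = b.toSet           // built once per deserialized instance
      a.filterNot(seen).distinct   // elements of a not in b, de-duplicated
    }
}
```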
[GitHub] spark issue #21809: [SPARK-24851][UI] Map a Stage ID to it's Associated Job ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21809 **[Test build #93631 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93631/testReport)** for PR 21809 at commit [`3a06b87`](https://github.com/apache/spark/commit/3a06b876268c291604c395eef51387a419486fee). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r205649948

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -3805,3 +3799,230 @@ object ArrayUnion {
     new GenericArrayData(arrayBuffer)
   }
 }
+
+/**
+ * Returns an array of the elements in the intersect of x and y, without duplicates
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2,
+      without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(2)
+  """,
+  since = "2.4.0")
+case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike {
+  override def dataType: DataType = left.dataType
+
+  var hsInt: OpenHashSet[Int] = _
+  var hsLong: OpenHashSet[Long] = _
+
+  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getInt(idx)
+    if (!hsInt.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setInt(pos, elem)
+      }
+      hsInt.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getLong(idx)
+    if (!hsLong.contains(elem)) {
+      if (resultArray != null) {
+        resultArray.setLong(pos, elem)
+      }
+      hsLong.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def evalIntLongPrimitiveType(
+      array1: ArrayData,
+      array2: ArrayData,
+      resultArray: ArrayData,
+      isLongType: Boolean): Int = {
+    // store elements into resultArray
+    var notFoundNullElement = true
+    var i = 0
+    while (i < array2.numElements()) {
+      if (array2.isNullAt(i)) {
+        notFoundNullElement = false
+      } else {
+        val assigned = if (!isLongType) {
+          hsInt.add(array2.getInt(i))
+        } else {
+          hsLong.add(array2.getLong(i))
+        }
+      }
+      i += 1
+    }
+    var pos = 0
+    i = 0
+    while (i < array1.numElements()) {
+      if (array1.isNullAt(i)) {
+        if (notFoundNullElement) {
+          if (resultArray != null) {
+            resultArray.setNullAt(pos)
+          }
+          pos += 1
+          notFoundNullElement = false
+        }
+      } else {
+        val assigned = if (!isLongType) {
+          assignInt(array1, i, resultArray, pos)
+        } else {
+          assignLong(array1, i, resultArray, pos)
+        }
+        if (assigned) {
+          pos += 1
+        }
+      }
+      i += 1
+    }
+    pos
+  }
+
+  val exceptEquals: (ArrayData, ArrayData) => ArrayData = {
--- End diff --

Maybe:
```scala
val exceptEquals: (ArrayData, ArrayData) => ArrayData = {
  if (elementTypeSupportEquals) {
    elementType match {
      case IntegerType => (array1, array2) => ...
      case LongType => (array1, array2) => ...
      ...
    }
  } else {
    (array1, array2) => ...
  }
}
```
to avoid the per-row pattern match.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
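A self-contained sketch of the dispatch-once pattern the suggestion above describes (names are illustrative, not Spark's): the `match` runs once when the function value is built, and every subsequent call goes straight to the pre-selected closure instead of pattern matching per row.

```scala
object DispatchOnceSketch {
  sealed trait ElementType
  case object IntegerType extends ElementType
  case object GenericType extends ElementType

  // The match is evaluated once, here; callers then invoke the returned
  // closure directly, avoiding a per-row pattern match.
  def buildExcept(elementType: ElementType): (Seq[Any], Seq[Any]) => Seq[Any] =
    elementType match {
      case IntegerType =>
        // specialized fast path: hash-set membership test
        (a, b) => { val seen = b.toSet; a.filterNot(seen).distinct }
      case GenericType =>
        // generic fallback: linear scan per element
        (a, b) => a.filterNot(x => b.exists(_ == x)).distinct
    }

  def main(args: Array[String]): Unit = {
    val evalExcept = buildExcept(IntegerType) // dispatch happens once
    println(evalExcept(Seq(1, 2, 3), Seq(1, 3, 5))) // List(2)
  }
}
```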
[GitHub] spark issue #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21886 **[Test build #93651 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93651/testReport)** for PR 21886 at commit [`7268736`](https://github.com/apache/spark/commit/7268736897885c26b65c459056d5c0a7bae5fedf). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21886 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1391/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21886 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/21886 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21888: [SPARK-24253][SQL][WIP] Implement DeleteFrom for v2 tabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21888 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21888: [SPARK-24253][SQL][WIP] Implement DeleteFrom for v2 tabl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21888 **[Test build #93650 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93650/testReport)** for PR 21888 at commit [`f8b178d`](https://github.com/apache/spark/commit/f8b178d34b870e779ec061175f01ba63a5adc076). * This patch **fails to build**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class UnresolvedRelation(table: CatalogTableIdentifier) extends LeafNode with NamedRelation ` * `sealed trait IdentifierWithOptionalDatabaseAndCatalog ` * `case class CatalogTableIdentifier(table: String, database: Option[String], catalog: Option[String])` * `class TableIdentifier(name: String, db: Option[String])` * ` implicit class CatalogHelper(catalog: CatalogProvider) ` * `case class ResolveCatalogV2Relations(sparkSession: SparkSession) extends Rule[LogicalPlan] ` * `case class DeleteFromV2Exec(rel: TableV2Relation, expr: Expression)` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21888: [SPARK-24253][SQL][WIP] Implement DeleteFrom for v2 tabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21888 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93650/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21837: [SPARK-24881][SQL] New Avro option - compression
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21837 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93634/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21837: [SPARK-24881][SQL] New Avro option - compression
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21837 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org